Submitted by spicer2 t3_11jvqtb in dataisbeautiful
Comments
kompootor t1_jb507nj wrote
This is just an awesome idea all around, a strong visualization, and I look forward to seeing more expansions on it. Now question/critique/suggestion:
Crucially, over what years is the data taken? On the IAAF Toplists you can get a season cross-section of athletes' PRs (Personal Records, i.e. each athlete's PB that was officially recorded), or you can get all-time PRs going back to apparently 1899 (although prewar data will be terrible, especially for women). From the all-time list, perhaps the PRs from the year of the record until now are the better pick per event, or maybe it's better to compare against the cross-section of PRs in the year the record was set (I don't know offhand). Regardless, the data date range(s) should be put on the graph, and I really think the year of each record should be added in parentheses for each event as well, since that also hints at how big a statistical outlier that record may have been.
Were the events with the highest z-scores selected with respect to men's or women's events exclusively, or a mix of the two? As the chart is ambiguous about it (it doesn't say "Top 20 events with most dominant records" or something) it seems safe to eyeball a mixed set, but it would still be nice to note.
I agree with the general sentiment that it would be nice to have the list sorted by z-score, but of course that's impossible to do for both men and women in this visualization while keeping parity/sanity of events and thus neatness. It is possible, however, in another graph format that one may consider playing with in future (I'm not sure how effective it would be comparatively): You take a 2D x-y graph with men's event records on the x-axis and women's records on the y-axis, each overloaded for different record unit types (such that you would have adequate spacing between dots if you just plotted men's records as dots on the x-axis). Then each event gets a corresponding (x,y) point with a label; the z-scores are indicated by the label and by the point having a shape sized correspondingly in x and y (or else simply two small bars). Then to read men's records ascending you follow the dots left to right, and for women's you follow the dots bottom to top. Just one possibility that someone could do with a dataset like this.
If you do another chart, I'd personally also be interested in some of the most vulnerable athletics records to be put up in the same chart, for comparing something of a baseline. Another idea for comparison, but not as useful and so better for a separate chart, would be an identical visualization using the data between the years when a very famous WR was held, such as one set by Jesse Owens at the 1936 Olympics, or Roger Bannister's 4-minute mile.
kompootor t1_jb54i2q wrote
Sidebar comment from the above in case anyone is interested further: the prewar data that is easily available is terrible, but a lot of it is still out there, poorly summarized in disparate sources. For women's Olympic history, look at the 1922 Women's World Games, aka the Women's Olympic Games (which, confusingly for those trying to research it, took place at a very similar time and place to the 1922 Women's Olympiad, with several of the same athletes, and yet with several events having just slightly different lengths). Afaict the sources most likely to hold the final incomplete data in the medalist table are a bunch of old contemporaneous Russian-language sports magazines that would most likely be in a national museum or archive in Ukraine or Russia or perhaps another state that has had Russian as a major language. Another interesting thing to look at is the athletes. Mary Lines shows up everywhere as a multi-sport international athlete of the time, but she has a very sparse bio on Wikipedia, as what's widely available on her seems to be poorly cited and/or difficult to otherwise plausibly verify. But a lot of these women (pseudo-)Olympians potentially have very interesting stories, especially those who did not start as, or who were not at the time, professional tennis players (tennis was basically the only respectable "get-sweaty" sport for women at the time, but stuff like archery and lawn sports were also big).
The politics and logistics of the WWG and early women's Olympics are fascinating too, since the regular Olympics at the time was not at all the quintessential transnational institution it is today. They all struggled with just the basics of funding at all levels, even just to get the necessary grants to bring all of the world's (aka Europe's) top athletes to a single location, so either the women's games could have been viewed by the IOC as a potential popularity/legitimacy booster for the Olympics, or it could be viewed as a competitor for scarce resources and thus an existential threat.
That's my pitch for some obscure sports history. And if you want to do further reading on this or any such topic, I strongly recommend complementing your learning with cited edits to Wikipedia -- that's how I was able to type almost all of the above (and on much much more) from memory still, even though my edits to these topics come from several years ago. Protip: in most cases don't engage in arguments on the site -- just walk away.
RussGOATWilson t1_jb69ltj wrote
This is interesting, but can you do the same thing for prior records? E.g. Michael Johnson's 400m record z-score compared to van Niekerk's?
GinandTonicandLime t1_jb6f83q wrote
Hopefully in before legions of Australians ask about Don Bradman
Karnex97 t1_jbxltjg wrote
Hey OP, can you do the same thing but only count results set after 1990? The reason is that before the 90s doping was legal. Of course there will always be some doped times on the list even after 1990, but the percentage would probably drop significantly.
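A rough sketch of that kind of cutoff in Python, assuming each result is stored as an (athlete, mark, year) tuple. That layout and the name `results_after` are made up for illustration and are not the IAAF Toplists format:

```python
def results_after(results, cutoff_year=1990):
    """Keep only results set strictly after `cutoff_year`.

    Assumption: `results` is a list of (athlete, mark, year) tuples.
    """
    return [
        (athlete, mark, year)
        for athlete, mark, year in results
        if year > cutoff_year
    ]

# Illustrative rows only, not real results.
rows = [("A", 10.0, 1988), ("B", 10.2, 1995), ("C", 10.1, 1990), ("D", 10.3, 2001)]
print(results_after(rows))
```

Note that for any event whose record predates the cutoff, the mark being scored would become the best one inside the window rather than the official record.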
Rugfiend t1_jb4wnq1 wrote
Absolutely incredible that the long jump world record has only been broken once in 55 years (poor Carl Lewis - virtually unbeaten at long jump in a decade, did finally beat Beamon's record, one jump after Powell just beat it)
LSeww t1_jb4j87c wrote
It's "Niekerk" not "Viekerk". You had one job, OP.
3trackmind t1_jb4mlqz wrote
You should change the title to say Track & Field. So many more athletic events…
JPAnalyst t1_jb4tzqv wrote
The term “athletics” with an “s” at the end is synonymous with track and field events. By definition, that term excludes rowing, judo, baseball, basketball, archery, and any other non track and field event. OP used the term properly and accurately.
https://en.wikipedia.org/wiki/Athletics_at_the_Summer_Olympics
3trackmind t1_jb4x1v3 wrote
Thank you very much. I had no idea on the distinction when the “s” is added.
I’m one of those apes from the U.S. At least Wikipedia gives me a bit of cover:
> In much of North America, athletics is synonymous with sports in general, maintaining the historical usage of the term. The word "athletics" is rarely used to refer to the sport of athletics in this region. Track and field is preferred, and is used in the United States and Canada to refer to athletics events, including race-walking and marathon running (although cross country running is typically considered a separate sport).
Flashy-Mcfoxtrot t1_jb4rhmk wrote
It is all of the current athletics disciplines from the Olympic program. Which ones are missing? The only one I can think of is the 50km walk, but I expect that's because it was scrapped after Tokyo.
[deleted] t1_jb4zbuk wrote
When I look at this I think about the evolution of performance over time and whether we will eventually hit an apex of training, nutrition and the physical limits of the human body. Is there a ceiling to performance that we will see or is it plausible that we’ll just continue to see it improve?
I don’t mean this as an open ended qualitative question, more so about the performance data and what it tells us directionally is most likely to occur.
HallucinogenicPeach t1_jb52lwh wrote
I also wonder whether there’s a ceiling to how far we can go without performance enhancers. I think about it with technology and other areas too - there has to be a limit surely.
neurodiverseotter t1_jb5lius wrote
There is, in both cases. We have already identified some of the biological factors that make certain athletes excel: a mutation that makes the body more resistant to lactate acidosis for example, or a certain bone/tendon structure specifically optimized for running and so on. We also know that there is a threshold beyond which additional muscle no longer adds efficacy, due to problems with blood circulation and so on. Drugs and surgery can only change some biological parameters, and physical aids like shoe forms have a limit of optimization as well. But people dominating their fields usually do so due to biological advantage. In modern, highly professionalized sports, there's little "fairness" involved.
kcocesroh t1_jb6qhq9 wrote
I'd love to see a separate version of the Olympics where performance enhancers and other drugs are allowed.
Enter at your own risk, but let's see what humans are truly capable of...
Mm_Donut t1_jb6dln0 wrote
On the women's side, the 100m thru 800m are all tainted with strong suspicions of performance enhancing drugs.
Raemnant t1_jb53rpq wrote
Needs some Strongman and Powerlifting love in here
kcocesroh t1_jb6qwl8 wrote
Those 4+ Standard Deviations are fucking insane. Even the 3+ are incredible, but Jesus Christ these people are ridiculous.
Nica-E-M t1_jbbp1wg wrote
Now do it for swimming events!...
Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps Michael Phelps
PandaMomentum t1_jb53olg wrote
Sydney McLaughlin is the greatest, most dominant, gold medal winning athlete you've never heard of.
inventionnerd t1_jb54pia wrote
"Never heard of". She's literally the face of US Olympics right now.
JPAnalyst t1_jb584zn wrote
We’ve heard of her.
But thanks for the link. I plan on reading that article.
glokz t1_jb5e2rh wrote
That's strange, her 400m hurdles record is just 0.5 seconds slower than her best 400m run.
Locke_and_Lloyd t1_jb6o6nc wrote
She'll probably break the 400m WR this upcoming season if she runs one.
[deleted] t1_jb4k1o0 wrote
[deleted]
3trackmind t1_jb4n53m wrote
I agree because of your point two. However, having the event listed twice has no effect on how you might or might not order the data.
excitato t1_jb4qgow wrote
The listing isn’t dependent on the men’s order, the events are just ordered in a seemingly logical sequence. Otherwise the hurdle sprint would be at the bottom instead of the middle.
PlanetRo t1_jb4tejb wrote
I agree, I would also like to see the events listed on both sides.
Or the events could be listed in the center with the two bars diverging from it. I think that would have rooted each event at the center, but it might compel the user to compare the men's and women's data. Not sure whether making users compare is good or bad, in this case at least.
[deleted] t1_jb4p2gf wrote
[deleted]
JPAnalyst t1_jb4pwvk wrote
Jesus. Can we pick more nits than that? This sub can be obnoxious sometimes. All people want to do is shoot holes in other people’s OC.
LanewayRat t1_jb4rsxs wrote
It’s what we are here for, isn’t it? We are all striving for beautiful data, well presented and clearly described. My apologies if that’s distressing for you.
JPAnalyst t1_jb4sr25 wrote
>It’s what we are here for, isn’t it?
Is it? To critique and find things wrong with charts? I don’t see that as a stated purpose of this sub. That’s not why we are here, maybe it’s why you are here.
LanewayRat t1_jb4xfcu wrote
Hilarious that you were so busy attacking me that you didn’t notice I’d actually made an error 😂
JPAnalyst t1_jb4xvhz wrote
Your definition of hilarious and my definition of hilarious are quite different.
I didn’t notice, nor do I care what your error was. I didn’t care then and I don’t care now.
XO_WHORE_Llif3 t1_jb4so5y wrote
The entire point of the data is how far ahead the record holder is than everyone else. Most dominant is undoubtedly the correct term for this measurement. And if you don’t understand the term most dominant, the subtext right beneath the title explained the concept perfectly.
LanewayRat t1_jb4x5cq wrote
Oops, thanks I get it now. It’s not very obvious though as can be seen from other comments.
HippyChaiYay t1_jb59a2t wrote
except they’re called athletes not athletics
rinikulous t1_jb6yfv3 wrote
Athletics (with an s) is a reference to a set of athletic events that is typically called Track & Field. It is referring to the events, not the competitors participating in the events.
spicer2 OP t1_jb4hznt wrote
Tools used: Python, Excel
Source: IAAF Toplists
Methodology/other bits:
A question I’ve wondered for a long time is if there’s a good way to measure how dominant athletics records are between events. I remember reading once years ago that, with the help of some statistical trickery, Paula Radcliffe’s (now-broken) marathon world record was considered to be one of the most impressive human feats, up there with Bob Beamon’s long jump, but I never tracked down the original source or methodology behind it.
I’m sure most people here know what a z-score is but I’ll show my working for full transparency. It essentially tells you how many standard deviations from the mean a given score is, i.e. how “exceptional” it is. It’s really handy for letting you make comparisons across categories that use different measurements.
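For anyone who wants to see the arithmetic, here is a minimal sketch in Python. It assumes the population standard deviation is used and that the sign is flipped for timed events so that a bigger z-score always means a more dominant mark; neither detail is spelled out above, `z_score` and its parameters are illustrative names, and the numbers are made up:

```python
from statistics import mean, pstdev

def z_score(record, marks, lower_is_better=False):
    """Return how many standard deviations `record` sits from the mean of `marks`.

    Assumptions: population standard deviation (pstdev), and for timed
    events (lower_is_better=True) the sign is flipped so that a larger
    value always means a more dominant record.
    """
    mu = mean(marks)
    sigma = pstdev(marks)
    if sigma == 0:
        raise ValueError("all marks are identical; z-score is undefined")
    z = (record - mu) / sigma
    return -z if lower_is_better else z

# Made-up distances: mean is 5 and population standard deviation is 2.
distances = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(10, distances))  # (10 - 5) / 2 = 2.5
```

Because the result is expressed in standard deviations rather than seconds or metres, the same function can be applied to a sprint and a throw and the outputs compared directly.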
To be clear about the sample I used - I took the 100 best competitors’ times, not the 100 best times overall. So Usain Bolt’s score is gauged against the best times of Asafa Powell, Tyson Gay etc, not against all of their times plus his own other times. The main reason I did that was because I was most interested in how dominant the individuals were, not the times themselves.
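The "one best mark per competitor" step could look something like the sketch below. It assumes the raw results come as (athlete, mark) pairs, which is an illustrative layout and not the actual IAAF Toplists format, and `best_per_athlete` is a hypothetical helper name:

```python
def best_per_athlete(results, n=100, lower_is_better=True):
    """Keep each athlete's single best mark, then return the n best of those.

    Assumption: `results` is an iterable of (athlete, mark) pairs.
    """
    pick = min if lower_is_better else max
    best = {}
    for athlete, mark in results:
        # Replace the stored mark only if this one is better.
        best[athlete] = pick(best[athlete], mark) if athlete in best else mark
    # Sort so the best mark comes first (ascending for times, descending otherwise).
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=not lower_is_better)
    return ranked[:n]

# Illustrative rows only, not a real toplist.
rows = [("A", 10.1), ("A", 10.3), ("B", 10.2), ("C", 10.4), ("B", 10.5)]
print(best_per_athlete(rows, n=3))
```

Deduplicating by athlete first means one prolific runner cannot fill the sample with near-identical times, which fits the stated goal of measuring individuals rather than performances.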
On that note, I really like this chart as it shows just how good Usain Bolt was. I also like that it confirms the 100/110m hurdles is such a tight and unpredictable event where you don’t really get athletes that consistently sweep the board for medals, as you do in others.
PS: I gathered the data for this in January but sat on it for a bit, so some of this may be slightly out-of-date.