kompootor

kompootor t1_jcpq4mf wrote

Could you please link to your source page directly, so it's obvious where you got your data? People here obviously will have questions about the methodology.

Also, what is the bin size on your chart?

(For example, seeing as there was no FDIC in 1934, and local banks would not have been on any kind of registry, I'm sure we'd all like to see whether the dataset attempts to do an adjustment for this or leaves the data raw. Also, the standard for a "bank failure" has changed significantly since the FDIC and associated regulations were established. In case people haven't noticed, SVB is still around and now solvent -- so is it a failure comparable to 1934?)

6

kompootor t1_jcgmy47 wrote

You're right, they're different industries and different markets, so it's not a perfect parallel, and I completely neglected that fact.

By the numbers: in January 2023, of 58m birds culled, at least 40m were egg-laying hens (the public dataset does not have very standardized distinctions for many flocks, so it looks to be an estimate based on what's known).[NBC 2023-01-18] One explanation for the discrepancy: "Chickens grown for meat can be less prone to infection as they are slaughtered after about six weeks, but bigger, older birds and egg-laying hens [who live longer] have been severely affected." [Bloomberg 2022-12-19] Also, I should have looked up other indicators like turkey prices, which have risen steadily, except that their rise begins in Dec 2021 (prior to the first reported outbreak) and continues to soar, overall almost as steeply as the price of eggs, without fluctuation to date.

So it's definitely not as simple as I thought, and I shouldn't have just put it down to some speculative bubble, since nobody else is (although USDA reports don't even seem to address the price crash in January -- I can't imagine what else at least that spike could have been, but I'm no commodities trader.) Good call-out.

[To be clear, this is what I am claiming now: I have a decent suspicion that the peak and crash in egg prices in January was due to a not-insignificant element of speculation some time during the months leading up. My supporting evidence for this, or links to qualified experts (unlike me) who might have a similar suspicion, is nonexistent -- I can't find anything worthwhile. Hopefully as I detailed how the sources I found countered my initial reasoning, something of it might be informative to others interested in this topic.]

2

kompootor t1_jcg8g15 wrote

Yes, it links to the data. I recommend you amend the newsletter to include the source as well, if you want people to take your publication seriously. (I know it's just a simple 2D line graph of the data, but that's perfectly ok for a professional visualization -- what's not ok is not linking to the data directly if available.)

Your title, or something, needs to clarify that it's either using CPI or adjusted for inflation -- either works. BLS also publishes data on "average price" in USD, which is not adjusted, so what you have is ambiguous at best -- though my initial assumption was that your graph wasn't adjusted, because a lot of times data that's not adjusted does not explicitly specify that it's "not adjusted" -- see e.g. any graph or dataset from IMF. So that really needs to be specified.
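
To make the distinction concrete, here is a minimal sketch of what "adjusted" would mean for an average-price series. Every number below is a made-up placeholder (not a BLS figure), and `to_real_price` is a hypothetical helper name, so treat this as an illustration of the arithmetic rather than of how your graph was actually produced.

```python
def to_real_price(nominal_price, cpi_then, cpi_base):
    """Express a nominal price in base-period dollars using a CPI ratio."""
    if cpi_then <= 0:
        raise ValueError("CPI values must be positive")
    return nominal_price * (cpi_base / cpi_then)

# Placeholder values for illustration only (NOT actual BLS data).
nominal = {"2020-01": 1.50, "2023-01": 4.80}   # average price in USD
cpi = {"2020-01": 250.0, "2023-01": 300.0}     # general price index
base = "2023-01"                                # dollars we convert into

real = {
    month: to_real_price(price, cpi[month], cpi[base])
    for month, price in nominal.items()
}
```

With these placeholders the 2020 price rises once expressed in 2023 dollars, which is exactly the kind of difference a reader can't guess from an unlabeled axis.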

1

kompootor t1_jcg0ofo wrote

The high was a response to culling (as well as a rising high in the past 5 years due to US market trends due to things like cage bans and a growing organic market share -- small next to the culling response, but the market hadn't totally responded yet). The historic high was speculative.

How can you tell? Compare the growth in the wholesale price of chickens to that of eggs. Both eggs and chickens had no response when avian flu was first reported in Jan 2022, then rapidly rose with the first culls in March, which continued until another flurry of news stories about culls in Oct-Nov (a Google search is best to see the general distribution of news story dates in 2022, but I don't think I can link my own results now). But chicken prices didn't respond then, because culls had been continuous, while eggs did, which I suspect was market speculation -- that's confirmed because egg prices crashed in January 2023 (back to where they are "supposed to be"), while chicken prices are steady. That's finance QED afaik.

[Edit: My opinion on this is significantly less confident -- see continued comments below.]

2

kompootor t1_jcfz8tu wrote

When you list a source, whether in your graph or in the post here (I'd say especially in the post, but it should be in the graph too), a user must be able to verify the data. I cannot find the source data, and I followed the link to the CPI site.

Furthermore, the source is definitely not the site on which you originally post the graph -- for one thing, that is not "OC". If it's from a newsletter, that's your secondary citation, whereas you still have to make the primary citation to the original data so that, again, we can verify the numbers, who calculated them, their methodology (definitions, date range, their own data sources, etc.), among other things.

4

kompootor t1_jc8z8fq wrote

I heard the first sport Fosbury pursued seriously was boxing; the track & field team was just for cross-training. But as his career in the ring seemed more uncertain, he suddenly realized: he could make more money with a flop than with a hit!

[I know this is one of many jokes here about the flop. It is justifiable to keep talking about it as it did revolutionize the sport -- most sports don't get revolutionized. I don't think it's bad taste that people here make jokes (but c'mon, give it some effort) about Fosbury and the flop on his death as it's his clear historical legacy (apart from friends and family, comments from/about whom have been rightfully upvoted to the top). I didn't know the guy personally, obviously, but I've discussed the flop in conversation, in coaching, and especially in teaching science. So my contribution to this remembrance thread will be a joke I made up about the flop -- at least it's (hopefully) more entertaining than my physics lecture about it.]

1

kompootor t1_jbrq4yy wrote

You really need to have vertical ticks indicating January for each year (and make that clear). Importantly, this will indicate that the dataset only runs through January 2023 (otherwise, that information must be indicated explicitly somewhere).

Also, if the thesis of the visualization is about the recent floods, then the time scale seems overly large, since you can pretty well capture the prior upward trend and volatility information by cutting off at 2018 or so. If however you were to include markings shading other major disasters and political turmoil, that would justify the time scale, and it would lend support to your thesis if, for example, you find other major events that don't correspond to dramatic changes in these economic indicators, and mark them as well. (You should be impartial in choosing events however. Significant events like the oil crash from 2014--2016 that sent Nigeria into a spiralling recession in 2016, should not be ignored.)

2

kompootor t1_jb54i2q wrote

Sidebar comment from the above in case anyone is interested further: the prewar data that is easily available is terrible, but a lot of it is still out there, poorly summarized in disparate sources. For women's Olympics history, there's the 1922 Women's World Games, aka the Women's Olympic Games (which, confusingly for those trying to research it, took place at a very similar time and place to the 1922 Women's Olympiad, with several of the same athletes, and yet with several events having just slightly different lengths). Afaict the sources most likely to hold the final incomplete data in the medalist table are a bunch of old contemporaneous Russian-language sports magazines that would most likely be in a national museum or archive in Ukraine or Russia or perhaps another state that has had Russian as a major language.

Another interesting thing to look at is the athletes. Mary Lines shows up everywhere as a multi-sport international athlete of the time, but she has a very sparse bio on Wikipedia, as what's widely available on her seems to be poorly cited and/or difficult to otherwise plausibly verify. But a lot of these women (pseudo-)Olympians potentially have very interesting stories, especially those who did not start as, or who were not currently, professional tennis players (tennis was basically the only respectable "get-sweaty" sport for women at the time, but stuff like archery and lawn sports were also big).

The politics and logistics of the WWG and early women's Olympics are fascinating too, since the regular Olympics at the time was not at all the quintessential transnational institution it is today. They all struggled with just the basics of funding at all levels, even just to get the necessary grants to bring all of the world's (aka Europe's) top athletes to a single location, so either the women's games could have been viewed by the IOC as a potential popularity/legitimacy booster for the Olympics, or it could be viewed as a competitor for scarce resources and thus an existential threat.

That's my pitch for some obscure sports history. And if you want to do further reading on this or any such topic, I strongly recommend complementing your learning with cited edits to Wikipedia -- that's how I was able to type almost all of the above (and on much much more) from memory still, even though my edits to these topics come from several years ago. Protip: in most cases don't engage in arguments on the site -- just walk away.

3

kompootor t1_jb507nj wrote

This is just an awesome idea all around, a strong visualization, and I look forward to seeing more expansions on it. Now question/critique/suggestion:

Crucially, over what years is the data taken? On the IAAF Toplists you can get a season cross-section of athletes' PRs (Personal Record, i.e. the athlete's PB that got officially recorded), or you can get all-time PRs through apparently 1899 (although prewar data on this will be terrible, especially for women), of which perhaps either the PRs from the year of the record until now are a better pick per event, or maybe it's better to compare the cross section of PRs at the year the record was won (I don't know offhand). Regardless, the data date range(s) should be put on the graph, and I really think the year of each record should be added in parentheses for each event as well, since that also hints at how big of a statistical outlier that record may have been.

Are those events that were selected with highest z-scores chosen with respect to men's or women's events exclusively, or a mix of the two? As it's ambiguous enough about it (it doesn't say "Top 20 events with most dominant records" or something) it seems safe to eyeball a mixed set, but it would still be nice to note.
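
For anyone following along who hasn't met the metric, here is a rough sketch of a record's z-score against a set of PRs. The marks are invented, `record_z_score` is a hypothetical name, and I'm assuming the population standard deviation; OP's actual method and dataset may well differ.

```python
from statistics import mean, pstdev

def record_z_score(record, marks):
    """Standard deviations between a record and the mean of comparison marks."""
    sigma = pstdev(marks)
    if sigma == 0:
        raise ValueError("comparison marks have no spread")
    return (record - mean(marks)) / sigma

# Invented long-jump-style marks in metres (higher is better).
marks = [8.0, 8.1, 8.2, 8.3, 8.4]
z = record_z_score(8.95, marks)
```

For timed events, where lower is better, the sign would need flipping before events can be ranked together. The result also depends heavily on which years of PRs go into `marks`, which is why the date range matters.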

I agree with the general sentiment that it would be nice to have the list sorted by z-score, but of course that's impossible to do for both men and women in this visualization while keeping parity/sanity of events and thus neatness. It is possible, however, in another graph format that one may consider playing with in future (as I'm not sure how effective it would be comparatively): You take a 2D x-y graph with men's event records on the x-axis and women's records on the y-axis, each overloaded for different record unit types (such that you will have adequate spacing between dots if you just plotted men's records as dots on the x-axis). Then each event gets a corresponding (x,y) point with a label; the z-scores are indicated by the label and the point having a shape sized correspondingly in the x and y (or else be simply two small bars). Then to read men's records ascending you follow the dots left to right, and for women's you follow the dots bottom to top. Just one possibility that someone could do with a dataset like this.

If you do another chart, I'd personally also be interested in some of the most vulnerable athletics records to be put up in the same chart, for comparing something of a baseline. Another idea for comparison, but not as useful and so better for a separate chart, would be an identical visualization using the data between the years when a very famous WR was held, such as one set by Jesse Owens at the 1936 Olympics, or Roger Bannister's 4-minute mile.

5

kompootor t1_jayy21o wrote

The title and thesis of the infographic are, to me, clear: that the number of annual automotive traffic deaths exceeds the largest number of deaths ever from a single disaster event in each category.

Though perhaps, now that the issue is raised, it would be more poignant to take something like the worst year of the deaths for each category, instead of the worst single event; the only one of the list I'd expect to get markedly worse from this amendment would be flooding, but it would pre-empt this possible objection. You could, if you like, denote the difference between the worst single event and other events that year with a slightly different color shade in the same box area.
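
A minimal sketch of that amendment, assuming the data for one category comes as (year, deaths) pairs with one pair per event. The figures are invented placeholders and `worst_event_and_year` is a hypothetical name.

```python
from collections import defaultdict

def worst_event_and_year(events):
    """Return (deaths in worst single event, deaths in worst calendar year)."""
    per_year = defaultdict(int)
    worst_event = 0
    for year, deaths in events:
        per_year[year] += deaths          # accumulate all events in that year
        worst_event = max(worst_event, deaths)
    worst_year = max(per_year.values(), default=0)
    return worst_event, worst_year

# Invented placeholder events for a single disaster category.
floods = [(1931, 1000), (1931, 400), (1975, 1200)]
worst_event, worst_year = worst_event_and_year(floods)
```

The gap between the two returned values is what the second color shade in the box would represent.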

50

kompootor t1_jax8tav wrote

Possible/probable (almost certain? unless you corrected for it) confounding: as wind speed will correlate (causatively, but the wind isn't just local) to temperature gradients, you will find correlation to time of year and time of day, both of which correlate highly to light and temperature which correlate to power demand.
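
One rough way to check for this, sketched below, is a partial correlation that controls for the suspected confounder. The series are toy numbers I made up, with a shared time-of-day effect baked into both; `pearson` and `partial_correlation` are hypothetical helper names, and a real analysis would need to control for season as well.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: both series are driven by the same time-of-day effect plus noise.
hour_effect = [1, 2, 3, 4, 5, 6, 7, 8]
wind = [h + e for h, e in zip(hour_effect, [1, -1, 1, -1, 1, -1, 1, -1])]
demand = [h + e for h, e in zip(hour_effect, [1, 1, -1, -1, 1, 1, -1, -1])]

raw = pearson(wind, demand)
controlled = partial_correlation(wind, demand, hour_effect)
```

In this toy case the raw correlation is strong while the controlled one is close to zero, which is the pattern you'd see if the relationship were mostly confounding.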

Also, as I always note: you should include credit to yourself, date of graph creation, and cited data sources, in text on the image itself, since jerks like to copy reddit images everywhere without backlinking.

5

kompootor t1_jaw34x2 wrote

Yeah, maybe a greyscale topographical overlay or something. Obviously temperature, humidity and/or precipitation, and elevation (and also wind if you parameterize it right I guess) all correlate highly in the Alps, which is why it would be cool to see a visualization that could bring out areas in which they don't, should those areas exist. If those areas don't exist in any meaningful way, then there's no point I suppose.

1

kompootor t1_jatqqr3 wrote

Very very cool.

Do you think there'd be a good way to also convey altitude proper (or something else that may be interesting like humidity or precipitation) on the same graph without completely ruining the look?

I get that this was just a fun thing to print out for this sub, and everyone appreciates it including me -- this is to pick on everyone here as much as you, and is something to keep in mind for future posts and anything you want people to notice. The source says the dataset is CC-BY, and that means attribution must be made on derivative works. Obviously nobody will hunt you down, but it's still important, especially for a visualization this cool. So that idiots online don't endlessly reproduce this image without attribution (which is common on Reddit), you'll want to include text of this sort in the image:

  • Name/organization/site/URL, to assign author credit (optional)
  • Original publication linkback of visualization (e.g., this reddit thread -- optional)
  • Your copyright/copyleft (such as CC-BY-SA-4.0, since the original dataset is not copyrighted and not share-alike).
  • Date or year of visualization creation
  • For good graph practice in general: Date or date range of data collected; and you should put the title text on the graph, and also label the x- and y- axes as longitude and latitude.
  • Primary citation: You can do "Authors, "Title" (Date)" or just "10.5194/essd-13-2801-2021" or whatever, as long as it's attributed.

3

kompootor t1_jap34vr wrote

>"They would never treat our national football team -- the Black Stars -- like this,"

If that's true, it puts that team in decent shape historically -- at least in the context of older tragedies. (Btw we all know that Paralympic national teams get crap treatment compared to Olympic and able-bodied-event national teams, and further that national teams of little-watched Olympic sports get little to no support compared to those of popular sports, and both get minuscule support compared to those of professional sports.) I don't know football history at all, but I do remember a documentary on hooliganism that covered the Heysel disaster and had footage of Liverpool's bus under attack. It didn't seem like any of the clubs got any sympathy. A more recent bus attack worth comparing, however, was the 2009 terrorist attack on the Sri Lankan cricket team. Pakistan pulled out the stops to save what little face they could; the only details I got about the aftermath in Sri Lanka are that they all spent time in hospital immediately after, but I didn't see if that meant counseling (though I'd presume so for those who weren't physically injured).

I'm not sure about a small or unpopular team getting attacked like this to compare this to, to give the quote some context, but I'm sure there must be examples of college clubs in the U.S. getting robbed in international competition or something. I know in colleges that unless you were an official varsity sport or had an endowment from a board member, a sports club could expect zero support from the school for anything beyond the fundamentals. After all there's a finite amount of money and admin staff -- maybe that's an appropriate comparison?

23

kompootor t1_jaop8jn wrote

This is actually really cool, in terms of just literary analysis. If you're the first person to put up a visualization on colors in songs/albums like this then definite ultramega kudos -- and I think albums might be even better than songs for comparing across artists and careers.

To critique the visualization itself, I feel like there is a better way to show the evolution of a set over time that doesn't feel like they're all kind of in their own little corner, independent, on a white background (which makes the grey hard to see btw). If the emphasis is indeed on evolution, or perhaps a fan might, knowing Swift's bio, coordinate the use of colors to her perceived emotional state, then placing the colors vertically and connecting them smoothly along a horizontal time axis like a rainbow of thickening and thinning bands might work. Just a thought -- I'm sure you or others will have better ideas.

And good job of course for including the citation information on the image itself. (I'd recommend including a year or date or backlink, though, which helps people find your original post if they want.)

17

kompootor t1_jalfo7j wrote

The graph's x-axis is just the funds sorted in order of their values, right? The fact that that's not the kind of x-axis that you can label and draw points on should indicate to you that the visualization is not ideal.

I think in your title assertion you're confusing "power law" with the "long tail". To verify this relationship you will need an actual x-y plot. The way to get there, from what you are describing, is with binning.

As others have pointed out, something like a power law should be on a log-scale graph. When you do bins in an example like this you definitely want to try putting it on a log-log graph. But if you're not sure what you're looking for or demonstrating, you should output both log- and linear-scale graphs for your viewer.
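
Here is a minimal sketch of the binning step, assuming the fund values are a plain list of positive numbers. The data is synthetic and `log_bin_counts` / `loglog_slope` are hypothetical names. A straight-line fit on binned data is only a rough check for power-law behavior, not a rigorous test.

```python
import math
from collections import Counter

def log_bin_counts(values, bins_per_decade=1):
    """Count positive values in log-spaced bins; returns (low, high, count) tuples."""
    counts = Counter()
    for v in values:
        if v <= 0:
            raise ValueError("log bins need positive values")
        counts[math.floor(math.log10(v) * bins_per_decade)] += 1
    return [
        (10 ** (k / bins_per_decade), 10 ** ((k + 1) / bins_per_decade), n)
        for k, n in sorted(counts.items())
    ]

def loglog_slope(binned):
    """Least-squares slope of log10(density) against log10(bin center)."""
    if len(binned) < 2:
        raise ValueError("need at least two non-empty bins")
    xs = [math.log10(math.sqrt(lo * hi)) for lo, hi, _ in binned]
    ys = [math.log10(n / (hi - lo)) for lo, hi, n in binned]  # count per unit width
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic data: 100 small funds, 10 medium, 1 large.
values = [2.0] * 100 + [20.0] * 10 + [200.0]
binned = log_bin_counts(values)
slope = loglog_slope(binned)
```

Plotting the bin centers against the densities on log-log axes gives the actual x-y plot; a roughly straight line there is the thing to look for, and `slope` would estimate its exponent.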

2

kompootor t1_jaieb2x wrote

The pull-box in the lower-left has a quote not from the source:

>There appears to be no significant difference in food wastage between developing and developed countries, suggesting that most countries can implement similar actions against food waste.

Nothing like this is said in the UNEP report, and as u/Recolino points out, waste in developing countries is going to be due more to large areas that lack refrigeration, industrial preservatives, and hardy strains of crops. Obviously that is a completely different problem, and a far more urgent one, than in the developed world.

This illustrates why in the best visualizations you should clearly indicate if there are some parts that are taken or summarized directly and precisely from your source, and another part is your own summary or synthesis or additional calculations. In this case the textbox is obviously your own words, but the first three are just basic numbers that could easily be checked (though be careful as some of the numbers do not have generally clear definitions, such as continent averages, if they are not explicitly enumerated in the source). A direct quote on solutions from the source is also verifiable. A statement in your own words, however, could possibly summarize the source text, but requires a much closer reading to verify than Ctrl+F, and could also be taken from a specific cited paper within if that source was not made explicit. All of this verifiability (and your own words are verifiably yours, as long as you explicitly denote it as such) goes to making a visualization usable outside of internet memes.

3

kompootor t1_jacul4t wrote

By beginning with "Two cannibals were...", you preview the punchline; which is fine enough since it's quite funny. But here's a possible amended setup:

> Two shipwrecked castaways are sitting down to their first real meal in weeks. The first one says: ...

This way, it's not completely obvious the punchline will be about cannibalism. I'm sure you could tighten the wording and further conceal it on the setup, making it even more effective.

10

kompootor t1_jabxp18 wrote

Those who think the ad was poorly received in the Native American community at the time should read the article. Those who think that an NA group has the rights where an NA group did not before should read the article. Those who think the actor overall was poorly received in the NA community at the time should read the article.

Also, what is with these first decades of the 21st century and people demanding everyone's genetic pedigree to meet a moral standard of purity as if it's the first decades of the 20th century? You don't have to accept people in any role they want today, but I don't see how relative moral outrage to acts in the past within past norms will help with improving anything absent a time machine.

1

kompootor t1_jaa9kb4 wrote

The visualization is imo ineffective because I'm not sure there is anything too surprising that's related on the horizontal (time) axis, which is given prominence. If the racial disparity in this metric is what is of interest to people (which it almost certainly is), then as it does not vary significantly by time -- or at least, a variation that's somehow important to point out is not made obvious -- any more effective representation would not show time on an axis.

(The staggering of the Native American line is almost certainly an artifact of a small N, particularly compared to the other race segments. The confirmation would be to see whether or how those dramatic rises and falls track any other statistics in that population or areas.)
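
As a back-of-envelope illustration of the small-N point, assuming yearly counts behave roughly like Poisson counts, the relative noise scales as 1/sqrt(count). The counts below are invented, not taken from the WaPo dataset, and `relative_poisson_error` is a hypothetical name.

```python
import math

def relative_poisson_error(count):
    """Approximate relative standard error of a Poisson count: 1/sqrt(count)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1 / math.sqrt(count)

# Invented yearly counts for a small and a large population segment.
small_group_error = relative_poisson_error(25)    # about 20% year-to-year noise
large_group_error = relative_poisson_error(400)   # about 5% year-to-year noise
```

Swings of that size in the smaller group would appear on the chart even if the underlying rate were perfectly constant.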

The WaPo dataset is used in a ton of criminology and other social science research papers devising metrics to shed some light on police shooting as a phenomenon. If you're interested in doing more visualizations on this topic, poke around Google Scholar and see what they've come up with in terms of interesting statistics.

(As this is a data visualization sub, I'm commenting on the visualization and providing feedback. The criminology research on this is vast and complex -- pointing out that one metric doesn't illuminate the problem or represent everything is not particularly useful.)

1

kompootor t1_ja9jkkp wrote

>On average our experts predicted that 39 percent of the time spent on a domestic task will be automatable within ten years.

From the paper's abstract (Lehdonvirta et al. 2023). As always, the headlines seem to capture it just right.

>Japanese male experts were notably pessimistic about the potentials of domestic automation, a result we interpret through gender disparities in the Japanese household. Our contributions are providing the first quantitative estimates concerning the future of unpaid work and demonstrating how such predictions are socially contingent, with implications to forecasting methodology.

This was the purpose of the paper, not the survey or the 39% number. It's to improve the methodology of these kinds of surveys and show that there is cultural bias in respondents that must be weighted.

1