Comments

DenL4242 t1_j60ko3y wrote

If Mt. Everest were a cow, it would be the largest cow on earth.

250

pookiedookie232 t1_j627vhb wrote

Without a banana for scale I can't really understand the significance of this

32

Olhapravocever t1_j629gnz wrote

If my grandma was a bike....

20

RepulsiveLeather8504 t1_j615i88 wrote

It's complicated enough as it is.
I don't see any reason to include your mother in this.
We are trying to be nice here.

7

YetiGuy t1_j63hlcl wrote

Your argument is so far off though. I mean at least a YouTube and a library are comparable.

Let me fix that. If Mt Everest was a popsicle, it’d be the largest popsicle in the world. /s

1

tomiwa1a t1_j6b80iu wrote

I don't think it's fair to say that comparing Youtube to a Library is like comparing Mt. Everest to a Cow. For one thing, there is actually a pretty clever way to estimate the amount of text on Youtube and compare it to the amount of text in a library.

Maybe, if I explain how we made the graph you'll see that it's more apples to apples than mountains to cows:

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: statista)
  2. We found how many words are spoken per hour of human conversation (source: virtualspeech)
  3. We calculated the number of words in the average book (source: jericho writers)

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube

You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737
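
For anyone who wants to follow the arithmetic behind those three steps, here is a minimal Python sketch. The function name `estimate_books` and every input value in the example are placeholders made up for illustration; they are not the figures from the spreadsheet, so this will not reproduce the 99,338,400 number. It also ignores leap years.

```python
# Rough sketch of the three-step estimate described above.
# All example inputs are hypothetical placeholders, NOT the spreadsheet's values.

MINUTES_PER_YEAR = 60 * 24 * 365  # leap years ignored for simplicity


def estimate_books(hours_uploaded_per_minute_by_year, words_per_hour, words_per_book):
    """Convert an upload rate into an approximate 'number of books'.

    hours_uploaded_per_minute_by_year: dict mapping year -> hours of video
        uploaded per minute during that year (step 1).
    words_per_hour: words spoken per hour of conversation (step 2).
    words_per_book: words in an average book (step 3).
    """
    # Total hours of video = sum over years of (hours per minute * minutes in a year)
    total_hours = sum(
        rate * MINUTES_PER_YEAR
        for rate in hours_uploaded_per_minute_by_year.values()
    )
    # Assumes every hour of video is filled with speech at the conversational rate
    total_words = total_hours * words_per_hour
    return total_words / words_per_book


# Placeholder inputs chosen only to make the arithmetic easy to check by hand
books = estimate_books({2021: 1, 2022: 2}, words_per_hour=9000, words_per_book=90000)
print(f"{books:,.0f}")  # prints 157,680
```

Plugging in the real per-year upload rates and the word counts from the sources listed above would give the actual estimate.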

1

[deleted] t1_j6041c5 wrote

[deleted]

109

actvdecay t1_j61a324 wrote

Right. Library of what exactly?

Name general categories… or what would be the backbone of Organization- cats?

30

Putoigituresse t1_j61nqnd wrote

I’m actually crazy impressed by how low Reddit is on that list. I have a hard time believing the Library of Congress has more text than all of Reddit, Twitter, youtube, and Wikipedia combined

20

Dyzerio t1_j62392s wrote

Doesn't a lot of classified stuff get put in the library of Congress? 100 page long reports could fluff that up but also not exactly sure what the units are

11

rose1983 t1_j6376zx wrote

Wikipedia is largely a directory with (mostly) very good summaries.

For a lot of Wikipedia articles there are hundreds of books written on the subject, so I can believe that.

3

vtTownie t1_j63ajxn wrote

Ya idk how this stuff was sourced (no link so I’m not even gonna bother) but reddit had 303m posts in 2020 and the LOC only has 175m cataloged items

1

CeeMX t1_j6528wz wrote

Most likely it's the count of videos compared to the number of books in the library, which is a weird metric, as books contain much more content than a video. On the other hand, the amount of data would put YouTube at rank 1 by far

1

tomiwa1a t1_j69prfr wrote

Good point, here’s how we got this information.

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: statista)
  2. We found how many words are spoken per hour of human conversation (source: virtualspeech)
  3. We calculated the number of words in the average book (source: jericho writers)

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube

You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737

Edit: I also have a question about the last thing you said:

>there’s so much more content than that though

What other content is there?

1

NovaticFlame t1_j61as5y wrote

How is this beautiful? There’s not even a y axis FFS

88

Notaprumber t1_j60ekmp wrote

95% of YouTube is reposted garbage, or some dude pointing up with his face below a video

28

KingNFA t1_j63ebts wrote

99.99% are videos with zero views and just a few seconds

3

Hentai_Yoshi t1_j61l1pg wrote

Bro, did you never pay attention in school? Label your y axis. Like millions of what?

22

worriedshuffle t1_j62s8xu wrote

16

tomiwa1a t1_j6b8wnz wrote

Can you please clarify? What do you mean by "it isn't clear how books on Youtube is calculated"?

If you check this range you can see how we arrived at our numbers:

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: statista)
  2. We found how many words are spoken per hour of human conversation (source: virtualspeech)
  3. We calculated the number of words in the average book (source: jericho writers)

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube

1

worriedshuffle t1_j6bmjs3 wrote

Phenomenal calculation. You assume every minute of YouTube contains nonstop speech at the average word rate. Obviously this is false.

Second, in comparing quantity of speech you say nothing about quality. Libraries don’t contain every single book in existence. Most books are trash. YouTube does contain tons of trash.
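
The first objection can be written down as a correction factor. This is only a sketch: the symbols are introduced here for illustration and do not come from the spreadsheet. Let $H$ be the total hours of video, $w$ the words spoken per hour, $b$ the words per average book, and $f$ the (unknown) fraction of runtime that actually contains speech.

```latex
\[
  \text{books} \approx \frac{f \cdot H \cdot w}{b}, \qquad 0 \le f \le 1
\]
```

The posted estimate effectively assumes $f = 1$. Any smaller value of $f$ scales the book count down in direct proportion.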

1

simonezchen OP t1_j6afztf wrote

Good point, you're right we should've labelled the y-axis. It's "Number of Books" as we calculated the numbers approximately to that unit in sheets.

0

BlizzardArms t1_j61c6rd wrote

This data just makes me wonder what we’d know if Alexandria hadn’t burned

14

KingNFA t1_j63e9dn wrote

About history probably more, about science probably nothing more

7

pookiedookie232 t1_j627xeh wrote

I feel like PornHub should be on this list...

6

ZeusTheRecluse t1_j62mm1t wrote

I'm left wondering:

  1. if youtube is only third, what the hell is in the Library of Congress and British Library... seriously...

  2. Why have i never heard of the Library and Archives Canada (I. Am. Canadian).

  3. Wikipedia sooooo small???? damn, wow....

4

tomiwa1a t1_j6ba59m wrote

  1. The other interesting piece is that Library of Congress was founded in 1800 (though a fire caused it to restart its collection in 1815).

Youtube was founded in 2005.

So in just 17 years, Youtube has amassed a collection of information that is 57% the size of the world's largest library, which has been accumulating its collection for over 200 years.

  2. I'm also Canadian. Hadn't heard of it either until we did this report. We probably haven't heard of it because we likely won't need to use any of its resources. Public libraries already do a really good job for most of our day to day needs.

  3. Wikipedia's small size makes sense given that contributions are heavily restricted and have such a high bar. Imagine if every Youtube video had to be approved by editors before being uploaded, or every author had to have their books approved by editors before publishing.

1

Demolisher94 t1_j638fuw wrote

If my grandma had wheels, she would be a bicycle!

3

Andulias t1_j63dgrr wrote

And if my grandmother had wheels, she would've been a bike.

3

MurdrWeaponRocketBra t1_j6093np wrote

This is really cool. I'm trying to understand how this works... would you have to store transcripts of all 800 million videos on YouTube? How often does this transcript database get updated?

2

tomiwa1a t1_j6bagzz wrote

Thanks! The transcripts get added on-demand when users request to search for a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache the transcripts if a user has never asked for it before.

A more detailed overview of how it works can be found here:

  1. https://www.reddit.com/r/OpenAI/comments/10j3gzy/comment/j5jh0wo/?utm_source=share&utm_medium=web2x&context=3
  2. https://atila.ca/blog/tomiwa/atlas
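
To make the on-demand idea concrete, here is a small Python sketch. It is not the actual implementation: the class `OnDemandTranscriptIndex`, the `fake_fetch` helper and the video ID are all hypothetical, and plain substring matching stands in for whatever search the real tool uses. The point is only that a video becomes searchable after someone has requested it.

```python
from typing import Callable, Dict, List


class OnDemandTranscriptIndex:
    """Hypothetical lazy index: nothing is stored until a video is requested."""

    def __init__(self, fetch_transcript: Callable[[str], str]) -> None:
        self._fetch = fetch_transcript          # assumed transcript source
        self._transcripts: Dict[str, str] = {}  # video_id -> transcript text

    def request_video(self, video_id: str) -> None:
        # Fetch and store the transcript only the first time it is asked for
        if video_id not in self._transcripts:
            self._transcripts[video_id] = self._fetch(video_id)

    def search(self, query: str) -> List[str]:
        # Only videos that were previously requested can match
        needle = query.lower()
        return [
            video_id
            for video_id, text in self._transcripts.items()
            if needle in text.lower()
        ]


def fake_fetch(video_id: str) -> str:
    # Stand-in for a real transcript lookup; the data is invented for the demo
    return {"snl": "I gotta have more cowbell"}.get(video_id, "")


index = OnDemandTranscriptIndex(fake_fetch)
print(index.search("more cowbell"))  # prints [] because nothing is indexed yet
index.request_video("snl")
print(index.search("more cowbell"))  # prints ['snl']
```

The trade-off is that searches miss anything nobody has requested yet, in exchange for not having to store transcripts for every video up front.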
1

Lethlnjektn t1_j61mlm8 wrote

The library of Alexandria sheds a tear because of 999,999,990 terrible hours of "content" on YouTube

2

miskathonic t1_j633f7k wrote

The Library of Alexandria had maybe 100,000 books worth of scrolls containing ??? written at a time when the smart people thought disease was caused by bad air

There probably was some dope shit, but there's an order of magnitude more educational content on YouTube than burned in the LoA

3

Lethlnjektn t1_j63vrzq wrote

I gave YouTube 9 hours of useful information. I’d say most mechanics, electricians, and similar forms of trade would agree.

1

JoffeJoffer t1_j64moop wrote

Tbf, that would be the case for a significant portion of the British Library as well.

Bad air

1

Delta4o t1_j62yelk wrote

Reddit would be a library where every other book would be a NSFW question from askreddit

2

ezenn t1_j63dwai wrote

I get a feeling that with the growing number of subscribers in this subreddit, the quality of posts is decreasing. What is the quality of data here and what does it tell us?

2

Ikbeneenpaard t1_j63l4gb wrote

If Reddit were a library, it would be a shitty library.

2

M3NTAL-313 t1_j63rdhe wrote

Can your AI Search algo index timestamps for stars and sexacts from a library of 100K+ p0rn videos? DM me if so...

2

rikspik t1_j60g2yt wrote

I would have guessed youtube to be second, after wikipedia. Looks like I was way off then. How do you compare? Pageviews?😁

1

Purplekeyboard t1_j62z8oy wrote

Your AI search engine doesn't seem to work. I tried searching on multiple things from youtube videos, like "I gotta have more cowbell", and it produced results which didn't in any way relate to what I searched on.

1

tomiwa1a t1_j6bapiz wrote

That happens because, unless someone has previously submitted a youtube video with "I gotta have more cowbell", we won't have it in our index.

>The transcripts get added on-demand when users request to search for a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache the transcripts if a user has never asked for it before.

A more detailed overview of how it works can be found here:

  1. https://www.reddit.com/r/OpenAI/comments/10j3gzy/comment/j5jh0wo/?utm_source=share&utm_medium=web2x&context=3
  2. https://atila.ca/blog/tomiwa/atlas

See: earlier comment

1

HieronymusGoa t1_j62zqq3 wrote

...it'd be a very shitty library ^^ and i love youtube.

1

rose1983 t1_j637lqz wrote

If YouTube was a library, it should be named Sturgeon’s Library.

1

3022_Dispatch t1_j637m0j wrote

The next time someone shows me a data table as definitive proof of some ridiculous idea they hold, I’m going to share this post

1

insane9001 t1_j63e1z1 wrote

What is the Y axis? Surely that must be a requirement for posting graphs in this sub

1

tomiwa1a t1_j6bbj7o wrote

The Y Axis is number of books. I agree with you though, that was an oversight on our part. I also don't like when graphs don't have a labelled Y-Axis. Next time we'll add them.

1

anynonus t1_j63rj7d wrote

If the atlantic ocean was a bath it'd be the biggest bath in the world

1

Zenzayy t1_j63v7rs wrote

Nice axis title, dweeb. Why even post this here?

1

EICONTRACT t1_j64jmwj wrote

Doesn’t google already give you time stamps of your search as long as it’s chaptered?

1

BradMH88 t1_j64ohh8 wrote

I feel like we’ve all let Reddit down. Look how small it is. It’s time to increase our Reddit participation. This is just embarrassing. I have to imagine there are more random safes or something to generate mini hysteria.

1

Chramir t1_j64xwtj wrote

They made an estimate of how many words there are in every youtube video uploaded. That estimate is calculated as the total runtime of all the videos multiplied by the average word count in a conversation per given time. And the total words are divided by the number of words in an average book, to get a 'books size'.

I don't know, but that just seems kinda iffy. First, youtube videos are rarely a back and forth conversation. And secondly, it's like pointing to a skyscraper and saying it's like a big sandcastle because sand is used in concrete.

Edit: grammar and added the 'word count' estimate explanation.

1

tomiwa1a t1_j6bb8e8 wrote

Exactly! This is how it works.

I agree it's not perfect, but remember, Youtube itself is not a library so any comparisons to real libraries will require some degree of approximation. You can think of it as an approximate estimate or my preferred term, a Fermi Estimate.

1

Lyndon91 t1_j650wlw wrote

Don’t get how it makes sense. Is the book equivalent to the video once it’s been transcribed?

1

Lirlya t1_j65g2ta wrote

You're missing a hella lot of libraries in your data source

1

actvdecay t1_j61a7j7 wrote

I wonder what an AI chat bot trained on YouTube library would say…

0