Comments
pookiedookie232 t1_j627vhb wrote
Without a banana for scale I can't really understand the significance of this
Olhapravocever t1_j629gnz wrote
If my grandma was a bike....
thisoldmould t1_j62bxx8 wrote
If my grandma had wheels, she would’ve been a bike.
[deleted] t1_j62cysv wrote
[removed]
Ok_Beat_9588 t1_j63pstt wrote
If your aunt had balls she’d be your uncle, but she doesn’t so she’s not
ThePreciseClimber t1_j62x11h wrote
You could ride her like Harley Quinn.
RepulsiveLeather8504 t1_j615i88 wrote
It’s complicated enough as it is.
I don’t see any reason to include your mother in this.
We are trying to be nice here.
YetiGuy t1_j63hlcl wrote
Your argument is so far off though. I mean at least a YouTube and a library are comparable.
Let me fix that. If Mt Everest was a popsicle, it’d be the largest popsicle in the world. /s
tomiwa1a t1_j6b80iu wrote
I don't think it's fair to say that comparing Youtube to a Library is like comparing Mt. Everest to a Cow. For one thing, there is actually a pretty clever way to estimate the amount of text on Youtube and compare it to the amount of text in a library.
Maybe if I explain how we made the graph, you'll see that it's more apples to apples than mountains to cows:
- We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: Statista)
- We found how many words are spoken per hour of human conversation (source: VirtualSpeech)
- We calculated the number of words in the average book (source: Jericho Writers)
Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.
You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737
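The three steps above amount to a Fermi estimate: hours uploaded, converted to spoken words, converted to book equivalents. Below is a minimal Python sketch of that arithmetic. It assumes each year's "hours uploaded per minute" figure holds for a full 365-day year and that the words-per-hour rate applies to every hour of video. The inputs in the example are hypothetical placeholders, not the Statista, VirtualSpeech, or Jericho Writers figures from the spreadsheet, so it does not reproduce the 99,338,400 number.

```python
from typing import Mapping

# Assumption: 365-day years (leap days ignored).
MINUTES_PER_YEAR = 60 * 24 * 365


def estimate_books(hours_per_minute_by_year: Mapping[int, float],
                   words_per_hour: float,
                   words_per_book: float) -> float:
    """Estimate how many average-length books' worth of speech was uploaded."""
    # Step 1: hours uploaded per minute -> total hours uploaded across all years.
    total_hours = sum(rate * MINUTES_PER_YEAR
                      for rate in hours_per_minute_by_year.values())
    # Step 2: hours -> spoken words (assumes speech for the whole runtime).
    total_words = total_hours * words_per_hour
    # Step 3: words -> book equivalents.
    return total_words / words_per_book


if __name__ == "__main__":
    # Hypothetical placeholder inputs, for illustration only.
    books = estimate_books({2021: 500, 2022: 500},
                           words_per_hour=9_000,
                           words_per_book=90_000)
    print(f"{books:,.0f} books")  # prints "52,560,000 books"
```

Because the result scales linearly with `words_per_hour`, lowering that rate is the natural way to account for videos that are not continuous speech.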
[deleted] t1_j6041c5 wrote
[deleted]
actvdecay t1_j61a324 wrote
Right. Library of what exactly?
Name the general categories… or what would be the backbone of the organization? Cats?
walkingmelways t1_j63796c wrote
Well cats are liquid, so it’d be litres.
ShutterDeep t1_j63d491 wrote
Measured in feline ounces
Putoigituresse t1_j61nqnd wrote
I’m actually crazy impressed by how low Reddit is on that list. I have a hard time believing the Library of Congress has more text than all of Reddit, Twitter, YouTube, and Wikipedia combined.
Dyzerio t1_j62392s wrote
Doesn't a lot of classified stuff get put in the Library of Congress? 100-page reports could inflate that, but I'm also not exactly sure what the units are.
[deleted] t1_j62a90h wrote
[deleted]
rose1983 t1_j6376zx wrote
Wikipedia is largely a directory with (mostly) very good summaries.
For a lot of Wikipedia articles there are hundreds of books written on the subject, so I can believe that.
vtTownie t1_j63ajxn wrote
Ya idk how this stuff was sourced (no link so I’m not even gonna bother) but reddit had 303m posts in 2020 and the LOC only has 175m cataloged items
Plushhorizon t1_j64oxms wrote
What about the entire internet?
CeeMX t1_j6528wz wrote
Most likely it's the count of videos compared to the books in the library, which is a weird metric: books contain much more content than a video, and on the other hand, the amount of data would put YouTube in first place by far.
tomiwa1a t1_j69prfr wrote
Good point, here’s how we got this information:
- We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: Statista)
- We found how many words are spoken per hour of human conversation (source: VirtualSpeech)
- We calculated the number of words in the average book (source: Jericho Writers)
Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.
You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737
Edit: I also have a question about the last thing you said: "there’s so much more content than that though."
What other content is there?
[deleted] t1_j6afh3t wrote
[deleted]
NovaticFlame t1_j61as5y wrote
How is this beautiful? There’s not even a y axis FFS
jakubkonecki t1_j630eva wrote
- How big is your library?
- 45 M
miskathonic t1_j63380h wrote
To be fair, that's like 14 stories
tomiwa1a t1_j6b8gcr wrote
Y Axis is the number of books. You're right though, the Y Axis should definitely have been there.
You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737
Notaprumber t1_j60ekmp wrote
95% of YouTube is reposted garbage, or some dude pointing up with his face below a video
KingNFA t1_j63ebts wrote
99.99% are videos with zero views and just a few seconds
Hentai_Yoshi t1_j61l1pg wrote
Bro, did you never pay attention in school? Label your y axis. Like millions of what?
Remarkable_Coast_214 t1_j637in8 wrote
millions of cattle read it per day
worriedshuffle t1_j62s8xu wrote
Y axis isn’t even labeled and this is called beautiful data
tomiwa1a t1_j6b8wnz wrote
Can you please clarify? What do you mean when you say it isn't clear how the number of books on Youtube is calculated?
If you check this range you can see how we arrived at our numbers:
- We calculated the number of hours of video uploaded to Youtube every minute from 2007-2022 (source: Statista)
- We found how many words are spoken per hour of human conversation (source: VirtualSpeech)
- We calculated the number of words in the average book (source: Jericho Writers)
Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.
worriedshuffle t1_j6bmjs3 wrote
Phenomenal calculation. You assume every minute of YouTube contains nonstop speech at the average word rate. Obviously this is false.
Second, in comparing quantity of speech you say nothing about quality. Libraries don’t contain every single book in existence. Most books are trash. YouTube does contain tons of trash.
simonezchen OP t1_j6afztf wrote
Good point, you're right, we should've labelled the y-axis. It's "Number of Books", since we approximately converted the numbers to that unit in Sheets.
worriedshuffle t1_j6aky3k wrote
You mean audio books?
BlizzardArms t1_j61c6rd wrote
This data just makes me wonder what we’d know if Alexandria hadn’t burned
KingNFA t1_j63e9dn wrote
About history probably more, about science probably nothing more
Affectionate-Iron385 t1_j61w8q7 wrote
yes, and the library with the most amount of 💩
pookiedookie232 t1_j627xeh wrote
I feel like PornHub should be on this list...
ZeusTheRecluse t1_j62mm1t wrote
I'm left wondering:
- If YouTube is only third, what the hell is in the Library of Congress and the British Library... seriously...
- Why have I never heard of the Library and Archives Canada? (I. Am. Canadian.)
- Wikipedia sooooo small???? Damn, wow....
Hrooki t1_j63dlnj wrote
Library and Archives Canada is our national library and archives! It has all archival government records, a lot of pre-Confederation stuff, census records, and even a database on UFOs. Almost everything is free and open to the public. https://library-archives.canada.ca/eng/Pages/Home.aspx
tomiwa1a t1_j6ba59m wrote
- The other interesting piece is that the Library of Congress was founded in 1800 (though a fire caused it to restart its collection in 1815). So in just 17 years, Youtube has amassed a collection of information that is 57% the size of the world's largest library, which has been accumulating its collection for over 200 years.
- I'm also Canadian. I hadn't heard of it either until we did this report. We probably haven't heard of it because we likely won't need to use any of its resources. Public libraries already do a really good job for most of our day-to-day needs.
- Wikipedia's small size makes sense given that contributions are heavily restricted and have such a high bar. Imagine if every Youtube video had to be approved by editors before being uploaded, or every author had to have their books approved by editors before publishing.
Demolisher94 t1_j638fuw wrote
If my grandma had wheels, she would be a bicycle!
Andulias t1_j63dgrr wrote
And if my grandmother had wheels, she would've been a bike.
Error83_NoUserName t1_j604jaj wrote
And here I am sitting with my 50k-hour Plex library...
MurdrWeaponRocketBra t1_j6093np wrote
This is really cool. I'm trying to understand how this works... would you have to store transcripts of all 800 million videos on YouTube? How often does this transcript database get updated?
tomiwa1a t1_j6bagzz wrote
Thanks! The transcripts get added on-demand when users request to search for a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache the transcripts if a user has never asked for it before.
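The on-demand approach described above is essentially lazy caching: a video's transcript is fetched and stored only the first time someone searches it, then reused. Here is a minimal Python sketch of that pattern. `TranscriptIndex`, the `fetch_transcript` callable, and the `(start_seconds, text)` segment format are hypothetical stand-ins, not Atlas's actual code or any real YouTube API, and the search is a plain case-insensitive substring match rather than AI search.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transcript format: a list of (start_seconds, text) segments.
Segment = Tuple[float, str]


class TranscriptIndex:
    """Lazily caches transcripts: nothing is fetched until a video is searched."""

    def __init__(self, fetch_transcript: Callable[[str], List[Segment]]) -> None:
        # fetch_transcript is a hypothetical stand-in for whatever actually
        # downloads a video's transcript; it is not a real YouTube API call.
        self._fetch = fetch_transcript
        self._cache: Dict[str, List[Segment]] = {}

    def is_indexed(self, video_id: str) -> bool:
        return video_id in self._cache

    def search(self, video_id: str, query: str) -> List[float]:
        """Return start times of segments whose text contains `query`."""
        if video_id not in self._cache:
            # First request for this video: fetch once and keep it.
            self._cache[video_id] = self._fetch(video_id)
        needle = query.lower()
        return [start for start, text in self._cache[video_id]
                if needle in text.lower()]


if __name__ == "__main__":
    fetched: List[str] = []

    def fake_fetch(video_id: str) -> List[Segment]:
        fetched.append(video_id)
        return [(0.0, "Intro"), (42.5, "I gotta have more cowbell")]

    index = TranscriptIndex(fake_fetch)
    print(index.search("abc123", "cowbell"))  # [42.5] (fetched on demand)
    print(index.search("abc123", "intro"))    # [0.0] (served from cache)
    print(fetched)                            # ['abc123'] (only one fetch)
```

One consequence of this design is that a video nobody has requested yet is simply absent from the index, so a search cannot match anything in it.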
A more detailed overview of how it works can be found here:
Ruleyoumind t1_j60mcbl wrote
Do you have a link to the search engine?
[deleted] t1_j61hsf6 wrote
[removed]
tomiwa1a t1_j6baiw7 wrote
Yup, here! https://atlas.atila.ca/
Lethlnjektn t1_j61mlm8 wrote
The library of Alexandria sheds a tear because of 999,999,990 terrible hours of "content" on YouTube
miskathonic t1_j633f7k wrote
The Library of Alexandria had maybe 100,000 books worth of scrolls containing ??? written at a time when the smart people thought disease was caused by bad air
There probably was some dope shit, but there's an order of magnitude more educational content on YouTube than burned in the LoA
Lethlnjektn t1_j63vrzq wrote
I gave YouTube 9 hours of useful information. I’d say most mechanics, electricians, and similar forms of trade would agree.
JoffeJoffer t1_j64moop wrote
Tbf, that would be the case for a significant portion of the British Library as well.
Delta4o t1_j62yelk wrote
Reddit would be a library where every other book would be a NSFW question from askreddit
ezenn t1_j63dwai wrote
I get a feeling that with the growing number of subscribers in this subreddit, the quality of posts is decreasing. What is the quality of the data here, and what does it tell us?
Ikbeneenpaard t1_j63l4gb wrote
If Reddit were a library, it would be a shitty library.
M3NTAL-313 t1_j63rdhe wrote
Can your AI search algo index timestamps for stars and sex acts from a library of 100K+ p0rn videos? DM me if so...
[deleted] t1_j60f24j wrote
[removed]
rikspik t1_j60g2yt wrote
I would have guessed youtube to be second, after wikipedia. Looks like I was way off then. How do you compare? Pageviews?😁
simonezchen OP t1_j60igl5 wrote
Check out the data source where we showed how we calculated!
Thenerdy9 t1_j60h6td wrote
yes! yes yes! Where is this search engine gimme gimme :)
[deleted] t1_j60ira8 wrote
[removed]
tomiwa1a t1_j6bb9zk wrote
You can try it here: https://atlas.atila.ca/
Thenerdy9 t1_j6nqpk2 wrote
didn't work for me :/
[deleted] t1_j62u3kw wrote
[removed]
Purplekeyboard t1_j62z8oy wrote
Your AI search engine doesn't seem to work. I tried searching on multiple things from youtube videos, like "I gotta have more cowbell", and it produced results which didn't in any way relate to what I searched on.
tomiwa1a t1_j6bapiz wrote
That happens because unless someone has previously submitted a Youtube video containing "I gotta have more cowbell", we won't have it in our index.
>The transcripts get added on-demand when users request to search for a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache the transcripts if a user has never asked for it before. A more detailed overview of how it works can be found here:
HieronymusGoa t1_j62zqq3 wrote
...it'd be a very shitty library ^^ and I love YouTube.
[deleted] t1_j631ne9 wrote
[removed]
rose1983 t1_j637lqz wrote
If YouTube was a library, it should be named Sturgeon’s Library.
3022_Dispatch t1_j637m0j wrote
The next time someone shows me a data table as definitive proof of some ridiculous idea they hold, I’m going to share this post
insane9001 t1_j63e1z1 wrote
What is the Y axis? Surely that must be a requirement for posting graphs in this sub
tomiwa1a t1_j6bbj7o wrote
The Y Axis is the number of books. I agree with you though, that was an oversight on our part. I also don't like when graphs don't have a labelled Y-Axis. Next time we'll add them.
anynonus t1_j63rj7d wrote
If the Atlantic Ocean was a bath, it'd be the biggest bath in the world
Zenzayy t1_j63v7rs wrote
Nice axis title, dweeb. Why even post this here?
EICONTRACT t1_j64jmwj wrote
Doesn’t Google already give you timestamps for your search as long as the video is chaptered?
tomiwa1a t1_j6bbdzn wrote
Watch the demo. Youtube doesn't give matches this precise.
[deleted] t1_j64nkb2 wrote
[removed]
BradMH88 t1_j64ohh8 wrote
I feel like we’ve all let Reddit down. Look how small it is. It’s time to increase our Reddit participation. This is just embarrassing. I have to imagine there are more random safes or something to generate mini hysteria.
Chramir t1_j64xwtj wrote
They made an estimate of how many words there are in every YouTube video uploaded. That estimate is calculated by multiplying the total runtime of all the videos by the average word count in a conversation per unit of time. The total word count is then divided by the number of words in an average book to get a 'book size'.
I don't know, but that just seems kinda iffy. First, YouTube videos are rarely a back-and-forth conversation. And second, it's like pointing to a skyscraper and saying it's like a big sandcastle because sand is used in concrete.
Edit: grammar and added the 'word count' estimate explanation.
tomiwa1a t1_j6bb8e8 wrote
Exactly! This is how it works.
I agree it's not perfect, but remember, Youtube itself is not a library so any comparisons to real libraries will require some degree of approximation. You can think of it as an approximate estimate or my preferred term, a Fermi Estimate.
Lyndon91 t1_j650wlw wrote
Don’t get how it makes sense. Is the book equivalent to the video once it’s been transcribed?
Lirlya t1_j65g2ta wrote
You're missing a hella lot of libraries in your data source
tomiwa1a t1_j69illm wrote
Which ones are missing?
[deleted] t1_j6bsymh wrote
[removed]
actvdecay t1_j61a7j7 wrote
I wonder what an AI chatbot trained on the YouTube library would say…
[deleted] t1_j63ec5c wrote
[deleted]
simonezchen OP t1_j603wgp wrote
Source: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit?usp=sharing
Tool: Canva
Check out more about our AI Search Engine: https://atila.ca/blog/tomiwa/atlas
DenL4242 t1_j60ko3y wrote
If Mt. Everest were a cow, it would be the largest cow on earth.