
bulbaquil t1_je2yzza wrote

To summarize the .wav specification u/turniphat mentioned:

  • The first 4 bytes tell the computer "Hi, I'm a multimedia file. Please treat me accordingly."

  • Bytes 5 through 8 tell the computer "Here's how long I am" (strictly, how many bytes follow these first 8). This is the answer to your question - one of the first things many kinds of files will do is tell the computer how big they are, precisely because this is something the computer needs to know.

  • Bytes 9 through 12 tell the computer "Specifically, I'm a .wav file."

  • Bytes 13 through 36 tell the computer "Because I'm a .wav file, here are some things you need to know about me. Like, what's my sample rate and bitrate, how many channels do I have (mono, stereo, or more), how many bits are in each sample, etc."

  • Bytes 37 through 44 tell the computer: "Okay, the actual data's coming now. Just a reminder: this is how big it is."

  • Bytes 45 through whatever number the previous 44 bytes told us are the actual sound itself.
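For anyone who wants to poke at this, here is a minimal Python sketch of the layout in the list above. It assumes the simplest possible file (plain PCM, with the "fmt " block immediately followed by the "data" block); real files can carry extra blocks, which this would reject. The names parse_wav_header and HEADER_FORMAT are made up for illustration.

```python
import struct

# Little-endian, no padding: 4 + 4 + 4 + 24 + 8 = 44 bytes in total.
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"

def parse_wav_header(header: bytes) -> dict:
    """Decode the first 44 bytes of a simple PCM .wav file (illustrative helper)."""
    if len(header) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("need at least 44 bytes")
    (riff_id, riff_size, wave_id,
     fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack(HEADER_FORMAT, header[:44])
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("not the simple layout assumed here")
    return {
        "riff_size": riff_size,        # bytes 5-8: everything after the first 8 bytes
        "channels": channels,          # 1 = mono, 2 = stereo
        "sample_rate": sample_rate,    # samples per second
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,        # bytes 41-44: length of the sound data
    }

# Build a header for one second of 8000 Hz, mono, 16-bit audio and read it back.
example = struct.pack(HEADER_FORMAT, b"RIFF", 36 + 16000, b"WAVE",
                      b"fmt ", 16, 1, 1, 8000, 16000, 2, 16,
                      b"data", 16000)
info = parse_wav_header(example)
print(info["channels"], info["sample_rate"], info["data_size"])  # prints: 1 8000 16000
```

Python's standard wave module does the same job properly; the sketch is only meant to show which bytes say what.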

As for why the computer treats 1001 as 9 instead of as 2-1: at a very fundamental level the computer isn't reading the data bit by bit; it's reading it in chunks (sort of like taking steps two at a time). By default, the chunk size is the "X" that they're talking about whenever they refer to an "X-bit system" or "X-bit architecture", but if a file is encountered, its directives on How to Read This Kind of File take over. So it isn't seeing it as a sequence "1-0-0-1" and trying to figure out where to break it; it's seeing it as a gestalt "1001" (really, "00001001") and treating it as a single unit. If you wanted a 2 and then a 1, you'd need two different units: 00000010 00000001.
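A tiny Python sketch of the same idea, assuming 8-bit units (bytes). The variable names are just for illustration.

```python
# One unit holding the bit pattern 00001001 is simply the number 9.
one_unit = bytes([0b00001001])
print(one_unit[0])                       # prints: 9

# A 2 followed by a 1 needs two separate units.
two_units = bytes([0b00000010, 0b00000001])
print(list(two_units))                   # prints: [2, 1]

# If the file format says "read these two bytes as ONE 16-bit number",
# the very same bits mean something else, depending on the byte order.
as_big_endian = int.from_bytes(two_units, "big")        # 2 * 256 + 1
as_little_endian = int.from_bytes(two_units, "little")  # 1 * 256 + 2
print(as_big_endian, as_little_endian)   # prints: 513 258
```

The bits never change; what changes is how many of them the reader is told to take as a single unit.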

Tl;dr: Files share information about themselves with the computer when they're loaded. One of the things they share is how big they are, and another is how many bits of data the computer should read from them at a time.


fiatfighter t1_je35lzm wrote

This really made sense to me and I am NOT that technologically literate. And I definitely do not understand coding or this byte structure thing. But when you said, "OK, this piece is the program or file saying this, and this one is telling it this," that helped me wrap my brain around it. Thank you! Off to submit my resume to Twitter! Oh wait…


RelativeApricot1782 t1_je4t9td wrote

>Bytes 37 through 44 tell the computer: “Okay the actual data is coming now. Just a reminder: this is how big it is.”

Why does the computer need to be reminded?


mrpenchant t1_je4vys2 wrote

They misstated it a little bit.

The way the format is set up, the first length it gives is for the whole thing, which is defined to have 2 subchunks. The first subchunk always has the same size for a plain PCM wave file but still states its own length, and the last length is just for the data in the 2nd subchunk.

This is all to say, it's not a reminder but a slightly different length: in that simple layout, it is the first length minus 36.
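Here is where the 36 comes from, as a rough Python sketch. It assumes the simple layout described above (one 16-byte format subchunk, then the data subchunk); canonical_sizes is a made-up name.

```python
def canonical_sizes(data_size: int) -> dict:
    """Work out the length fields for a simple PCM .wav file (illustrative helper)."""
    fmt_body = 16                        # fixed size of the format info for plain PCM
    riff_size = (
        4                                # the "WAVE" tag
        + 8 + fmt_body                   # "fmt " tag + its length field + its body
        + 8                              # "data" tag + its length field
        + data_size                      # the sound itself
    )
    file_size = 8 + riff_size            # "RIFF" tag + the first length field
    return {"riff_size": riff_size, "file_size": file_size}

sizes = canonical_sizes(16000)
print(sizes["riff_size"] - 16000)   # prints: 36
print(sizes["file_size"] - 16000)   # prints: 44
```

So the first length is the data length plus 36, and the whole file on disk is the data length plus 44.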


aiusepsi t1_je4u9v5 wrote

A computer doesn't, but software is (at least for now) written by human beings. You could have the size of the actual payload be implicit, and calculated from the information you've already seen, but there's more opportunity for the person writing the code which is reading the file to get the calculation wrong in some subtle way.

If the size is written explicitly just before the data, you can make the code which reads it much simpler and therefore more reliable. Simple and reliable is really good for this kind of code; mistakes can lead to software containing security vulnerabilities. Nobody wants to get a virus because they played a .wav file!
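As a rough illustration of how simple the reading code can be, here is a Python sketch. read_chunk is a made-up helper, not anything from a real library, and it ignores details such as the padding byte that RIFF adds after odd-sized chunks.

```python
import io
import struct

def read_chunk(stream):
    """Read one 'tag + length + payload' block and refuse to guess if it is short."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated chunk header")
    chunk_id, size = struct.unpack("<4sI", header)   # 4-byte tag, 32-bit length
    payload = stream.read(size)
    if len(payload) != size:
        # The file promised more bytes than it contains: stop instead of guessing.
        raise ValueError("chunk is shorter than its declared length")
    return chunk_id, payload

good = io.BytesIO(b"data" + struct.pack("<I", 4) + b"\x01\x02\x03\x04" + b"extra")
print(read_chunk(good))   # prints: (b'data', b'\x01\x02\x03\x04')
```

There is no arithmetic to get wrong: the code reads the stated number of bytes and checks that it actually got them.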


nerdguy1138 t1_je5ekl5 wrote

The Unix file utility can read the first few bytes of a file as a magic number to determine what kind of file it is.
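A toy Python version of that idea, as a sketch only: the real utility uses a far larger database and also looks at bytes beyond the very start. MAGIC_NUMBERS and guess_type are made-up names.

```python
# A handful of well-known file signatures (the leading "magic" bytes).
MAGIC_NUMBERS = [
    (b"RIFF", "RIFF container (for example .wav or .avi)"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]

def guess_type(data: bytes) -> str:
    """Return a description for the first signature that matches the start of data."""
    for prefix, description in MAGIC_NUMBERS:
        if data.startswith(prefix):
            return description
    return "unknown"

print(guess_type(b"%PDF-1.7 ..."))         # prints: PDF document
print(guess_type(b"RIFF\x24\x00\x00\x00WAVE"))  # prints: RIFF container (for example .wav or .avi)
```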

There is a hacker magazine called PoC||GTFO (read "PoC or GTFO"), where PoC means proof of concept.

The PDFs of that magazine are also valid files of other types at the same time, so other programs can open them as something else entirely. Files that you can do this with are called polyglots.