Cryptizard t1_ja69a0b wrote on February 27, 2023 at 3:41 AM

Reply to comment by Mason-B in So what should we do? by googoobah

It seems to come down to the fact that you think AI researchers are clowns and won’t be able to fix any of these extremely obvious problems in the near future. For example, there are already methods to break the quadratic bottleneck of attention.

Just two weeks ago there was a paper that compresses GPT-3 to 1/4 the size. That’s two orders of magnitude in one paper, let alone 10 years. Your pessimism just makes no sense in light of what we have seen.

Mason-B t1_ja6cwsg wrote on February 27, 2023 at 4:12 AM

> It seems to come down to the fact that you think AI researchers are clowns and won’t be able to fix any of these extremely obvious problems in the near future.

No, I think they have forgotten the lessons of the last AI winter: that despite their best intentions to fix obvious problems, many of those problems will turn out to be intractable for decades.

Fundamentally, DNNs are a very useful mechanism for approximating optimization algorithms over large domains. We know how that class of algorithms responds to exponential increases in computational power (and, likewise, efficiency): more accurate approximations at a sublinear rate.
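As a toy illustration of sublinear returns, assume error falls as a power law in compute, error = compute^(-alpha). Both the functional form and alpha = 0.05 are assumptions picked for illustration, not measured values:

```python
def relative_error(compute_multiplier: float, alpha: float = 0.05) -> float:
    """Error relative to baseline after scaling compute, under an assumed power law."""
    return compute_multiplier ** -alpha

# Each 10x jump in compute shaves off a smaller and smaller slice of error.
for mult in (10, 100, 1000):
    print(f"{mult}x compute -> error at {relative_error(mult):.1%} of baseline")
```

Under that assumption, every extra 10x of compute buys the same modest multiplicative improvement, so exponential spending yields only steady, slow gains.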

> For example, there are already methods to break the quadratic bottleneck of attention.

The paper itself says it's unclear if it works for larger datasets. But this group of techniques is fun because it's a trade-off of accuracy for efficiency. And yeah, that's also an option. I'd even bet that if you graphed the efficiency gain against the loss of accuracy across enough models and sizes, the two would match up.
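For context on the quadratic bottleneck: standard self-attention compares every token with every other token, so the score matrix grows with the square of the sequence length. A rough sketch of the count, where `window` stands in for a hypothetical approximate scheme that only attends to a fixed number of positions (not the method from the paper under discussion):

```python
def full_attention_scores(n_tokens: int) -> int:
    """Pairwise scores computed by standard self-attention: n * n."""
    return n_tokens * n_tokens

def windowed_attention_scores(n_tokens: int, window: int) -> int:
    """Scores when each token attends to at most `window` positions (hypothetical scheme)."""
    return n_tokens * min(window, n_tokens)

for n in (1_024, 4_096, 16_384):
    full = full_attention_scores(n)
    approx = windowed_attention_scores(n, window=256)
    print(f"n={n}: full={full}, windowed={approx}, ratio={full // approx}x")
```

The savings grow with sequence length, but the windowed version sees less context per token, which is exactly the accuracy-for-efficiency trade.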

> That’s two orders of magnitude in one paper, let alone 10 years.

Uh what now? Two doublings is a factor of 4, which is less than one order of magnitude, never mind two. Yes, they may have compressed the weights by two orders of magnitude, but having to decode them eats up most of those gains. Compression is not going to get enough gains on its own, even if you get specialized hardware to remove part of the decompression cost.
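To put numbers on that, a quick sketch of the arithmetic (the helper name `orders_of_magnitude` is made up for illustration): a 4x reduction is well under one order of magnitude (about 0.6 on a log scale), and it takes 100x to reach two.

```python
import math

def orders_of_magnitude(factor: float) -> float:
    """Return how many powers of ten a multiplicative factor spans."""
    return math.log10(factor)

two_doublings = 2 ** 2  # shrinking to 1/4 the size is a 4x reduction
print(round(orders_of_magnitude(two_doublings), 2))  # 0.6
print(orders_of_magnitude(100))                      # 2.0
```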

And left unanalyzed is how much of that comes from getting the entire model on a single device.

Fundamentally, I think you are overlooking the fact that research into these topics has been making 2x, 4x gains all the time, but a lot of those gains come from tricks we can't repeat. We can't further compress already well-compressed models, for example. At some point soon (2-3 years) we are going to hit a wall where all we have left is hardware gains.