
Motion-to-Photons t1_ja29lm6 wrote

Dumb question, but let’s say this time next year we are indeed running a 13-billion-parameter LLM on our top-spec home GPUs, how long would a response take? With images I’m happy to wait 60 seconds for a really good result, but would I wait that long for a reply from an LLM? Perhaps we are running 13-billion-parameter models next year, but it might be another 4 or 5 years until we would actually want to.

19

AylaDoesntLikeYou OP t1_ja2crc9 wrote

With Stable Diffusion they were able to drastically reduce generation time to 5–12 seconds (depending on the GPU), and they cut VRAM usage from 16 GB to 4 GB in less than a month.

These optimizations wouldn't take more than a year; they can happen within months, or even weeks in some cases, especially once the model is running on a single device.
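For a sense of what those optimizations look like in practice, here's a minimal sketch using the Hugging Face diffusers library. The specific settings shown (half-precision weights plus attention slicing) are my assumption about which knobs account for most of the VRAM savings, not a claim about exactly what was done:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights in half precision: roughly halves VRAM versus fp32.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of one big matmul,
# trading a little speed for a large drop in peak memory.
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```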

34

qrayons t1_ja375cg wrote

I don't know. It seems like the 13b parameter model is already the optimized version. Obviously I hope I'm wrong though.

2

visarga t1_ja2uz8e wrote

Apparently 13B models feel comparable to ChatGPT on a 3090 with 24 GB of VRAM (source). So it would be fast!
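A back-of-envelope check of why that fits, and how fast it could be. The precision options and the ~936 GB/s bandwidth figure for a 3090 are my assumptions, and activations plus the KV cache are ignored:

```python
# Rough memory and speed estimate for a 13B-parameter model.
params = 13e9
bandwidth_gb_s = 936  # RTX 3090 memory bandwidth, approximate

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    weight_gb = params * bytes_per_param / 1e9
    # Each generated token reads every weight once, so the token rate
    # is roughly bounded by memory bandwidth / model size.
    tokens_per_s = bandwidth_gb_s / weight_gb
    print(f"{name}: ~{weight_gb:.1f} GB weights, ~{tokens_per_s:.0f} tokens/s ceiling")
```

By this estimate fp16 weights (~26 GB) slightly exceed 24 GB, but at 8-bit or 4-bit quantization they fit comfortably, and even a memory-bandwidth-bound ceiling works out to tens of tokens per second, so in principle a short reply would arrive in seconds rather than minutes.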

17

Motion-to-Photons t1_ja41xqc wrote

Wow! That pretty much answers my question, then!

Honestly, I’m not happy with this rate of progress. Many people are not smart enough to see through simple Facebook/TikTok/Instagram algorithms. They have no chance when confronted with weaponised AGI.

7