Dumb question, but let’s say this time next year we are indeed running a 13-billion-parameter LLM on our top-spec home GPUs: how long would a response take? With images I’m happy to wait 60 seconds for a really good result, but would I wait that long for a reply from an LLM? Perhaps we will be running 13-billion-parameter models next year, but it might be another 4 or 5 years until we would actually want to?


With Stable Diffusion, they were able to drastically reduce generation time to 5-12 seconds (depending on the GPU), and they were able to reduce VRAM usage from 16 GB to 4 GB in less than a month.

These optimizations wouldn't take more than a year; they can happen within months, or weeks in some cases, especially once the model is running on a single device.


I don't know. It seems like the 13B-parameter model is already the optimized version. Obviously I hope I'm wrong, though.


Apparently 13B models feel comparable to ChatGPT on a 3090 card with 24 GB of VRAM (source). So it would be fast!
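
For a rough sense of whether a 13B model even fits in 24 GB, here's a back-of-the-envelope sketch. It only counts the weights; the bytes-per-parameter figures for each precision are my assumptions for illustration, and it ignores activations, the KV cache, and framework overhead, so real usage would be higher.

```python
# Back-of-the-envelope estimate of memory needed for model weights only.
# Assumption: activations, KV cache, and framework overhead are ignored.

PARAMS = 13e9   # 13 billion parameters
VRAM_GB = 24    # card size from the comment above (3090)

# Assumed storage size per parameter at common precisions (illustrative).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params, bytes_per_param):
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    gb = weights_gb(PARAMS, size)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

By this estimate, 16-bit weights alone come to about 26 GB, so a 13B model on a 24 GB card would presumably need 8-bit or smaller weights.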


Wow! That pretty much answers my question, then!

Honestly, I’m not happy with this rate of progress. Many people are not smart enough to see through simple Facebook/TikTok/Instagram algorithms. They have no chance when confronted with weaponised AGI.