
Motion-to-Photons t1_ja29lm6 wrote

Dumb question, but let’s say this time next year we are indeed running a 13-billion-parameter LLM on our top-spec home GPUs, how long would a response take? With images I’m happy to wait 60 seconds for a really good result, but would I wait that long for a reply from an LLM? Perhaps we are running 13-billion-parameter models next year, but it might be another 4 or 5 years until we would actually want to.

19

AylaDoesntLikeYou OP t1_ja2crc9 wrote

With Stable Diffusion they were able to drastically reduce generation time to 5–12 seconds (depending on the GPU), and they cut VRAM usage from 16 GB to 4 GB in less than a month.

These optimizations wouldn't take more than a year; they can happen within months, or even weeks in some cases, especially once the model is running on a single device.
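For a sense of what those optimizations look like in practice, here's a minimal sketch using the Hugging Face diffusers library. The specific settings shown (half-precision weights plus attention slicing) are my assumption about which knobs account for most of the VRAM savings, not a claim about exactly what was done:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights in half precision: roughly halves VRAM versus fp32.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of one big matmul,
# trading a little speed for a large drop in peak memory.
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```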

34

qrayons t1_ja375cg wrote

I don't know. It seems like the 13b parameter model is already the optimized version. Obviously I hope I'm wrong though.

2

visarga t1_ja2uz8e wrote

Apparently 13B models feel comparable to ChatGPT on a 3090 with 24 GB of VRAM (source). So it would be fast!
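A back-of-envelope check of why that fits, and how fast it could be. The precision options and the ~936 GB/s bandwidth figure for a 3090 are my assumptions, and activations plus the KV cache are ignored:

```python
# Rough memory and speed estimate for a 13B-parameter model.
params = 13e9
bandwidth_gb_s = 936  # RTX 3090 memory bandwidth, approximate

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    weight_gb = params * bytes_per_param / 1e9
    # Each generated token reads every weight once, so the token rate
    # is roughly bounded by memory bandwidth / model size.
    tokens_per_s = bandwidth_gb_s / weight_gb
    print(f"{name}: ~{weight_gb:.1f} GB weights, ~{tokens_per_s:.0f} tokens/s ceiling")
```

By this estimate fp16 weights (~26 GB) slightly exceed 24 GB, but at 8-bit or 4-bit quantization they fit comfortably, and even a memory-bandwidth-bound ceiling works out to tens of tokens per second, so in principle a short reply would arrive in seconds rather than minutes.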

17

Motion-to-Photons t1_ja41xqc wrote

Wow! That pretty much answers my question, then!

Honestly, I’m not happy with this rate of progress. Many people are not smart enough to see through simple Facebook/TikTok/Instagram algorithms. They have no chance when confronted with weaponised AGI.

7