
blose1 t1_j1apar1 wrote

Even with int8 quantization you need at least 175 GB of VRAM to run one instance of a 175B-parameter model, the time to launch and load it on demand will be higher than just calling the OpenAI API, and your throughput will be lower. Forget about running the current generation of open LLMs like OPT/BLOOM in the cloud for real-world use cases. I've tested them: they loop all the time and they can't match ChatGPT's results. You won't get ChatGPT-level performance out of them without the human-assisted RL step (RLHF) that OpenAI did. So either wait for the next gen of open-source models or just use ChatGPT.
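
The 175 GB figure falls straight out of the parameter count: int8 stores one byte per weight, so an OPT-175B/BLOOM-176B class model needs ~175 GB for the weights alone, before activations and KV cache. A quick back-of-the-envelope sketch (weights only, ignoring runtime overhead):

```python
def weights_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """VRAM needed just to hold the weights, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# OPT-175B / BLOOM-176B class models
for name, n in [("OPT-175B", 175e9), ("BLOOM-176B", 176e9)]:
    fp16 = weights_vram_gb(n, 2)  # fp16: 2 bytes per parameter
    int8 = weights_vram_gb(n, 1)  # int8: 1 byte per parameter
    print(f"{name}: fp16 ~{fp16:.0f} GB, int8 ~{int8:.0f} GB (weights only)")

# OPT-175B: fp16 ~350 GB, int8 ~175 GB (weights only)
# BLOOM-176B: fp16 ~352 GB, int8 ~176 GB (weights only)
```

In practice that means sharding across at least three 80 GB A100s even at int8, since activations and the KV cache eat VRAM on top of the weights.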

5