Submitted by minimaxir t3_11fbccz in MachineLearning
harharveryfunny t1_jaj8bk2 wrote
Reply to comment by Educational-Net303 in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
Could you put any numbers to that?
What are the FLOPs per token of inference for a given prompt length (for a given model)?
What do those FLOPs translate to in terms of run time on Azure's GPUs (V100s?)?
What are the GPU power consumption and data center electricity costs?
Even with these numbers, can we really relate them to their $/token pricing scheme? The pricing page attributes the 90% cost reduction to the "gpt-3.5-turbo" model vs the earlier text-davinci-003 (?) one - do we even know the architectural details needed to estimate the FLOPs?
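A common back-of-envelope answer to the FLOPs question is roughly 2 FLOPs per parameter per generated token (one multiply-accumulate per weight, ignoring the attention term that grows with prompt length). A minimal sketch, assuming a hypothetical 175B-parameter davinci-class model and V100 peak FP16 tensor throughput - neither figure is confirmed for gpt-3.5-turbo:

```python
def flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 * params)."""
    return 2.0 * n_params

def seconds_per_token(n_params: float, gpu_flops: float,
                      utilization: float = 0.3) -> float:
    """Idealized time per token on one GPU at an assumed utilization."""
    return flops_per_token(n_params) / (gpu_flops * utilization)

if __name__ == "__main__":
    N = 175e9            # assumed parameter count (not confirmed for gpt-3.5-turbo)
    V100_FP16 = 112e12   # V100 peak FP16 tensor-core FLOPS (datasheet figure)
    print(f"{flops_per_token(N):.2e} FLOPs/token")
    print(f"{seconds_per_token(N, V100_FP16) * 1e3:.1f} ms/token at 30% utilization")
```

Real deployments batch requests across many GPUs, so per-token wall-clock time and per-token cost diverge substantially from this single-GPU idealization.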
WarProfessional3278 t1_jaj9nnt wrote
Rough estimate: with one 400 W GPU and $0.14/kWh electricity, you are looking at ~$0.000016/sec here. That's the price for running the GPU alone, not accounting for server costs etc.
I'm not sure if there are any reliable estimates on FLOPs per token of inference, though I will be happy to be proven wrong :)
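The electricity arithmetic above can be sketched as follows (assuming the $0.14 figure is a per-kWh rate, which is typical for US grid power):

```python
def gpu_cost_per_second(watts: float, usd_per_kwh: float) -> float:
    """Electricity cost per second of running a GPU at a given draw."""
    kwh_per_second = (watts / 1000.0) / 3600.0  # kW drawn, spread over one second
    return kwh_per_second * usd_per_kwh

cost = gpu_cost_per_second(400, 0.14)
print(f"${cost:.8f}/sec")  # prints $0.00001556/sec
```

This covers only the GPU's power draw; datacenter overhead (cooling, PUE), hardware amortization, and the rest of the server would add on top.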