Submitted by minimaxir t3_11fbccz in MachineLearning
harharveryfunny t1_jaj8bk2 wrote
Reply to comment by Educational-Net303 in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
Could you put any numbers to that?
What are the FLOPs per token of inference for a given prompt length (for a given model)?
What do those FLOPs translate to in terms of run time on Azure's GPUs (V100s?)?
What are the GPU power consumption and data center electricity costs?
Even with these numbers, can we really relate them to their $/token pricing scheme? The pricing page attributes the 90% cost reduction to the "gpt-3.5-turbo" model vs the earlier text-davinci-003 (?) one - do we even know the architectural details needed to estimate the FLOPs?
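A common back-of-envelope answer to the FLOPs question is roughly 2 FLOPs per parameter per generated token (one multiply-accumulate per weight, ignoring the attention term that grows with prompt length). A minimal sketch, assuming a hypothetical 175B-parameter davinci-class model and V100 peak FP16 tensor throughput - neither figure is confirmed for gpt-3.5-turbo:

```python
def flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 * params)."""
    return 2.0 * n_params

def seconds_per_token(n_params: float, gpu_flops: float,
                      utilization: float = 0.3) -> float:
    """Idealized time per token on one GPU at an assumed utilization."""
    return flops_per_token(n_params) / (gpu_flops * utilization)

if __name__ == "__main__":
    N = 175e9            # assumed parameter count (not confirmed for gpt-3.5-turbo)
    V100_FP16 = 112e12   # V100 peak FP16 tensor-core FLOPS (datasheet figure)
    print(f"{flops_per_token(N):.2e} FLOPs/token")
    print(f"{seconds_per_token(N, V100_FP16) * 1e3:.1f} ms/token at 30% utilization")
```

Real deployments batch requests across many GPUs, so per-token wall-clock time and per-token cost diverge substantially from this single-GPU idealization.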
WarProfessional3278 t1_jaj9nnt wrote
Rough estimate: with one 400 W GPU and $0.14/kWh electricity, you are looking at ~$0.000016/sec here. That's the price for running the GPU alone, not accounting for server costs etc.
I'm not sure if there are any reliable estimates on FLOPs per token of inference, though I will be happy to be proven wrong :)
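The electricity arithmetic above can be sketched as follows (assuming the $0.14 figure is a per-kWh rate, which is typical for US grid power):

```python
def gpu_cost_per_second(watts: float, usd_per_kwh: float) -> float:
    """Electricity cost per second of running a GPU at a given draw."""
    kwh_per_second = (watts / 1000.0) / 3600.0  # kW drawn, spread over one second
    return kwh_per_second * usd_per_kwh

cost = gpu_cost_per_second(400, 0.14)
print(f"${cost:.8f}/sec")  # prints $0.00001556/sec
```

This covers only the GPU's power draw; datacenter overhead (cooling, PUE), hardware amortization, and the rest of the server would add on top.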