
lostmsu t1_jaj0dw2 wrote

I would love an electricity estimate for running GPT-3-sized models with optimal configuration.

According to my own estimate, the electricity cost over the lifetime (~5 years) of a 350 W GPU is between $1k and $1.6k, which means that for enterprise-class GPUs, electricity is dwarfed by the cost of the GPU itself.

14
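The estimate above is easy to check with back-of-envelope arithmetic; this sketch assumes the GPU draws its full 350 W continuously for 5 years, and the $/kWh rates are illustrative assumptions, not figures from the comment:

```python
# Back-of-envelope electricity cost for a 350 W GPU running 24/7 for ~5 years.
# The electricity rates below are assumed (roughly the low/high US range),
# chosen to show how the $1k-$1.6k spread arises.
watts = 350
hours = 24 * 365 * 5                 # ~5-year lifetime, running continuously
kwh = watts * hours / 1000           # total energy drawn: 15,330 kWh
for rate in (0.065, 0.10):           # assumed $/kWh
    print(f"${kwh * rate:,.0f} at ${rate}/kWh")
# -> about $996 at $0.065/kWh and $1,533 at $0.10/kWh
```

So the $1k-$1.6k range corresponds to electricity prices of roughly $0.065-$0.10 per kWh; an A100 or H100 costing $10k-$30k+ indeed dwarfs that.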

currentscurrents t1_jajfjr5 wrote

The problem is we don't actually know how big ChatGPT is.

I strongly doubt they're running the full 175B model; you can prune/distill a lot without affecting performance.

11

MysteryInc152 t1_jal7d3p wrote

Distillation doesn't work for token-predicting language models, for some reason.

3

currentscurrents t1_jalajj3 wrote

DistilBERT worked, though?

2

MysteryInc152 t1_jalau7e wrote

Sorry, I meant the really large-scale models. Nobody has gotten a GPT-3/Chinchilla-scale model to actually distill properly.

6