Viewing a single comment thread. View all comments

IluvBsissa t1_j9j5t08 wrote


Destiny_Knight t1_j9j6iq0 wrote

impressed lol


IluvBsissa t1_j9j6v5v wrote

If these models are so smol and efficient, why are they not released ?? I just don't get it. I thought PaLM was kept private because it was too costly to run to be profitable...


kermunnist t1_j9kqsaw wrote

That's because the smaller models are less useful. With neural networks (likely including biological ones) there's a hard trade off between specialized performance and general performance. If these 100+x smaller models were trained on the same data as GPT-3 they would perform 100+x worse on these metrics (maybe not exactly because in this case the model was multimodal which definitely gave a performance advantage). The big reason this model performed so much is because it was fine tuned on problems similar to the ones on this exam where as GPT-3 was fine turned on anything and everything. This means that this model would likely not be a great conversationalist and would probably flounder at most other tasks GPT-3.5 does well on.