mrx-ai OP t1_izijamw wrote

You might want to read p. 8 of the paper. The authors evaluate three models (GPT-3-175B, InstructGPT-3-175B, and text-davinci-002) with different prompt templates, but none of the models shows improved performance. The variance in results for text-davinci-002 is particularly high, and even the best prompt template reaches only 74.5% accuracy.