rePAN6517 t1_izilxsq wrote

Paper only tested against InstructGPT 175B / text-davinci-002. They did not test against ChatGPT or text-davinci-003.

If they had, I think the paper would obviously be titled "Large language models are zero-shot communicators"

14

CommunismDoesntWork t1_izj06ql wrote

Yeah, we're at the point where models are improving faster than we can evaluate them lol

10

egrefen t1_iznmjuu wrote

Those models weren’t released at the time of writing. I would love it if these models significantly moved the dial on this benchmark, as that would confirm the direction we see with Davinci. Curious to hear why you are so confident, though.

1