Submitted by Balance- t3_124eyso in MachineLearning
Nhabls t1_je9anrq wrote
Reply to comment by bjj_starter in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Which team is that? The one at Microsoft that made up the human performance figures in a completely ridiculous way? Basically, "We didn't like that human pass rates were too high on the hard problems the model fails completely, so we just divided the accepted count by the entire user base." Oh yeah, brilliant.
The "human" pass rates also include people who are still learning to code and are just trying to see if their solution works. It's a completely idiotic metric. Why not go test randos on the street and declare that the human coding performance metric while we're at it?
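To put the denominator complaint in numbers, here is a rough sketch. Every figure in it is made up for illustration, not taken from any paper or site, and `pass_rate` is a hypothetical helper. I'm also assuming "accepted" means users with an accepted solution. It only shows how much the choice of denominator moves the result:

```python
def pass_rate(accepted: int, denominator: int) -> float:
    """Fraction of the chosen population with an accepted solution."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return accepted / denominator


# Hypothetical numbers, invented purely for illustration.
accepted_users = 400       # users with an accepted solution on one hard problem
attempting_users = 1_000   # users who actually submitted something for it
total_user_base = 100_000  # every registered account on the site

# Same numerator, two different denominators.
rate_among_attempters = pass_rate(accepted_users, attempting_users)
rate_over_user_base = pass_rate(accepted_users, total_user_base)

print(f"among people who attempted it: {rate_among_attempters:.1%}")
print(f"over the whole user base:      {rate_over_user_base:.1%}")
```

With these invented inputs the same set of accepted solutions reads as either 40% or 0.4%, which is the whole objection: most of the user base never attempted the problem at all.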