Nhabls t1_je9anrq wrote on March 30, 2023 at 10:48 AM

Reply to comment by bjj_starter in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

Which team is that? The one at Microsoft that made up the human performance figures in a completely ridiculous way? Basically, "We didn't like that human pass rates were too high on the hard problems that the model fails on completely, so we just divided the accepted number by the entire user base." Oh yeah, brilliant.

The "human" pass rates also include people who are learning to code and just trying to see if their solution works. It's a completely idiotic metric. Why not go test randos on the street and declare that this represents human coding performance while we're at it?