Submitted by Balance- t3_124eyso in MachineLearning
bjj_starter t1_jdzoafq wrote
This title is misleading. The only thing they found was that GPT-4 was trained on code questions it wasn't tested on.
Nhabls t1_je94xwx wrote
Not misleading. The fact that it performs so differently on easy problems it has seen vs. ones it hasn't, especially when it fails so spectacularly on the latter, raises big doubts about how corrupted and unreliable their benchmarks might be.
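[Editor's note: a minimal sketch of the kind of contamination check being argued about here — split benchmark problems on the model's training-data cutoff and compare pass rates on either side. All dates, results, and the cutoff are hypothetical, purely to illustrate the signal the comment describes.]

```python
from datetime import date

# Hypothetical records: (problem_release_date, model_passed)
results = [
    (date(2021, 3, 1), True),   # pre-cutoff problem
    (date(2021, 6, 10), True),  # pre-cutoff problem
    (date(2022, 1, 5), False),  # post-cutoff problem
    (date(2022, 4, 20), False), # post-cutoff problem
]

CUTOFF = date(2021, 9, 1)  # assumed training-data cutoff

def pass_rate(rows):
    """Fraction of problems the model solved; NaN if the split is empty."""
    return sum(passed for _, passed in rows) / len(rows) if rows else float("nan")

pre = [r for r in results if r[0] < CUTOFF]
post = [r for r in results if r[0] >= CUTOFF]

# A large gap between these two rates, on problems of similar stated
# difficulty, is the contamination signal the comment is pointing at.
print(f"pre-cutoff pass rate:  {pass_rate(pre):.0%}")
print(f"post-cutoff pass rate: {pass_rate(post):.0%}")
```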
bjj_starter t1_je98wdx wrote
Okay, but an external team tested it on coding problems that only came into existence after its training finished, and found human-level performance. I don't think your theory explains how that could be the case.
Nhabls t1_je9anrq wrote
Which team is that? The one at Microsoft that made up the human performance figures in a completely ridiculous way? Basically: "we didn't like that pass rates were too high for humans on the hard problems the model fails completely, so we just divided the accepted count by the entire user base." Oh yeah, brilliant.
The "human" pass rates are also drawn from people who are learning to code, submitting just to see if their solution works. It's a completely idiotic metric. Why not go test randos on the street and declare that represents human coding performance while we're at it?
All-DayErrDay t1_je1g2d8 wrote
Exactly!