bjj_starter t1_jdzoafq wrote on March 28, 2023 at 11:18 AM

This title is misleading. The only thing they found was that GPT-4 was trained on code questions it wasn't tested on.

Nhabls t1_je94xwx wrote on March 30, 2023 at 9:31 AM

Not misleading. The fact that it performs so differently on easy problems it has seen vs. ones it hasn't, especially when it fails so spectacularly on the latter, does raise big doubts about how corrupted and unreliable their benchmarks might be.

bjj_starter t1_je98wdx wrote on March 30, 2023 at 10:25 AM

Okay, but an external team tested it on coding problems that only came into existence after its training finished and found human-level performance. I don't think your theory explains how that could be the case.

Nhabls t1_je9anrq wrote on March 30, 2023 at 10:48 AM

Which team is that? The one at Microsoft that made up the human performance figures in a completely ridiculous way? Basically: "We didn't like that human pass rates were too high on the hard problems the model fails on completely, so we just divided the number of accepted solutions by the entire user base." Oh yeah, brilliant.

The "human" pass rates are also made up of people learning to code who are trying to see if their solution works. It's a completely idiotic metric. Why not go test randos on the street and declare that represents human coding performance while we're at it?

[deleted] t1_je15c4c wrote on March 28, 2023 at 5:41 PM

[deleted]

All-DayErrDay t1_je1g2d8 wrote on March 28, 2023 at 6:47 PM

Exactly!