Submitted by Balance- t3_124eyso in MachineLearning
ArnoF7 t1_je0dzqg wrote
Funnily enough, I actually found GPT-4 far worse than I expected in terms of coding, especially after I looked at its impressive performance on other exams. I guess it's still progress in LLMs for coding, maybe just a little underwhelming compared to the other standardized tests it aces? GPT-4's performance on Codeforces is borderline abhorrent.
And now you are telling me there is data leakage, so the actual performance would be even worse than what’s on paper???
meister2983 t1_je0s90f wrote
GPT-4 is an extremely good pattern matcher - probably one of the best ever made. Most exams seem to be solvable by straightforward pattern matching (with no backtracking). The same applies to basic coding questions: it performs roughly at the level of a human gluing Stack Overflow solutions together (with the obvious variable renaming, moving lines around, removing dead code, etc.).
It struggles at logical reasoning (when it can't "pattern match" the logical reasoning to something it's trained on).
Coding example:
- Had no problem writing a tax calculator for ordinary income with progressive tax brackets
- It struggled to write a program to calculate tax on long-term capital gains (US tax code), which is very similar to the above, except it has an offset (you start bracket indexing at your ordinary income). I'd think this is actually pretty easy for a CS student, especially one who had seen the solution above - but GPT-4 struggled, since it doesn't really "reason" about code the way a human would, and it generated solutions that were obviously wrong to a human.
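To make the two tasks concrete, here's a minimal sketch of both calculators. The bracket thresholds and rates are simplified placeholders, not the real US tax tables, and `progressive_tax` is a name I made up; the point is that the capital-gains case is just the ordinary-income case with an offset:

```python
# Simplified placeholder brackets: (lower threshold, marginal rate).
# NOT the real US tax tables - illustration only.
ORDINARY_BRACKETS = [(0, 0.10), (11_000, 0.12), (44_725, 0.22)]
LTCG_BRACKETS = [(0, 0.00), (44_625, 0.15), (492_300, 0.20)]

def progressive_tax(amount, brackets, offset=0.0):
    """Tax `amount` across `brackets`, starting bracket lookup at `offset`.

    With offset=0 this is the ordinary-income calculator. Passing
    offset=ordinary_income gives the long-term capital gains variant:
    the gains "stack on top of" ordinary income for bracket purposes.
    """
    tax = 0.0
    total = offset + amount
    for i, (threshold, rate) in enumerate(brackets):
        nxt = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        # Portion of [offset, total] that falls inside this bracket.
        lo = max(offset, threshold)
        hi = min(total, nxt)
        if hi > lo:
            tax += (hi - lo) * rate
    return tax

# Ordinary income: brackets indexed from zero.
ordinary = progressive_tax(50_000, ORDINARY_BRACKETS)
# Capital gains: bracket indexing starts where ordinary income ends.
ltcg = progressive_tax(20_000, LTCG_BRACKETS, offset=50_000)
```

The whole difference between the two problems is that one `offset` argument, which is presumably why a human who solved the first version finds the second easy.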