cegras
cegras t1_je0jsud wrote
Reply to comment by MrFlamingQueen in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Do you know if ChatGPT was allowed to ingest PDFs found on the internet? Even if not, I'm sure many sections of famous textbooks are reproduced in HTML or other parsable forms.
cegras t1_je0gfd7 wrote
Reply to comment by rfxap in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
If you google most leetcode problems, I'd bet a coffee that they existed on the internet long before leetcode came into existence.
cegras t1_je0g90p wrote
Reply to comment by mrpickleby in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
How does the AI perform any better than a Google search? I'd say the AI is even more dangerous, as it gives a single, authoritative-sounding answer that you have to verify through Google and secondary sources anyway!
cegras t1_jdw8hde wrote
Reply to comment by Majestic_Food_4190 in [D] GPT4 and coding problems by enryu42
Well,
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
> As further evidence for this hypothesis, we tested it on Codeforces problems from different times in 2021. We found that it could regularly solve problems in the easy category before September 5, but none of the problems after September 12.
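Not from the post itself, just a minimal sketch of what that kind of cutoff probe looks like: partition problems by publication date around the training cutoff and compare solve rates. The problem records and solve results below are toy placeholders, not real GPT-4 or Codeforces data.

```python
from datetime import date

# Cutoff chosen to match the September 2021 date cited above (assumption).
CUTOFF = date(2021, 9, 5)

def solve_rate(results, predicate):
    """Fraction of problems matching `predicate` that the model solved."""
    subset = [solved for (_, d, solved) in results if predicate(d)]
    return sum(subset) / len(subset) if subset else 0.0

# Toy records standing in for real evaluation results:
# (problem_id, publication_date, solved_by_model)
results = [
    ("A1", date(2021, 8, 20), True),
    ("A2", date(2021, 9, 1), True),
    ("B1", date(2021, 9, 20), False),
    ("B2", date(2021, 10, 3), False),
]

before = solve_rate(results, lambda d: d < CUTOFF)
after = solve_rate(results, lambda d: d >= CUTOFF)

# A sharp gap across the cutoff (here 1.0 vs 0.0) is the memorization signature.
print(before, after)
```

The point of the probe is that genuine problem-solving ability shouldn't care what date a problem was published; only memorization does.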
cegras t1_jdta9mj wrote
Reply to comment by pengo in [D] GPT4 and coding problems by enryu42
More like, the ability to know that 'reversing a linked list' and 'linked list cycle and traversal problems' are the same concept but different problems, and to separate those into train/test. Clearly they haven't figured that out, because ChatGPT is contaminated, and their (opaquely disclosed) ways of addressing the issue don't seem adequate at all.
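To illustrate why surface-level deduplication doesn't catch this: a simple n-gram overlap check (a common contamination heuristic; this is my own toy version, not OpenAI's actual method) scores a reworded problem as nearly disjoint from the original even though the underlying concept is identical.

```python
def ngrams(text, n=3):
    """Word-level n-grams, lowercased; a crude surface-similarity signal."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard overlap of the n-gram sets of two problem statements."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

# Two statements of the same problem, one reworded (made-up examples):
original = "Given the head of a singly linked list, reverse the list and return the new head."
reworded = "You receive the first node of a one-directional chain of nodes; invert the chain's direction."

print(overlap(original, reworded))  # near zero despite the identical concept
```

So any dedup scheme that works at the string level will pass reworded duplicates straight into the test set, which is exactly the contamination being argued about here.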
cegras t1_jdsd89g wrote
Reply to [D] GPT4 and coding problems by enryu42
I don't see how it is possible not to end up just memorizing the internet, which is full of enough questions and discussions to simulate convincing Q&As. Consider if a team had invented an algorithm or heuristic to avoid data contamination (https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks). Then what you would have is something that can separate content into logically similar but orthogonal realizations. That would be an incredible tool, worth a prize in its own right.
cegras t1_jdscwdv wrote
Reply to comment by ghostfaceschiller in [D] GPT4 and coding problems by enryu42
You mean, like continuously refining your google searches until you find the right stackexchange answer?
cegras t1_je2k9dr wrote
Reply to comment by TheEdes in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
ChatGPT is great at learning the nuances of English, e.g. synonyms and metaphors. But if you feed it a reworded leetcode question and it finds the answer within its neural net, has it learned to conceptualize? No, it just learned that synonym ...