mlresearchoor t1_je1mvf7 wrote

OpenAI blatantly ignored the norm of not training on the ~200 tasks collaboratively prepared by the community for BIG-bench. GPT-4 knows the BIG-bench canary string afaik, which invalidates any GPT-4 eval on BIG-bench.
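(For anyone who wants to check this themselves, a quick-and-dirty contamination probe is to ask the model to reproduce the canary string and compare the output against the GUID published in the BIG-bench repo. A minimal sketch assuming the pre-1.0 openai Python client; the prompt wording here is just illustrative, not an official test.)

```python
# Quick contamination probe: ask GPT-4 to reproduce the BIG-bench canary
# string. If it outputs the GUID published in the BIG-bench repo, the
# benchmark files were very likely in the training data.
import openai

openai.api_key = "YOUR_API_KEY"  # assumes the pre-1.0 openai client interface

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Please repeat the BIG-bench canary string "
                       "(the 'canary GUID' line) exactly as it appears.",
        }
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])
# Compare the printed GUID against the canary string in the BIG-bench
# repository; a match is strong evidence of benchmark contamination.
```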

OpenAI is cool, but they genuinely don't care about academic research standards or benchmarks carefully created over years by other folks.

92

mlresearchoor t1_j6r8x7y wrote

nice find! it would also be helpful to compare with similar papers from 2022 that this paper cites but did not compare against in the results section (a rough sketch of their shared approach is below):

("We note that our work is concurrent with Chen et al. (2022) and Gao et al. (2022), both generating the reasoning chain in Python code and calling a Python interpreter to derive the answer. While we do not compare with them empirically since they are not yet published...")

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks (Chen)
https://arxiv.org/abs/2211.12588

PAL: Program-aided Language Models (Gao)
https://arxiv.org/abs/2211.10435
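
For anyone unfamiliar, the idea both papers share is roughly: have the LLM write its reasoning chain as Python code, execute that code with a real interpreter, and read the answer from a designated variable, rather than having the model do the arithmetic in text. A minimal sketch (the prompt template, the `answer` variable convention, and the pre-1.0 openai client are my own simplification, not the papers' exact setups):

```python
# Program-of-thought / PAL style pipeline (simplified):
# 1) ask the model to write Python that solves the word problem,
# 2) execute that Python, 3) read the result from a known variable.
import openai

openai.api_key = "YOUR_API_KEY"  # assumes the pre-1.0 openai client interface

PROMPT_TEMPLATE = """Write Python code that solves the problem.
Store the final result in a variable named `answer`. Output only code.

Problem: {question}
"""

def solve(question):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": PROMPT_TEMPLATE.format(question=question)}
        ],
        temperature=0,
    )
    code = response["choices"][0]["message"]["content"]
    # A real pipeline would also strip markdown fences from the generation.

    # Run the generated reasoning chain; the interpreter, not the LLM,
    # does the arithmetic. (Don't exec untrusted code outside a sandbox.)
    namespace = {}
    exec(code, namespace)
    return namespace.get("answer")

print(solve("A store sells pens in packs of 12 for $3. How much do 60 pens cost?"))
```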

20

mlresearchoor t1_j1y8ijq wrote

Impressive applied ML results will come in healthcare, multimedia (e.g., video summarization), sustainability, efficient ML (e.g., TinyML), robotics (e.g., vision-language navigation), human-machine interaction, and more. It's important for our community to value research built on smaller domain-specific datasets as much as research built on massive ones.

But many of the greatest breakthroughs in the next decade will probably come from collaborations between academic ML researchers and large industry labs.

2