fmai t1_j9sns4a wrote

IMO, even if ML researchers assigned only a 0.1% chance to AI wiping out humanity, the cost of that happening is so unfathomably large that it would only be rational to shift a lot of resources from AI capability research to AI safety in order to drive that probability down.
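The argument is just expected-value arithmetic: a tiny probability times an enormous cost can still dominate the calculation. A minimal sketch, where every number is purely illustrative (not an estimate of anything):

```python
# Illustrative expected-value calculation; all numbers are hypothetical.
p_catastrophe = 0.001          # the 0.1% chance from the comment above
cost_catastrophe = 8e9 * 50    # e.g. 8 billion people x 50 life-years lost each

# Expected loss = probability x cost; still ~400 million life-years
# despite the tiny probability.
expected_loss = p_catastrophe * cost_catastrophe
print(expected_loss)
```

Even at 0.1%, the expected loss is large enough that spending resources to shave the probability down pays for itself many times over.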

If you tell people that society needs to do a lot less of the thing that is their job, it's no surprise they dismiss your arguments. The same applies to EY to some extent; I think it would be more reasonable for him to allow a lot more uncertainty in his predictions, but would he then have the same influence?

Rather than giving too much credit to expert opinions, it's better to look at the evidence from all sides directly. You seem to already be doing that, though :-)

2

fmai t1_j6hjauf wrote

GPT-3 ranks relatively low on SuperGLUE because it was not finetuned on the SuperGLUE tasks, whereas T5 and the others were. The amazing feat of GPT-3 is that you can reach impressive performance with just few-shot prompting, which was not known to be possible before.
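For anyone unfamiliar: few-shot prompting means prepending a handful of solved examples to the input instead of updating any model weights. A minimal sketch of how such a prompt is assembled (the task and examples are invented for illustration):

```python
# Build a few-shot prompt: a few solved examples, then the query.
# No gradient updates happen; the model infers the task from the examples.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
]
query = "What a waste of time."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)
```

The model is then asked to continue the text after the final "Sentiment:", which is the whole trick: task specification by example rather than by finetuning.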

As to your questions:

  1. AFAIK, OpenAI hasn't published any numbers themselves, and nobody outside of OpenAI has API access to ChatGPT yet, making it difficult to assess its performance on the often thousands of examples in a benchmark. So, no, the performance improvement hasn't been quantified so far.

  2. No, there is no quantitative analysis. Most people seem to agree that, anecdotally, ChatGPT seems to hallucinate far less than GPT-3. But you can definitely get ChatGPT to generate bullshit if you keep digging, so it's far from perfect. Depending on what story you want to tell, some people will emphasize one or the other. Take it all with a grain of salt until we get solid numbers.

  3. AFAIK, LLMs are fantastic at closed-book question answering, where you're not allowed to look at external resources. I think a T5-based model was the first to show that it can answer trivia questions well from knowledge stored in the model parameters alone. For open-book QA you will need to augment the LLM with some retrieval mechanism (which ChatGPT doesn't have yet), and therefore you can expect other models to be much better in this regard.
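To make the closed-book vs. open-book distinction concrete, here is a toy sketch of retrieval augmentation: fetch relevant passages from an external corpus and stuff them into the prompt, so the model answers from retrieved text rather than parametric memory alone. The retriever, corpus, and question are all made up; real systems use dense retrievers, not word overlap.

```python
# Toy retrieval-augmented QA sketch (everything here is hypothetical).
def retrieve(question, corpus, k=2):
    # Rank passages by crude word overlap with the question.
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain on Earth.",
    "Paris is the capital of France.",
]
question = "Where is the Eiffel Tower located?"
passages = retrieve(question, corpus)

# The retrieved context is prepended, and the LLM completes after "Answer:".
prompt = "Context:\n" + "\n".join(passages) + f"\nQuestion: {question}\nAnswer:"
print(prompt)
```

A closed-book model would have to answer from its weights alone; the open-book setup above lets a (possibly smaller) model simply read the answer out of the retrieved context.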

9

fmai t1_irhy6dc wrote

  1. Presentation type is usually not based on reviewer score, but rather on the type of paper you're presenting.

  2. 99% of your resume boost through this achievement comes from the fact that you have a first author publication at EMNLP. When you apply to jobs in the future, nobody will care if you actually presented in person or online. It only matters for the connections you make at the conference itself - which are way more important at later stages of your career, not so much as an undergrad.

  3. For things like applications to PhD programs your GPA still matters. Take that into consideration if you think your exam performance might suffer as a result of attending EMNLP in person.

  4. The in-person EMNLP will certainly be large, but I also suspect a sizable online crowd, because Abu Dhabi is far from Europe, China, and North America, and has received a lot of criticism for being an LGBTQ-unfriendly place. While networking is easier offline, you can certainly make a lot of connections during online events as well. I especially found poster sessions in GatherTown to be fruitful. If you study the program carefully and attend the virtual posters of researchers you want to meet, it's a viable way of networking.

6