Franck_Dernoncourt t1_jc4tdft wrote

https://crfm.stanford.edu/2023/03/13/alpaca.html:

> We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.

Why only academic research and not industry research? I don't see where that limitation comes from in their 3 factors.

Franck_Dernoncourt t1_j9v5wwh wrote

Why SOTA? Did they compare against GPT-3.5? The only comparison against GPT-3.5 I found in the LLaMA paper was:

> Despite the simplicity of the instruction finetuning approach used here, we reach 68.9% on MMLU. LLaMA-I (65B) outperforms on MMLU existing instruction finetuned models of moderate sizes, but are still far from the state-of-the-art, that is 77.4 for GPT code-davinci-002 on MMLU (numbers taken from Iyer et al. (2022)).

Franck_Dernoncourt t1_j6ydkiu wrote

> I was surprised at how much better GPT3 davinci 003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.

On which tasks?

> Of course, I didn't expect the smaller models to be on par with GPT-3

You could read Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. Benchmarking Large Language Models for News Summarization. arXiv:2301.13848:

> we find instruction tuning, and not model size, is the key to the LLM’s zero-shot summarization capability
