Comments


michaelthwan_ai OP t1_jdlf8g8 wrote

Because recent LLM releases have been coming so fast, I organized the notable models from the news. Some may find the diagram useful, so please allow me to share it.

Please let me know if there is anything I should change or add so that I can learn. Thank you very much.

If you want to edit or create an issue, please use this repo.

---------EDIT 20230326

Thank you for your responses, I've learnt a lot. I have updated the chart:

Changes 20230326:

  • Added: OpenChatKit, Dolly and their predecessors
  • Higher resolution

To learn:

  • RWKV/ChatRWKV related, PaLM-rlhf-pytorch

Models not considered (yet):

  • Models from 2022 or earlier (e.g. T5, May 2022); this post is meant to help people quickly gather information about new models
  • Models that are not yet fully released (e.g. Bard, still in limited preview)
37

DarkTarantino t1_jdly8y7 wrote

If I wanted to create graphs like these for work, what is that role called?

2

light24bulbs t1_jdm04sx wrote

Are those it? Surely there are a bunch more notable open-source ones?

9

Veggies-are-okay t1_jdmopy0 wrote

Does anyone have a good resource/video giving an overview of these implementations? I don’t work much with language models, but I figure it would be good to understand where this is heading; right now I’m just running into BuzzFeed-esque surface-level nonsense on YouTube.

2

DigThatData t1_jdmv87n wrote

don't forget Dolly, the databricks model that was successfully instruct-finetuned on gpt-j-6b in 3 hours

4

DigThatData t1_jdmvjyb wrote

dolly is important precisely because the foundation model is old. they were able to get ChatGPT-level performance out of it, and they only trained it for three hours. just because the base model is old doesn't mean this isn't recent research. it demonstrates:

  • the efficacy of instruct finetuning
  • that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data

dolly isn't research from a year ago, it was only just described for the first time a few days ago.
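
for anyone curious what that kind of instruct finetuning looks like in practice, here's a rough sketch using Hugging Face Transformers. the dataset, model id, and hyperparameters below are illustrative stand-ins, not Databricks' actual recipe:

```python
# Rough sketch of instruct finetuning a causal LM on instruction/response pairs.
# NOTE: dataset, model id, and hyperparameters are illustrative assumptions,
# not the actual Dolly training setup.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/gpt-j-6B"  # an "old" (2021) foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A public instruction-following dataset, used here as a stand-in.
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def to_features(example):
    # Fold instruction and response into one training sequence.
    text = (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}{tokenizer.eos_token}"
    )
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="instruct-finetune",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=1e-5,
        fp16=True,
    ),
    train_dataset=tokenized,
    # mlm=False gives plain causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```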

EDIT: ok, I just noticed you have an ERNIE model up there, so this "no old foundation models" thing is just inconsistent.

5

tonicinhibition t1_jdn4v86 wrote

There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.

CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.

Andrej Karpathy walks you through building GPT in code.

As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.

6

Ph0masta t1_jdn5o16 wrote

Where does Google’s LaMDA fit on this chart?

3

big_ol_tender t1_jdoe9k1 wrote

The Alpaca dataset is not open source, so alpaca-lora is not open source either.

2

ganzzahl t1_jdouip7 wrote

You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.

I do very much wonder whether OpenAI has tested equally sized T5 models, and whether there's some secret reason they've found as to why we should stick with GPT models, or if they're just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
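
To make the architectural split concrete, here is a minimal sketch of the two families side by side with Hugging Face Transformers; the small checkpoints are stand-ins chosen purely for illustration:

```python
# Minimal sketch contrasting an encoder-decoder (T5 family) with a
# decoder-only (GPT family) model. Checkpoints are tiny public models
# picked for illustration; gpt2 won't actually translate well here --
# the point is the interface/architecture difference.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

prompt = "translate English to German: The house is wonderful."

# Encoder-decoder: the prompt is encoded once, and the decoder generates
# output tokens while cross-attending to that fixed encoding.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
enc_out = t5.generate(**t5_tok(prompt, return_tensors="pt"), max_new_tokens=32)
print(t5_tok.decode(enc_out[0], skip_special_tokens=True))

# Decoder-only: the prompt and the generated tokens share one sequence,
# so each new token attends over the full prompt plus everything generated so far.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
dec_out = gpt.generate(**gpt_tok(prompt, return_tensors="pt"), max_new_tokens=32)
print(gpt_tok.decode(dec_out[0], skip_special_tokens=True))
```

The usual intuition on latency is that the encoder runs only once over the input, so per-token decoding cost depends mainly on the decoder, which is one reason comparisons like UL2's can favour encoder-decoders when inputs are long and outputs are short.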

10