Comments

michaelthwan_ai OP t1_jdlf8g8 wrote

Because recent LLM releases have been coming so fast, I organized the notable recent models from the news into a diagram. Some may find it useful, so please allow me to share it.

Please let me know if there is anything I should change or add so that I can learn. Thank you very much.

If you want to edit or create an issue, please use this repo.

---------EDIT 20230326

Thank you for your responses, I've learnt a lot. I have updated the chart:

Changes 20230326:

  • Added: OpenChatKit, Dolly and their predecessors
  • More high-res

To learn:

  • RWKV/ChatRWKV related, PaLM-rlhf-pytorch

Models not considered (yet):

  • Models from 2022 or earlier (e.g. T5, May 2022), since this post was created to help people quickly gather information about new models
  • Models that are not fully released yet (e.g. Bard, still in limited preview)
37

gopher9 t1_jdlq1jy wrote

Add RWKV.

21

Puzzleheaded_Acadia1 t1_jdlx1g3 wrote

What is RWKV?

5

Rejg t1_jdmdspx wrote

I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.

13

ganzzahl t1_jdouip7 wrote

You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.

I do very much wonder if OpenAI has tested equally sized T5 models, and whether there's some secret reason they have found for sticking with GPT models, or if they're just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
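To make the distinction concrete, here's a minimal sketch of the two setups using the Hugging Face transformers library (the small public checkpoints and prompts are placeholders chosen purely for illustration, not the models discussed above):

```python
# Minimal sketch: encoder-decoder (T5 family) vs decoder-only (GPT family).
# Checkpoints are small illustrative examples, not the models debated above.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          T5ForConditionalGeneration)

# Encoder-decoder: the encoder reads the input once; only the (often smaller)
# decoder runs autoregressively, which is part of the latency argument.
t5_tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")
enc = t5_tok("Translate English to German: The house is wonderful.",
             return_tensors="pt")
out = t5.generate(**enc, max_new_tokens=32)
print(t5_tok.decode(out[0], skip_special_tokens=True))

# Decoder-only: prompt and continuation share one stack; each new token
# attends over the whole growing prompt+output sequence.
gpt_tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
gpt = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
ids = gpt_tok("The house is wonderful because", return_tensors="pt")
out = gpt.generate(**ids, max_new_tokens=32, do_sample=False)
print(gpt_tok.decode(out[0], skip_special_tokens=True))
```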

10

signed7 t1_jdqm8lt wrote

I'm probably wrong, but I think I read somewhere that Google has a patent on the encoder-decoder architecture, which is why everyone else uses decoder-only.

2

addandsubtract t1_jdlvmm6 wrote

Where do GPT-J and Dolly fit into this?

18

wywywywy t1_jdlz40b wrote

GPT-J & GPT-Neo are predecessors of GPT-NeoX 20b

14

michaelthwan_ai OP t1_jdlzwyi wrote

Sure, I think it is clear enough to show the parents of recent models (instead of their great-great-great-grandparents...).

If people want, I may consider making a full one (including older models).

8

wywywywy t1_jdm16va wrote

In my opinion, it'd be better to include only the currently relevant ones rather than everything under the sun.

Too much noise makes the chart less useful.

9

michaelthwan_ai OP t1_jdlztvv wrote

It is a good model, but it's from about a year ago and not related to the recently released LLMs, so I didn't add it (otherwise there would be tons of good models to include).
For Dolly, it was only released yesterday. I don't have full info on it yet.

2

addandsubtract t1_jdm1d9h wrote

OK, no worries. I'm just glad there's a map to guide the madness going on atm. Adding legacy models would be good for people who come across them now, so they know that they are legacy.

6

DigThatData t1_jdmvjyb wrote

Dolly is important precisely because the foundation model is old. They were able to get ChatGPT-level performance out of it, and they only trained it for three hours. Just because the base model is old doesn't mean this isn't recent research. It demonstrates:

  • the efficacy of instruct finetuning
  • that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data

Dolly isn't research from a year ago; it was only just described for the first time a few days ago.
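For a sense of what that looks like, here is a rough, hypothetical sketch of Alpaca-style instruct finetuning on a gpt-j-6b base with Hugging Face transformers (the dataset, prompt template, and hyperparameters are illustrative assumptions, not Databricks' actual Dolly recipe):

```python
# Hypothetical sketch of instruct finetuning a causal LM such as gpt-j-6b.
# NOT the exact Dolly training script; dataset, prompt template and
# hyperparameters here are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "EleutherAI/gpt-j-6b"              # older base foundation model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# A small instruction dataset of Alpaca-style prompt/response pairs.
data = load_dataset("tatsu-lab/alpaca", split="train")

def to_features(ex):
    # Concatenate instruction and response into one training sequence.
    text = (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['output']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gptj-instruct-sketch",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=1e-5,
        fp16=True,
    ),
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal) language-modeling objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A short run like this over an instruction dataset, rather than pretraining from scratch, is the kind of thing the "three hours" claim refers to.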

EDIT: OK, I just noticed you have an ERNIE model up there, so this "no old foundation models" thing is just inconsistent.

5

light24bulbs t1_jdm04sx wrote

Are those it? Surely there are a bunch more notable open-source ones?

9

michaelthwan_ai OP t1_jdm3v2y wrote

Please suggest some.

6

michaelthwan_ai OP t1_jdpxtp1 wrote

Open alternatives -> added most; the rest are in the TODO (e.g. PaLM)
OpenChatKit -> added
InstructGPT -> it seems it's not a released model yet, just a plan

1

philipgutjahr t1_jdn95o8 wrote

For completeness, you should also add those proprietary models: Megatron-Turing (530B, NVIDIA), Gopher (280B, DeepMind), Chinchilla (70B, DeepMind), and Chatgenie (WriteCream).

2

michaelthwan_ai OP t1_jdpy06p wrote

I only include recent LLMs (Feb/Mar 2023) (those are the LLMs usually at the bottom) and their predecessors up to two levels back (parent/grandparent). See whether the ones you mentioned are related to those.

1

DigThatData t1_jdmv87n wrote

Don't forget Dolly, the Databricks model that was successfully instruct-finetuned from gpt-j-6b in 3 hours.

4

Ph0masta t1_jdn5o16 wrote

Where does Google's LaMDA fit on this chart?

3

DarkTarantino t1_jdly8y7 wrote

If I wanted to create graphs like these for work, what is that role called?

2

ZestyData t1_jdmdjrd wrote

Well... you can just create these graphs if it's important for your current task.

There isn't a role called "Chief graph maker" who makes graphs for people when they need them.

9

Veggies-are-okay t1_jdmopy0 wrote

Does anyone have a good resource/video giving an overview of these implementations? I don't work much with language models, but I figure it might be good to understand where this is at. I'm just running into BuzzFeed-esque, surface-level nonsense on YouTube.

2

tonicinhibition t1_jdn4v86 wrote

There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.

CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.

Andrej Karpathy walks you through building GPT in code

As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.

6

michaelthwan_ai OP t1_jdpy5dy wrote

Thanks for sharing the above!

My choice is yk - Yannic Kilcher. His "AI News" videos are brief introductions, and he sometimes goes through certain papers in detail. Very insightful!

1

big_ol_tender t1_jdoe9k1 wrote

The Alpaca dataset is not open source, so alpaca-lora is not open source either.

2