Hello! I'm trying to understand what available LLMs one can "relatively easily" play with. My goal is to understand the landscape since I haven't worked in this field before. I'm trying to run them "from the largest to the smallest".

By "relatively easy", I mean doesn't require to setup a GPU cluster or costs more than $20:)

Here are some examples I have found so far:

ChatGPT (obviously) - 175B params
OpenAI api to access GPT-3s (from ada (0.5B) to davinci (175B)). Also CodeX
Bloom (176B) - text window on that page seems to work reliably, you just need to keep pressing "generate"
OPT-175B (Facebook LLM), the hosting works surprisingly fast, but slower than ChatGPT
Several models on HuggingFace that I made to run with Colab Pro subscription: GPT-NeoX 20B, Flan-t5-xxl 11B, Xlm-roberta-xxl 10.7B, GPT-j 6B. I spent about $20 total on running the models below. None of the Hugging face API interfaces/spaces didn't work for me :(. Here is an example notebook I made for NeoX.

Does anyone know more models that are easily accessible?

P.S. Some large models I couldn't figure out (yet) how to run easily: Galactica-120b 120B Opt-30b 30B

Comments

gopher9 t1_j7cbdlg wrote on February 5, 2023 at 7:07 PM

#1,730,537

RWKV 14B, trained on The Pile.

Cheap_Meeting t1_j7chivx wrote on February 5, 2023 at 7:49 PM

#1,730,830

In terms of Consumer Apps, the Poe app from Quora has access to two models from Open AI and one from Anthropic.

Perplexity.ai, YouChat and Neeva are search engines that integrated LLMs.

Google has an AI + Search Event on Wednesday where they are likely to announce something as well.

In terms of APIs and getting a feeling for these models, I would use OpenAI's APIs. Their models are the best publically available models. Open Source models are still far behind.

Cheap_Meeting t1_j7cit9i wrote on February 5, 2023 at 7:58 PM

#1,730,890

Replying to MysteryInc152 (#1,730,137)

Are any benchmark scores such as MMLU or BigBench available for Aleph Alpha's models?

m98789 t1_j7ffidp wrote on February 6, 2023 at 12:11 PM

#1,734,850

Replying to gopher9 (#1,730,537)

 (final release around Feb-15-2023):

gopher9 t1_j7fq4kw wrote on February 6, 2023 at 1:53 PM

#1,735,344

Replying to m98789 (#1,734,850)

With RWKV-4-Pile-14B-20230204-7324.pth released 2 hours ago, as you can see at https://huggingface.co/BlinkDL/rwkv-4-pile-14b/tree/main.

But yeah, it's still WIP.

MysteryInc152 t1_j7fymgb wrote on February 6, 2023 at 2:58 PM

#1,735,727

Replying to Cheap_Meeting (#1,730,890)

don't think so

CriticalTemperature1 t1_j7fzacn wrote on February 6, 2023 at 3:03 PM

#1,735,757

Google has their AI Test Kitchen for LaMDA

mrpogiface t1_j7g03gj wrote on February 6, 2023 at 3:09 PM

#1,735,790

Do we actually know that chatGPT is the full 175B? With codex being 13B and still enormously powerful, and previous instruction tuned models (in the paper) being 6.7B it seems likely that they have it working on a much smaller parameter count

MysteryInc152 t1_j7g83pw wrote on February 6, 2023 at 4:04 PM

#1,736,122

Replying to Cheap_Meeting (#1,730,830)

GLM-130B is really really good. https://crfm.stanford.edu/helm/latest/?group=core_scenarios

I think some instruction tuning is all it needs to match the text-davinci models

NoLifeGamer2 t1_j7geyw5 wrote on February 6, 2023 at 4:50 PM

#1,736,438

I love how bloom was just like "F*ck it let's one-up openAI"

sinavski OP t1_j7gi1xs wrote on February 6, 2023 at 5:10 PM

#1,736,578

Replying to NoLifeGamer2 (#1,736,438)

Yeah, I think its a just like a 1B MLP with random weights not connected to any outputs:)

NoLifeGamer2 t1_j7gin1l wrote on February 6, 2023 at 5:13 PM

#1,736,605

Replying to sinavski (#1,736,578)

Honestly wouldn't be surprised lol

danysdragons t1_j7gt2ak wrote on February 6, 2023 at 6:19 PM

#1,737,061

Replying to Cheap_Meeting (#1,730,830)

To pre-empt possible confusion by people wanting to try YouChat, its URL is you.com/chat, while youchat.com is an unrelated messaging service.

yaosio t1_j7gtm5q wrote on February 6, 2023 at 6:23 PM

#1,737,087

I've been trying out you.com's chatbot and it seems to work well, sometimes. It has the same problem ChatGPT has with just making stuff up, but it provides sources (real and imagined) so if it lies you can actually check. I asked it what Todd Howard's favorite cake it and it gave me an authorative answer without a source, and when I asked for a source it gave me a Gamerant link that didn't exist. When it does provide a source it notates it like Wikipedia. It also can access the Internet as it was able to tell me about events that happened in the last 24 hours.

It's able to produce code, and you can have a conversation with it but it really prefers to give information from the web whenever possible. It won't tell me what model they use, it could be their own proprietary model. They also have Stable Diffusion, and a text generator but I don't know what model that is.

Chatbot: https://you.com/search?q=who+are+you&tbm=youchat&cfr=chat

Stable Diffusion: https://you.com/search?q=python&fromSearchBar=true&tbm=imagine

Text generator: https://you.com/search?q=python&fromSearchBar=true&tbm=youwrite

visarga t1_j7hhqvc wrote on February 6, 2023 at 8:55 PM

#1,738,224

Replying to sinavski (#1,736,578)

Does Bloom do tasks? is it well behaved?

Cheap_Meeting t1_j7j70tj wrote on February 7, 2023 at 4:17 AM

#1,741,374

Replying to MysteryInc152 (#1,736,122)

That's not my takeway. GLM-130B is even behind OPT according to the mean win rate, and the instruction tuned version of OPT in turn is worse than FLAN-T5 which is a 10x smaller model (https://arxiv.org/pdf/2212.12017.pdf Table 14)

MysteryInc152 t1_j7ja39c wrote on February 7, 2023 at 4:45 AM

#1,741,505

Replying to Cheap_Meeting (#1,741,374)

I believe the fine-tuning dataset matters as well as the model but I guess we'll see. I think they plan on fine-tuning.

The set used to tune OPT doesn't contain any chain of thought.

farmingvillein t1_j7jboe1 wrote on February 7, 2023 at 5:00 AM

#1,741,573

Replying to visarga (#1,738,224)

bloom is pretty terrible, unfortunately

xeneks t1_j7ki4qg wrote on February 7, 2023 at 1:36 PM

#1,743,406

I am looking at parametric search, where I can highlight in a graph-database style way, the mistakes with the results, by reassigning weights or links, to redo the search, until I get answers that are more correct, based off things like 'water isn't useful for cleaning dried paint, acetone or paint thinners may be more useful'. Is it possible to build such features into any of the open source tools here, or are lacking any gui for the feedback, beyond text and a thumb up or down as one sees in the commercial packages?

lostmsu t1_j7mia4m wrote on February 7, 2023 at 9:38 PM

#1,747,643

I would love to see comparison of these models on some common tasks.

Taenk t1_j8nfwkh wrote on February 15, 2023 at 4:30 PM

#1,818,866

Comprehensive list of LLMs.

[D] List of Large Language Models to play with.