avocadoughnut t1_ja35pg6 wrote on February 26, 2023 at 2:32 PM

Reply to comment by visarga in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico

There's risk of breaking OpenAI TOS by training on their models. It's a hard no for this project to ensure legal safety.

avocadoughnut t1_j9a64k1 wrote on February 20, 2023 at 12:49 PM

Reply to comment by gliptic in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics

Yup. I'd recommend using whichever RWKV model that can be fit with fp16/bf16. (apparently 8bit is 4x slower and lower accuracy) I've been running GPT-J on a 24GB gpu for months (longer contexts possible using accelerate) and I noticed massive speed increases when using fp16 (or bf16? don't remember) rather than 8bit.

avocadoughnut t1_j8p3psq wrote on February 15, 2023 at 10:55 PM

Reply to comment by redv in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng

He has trained several smaller RWKV models. You can find them on huggingface

avocadoughnut t1_j7yaq8w wrote on February 10, 2023 at 7:00 AM

Reply to comment by Sm0oth_kriminal in [D] Using LLMs as decision engines by These-Assignment-936

I'm considering a higher level idea. There's no way that transformers are the end-all-be-all model architecture. By identifying the mechanisms that large models are learning, I'm hoping a better architecture can be found that reduces the total number of multiplications and samples needed for training. It's like feature engineering.

avocadoughnut t1_j7xvd0p wrote on February 10, 2023 at 4:18 AM

Reply to comment by currentscurrents in [D] Using LLMs as decision engines by These-Assignment-936

Makes me wonder if pretraining makes the model converge on essentially a more efficient architecture that we could be using instead. I'm hoping this thought has already been explored, it would be interesting to read about.

avocadoughnut t1_j4n8bp2 wrote on January 16, 2023 at 10:21 PM

Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

Well, there are projects like WebGPT (by OpenAI) that make use of external knowledge sources. I personally think that's the future of these models: moderated databases of documents. The knowledge is much more interpretable and modifiable that way.

avocadoughnut t1_j4n5sp8 wrote on January 16, 2023 at 10:05 PM

Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

From what I've heard, they want a model small enough to run on consumer hardware. I don't think that's currently possible (probably not enough knowledge capacity). But I haven't heard that a decision has been made on this end. The most important part of the project at the moment is crowdsourcing good data.

avocadoughnut t1_j4mci2y wrote on January 16, 2023 at 7:01 PM

Reply to comment by Acceptable-Cress-374 in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

ChatGPT is GPT3 + instructional finetuning + RLHF for alignment. If you're talking about using those models ro gather training data, that's against OpenAI TOS, so I've heard. The goal is to make something that isn't closed source, something you can run yourself.

avocadoughnut t1_j4m12v2 wrote on January 16, 2023 at 5:51 PM

Reply to [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

There's currently a project in progress called OpenAssistant. It's being organized by Yannic Kilcher and some LAION members, to my understanding. Their current goal is to develop interfaces to gather data, and then train a model using RLHF. You can find a ton of discussion in the LAION discord. There's a channel for this project.