avocadoughnut
avocadoughnut t1_j9a64k1 wrote
Reply to comment by gliptic in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Yup. I'd recommend using whichever RWKV model can fit in fp16/bf16 (apparently 8-bit is about 4x slower and less accurate). I've been running GPT-J on a 24GB GPU for months (longer contexts are possible using accelerate), and I noticed massive speed increases when using fp16 (or bf16? I don't remember) rather than 8-bit.
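For reference, a minimal sketch of what I mean with the transformers + accelerate stack (the GPT-J checkpoint name and exact flags are just an example, not the only way to set it up):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # example checkpoint; swap in an RWKV checkpoint if you prefer

tokenizer = AutoTokenizer.from_pretrained(model_name)

# fp16 load: fastest in my experience, fits comfortably in 24GB VRAM
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # or torch.bfloat16 on GPUs that support it
    device_map="auto",           # accelerate handles device placement / offload
)

# 8-bit alternative (bitsandbytes): smaller memory footprint, but noticeably slower
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     load_in_8bit=True,
#     device_map="auto",
# )

inputs = tokenizer("The RWKV architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```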
avocadoughnut t1_j8p3psq wrote
Reply to comment by redv in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
He has trained several smaller RWKV models. You can find them on Hugging Face.
avocadoughnut t1_j7yaq8w wrote
Reply to comment by Sm0oth_kriminal in [D] Using LLMs as decision engines by These-Assignment-936
I'm considering a higher level idea. There's no way that transformers are the end-all-be-all model architecture. By identifying the mechanisms that large models are learning, I'm hoping a better architecture can be found that reduces the total number of multiplications and samples needed for training. It's like feature engineering.
avocadoughnut t1_j7xvd0p wrote
Reply to comment by currentscurrents in [D] Using LLMs as decision engines by These-Assignment-936
Makes me wonder if pretraining makes the model converge on essentially a more efficient architecture that we could be using instead. I'm hoping this thought has already been explored, it would be interesting to read about.
avocadoughnut t1_j4n8bp2 wrote
Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
Well, there are projects like WebGPT (by OpenAI) that make use of external knowledge sources. I personally think that's the future of these models: moderated databases of documents. The knowledge is much more interpretable and modifiable that way.
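Roughly what I have in mind, as a toy sketch (TF-IDF retrieval is just a stand-in for whatever retriever you'd actually use, and the document list and prompt format are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for a moderated, editable document database
documents = [
    "RWKV is an RNN-style language model competitive with transformers.",
    "GPT-J is a 6B-parameter autoregressive transformer trained on the Pile.",
    "WebGPT augments GPT-3 with web search to answer questions with citations.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

# The retrieved text gets prepended to the model's prompt, so editing the
# database directly changes (and lets you audit) what the model "knows".
context = "\n".join(retrieve("What is WebGPT?"))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: What is WebGPT?\nAnswer:"
print(prompt)
```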
avocadoughnut t1_j4n5sp8 wrote
Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
From what I've heard, they want a model small enough to run on consumer hardware. I don't think that's currently possible (probably not enough knowledge capacity), but I haven't heard of a final decision on that front. The most important part of the project at the moment is crowdsourcing good data.
avocadoughnut t1_j4mci2y wrote
Reply to comment by Acceptable-Cress-374 in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
ChatGPT is GPT-3 + instruction finetuning + RLHF for alignment. If you're talking about using those models to gather training data, that's against the OpenAI TOS, or so I've heard. The goal is to make something that isn't closed source, something you can run yourself.
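To be concrete about the RLHF piece: the reward model is typically trained on human preference pairs with a loss like the one below (pure PyTorch sketch; the tiny linear "reward model" and the random embeddings are placeholders for a real LM-based scorer):

```python
import torch
import torch.nn.functional as F

# Placeholder reward model: in practice this is a language model with a scalar head
reward_model = torch.nn.Linear(768, 1)

# Fake "embeddings" of a chosen and a rejected response to the same prompt
chosen = torch.randn(4, 768)    # batch of responses humans preferred
rejected = torch.randn(4, 768)  # batch of responses humans ranked lower

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# Pairwise preference loss: push the preferred response's score above the other's
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()

# The trained reward model then scores samples from the instruction-finetuned LM,
# and a policy-gradient method (e.g. PPO) updates the LM to maximize that reward.
print(float(loss))
```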
avocadoughnut t1_j4m12v2 wrote
There's currently a project in progress called OpenAssistant. It's being organized by Yannic Kilcher and some LAION members, to my understanding. Their current goal is to develop interfaces to gather data, and then train a model using RLHF. You can find a ton of discussion in the LAION Discord; there's a channel for this project.
avocadoughnut t1_ja35pg6 wrote
Reply to comment by visarga in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
There's a risk of breaking the OpenAI TOS by training on outputs from their models. It's a hard no for this project, to ensure legal safety.