dancingnightly t1_jadj7fa wrote
Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Edit: Seems like for this one yes. They do consider human instructions (similarish to the goal of a RLHF which requires more RAM), by adding them directly in the text dataset, as mentioned in 3.3 Language-Only Instruction Tuning-
For other models, like OpenAssistant coming up, one thing to note is that, although the generative model itself may be runnable locally, the reward model (the bit that "adds finishing touches" and ensures following instructions) can be much bigger. Even if the GPT-J underlying model is 11GB on RAM and 6B params, the RLHF could seriously increase that.
This models is in the realm of the smaller T5, BART and GPT-2 models released 3 years ago and runnable then on decent gaming GPUs
currentscurrents t1_jaetyg1 wrote
Can't the reward model be discarded at inference time? I thought it was only used for fine-tuning.
[deleted] t1_jaejynm wrote
[removed]
Viewing a single comment thread. View all comments