Submitted by _underlines_ t3_zstequ in MachineLearning
Bartmoss t1_j1aqxem wrote
I think playing around with a nice encoder-decoder like T5 is a great start. The original model is already nice, and the newer flan-t5 can be better for some few-shot tasks. The base models are quite good, and even the small models perform pretty well. I haven't tried t5-tiny yet, but it is on my list to play with.
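As a minimal sketch of what "playing around" can look like: load a flan-t5 checkpoint and give it a few-shot prompt. The model size and the toy sentiment prompt here are my own assumptions for illustration, not something from the thread.

```python
# Minimal sketch: few-shot prompting with flan-t5 (model size and prompt are assumptions).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A few labelled examples, then the query we want the model to complete.
prompt = (
    "Review: The food was cold. Sentiment: negative\n"
    "Review: Great service and tasty dishes. Sentiment: positive\n"
    "Review: I would not come back here. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```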
Of course, if you have specific text-generation tasks in mind, you could do some fine-tuning of T5. You can even fine-tune the same model on several tasks by giving each task its own prompt prefix. I have found that for some tasks (especially ones where a sequence-to-sequence model has an advantage), a fine-tuned T5 (or some variant thereof) can beat a zero-shot, few-shot, or even fine-tuned GPT-3 model. A rough sketch of that multi-task setup is below.
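Here is roughly what I mean by fine-tuning one model on several tasks with different prompts. This is only a sketch under my own assumptions: the checkpoint, the prefixes ("summarize:", "paraphrase:"), the toy examples, and the hyperparameters are all placeholders, not a recipe from the thread.

```python
# Minimal sketch: multi-task fine-tuning of T5 with per-task prompt prefixes.
# Checkpoint, prefixes, examples, and hyperparameters are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Each task gets its own prefix; the same model learns all of them.
train_examples = [
    ("summarize: The movie was long but the ending made it worth watching.",
     "Worth watching despite the length."),
    ("paraphrase: The weather is terrible today.",
     "Today's weather is awful."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for epoch in range(3):
    for source, target in train_examples:
        inputs = tokenizer(source, return_tensors="pt", truncation=True)
        labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice you would batch the data, pad the labels (masking pad tokens with -100), and evaluate per task, but the core idea is just the task-specific prefix on each input.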
It can be surprising what such encoder-decoder models can do with prompt prefixes and few-shot learning, and they are a good starting point for playing with large language models.