
fundamental_entropy t1_jb4bu9u wrote

For the first question: we are moving towards not having to design such pipelines ourselves; ideally a library will do the model sharding or parallel computation for us. Look at parallelformers, which worked for some big models (11B) I tried. Why I think this is going to happen: three years back, distributed training was a big black box, and Horovod, PyTorch distributed training, and TPUs were the only solutions. Right now no one designs such pipelines by hand anymore, everyone uses DeepSpeed, which has implementations of all the known techniques (ZeRO, CPU offloading, etc.). So if you are not one of these computation/data engineers, I suggest watching out for such libraries.
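
To give a rough idea of what "the library does the sharding for you" looks like, here is a minimal sketch following the pattern in the parallelformers README (the checkpoint name and GPU count are just illustrative, not a recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Load a model that is too big for one GPU in fp16.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

# One call shards the model's layers across the available GPUs;
# no manual pipeline or tensor-parallel code needed.
parallelize(model, num_gpus=2, fp16=True)

inputs = tokenizer("Parallelism makes large models usable because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

DeepSpeed works similarly: you describe what you want (ZeRO stage, CPU offloading) in a config file and it handles the partitioning, instead of you writing the sharding logic yourself.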

1

fundamental_entropy t1_jasqy64 wrote

Flan models are trained on almost every open dataset available for generic English tasks. Recent research suggests that models trained to perform multiple tasks are better than models trained on only a single task (in fact, the ratios of the different tasks matter too; see the Flan 2022 paper). Flan-T5 beats T5 on almost every task, and Flan-T5 XXL sometimes matches GPT-3-style prompted generation.
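
By "prompted generation" I mean you can just hand it an instruction with no task-specific fine-tuning. A minimal sketch with the Hugging Face transformers API (checkpoint size and prompt are just illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Flan-T5 uses the same seq2seq API as plain T5; only the checkpoint differs.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

# Because it was instruction-tuned on many tasks, a natural-language
# instruction is enough, similar to prompting GPT-3.
prompt = "Answer the question: what gas do plants absorb from the air?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```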

3