Southern-Trip-1102 t1_itwt1ac wrote

You might want to look into the BLOOM models on huggingface.
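Something like this gets you started with the transformers library. Just a minimal sketch: bloom-560m is the smallest published bigscience checkpoint, so swap in a bigger one (e.g. bigscience/bloom-7b1) if your hardware allows.

```python
# Minimal sketch: load a small BLOOM checkpoint and generate a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # smallest published BLOOM variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```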

5

AuspiciousApple OP t1_itwvawz wrote

Thanks, I'll take a look. Have you played around with them yourself?

1

Southern-Trip-1102 t1_itwyp8o wrote

A bit. As far as I can tell, they (the 176B one) are on par with GPT-3, though I haven't done much testing or comparison. They were also trained on 46 natural languages and 13 programming languages (59 in total), from what I read.

4

AuspiciousApple OP t1_itx00gv wrote

Thanks! Even a qualitative, subjective judgement of rough parity is quite encouraging. I might need DeepSpeed or similar to get it to run on my 8GB GPU, but if the quality is even close, that's very cool.
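For what it's worth, a hedged sketch of the offloading route using transformers' built-in support (device_map="auto" requires the accelerate package). The 7B checkpoint here is an assumption on my part; the 176B weights won't realistically fit on 8GB even with offloading.

```python
# Sketch: fit a mid-size BLOOM checkpoint on a small GPU by letting
# accelerate spread layers across GPU, CPU RAM, and disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # assumption: a size plausible for 8GB with offload
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # halve the memory footprint vs. float32
    device_map="auto",           # place layers on GPU first, then CPU, then disk
    offload_folder="offload",    # spill anything that doesn't fit to this folder
)
```

Expect generation to be slow once layers spill off the GPU, but it should at least run.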

1

visarga t1_itwxzgs wrote

My experience is that models that haven't had the instruction-tuning treatment don't behave nicely.
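Concretely, a base model is a completion engine, so bare instructions often get ignored or continued oddly; phrasing the request as text to complete tends to work better. The prompt wording below is just an illustrative assumption:

```python
# Often answered poorly by a non-instruction-tuned model:
instruction_style = "Translate 'good morning' into French."

# More reliable with a base model: set up a pattern it can complete.
completion_style = (
    "English: good morning\n"
    "French:"
)
```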

1

Southern-Trip-1102 t1_itwyur3 wrote

Could that be because BLOOM was trained on a more varied dataset rather than being focused on English, given that it was trained on multiple natural languages and programming languages?

2