learn-deeply t1_j2u53ek wrote

My unsubstantiated hypothesis: BLOOM is severely undertrained, so most of its neurons aren't contributing at all to the final result, compared to OPT-175B.
