
timy2shoes t1_iuptv7y wrote

We had to use a workaround to fit the 15B-parameter model on a p3.8xlarge instance.

> I've also been working with ESM and tried the 15B parameter variant.

Huh. We’ve noticed the same thing. Interesting that others are having the same problem.

2

Mister_Abc t1_iur4gme wrote

First author here. We've had some indication that the 15B model may be overfit. It seemed to slightly improve on a few important metrics (CASP14), which is why we included it.

2