ROFLLOLSTER t1_iupsskm wrote

> requires a workaround that is difficult to implement

What workaround? I've also been working with ESM and tried the 15B parameter variant. It seemed worse than the 3B in my tests, but maybe I just missed the problem?

timy2shoes t1_iuptv7y wrote

We had to use a workaround to fit the 15B-parameter model on a p3.8xlarge instance.
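For a rough sense of why fitting is a problem at all, here is a back-of-the-envelope estimate of the weight memory alone. This is only a sketch under stated assumptions: exactly 15e9 parameters, and a p3.8xlarge with 4 V100 GPUs at 16 GB each. It ignores activations and any framework overhead, and it says nothing about what the actual workaround was.

```python
# Back-of-the-envelope weight memory for a 15B-parameter model.
# Assumptions (not stated in the thread): the parameter count is exactly 15e9,
# and a p3.8xlarge provides 4 V100 GPUs with 16 GB of memory each.
PARAMS = 15e9
GPU_MEM_GB = 16
NUM_GPUS = 4


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


fp32_gb = weight_gb(PARAMS, 4)  # 4 bytes per parameter in float32
fp16_gb = weight_gb(PARAMS, 2)  # 2 bytes per parameter in float16

# Do the half-precision weights fit on one GPU, or only across all of them?
fits_one_gpu_fp16 = fp16_gb <= GPU_MEM_GB
fits_all_gpus_fp16 = fp16_gb <= GPU_MEM_GB * NUM_GPUS

print(f"fp32 weights: {fp32_gb:.0f} GB, fp16 weights: {fp16_gb:.0f} GB")
print(f"fits on one GPU in fp16: {fits_one_gpu_fp16}")
print(f"fits across {NUM_GPUS} GPUs in fp16: {fits_all_gpus_fp16}")
```

Under those assumptions the weights do not fit on a single GPU even in half precision, so the model has to be split across devices or offloaded in some way.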

> I've also been working with ESM and tried the 15B parameter variant.

Huh. We’ve noticed the same thing. Interesting that others are having the same problem.

Mister_Abc t1_iur4gme wrote

First author here. We've had some indication that the 15B model may be overfit. It seemed to slightly improve on a few important metrics (CASP14), which is why we included it.