Cheap_Meeting t1_ix56uu6 wrote

>but the transformer is stuck and loss doesn't decrease after abt 20k steps.

Presumably they meant training loss, which would indicate that this is an optimization problem.
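
One rough way to check which case applies is to compare the recent trend of the training loss with that of the validation loss. The sketch below is only a heuristic under stated assumptions: the function name `diagnose`, the `window` and `tol` parameters, and the three labels are all hypothetical, not something from the original post, and it assumes you have logged one loss value per evaluation step for both splits.

```python
def diagnose(train_losses, val_losses, window=3, tol=1e-3):
    """Heuristic label for a training run (hypothetical helper).

    Compares the mean of the last `window` values with the mean of the
    `window` values before them, for both the training and validation loss.
    """
    if min(len(train_losses), len(val_losses)) < 2 * window:
        raise ValueError("need at least 2 * window loss values per split")

    def trend(xs):
        recent = xs[-window:]
        earlier = xs[-2 * window:-window]
        return sum(recent) / window - sum(earlier) / window

    train_delta = trend(train_losses)
    val_delta = trend(val_losses)

    # Training loss itself has stopped falling: points to optimization
    # (learning rate, warmup, initialization, ...), not to overfitting.
    if train_delta > -tol:
        return "optimization plateau"
    # Training loss still falls while validation loss rises: overfitting.
    if val_delta > tol:
        return "overfitting"
    return "still improving"


if __name__ == "__main__":
    print(diagnose([1.2] * 6, [1.3] * 6))
    print(diagnose([1.0, 0.9, 0.8, 0.7, 0.6, 0.5],
                   [1.0, 1.0, 1.0, 1.2, 1.3, 1.4]))
```

If the training loss is flat, the run falls in the first branch regardless of what the validation loss does, which matches the reading that a stalled training loss is an optimization problem.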

14

waa007 t1_ix7s5k7 wrote

Maybe there is too little data and the model overfits, or the model parameters got stuck in a local optimum. Is that possible?

0