Submitted by parabellum630 t3_z088fo in MachineLearning
Cheap_Meeting t1_ix56uu6 wrote
Reply to comment by erannare in [R] Tips on training Transformers by parabellum630
>but the transformer is stuck and loss doesn't decrease after abt 20k steps.
Presumably they meant training loss, which would indicate that this is an optimization problem.
waa007 t1_ix7s5k7 wrote
Maybe there is too little data and the model overfits, or the model parameters got stuck in a local optimum. Is that possible?
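One rough way to tell the two explanations in this thread apart is to compare the logged training and validation loss curves. The sketch below is only an illustration under assumptions: `diagnose`, `window`, and `tol` are hypothetical names, the threshold is arbitrary, and the loss values in the usage lines are made up. A flat training loss points toward an optimization problem, while a falling training loss with a rising validation loss points toward overfitting.

```python
def mean(xs):
    return sum(xs) / len(xs)


def diagnose(train_loss, val_loss, window=3, tol=0.01):
    """Compare the mean of the last `window` logged losses with the window before it.

    Hypothetical helper for illustration; `window` and `tol` are arbitrary choices.
    """
    if min(len(train_loss), len(val_loss)) < 2 * window:
        raise ValueError("need at least 2 * window logged values per curve")

    # Change in average loss between the two most recent windows.
    train_delta = mean(train_loss[-window:]) - mean(train_loss[-2 * window:-window])
    val_delta = mean(val_loss[-window:]) - mean(val_loss[-2 * window:-window])

    if abs(train_delta) <= tol:
        # Training loss itself has stopped moving.
        return "optimization"
    if train_delta < -tol and val_delta > tol:
        # Training loss still falls while validation loss rises.
        return "overfitting"
    return "inconclusive"


# Made-up curves, one value per evaluation interval.
stuck = diagnose([2.0, 1.5, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2],
                 [2.1, 1.6, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3])
overfit = diagnose([2.0, 1.8, 1.6, 1.4, 1.2, 1.0],
                   [2.0, 1.9, 1.8, 1.9, 2.0, 2.1])
print(stuck, overfit)
```

This check only looks at the trend of the curves. It cannot say why the training loss is flat (learning rate, warmup, initialization, a data bug), so treat it as a first triage step.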