erannare t1_ix44g1p wrote

Dataset size is a BIG factor here. Transformers are very data hungry. They present a much larger hypothesis space and thus take a lot more data to train.

44

Cheap_Meeting t1_ix56uu6 wrote

>but the transformer is stuck and loss doesn't decrease after abt 20k steps.

Presumably they meant training loss, which would indicate that this is an optimization problem.
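
A rough way to tell the two cases apart is to compare recent training and validation loss. This is only a sketch: the helper name `diagnose`, the `window` and `tol` parameters, and the thresholds are all made up for illustration, not taken from the original post.

```python
def diagnose(train_loss, val_loss, window=3, tol=1e-3):
    """Hypothetical helper: label a run from its logged loss history.

    Compares the mean of the last `window` values with the mean of the
    `window` values before that, for both training and validation loss.
    """
    if len(train_loss) < 2 * window or len(val_loss) < 2 * window:
        return "insufficient"

    def mean(xs):
        return sum(xs) / len(xs)

    # Positive drop means the loss went down between the two windows.
    train_drop = mean(train_loss[-2 * window:-window]) - mean(train_loss[-window:])
    val_drop = mean(val_loss[-2 * window:-window]) - mean(val_loss[-window:])

    if train_drop <= tol:
        # Training loss itself is flat: the optimizer is not making progress.
        return "optimization"
    if val_drop < -tol:
        # Training loss still falls while validation loss rises.
        return "overfitting"
    return "improving"


# Flat training loss, as described in the quoted post.
print(diagnose([1.8] * 6, [1.9] * 6))
# Training loss falls while validation loss climbs.
print(diagnose([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], [1.0, 1.0, 1.0, 1.1, 1.2, 1.3]))
```

If the first label comes back, more data alone is unlikely to help; learning rate, warmup, and initialization are the usual suspects to check first.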

14

waa007 t1_ix7s5k7 wrote

Maybe there is too little data and the model overfits, or the model parameters got stuck in a local optimum. Is that possible?

0