Submitted by parabellum630 t3_z088fo in MachineLearning
erannare t1_ix44g1p wrote
Dataset size is a BIG factor here. Transformers are very data hungry. They present a much larger hypothesis space and thus take a lot more data to train.
Cheap_Meeting t1_ix56uu6 wrote
>but the transformer is stuck and loss doesn't decrease after abt 20k steps.
Presumably they meant training loss, which would indicate that this is an optimization problem.
waa007 t1_ix7s5k7 wrote
Maybe there is too little data and the model overfits, or the model parameters got stuck in a local optimum. Is that possible?
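One rough way to tell the two explanations in this thread apart is to compare the recent trend of the training loss with that of the validation loss. The sketch below is only a heuristic under stated assumptions: the function name `diagnose`, the `window` size, and the `tol` threshold are all hypothetical and not from the original post. A flat training loss points toward an optimization problem, while a falling training loss with a rising validation loss points toward overfitting.

```python
def diagnose(train_losses, val_losses, window=3, tol=1e-3):
    """Heuristic label for a training run (illustrative assumption, not a standard API).

    Compares the mean loss over the last `window` logged values with the mean
    over the `window` values before that, for both training and validation.
    """
    if len(train_losses) < 2 * window or len(val_losses) < 2 * window:
        return "not enough data"

    def mean(xs):
        return sum(xs) / len(xs)

    # Positive drop means the loss is still going down.
    train_drop = mean(train_losses[-2 * window:-window]) - mean(train_losses[-window:])
    val_drop = mean(val_losses[-2 * window:-window]) - mean(val_losses[-window:])

    if train_drop <= tol:
        # Training loss itself has plateaued: the optimizer is not making progress.
        return "optimization stall"
    if val_drop < -tol:
        # Training loss keeps falling while validation loss rises.
        return "overfitting"
    return "still learning"


if __name__ == "__main__":
    stalled_train = [2.0, 1.5, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2]
    stalled_val = [2.1, 1.6, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3]
    print(diagnose(stalled_train, stalled_val))
```

If the result is a stall, the usual suspects are on the optimization side (learning rate, warmup, initialization) rather than the dataset size alone.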