Submitted by parabellum630 t3_z088fo in MachineLearning
parabellum630 OP t1_ix4fc49 wrote
Reply to comment by ChangingHats in [R] Tips on training Transformers by parabellum630
Thank you!! I was experimenting with an off-the-shelf implementation with little customization. I am using the transformer as an encoder with a hidden dimension of 800, due to the constraints of the other models surrounding it. I will try varying all of these hyperparameters. Looks like it's going to be a long week.