erannare
erannare t1_ix44g1p wrote
Reply to [R] Tips on training Transformers by parabellum630
Dataset size is a BIG factor here. Transformers are very data hungry: they have weaker built-in inductive biases than, say, CNNs, so they present a much larger hypothesis space and thus take a lot more data to train.
erannare t1_ixvt17s wrote
Reply to comment by sarmientoj24 in [D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi
TensorFlow has model optimization tooling (the TensorFlow Model Optimization Toolkit) covering techniques such as, but not limited to, weight clustering, pruning, and weight quantization, with training-time support for these as well.
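For anyone unfamiliar with what those three techniques do to the weights, here is a rough pure-Python sketch of the ideas. To be clear, this is NOT TensorFlow's API: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`, `cluster_weights`) are made up for illustration, and it works on a flat list of floats rather than real model tensors.

```python
def magnitude_prune(weights, sparsity):
    """Pruning: zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k weights closest to zero.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Quantization: symmetric linear mapping of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


def cluster_weights(weights, n_clusters, iters=10):
    """Clustering: 1-D k-means, then replace each weight with its nearest centroid."""
    lo, hi = min(weights), max(weights)
    if n_clusters == 1:
        centroids = [(lo + hi) / 2]
    else:
        # Linearly spaced initial centroids across the weight range.
        centroids = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for w in weights:
            j = min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
            groups[j].append(w)
        # Keep the old centroid if no weight was assigned to it.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]


if __name__ == "__main__":
    w = [0.5, -0.1, 0.05, -0.9, 0.3, 0.02]
    print(magnitude_prune(w, 0.5))
    q, scale = quantize_int8(w)
    print(q, dequantize(q, scale))
    print(cluster_weights(w, 2))
```

The "training support" part is that the model keeps training (or fine-tuning) with these constraints applied, so it can recover accuracy lost to the compression. The sketch above only shows the one-shot, post-hoc version.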