Submitted by begooboi t3_119zmpd in deeplearning
OnceReturned t1_j9rb03o wrote
Reply to comment by suflaj in Why bigger transformer models are better learners? by begooboi
>We're talking about a physically impossible number of parameters here, which will require solutions radically different from simple matrix multiplication and nonlinear activations.
Solutions for what, exactly? Memorizing the entire internet (or entire training set, but still)?