Submitted by 00001746 t3_1244q71 in MachineLearning
keepthepace t1_jdzvxl2 wrote
Maybe I am stubborn, but I haven't totally digested the "bitter lesson" and I am not sure I agree with its inevitability. Transformers did not appear magically out of nowhere; they were a solution to RNNs' vanishing gradient problem. AlphaGo had to be put into a Monte Carlo tree search to do anything good, and it is hard not to feel that LLMs' grounding issues may be a problem to solve with architecture changes rather than scale.
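To illustrate the vanishing gradient point: backpropagation through time multiplies by the recurrent Jacobian at every timestep, so the gradient shrinks geometrically when that factor is below 1. A minimal sketch with illustrative (hypothetical) numbers, assuming a scalar recurrent weight and an average tanh derivative of 0.5:

```python
# Sketch of why gradients vanish in a vanilla RNN (illustrative numbers only).
# Backprop through time multiplies by the recurrent Jacobian at each step;
# with tanh activations each factor is |w| * tanh'(h) <= |w|, typically < 1,
# so the gradient decays geometrically with sequence length.

def grad_norm_through_time(steps, w=0.9, avg_tanh_slope=0.5):
    """Return the magnitude of a scalar gradient after `steps` of backprop."""
    g = 1.0
    for _ in range(steps):
        g *= abs(w) * avg_tanh_slope  # one timestep's multiplicative factor
    return g

print(grad_norm_through_time(10))   # already tiny after 10 steps
print(grad_norm_through_time(100))  # effectively zero after 100 steps
```

Attention sidesteps this by letting every position connect to every other in a single step, instead of chaining 100 multiplications.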