brates09 t1_ix4xxas wrote

Autoregressive next-token prediction is incredibly compute-efficient to train. With causally masked attention, a single forward pass yields a valid prediction for every token in a sequence, so every position contributes a training signal. I imagine this is a large part of why AR models (e.g. GPT) won out in popularity over masked-token-prediction models (e.g. BERT), which only get a loss from the small fraction of tokens that are masked out.
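
Here's a minimal sketch of that single-pass training in PyTorch (toy sizes, with a bare `nn.MultiheadAttention` standing in for a full transformer block, so it's just illustrating the masking trick, not the actual GPT setup):

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len, batch = 100, 32, 16, 4

embed = torch.nn.Embedding(vocab_size, d_model)
attn = torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))
x = embed(tokens)

# Causal mask: True entries are blocked, so position i only attends to positions <= i.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

h, _ = attn(x, x, x, attn_mask=causal_mask)
logits = lm_head(h)  # (batch, seq_len, vocab_size): a prediction at every position

# Position t predicts token t+1, so one forward pass gives seq_len - 1
# training targets per sequence instead of just one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```

Contrast with BERT-style masked-token training, where only the masked positions (typically ~15% of the sequence) produce a loss term per forward pass.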

5