brates09 t1_irhz7ml wrote
Reply to comment by EmbarrassedHelp in [R] Google announces Imagen Video, a model that generates videos from text by Erosis
Are there examples of recent big-model work that haven't been replicated in terms of quality? It seems much more likely to be attributable to conservatism on the part of the companies than to deception about the results.
brates09 t1_ix4xxas wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Autoregressive next-token prediction is incredibly compute-efficient. By using causally masked attention you can make a valid prediction for every token in a sequence with a single forward pass during training. I imagine this is a large part of why AR models (e.g. GPT) won out in popularity over masked token prediction models (e.g. BERT).
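A minimal sketch of that point, assuming PyTorch; the toy dimensions, single attention head, and layer names here are made up purely for illustration, not taken from any real model. The causal mask is what lets one forward pass produce a supervised prediction at every position, with the targets being just the input shifted by one.

```python
# Minimal sketch (assumed PyTorch, toy sizes): causal masking yields a
# training signal at every position from a single forward pass.
import torch
import torch.nn.functional as F

B, T, V, D = 2, 8, 100, 32            # batch, sequence length, vocab, hidden size
tokens = torch.randint(0, V, (B, T))  # fake token ids for illustration

embed = torch.nn.Embedding(V, D)
to_qkv = torch.nn.Linear(D, 3 * D)
to_logits = torch.nn.Linear(D, V)

x = embed(tokens)
q, k, v = to_qkv(x).chunk(3, dim=-1)

# Causal mask: position t may only attend to positions <= t.
scores = q @ k.transpose(-2, -1) / D ** 0.5              # (B, T, T)
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~causal, float("-inf"))
attn = scores.softmax(dim=-1) @ v                         # (B, T, D)

logits = to_logits(attn)                                  # (B, T, V): a prediction at every position

# Next-token targets are the inputs shifted by one, so one pass gives
# T - 1 supervised predictions per sequence.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, V),
    tokens[:, 1:].reshape(-1),
)
print(loss)
```

By contrast, a BERT-style masked objective only gets a training signal at the small fraction of positions that were masked out, which is one way to see the efficiency gap.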