jeankaddour t1_ir66cik wrote
Reply to comment by TheInfelicitousDandy in [R] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging by rlresearcher
Thank you very much. This is extremely useful feedback, and I appreciate the time you spent writing it! I will look into using the adaptive-inputs LM on wiki103 next time. I suspect BookCorpus + a Wikipedia dump won't fit within my computational budget, but I might try. Your guess is right: I'm new to the LM literature and mainly wanted a testbed for optimization :) So again, thanks for sharing your insights!
jeankaddour t1_irdvad7 wrote
Thanks again for this feedback. I haven't trained on a different dataset yet, but in the meantime I have replaced all BERT perplexity numbers/plots with the MLM losses. The updated paper went up on arXiv today.
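For anyone wondering what "MLM loss" refers to here: since perplexity is ill-defined for masked language models, the cross-entropy over the masked positions is the less ambiguous metric. Below is a minimal sketch of how it could be computed with Hugging Face transformers; the model name, example texts, and masking probability are my assumptions, not necessarily the paper's actual setup.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast, DataCollatorForLanguageModeling

# Assumed model/tokenizer for illustration; the paper's checkpoint may differ.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Standard BERT-style masking: 15% of tokens are selected for prediction.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hypothetical example sentences, just to make the sketch runnable.
texts = [
    "Weight averaging can speed up convergence.",
    "The MLM loss is the cross-entropy over masked tokens.",
]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]
batch = collator(encodings)  # masks selected inputs and builds labels (-100 elsewhere)

with torch.no_grad():
    outputs = model(**batch)

# outputs.loss is the mean cross-entropy over the masked positions only;
# exp(loss) would be a "pseudo-perplexity", which is why reporting the raw
# MLM loss is the cleaner choice for BERT.
print(f"MLM loss: {outputs.loss.item():.3f}")
```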