jeankaddour t1_ir66cik wrote
Reply to comment by TheInfelicitousDandy in [R] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging by rlresearcher
Thank you very much. This is extremely useful feedback, and I appreciate the time you spent writing it! I will look into using the adaptive-inputs LM on wiki103 next time. I suspect BookCorpus + a Wikipedia dump won't fit within my computational budget, but I might try. Your guess is right: I'm new to the LM literature and mainly wanted a testbed for optimization :) So again, thanks for sharing your insights!
jeankaddour t1_irdvad7 wrote
Thanks again for this feedback. I haven't trained on a different dataset yet, but in the meantime I have replaced all BERT perplexity numbers/plots with the MLM losses. The updated paper went up on arXiv today.
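For anyone wondering what "MLM loss" refers to here: since perplexity is ill-defined for masked language models, the cross-entropy over the masked positions is the less ambiguous metric. Below is a minimal sketch of how it could be computed with Hugging Face transformers; the model name, example texts, and masking probability are my assumptions, not necessarily the paper's actual setup.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast, DataCollatorForLanguageModeling

# Assumed model/tokenizer for illustration; the paper's checkpoint may differ.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Standard BERT-style masking: 15% of tokens are selected for prediction.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hypothetical example sentences, just to make the sketch runnable.
texts = [
    "Weight averaging can speed up convergence.",
    "The MLM loss is the cross-entropy over masked tokens.",
]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]
batch = collator(encodings)  # masks selected inputs and builds labels (-100 elsewhere)

with torch.no_grad():
    outputs = model(**batch)

# outputs.loss is the mean cross-entropy over the masked positions only;
# exp(loss) would be a "pseudo-perplexity", which is why reporting the raw
# MLM loss is the cleaner choice for BERT.
print(f"MLM loss: {outputs.loss.item():.3f}")
```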