Small-Reason-8096 t1_irr6g0q wrote on October 10, 2022 at 12:44 PM

Hands down the best paper I have ever read (and reimplemented) is the ResNets paper:

The descriptions are clear and concise - but with enough detail to reimplement in whatever framework you like. Also, OOTB the results I got on CIFAR10 matched the paper pretty much perfectly (not always a given!).

Another good paper to try is AWD-LSTM: https://arxiv.org/pdf/1708.02182.pdf

Basically, if you are implementing and training from scratch, focus on something you can train with a smallish dataset in a reasonable period of time. I would generally steer away from LLMs and object detection / segmentation models, as they require more resources to train than are commonly available!

TheInfelicitousDandy t1_irsfw1a wrote on October 10, 2022 at 6:09 PM

I've tried to reimplement AWD-LSTM in pytorch > 1. and have never been able to get close to the original results. I've also seen other people try and not get close. Pretty sure it has to do with the weight dropout they used.
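
For anyone unfamiliar with the weight dropout mentioned above: as described in the AWD-LSTM paper, it is DropConnect applied to the hidden-to-hidden weight matrices, with the mask sampled once per forward pass rather than once per timestep. Below is a minimal pure-Python sketch of that idea, not the authors' code. The names `drop_weights` and `run_rnn` are hypothetical, the recurrence is a toy linear one rather than an LSTM, and the 1 / (1 - p) rescaling assumes the usual inverted-dropout convention.

```python
import random


def drop_weights(weights, p, rng):
    """Return a copy of `weights` with each entry zeroed with probability p.

    Surviving entries are scaled by 1 / (1 - p) (inverted-dropout
    convention, an assumption here) so each weight keeps its expected value.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - p)
    return [[w * keep_scale if rng.random() >= p else 0.0 for w in row]
            for row in weights]


def run_rnn(inputs, w_hh, p, rng, training=True):
    """Toy linear recurrence h_t = x_t + W_hh @ h_{t-1} (illustrative only)."""
    # Sample the dropped matrix once, then reuse it at every timestep.
    # At evaluation time the full, undropped weights are used.
    w = drop_weights(w_hh, p, rng) if training else w_hh
    h = [0.0] * len(w_hh)
    for x in inputs:
        h = [x_i + sum(w_ij * h_j for w_ij, h_j in zip(row, h))
             for x_i, row in zip(x, w)]
    return h
```

The detail that is easy to get wrong in a reimplementation is where the mask is sampled: resampling it inside the timestep loop, or leaving it active at evaluation time, gives a different regularizer from the one sketched here.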

If anyone knows of any pytorch > 1. version that achieves the same PPL on PTB/WikiText-2, I'd very much like to know.
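
For reference when comparing numbers, PPL here means the exponential of the mean per-token negative log-likelihood. A minimal sketch, assuming natural-log probabilities of the target tokens are already available (the function name `perplexity` is just illustrative):

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of the targets."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Average over tokens, not over batches, before exponentiating.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.25 to every target token has a perplexity of 4. Averaging per-batch losses when batches contain different numbers of tokens gives a different value, which is one possible source of mismatched results.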

Small-Reason-8096 t1_irzvwc8 wrote on October 12, 2022 at 8:01 AM

That surprises me as there was a good Fastai version:

https://docs.fast.ai/text.models.awdlstm.html

which is built on pytorch. When I played with it ages ago the results seemed comparable to the paper, but I haven't revisited it for a while :)

TheInfelicitousDandy t1_is0ajet wrote on October 12, 2022 at 11:22 AM

As far as I know that version doesn't give comparable PPL.

Someone else says the same here: https://github.com/salesforce/awd-lstm-lm/issues/86#issuecomment-453266265

A major issue here (and with other reproductions) is people saying they have a reproduction because they can run it without errors, but never actually getting the same results.