
Small-Reason-8096 t1_irr6g0q wrote

Hands down the best paper I have ever read (and reimplemented) is the ResNets paper:

https://arxiv.org/abs/1512.03385

The descriptions are clear and concise, but with enough detail to reimplement in whatever framework you like. Also, out of the box, the results I got on CIFAR-10 matched the paper pretty much perfectly (not always a given!).

Another good paper to try is AWD-LSTM: https://arxiv.org/pdf/1708.02182.pdf

Basically, if you are implementing and training from scratch, focus on something you can train with a smallish dataset in a reasonable period of time. I would generally steer away from LLMs and object detection / segmentation models, as they require more resources to train than are commonly available!

22

TheInfelicitousDandy t1_irsfw1a wrote

I've tried to reimplement AWD-LSTM in pytorch > 1. and have never been able to get close to the original results. I've also seen other people try and not get close. Pretty sure it has to do with the weight dropout they used.

If anyone knows of any pytorch > 1. version that achieves the same perplexity (PPL) on PTB/WikiText-2, I'd very much like to know.
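
For anyone unfamiliar with the weight dropout mentioned above: as I understand the paper, it is DropConnect applied to the hidden-to-hidden weight matrix, with the mask sampled once per forward pass and reused at every time step. Below is a rough pure-Python sketch of that idea, not the authors' code. The names `weight_dropout` and `run_sequence` are made up for illustration, the plain tanh recurrence stands in for a real LSTM cell, and the 1/(1-p) rescaling is the usual inverted-dropout convention, which I'm assuming here.

```python
import math
import random


def weight_dropout(w_hh, p, rng, training=True):
    """Return a copy of w_hh with each entry zeroed with probability p.

    Surviving entries are scaled by 1/(1-p) (inverted-dropout convention,
    an assumption of this sketch). The input matrix is never modified.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return [row[:] for row in w_hh]
    scale = 1.0 / (1.0 - p)
    return [[w * scale if rng.random() >= p else 0.0 for w in row]
            for row in w_hh]


def run_sequence(inputs, w_hh, p, rng, training=True):
    """Run a toy tanh recurrence over a sequence.

    The dropped weight matrix is sampled ONCE here and reused for every
    time step, which is the part that is easy to get wrong when the
    framework owns the recurrent weights.
    """
    w = weight_dropout(w_hh, p, rng, training)
    h = [0.0] * len(w)
    for x in inputs:  # each x is already projected to the hidden size
        h = [math.tanh(x[i] + sum(w[i][j] * h[j] for j in range(len(h))))
             for i in range(len(h))]
    return h


rng = random.Random(0)
w_hh = [[0.5, -0.2], [0.1, 0.3]]
inputs = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.3]]
h_train = run_sequence(inputs, w_hh, p=0.5, rng=rng)
h_eval = run_sequence(inputs, w_hh, p=0.5, rng=rng, training=False)
```

In evaluation mode the weights pass through untouched, so `h_eval` is deterministic, while `h_train` depends on the sampled mask. If a reimplementation resamples the mask at every time step, or if the framework keeps using the undropped weights internally, the regularization differs from what the paper describes, which may be one place to look when results don't match.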

3

Small-Reason-8096 t1_irzvwc8 wrote

That surprises me as there was a good Fastai version:

https://docs.fast.ai/text.models.awdlstm.html

which is built on pytorch. When I played with it ages ago the results seemed comparable to the paper, but I haven't revisited it for a while :)

1