drivanova
drivanova t1_ix9vpi7 wrote
Reply to comment by fasttosmile in [R] Tips on training Transformers by parabellum630
that + decent lr scheduler, e.g. linear ramp up + exponential/cosine annealing
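For illustration, a minimal sketch of what such a schedule could look like: linear warmup followed by cosine annealing, in plain Python. The function name `lr_at_step` and all default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are assumptions made up for this example, not recommended settings from the thread.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear ramp-up over the first `warmup_steps` steps, then cosine
    annealing from `base_lr` down to `min_lr` at `total_steps`.
    All default values are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps

    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)

    # Cosine factor goes smoothly from 1 (start of decay) to 0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 500, 999, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

In PyTorch, something like this could probably be wired in through `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor on the optimizer's initial learning rate, so you would call the function with `base_lr=1.0` there.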
drivanova t1_ix9rha7 wrote
Reply to comment by drivanova in [R][D] Reading ML Papers - Workflow/Advice by EndlessRevision
PS: another thing I personally often do for papers from big conferences (ICML, NeurIPS etc.) is watch the authors present their work on slideslive.com (most post-pandemic papers have videos!). This is usually helpful for understanding the motivation, high-level ideas and key experimental results.
drivanova t1_ix7ia0q wrote
I think the way you read papers depends on the subfield.
For example, computer vision papers (those that go into CVPR, ICCV etc.) tend to be more empirical, meaning that you may want to spend more time on the experiments: watching out for potential failure cases and asking yourself whether the baselines considered are appropriate and in line with what people in the field do.
For more theory-oriented papers (AISTATS, ICML etc.), I'd spend more time on the method, understanding the assumptions and the proofs of key results.
To familiarise myself with a paper and related work, I tend to use connected papers (https://www.connectedpapers.com) -- I find it super useful when getting into a subfield that's not exactly my area of research.
HTH
drivanova t1_ixw8w0v wrote
Reply to [D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi
PyTorch.
Datapoint 1: it’s part of the Linux Foundation
Datapoint 2: JAX
Disclosure: I haven’t used TensorFlow.