drivanova t1_ixw8w0v wrote on November 26, 2022 at 9:33 PM

Reply to [D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi

PyTorch.

Datapoint 1: it’s part of the Linux Foundation

Datapoint 2: JAX

Disclosure: I haven’t used TensorFlow.

drivanova t1_ix9vpi7 wrote on November 21, 2022 at 9:17 PM

Reply to comment by fasttosmile in [R] Tips on training Transformers by parabellum630

That + a decent LR scheduler, e.g. linear warm-up followed by exponential or cosine annealing.
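A rough sketch of what such a schedule could look like, in plain Python. The function name and all the default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative assumptions, not recommendations; tune them for your model.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate for a given step: linear warm-up, then cosine annealing.

    All defaults are placeholder values chosen for illustration only.
    """
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 999, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

If you are on PyTorch, one way to plug this in would be `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor rather than an absolute rate, so you would call the function with `base_lr=1.0` there.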

drivanova t1_ix9rha7 wrote on November 21, 2022 at 8:50 PM

Reply to comment by drivanova in [R][D] Reading ML Papers - Workflow/Advice by EndlessRevision

PS: another thing I personally often do for papers from big conferences (ICML, NeurIPS etc) is watch the authors present their work on slideslive.com (most post-pandemic papers have videos!). This usually helps with understanding the motivation, the high-level ideas and the key experimental results.

drivanova t1_ix7ia0q wrote on November 21, 2022 at 9:50 AM

Reply to [R][D] Reading ML Papers - Workflow/Advice by EndlessRevision

I think the way you read papers depends on the subfield.

For example, computer vision papers (the ones that go to CVPR, ICCV etc) tend to be more empirical, so you may want to spend more time on the experiments: watch out for potential failure cases, and ask yourself whether the baselines considered are appropriate and in line with what people in the field do.

For more theory-oriented papers (AISTATS, ICML etc), I'd spend more time on the method, understanding assumptions and proofs of key results.

To familiarise myself with a paper and its related work, I tend to use Connected Papers (https://www.connectedpapers.com). I find it super useful when getting into a subfield that's not exactly my area of research.

HTH