Submitted by windoze t3_ylixp5 in MachineLearning
Hey, I'm a casual observer of the DL space. What are the biggest technique changes or discoveries that are now used everywhere? From my view:
- Pretraining - reuse large data sets in the same domain (2010)
- ReLU - a simple-to-train non-linear function (2010)
- Data Augmentation - how to make up more data (including noise, random erasing) (2012-) (toy example below)
- Dropout - how to not overfit (2014)
- Attention - how to model long-range dependencies (2014)
- Batch normalisation - how to avoid a class of training issues (2015)
- Residual connections - how to go deeper (2015) (sketched below)
- Layer normalisation - how to avoid a class of training issues (2016)
- Transformers - how to do sequence modelling (2017) (sketched below)
- Large Language Models - how to use implicit knowledge in language (2019)
- Contrastive methods - how to learn representations by comparing pairs of examples (2020)
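
For the curious, here's roughly what a few of these look like in code. First, a ResNet-style residual block combining ReLU and batch normalisation. This is my own minimal PyTorch sketch, not code from the original paper; the channel counts and shapes are arbitrary:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BatchNorm -> ReLU twice, plus a skip connection (ResNet-style)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)  # normalise each channel over the batch
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()                # max(0, x): cheap and keeps gradients flowing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # residual connection: learn a correction to x

x = torch.randn(8, 64, 32, 32)               # a batch of 8 feature maps with 64 channels
print(ResidualBlock(64)(x).shape)            # torch.Size([8, 64, 32, 32])
```

The skip connection is what lets you stack this block dozens of times: even if a layer learns nothing useful, the identity path still carries the signal through.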
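Second, a single-head, pre-norm transformer block that puts attention, layer normalisation, dropout, and residual connections together in one place. Again just a sketch under simplified assumptions (one head, no masking, made-up dimensions), not a faithful reproduction of the paper:

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v):
    """Scaled dot-product attention: every position can look at every other."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)  # normalise over features; batch-size independent
        self.norm2 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.drop = nn.Dropout(p_drop)      # randomly zero activations while training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        x = x + self.drop(self.proj(attention(q, k, v)))  # residual around attention
        x = x + self.drop(self.mlp(self.norm2(x)))        # residual around the MLP
        return x

tokens = torch.randn(2, 10, 64)             # (batch, sequence length, features)
print(TransformerBlock()(tokens).shape)     # torch.Size([2, 10, 64])
```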
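Finally, data augmentation is easiest to see in code: a toy transform that adds Gaussian noise and randomly erases a patch. The patch size, probability, and noise level are arbitrary choices of mine, not values from any paper:

```python
import torch

def augment(img: torch.Tensor, noise_std: float = 0.05, erase_p: float = 0.5) -> torch.Tensor:
    """Additive noise plus random erasing on a (C, H, W) image tensor."""
    img = img + noise_std * torch.randn_like(img)  # same label, slightly different input
    if torch.rand(1).item() < erase_p:
        _, h, w = img.shape
        eh, ew = h // 4, w // 4                    # erase a quarter-sized patch
        top = torch.randint(0, h - eh, (1,)).item()
        left = torch.randint(0, w - ew, (1,)).item()
        img[:, top:top + eh, left:left + ew] = 0.0
    return img

img = torch.rand(3, 32, 32)                        # fake CIFAR-sized image
print(augment(img).shape)                          # torch.Size([3, 32, 32])
```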
What are the other improvements or discoveries? The more general the idea, the better.
Edit: added attention, pretraining, data augmentation, batch normalisation, contrastive methods
ukshin-coldi t1_iv0593t wrote
Your dates are wrong, these were all discovered by Schmidhuber in the 90s.