Submitted by windoze t3_ylixp5 in MachineLearning

Hey, I'm a casual observer of the DL space. What are the biggest technique changes or discoveries that are now used everywhere? From my view:

  • Pretraining - reuse large data sets in the same domain (2010)
  • ReLU - simple to train non-linear function (2010)
  • Data Augmentation - how to make up more data (including noise, random erasing) (2012-)
  • Dropout - how to not overfit (2014)
  • Attention - how to model long range dependencies (2014)
  • Batch normalisation - how to avoid a class of training issues (2015)
  • Residual connections - how to go deeper (2015)
  • Layer normalisation - how to avoid a class of training issues (2016)
  • Transformers - how to do sequence modelling (2017)
  • Large Language Models - how to use implicit knowledge in language (2019)
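
For concreteness, here's a rough PyTorch sketch (just my own toy block, not taken from any particular paper) of how a few of these show up together: ReLU, dropout, a residual connection and layer normalisation.

```python
import torch
import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    """Toy block combining layer norm, ReLU, dropout and a residual connection."""
    def __init__(self, dim: int, hidden: int, p_drop: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)   # layer normalisation (2016)
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.drop = nn.Dropout(p_drop)  # dropout (2014)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        h = torch.relu(self.fc1(h))     # ReLU non-linearity (2010)
        h = self.drop(self.fc2(h))
        return x + h                    # residual connection (2015)

x = torch.randn(8, 64)
print(ResidualMLPBlock(64, 256)(x).shape)  # torch.Size([8, 64])
```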

What are the other improvements or discoveries? The more general the idea, the better.

Edit: added attention, pretraining, data augmentation, batch normalisation, contrastive methods

41

Comments

ukshin-coldi t1_iv0593t wrote

Your dates are wrong, these were all discovered by Schmidhuber in the 90s.

62

cautioushedonist t1_iuzeog4 wrote

Not as famous, and it might not qualify as a 'trick', but I'll mention "Geometric Deep Learning" anyway.

It tries to explain all the successful neural nets (CNNs, RNNs, Transformers) within a single, unified mathematical framework. The most exciting extrapolation of this is that we'll be able to quickly discover new architectures using the framework.
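
To give a flavour of the symmetry/equivariance idea the framework is built around (my own toy check, not something from the book): an ordinary convolution is translation-equivariant, i.e. shifting the input and then convolving matches convolving and then shifting, at least away from the boundaries.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 1, 16)

shift = lambda t, k: torch.roll(t, shifts=k, dims=-1)  # circular shift along the sequence

# Shift-then-convolve vs convolve-then-shift: identical in the interior,
# they only differ near the edges (zero padding vs wrapped values).
a = conv(shift(x, 2))
b = shift(conv(x), 2)
print(torch.allclose(a[..., 3:-3], b[..., 3:-3], atol=1e-6))  # True
```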

Link - https://geometricdeeplearning.com/

16

BrisklyBrusque t1_iv6negg wrote

Is this different from the premise that neural networks are universal function approximators?

1

cautioushedonist t1_ivcx548 wrote

Yes, it's different.

Universal function approximation sort of guarantees/implies that you can approximate any mapping function given the right config/weights of neural nets. It doesn't really guide us to the correct config.
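
As a toy illustration of that gap (my own sketch, nothing canonical): a one-hidden-layer net can approximate a simple function, but only after we search for the weights; the theorem itself doesn't hand them to us.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(2 * x)  # the "mapping function" we want to approximate

# One hidden layer is already a universal approximator (with enough units),
# but the theorem says nothing about how to find the width or the weights.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.4f}")  # small only because we searched for the config
```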

2

ziad_amerr t1_iuyxzxz wrote

Check out GANs and one-shot learning, and read about CoAtNets, RoBERTa, StyleGAN, XLNet, DoubleU-Net, and others.

12

carlthome t1_iv0gzvw wrote

Interesting to mention layer normalisation over batch normalisation. I thought the latter was "the thing" and that layernorm, groupnorm, instancenorm etc. were follow-ups.
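
For anyone fuzzy on the distinction, a rough hand-rolled sketch of the axes involved (ignoring the learned scale/shift parameters): batchnorm normalises each feature across the batch, layernorm normalises each sample across its features.

```python
import torch

x = torch.randn(32, 64)  # (batch, features)

# BatchNorm: statistics per feature, computed across the batch dimension
x_bn = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-5)

# LayerNorm: statistics per sample, computed across the feature dimension
x_ln = (x - x.mean(dim=1, keepdim=True)) / (x.std(dim=1, keepdim=True) + 1e-5)

# LayerNorm's statistics don't depend on the batch, which is why it behaves the
# same at train and test time and for any batch size (handy for sequence models).
print(x_bn.mean(dim=0)[:3])  # ~0 per feature
print(x_ln.mean(dim=1)[:3])  # ~0 per sample
```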

12

acertainmoment t1_iv1ddh0 wrote

yup, same thoughts. BatchNorm was the OG norm. The cousins came later

3

mhddjazz t1_iuz3px3 wrote

NeRF, Diffusion

11

JackandFred t1_iuzb389 wrote

I feel like if you're going to include Transformers, you should include the "Attention Is All You Need" paper.

7

PassionatePossum t1_iv05451 wrote

I would only include it as a historical reference. It is certainly not a "must read" paper. It is written so poorly that you are better off just looking at the code.

1

flaghacker_ t1_iv5jf05 wrote

What's wrong with it? They explain all the components of their model in enough detail (in particular the multi-head attention), provide intuition behind certain decisions, include clear results, and have nice pictures... What could have been improved about it?

2

BeatLeJuce t1_iuzz1ku wrote

Layer norm is not about fitting better, but about training more easily (activations don't explode, which makes optimization more stable).

Is your list limited to "discoveries that are now used everywhere"? Because there are a lot of things that would've made it onto your list if you'd compiled it at different points in time but are now discarded (i.e., I'd say they were fads), e.g. GANs.

Other things are currently hyped but it's not clear how they'll end up long term:

Diffusion models are another thing that is currently hot.

Combining multimodal inputs, which I'd summarise as "CLIP-like things".

There's self-supervision as a topic as well (with "contrastive methods" having been a thing).

Federated learning is likely here to stay.

NeRF will likely have a lasting impact, too.

3

BrisklyBrusque t1_iv6otss wrote

I recall that experimenters disagreed on why batchnorm works in the first place. Has the consensus settled?

1

BeatLeJuce t1_iv7co26 wrote

No. But we all agree that it's not due to internal covariate shift.

2

Gere1 t1_iv0505o wrote

Does anyone know a good ablation study of the mentioned techniques? I've seen results where neither dropout nor layer normalization did much, so I wonder whether these two techniques are a matter of belief or still crucial.

2

redditrantaccount t1_iv3oxg8 wrote

Data augmentation, both to define invariant transformations more explicitly and to reduce dataset labeling costs.
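
For example (a minimal torchvision sketch, assuming an image classification setup): each transform states an invariance the label should respect, so every epoch effectively sees new labelled data.

```python
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # invariant to left/right flips
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # invariant to small crops/shifts
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # invariant to mild lighting changes
    transforms.ToTensor(),
])

img = Image.new("RGB", (256, 256), color=(128, 64, 32))    # stand-in for a real photo
print(augment(img).shape)  # torch.Size([3, 224, 224]); each call yields a different view
```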

2

BrisklyBrusque t1_iv6ogqg wrote

2007-2010: Deep learning begins to win computer vision competitions. In my eyes, this is what put deep learning on the map for a lot of people, and kicked off the renaissance we see today.

2016ish: categorical embeddings/entity embeddings. For tabular data with categorical variables, categorical embeddings are faster and more accurate than one-hot encoding, and they preserve the natural relationships between factors by mapping them to a low-dimensional space.
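
Roughly (my own toy sketch): instead of a wide one-hot vector per category, each category gets a small learned vector, and related categories can end up close together in that space.

```python
import torch
import torch.nn as nn

n_categories, emb_dim = 1000, 8          # vs. a 1000-wide one-hot encoding
embed = nn.Embedding(n_categories, emb_dim)

zip_codes = torch.tensor([17, 942, 17])  # an integer-encoded categorical column
dense = embed(zip_codes)                 # learned jointly with the rest of the model
print(dense.shape)                       # torch.Size([3, 8])

# After training, distances in the 8-d space reflect how categories behave with
# respect to the target, e.g. similar zip codes end up near each other.
```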

2

FoundationPM t1_iv018k3 wrote

Quite clean. 2020-2022 is empty; is that because you don't see progress in these years?

1

windoze OP t1_iv4y3f6 wrote

It's empty because I've not kept up to date, and also because the impact won't be seen until more people build on it.

3

blunzegg t1_iwl1d81 wrote

- Kernel tricks: how can purely mathematical approaches beat neural networks in terms of efficiency? (This has actually been an open problem for a long time; for examples, check Neural Tangent Kernels and Reproducing Kernel Hilbert Spaces, and the Universal Approximation Property for neural networks.) A tiny sketch of the kernel trick itself follows below.

- I was mainly here for Geometric Deep Learning, but another user has already posted it. You should definitely check http://geometricdeeplearning.com. As a mathematician-to-be, I strongly believe that this is the future of ML/DL. Hit me up if you want to discuss this statement further.
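
Since the kernel trick is easy to show concretely, here is a tiny numpy sketch (my own illustration, not about NTKs specifically): a degree-2 polynomial kernel computes an inner product in a much larger feature space without ever building that space.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

# Explicit degree-2 feature map: all pairwise products x_i * x_j
# (d**2 features in general; this is exactly what the kernel lets us skip).
phi = lambda v: np.outer(v, v).ravel()

explicit = phi(x) @ phi(y)  # inner product in the big feature space
kernel = (x @ y) ** 2       # same number, computed in the original 3-d space
print(np.isclose(explicit, kernel))  # True
```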

1