Submitted by windoze t3_ylixp5 in MachineLearning

Hey, I'm a casual observer of the DL space. What are the biggest technique changes or discoveries that are now used everywhere? From my view:

  • Pretraining - reuse large data sets in the same domain (2010)
  • ReLU - a simple-to-train non-linear function (2010)
  • Data Augmentation - how to make up more data (including noise, random erasing) (2012-)
  • Dropout - how to not overfit (2014)
  • Attention - how to model long range dependencies (2014)
  • Batch normalisation - how to avoid a class of training issues (2015)
  • Residual connections - how to go deeper (2015) (see the sketch after this list)
  • Layer normalisation - how to avoid a class of training issues (2016)
  • Transformers - how to do sequence modelling (2017)
  • Large Language Models - how to use implicit knowledge in language (2019)
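
To make a few of these concrete, here's a minimal sketch (my own illustration, assuming PyTorch) of a pre-norm residual block that combines ReLU, dropout, layer normalisation and a residual connection in one place:

```python
import torch
import torch.nn as nn

class PreNormResidualMLP(nn.Module):
    """Toy pre-norm block: LayerNorm -> ReLU MLP -> Dropout -> residual add."""
    def __init__(self, dim: int, hidden: int, p_drop: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)      # layer normalisation (2016)
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),                     # ReLU non-linearity (2010)
            nn.Linear(hidden, dim),
            nn.Dropout(p_drop),            # dropout against overfitting
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(self.norm(x))   # residual connection (2015)

x = torch.randn(8, 64)                     # (batch, features) -- made-up sizes
block = PreNormResidualMLP(dim=64, hidden=256)
print(block(x).shape)                      # torch.Size([8, 64])
```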

What are the other improvements or discoveries? The more general the idea, the better.

Edit: added attention, pretraining, data augmentation, batch normalisation, contrastive methods

41

Comments


ziad_amerr t1_iuyxzxz wrote

Check out GANs and one-shot learning, and read about CoAtNets, RoBERTa, StyleGAN, XLNet, DoubleU-Net and others.

12

JackandFred t1_iuzb389 wrote

I feel like if you're going to include Transformers, you should include the "Attention Is All You Need" paper.

7

cautioushedonist t1_iuzeog4 wrote

It's not as famous and might not qualify as a 'trick', but I'll mention "Geometric Deep Learning" anyway.

It tries to explain all the successful neural nets (CNNs, RNNs, Transformers) within a single, universal mathematical framework built around symmetry and invariance. The most exciting extrapolation is that we'll be able to quickly discover new architectures using this framework.

Link - https://geometricdeeplearning.com/
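
To give a flavour of the blueprint, here's a toy sketch (mine, not code from the site; phi/rho follow the usual DeepSets notation, assuming PyTorch). Per-element processing is equivariant to permutations of the input set, and a sum pooling makes the final output invariant to them:

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """Permutation-invariant set model: equivariant phi, sum pool, then rho."""
    def __init__(self, in_dim: int, hid: int, out_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())  # per element
        self.rho = nn.Linear(hid, out_dim)                           # after pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, in_dim); summing over the set axis
        # discards element order, hence permutation invariance.
        return self.rho(self.phi(x).sum(dim=1))

x = torch.randn(2, 5, 3)                  # made-up batch of 5-element sets
model = DeepSets(3, 16, 4)
perm = torch.randperm(5)
# Reordering the set leaves the output unchanged (up to float error):
print(torch.allclose(model(x), model(x[:, perm]), atol=1e-6))  # True
```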

16

BeatLeJuce t1_iuzz1ku wrote

Layer norm is not about fitting better but about training more easily (activations don't explode, which makes optimization more stable).
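
A minimal sketch of that normalisation step, assuming PyTorch (this mirrors what nn.LayerNorm computes before its learnable gain/bias): each sample's activations are rescaled to zero mean and unit variance, so their scale can't drift as depth grows.

```python
import torch

def layer_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Normalise over the feature axis, independently per sample.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

x = 1000.0 * torch.randn(4, 8)           # deliberately huge activations
y = layer_norm(x)
print(y.mean(dim=-1))                    # ~0 per sample
print(y.std(dim=-1, unbiased=False))     # ~1 per sample
```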

Is your list limited to "discoveries that are now used everywhere"? Because there are a lot of things that would've made it onto your list if you'd compiled it at different points in time but are now discarded (i.e., I'd say they were fads), e.g. GANs.

Other things are currently hyped but it's not clear how they'll end up long term:

Diffusion models are another thing that is currently hot.

Combining multimodal inputs, which I'd say means "CLIP-like things".

There's self-supervision as a topic as well (with "contrastive methods" having been a thing).
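
For the contrastive flavour (the same symmetric form also powers CLIP-like two-tower models), here's a hedged sketch of the InfoNCE loss; the names, shapes and temperature are my own illustration, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    # z1[i] and z2[i] are embeddings of the same underlying example
    # (two augmented views, or an image and its caption).
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature     # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)  # made-up embeddings
print(info_nce(z1, z2))                               # scalar loss
```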

Federated learning is likely here to stay.

NeRF will likely have a lasting impact, too.

3

FoundationPM t1_iv018k3 wrote

Quite clean. 2020-2022 is empty; is that because you don't see progress in those years?

1

Gere1 t1_iv0505o wrote

Does anyone know a good ablation study of the mentioned techniques? I've seen results where neither dropout nor layer normalization did much, so I wonder whether these two techniques are a belief or still crucial.

2

ukshin-coldi t1_iv0593t wrote

Your dates are wrong; these were all discovered by Schmidhuber in the 90s.

62

carlthome t1_iv0gzvw wrote

Interesting to mention layer normalisation over batch normalisation. I thought the latter was "the thing" and that layernorm, groupnorm, instancenorm, etc. were follow-ups.

12

redditrantaccount t1_iv3oxg8 wrote

Data augmentation to more explicitly define invariant transformations, as well as to reduce dataset labeling costs.
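
A hedged sketch of that idea, assuming torchvision (the particular transforms are my own picks): the augmentation pipeline spells out, in code, which transformations the label should be invariant to.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # label invariant to mirroring
    transforms.RandomErasing(p=0.25),         # the "random erasing" from the OP's list
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),  # pixel noise
])

img = torch.rand(3, 32, 32)    # a fake image tensor (C, H, W)
print(augment(img).shape)      # same shape, transformed content
```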

2

flaghacker_ t1_iv5jf05 wrote

What's wrong with it? They explain all the components of their model in enough detail (in particular the multi-head attention stuff), provide intuition behind certain decisions, include clear results, have nice pictures, ... What could have been improved about it?

2

BrisklyBrusque t1_iv6ogqg wrote

2007-2010: Deep learning begins to win computer vision competitions. In my eyes, this is what put deep learning on the map for a lot of people, and kicked off the renaissance we see today.

2016ish: categorical embeddings/entity embeddings. For tabular data with categorical variables, categorical embeddings are faster and more accurate than one-hot encoding, and they preserve the natural relationships between factors by mapping them to a low-dimensional space.
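
A hedged sketch, assuming PyTorch (the column and the sizes are made up): each category id maps to a learned dense vector instead of a sparse one-hot column, so related categories can end up nearby in the embedding space.

```python
import torch
import torch.nn as nn

n_categories, emb_dim = 1000, 16        # e.g. 1000 zip codes -> 16-d vectors
embed = nn.Embedding(n_categories, emb_dim)

zip_ids = torch.tensor([12, 12, 997])   # raw category ids from a table
vectors = embed(zip_ids)                # (3, 16) dense features for the model
print(vectors.shape)
```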

2

cautioushedonist t1_ivcx548 wrote

Yes, it's different.

Universal function approximation sort of guarantees/implies that you can approximate any mapping function given the right config/weights of neural nets. It doesn't really guide us to the correct config.

2

blunzegg t1_iwl1d81 wrote

- Kernel tricks: how can purely mathematical approaches beat neural networks in terms of efficiency? (This has actually been an open problem for a long time; you can check Neural Tangent Kernels and Reproducing Kernel Hilbert Spaces for examples, and the Universal Approximation Property for neural networks.) See the sketch after this list.

- I was mainly here for Geometric Deep Learning, but another user has already posted it. You should definitely check http://geometricdeeplearning.com. As a mathematician-to-be, I strongly believe that this is the future of ML/DL. Hit me up if you wanna discuss this statement further.
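
On the kernel point, a quick hedged sketch (NumPy only; the data and hyperparameters are made up): kernel ridge regression with an RBF kernel fits a nonlinear function in closed form, no gradient descent involved.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    # Pairwise squared distances, then the Gaussian (RBF) kernel.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))             # toy 1-d inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)  # noisy nonlinear target

lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # closed-form fit

X_test = np.linspace(-3, 3, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha                # kernel predictions
print(y_pred)
```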

1