
banmeyoucoward t1_iw6r361 wrote

You have to learn by doing, but you can do a surprising amount with small data, which means you can implement a paper and learn a whole lot faster since you aren't waiting on training. For example, if all you have is MNIST (a rough sketch of the first item follows the list):

Supervised MLP classifier

Supervised convolutional classifier

Supervised transformer classifier

MLP GAN

Convolutional GAN

GAN regularizers (WGAN, WGAN-GP, etc. - https://avg.is.mpg.de/publications/meschedericml2018 is mandatory reading, and replicate its experiments, if you want to work on GANs)

Variational Autoencoder

Vector quantized variational autoencoder (VQVAE)

Diffusion model

Represent MNIST digits using an MLP that maps pixel (x, y) -> brightness (Kmart NeRF)
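
To make the first item concrete, here's a rough sketch of a supervised MLP classifier on MNIST in PyTorch. The layer sizes and hyperparameters are just illustrative, not tuned or taken from any particular paper:

```python
# Minimal supervised MLP classifier on MNIST (illustrative hyperparameters).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = nn.Sequential(
    nn.Flatten(),        # 28x28 image -> 784-dim vector
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),  # 10 digit classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for images, labels in train_loader:
        logits = model(images)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Roughly the same data-loading and training-loop boilerplate carries over to most of the later items; what changes is the model (and for the generative ones, the objective and sampling code).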

I've done most of these projects (I still need to do the diffusion model, and my VQ-VAE implementation doesn't work) and they each take about 2 days to grok the paper, translate it to code, and implement on MNIST (~6 hours of coding?) using PyTorch and the PyTorch documentation, plus reading the relevant papers. Very educational!


LightGreenSquash OP t1_iwi9q1g wrote

Yep, that's kind of along the lines I'm thinking as well. The only possible drawback I can see is that on such small datasets even "basic" architectures like MLPs can do well enough, so you might not be able to see the benefit that, say, a ResNet brings.

It's still very much a solid approach though, and I've used it in the past to deepen my knowledge of things I already knew, e.g. coding a very basic computational graph framework and then using it to train an MLP on MNIST. It was really cool to see my "hand-made" graph topological sort + the fprop/bprop methods I'd written for different functions actually reach 90%+ accuracy.
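
For anyone curious what that kind of framework looks like, here's a micrograd-style sketch: scalar Value nodes, per-op backward closures, and a topological sort for the backward pass. The names and the scalar-only design are my own for illustration; a version that actually trains an MLP on MNIST would want array-valued nodes (or batching) to be fast enough.

```python
# Toy computational graph: scalar nodes, topological sort, forward/backward.
class Value:
    """A node in the graph: stores data, grad, and how it was produced."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topological sort of the graph, then propagate grads in reverse order.
        order, visited = [], set()
        def visit(node):
            if node not in visited:
                visited.add(node)
                for p in node.parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Tiny check: y = relu(x*w + b) at x=2, w=3, b=-1 gives dy/dx = 3, dy/dw = 2.
x, w, b = Value(2.0), Value(3.0), Value(-1.0)
y = (x * w + b).relu()
y.backward()
print(y.data, x.grad, w.grad)  # 5.0 3.0 2.0
```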
