
saw79 t1_iz0158r wrote

I don't think it makes sense these days to implement a CNN architecture from scratch for a standard problem (e.g., classification), except as a learning exercise. A common set of classification networks I use as a go-to are the EfficientNet architectures. Usually I use the timm library (for PyTorch), and instantiating the model is just 1 line of code (see its docs). You can either load it pretrained (on ImageNet) or randomly initialized, and then fine-tune it yourself. EfficientNet has versions B0-B7 that give increasing performance at the cost of computation/model size. If you're in TensorFlow-land I'm sure there's something analogous. Both TF and PyTorch have model zoos in official packages too, like torchvision.models or whatever.

8

saw79 t1_ixiusbb wrote

Your model should output 3 logits, one for class_a, one for class_b, and one for class_c.

When you use data from the 1st dataset,

  • penalize class_a outputs for samples with class_b or anything_but_a_b labels
  • penalize class_b outputs for samples with class_a or anything_but_a_b labels
  • penalize class_c outputs for samples with class_a or class_b labels

When you use data from the 2nd dataset,

  • penalize class_a outputs for samples with class_c labels
  • penalize class_b outputs for samples with class_c labels
  • penalize class_c outputs for samples with not_class_c labels
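A sketch of one way to implement this scheme, using sigmoid outputs with a per-logit loss mask (binary cross-entropy here; I'm also filling in the implied positive target for the labeled class itself, which the bullets above leave unstated):

```python
import torch
import torch.nn.functional as F

A, B, C = 0, 1, 2  # logit indices for class_a, class_b, class_c

# (target, mask) per label; mask[i] = 1 means logit i gets a loss term.
# Unmasked logits are the ones the label tells us nothing about.
LABEL_SPEC = {
    # 1st dataset
    "class_a":          (torch.tensor([1., 0., 0.]), torch.tensor([1., 1., 1.])),
    "class_b":          (torch.tensor([0., 1., 0.]), torch.tensor([1., 1., 1.])),
    "anything_but_a_b": (torch.tensor([0., 0., 0.]), torch.tensor([1., 1., 0.])),
    # 2nd dataset
    "class_c":          (torch.tensor([0., 0., 1.]), torch.tensor([1., 1., 1.])),
    "not_class_c":      (torch.tensor([0., 0., 0.]), torch.tensor([0., 0., 1.])),
}

def masked_bce(logits, labels):
    """BCE per logit, zeroed out wherever the label is uninformative."""
    targets = torch.stack([LABEL_SPEC[l][0] for l in labels])
    mask = torch.stack([LABEL_SPEC[l][1] for l in labels])
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```

So e.g. an `anything_but_a_b` sample pushes the class_a and class_b logits down but contributes nothing to the class_c logit, since it may or may not be class_c.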

1

saw79 t1_ir23cl5 wrote

All I meant by "nebulous" was that he didn't have a concrete idea of what to actually use as visual quality, and you've nicely described how it's actually a very deep inference that we as humans make with our relatively advanced brains.

I did not mean that it's conceptually something that can't exist. I think we're very much in agreement.

3

saw79 t1_ir13tsx wrote

In addition to other commenter's [good] point about your nebulous "visual quality" idea, a couple other comments on what you're seeing:

  1. Frankly, your generative model doesn't seem very good. If your generated samples don't look anything like CIFAR images, I would stop here. Your model's p(x) is clearly very different from CIFAR's p(x).

  2. Why are "standard"/discriminative models' confidence scores high? This is a hugely important drawback of discriminative models and one reason why generative models are interesting in the first place. Discriminative models model p(y|x) (class given data), but don't know anything about p(x). Generative models model p(x, y) = p(y|x) p(x); i.e., they generally have access to the prior p(x) and can assess whether an image x can even be understood by the model in the first place. These types of models would (hopefully, if done correctly) give low confidence on "crappy" images.
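A toy numerical sketch of that factorization (all the log-likelihood numbers below are made up purely for illustration): given class-conditional densities p(x|y) and a prior p(y), you get both p(y|x) and the marginal p(x), and p(y|x) can be near-certain even when p(x) is vanishingly small, which is exactly the discriminative failure mode.

```python
import torch

# Hypothetical per-class log p(x|y) for one input x: the model fits
# class 0 far better than the others, but ALL fits are poor if these
# numbers are very negative.
log_p_x_given_y = torch.tensor([-3.0, -50.0, -52.0])
log_p_y = torch.log(torch.tensor([1 / 3, 1 / 3, 1 / 3]))  # uniform prior

log_p_xy = log_p_x_given_y + log_p_y          # log p(x, y)
log_p_x = torch.logsumexp(log_p_xy, dim=0)    # log p(x) = log sum_y p(x, y)
p_y_given_x = torch.softmax(log_p_xy, dim=0)  # p(y|x) via Bayes' rule

# p(y|x) is nearly a one-hot on class 0, i.e. "confident" -- but a
# generative model can also look at log_p_x and refuse to trust that
# confidence when the marginal likelihood of x is tiny.
```

A discriminative model only ever gives you the `p_y_given_x` part of this picture.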

7