Submitted by ThoughtOk5558 t3_xvcman in deeplearning

I generated CIFAR10 images using energy-based models, sampling from the joint distribution of the "airplane" (0) and "bird" (2) classes. As can be seen below, the generated images can't be visually classified as any of the CIFAR10 classes, i.e., the predicted class probabilities should be roughly uniform.

Sampled from the joint distribution of the CIFAR10 "airplane" and "bird" classes.
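Roughly, the sampling looked like the sketch below. The energy model's interface, the step size, the noise scale, and the number of Langevin steps here are illustrative placeholders, not my exact setup.

```python
import torch

def sample_joint(energy_model, classes=(0, 2), n_steps=20,
                 step_size=1.0, noise_std=0.01, batch=16):
    """Short-run Langevin sampling from the summed energy of two classes.

    Assumes energy_model(x) returns per-class energies of shape (batch, 10),
    with lower energy meaning higher probability.
    """
    x = torch.rand(batch, 3, 32, 32, requires_grad=True)   # start from noise
    for _ in range(n_steps):
        joint_energy = energy_model(x)[:, list(classes)].sum()
        grad, = torch.autograd.grad(joint_energy, x)
        with torch.no_grad():
            # Gradient descent on the energy plus Gaussian noise (SGLD step).
            x = x - step_size * grad + noise_std * torch.randn_like(x)
            x = x.clamp(0.0, 1.0)
        x.requires_grad_(True)
    return x.detach()
```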

However, when I run inference using a pre-trained CIFAR10 model (link), the confidence scores of the predicted classes are very high.

Predicted probabilities


Predicted classes
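For reference, the inference step is just standard softmax scoring, along these lines (a sketch only: `model` and `samples` stand in for the linked pre-trained classifier and the generated images).

```python
import torch
import torch.nn.functional as F

def classify(model, samples):
    """Top-1 softmax confidence and predicted class for each generated image.

    `model` is a pre-trained CIFAR10 classifier; `samples` has shape
    (N, 3, 32, 32) in the same range the classifier was trained on.
    """
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(samples), dim=1)   # (N, 10) confidence scores
        conf, pred = probs.max(dim=1)
    return conf, pred
```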

I am aware of adversarial attacks, and this is a kind of adversarial attack.

So, here is my opinion (and question): I believe CNNs, or any network, should take the visual quality of the input into account when making a prediction.

Should / can CNNs be improved to act this way?

Thank you.

12

Comments


XecutionStyle t1_ir0dv73 wrote

How do you propose we define quality?

10

XecutionStyle t1_ir0fgw6 wrote

See, that's the problem. We benefit from eons of evolution imprinting genetically what quality is (i.e. what correlates most with real life).

Telling a CNN about quality without using a CNN to do the analysis is either circular or redundant, I'm afraid.

9

porygon93 t1_ir0wesp wrote

you are modeling p(z|x) instead of p(x)

2

saw79 t1_ir13tsx wrote

In addition to the other commenter's [good] point about your nebulous "visual quality" idea, a couple of other comments on what you're seeing:

  1. Frankly, your generative model doesn't seem very good. If your generated samples don't look anything like CIFAR images, I would stop here. Your model's p(x) is clearly very different from CIFAR's p(x).

  2. Why are "standard"/discriminative models' confidence scores high? This is a hugely important drawback of discriminative models and one reason why generative models are interesting in the first place. Discriminative models model p(y|x) (class given data), but don't know anything about p(x). Generative models model p(x, y) = p(y|x) p(x); i.e., they generally have access to the prior p(x) and can assess whether an image x can even be understood by the model in the first place. These types of models would (hopefully, if done correctly) give low confidence on "crappy" images; see the sketch below.
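A rough sketch of what that gating could look like in practice (the energy model, its interface, and the threshold are all placeholder assumptions here, not a specific library API):

```python
import torch
import torch.nn.functional as F

def predict_with_density_check(classifier, energy_model, x, energy_threshold):
    """Only trust p(y|x) when an (approximate) p(x) model finds x plausible.

    energy_model(x) is assumed to return one scalar energy per image, with
    low energy ~ high p(x); energy_threshold would be calibrated on held-out
    in-distribution data.
    """
    with torch.no_grad():
        energy = energy_model(x)                    # (N,)
        probs = F.softmax(classifier(x), dim=1)     # (N, C): p(y | x)
        in_dist = energy < energy_threshold         # plausible under p(x)?
        # For implausible inputs, fall back to a uniform "don't know" output.
        uniform = torch.full_like(probs, 1.0 / probs.shape[1])
        return torch.where(in_dist.unsqueeze(1), probs, uniform)
```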

7

ThoughtOk5558 OP t1_ir17ijo wrote

I intentionally generated "bad" samples by doing only a few steps of MCMC sampling. I am also able to generate CIFAR10-looking samples.

I think your explanation is convincing.

Thank you.

4

saw79 t1_ir1a2tb wrote

Oh ok, cool. Is your code anywhere? What kind of energy model? I have experience with other types of deep generative models, but am actually just starting to learn about EBMs myself.

2

XecutionStyle t1_ir20qoc wrote

I don't think it's nebulous. We infuse knowledge, biases, priors, etc. (like physics, in Lagrangian networks) all the time. I was just addressing his last point: there's no analytical solution for quality that we can use as labels.

Networks can understand the difference between pretty and ugly semantically with tons of data, and tons of data only.

3

saw79 t1_ir23cl5 wrote

All I meant by nebulous was that he didn't have a concrete idea for what to actually use as visual quality, and you've nicely described how it's actually a very deep inference that we as humans make with our relatively advanced brains.

I did not mean that it's conceptually something that can't exist. I think we're very much in agreement.

3

BrotherAmazing t1_ir3dmwz wrote

Nearly every data-driven approach to regression and purely discriminative classification has this problem, and it’s a problem of trying to extrapolate far outside the domain that you trained/fit the model in. It’s not about anything else.

Your generated images clearly look nothing like CIFAR-10 training images, so it's not much different than if I fit two Gaussians to 2-D Gaussian data using samples that all fall within the sphere of radius 1, and then send my classifier a 2-D feature measurement that is a distance of 100 from the origin. Any discriminative classifier that doesn't have a way to detect outliers/anomalies will likely be extremely confident in classifying this 2-D feature as one of the two classes. We would not say that the classifier has a problem with not considering "feature quality"; we would say it's not very sophisticated.
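To see that toy example numerically, here's a quick sketch (the class means, scales, and seed are arbitrary, and the exact probabilities will vary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two Gaussian classes whose training samples all lie near the origin.
X0 = rng.normal(loc=[-0.3, 0.0], scale=0.2, size=(500, 2))
X1 = rng.normal(loc=[+0.3, 0.0], scale=0.2, size=(500, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression().fit(X, y)

# A feature at distance 100 from the origin, far outside the training domain:
print(clf.predict_proba([[100.0, 0.0]]))  # roughly [0, 1]: near-certain
```

The classifier extrapolates its linear decision rule forever, so the far-away point gets near-100% confidence even though it looks nothing like the training data.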

In the real world, on critical problems, CNNs aren't just fed images like this. Smart engineers have ways to detect when an image is likely not from the training distribution and throw a flag so the CNN's output isn't trusted.

4

BrotherAmazing t1_ir3etpb wrote

I think someone either didn't understand what you meant and downvoted, or downvoted because you didn't define 'z' and 'x' and so on. But I know what you mean, and you're correct. This is another way of looking at it that is completely right.

Under a CIFAR-10 world, p(x) for all these images is basically 0, but your CNN is not computing that or factoring it in. It just assumes the input images are good images, then estimates the probability of airplane vs. bird for these nonsense images given that they are not nonsense images and that they come from the same pdf as CIFAR-10... which is a very, very false assumption!

3