
omgitsjo t1_ive49sz wrote

Is there a good sparse loss function that also does regression? I have what basically amounts to an image-to-image problem, but the resulting image is a dense UV set (red channel goes from 0-255, green from 0-255). Most of the image is "no signal", so MSE tends to just predict all zeros after a while. I can't split each value out into its own channel because a softmax over 256 classes for red and another 256 for green would make me OOM. I might try to narrow it down to just 16 quantized bins per channel, but I'd really rather spit out a two-channel image and do clever losses on that. I'm sure masking has some clever tricks like intersection over union, but those only seem to handle the boolean case, not regression.
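
Roughly what I mean by the 16-bin version, just as a sketch; the bin count and shapes are placeholders, and it assumes a TF/Keras setup:

```python
import tensorflow as tf

NUM_BINS = 16  # hypothetical bin count

def quantize_uv(uv):
    """Map float UV targets in [0, 255] to integer bin indices in [0, NUM_BINS)."""
    bins = tf.cast(uv / 256.0 * NUM_BINS, tf.int32)
    return tf.clip_by_value(bins, 0, NUM_BINS - 1)

def quantized_uv_loss(y_true_uv, logits):
    """Per-channel cross-entropy over NUM_BINS bins instead of a 256-way softmax.

    y_true_uv: (batch, H, W, 2) float UV targets in [0, 255]
    logits:    (batch, H, W, 2, NUM_BINS) raw scores from the network
    """
    labels = quantize_uv(y_true_uv)
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    return tf.reduce_mean(ce)
```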

1

No_Lingonberry2565 t1_ivluuww wrote

Given you're working with images, maybe you could perform some non-linear dimensionality reduction, such as an autoencoder (or scikit-learn has kernel PCA), and the resulting reduced representations might be less sparse and easier to work with using traditional models?
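
Something along these lines, for example (the sizes here are made up, and it assumes you can afford to flatten the images):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical data: N two-channel UV maps, flattened to vectors.
N, H, W, C = 512, 64, 64, 2
images = np.random.rand(N, H, W, C).astype(np.float32)
X = images.reshape(N, -1)  # (N, H*W*C)

# Non-linear dimensionality reduction with an RBF kernel.
kpca = KernelPCA(n_components=128, kernel="rbf", fit_inverse_transform=True)
X_reduced = kpca.fit_transform(X)  # (N, 128) dense embeddings

# fit_inverse_transform=True lets you map embeddings back to approximate images.
X_back = kpca.inverse_transform(X_reduced).reshape(N, H, W, C)
```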

2

omgitsjo t1_ivm2qw6 wrote

Wouldn't an autoencoder run into the same issue? If the dataset is mostly zeros, then every loss function I can think of would hit the same problem. PCA could be an option, but it would be disappointing to introduce it into what is otherwise a pure UNet architecture.

1

No_Lingonberry2565 t1_ivm7za6 wrote

Yeah, you're right. The loss for an autoencoder with input X and reconstruction X' would be the Frobenius norm ||X - X'||_F, which would already be near 0 on mostly-zero data, so I think the weights would drift toward zero and the lower-dimensional embeddings would collapse toward 0 as well (I'm trying to visualize the chain rule and weight updates as you backpropagate - I THINK it would be something like that lol).

Considering that, maybe use a modified loss function that is higher for values closer to 0?

The only difficulty is that instead of using a nice Keras architecture and training it automatically, you would probably need to define this custom loss function yourself and update the Keras model weights with GradientTape, and even then the loss you choose might have really shitty behavior and your network may not converge well.

Edit: Ignore my weird suggestion of making the loss higher for values closer to 0.

Maybe try the infinity norm of X - X' in the autoencoder instead of just ||X - X'||_F.
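
Something like this is what I'm picturing, just as a sketch (toy fully-connected autoencoder, made-up 64x64x2 shapes; the real thing would probably be convolutional):

```python
import tensorflow as tf

def inf_norm_loss(x, x_recon):
    """Infinity norm of X - X' per sample, instead of the Frobenius norm ||X - X'||_F."""
    diff = tf.abs(x - x_recon)
    per_sample = tf.reduce_max(tf.reshape(diff, [tf.shape(x)[0], -1]), axis=1)
    return tf.reduce_mean(per_sample)

# Toy dense autoencoder; shapes are placeholders.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),  # low-dimensional embedding
    tf.keras.layers.Dense(64 * 64 * 2),
    tf.keras.layers.Reshape((64, 64, 2)),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(x):
    # Manual training step with GradientTape, since we're using a custom loss.
    with tf.GradientTape() as tape:
        x_recon = autoencoder(x, training=True)
        loss = inf_norm_loss(x, x_recon)
    grads = tape.gradient(loss, autoencoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, autoencoder.trainable_variables))
    return loss
```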

2

omgitsjo t1_ivmay42 wrote

You might be on to something. Not necessarily the inf norm, but maybe an asymmetric loss function: guessing zero when the truth is 0.1 should carry a much higher penalty than guessing 0.1 when the truth is 0.
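
Something like this, maybe (just a sketch; the weight is an arbitrary knob and it assumes 0 means "no signal"):

```python
import tensorflow as tf

def asymmetric_mse(y_true, y_pred, signal_weight=10.0):
    """MSE that penalizes missing signal more than inventing it.

    Pixels where the target is non-zero get a larger weight, so predicting 0
    where the truth is 0.1 costs much more than predicting 0.1 where the
    truth is 0. signal_weight is a made-up value to tune.
    """
    sq_err = tf.square(y_true - y_pred)
    is_signal = tf.cast(y_true > 0.0, y_pred.dtype)
    weights = 1.0 + (signal_weight - 1.0) * is_signal
    return tf.reduce_mean(weights * sq_err)
```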

1

No_Lingonberry2565 t1_ivmkyiw wrote

I suggested the inf norm because it returns a larger value, so when the weights are updated through the chain rule it might lead to less sparse reduced representations of your data.

1