No_Lingonberry2565 t1_ivluuww wrote

Given you’re working with images, maybe you could perform some non-linear dimensionality reduction, such as an auto-encoder. SkLearn also has functionality for PCA with a kernel, and the resulting reduced representations might be less sparse and easier to work with using traditional models?
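A rough sketch of what the kernel PCA route could look like with scikit-learn (the image shape, `n_components`, kernel, and `gamma` here are placeholder assumptions to tune, not recommendations):

```python
# Rough sketch: kernel PCA on flattened images with scikit-learn.
# Shapes, n_components, kernel, and gamma are placeholders to tune.
import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.rand(100, 64 * 64)             # e.g. 100 images flattened to 4096-dim rows
kpca = KernelPCA(n_components=50, kernel="rbf", gamma=1e-3)
X_reduced = kpca.fit_transform(X)            # dense (100, 50) embedding for downstream models
```

One thing to keep in mind is that KernelPCA builds an n_samples × n_samples kernel matrix, so it can get expensive on large image sets.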

2

omgitsjo t1_ivm2qw6 wrote

Wouldn't an autoencoder run into the same issue? If the dataset is mostly zeros then every loss function I can think of would hit the same problem. PCA could be an option, but it's disappointing to introduce it into what is otherwise a pure UNet architecture.

1

No_Lingonberry2565 t1_ivm7za6 wrote

Yea you’re right, since the loss function for an autoencoder with input X and reconstruction X’ would be the matrix Frobenius norm ||X - X’||_F, which would already be close to 0 on mostly-zero data, so the gradients would be tiny and I think the weights would approach zero -> lower-dimensional embeddings close to 0 (I’m trying to visualize it in my head with the chain rule and the weight updates as you backpropagate - I THINK it would be something like that lol)

Considering that, maybe make use of some modified loss function that is higher for values closer to 0?

The only difficulty is that instead of using a nice Keras architecture and letting it train automatically, you would probably need to define this custom loss function yourself and then update the Keras model weights with GradientTape, and even then the loss function you choose might have really shitty behavior and your network may not converge well (rough sketch of that wiring below).

Edit: Ignore my weird suggestion of making a loss function that is higher for arguments closer to 0.

Maybe try the infinity norm ||X - X’||_inf in the autoencoder loss instead of just ||X - X’||_F.
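A rough sketch of what that could look like in TensorFlow 2 (the autoencoder layers, sizes, and optimizer settings are placeholder assumptions, not a recommendation):

```python
# Rough sketch: custom infinity-norm reconstruction loss trained with tf.GradientTape.
# Architecture, layer sizes, and optimizer settings are placeholder assumptions.
import tensorflow as tf

def inf_norm_loss(x, x_recon):
    # ||X - X'||_inf per sample: the largest absolute reconstruction error,
    # averaged over the batch.
    err = tf.abs(x - x_recon)
    per_sample = tf.reduce_max(tf.reshape(err, [tf.shape(x)[0], -1]), axis=1)
    return tf.reduce_mean(per_sample)

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(4096,)),
    tf.keras.layers.Dense(32, activation="relu"),   # bottleneck embedding
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(4096),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(x):
    with tf.GradientTape() as tape:
        x_recon = autoencoder(x, training=True)
        loss = inf_norm_loss(x, x_recon)
    grads = tape.gradient(loss, autoencoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, autoencoder.trainable_variables))
    return loss
```

That said, since the loss only depends on (y_true, y_pred), you could also skip the manual loop and just pass it to `model.compile(loss=inf_norm_loss)` and call `fit(X, X)`.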

2

omgitsjo t1_ivmay42 wrote

You might be on to something. Not necessarily the inf norm, but maybe an asymmetric loss function: guessing zero when the true value is 0.1 should carry a much higher penalty than guessing 0.1 when the true value is 0.
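A rough sketch of what that asymmetric loss could look like in TensorFlow (the weight of 10 and the "nonzero" test are placeholder assumptions to tune):

```python
# Rough sketch: asymmetric reconstruction loss that penalizes errors on nonzero
# target pixels much more than errors on zero pixels. nonzero_weight is a
# placeholder assumption.
import tensorflow as tf

def asymmetric_mse(x, x_recon, nonzero_weight=10.0):
    sq_err = tf.square(x - x_recon)
    # Errors where the target is nonzero (e.g. guessing 0 when it's 0.1)
    # cost nonzero_weight times more than errors where the target is 0.
    weights = tf.where(x > 0, nonzero_weight, 1.0)
    return tf.reduce_mean(weights * sq_err)
```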

1

No_Lingonberry2565 t1_ivmkyiw wrote

I suggested the inf norm because it will return a larger value, and then when the weights are updated through the chain rule, it might lead to less sparse reduced states of your data.

1