Submitted by jantonio78 t3_ydggf6 in deeplearning

Hello there!

I'm trying to segment masses in medical images with a modified U-Net. The medical images are big files (each with a corresponding mask) that are split into small square patches, which are then used to train the model, with the Dice score as the metric and 1 - Dice as the loss. The problem is that most patches don't contain any mass, so the data is very unbalanced and I get a really low Dice score that improves very slowly.
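
In Keras terms, the metric and loss look like this (minimal sketch, binary masks assumed):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # 2*|A∩B| / (|A| + |B|), computed over the whole batch.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    # 1 - Dice, as described above.
    return 1.0 - dice_coefficient(y_true, y_pred)
```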

I'm using Keras and fit the model with a generator. I tried returning a custom weight array for each sample (Keras allows this): for example, for a (32, 32, 1) image with a (32, 32, 1) mask, I also return a (32, 32, 1) array that is 1 where the mask is 0 and 100 where the mask is 1. While that changes the loss value, the Dice score doesn't improve any faster.
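
The generator looks roughly like this (simplified sketch; names made up, shapes as described above, and the exact weight shape Keras expects can depend on the loss):

```python
import numpy as np

def weighted_patch_generator(patches, masks, batch_size=32, fg_weight=100.0):
    """Yield (image, mask, per-pixel weight) triples for model.fit."""
    n = len(patches)
    while True:
        idx = np.random.randint(0, n, size=batch_size)
        x = patches[idx]  # (batch, 32, 32, 1)
        y = masks[idx]    # (batch, 32, 32, 1)
        # 1 for background pixels, fg_weight where the mask is 1.
        w = np.where(y > 0, fg_weight, 1.0)
        yield x, y, w
```

One caveat: weights like this mainly bias pixel-wise losses such as cross-entropy; a Dice loss computed over the whole patch largely washes them out, which may be why the score isn't improving.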

I would really appreciate any help, hint or advice to make this work.

Comments

ShadowStormDrift t1_itt8dx4 wrote

Almost all my experience with deep learning in industry is people being given tiny datasets and being expected to perform miracles with them. This feels like one of those cases.

Deep_Quarter t1_ittr5p4 wrote

Hey, what you are trying is a form of sample weighting. It basically says data imbalance is the loss function's problem.

What you need to do is write a better data loader. Make sure that the imbalance is handled at the data loader by customising it to load batches that are balanced. Easier said than done, I know, but this is where concepts like sampling and class weighting come in.
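
As a rough sketch of what I mean (names made up; assumes the patches and masks fit in memory as arrays):

```python
import numpy as np

def balanced_generator(patches, masks, batch_size=32):
    # Split indices by whether the patch contains any mass.
    pos = np.where(masks.sum(axis=(1, 2, 3)) > 0)[0]
    neg = np.where(masks.sum(axis=(1, 2, 3)) == 0)[0]
    half = batch_size // 2
    while True:
        # Draw half of every batch from each group so batches stay balanced.
        idx = np.concatenate([
            np.random.choice(pos, half),
            np.random.choice(neg, batch_size - half),
        ])
        np.random.shuffle(idx)
        yield patches[idx], masks[idx]
```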

The second thing you can do is train at a smaller resolution. A proper data pipeline paired with a good loss function like Dice, Tversky, or focal loss can help you get a benchmark to improve on. Just search for segmentation losses on GitHub.
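
For example, a Tversky loss looks roughly like this (alpha = beta = 0.5 recovers Dice; setting beta higher penalises false negatives more, which helps with small masses):

```python
from tensorflow.keras import backend as K

def tversky_loss(alpha=0.3, beta=0.7, smooth=1.0):
    # alpha weights false positives, beta false negatives.
    def loss(y_true, y_pred):
        y_true_f = K.flatten(y_true)
        y_pred_f = K.flatten(y_pred)
        tp = K.sum(y_true_f * y_pred_f)
        fp = K.sum((1.0 - y_true_f) * y_pred_f)
        fn = K.sum(y_true_f * (1.0 - y_pred_f))
        return 1.0 - (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return loss

# Usage: model.compile(optimizer="adam", loss=tversky_loss(0.3, 0.7))
```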

Lastly, you can reframe the problem as something simpler, like box regression or heatmap regression. This helps if the mask region is very large or very small compared to the input resolution.
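
For the heatmap variant, the regression target could be built like this (illustrative sketch; assumes one mass per patch):

```python
import numpy as np

def heatmap_target(mask, sigma=4.0):
    """Turn a (H, W, 1) binary mask into a Gaussian heatmap at the mass centroid."""
    h, w = mask.shape[:2]
    ys, xs = np.nonzero(mask[..., 0])
    if len(xs) == 0:
        return np.zeros((h, w, 1), dtype=np.float32)
    cy, cx = ys.mean(), xs.mean()
    yy, xx = np.mgrid[0:h, 0:w]
    hm = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return hm[..., None].astype(np.float32)
```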

Yeinstein20 t1_ituv2bx wrote

Could you give a few more details about what kind of images you have, what you are trying to segment, your model, etc.? Are you calculating your Dice score and Dice loss on both foreground and background? It's usually a good idea to calculate it on the foreground only, and if you have more than one foreground class, take the mean; that alone should help a lot with class imbalance.

I would also add cross-entropy or focal loss on top of the Dice loss; that's something I have found to work well in general. You can also modify your data loader so that it oversamples foreground during training (say you have a batch size of 2 and force at least one image to contain foreground). It's probably also a good idea to find a good baseline to compare against, so you get a better idea of how your performance stacks up.
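
For the combined loss, something like this (minimal sketch, binary case):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_bce_loss(y_true, y_pred, smooth=1.0):
    # Dice term: region-overlap signal, robust to class imbalance.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    # Cross-entropy term: dense per-pixel gradients, helps early training.
    bce = K.mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return (1.0 - dice) + bce
```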

jantonio78 OP t1_iu6hq97 wrote

The images are grayscale X-ray images with masses in them. For example, one image may have a shape of (2000, 1500, 1). I extract chunks of (32, 32, 1) and use those chunks to train the segmentation network. The Dice score and loss are calculated on the foreground, and there is only one class (mass). I'm going to change the data loader to use only chunks with at least some mass in them, although my guess is that the trained model will then find masses in empty chunks too. Thanks for your suggestions!
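
Roughly, the chunking step looks like this (simplified sketch; non-overlapping tiles, leftover edge pixels dropped):

```python
import numpy as np

def extract_patches(image, mask, patch=32):
    """Split a (H, W, 1) image and its mask into non-overlapping square patches."""
    h, w = image.shape[:2]
    xs, ys = [], []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            xs.append(image[i:i + patch, j:j + patch])
            ys.append(mask[i:i + patch, j:j + patch])
    return np.stack(xs), np.stack(ys)
```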

Yeinstein20 t1_iu80u42 wrote

Is there a particular reason you chose a patch size of 32x32? You have a rather shallow U-Net with such a small patch size, and there isn't much information in your patches, which look rather uniform. I would try to go for maybe 128x128 or even 256x256; for 2D segmentation that should still not be too memory intensive. What batch size are you using? If you use, for example, a batch size of 32, you could force in the data loader that at least 8 of the patches have some mass in them. Just play around a bit with this number to see how it works. Keep an eye on recall and precision in addition to Dice to see whether your false positives rise.
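
For the forced-foreground batches, something along these lines (sketch; names made up, assumes in-memory arrays):

```python
import numpy as np

def min_foreground_generator(patches, masks, batch_size=32, min_pos=8):
    pos = np.where(masks.sum(axis=(1, 2, 3)) > 0)[0]
    all_idx = np.arange(len(patches))
    while True:
        # Guarantee min_pos positive patches, fill the rest at random.
        idx = np.concatenate([
            np.random.choice(pos, min_pos),
            np.random.choice(all_idx, batch_size - min_pos),
        ])
        np.random.shuffle(idx)
        yield patches[idx], masks[idx]
```

For the monitoring part, tf.keras.metrics.Precision() and tf.keras.metrics.Recall() can just be added to metrics= at compile time.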

jantonio78 OP t1_iudv7jg wrote

No particular reason. I'm going to try different patch sizes. Regarding the batch size, right now I'm using 32. Discarding chunks without mass, I get a Dice score of approximately 0.8, which isn't really great, but I still have many things to try. And I'm checking recall and precision (and specificity) at the end of each epoch.

I'm going to try a bigger patch size and change the data loader as you suggested. Thanks for your help!

pornthrowaway42069l t1_itsbufj wrote

I'd try some simpler baseline models on the same data and see how they perform. Maybe the model just can't do any better; that's always a good thing to check before panicking.

You can also try K-means or DBSCAN or something like that and try to get two clusters; see if those algorithms can segment your data better than your network. If so, maybe the network is set up incorrectly somehow; if not, maybe something funky is happening to your data in the pipeline.
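
Something like this with scikit-learn, per patch (quick sketch; treats the brighter of the two clusters as the mass):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_baseline(patch):
    """Cluster a (H, W, 1) patch's pixel intensities into 2 groups."""
    flat = patch.reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(flat)
    # Call the brighter cluster "mass" so the output is a binary mask.
    means = [flat[labels == k].mean() for k in (0, 1)]
    fg = int(np.argmax(means))
    return (labels == fg).reshape(patch.shape[:2])
```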

Internal-Diet-514 t1_itwdhg2 wrote

To start, I'd downsample the images that don't have any mass in them (or upsample the ones with mass) in the training data, while keeping an even balance in the test/validation sets. As others have said above, the loss function is better suited to an even representation. This is an easy way to do it without writing a custom data loader, and you can see if that's the problem before diving deeper.
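
For example (sketch; assumes in-memory arrays of patches and masks, names made up):

```python
import numpy as np

def undersample_negatives(patches, masks, neg_ratio=1.0, seed=0):
    # Keep every positive patch and a random subset of negatives,
    # so negatives are at most neg_ratio times the positives.
    rng = np.random.default_rng(seed)
    has_mass = masks.sum(axis=(1, 2, 3)) > 0
    pos = np.where(has_mass)[0]
    neg = np.where(~has_mass)[0]
    n_keep = min(len(neg), int(len(pos) * neg_ratio))
    keep_neg = rng.choice(neg, size=n_keep, replace=False)
    idx = np.concatenate([pos, keep_neg])
    rng.shuffle(idx)
    return patches[idx], masks[idx]
```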
