Submitted by viertys t3_125xdrq in MachineLearning


I am working on a project in which I'm detecting cavities in X-rays.

The dataset I have is pretty limited (~100 images). Each X-ray has a black-and-white mask that shows where the cavities are in the image.

I'm trying to improve my results.

What I've tried so far:

  1. different loss functions: BCE, Dice loss, BCE+Dice, Tversky loss, focal Tversky loss
  2. modifying the images' gamma to make the cavities more visible
  3. trying out different U-Nets: U-Net, V-Net, U-Net++, UNet 3+, Attention U-Net, R2U-Net, ResUNet-a, U^2-Net, TransUNet, and Swin-Unet
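
For reference, the overlap-based losses in item 1 are closely related. Tversky weights false positives and false negatives separately, and with both weights at 0.5 it reduces to Dice. Below is a minimal NumPy sketch on soft (probability) predictions. The default alpha, beta and gamma values are illustrative assumptions, not values from the thread.

```python
import numpy as np

def tversky_index(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Soft Tversky index between a probability map and a binary mask.

    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 gives the soft Dice coefficient.
    """
    p = np.asarray(pred, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    tp = np.sum(p * t)
    fp = np.sum(p * (1.0 - t))
    fn = np.sum((1.0 - p) * t)
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def dice_loss(pred, target):
    return 1.0 - tversky_index(pred, target, 0.5, 0.5)

def tversky_loss(pred, target, alpha=0.3, beta=0.7):
    # beta > alpha makes missed cavity pixels cost more than false alarms
    return 1.0 - tversky_index(pred, target, alpha, beta)

def focal_tversky_loss(pred, target, alpha=0.3, beta=0.7, gamma=0.75):
    # focal variant: raise the Tversky loss to a power (0.75 is illustrative)
    return tversky_loss(pred, target, alpha, beta) ** gamma

def bce_loss(pred, target, eps=1e-7):
    p = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target, dtype=np.float64)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def bce_dice_loss(pred, target):
    # unweighted sum; a weighting factor between the two terms is a common knob
    return bce_loss(pred, target) + dice_loss(pred, target)

pred = np.array([0.9, 0.2, 0.7, 0.1])
mask = np.array([1, 0, 1, 0])
print(dice_loss(pred, mask), tversky_loss(pred, mask), bce_dice_loss(pred, mask))
```

In a training loop the same formulas would be written with tensor operations so gradients flow through them.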

None of the newer U-Net variants I tried improved the results, probably because they are better suited to larger datasets.

I'm now looking for other things to try to improve my results. Currently my network is detecting cavities, but it has trouble with the smaller ones.
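
One way to make "trouble with the smaller ones" measurable is to score each ground-truth cavity separately instead of averaging Dice over the whole image. The sketch below labels connected components with a plain BFS (scipy.ndimage.label would do the same job) and counts a lesion as detected if any predicted pixel overlaps it. Both the "any overlap" criterion and the size cut-off are assumptions chosen for illustration.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions of a 2-D binary mask.
    Returns (labels, count); label 0 is background."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                count += 1
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def lesion_recall_by_size(gt_mask, pred_mask, size_threshold=50):
    """Fraction of ground-truth lesions touched by the prediction,
    split into 'small' (< size_threshold pixels) and 'large' lesions.
    Returns None for a bucket with no lesions."""
    labels, count = label_components(gt_mask)
    pred = np.asarray(pred_mask, dtype=bool)
    hits = {"small": [0, 0], "large": [0, 0]}  # [detected, total]
    for k in range(1, count + 1):
        region = labels == k
        bucket = "small" if region.sum() < size_threshold else "large"
        hits[bucket][1] += 1
        if pred[region].any():
            hits[bucket][0] += 1
    return {b: (d / t if t else None) for b, (d, t) in hits.items()}
```

Aggregating these per-lesion counts over the validation set shows directly whether small cavities are the ones being missed.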




trajo123 t1_je7dgjz wrote

100 images??? Folks, neural nets are data-hungry. If you don't have reams of data, don't fiddle with architectures, definitely not at first. The first thing to do when data is limited is to use pre-trained models. Then do data augmentation, and only then look at other things like architectures and losses, if you really have nothing better to do with your time.

SMP (segmentation_models.pytorch) offers a wide variety of segmentation models with the option to use pre-trained weights.


viertys OP t1_je9mjxr wrote

Thank you a lot! I will try SMP


Tight-Lettuce7980 t1_jea4ojl wrote

How about medical images, which are more difficult to obtain due to privacy issues? I don't think it's easy to get, for example, 1,000+ images. Would 300-700 or so be sufficient?


trajo123 t1_jebbxaf wrote

Sufficient to train a model from scratch? Unlikely. Sufficient to fine-tune a model pre-trained on 1 million+ images (ImageNet, etc.)? Probably yes. As mentioned, some extra performance can be squeezed out with smart data augmentation.


BrotherAmazing t1_je7vj9v wrote

People saying to get more than 100 images are right (all else being equal, yes, get more images!), but you can likely make good progress on your problem with fewer images by using clever augmentation and a smaller network.

Here’s why:

  1. You only have to detect cavities. It’s not some 1,000-class semantic segmentation problem.

  2. You should be working with single channel grayscale images, and not pixels that naturally come in 3-channel RGB color.

  3. This is X-ray data just of teeth, so you don’t have nearly the amount of complex fine-detailed textures and patterns (with colors) that are exhibited in more general RGB optical datasets of all sorts of objects and environments.
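
Point 2 has a practical consequence when reusing ImageNet-pretrained encoders, which expect three input channels. Two common options are to replicate the gray channel three times, or to collapse the first convolution to one input channel by summing its weights over RGB. The sketch below, using a naive NumPy convolution and random stand-in weights, shows that the two give identical outputs. It ignores per-channel input normalization (such as ImageNet mean/std), which would make the replicated channels differ. Some libraries (timm, for example) appear to adapt pretrained stems for one-channel input in a similar way; treat that as an assumption and check the docs of whatever library you use.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 'valid' cross-correlation.
    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

def rgb_weights_to_gray(w_rgb):
    """Collapse a first-layer kernel (C_out, 3, k, k) to (C_out, 1, k, k)
    by summing over the input-channel axis."""
    return w_rgb.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(4, 3, 3, 3))  # stand-in for pretrained RGB weights
gray = rng.random((1, 8, 8))           # one-channel X-ray crop

# Option A: replicate the gray channel and keep the RGB weights.
out_a = conv2d_valid(np.repeat(gray, 3, axis=0), w_rgb)
# Option B: keep one channel and sum the weights over RGB.
out_b = conv2d_valid(gray, rgb_weights_to_gray(w_rgb))
print(np.allclose(out_a, out_b))  # True
```

Option B saves memory and compute in the first layer; option A needs no change to the model at all.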

Of course, for a real operational system used in commercial products you will need far more than 100 images. However, for a simple research problem or prototype demo, you can likely show good results and feasibility (without overfitting, yes) on your dataset with a smaller net and clever augmentation.


viertys OP t1_je9nno8 wrote

I didn't mention it in the post, but I'm using the albumentations module. I rotate, shift, blur, horizontally flip, downscale, and add Gaussian noise. I get around 400 images after doing this. Is there anything you would suggest?
I have an accuracy of 98.50% and a Dice score of around 0.30-0.65 per image.

And yes, the images are grayscale and cropped around the teeth, so only that part of the X-ray remains.
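
The key invariant in segmentation augmentation is that geometric transforms (flips, rotations, shifts) must hit the image and the mask with the same parameters, while photometric ones (blur, noise, gamma) touch only the image. albumentations handles this when both are passed together, e.g. `transform(image=img, mask=msk)`. The dependency-free sketch below only illustrates the invariant. The 90-degree rotations and the noise level are illustrative assumptions and may not be realistic for dental X-rays. Drawing fresh random augmentations each epoch, rather than reusing a fixed set of ~400 pre-generated images, is another common option.

```python
import numpy as np

def augment_pair(image, mask, rng, noise_std=0.02):
    """Random flip/rot90 applied identically to image and mask;
    Gaussian noise applied to the image only. Image values are in [0, 1]."""
    if rng.random() < 0.5:                       # shared horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                  # shared 90-degree rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    noisy = image + rng.normal(0.0, noise_std, size=image.shape)  # image only
    return np.clip(noisy, 0.0, 1.0), mask.copy()

rng = np.random.default_rng(0)
mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 2:5] = True
image = 0.25 + 0.5 * mask                        # bright where the "cavity" is
aug_img, aug_mask = augment_pair(image, mask, rng)
```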


MadScientist-1214 t1_je6o0st wrote

Most new architectures based on U-Net do not actually deliver the improvements they report. Researchers need to get papers published, so they introduce data leakage or tune the random seed. Segmentation papers at top venues like CVPR are of better quality.


BreakingCiphers t1_je7dlg5 wrote

While testing models and playing with hyperparams can be fun, the real problem is that you are trying to apply deep learning to 100 images.

Get more images.


Adventurous-Mouse849 t1_je7gyqe wrote

And also data augmentation: rotation, cropping, zooming. This is essential when data is scarce, as it usually is in medical imaging.


viertys OP t1_je9mlvj wrote

I didn't mention it in the post, but I'm using the albumentations module. I rotate, shift, blur, horizontally flip, downscale, and add Gaussian noise. I get around 400 images after doing this. Is there anything you would suggest?


Adventurous-Mouse849 t1_jedi4wq wrote

For augmentation, that covers all the bases. For more high-level or fully generative tasks I would also suggest mix-match (a convex combination of similar samples), but you can't justify that here because you would have to relabel. Ultimately this does come down to too few images. If there's a publicly available pretrained CT segmentation model, you could fine-tune it on your task or distill it into your model... just make sure they did a good job in the first place.
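
The convex combination described here is what is usually called mixup (MixMatch is a semi-supervised method that uses it as one component). The sketch shows why it clashes with hard segmentation labels: the mixed mask becomes fractional wherever the two masks disagree. The Beta(alpha, alpha) sampling and alpha = 0.4 are illustrative assumptions.

```python
import numpy as np

def mixup_pair(img_a, mask_a, img_b, mask_b, rng, alpha=0.4):
    """Mix two image/mask pairs with one weight lam ~ Beta(alpha, alpha)."""
    lam = float(rng.beta(alpha, alpha))
    img = lam * img_a + (1.0 - lam) * img_b
    soft_mask = lam * mask_a.astype(np.float64) + (1.0 - lam) * mask_b.astype(np.float64)
    return img, soft_mask, lam

rng = np.random.default_rng(0)
a_img, b_img = rng.random((4, 4)), rng.random((4, 4))
a_mask = np.zeros((4, 4), dtype=bool)
a_mask[:2] = True
b_mask = np.zeros((4, 4), dtype=bool)
b_mask[1:3] = True
img, soft, lam = mixup_pair(a_img, a_mask, b_img, b_mask, rng)
# rows where only one mask is set now hold lam or 1 - lam instead of 0/1
```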

Also, some other notes: I'd suggest sticking with distribution losses, i.e. cross-entropy. U-Net is sensitive to normalization, so I'd also suggest training with and without normalized inputs.
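
Training "with and without normalized inputs" amounts to swapping the preprocessing function while keeping everything else fixed. Two common choices are min-max scaling to [0, 1] and per-image z-score standardization. Below is a minimal sketch; `train_and_evaluate` is a hypothetical placeholder for your own training loop.

```python
import numpy as np

def identity(image):
    return np.asarray(image, dtype=np.float64)

def minmax(image, eps=1e-8):
    """Rescale intensities of one image to [0, 1]."""
    x = np.asarray(image, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

def zscore(image, eps=1e-8):
    """Standardize one image to zero mean and unit variance."""
    x = np.asarray(image, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

NORMALIZERS = {"none": identity, "minmax": minmax, "zscore": zscore}

# for name, normalize in NORMALIZERS.items():
#     score = train_and_evaluate(normalize)   # hypothetical helper
#     print(name, score)
```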


azorsenpai t1_je6hjpu wrote

Is there any reason you're restricting yourself to a U-Net-based model? I'd recommend testing different architectures such as DeepLab V3 or FPN and seeing whether things improve. If they don't, I'd recommend looking at your data and the quality of the ground truth, since with only 100 data points you are likely limited mainly by the information contained in your data.

If the data is clean, I'd recommend some kind of ensemble method. This might be overkill, especially with heavy models, but having multiple models with different random initializations infer on the same input generally gives a few more points of accuracy/Dice, so if you really need it, this is an option.
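
The combining step of such an ensemble is simple: collect each model's per-pixel probability map for the same input, average them, then threshold. Below is a sketch with made-up probability maps standing in for model outputs; the 0.5 threshold is an assumption worth tuning on validation data.

```python
import numpy as np

def ensemble_predict(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities from several models,
    then threshold the mean into a binary mask."""
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob, mean_prob >= threshold

# made-up outputs of three models trained from different random initializations
p1 = np.array([[0.9, 0.4], [0.2, 0.6]])
p2 = np.array([[0.7, 0.6], [0.1, 0.3]])
p3 = np.array([[0.8, 0.7], [0.3, 0.3]])
mean_prob, mask = ensemble_predict([p1, p2, p3])
```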


viertys OP t1_je6peyv wrote

I started with U-Net, but I'm open to other architectures. I will try out DeepLab V3, thank you!

I believe the data is generally clean. Sadly, I can't get more data as all the datasets used in the research papers that I've read are private.


deep-yearning t1_je710wy wrote

What accuracy (Dice?) are you getting? 100 training images is pretty small. Have you tried cross-validation?
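
With ~100 images, k-fold cross-validation gives a much less noisy estimate than a single split. One detail matters here: split by original X-ray before augmenting, so augmented copies of a validation image never end up in the training set. Below is a dependency-free sketch of image-level folds (scikit-learn's KFold does the same job); k = 5 is an illustrative choice.

```python
import random

def kfold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold CV over original images."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, val

for train_idx, val_idx in kfold_indices(100, k=5):
    pass  # augment only the images in train_idx, then train and score on val_idx
```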


viertys OP t1_je9mocy wrote

I have an accuracy of 98.50% and a Dice score of around 0.30-0.65 per image.


deep-yearning t1_je9qqrf wrote

Accuracy is not a good metric here given the large number of true negative pixels you will get.
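
A quick illustration of why: with cavities covering about 5% of the pixels (within the 5-10% range given later in the thread), a model that predicts "no cavity" everywhere already scores 95% pixel accuracy while its Dice is 0. The mask is synthetic and purely illustrative.

```python
import numpy as np

def pixel_accuracy(pred, gt):
    return float(np.mean(pred == gt))

def dice(pred, gt):
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / total)

gt = np.zeros((100, 100), dtype=bool)
gt[40:50, 20:70] = True            # 500 cavity pixels = 5% of the image
empty = np.zeros_like(gt)          # "model" that never predicts a cavity
print(pixel_accuracy(empty, gt))   # 0.95
print(dice(empty, gt))             # 0.0
```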

How large is the typical region you are trying to segment (in pixels)? If you've already done data augmentation, I would also try generating synthetic images if you can. Use a larger batch size, and try different optimizers and a learning rate scheduler. How many images have no cavities in them?
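
As one example of a learning rate scheduler, a cosine schedule with optional linear warmup is easy to write by hand; PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` covers the cosine part if you prefer a built-in. The step counts and learning rates below are placeholders, not tuned values.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then cosine decay to lr_min."""
    if step < warmup_steps:
        return lr_max * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at lr_min after total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

schedule = [cosine_lr(s, total_steps=100, warmup_steps=5) for s in range(101)]
```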


viertys OP t1_je9srha wrote

All images have cavities in them, and in general the cavities make up 5-10% of the image.
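
With cavities at roughly 5-10% of the pixels, the foreground is heavily outnumbered. One standard counterweight that may be worth trying is to scale the positive term of BCE by the background-to-foreground ratio, which is what the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss` does (on logits rather than probabilities). Below is a NumPy sketch on probabilities, assuming every mask set contains at least one foreground pixel.

```python
import numpy as np

def positive_weight(masks):
    """Background-to-foreground pixel ratio over a list of binary masks.
    Assumes at least one foreground pixel overall."""
    fg = sum(int(np.count_nonzero(m)) for m in masks)
    total = sum(m.size for m in masks)
    return (total - fg) / fg

def weighted_bce(prob, target, pos_weight, eps=1e-7):
    """Binary cross-entropy with the foreground (target == 1) term scaled by pos_weight."""
    p = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target, dtype=np.float64)
    return float(-np.mean(pos_weight * t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

mask = np.zeros((10, 10), dtype=bool)
mask[:1] = True                # 10% foreground
w = positive_weight([mask])    # 9.0
```

At 5-10% foreground this ratio lands between about 9 and 19; some people damp it (e.g. with a square root) if training becomes unstable.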

Here is an example: The mask on the left is the ground truth and the mask on the right is the predicted one.


I'm currently using Kaggle and I can't use very large batch sizes. My batch size is 4 now. Is there an alternative to Kaggle that you would suggest?
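
Independent of the platform, gradient accumulation is a common workaround for a small per-step batch: run several micro-batches, add up their appropriately scaled gradients, and update once. The NumPy sketch below checks that, for a toy linear model with MSE loss, micro-batches of 4 give the same gradient as one batch of 16. In PyTorch this usually means calling `backward()` on each equal-sized micro-batch loss divided by the number of micro-batches and stepping the optimizer every few micro-batches. One caveat: BatchNorm layers still only see 4 images at a time, so accumulation does not change their statistics.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of mean squared error for a linear model y_hat = x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)

def accumulated_grad(w, x, y, micro_batch=4):
    """Sum micro-batch gradients, each weighted by its share of the full batch."""
    grad = np.zeros_like(w)
    n = len(y)
    for start in range(0, n, micro_batch):
        xb, yb = x[start:start + micro_batch], y[start:start + micro_batch]
        grad += mse_grad(w, xb, yb) * len(yb) / n
    return grad

rng = np.random.default_rng(0)
x, y, w = rng.normal(size=(16, 3)), rng.normal(size=16), rng.normal(size=3)
print(np.allclose(accumulated_grad(w, x, y), mse_grad(w, x, y)))  # True
```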


deep-yearning t1_je9te4j wrote

Train locally on your own machine if you have a GPU, or try Google Colab if you don't. Google Colab offers V100 GPUs, which should fit larger batch sizes.

To be honest, given how limited the data set is and how small some of the segmentation masks are, I am not sure other architectures will be able to do any better than U-Net.

I would also try the nnU-Net which should give state-of-the-art performance, and so will give you a good idea of what's possible with the dataset that you have:


viertys OP t1_je9u6ny wrote

Thank you, I will try nnU-net too


currentscurrents t1_je7c29r wrote

The architecture probably isn't the problem. You only have 100 images, that's your problem.

If you can't get more labeled data, you should pretrain on unlabeled data that's as close as possible to your task, preferably other dental X-rays. Then you can fine-tune on your real dataset.


dubbitywap t1_je8xewl wrote

Do you have a git repository that we can take a look at?


itsyourboiirow t1_je7n7p8 wrote

Others have mentioned it, but do data augmentation (crop, resize, rotate, etc.) and you'll be able to increase the size of your dataset and improve results.


viertys OP t1_je9mpwr wrote

I didn't mention it in the post, but I'm using the albumentations module. I rotate, shift, blur, horizontally flip, downscale, and add Gaussian noise. I get around 400 images after doing this. Is there anything you would suggest?


mofoss t1_je9ayyx wrote

Try SegFormer after augmentation.


NoLifeGamer2 t1_je9gi5u wrote

I recommend using bootstrapping to create more data points, then approving the ones you like and adding them to the dataset. Then train on the larger dataset.


CyberDainz t1_je6qsbb wrote

Successful generalization in segmentation depends not only on the network configuration, but also on augmentation and on pretraining with a non-mask target.

Try my new project, Deep Roto.