Comments

ItalianPizza91 t1_iwblluq wrote

If the training loss decreases while the validation loss stays flat, that's usually a sign of overfitting. The usual steps I take to avoid it (a rough sketch follows this list):

- use a dropout layer

- add data augmentations

- get more data
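
For example, a minimal Keras sketch of that kind of setup, assuming a frozen DenseNet201 backbone and the 8-class, 96x96 images mentioned elsewhere in the thread (the dropout rate and augmentation choices are just placeholders to tune):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3))
    base.trainable = False  # keep the pretrained backbone frozen at first

    model = models.Sequential([
        tf.keras.Input(shape=(96, 96, 3)),
        layers.RandomFlip("horizontal_and_vertical"),  # augmentation, active only during training
        layers.RandomRotation(0.1),
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),                           # regularize the classification head
        layers.Dense(8, activation="softmax"),
    ])
    # input rescaling/preprocessing is assumed to happen in the data pipeline
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])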

16

Tiny-Mud6713 OP t1_iwc1vq9 wrote

Yeah, the problem is that this is a challenge and the data is limited. I've tried data augmentation but haven't had much luck.

However, I must ask: when using data augmentation, is it better to augment both the training and validation sets or just the training set? I've seen conflicting opinions online.

2

Nhabls t1_iwc3rap wrote

You don't augment validation data; you'd be corrupting your validation scores. You'd only augment it at the end, when/if you're training on all the data.

Speaking of which, look at your class representation percentages; accuracy can be completely misleading if you have one or two overwhelmingly represented classes.

4

Tiny-Mud6713 OP t1_iwc919l wrote

Seven classes are equally distributed (500 images each); only one has about 25% of the others' share (150-ish). It is a problem, but I'm not sure how to solve it given that it's a challenge and I can't add data, and augmentation keeps the imbalance since it augments everything equally.

1

Nhabls t1_iwcdek4 wrote

The data doesn't seem imbalanced enough to cause the issues you're having. And I don't know what you're using for augmentation, but you can definitely augment specific classes to counter the imbalance (I don't like doing that personally). My next guess would be to look at how you're splitting the data for train/val, and/or to freeze the vast majority of the pretrained model and maybe even train only the last layer or two that you add on top.

Regardless, it's useful to know (it's very common in real-world datasets). Here's a link that goes over how to weight classes for such cases; it's written with TensorFlow in mind, but the concept is the same regardless. A rough sketch of the idea is below.
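
To illustrate (the counts are just the rough numbers mentioned in this thread, and `train_gen`/`valid_gen` stand in for whatever generators you're using):

    import numpy as np

    # ~500 images for seven classes, ~150 for the eighth (approximate figures from the thread)
    counts = np.array([500, 500, 500, 500, 500, 500, 500, 150])
    n_classes = len(counts)

    # Inverse-frequency weights, normalized so a perfectly balanced class gets weight 1.0
    class_weight = {i: counts.sum() / (n_classes * c) for i, c in enumerate(counts)}

    # Keras applies these per-class weights to the loss during training:
    # model.fit(train_gen, validation_data=valid_gen, class_weight=class_weight, epochs=50)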

3

GullibleBrick7669 t1_iwc3fyj wrote

From my understanding, and from a recent project of mine with a similar problem, augmenting just the training data makes the validation accuracy easier to interpret: the validation data then functions exactly like the test data, with no alterations. So when you plot the training and validation loss, that should give you a sense of how well the model will perform on the test data. For my problem I augmented just the training data and left the validation and test data as is.

Also, looking at your plots, this could be a sign of an unrepresentative validation set. Make sure there are enough samples for each class; if there aren't, try applying the same augmentations you use on the training data to the validation data as well to generate more samples.

3

sbduke10 t1_iwbmm7c wrote

In addition to the data augmentation recommendation someone else made, make sure your test set is representative. If you just used the last 10% of the images, they might all be one class, depending on the ordering.

13

Tiny-Mud6713 OP t1_iwc207i wrote

It's a challenge, so the test is run online on unseen data, and I'm shuffling the split each run.

3

FakeOuter t1_iwbsmnp wrote

- try triplet loss

- swap Flatten for a GlobalMaxPooling2D layer; it will reduce the trainable params 49x in your case (see the quick count after this list). Fewer params -> lower chance of overfitting. Maybe place a normalization layer right after the max pool
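
For context, here's where the 49x figure comes from, assuming DenseNet201's 7x7x1920 output feature maps and an illustrative Dense(256) as the first trainable layer:

    # Flatten keeps every spatial position: 7 * 7 * 1920 = 94,080 features per image
    flatten_weights = 7 * 7 * 1920 * 256   # ~24.1M weights feeding a Dense(256)

    # GlobalMaxPooling2D keeps one value per channel: 1,920 features per image
    pooled_weights = 1920 * 256             # ~0.49M weights feeding the same Dense(256)

    print(flatten_weights // pooled_weights)  # 49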

8

RoaRene317 t1_iwc102m wrote

Here are some tricks that could increase your accuracy:

  1. Tune the hyperparameters more. Sometimes increasing or decreasing the augmentation values (rotation, zoom, shear, etc.) can improve accuracy.
  2. Unfreezing the Batch Normalization layers can help (see the sketch at the end of this comment).
  3. Check whether each class of the dataset is balanced or not. If the dataset is imbalanced, try a ViT or the SMOTE algorithm.
  4. Increasing the dropout value can sometimes help.
  5. After the transfer-learning backbone, add a pooling layer, either max pooling or average pooling.

Also, I don't think you should put dropout right before the Dense layers, because Flatten just reshapes the feature maps into a one-dimensional vector (it flattens them out).
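
A quick sketch of point 2, unfreezing only the Batch Normalization layers of an otherwise frozen backbone (assuming a Keras DenseNet201 as in the post):

    import tensorflow as tf

    base = tf.keras.applications.DenseNet201(include_top=False, weights="imagenet")

    base.trainable = True  # the outer flag must stay True for per-layer settings to take effect
    for layer in base.layers:
        # freeze everything except the BatchNormalization layers
        layer.trainable = isinstance(layer, tf.keras.layers.BatchNormalization)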

5

Tiny-Mud6713 OP t1_iwc7y2g wrote

Very insightful. I haven't tried most of these things; thanks for sharing the knowledge.

1

Technical-Owl-6919 t1_iwcakcv wrote

One thing I'm surprised nobody has mentioned yet: why have you kept two linear layers? Two linear layers one after the other in a transfer-learning setup tends to generalize badly. DenseNet is large enough to extract features that a single layer can work with. Try removing the dense layer between the output and the functional (DenseNet) block. Also try swapping the Flatten for global max or global average pooling.

3

Tiny-Mud6713 OP t1_iwcfxo2 wrote

I tried that at first, since it was intuitive and a good benchmark with fewer parameters, but two layers gave better results. Also, GAP caused the training to early-stop very early on. What do you suggest as the top layers, e.g. GAP, batch norm, dense?

1

Technical-Owl-6919 t1_iwch8sv wrote

From my experience, I would suggest using EfficientNets in the first place. Secondly, please don't unfreeze the model at the very beginning. Train the frozen model with your custom head for a few epochs, and when the loss saturates, reduce the LR, unfreeze the entire network, and train again. By the way, did you try LR scheduling?
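
Something along these lines, as a rough Keras sketch (`train_gen`/`valid_gen` are placeholders for your own generators, and the head, epoch counts, and learning rates are just starting points):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.DenseNet201(include_top=False, weights="imagenet",
                                              input_shape=(96, 96, 3))
    model = models.Sequential([base,
                               layers.GlobalAveragePooling2D(),
                               layers.Dense(8, activation="softmax")])

    # Phase 1: train only the custom head on top of the frozen backbone
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_gen, validation_data=valid_gen, epochs=20,
              callbacks=[tf.keras.callbacks.EarlyStopping(patience=5,
                                                          restore_best_weights=True)])

    # Phase 2: unfreeze and fine-tune everything at a much lower learning rate,
    # dropping the LR further whenever the validation loss plateaus
    base.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_gen, validation_data=valid_gen, epochs=20,
              callbacks=[tf.keras.callbacks.ReduceLROnPlateau(factor=0.3, patience=2)])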

1

Tiny-Mud6713 OP t1_iwcjcrv wrote

In the post, when I said I unfroze the CNN layers, I meant after the transfer-learning part: I run it until it early-stops with all CNN layers frozen, then run it again with the top 200 layers or so unfrozen.

I'm obliged to work in Keras and don't know if it has an LR scheduling method; I'll check the API. Great advice.

1

Tiny-Mud6713 OP t1_iwck8e1 wrote

The problem with EfficientNets: I ran a test on some models a priori and got this graph. Note that each model was trained for only 3 epochs.

https://drive.google.com/file/d/1OyXaWg6vMirYeI9zLSeGJ2v_qCz3msu4/view?usp=share_link

1

Technical-Owl-6919 t1_iwckvp7 wrote

Something seems to be wrong; the validation scores should not be so low. What type of data, exactly, are you dealing with?

1

Tiny-Mud6713 OP t1_iwcleya wrote

They're pictures of plants: 8 classes for 8 different species of the same type of plant.

1

Technical-Owl-6919 t1_iwclyq8 wrote

Then, my friend, you have to train the network from scratch; it's getting trapped in a local minimum. A small network might help. Try training a ResNet15 or something similar from scratch. This happened to me once: I was working with simulation images and could not get the AUC score above 0.92, but once I trained from scratch I got AUC scores close to 0.98-0.99.
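
For example, a small residual CNN trained from scratch might look something like this sketch (the block sizes are illustrative, not a specific ResNet15):

    from tensorflow.keras import layers, models

    def residual_block(x, filters):
        # two 3x3 convs with a 1x1 projection shortcut, then downsample
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.Activation("relu")(layers.Add()([x, shortcut]))
        return layers.MaxPooling2D()(x)

    inputs = layers.Input(shape=(96, 96, 3))   # 96x96 RGB leaf images
    x = inputs
    for filters in (32, 64, 128):
        x = residual_block(x, filters)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(8, activation="softmax")(x)  # 8 species

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])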

1

arg_max t1_iwcn3y0 wrote

ImageNet-1k pretraining might not be the best for this, as it contains few plant classes. The bigger ImageNet-21k has a much larger selection of plants and might be better suited for you. timm has EfficientNetV2, BEiT, ViT, and ConvNeXt models pretrained on it; I don't use Keras, but you might be able to find them for that framework.

1

czhu12 t1_iwcejk1 wrote

Any chance this is just a hard problem? I always try to manually sift through my dataset to see if I can correctly classify the data myself. That's usually the benchmark I expect the ML model to be able to reach.

3

Tiny-Mud6713 OP t1_iwcgyg7 wrote

Hahaha, definitely! The pictures are leaves of 8 different species, and they're square 96-pixel images, so not great to look at visually.

2

ID4gotten t1_iwbpnjv wrote

Perhaps dig deeper into activation functions, the optimization algorithm, or step sizes, and try some alternatives.
If your domain's images (and the things that differentiate the classes) are very different from those the network was pretrained on, maybe it doesn't have the features you need.

2

Nhabls t1_iwc3gcr wrote

What is the representation of each class? A class imbalance could create this exact behavior. You don't even need a data augmentation technique (I don't have a particularly high opinion of them, personally); just scale the class weights appropriately instead.

Also what does "Standard" mean here?

2

lambdasintheoutfield t1_iwcov7x wrote

Here are some tricks that have worked for me in a similar enough use case:

  1. Use triplet loss + weighted cross-entropy loss (possibly with a weighting on the triplet loss term itself).

I definitely found that carefully considering the objective function has the most influence on performance on problems like this.

  2. Try a cyclic learning rate schedule (a rough sketch follows this list). Here you aren't necessarily trying to get the best results off the bat, but you can study the training and validation loss plots to see how the learning rate at different epochs affects your results.

  3. Data augmentation: try as many kinds as seem reasonable.
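
On point 2, a simple triangular cyclic schedule plugged into Keras's LearningRateScheduler callback, as a rough sketch (the bounds and cycle length are arbitrary starting points):

    import tensorflow as tf

    base_lr, max_lr, cycle_len = 1e-5, 1e-3, 10  # cycle_len in epochs

    def cyclic_lr(epoch, lr):
        # triangular wave: base_lr -> max_lr -> base_lr over each cycle
        half = cycle_len / 2
        pos = epoch % cycle_len
        return base_lr + (max_lr - base_lr) * (1 - abs(pos - half) / half)

    # model.fit(train_gen, validation_data=valid_gen, epochs=50,
    #           callbacks=[tf.keras.callbacks.LearningRateScheduler(cyclic_lr)])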

DenseNet reuses the feature maps from each layer in every subsequent layer, and that can help guide how you tweak your architecture further.

Good luck!

2

ok531441 t1_iwbgwt2 wrote

  1. Try it without any fine-tuning: use the pretrained network as a fixed preprocessing step.

  2. Try a different/newer model.

1

Tiny-Mud6713 OP t1_iwbhh7f wrote

I have been trying all of the Keras API transfer models with no luck. Any suggestions for a newer model? I know models behave differently depending on the problem, but I'm ready to test anything right now. Also, any tips on the FC architecture?

1

ok531441 t1_iwbiz7c wrote

What about point 1? Did you try keeping the pretrained model frozen?

1

Tiny-Mud6713 OP t1_iwbjjp1 wrote

Yes, that's the first step I do; after that I try to unfreeze and fine-tune.

1

Ragdoll_X_Furry t1_iwc23i9 wrote

A few more details about your implementation would be useful for us to help you.

  1. How many images are you using for validation?

  2. What batch size and optimizer are you using during training?

  3. What's the dropout rate in the Dropout layers?

  4. How are you preprocessing the images before feeding them to your model? Are you using the tf.keras.applications.densenet.preprocess_input function as suggested in the Keras documentation? (A sketch of that follows this list.)
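
On question 4, a minimal sketch of what using the documented preprocessing function with the generators would look like (the augmentation arguments are just examples):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.applications.densenet import preprocess_input

    # preprocess_input applies the scaling/normalization DenseNet was trained with,
    # so you would not also pass rescale=1/255.
    train_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                        rotation_range=30,
                                        horizontal_flip=True,
                                        vertical_flip=True)
    valid_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input)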

You should try increasing the batch size if you can, and use data augmentation as others have already suggested.

You can also try other networks besides DenseNet, like one of the ResNet or EfficientNet models, and you can replace the Flatten layer with a GlobalAvgPool2D or GlobalMaxPool2D layer to reduce the parameter count (in my experience the former gives better results). Also, that resizing layer might not be necessary to improve accuracy.

1

Tiny-Mud6713 OP t1_iwcglrg wrote

1- I'm doing a 20% split, so roughly 2,800 images for training and 700 for validation.

2- Batches of 8, Adam with LR=0.001 in the transfer part and LR=0.0001 in the fine-tuning; any other combination caused everything to crumble.

3- Currently 0.3; 0.5 caused some early-stopping problems, since the model got stuck.

4-

    valid_data_gen = ImageDataGenerator(rescale=1/255.)

    train_data_gen = ImageDataGenerator(
        rescale=1/255.,
        rotation_range=30,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        vertical_flip=True
    )

and then flow from file to get the preprocessed images

1

Ragdoll_X_Furry t1_iwcxiv6 wrote

Adam is usually more likely to overfit, so using SGD with Nesterov momentum might help a bit. I'd also recommend augmenting contrast, brightness, saturation and hue if those options are available for the ImageDataGenerator class.
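
Something like this, as a starting point (the learning rate and momentum values are just placeholders to tune):

    from tensorflow.keras.optimizers import SGD

    opt = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    # model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])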

Also does the rotation in the ImageDataGenerator fill the background with black pixels or is there the option to extend/reflect the image? In my experience simply filling the background with black after rotation tends to hinder the accuracy.

One trick that might also help is to extract the outputs not only from the last layer of the pretrained network but also from earlier layers, and feed all of them into your head. In my experience this can improve accuracy. I've done this with EfficientNetB0, so I've pasted some example code here to help you out; if you don't want to use an EfficientNet, I'm sure it can be adapted to DenseNet201 too.
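
Roughly, the idea looks like the sketch below (a reconstruction rather than the exact code; which layers to tap and the head sizes are assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(96, 96, 3))
    base.trainable = False

    # Tap a few intermediate feature maps in addition to the final one.
    # Here the last three residual "add" layers are used; base.summary() lists the names.
    tap_names = [layer.name for layer in base.layers if layer.name.endswith("_add")][-3:]
    taps = [base.get_layer(name).output for name in tap_names] + [base.output]

    # Pool each tapped feature map and concatenate into a single feature vector
    pooled = [layers.GlobalAveragePooling2D()(t) for t in taps]
    x = layers.Concatenate()(pooled)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(8, activation="softmax")(x)

    model = models.Model(base.input, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])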

Of course, sometimes transfer learning just doesn't really help, so if nothing else pushes the accuracy above 90%, it might be best to build and train your own model from scratch to better suit your needs.

2

Tiny-Mud6713 OP t1_iwd0cqz wrote

I haven't tried playing with the optimizer, thanks for pointing that out. Also thanks for the code; I'll play around with it too :)

1

Tiny-Mud6713 OP t1_iwcgqff wrote

Actually, the resizing boosted the performance by about 5%; I'm at 80% now, but still looking to push it higher.

1

shot_a_man_in_reno t1_iwcil08 wrote

Maybe I'm misunderstanding, but is the DenseNet itself frozen? You're only training the one, massive, fully connected layer?

1

Tiny-Mud6713 OP t1_iwcjsgx wrote

Oh no, from the comments I realize I explained things badly. I train the FC layer until it early-stops while the DenseNet is frozen, then I take that model and retrain it with 200-ish layers unfrozen and a lower learning rate.

1

Intelligent-Aioli-43 t1_iwbji6t wrote

Yes, I had a similar problem: my model was underperforming with the Keras API in notebooks. I switched to PyTorch Lightning and it works.

−5

Tiny-Mud6713 OP t1_iwbkf15 wrote

I've never worked with Lightning. This may sound dumb, but how does changing the library change the output of the learning process?

4

snaykey t1_iwbw78d wrote

So you kept the exact same model structure, but switched the library and "it works"? I have absolutely no clue what I just read and I'm honestly not sure if I even wanna know

4

The-Last-Lion-Turtle t1_iwbi5jv wrote

It doesn't look like there are any convolutions in that net. Fully connected layers don't work that well.

ResNet or Wide ResNet would be a better idea.

−6

Tiny-Mud6713 OP t1_iwbibsp wrote

The DenseNet201 (functional) layer is the full CNN; it's just collapsed in the summary because it's >700 layers. I'll try those, thank you.

5