
PassionatePossum t1_it3rd0p wrote

If the number of examples per class is strongly unbalanced, I would probably go for precision/recall plots, one per class. Overall performance can be compared via the mean average precision.
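
For reference, a minimal sketch of what that could look like with scikit-learn (the one-hot `y_true`, the predicted scores `y_score`, and the class names are placeholders):

```python
# Minimal sketch: per-class precision/recall curves plus mean average precision.
# Assumes y_true is one-hot encoded (n_samples, n_classes) and y_score holds the
# predicted per-class probabilities with the same shape.
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

def plot_pr_curves(y_true, y_score, class_names):
    aps = []
    for i, name in enumerate(class_names):
        precision, recall, _ = precision_recall_curve(y_true[:, i], y_score[:, i])
        ap = average_precision_score(y_true[:, i], y_score[:, i])
        aps.append(ap)
        plt.plot(recall, precision, label=f"{name} (AP={ap:.2f})")
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.legend()
    plt.title(f"mAP = {sum(aps) / len(aps):.3f}")
    plt.show()
```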

You are right: in an overfitting classifier, training accuracy should go up over the long term, but that does not have to be a strong effect. I've seen plenty of overfitting classifiers where the training loss was essentially flat while the validation loss kept increasing, and that effect doesn't have to be strong either. Still, from what you told me, my theory of overfitting seems slightly less likely.

Your explanation of the 128 units makes a lot more sense. However, I would argue for starting simple: one dense layer after a sufficiently deep convolutional network should be all that is needed.

I feel like your quest for "understanding" network structures is an unproductive direction. Well-performing network architectures are mostly just things that empirically work; there is no real theory behind them. You can waste a lot of time trying to tweak something that has already been shown to work across a wide range of problem domains, or you can just stick with something that you know works, especially if you only need a ballpark estimate.

My setup for a ballpark estimate for pretty much any problem is (a rough Keras sketch follows the list):

  1. An EfficientNet as backbone. That has the advantage that you can easily scale up the backbone if you have the resources and want to see what is possible with a larger network. I usually start with EfficientNet-B1.
  2. Pretrained imagenet weights (without the densely connected layers)
  3. Global average pooling on the features.
  4. A single dense layer to the output neurons.
  5. I usually train only the last layer for a single epoch and then unfreeze the backbone weights.
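
As a rough illustration only, that setup could look something like this in Keras (the class count, input size, dataset pipelines, and hyperparameters are placeholders, not a definitive recipe):

```python
import tensorflow as tf

NUM_CLASSES = 10  # placeholder: set to your number of classes

# 1./2. EfficientNet-B1 backbone with pretrained ImageNet weights, no top layers
backbone = tf.keras.applications.EfficientNetB1(
    include_top=False, weights="imagenet", input_shape=(240, 240, 3))

# 3./4. Global average pooling followed by a single dense output layer
inputs = tf.keras.Input(shape=(240, 240, 3))
x = backbone(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# 5. Train only the new head for one epoch, then unfreeze the backbone.
# train_ds / val_ds are assumed tf.data.Dataset pipelines of (image, one-hot label).
backbone.trainable = False
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=1)

backbone.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=20)
```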

After I have the initial predictions, I try to visualize the error cases to see whether I can spot commonalities, and work my way up from there.

That hasn't failed me so far. I usually use a focal loss to guard against strongly unbalanced classes. Unfortunately, the multi-class case isn't implemented in TensorFlow (which is what I tend to use), but it is easy to implement in a few lines of code.
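
For illustration, a multi-class focal loss in TensorFlow could be sketched roughly like this (the gamma/alpha defaults are just common choices, not values from this thread):

```python
import tensorflow as tf

def focal_loss(gamma=2.0, alpha=0.25):
    """Multi-class focal loss for softmax outputs with one-hot targets."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)    # per-class cross-entropy
        weight = alpha * tf.pow(1.0 - y_pred, gamma)     # down-weight easy examples
        return tf.reduce_sum(weight * cross_entropy, axis=-1)
    return loss

# model.compile(optimizer="adam", loss=focal_loss(), metrics=["accuracy"])
```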

But in your case I wouldn't go through the trouble of tweaking the loss. A normal cross-entropy loss should be sufficient to get an idea of what is possible. If all else fails, downweight the loss on examples from overrepresented classes.
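
For example, downweighting could be as simple as passing per-class weights to Keras, inversely proportional to class frequency (the counts below are made up):

```python
class_counts = [5000, 1200, 300]                        # placeholder examples per class
total = sum(class_counts)
class_weight = {i: total / (len(class_counts) * count)  # inverse-frequency weighting
                for i, count in enumerate(class_counts)}

# model.fit(train_ds, validation_data=val_ds, epochs=10, class_weight=class_weight)
```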


thanderrine OP t1_it8qwlf wrote

So I used to do transfer learning for my models before, right... But it kind of feels like I'm using someone else's work. Like, if I'm using the weights and architecture of someone else's work, then how does it show my skills... You know what I mean?

All I do is take the image dataset and preprocess it so that it fits the model. So how can I possibly present something like that as my project if the majority of the work is done by someone else?

About tweaking the loss: I am kind of doing that for my model. I'm using a focal Tversky loss with gamma set to 0.65.
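
For context, one common formulation of the focal Tversky loss looks roughly like the sketch below; the alpha/beta defaults are assumptions, only gamma=0.65 comes from the comment above:

```python
import tensorflow as tf

def focal_tversky_loss(alpha=0.7, beta=0.3, gamma=0.65, smooth=1e-6):
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # per-class true positives, false negatives, and false positives over the batch
        tp = tf.reduce_sum(y_true * y_pred, axis=0)
        fn = tf.reduce_sum(y_true * (1.0 - y_pred), axis=0)
        fp = tf.reduce_sum((1.0 - y_true) * y_pred, axis=0)
        tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
        # focal term emphasises classes with a poor Tversky index
        return tf.reduce_mean(tf.pow(1.0 - tversky, gamma))
    return loss
```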


PassionatePossum t1_it9djl0 wrote

I understand. If you are working on an academic paper or something like that, novelty is important. But if you are working in industry - as I currently am - there are no such concerns. In industry, the skill is in producing a working solution fast, and if someone has already built a framework that I am allowed to use, even better.
