ed3203 t1_j4sim4z wrote

Depending on the size of the training data and the network, it may be better to just retrain the whole thing from scratch.

2

BrotherAmazing t1_j4tjnj1 wrote

I would like to see what happens if you train an N-class classifier whose final FC output layer has size (N+M) x 1, and you simply pretend there are M "unknown" classes you have no training examples for: those M components are always 0 in your initial training set, and you always make predictions by re-normalizing, i.e., conditioning on those elements being 0.

Now you add a new class using the spare "capacity" in that last layer and resume training from where you left off, without modifying the architecture. Some data now carry non-zero labels for the (N+1)-st class, and you re-normalize predictions by conditioning on only the last M-1 classes being 0 instead of M.

Then see how training that starts from this initially trained N-class network progresses toward becoming an (N+1)-class classifier, compared to the baseline of starting over from scratch, and whether it saves you compute time for certain problems while simultaneously being just as accurate in the end (or not!).

IDK how practical or important this would really be (probably not much!) even if it did lead to computational savings, but it would be a fun little nerdy study.
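A minimal PyTorch sketch of that reserved-slot idea (the class and parameter names here are illustrative, and the backbone is assumed to be any module that returns a feature vector): masking the inactive logits to -inf makes softmax renormalize over the active slots, which is exactly the conditioning described above.

```python
import torch
import torch.nn as nn

class SpareSlotClassifier(nn.Module):
    """(N+M)-way head with M reserved slots; names are illustrative."""

    def __init__(self, backbone, feat_dim, n_known, n_spare):
        super().__init__()
        self.backbone = backbone                       # any feature extractor
        self.fc = nn.Linear(feat_dim, n_known + n_spare)
        self.n_active = n_known                        # slots currently in use

    def forward(self, x):
        logits = self.fc(self.backbone(x))
        # Condition on the spare classes being impossible: setting their
        # logits to -inf makes softmax renormalize over the active slots.
        mask = torch.zeros_like(logits)
        mask[:, self.n_active:] = float("-inf")
        return logits + mask

    def add_class(self):
        # Unlock one spare slot when a new class arrives; its weights
        # then start learning from their original initialization.
        self.n_active += 1
```

Resuming training from the saved weights after calling `add_class()` could then be timed against the from-scratch baseline.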

2

Quiet-Investment-734 OP t1_j4w3eby wrote

This is exactly what I was thinking of doing, but I wanted to know if there are any comparably efficient methods that achieve the same thing.

1

WinterExtreme9316 t1_j4skh3m wrote

Why? If you're just adding a category, why not use what you've got and just retrain the last layer? Or do you mean in case the new category has some unique low-level feature that the early layers of the network would need to extract?

1

ed3203 t1_j4slyft wrote

Yes, you may arrive at a different local minimum, which could be more performant; you give the model more freedom to explore. OP gave no context. If it's a huge transformer model, for instance, that would be impractical to retrain, then sure, use the model as is with a different final classification layer.
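For concreteness, a minimal sketch of that last-layer-only route, assuming a torchvision-style model that exposes its classifier as `model.fc`; the function name and the warm-start copy are illustrative choices, not a fixed recipe:

```python
import torch
import torch.nn as nn

def extend_head(model, n_old, feat_dim):
    """Freeze the pretrained backbone and fit a new (N+1)-way head."""
    for p in model.parameters():
        p.requires_grad = False                    # only the new head learns
    old_fc = model.fc
    model.fc = nn.Linear(feat_dim, n_old + 1)      # new params train by default
    with torch.no_grad():
        model.fc.weight[:n_old] = old_fc.weight    # keep known classes warm
        model.fc.bias[:n_old] = old_fc.bias
    return model
```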

5