
BrotherAmazing t1_j4tjnj1 wrote

I would like to see what happens if you train an N-class classifier with a final FC output layer of size (N+M) x 1, where you pretend there are M "unknown" classes you have no training examples for. The label components for those M classes are always 0 in your initial training set, and you always make predictions by renormalizing, i.e., conditioning on those M elements being 0.

Now you add a new class using your spare "capacity" in that last layer and resume training from where you left off, without modifying the architecture. Some data now have non-zero labels for the (N+1)-st class, and you renormalize predictions by conditioning on only the last M-1 classes being 0 instead of all M.
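Here's a minimal PyTorch sketch of what I mean. The names (`ExpandableClassifier`, the backbone, the dimensions) are just placeholders; the key point is that masking the spare logits to -inf before the softmax is exactly the renormalized prediction you get from conditioning on those classes having probability 0:

```python
import torch
import torch.nn as nn

class ExpandableClassifier(nn.Module):
    """An N-class classifier with M spare output slots reserved up front."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 n_classes: int, m_spare: int):
        super().__init__()
        self.backbone = backbone
        # Full (N+M)-way head from the start; the spare slots are masked out.
        self.head = nn.Linear(feat_dim, n_classes + m_spare)
        self.active = n_classes  # classes that currently have training data

    def forward(self, x):
        logits = self.head(self.backbone(x))
        # Setting the spare logits to -inf means the softmax over the rest
        # is the renormalized prediction conditioned on the spare classes
        # having probability 0.
        mask = torch.full_like(logits, float("-inf"))
        mask[:, : self.active] = 0.0
        return logits + mask

    def add_class(self):
        # Spend one unit of spare capacity: the (N+1)-st logit now joins
        # the softmax, and training resumes from the current weights with
        # no change to the architecture.
        self.active += 1
```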

Then see how training that starts from this initially trained N-class network progresses toward becoming an (N+1)-class classifier, compared to the baseline of simply starting over from scratch, and whether it saves compute on certain problems while ending up just as accurate (or not!).
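And a toy version of the comparison itself, reusing the `ExpandableClassifier` sketch above. The synthetic data, layer sizes, and epoch counts are arbitrary illustration choices, not a real protocol:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
N, M, D = 5, 3, 16  # arbitrary toy sizes

def make_model():
    backbone = nn.Sequential(nn.Linear(D, 32), nn.ReLU())
    return ExpandableClassifier(backbone, feat_dim=32, n_classes=N, m_spare=M)

def train(model, xs, ys, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        opt.step()
    return model

# Phase 1: train on the first N classes; the spare logits stay masked.
xs_n, ys_n = torch.randn(512, D), torch.randint(0, N, (512,))
model = train(make_model(), xs_n, ys_n)

# New data that includes the (N+1)-st class.
xs_n1, ys_n1 = torch.randn(512, D), torch.randint(0, N + 1, (512,))

# Arm A (warm start): unmask class N+1 and keep training where we left off.
warm = copy.deepcopy(model)
warm.add_class()
train(warm, xs_n1, ys_n1)

# Arm B (baseline): same architecture, fresh weights, all N+1 classes
# from scratch. Compare epochs-to-accuracy and final accuracy of the arms.
cold = make_model()
cold.active = N + 1
train(cold, xs_n1, ys_n1)
```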

IDK how practical or important this would really be (probably not much!), even if it did lead to computational savings, but it would be a fun little nerdy study.


Quiet-Investment-734 OP t1_j4w3eby wrote

This is exactly what I was thinking of doing, but I wanted to know whether there are any more efficient methods that achieve the same thing.
