
derpderp3200 OP t1_j1vgi23 wrote

I assume this is the case early in training, but eventually the training process has to "compress" information so that a given parameter handles more than one specific case, at which point the same phenomenon kicks in again: any dog example wants the "not dog" neurons inactive, and any dog example wants the neurons contributing to the classification of other classes inactive.
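Here's a minimal sketch of that pull, assuming a plain linear softmax classifier trained with cross-entropy (the setup is hypothetical, just to illustrate): for a single "dog" example, the gradient on every non-target logit is positive, so the update suppresses whatever shared features feed those logits.

```python
# Minimal sketch (hypothetical linear softmax classifier): a single "dog"
# example pushes the logits of every other class down, since the
# cross-entropy gradient on non-target logits is always positive.
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_features = 4, 8
W = rng.normal(size=(num_classes, num_features))  # shared parameters
x = rng.normal(size=num_features)                 # one "dog" example
target = 0                                        # index of the dog class

logits = W @ x
probs = np.exp(logits - logits.max())             # numerically stable softmax
probs /= probs.sum()

# Cross-entropy gradient w.r.t. the logits: softmax(logits) - one_hot(target).
grad_logits = probs.copy()
grad_logits[target] -= 1.0

# The target logit is pulled up (negative gradient); every other logit is
# pulled down (positive gradient) -- the "not dog" suppression pull.
print(grad_logits)  # target entry negative, all other entries positive
assert grad_logits[target] < 0 and (np.delete(grad_logits, target) > 0).all()
```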

Sure, statistically you're still descending the slope of a network that's good at each class, but that only holds when your classes, and thus the "pull effects", are balanced; it's not an intrinsic ability of the network to extract differentiating features.
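And a hedged illustration of the balance point (a toy two-class stream with a hypothetical 90/10 imbalance, not anything from the thread): averaging that same logit gradient over many examples, the rare class accumulates a net downward pull, whereas a balanced stream would let the per-class pulls roughly cancel.

```python
# Toy sketch (hypothetical 90/10 class imbalance): accumulate the softmax
# cross-entropy gradient over an imbalanced stream and the rare class's
# logit ends up with a net downward pull.
import numpy as np

rng = np.random.default_rng(1)
num_classes, num_features, steps = 2, 8, 1000
W = rng.normal(size=(num_classes, num_features))
accum = np.zeros(num_classes)

for _ in range(steps):
    target = 0 if rng.random() < 0.9 else 1  # 90% class 0, 10% class 1
    x = rng.normal(size=num_features)
    logits = W @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[target] -= 1.0
    accum += grad

# Average pull per logit: roughly [-0.4, +0.4] here, i.e. the rare class
# (index 1) is pushed down on net, the common class is pushed up on net.
print(accum / steps)
```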
