
Ananth_A_007 OP t1_izq8op0 wrote

But if we use a 1x1 conv with stride 2, aren't we just skipping most of the information (three out of every four positions, with stride 2 in both dimensions) without even looking at it? Like, at least in max pooling, the filter sees all the pixels before shrinking the dimensions.
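To make the difference concrete, here's a quick PyTorch sketch I put together (the identity weight is just there to expose the sampling pattern):

```python
import torch
import torch.nn as nn

x = torch.arange(16.0).reshape(1, 1, 4, 4)  # one 4x4 feature map, values 0..15

# 1x1 conv, stride 2: each output reads exactly one input pixel, so 3 out
# of every 4 spatial positions are never touched.
conv = nn.Conv2d(1, 1, kernel_size=1, stride=2, bias=False)
with torch.no_grad():
    conv.weight.fill_(1.0)    # identity weight, to expose the sampling pattern
    print(conv(x).squeeze())  # tensor([[ 0.,  2.], [ 8., 10.]])

# 2x2 max pooling, stride 2: every input pixel is compared before the
# spatial dimensions are halved.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).squeeze())      # tensor([[ 5.,  7.], [13., 15.]])
```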


IntelArtiGen t1_izqc26r wrote

The information you have before a layer is conditioned by how it goes into that layer. At first, what goes into that layer is noise; the weights then change according to the loss, so that the information passing through the layer reduces the loss and becomes something meaningful.

So the question would be: is it better for information processing in the neural network to compare 2x2 values and take the max, or is it better to train the network so that it puts the correct information into one of the 2x2 positions and always keeps that one?

I think the answer depends on the dataset, the model and the training process.

And I think the point of that layer isn't necessarily to look at everything, but just to shrink the dimensions without losing too much information. Perhaps looking at everything isn't required to keep enough information.
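Both options are drop-in ways to halve the spatial dimensions. A minimal PyTorch sketch (the channel count and shapes are just illustrative):

```python
import torch
import torch.nn as nn

def downsample(kind: str, channels: int = 64) -> nn.Module:
    """Two interchangeable ways to halve the spatial dimensions."""
    if kind == "pool":
        # Parameter-free: compares every 2x2 neighborhood, keeps the max.
        return nn.MaxPool2d(kernel_size=2, stride=2)
    # Learned: gradient descent decides what the kept positions contain.
    return nn.Conv2d(channels, channels, kernel_size=1, stride=2)

x = torch.randn(8, 64, 32, 32)
for kind in ("pool", "conv"):
    print(kind, tuple(downsample(kind)(x).shape))  # both give (8, 64, 16, 16)
```

The pooling version costs no parameters; the strided 1x1 conv spends parameters so that earlier layers can learn to route the important information into the positions the stride keeps.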
