Submitted by Ananth_A_007 t3_zgpmtn in MachineLearning
I am aware that a 1x1 convolution is needed for separable convolution, but when else is it useful? I see it used in MobileNetV2 before the depthwise separable convolution in the bottleneck, but I'm not sure why. I also see it used with stride 2 where max pooling could be used instead. Could someone please explain the logic behind this? Thanks.
IntelArtiGen t1_izipwih wrote
1x1 convolutions are practical when you need to change the channel dimension of a tensor. If you have a tensor of shape (B, H, W, 128), you can use a 1x1 convolution to get a tensor of shape (B, H, W, 64) without losing too much information.
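For example, here is a minimal Keras sketch of that channel reduction (the shapes and filter counts are just illustrative):

```python
import tensorflow as tf

x = tf.random.normal([8, 32, 32, 128])              # (B, H, W, C) = (8, 32, 32, 128)

# 1x1 conv: learns a per-pixel linear mix of the 128 channels down to 64
conv1x1 = tf.keras.layers.Conv2D(filters=64, kernel_size=1)
y = conv1x1(x)

print(y.shape)   # (8, 32, 32, 64) -- spatial dims unchanged, channels halved
```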
You can use a 1x1 convolution with stride 2 in place of max pooling, depending on your constraints. It can perform better because the downsampling is learned, but it can also be more computationally expensive or take extra memory you don't have, since it adds parameters where max pooling has none.
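Rough sketch of the two downsampling options side by side (again, the channel counts are arbitrary):

```python
import tensorflow as tf

x = tf.random.normal([8, 32, 32, 64])

# Learned downsampling: 1x1 conv with stride 2 (adds 64*64 + 64 parameters)
down_conv = tf.keras.layers.Conv2D(64, kernel_size=1, strides=2)
# Parameter-free downsampling: 2x2 max pooling
down_pool = tf.keras.layers.MaxPooling2D(pool_size=2)

print(down_conv(x).shape)  # (8, 16, 16, 64)
print(down_pool(x).shape)  # (8, 16, 16, 64)
```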
For MobileNetV2 I think you're talking about the inverted residual / linear bottleneck? The point of that block is to expand the information, process it, and then compress it again, and it's also a residual block. Because a 1x1 convolution lets you expand and compress a tensor cheaply, you can use it for those expansion/compression steps and to reshape the tensor so it can be added back as the residual. It seems that "expand / process (depthwise) / compress / residual" needs fewer parameters for the same result than the usual "process / process / process", or even the "process / residual ..." pattern in ResNet. However, it isn't necessarily easier for the network to learn, so training might take longer while still being more parameter efficient.
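A rough sketch of what that block looks like, just to make the role of the two 1x1 convs concrete (the real MobileNetV2 block also handles stride-2 variants and specific expansion factors; this is only the stride-1 case):

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, expansion=6, out_channels=None):
    """MobileNetV2-style block: expand (1x1) -> depthwise 3x3 -> compress (1x1) -> residual."""
    in_channels = x.shape[-1]
    out_channels = out_channels or in_channels

    # 1x1 "expansion" conv: cheaply grow the channel dimension
    h = layers.Conv2D(expansion * in_channels, 1, use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)

    # 3x3 depthwise conv: processes each channel spatially with very few parameters
    h = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)

    # 1x1 "projection" conv back down, with no activation (the linear bottleneck)
    h = layers.Conv2D(out_channels, 1, use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    # Residual connection only when input and output shapes match
    if out_channels == in_channels:
        h = layers.Add()([x, h])
    return h

inputs = tf.keras.Input(shape=(32, 32, 24))
outputs = inverted_residual(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```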
If you're working on new neural network architectures, you have to be able to manipulate tensors of different shapes, and a 1x1 convolution is essentially the tool that lets you change the channel dimension of a tensor while keeping most of the information.