arg_max t1_izpadl8 wrote on December 10, 2022 at 8:44 PM

I think the most prominent use case in CNNs is as a very simple, localised and fast operation that changes the number of channels without touching the spatial dimensions. Each output pixel depends only on the channels of the input pixel at the same location.
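To make the "changes channels, leaves H and W alone" point concrete, here is a minimal pure-Python sketch of a 1x1 conv as a per-pixel linear map over channels. The function name `conv1x1`, the nested-list layout and the toy weights are my own choices for illustration (no bias, no batch dimension), not how any framework implements it.

```python
def conv1x1(x, w):
    """Apply a 1x1 convolution without bias.

    x: input as nested lists with shape [C_in][H][W]
    w: weights as nested lists with shape [C_out][C_in]
    Returns nested lists with shape [C_out][H][W].
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            # Each output pixel mixes only the channels at the same (i, j).
            [sum(w_row[c] * x[c][i][j] for c in range(c_in)) for j in range(width)]
            for i in range(height)
        ]
        for w_row in w
    ]


# Toy input: 3 channels, 2x2 spatial grid.
x = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
# 2 output channels: the first sums all channels, the second is ch0 - ch2.
w = [[1, 1, 1], [1, 0, -1]]

y = conv1x1(x, w)
out_shape = (len(y), len(y[0]), len(y[0][0]))
print(out_shape)  # (2, 2, 2): channels went 3 -> 2, spatial size unchanged
```

In a real framework this would be an ordinary conv layer with kernel size 1; the loop above is only meant to show that no neighbouring pixels are involved.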

For example, deep ResNets have a bottleneck design. The input is something like an N×256×H×W tensor (N is the batch size, H and W are the spatial dimensions) with 256 channels. To save compute/memory, we might not want to actually run the 3x3 conv on all 256 channels. Thus we use a 1x1 conv first to reduce the number of channels from 256 to 64. On this smaller tensor, we then apply a 3x3 conv that doesn't change the number of channels. Finally, we use another 1x1 conv to go back from 64 to 256 channels. So here the first 1x1 conv decreases the number of channels while the second one restores the output to the original shape with 256 channels.
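A rough back-of-the-envelope check of the saving, using the 256 -> 64 -> 64 -> 256 numbers above. The helper `conv_weights` is hypothetical, and the comparison baseline (a single 3x3 conv going 256 -> 256) is my assumption about what the bottleneck replaces. Biases, batch norm and the residual add are ignored.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k conv layer (no bias)."""
    return c_in * c_out * k * k


# Bottleneck: 1x1 reduce, 3x3 on the narrow tensor, 1x1 expand.
bottleneck = (
    conv_weights(256, 64, 1)    # 256 -> 64
    + conv_weights(64, 64, 3)   # 64 -> 64
    + conv_weights(64, 256, 1)  # 64 -> 256
)

# Assumed baseline: one 3x3 conv applied directly on all 256 channels.
direct = conv_weights(256, 256, 3)

print(bottleneck, direct, round(direct / bottleneck, 2))
# 69632 589824 8.47
```

If the spatial size stays the same through the block, the multiply-adds per output pixel scale with the same numbers, so the compute saving is about the same factor as the weight saving.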