Unlikely-Video-663 t1_j284flc wrote

In CNNs you usually already have long-range dependencies channel-wise, since a standard convolution mixes all input channels at each position. IMHO one of the advantages of ViT is that it allows long-range spatial information flow as well.

So channel-wise tokenization would probably not improve upon CNNs... maybe?
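
To make that concrete, here is a rough NumPy sketch, not taken from any particular model. The sizes, the random weights, the helper names (`conv3x3`, `spatial_self_attention`, `influence_mask`) and the identity Q/K/V projections are all assumptions picked for illustration. It nudges a single input value and checks which output positions change under a plain 3x3 convolution versus single-head self-attention over spatial tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 7, 7  # illustrative sizes (assumption)


def conv3x3(x, w):
    """Dense 3x3 convolution with zero padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], x.shape[1], x.shape[2]))
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            patch = xp[:, i:i + 3, j:j + 3]  # (C_in, 3, 3)
            # Every output channel sums over ALL input channels.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def spatial_self_attention(x):
    """Single-head attention over the H*W spatial tokens.
    Q, K, V are the tokens themselves (identity projections, an assumption)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T               # (N, C)
    scores = tokens @ tokens.T / np.sqrt(c)      # (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ tokens).T.reshape(c, h, w)


def influence_mask(f, x, c, i, j):
    """Boolean (H, W) map of output positions that change
    when x[c, i, j] is perturbed."""
    x2 = x.copy()
    x2[c, i, j] += 1.0
    return np.abs(f(x2) - f(x)).sum(axis=0) > 1e-12


x = rng.normal(size=(C, H, W))
w = rng.normal(size=(C, C, 3, 3))

conv_mask = influence_mask(lambda t: conv3x3(t, w), x, 0, 3, 3)
attn_mask = influence_mask(spatial_self_attention, x, 0, 3, 3)

# Channel mixing: perturbing channel 0 moves every output channel at (3, 3).
x_pert = x.copy()
x_pert[0, 3, 3] += 1.0
channels_changed = np.abs(conv3x3(x_pert, w) - conv3x3(x, w))[:, 3, 3] > 0

print("conv positions affected:     ", int(conv_mask.sum()))
print("attention positions affected:", int(attn_mask.sum()))
print("conv output channels affected:", int(channels_changed.sum()))
```

In this toy setup the convolution's influence stays inside the 3x3 neighbourhood but reaches every output channel, while the attention layer reaches every spatial position in a single step. That is the asymmetry the comment is pointing at.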