Submitted by TensorDudee t3_zloof9 in MachineLearning
Hello Everyone 👋,
I just implemented the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale", popularly known as the Vision Transformer (ViT) paper. It uses a pure Transformer encoder for image recognition: the image is split into fixed-size 16x16 patches, which are treated as a sequence of tokens. It achieves state-of-the-art performance without any convolutional layers, provided a sufficiently large dataset and enough compute.
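To give a feel for the core idea, here's a rough sketch of the patch-embedding step in TensorFlow/Keras (a simplified illustration, not the exact code from my repo): a Conv2D whose kernel size and stride both equal the patch size is equivalent to cutting the image into non-overlapping 16x16 patches and linearly projecting each one. Names like `PatchEmbedding`, `patch_size`, and `embed_dim` are just illustrative.

```python
import tensorflow as tf

class PatchEmbedding(tf.keras.layers.Layer):
    """Split an image into fixed-size patches and project each patch to an embedding."""

    def __init__(self, patch_size=16, embed_dim=768, **kwargs):
        super().__init__(**kwargs)
        # Conv2D with kernel == stride == patch_size acts as a per-patch linear projection.
        self.projection = tf.keras.layers.Conv2D(
            filters=embed_dim, kernel_size=patch_size, strides=patch_size)

    def call(self, images):
        # images: (batch, height, width, channels)
        patches = self.projection(images)  # (batch, H/patch, W/patch, embed_dim)
        shape = tf.shape(patches)
        # Flatten the spatial grid into a sequence of patch tokens.
        return tf.reshape(patches, (shape[0], shape[1] * shape[2], shape[3]))

# Example: a 224x224 RGB image becomes 14*14 = 196 patch tokens of size 768,
# which (plus a class token and position embeddings) feed a standard Transformer encoder.
x = tf.random.normal((1, 224, 224, 3))
tokens = PatchEmbedding()(x)
print(tokens.shape)  # (1, 196, 768)
```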
Below I am sharing my implementation of this paper; please have a look and give it a 🌟 if you like it. The implementation provides easy-to-read code for understanding how the model works internally.
My implementation: GitHub Link
Thanks for your attention. 😀
MOSFETBJT t1_j078ma9 wrote
Thanks dude. TensorFlow gets a lot of hate on this sub, but I think part of it is people memeing.