Submitted by aleguida t3_yde1q8 in MachineLearning
I have been playing around with both TF and PyTorch for a while, and I noticed that PyTorch generally gives me better results than TensorFlow on a simple binary classification task. Baffled by this, I tried to investigate a little further with a simple comparison:
I made two Colab notebooks trying to solve the very same binary classification problem (cats vs dogs) with both frameworks. As far as I can tell, I kept the two models' architectures as similar as possible, relying on pre-trained VGG16 weights and allowing training on all the layers. The following plots show that PyTorch reaches top performance in just one epoch, while TF is not even close after 10 epochs.
The learning rate and optimizer are the same. The VGG16 architecture seems a little different between the two frameworks, with a different number of parameters. Am I missing something obvious?
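A quick way to quantify the architecture mismatch is to count the parameters directly in each framework. Rough sketch (assuming a recent torchvision with the `weights=` API and the standard Keras application, not the exact notebook code):

```python
import torch
from torchvision import models
import tensorflow as tf

# torchvision's VGG16 with ImageNet weights
pt_model = models.vgg16(weights="IMAGENET1K_V1")
pt_params = sum(p.numel() for p in pt_model.parameters())

# Keras's VGG16 with ImageNet weights
tf_model = tf.keras.applications.VGG16(weights="imagenet")
tf_params = tf_model.count_params()

print(f"torchvision VGG16 parameters: {pt_params:,}")
print(f"Keras VGG16 parameters:       {tf_params:,}")
```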
Plot legend:
- training = blue line
- validation = orange line
- TF colab link: https://colab.research.google.com/drive/1YeOlEGNJXXWJ2bkY1kk2iMJ4JCJwpsKm?usp=sharing
- Pytorch colab link: https://colab.research.google.com/drive/1nSAuyd9x7WAfA3FkwD6NTyBOzuBvGiRQ#scrollTo=Yf22Fq7CyB3F
EDIT 1
On closer inspection, the PyTorch VGG16 I'm using has batch normalization layers while the TF one does not. There is an alternative pretrained VGG16 in torchvision that doesn't use BN, so I will try that one instead.
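For reference, a minimal sketch of the swap in torchvision (assuming the newer `weights=` API): `vgg16_bn` is the batch-normalized variant, while the plain `vgg16` should be the closer match to `tf.keras.applications.VGG16`.

```python
from torchvision import models

# Batch-normalized variant of VGG16
vgg_bn = models.vgg16_bn(weights="IMAGENET1K_V1")

# Plain VGG16 without batch norm -- closer to tf.keras.applications.VGG16
vgg_plain = models.vgg16(weights="IMAGENET1K_V1")
```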
TiredOldCrow t1_its89ot wrote
Since you're using different pre-trained VGG16 models as a starting point, you may just be demonstrating that the PyTorch torchvision model is more amenable to your combination of hyperparameters than the TensorFlow one.
Ideally, for this kind of comparison you'd use the exact same pretrained model architecture and weights as a starting point. Maybe look for a set of weights that has been ported to both PyTorch and TensorFlow?
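A rough sketch of what porting the weights yourself could look like (assumptions: recent torchvision/Keras APIs; this only copies the conv layers, since the fully connected weights would also need reordering because Keras flattens NHWC activations while PyTorch flattens NCHW):

```python
import torch
from torchvision import models
import tensorflow as tf

# Same architecture in both frameworks: plain VGG16, no batch norm
keras_vgg = tf.keras.applications.VGG16(weights="imagenet")
torch_vgg = models.vgg16(weights=None)  # random init; the convs are overwritten below

# Collect the conv layers from each model, in order
keras_convs = [l for l in keras_vgg.layers if isinstance(l, tf.keras.layers.Conv2D)]
torch_convs = [m for m in torch_vgg.features if isinstance(m, torch.nn.Conv2d)]

with torch.no_grad():
    for k_layer, t_layer in zip(keras_convs, torch_convs):
        kernel, bias = k_layer.get_weights()
        # Keras conv kernels are (kh, kw, in, out); PyTorch expects (out, in, kh, kw)
        t_layer.weight.copy_(torch.from_numpy(kernel.transpose(3, 2, 0, 1)))
        t_layer.bias.copy_(torch.from_numpy(bias))
```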