Submitted by aleguida t3_yde1q8 in MachineLearning
I have been playing around with both TF and PyTorch for a while, and I noticed that PyTorch generally gives me better results than TensorFlow on a simple binary classification task. Baffled by this, I tried to investigate a little further with a simple comparison:
I made two Colab notebooks trying to solve the very same binary classification problem (cats vs dogs) with both frameworks. As far as I can tell, I kept the two models' architectures as similar as possible, relying on pre-trained VGG16 weights and allowing training on all the layers. The following plots show that PyTorch reaches top performance in just one epoch, while TF is not even close after 10 epochs.
The learning rate and optimizer are the same. The VGG16 architecture seems a little different between the two frameworks, with a different number of parameters. Am I missing something obvious?
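A quick way to quantify the architecture mismatch is to count the parameters directly in each framework. Rough sketch (assuming a recent torchvision with the `weights=` API and the standard Keras application, not the exact notebook code):

```python
import torch
from torchvision import models
import tensorflow as tf

# torchvision's VGG16 with ImageNet weights
pt_model = models.vgg16(weights="IMAGENET1K_V1")
pt_params = sum(p.numel() for p in pt_model.parameters())

# Keras's VGG16 with ImageNet weights
tf_model = tf.keras.applications.VGG16(weights="imagenet")
tf_params = tf_model.count_params()

print(f"torchvision VGG16 parameters: {pt_params:,}")
print(f"Keras VGG16 parameters:       {tf_params:,}")
```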
Plot legend:
- training = blue line
- validation = orange line
- TF colab link: https://colab.research.google.com/drive/1YeOlEGNJXXWJ2bkY1kk2iMJ4JCJwpsKm?usp=sharing
- Pytorch colab link: https://colab.research.google.com/drive/1nSAuyd9x7WAfA3FkwD6NTyBOzuBvGiRQ#scrollTo=Yf22Fq7CyB3F
EDIT 1
On closer inspection, the PyTorch VGG16 I'm using has batch normalization layers while the TF one does not. There is an alternative pretrained VGG16 in torchvision that doesn't use BN, so I will try that one instead.
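For reference, a minimal sketch of the swap in torchvision (assuming the newer `weights=` API): `vgg16_bn` is the batch-normalized variant, while the plain `vgg16` should be the closer match to `tf.keras.applications.VGG16`.

```python
from torchvision import models

# Batch-normalized variant of VGG16
vgg_bn = models.vgg16_bn(weights="IMAGENET1K_V1")

# Plain VGG16 without batch norm -- closer to tf.keras.applications.VGG16
vgg_plain = models.vgg16(weights="IMAGENET1K_V1")
```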
TiredOldCrow t1_its89ot wrote
Since you're using different pre-trained VGG16 models as a starting point, you may just be demonstrating that the PyTorch torchvision model is more amenable to your combination of hyperparameters than the TensorFlow one.
Ideally, for this kind of comparison you'd use the exact same pretrained model architecture and weights as a starting point. Maybe look for a set of weights that has been ported to both PyTorch and TensorFlow?
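A rough sketch of what porting the weights yourself could look like (assumptions: recent torchvision/Keras APIs; this only copies the conv layers, since the fully connected weights would also need reordering because Keras flattens NHWC activations while PyTorch flattens NCHW):

```python
import torch
from torchvision import models
import tensorflow as tf

# Same architecture in both frameworks: plain VGG16, no batch norm
keras_vgg = tf.keras.applications.VGG16(weights="imagenet")
torch_vgg = models.vgg16(weights=None)  # random init; the convs are overwritten below

# Collect the conv layers from each model, in order
keras_convs = [l for l in keras_vgg.layers if isinstance(l, tf.keras.layers.Conv2D)]
torch_convs = [m for m in torch_vgg.features if isinstance(m, torch.nn.Conv2d)]

with torch.no_grad():
    for k_layer, t_layer in zip(keras_convs, torch_convs):
        kernel, bias = k_layer.get_weights()
        # Keras conv kernels are (kh, kw, in, out); PyTorch expects (out, in, kh, kw)
        t_layer.weight.copy_(torch.from_numpy(kernel.transpose(3, 2, 0, 1)))
        t_layer.bias.copy_(torch.from_numpy(bias))
```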