Submitted by chaotycmunkey t3_11qwzb6 in MachineLearning

Hello!

I am working on comparing several models, a few of which are implemented in PyTorch and the rest in TensorFlow (some in 1.x and others in 2.x). I know that if they are implemented well, one should be able to simply compare their graphs/performance regardless of the framework. But there are often subtle differences in the implementations (both within the frameworks themselves and in the way the model code uses them) that make it hard to trust the training. Some models are from official sources, so I'd rather not audit much of their code before using them. And of course, I don't want to reimplement all of them in a single framework unless I have to.

If you have come across this problem, how have you dealt with it? Are there specific tests you would run to make sure the loss curves can be compared? How would you approach this, other than finding someone else's implementation of, say, a TF model in PyTorch and verifying against it?
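To make the question concrete, here is a rough sketch of the kind of check I have in mind: pull predictions out of each framework into NumPy and recompute one metric with a single shared definition, so the comparison no longer depends on each framework's built-in loss reduction. The model handles (`torch_model`, `tf_model`) and the classification setup are just placeholders, not any of the actual models.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    # One shared definition: mean per-sample negative log-likelihood.
    eps = 1e-12
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def eval_torch(torch_model, x_np, y_np):
    import torch
    torch_model.eval()
    with torch.no_grad():
        logits = torch_model(torch.from_numpy(x_np).float())
        probs = torch.softmax(logits, dim=-1).cpu().numpy()
    return cross_entropy(probs, y_np)

def eval_tf(tf_model, x_np, y_np):
    import tensorflow as tf
    logits = tf_model(x_np, training=False)
    probs = tf.nn.softmax(logits, axis=-1).numpy()
    return cross_entropy(probs, y_np)

# If both numbers come from the same held-out data and the same metric
# function, differences in each framework's internal reductions drop out.
```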

Sincerely, A man in crisis.

6

Comments

cthorrez t1_jc5s8ag wrote

Basically I would just make sure the metrics being compared are computed the same way: same numerator and denominator, i.e. summing vs. averaging, and over a batch vs. over the whole epoch. If the datasets are the same and the metric is defined the same way, the numbers are comparable.

The implementation details just become part of the comparison.
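As a toy illustration of the reduction point (made-up numbers, not from any of the models in question): averaging per-batch means is not the same as a per-sample epoch average once batch sizes differ, e.g. with a partial last batch.

```python
import numpy as np

# 100 per-sample losses, split into batches of 32/32/32/4 (partial last batch).
rng = np.random.default_rng(0)
per_sample = rng.uniform(0.1, 2.0, size=100)
batches = np.split(per_sample, [32, 64, 96])

mean_of_batch_means = np.mean([b.mean() for b in batches])  # what many training loops log
epoch_mean = per_sample.mean()                              # sample-weighted epoch average

print(mean_of_batch_means, epoch_mean)  # slightly different numbers for the "same" loss
```

So before comparing curves across frameworks, it's worth checking which of the two quantities each implementation actually logs.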

9

BellRock99 t1_jc7fsag wrote

Trust the implementation, or simply use the metrics reported in their papers on the standard datasets. The latter is the safer choice, in my opinion, since even your own implementation could be wrong.

1