pommedeterresautee OP t1_itv3bu7 wrote

Yeah, it doesn't make sense to me either. I was also expecting a somewhat better speedup, compared to the ones shared on the PyTorch dev forum. I tried several combinations of params (enabling the optimizations that were disabled), but they were either broken (e.g. the matmul ops template) or made things slower.

Scripts are here:

Let me know if you find something suspicious.