
pommedeterresautee OP t1_itv3bu7 wrote

Yeah, it doesn't make sense to me either. I was also expecting a somewhat better speedup, compared to the numbers shared on the PyTorch dev forum. I tried several combinations of params (enabling the disabled optimizations), but they were either broken (e.g. the matmul ops template) or made things slower.

Scripts are here: https://github.com/ELS-RD/kernl/tree/main/experimental/benchmarks

Let me know if you find something suspicious.
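
For anyone who wants to repeat that kind of sweep, here is a minimal sketch of how each combination could be sorted into "faster", "slower", or "broken" against a baseline. It is not taken from the linked scripts: `median_latency`, `sweep`, and the labels are hypothetical names, and each candidate is assumed to be a zero-argument callable that runs the model under one setting.

```python
import statistics
import time


def median_latency(fn, warmup=3, iters=20):
    """Median wall-clock seconds per call of fn(), measured after warmup runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sweep(baseline, candidates, **kwargs):
    """Classify each candidate as 'faster', 'slower', or 'broken' vs. baseline.

    `candidates` maps a label (hypothetical, e.g. one flag combination) to a
    zero-argument callable that runs the model with that setting applied.
    """
    base = median_latency(baseline, **kwargs)
    report = {}
    for label, fn in candidates.items():
        try:
            t = median_latency(fn, **kwargs)
        except Exception as exc:  # compilation or runtime failure
            report[label] = ("broken", repr(exc))
            continue
        # Speedup > 1 means the candidate beat the baseline.
        speedup = base / t if t > 0 else float("inf")
        report[label] = ("faster" if t < base else "slower", speedup)
    return report
```

On GPU, each callable should wait for the device to finish (for example with `torch.cuda.synchronize()`) before returning, otherwise the timer only captures the kernel launch.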
