Dependent_Change_831 t1_j1hc9x4 wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
You’re getting a lot of hate but IMO you’re totally right. Python may be convenient short term but it really does not scale.
I’ve been working in ML for around 8 years now, and I’ve just joined a new project where we’re training models on billions of pieces of dialog for semantic parsing. It took us weeks to fix a bug in the way torch and our code interact with multiprocessing…
There are memory leaks caused by using Python objects like lists in dataset objects, but only if you use DistributedDataParallel (or a library like DeepSpeed, i.e. multiple processes)…
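To make the failure mode concrete: this is usually not a true leak but CPython refcounting defeating copy-on-write after `fork`. Every worker process that merely *reads* a Python object writes to its refcount, dirtying the memory page it lives on, so a big list of small objects is gradually copied into every worker. A minimal sketch of the pattern (hypothetical class names, not the commenter's actual code; a flat buffer like a numpy array avoids per-item refcount writes):

```python
import numpy as np

class ListBackedDataset:
    """Stores samples as a Python list: every __getitem__ touches a
    PyObject, and the refcount write dirties the page it sits on,
    so forked DataLoader workers slowly duplicate the whole list."""
    def __init__(self, samples):
        self.samples = list(samples)        # millions of small PyObjects

    def __getitem__(self, idx):
        return self.samples[idx]

class ArrayBackedDataset:
    """Stores samples in one numpy array: indexing reads raw bytes
    out of a single buffer, so shared pages stay clean after fork."""
    def __init__(self, samples):
        self.samples = np.asarray(samples)  # one contiguous buffer

    def __getitem__(self, idx):
        return self.samples[idx]

ds = ArrayBackedDataset(range(10))
print(ds[3])
```

Both classes return the same items; the difference only shows up as per-worker resident memory growth under `DistributedDataParallel` or any multi-process loader.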
Loading and saving our custom data format requires calling into our own C++ code to avoid waiting hours to deserialize data for every training run…
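The underlying reason dropping to C++ helps is that deserializing into per-item Python objects pays an allocation and interpreter cost per element, while a fixed-layout binary format can be loaded as one bulk read. A small Python-only illustration of that gap (hypothetical file names; the commenter's custom format and C++ loader are not shown in the source):

```python
import os
import pickle
import tempfile

import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)

with tempfile.TemporaryDirectory() as d:
    raw = os.path.join(d, "data.bin")   # fixed-layout binary blob
    pkl = os.path.join(d, "data.pkl")   # per-object Python structures

    arr.tofile(raw)                     # one contiguous write
    with open(pkl, "wb") as f:
        pickle.dump(arr.tolist(), f)    # a million boxed Python ints

    # Loading the raw buffer is effectively a single memcpy...
    loaded = np.fromfile(raw, dtype=np.int64)
    # ...while unpickling rebuilds every object in the interpreter.
    with open(pkl, "rb") as f:
        relisted = pickle.load(f)

assert int(loaded[123]) == 123 and relisted[123] == 123
```

At billions of records the per-object path is what turns a training-run startup into hours; `np.memmap` or a native loader bound via pybind11 are common ways out.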
Wish I could say there’s a better alternative now (due to existing community resources) but we can hope for the future.