Submitted by vprokopev t3_zqo8jm in MachineLearning

Every time I try to implement something I have to make sure I never use loops and I only use Pytorch/tf tensors.

If I want to have efficient code, I must kind of abandon Python and only use data structures and operations that are provided by those frameworks.

Every time I have a solution in my head, I must think about how to implement it using ONLY the framework, and not the programming language (Python).

We basically constrain ourselves to those limited operations that someone implemented in C++ for us. This makes things harder, not easier.

We are not programming in Python at all. We use a language within a language that really constrains us.

Why not just move to C++ or something new like Rust/Go?

0

Comments

dumbmachines t1_j0zlr39 wrote

If you're using pytorch, what's stopping you from using the C++ api? Seems like it is exactly what you are asking for.

48

vprokopev OP t1_j0zoeub wrote

Nothing but the fact that everywhere I go to work they have most of ML code base in python so I have to use it.

−14

moist_buckets t1_j0zq3wt wrote

Because development time is fast and far more people have experience with Python than C++.

51

Exarctus t1_j0zqq3b wrote

The vast majority of PyTorch function calls are implemented in either CUDA C or OpenMP-parallelized C++.

Python is only used as a front end. Very little of the computational workload is done by the Python interpreter.

Additionally, the C++ API for PyTorch is very much in the same style as the Python API. Obviously you have some additional flexibility in how you optimize your code, but the tensor-based operations are the same.

PyTorch also makes it trivially easy to write optimized CUDA C code and provide python bindings to it so you can make use of it with faster development time in python, while retaining the computational benefits of C/C++/CUDA C for typical workloads.

33

vprokopev OP t1_j0zrven wrote

I understand this. This is not answering my question.

Using python front end I must implement any algorithm I have in mind in terms of vectorized pytorch operations. I can't use loops and indexing and other python libraries, or my code will be slow and only executed with 1 core.

How is that supposed to make my job easier?

−16

ok531441 t1_j0zsvqr wrote

It’s easier than the alternatives. If you don’t think it is, use whatever you think is better. You’ll either solve your problem and find that better language or you’ll learn why Python is used so much.

18

dumbmachines t1_j0zsn0r wrote

The alternative is writing your own cuda code or C++. Fortunately for you pytorch is pretty easily extendable. If you have something that needs to be done quickly, why not write a cpp extension?

10

Exarctus t1_j0ztwve wrote

I’ve not encountered many situations where I cannot use existing vectorized PyTorch indexing operations to do complicated masking or indexing etc, and I’ve written some pretty complex code bases during my PhD.

Alternatively you could write your code in C++/CUDA C however you like and provide PyTorch bindings to include it in your python workflow.

6

float16 t1_j10sam3 wrote

OK guys, you can chill with the downvotes. They're just asking questions.

As mentioned elsewhere, Python does not do much work; the important parts are in CUDA. So if you used some other language such as C++, you still can't write loops, and you still have to use the framework's data structures.

5

Dependent_Change_831 t1_j1hc9x4 wrote

You’re getting a lot of hate but IMO you’re totally right. Python may be convenient short term but it really does not scale.

I’ve been working in ML for around 8 years now and I’ve just joined a new project where we’re training models on billions of pieces of dialog for semantic parsing, and it took us weeks to fix a bug in the way torch and our code would interact with multiprocessing…

There are memory leaks caused by using Python objects like lists in dataset objects, but only if you use DistributedDataParallel (or a library like DeepSpeed, i.e. multiple processes)…

Loading and saving our custom data format requires calling into our own C++ code to avoid waiting hours to deserialize data for every training run…

Wish I could say there’s a better alternative now (due to existing community resources) but we can hope for the future.

2

vprokopev OP t1_j1j0xs5 wrote

Thank you for sharing your experience!

My intuition is that C++ Python extensions make it easier to do easy things (than in C++) but make it harder to do hard things.

People always go for convenience first and then move to something more fundamental and flexible.

Data Science was mostly in R and MATLAB about 12-15 years ago. Then people moved to more general python. Next step is a compiled language with static types imo.

1

30katz t1_j11v4wa wrote

Maybe you’d take all this free software and make it easier for others in the future?

1

Featureless_Bug t1_j10vwsx wrote

What the hell do you want, mate? Everyone uses Python because it is easier to use Python as a front end in ML. And if you ever need to customize something heavy, you just write it in C++ or Rust and call it from Python.

If you don't think it is easier than writing everything in C++ or Rust (which is braindead, btw, any compiled language is a terrible choice for ML experimenting), then do it - no one is stopping you.

5

fastglow t1_j0zlqhj wrote

  1. Using libraries available in a given language does not mean you are not programming in that language.
  2. Python is a very popular, flexible, and easy language with a huge dev community. This drives development of ML tools for Python, which in turn increases adoption of Python for ML.

22

vprokopev OP t1_j0zo91k wrote

  1. But you have to use vectorized operations instead of loops, pytorch functions for everything you want to be fast etc... This is different from just using a library. It's ONLY using the library
  2. There are better languages that are also flexible and popular. And python is not so flexible anyway (GIL).

−14

suflaj t1_j0zq31h wrote

What if I told you that even if you were using C/C++, you'd still need to be using library functions? Because the code, ultimately, doesn't run natively, it calls Fortran, Assembly and CUDA libraries.

You cannot directly program in whatever CUDA compiles to because it's proprietary and GPU model-specific, so why bother? Researchers chose Python not because they like snakey-boys or enjoy British comedy, they chose it because it is adequate to do research in, unlike C or C++, which are horrible to work with and too hideous to read and understand even if a pro writes them, let alone some researcher.

Ultimately Python code is easier to maintain and work on, and there are more developers as opposed to C/C++, so of course companies will use it over whatever C++ API exists for DL libraries.

As for your Rust/Go question, although Go has some potential it has no community to work with. It is also harder to use than Python. There is almost no benefit of using Go over Python even if the decision was to be made now, let alone transfer, other than Go's nice concurrency model. Now, why would you use that when from joblib import delayed, Parallel does the trick? So far, the biggest problem Python has with concurrency is its lack of a good shared memory API, which is probably going to be fixed in a year or so now that it is part of Python. But this lack of API does not significantly impact Python, because you'd do this kind of stuff via a C module anyways.
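The shared-memory API referred to above is presumably `multiprocessing.shared_memory` from the standard library (added in Python 3.8); that reading is an assumption. A minimal sketch of how it is used follows. The helper name `round_trip` is hypothetical, and for brevity the second handle is attached in the same process rather than in a real worker.

```python
from multiprocessing import shared_memory


def round_trip(values):
    """Write small ints (0-255) into a shared block and read them back by name."""
    count = len(values)
    # Create a named block with one byte per value.
    block = shared_memory.SharedMemory(create=True, size=count)
    try:
        block.buf[:count] = bytes(values)
        # Attach a second handle using only the block's name, which is
        # all a worker process would need to see the same bytes.
        view = shared_memory.SharedMemory(name=block.name)
        try:
            return list(view.buf[:count])
        finally:
            view.close()
    finally:
        block.close()
        block.unlink()  # release the block once nobody needs it
```

In a real program the name would be passed to worker processes, which attach to the block instead of receiving a pickled copy of the data.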

As for Rust it will probably never become a serious contender for research and development because it is even more hideous and complex than C/C++ are. It is also slower, so, what's the point? Unless you want to double the average wages of people in DL and kill 90% of jobs since barely anyone can use Rust effectively.

15

vprokopev OP t1_j0zqzxm wrote

Again, in C++ I am not so much constrained to only use Pytorch functions. I can use other libraries and native features.

In python I basically must express any algorithm I have in my head in terms of vectorized pytorch operations with broadcasting. Not the case in C++. Am I wrong here?

I am not talking about researchers, I am talking more about businesses. No problem with researchers using python.

−6

suflaj t1_j0ztusm wrote

> Not the case in C++. Am I wrong here?

Probably. It seems you "have" to do these things because you want speed. But if you want speed, then you'll have to do them in C++ as well.

> I am not talking about researchers, I am talking more about businesses.

This applies to businesses more than anything. Your manager does not give a single fuck about the purity and the performance of your code before it's deployed. Until then the only thing that matters is that the product is developed before your competitors get the contracts, for as low a cost as possible.

And when code is deployed, it will often not even be in C++. A lot of the time you have to port it to C because there are no C++ compilers for the platform, or you will keep it in ONNX format and then deploy on some runtime to keep maintenance easy.

8

RaptorDotCpp t1_j0zttgc wrote

You'd still use vectorized functions in C++ though, just because they're faster for doing algebra

6

[deleted] t1_j0zrakb wrote

[removed]

−12

suflaj t1_j0zthsd wrote

Looking at your post history, there are plenty of things I could make fun of. Dehumanize you even.

But instead of stooping to your level, all I will say is - I frequently program in C and (x86 & aarch64) Assembly, but I recognise that many of my R&D colleagues are not engineers, and that their strengths can be better utilised if they focus on the things they are good at.

2

Ricenaros t1_j0zqyxw wrote

using vectorized operations isn't just a design choice of the language you're programming in. It's a fundamental concept for optimizing code. for loops don't magically become fast just because you're using C++. For example, google "vectorize for loop c++" there are tons of results. In general you don't want to be using loops, especially for large scale data problems.
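To make the loop-versus-vectorized contrast concrete, here is a minimal sketch in NumPy (an assumption; the thread is mostly about PyTorch, where the same idea applies). The function names and the clipping task are hypothetical, chosen only for illustration.

```python
import numpy as np


def clip_negative_loop(values):
    """Replace negatives with zero, one element at a time in the interpreter."""
    out = np.empty_like(values)
    for i in range(len(values)):
        out[i] = values[i] if values[i] > 0 else 0
    return out


def clip_negative_vectorized(values):
    """Same result from a single expression; the per-element loop runs in compiled code."""
    return np.where(values > 0, values, 0)
```

Both functions return the same array. Timing them with `timeit` on a large input is an easy way to see how much of the cost is interpreter overhead.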

10

vprokopev OP t1_j0zt2rb wrote

Agree, what I am trying to say is C++ gives you more freedom here. In python there's just no way you can use native features in anything you want to run fast.

Vectorized operations are goated, I agree. But I don't want to be constrained to always use them. Especially when I have to write a lot of very specific modifications to data before I feed it to a NN.

−1

Veggies-are-okay t1_j0zq951 wrote

  1. I mean at that point why not start using C for most efficient processes? I think you’re thinking that for loops are kind of this “base” truth when it’s yet another tool that happened to come first in the history of programming. Vectorized operations are a godsend and have allowed programmers to graduate beyond for loops. I don’t have to write out my little optimized c++ function every time I want to run an apply statement or whatever. Specific to ML, I don’t have to manually format my data structure to hook it up to pytorch; I just gotta have a pandas data frame with some minor tweaks to stick it into a neural network.

3

Zealousideal_Low1287 t1_j0zr6ix wrote

The GIL has nothing to do with ‘flexibility’

1

vprokopev OP t1_j0ztfxi wrote

Flexible = more freedom as I understand it. GIL = less freedom. Maybe you have a different definition.

I do understand it's needed for python to be what it is now and to become so popular.

1

rehrev t1_j10052l wrote

The fast ones are not flexible.

Python is flexible, you can do loops.

If you want to be fast, you won't be doing loops in C++ either.

Overall, you sound confused.

1

vprokopev OP t1_j0zppi0 wrote

Every time I need to manipulate tensors I have to describe my algorithm in terms of vectorized pytorch functions and broadcasting, instead of more straightforward loops and indexing.

How is that supposed to make my job easier?

−16

fastglow t1_j0zqs33 wrote

It sounds like your issues are more about PyTorch than Python itself. The need for vectorization is not specific to a language, though some libraries make it easier than others. If you want automatic vectorization, have a look at Jax, which has grown tremendously in the past couple years.

15

Zealousideal_Low1287 t1_j0zqyr2 wrote

Even if you wrote in a language like Rust, Go, C, C++ you wouldn’t avoid ‘calling into a framework’ syndrome.

There are several reasons why things like the compute kernels available in PyTorch are fast / optimised, and it’s not as simple as they are compiled. Operations are written efficiently with hardware and memory access patterns in mind, and we also have to ensure we do things like correctly implement the API for backward mode autodiff.

If you wanted to swap over to writing all of this in bare C++ it would undoubtedly be much slower than using PyTorch. We use Python because it’s a nice convenient language for calling into the framework. The overhead from this usually isn’t particularly significant. If you have a use-case where it is significant then sure use C++.

22

vprokopev OP t1_j0zu1ke wrote

But I do want to use pytorch, I like it very much.

I just usually have a lot of specific modifications to data and find myself avoiding native python and loops/indexing, because it makes things way slower.

It would still be slower if I implemented it in C++ than in pytorch, but at least not like way slower, and it would not create a bottleneck.

−1

Zealousideal_Low1287 t1_j0zuq39 wrote

I have no idea what you’re suggesting. Use C++ instead of vectorising properly and using PyTorch? Do you currently do much compute ‘outside’ PyTorch?

8

vprokopev OP t1_j0zy68v wrote

Mostly use vectorized pytorch operations.

Sometimes use just native loops and indexing.

Yes, unfortunately there are specific data preprocessing cases where I have to do stuff outside of pytorch, it's just more convenient.

And even when mostly using pytorch I still want the freedom to just use native functionality of a language without a huge hit to a performance.

But I know pytorch vectorized ops will still be faster and are suitable for majority of tasks

0

Zealousideal_Low1287 t1_j0zycf1 wrote

And you feel if you wrote raw C++ it would be as fast as the PyTorch ops, or you seek to replace the Python part, or something else?

6

vprokopev OP t1_j0zzfpg wrote

Just seek to replace the python part with something that is slower (obviously) but not like way slower than Pytorch vectorized ops.

And to have freedom to use more native structures and a bit less thinking about how to vectorize every algorithm possible

1

FinancialElephant t1_j10025f wrote

Now I use Julia most of the time. It's great, not just for the speed. The combination of the type system and multiple dispatch leads to much better implementations. I find the same implementations take less code and are easier to understand. Also using a language without a native array type for data science work always seemed crazy to me. There are also a number of smaller things about Julia that are nice compared to Python and reduce friction for me (broadcast a function fn(...) with fn.(...), much better/cleaner package management than Python). I still have to use Python for a lot of work but I'm hoping more people try Julia so that the Python hegemony can end.

11

quisatz_haderah t1_j0zqtt4 wrote

Pytorch is already fast enough; whatever you scrape together by switching to c++ will bite you in the ass as lower readability.

4

Centurion902 t1_j0ztnrx wrote

Even if you wrote in C++, you would still need to vectorize everything. You can't rely on for loops for this kind of stuff because it needs to be parallelized. And if you parallelize your for loops, well, you are vectorizing your code.

4

pyppo42 t1_j10xylo wrote

There are people spending their whole careers making low-level operations, such as matrix multiplication, faster. I prefer reusing their work and focusing on new problems. Then, calling CBLAS from C or using an @ operator in numpy does not change the fact that I need to think in terms of tensors, and to me Python is more friendly than C/C++ there. Also, this approach became profitable for enterprises that provide you with closed-source kernels supporting their specialized hardware (GPUs, FPGAs but also modern CPUs). The way you think in Python remains the same, but you get backed by the work of thousands of engineers focusing on how to make an FFT faster to sell their hardware over competitors. In the unlikely event that you need an algorithm nobody needed before and/or that you can write a faster implementation, at least for CPU, please do it in C! I am willing to bind and reuse your code rather than rethinking it, at least as long as it does not become a bottleneck for my application.
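As a small illustration of the `@` operator mentioned above, here is a sketch that computes the same matrix product twice: once with explicit Python loops and once with `@`, which NumPy dispatches to a compiled routine (typically a BLAS library, depending on how NumPy was built). The function names are hypothetical.

```python
import numpy as np


def matmul_loops(a, b):
    """Textbook triple loop; every multiply-add goes through the interpreter."""
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += a[i, k] * b[k, j]
    return out


def matmul_operator(a, b):
    """Same product via the @ operator, computed in compiled code."""
    return a @ b
```

Either way the caller thinks in terms of whole matrices; only the second version benefits from the tuned kernels described above.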

4

I_will_delete_myself t1_j0zxeqn wrote

Most of the ML libraries use mostly C++, we just use Python as an interface to make it easier to code and deploy. So really you are using C++ code.

Also PyTorch is a tensor computing library so of course you can only use tensors in Neural Networks.

3

bobwmcgrath t1_j0zrc8m wrote

Python just makes C calls so most of that is not running slowly in the interpreter. Python is a little slower than pure C but not much. You're not even stuck using python. Plenty of people use c. My time is more valuable than the CPU cycles so I use python.

2

tripple13 t1_j0zv1a5 wrote

Speed, quite literally.

Not computation, but ease of implementing a new idea and making a proof of concept.

Researchers try to maximize time spent on iterating through failure, rather than spending a lot of time to perfect a technique. (Generally speaking)

2

rehrev t1_j0zzv69 wrote

You want to implement transformers in C++?

2

ureepamuree t1_j1tam8k wrote

Hmm, I should prolly give this as a term project to the students and then they'll hate me even more 🤡

1

mxmrsn t1_j10ahhu wrote

Why not train in pytorch and then deploy with libtorch C++ API?

2

trajo123 t1_j101qp8 wrote

>Why not *just* move to C++ or something new...?

Moving to a different language is never a "just" in any non-trivial organization. With Python you have the option but not the obligation to optimize: you can write slow pythonic code or faster framework-y code. You also have the option to write python extensions in whatever language you want for performance critical parts of the code. The latter seems like a much more pragmatic approach to me than completely switching languages.

1

PredictorX1 t1_j102cz7 wrote

>Why are we stuck with Python ...

I can only speak for myself, but I have been working in analytics for a long time and I rarely use Python. Most of my analytical work is done in MATLAB, though I occasionally use machine learning shells or (matrix-capable) compiled BASIC. Since I write nearly all of my own code at the algorithm level, I can generate source code for any deployment system (SAS, SQL, Java, ...even Python!) with no need of libraries, etc.

1

sharky6000 t1_j10ay53 wrote

I mean the main answer is familiarity and the abundance of code available in Python.

Some people are exploring alternate routes. You can use the PyTorch C++ API. Meta released Flashlight. Both Rust and Go are picking up in ML (more Rust than Go now, I think, but Go has Gorgonia, and for Rust there is a Torch interface: https://towardsdatascience.com/machine-learning-and-rust-part-4-neural-networks-in-torch-85ee623f87a).

But often you start going down these roads to later find that they're not worth it. Much of the computational savings can be done without forcing a new language on people. The whole "shaping your thinking around the framework" is an unfortunate necessary evil because of the nature of how the networks are used (via high speed devices like GPUs/TPUs) or how data gets assembled or transferred. Sadly, a lot of this is not the fault of the top-level language.

1

bironsecret t1_j10mf71 wrote

probably yeah main reasons are

  1. everything popular is already in python
  2. it's simple and easy to write in
  3. most of the important stuff is either way written in c++ under the hood

1

moodoki t1_j11vtce wrote

It just looks like you're not thinking enough about your problem. Yes, loops in C++ or other compiled languages might be faster than in Python, but they won't be faster than properly vectorized code. In fact, properly vectorized code in Python/PyTorch would probably be faster than a sloppy C++ implementation.

1

Comprehensive_Ad7948 t1_j12206e wrote

You're probably not using numpy enough. I do computer vision with a lot of custom ops and moved from C++ to Python years ago - it's just superior for about anything other than some special cases where you can only afford a few ms of delay in a control loop or something like that. Vectorized numpy or GPU frameworks are the way to go, just cleaner and better than the nested loops. You get used to ndarray/matrix ops and become more productive.

1

Zulfiqaar t1_j13dtqo wrote

Quite often, developer time is more valuable than machine time. It's therefore cheaper and more efficient and even sometimes faster, to quickly write the code and spend more on compute power, than it is to reduce compute time at the cost of development speed. Especially in the R&D phase.

1

rk3000 t1_j1ftjee wrote

Give Julia a try: https://fluxml.ai

It's just-in-time compiled with speed comparable to C/Fortran. And the syntax is as easy as python.

1

domestication_never t1_j10dp68 wrote

Python isn't slow at all, provided you hold it right. Pandas/Numpy are pure C libraries under the covers, provided you are dealing with it not row-at-a-time. PyTorch etc drop down pretty much immediately into C (and the GPU specific libs). Python is just kinda binding it together.

One of the reasons Python gained so much traction so quickly was easy integration with C. Using FFI, I can open and call C functions from a shared library in about 5 lines of code. Plus python has very readable and straightforward C code itself, it's not that hard to make "pure C" extensions to python.
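As a rough illustration of the "about 5 lines" claim, here is a minimal sketch that uses the standard-library `ctypes` module to call `strlen` from the C library. It assumes a POSIX system where the C library can be located or is already loaded into the process; the wrapper name `c_strlen` is hypothetical.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# falls back to symbols already loaded into this process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(text)
```

Declaring `argtypes` and `restype` is optional for a quick experiment, but it lets ctypes check and convert arguments instead of guessing.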

And now optional typing in Python allows for great JIT compilation, so even the pure python parts are getting quicker.

Most importantly: Python is blazing fast where it needs to be, developer speed. Scientists and engineers are the expensive part.

0