fastglow t1_j0zlqhj wrote
- Using libraries available in a given language does not mean you are not programming in that language.
- Python is a very popular, flexible, and easy language with a huge dev community. This drives development of ML tools for Python, which in turn increases adoption of Python for ML.
vprokopev OP t1_j0zo91k wrote
- But you have to use vectorized operations instead of loops, PyTorch functions for everything you want to be fast, etc. That's different from just using a library; it's being constrained to ONLY the library.
- There are better languages that are also flexible and popular. And Python is not so flexible anyway (GIL).
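To illustrate the constraint OP is describing, here is a minimal sketch (using NumPy as a stand-in for PyTorch; the function names are made up for illustration). The loop version reads like the algorithm in your head; the vectorized version is what you are pushed toward for speed:

```python
import numpy as np

def scale_and_shift_loop(xs, w, b):
    # The "straightforward" version: explicit Python loop and indexing.
    out = []
    for x in xs:
        out.append(w * x + b)
    return out

def scale_and_shift_vectorized(xs, w, b):
    # The version the library pushes you toward: one array expression,
    # so the loop runs in compiled code instead of the interpreter.
    return (w * np.asarray(xs) + b).tolist()

print(scale_and_shift_loop([1.0, 2.0, 3.0], 2.0, 1.0))        # [3.0, 5.0, 7.0]
print(scale_and_shift_vectorized([1.0, 2.0, 3.0], 2.0, 1.0))  # [3.0, 5.0, 7.0]
```

Both produce the same result; the disagreement in this thread is about whether having to write the second form is a burden or just normal performance engineering.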
suflaj t1_j0zq31h wrote
What if I told you that even if you were using C/C++, you'd still need to be using library functions? The code ultimately doesn't run natively; it calls into Fortran, assembly, and CUDA libraries.
You cannot directly program in whatever CUDA compiles to, because it's proprietary and specific to the GPU model, so why bother? Researchers chose Python not because they like snakey-boys or enjoy British comedy; they chose it because it is adequate to do research in, unlike C or C++, which are horrible to work with and hideous to read and understand even when a pro writes them, let alone a researcher.
Ultimately Python code is easier to maintain and work on, and there are more developers as opposed to C/C++, so of course companies will use it over whatever C++ API exists for DL libraries.
As for your Rust/Go question: although Go has some potential, it has no community to work with. It is also harder to use than Python. There is almost no benefit to using Go over Python even if the decision were being made now, let alone migrating, other than Go's nice concurrency model. And why would you use that when `from joblib import delayed, Parallel` does the trick? So far, the biggest problem Python has with concurrency is its lack of a good shared-memory API, which is probably going to be fixed in a year or so now that it is part of Python. But this gap does not significantly hurt Python, because you'd do that kind of thing via a C module anyway.
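For context, the joblib idiom referenced above looks roughly like this (a minimal sketch; `square` is a made-up stand-in for real work):

```python
from joblib import Parallel, delayed

def square(x):
    # A stand-in for some CPU-bound piece of work.
    return x * x

# Fan the calls out across worker processes; a generator of delayed()
# calls is the whole API surface you need for embarrassingly parallel work.
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(4))
print(results)  # [0, 1, 4, 9]
```

Because joblib defaults to process-based workers, this sidesteps the GIL for CPU-bound tasks, which is the point being made against Go's concurrency advantage.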
As for Rust, it will probably never become a serious contender for research and development, because it is even more hideous and complex than C/C++ are. It is also slower, so what's the point? Unless you want to double the average wages of people in DL and kill 90% of jobs, since barely anyone can use Rust effectively.
vprokopev OP t1_j0zqzxm wrote
Again, in C++ I am not so much constrained to only use Pytorch functions. I can use other libraries and native features.
In python I basically must express any algorithm I have in my head in terms of vectorized pytorch operations with broadcasting. Not the case in C++. Am I wrong here?
I am not talking about researchers; I am talking more about businesses. No problem with researchers using Python.
suflaj t1_j0ztusm wrote
> Not the case in C++. Am I wrong here?
Probably. It seems you "have" to do these things because you want speed. But if you want speed, then you'll have to do them in C++ as well.
> I am not talking about researchers, I am talking more about businesses.
This applies to businesses more than anything. Your manager does not give a single fuck about the purity or performance of your code before it's deployed. Until then, the only thing that matters is that the product is developed before your competitors get the contracts, at as low a cost as possible.
And when code is deployed, it often won't even be in C++. A lot of the time you have to port it to C because there is no C++ compiler for the platform, or you keep it in ONNX format and deploy on some runtime to keep maintenance easy.
RaptorDotCpp t1_j0zttgc wrote
You'd still use vectorized functions in C++ though, just because they're faster for doing algebra.
[deleted] t1_j0zrakb wrote
[removed]
suflaj t1_j0zthsd wrote
Looking at your post history, there are plenty of things I could make fun of. Dehumanize you even.
But instead of stooping to your level, all I will say is - I frequently program in C and (x86 & aarch64) Assembly, but I recognise that many of my R&D colleagues are not engineers, and that their strengths can be better utilised if they focus on the things they are good at.
Ricenaros t1_j0zqyxw wrote
Using vectorized operations isn't just a design choice of the language you're programming in; it's a fundamental concept for optimizing code. For loops don't magically become fast just because you're using C++. For example, google "vectorize for loop c++": there are tons of results. In general you don't want to be using loops, especially for large-scale data problems.
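The point can be made concrete even inside Python (a hedged sketch; the same contrast holds in C++ between a naive loop and a BLAS call or SIMD loop):

```python
import numpy as np

def dot_loop(a, b):
    # Element-by-element Python loop: every step goes through the
    # interpreter, which is why this is slow at scale.
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Same mathematics, but np.dot runs the loop inside optimized compiled
# code (and can use SIMD), which is what "vectorize your code" buys you.
assert np.isclose(dot_loop(a, b), np.dot(a, b))
```

The results match; only where the loop executes (interpreter vs compiled kernel) differs, and that is a property of the loop, not of Python specifically.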
vprokopev OP t1_j0zt2rb wrote
Agreed; what I am trying to say is that C++ gives you more freedom here. In Python there's just no way to use native language features in anything you want to run fast.
Vectorized operations are goated, I agree. But I don't want to be constrained to always use them, especially when I have to write a lot of very specific modifications to data before I feed it to a NN.
Veggies-are-okay t1_j0zq951 wrote
- I mean, at that point why not start using C for the most efficient processes? I think you're treating for loops as some kind of "base" truth, when they're just another tool that happened to come first in the history of programming. Vectorized operations are a godsend and have allowed programmers to graduate beyond for loops. I don't have to write out my little optimized C++ function every time I want to run an apply statement or whatever. Specific to ML, I don't have to manually format my data structure to hook it up to PyTorch; I just gotta have a pandas DataFrame with some minor tweaks to stick it into a neural network.
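The "minor tweaks" in question are usually just selecting columns and converting to a dense float array, which is the layout deep-learning frameworks expect. A minimal sketch (the column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# A toy feature table standing in for real tabular data.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [40_000.0, 52_000.0, 71_000.0],
})

# Pick the feature columns and convert to a dense float32 array,
# which is what frameworks like PyTorch take as input.
features = df[["age", "income"]].to_numpy(dtype=np.float32)
print(features.shape)  # (3, 2)
```

From there it is typically one more call (e.g. wrapping the array in the framework's tensor type) to feed a network.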
Zealousideal_Low1287 t1_j0zr6ix wrote
The GIL has nothing to do with ‘flexibility’
vprokopev OP t1_j0ztfxi wrote
Flexible = more freedom, as I understand it. GIL = less freedom. Maybe you have a different definition.
I do understand it's needed for Python to be what it is now and to have become so popular.
Zealousideal_Low1287 t1_j0zutkt wrote
Yeah, no
rehrev t1_j10052l wrote
The fast ones are not flexible.
Python is flexible, you can do loops.
If you want to be fast, you won't be doing loops in C++ either.
Overall, you sound confused.
vprokopev OP t1_j0zppi0 wrote
Every time I need to manipulate tensors, I have to describe my algorithm in terms of vectorized PyTorch functions and broadcasting, instead of more straightforward loops and indexing.
How is that supposed to make my job easier?
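For a concrete sense of the re-expression OP means, here is a hedged sketch (NumPy standing in for PyTorch; both share the same broadcasting rules) of computing all pairwise differences both ways:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0])

# Loop-and-index version: what you might first write down.
diff_loop = np.empty((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        diff_loop[i, j] = x[i] - y[j]

# Broadcast version: the same algorithm re-expressed so the framework
# runs it as one compiled operation over a (3, 1) and a (1, 2) array.
diff_broadcast = x[:, None] - y[None, :]

assert np.allclose(diff_loop, diff_broadcast)
```

The broadcast form is faster, but it does require translating the nested-loop picture in your head into shape manipulation, which is the burden OP is complaining about.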
fastglow t1_j0zqs33 wrote
It sounds like your issue is more with PyTorch than with Python itself. The need for vectorization is not specific to a language, though some libraries make it easier than others. If you want automatic vectorization, have a look at Jax, which has grown tremendously over the past couple of years.
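The relevant Jax feature is `jax.vmap`: you write the loop body for a single example, and Jax vectorizes it over a batch dimension for you. A minimal sketch (`squared_norm` is a made-up example function):

```python
import jax
import jax.numpy as jnp

def squared_norm(v):
    # Written for a single vector, with no batch dimension in sight.
    return jnp.dot(v, v)

# vmap mechanically adds the batch dimension: per-example code in,
# batched vectorized code out.
batched = jax.vmap(squared_norm)
xs = jnp.array([[1.0, 2.0], [3.0, 4.0]])
results = batched(xs)
print(results)
```

This addresses OP's complaint fairly directly: the mental model stays "loop over examples", while the vectorization is done by the library.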
wadawalnut t1_j16qjat wrote
You should try Julia!