Comments

mileseverett t1_iyn1es7 wrote

100% backwards compatibility. Thank god.

319

SufficientStautistic t1_iyngd0t wrote

God forbid a deep learning framework would not be backwards compatible right lol

133

Gordath t1_iynu2m2 wrote

TensorFlow flashbacks.... 😵

95

mileseverett t1_iyo7646 wrote

Pytorch 2 lacking backwards compatibility is the best advertisement there is for using JAX

−67

NoKatanaMana t1_iynanv7 wrote

>We introduce a simple function torch.compile that wraps your model and returns a compiled model.

This will be interesting to try out and see how it develops.
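For anyone curious, the basic usage from the announcement looks roughly like this (a minimal sketch; the toy model and shapes are just placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module should work the same way.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile wraps the model and returns a compiled version of it.
compiled_model = torch.compile(model)

x = torch.randn(32, 64)
out = compiled_model(x)  # first call triggers compilation, later calls reuse it
```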

110

gambs t1_iynencv wrote

I've been using JAX recently and the compiler has kicked my ass in so many ways. It is very hard to get used to, and there are many things it just straight up prevents you from doing

Will be interesting to see if PyTorch can make a more enjoyable experience on this front
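For a taste of the kind of thing it blocks, here is a rough sketch (the exact exception class depends on the JAX version): ordinary Python control flow on a traced value fails under jit, and you have to reach for jax.lax.cond instead.

```python
import jax
import jax.numpy as jnp

@jax.jit
def relu_or_zero(x):
    # Python `if` on a traced value is one of the things jit flat-out
    # refuses; this raises a tracer/concretization error.
    if x.sum() > 0:
        return jnp.maximum(x, 0.0)
    return jnp.zeros_like(x)

try:
    relu_or_zero(jnp.arange(4.0))
except Exception as e:
    print(type(e).__name__)

# The jit-friendly version expresses the branch with jax.lax.cond.
@jax.jit
def relu_or_zero_ok(x):
    return jax.lax.cond(x.sum() > 0,
                        lambda v: jnp.maximum(v, 0.0),
                        lambda v: jnp.zeros_like(v),
                        x)
```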

60

Desperate-Whereas50 t1_iynp7vk wrote

IMHO you need functional programming or undefined behaviour (as in C/C++) to get highly optimized code. Undefined behaviour is more pain than functional programming, so I doubt it.

Edit: And even C/C++ compilers like gcc have attributes for pure functions to improve optimizations.

15

gambs t1_iyns2fy wrote

It's not just functional programming. For instance, you have to use jax.numpy instead of numpy when compiling, but not every numpy function is implemented in jax.numpy, and there are other issues like that

22

Desperate-Whereas50 t1_iynukpa wrote

I only have the information from your link, so I don't know about the other issues you're talking about.

But if you settle on the functional paradigm, it is obvious that you need something like jax.numpy and that jax.numpy cannot implement every numpy function. Numpy and some of its functions (like in-place updates) are inherently non-functional. I can't imagine another way to fix this.
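The in-place-update case makes this concrete (a small sketch): plain numpy mutates the array, while jax.numpy expresses the same update functionally and returns a new array.

```python
import numpy as np
import jax.numpy as jnp

x_np = np.zeros(4)
x_np[0] = 1.0                     # numpy: mutate in place

x_jax = jnp.zeros(4)
x_jax = x_jax.at[0].set(1.0)      # jax: functional update, a new array is returned
```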

6

kc3w t1_iyosur2 wrote

> Imho you need functional programming or undefined behaviour (like in C/C++) to get high optimized code.

That's not true, see rust.

7

Desperate-Whereas50 t1_iyow7w7 wrote

I am no Rust expert, so convince me that I am wrong, but that is only true if you don't use unsafe blocks. That would exclude using CUDA, and as far as I know you need unsafe blocks in some cases to get C-like performance.

But even if I am wrong and no undefined behaviour is needed, even Rust has a pure-function attribute to improve optimizations.

It just makes sense to use these improvements in libraries like PyTorch/JAX, especially since they mainly perform mathematical operations, which are pure functions anyway.

3

Craksy t1_iypi2h6 wrote

I'm no expert either, but you're right that using CUDA requires use of unsafe. I believe kernels are even written in C through macros.

However, using unsafe does not necessarily mean UB. You preferably want to avoid that regardless. And UB is not the only way a compiler can optimize. Unsafe code simply means that you are responsible for memory safety, not that it should be ignored.

I don't know, you're talking about UB as if it were a feature and not an unfortunate development of compilers over the years.

In fact, Rust made it very clear that if you rely on UB that's your pain. Don't come crying in a week when your shit does not compile anymore. No guarantees are made, and no extra consideration is made to maintain compatibility with programs that make up their own rules.

5

Desperate-Whereas50 t1_iyqiqqi wrote

>However, using unsafe does not necessarily mean UB. You preferably want to avoid that regardless.

>Unsafe code simply means that you are responsible for memory safety, not that it should be ignored.

Maybe I am wrong, but I think you misunderstand UB. Of course you want to avoid UB and have memory safety in your code/executable, because otherwise you cannot reason about the program anymore. But you want UB (at least in C/C++, the languages I work with) in your standard. UB is more like a promise by the programmer not to do specific things. The compiler assumes the code contains no UB and optimizes accordingly. See for example signed integer overflow: because the compiler knows this is UB and the programmer has promised not to let it happen, it can apply better optimizations. Rust does not have this "feature" in safe blocks and produces less optimal code.

>And UB is not the only way a compiler can optimize.

I would not disagree with that. But if you want the last 0.x% of performance, then you need it too, especially if you want your language to work on different systems, because even hardware can have UB.

The only other option (as far as I know) to get comparable performance (compared to assuming no UB) is to rely on other assumptions, like functions having no side effects, etc.

>I don't know, you're talking about UB as if it was a feature and not an unfortunate development of compilers over the years.

As part of the language specification it is like a feature; in the binary it is a bug. I have read enough UB discussions in C++ threads to know that a lot of C++ developers don't see UB as an unfortunate development of compilers.

>In fact, Rust made it very clear that if you rely on UB that's your pain.

By the way, this is the sentence that makes me think you misunderstand UB. As mentioned: you should never rely on UB; you promised the compiler to avoid it, and by avoiding it you let the compiler do a better job.

1

keturn t1_iynwayu wrote

These are previews of PyTorch 2.0, i.e. you can get them from the nightly builds.

> We expect to ship the first stable 2.0 release in early March 2023.

53

--dany-- t1_iyn91x1 wrote

The speed-up is only available for newer GPUs (Volta and Ampere) for now. Hopefully with primTorch it's easier to port to other accelerators in the long run. And the speed-up is less prominent for consumer GPUs.
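If you want to check which class your card falls into, something like this should do it (just a sketch; Volta is compute capability 7.x and Ampere is 8.x):

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"compute capability: {major}.{minor}")
    # Volta is 7.x, Ampere is 8.x; older cards shouldn't expect
    # the advertised speed-ups from the new default backend.
    print("supported" if (major, minor) >= (7, 0) else "probably too old")
```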

37

throwaway2676 t1_iyn1qmn wrote

Wow, this sounds pretty exciting. I wonder how the speed will compare to JAX or Julia.

26

SleekEagle t1_iyn7x1m wrote

One of my first thoughts as well! Is there any reason PT's speed ceiling would be lower than JAX's? I know PyTorch-XLA is a thing, but I'm not sure about its current status

6

code_n00b t1_iyng534 wrote

This is an exciting set of features! It's especially great that there are no breaking changes.

I personally like semver a lot, so the only thing I don't like about this announcement is that they bumped the major version to 2.0 even though there is full backwards compatibility.

21

lexcess t1_iynsm74 wrote

SemVer allows for significant internal or additive changes to cause a major revision, so I wouldn't worry about it.

21

JanneJM t1_iyos1fp wrote

It allows for breaking changes; it doesn't require them.

Otherwise, I'm sure they could just do another 1.x release with a new "hello world" function whose signature they then change for the 2.0 version.

8

lohvei0r t1_iynme39 wrote

Someone needed to be promoted to distinguished

7

Conscious_Heron_9133 t1_iyqf7a9 wrote

Why go towards JAX only? As a quick survey: am I the only one who wants a high-level differentiable framework in a strongly typed language?

8

p-morais t1_iyr0wzl wrote

All I want for Christmas is a strongly typed, low level, non-garbage-collected, safe programming language with pythonic syntax and first class tensor types and native compile-time autodiff. Is that too much to ask for?

10

Conscious_Heron_9133 t1_iz5j6a8 wrote

Lol, kind of the opposite -- I am not a fan of python syntax.

The rest? All except for low-level and non-garbage-collected.

I mean... in case Santa were listening...

1

gdahl t1_iyy504n wrote

Have you tried Dex? https://github.com/google-research/dex-lang It is in a relatively early stage, but it is exploring some interesting parts of the design space.

2

Conscious_Heron_9133 t1_iz5ik8t wrote

I did, yes, but I found the syntax counterintuitive. It is very Python-like, but its syntax was conceived not to include type declarations in the first place, and was only later adapted to do so.

When I say a high-level differentiable framework in a strongly typed language, I imagine taking something that already works as strongly typed and then adapting it to automatic differentiation and JIT compilation -- not the opposite.

I'm referring to a hypothetical language that is, for example, what C# is to C++: similar syntax, higher level.

Does that make sense?

1

sash-a t1_j0b3pcg wrote

Flux in Julia is your answer. Although Julia isn't technically strongly typed (it's optionally typed for multiple dispatch) it's about as good as it gets imo

1

gokulPRO t1_iyplcsc wrote

Has anyone personally tested the speed-ups? Please share if you have.
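If someone wants to try, here is a rough way to measure it (only a sketch: it assumes a CUDA GPU and torchvision for a stand-in ResNet-50, and the compiled model needs warm-up calls since the first call includes compilation time):

```python
import time
import torch
import torchvision.models as models

model = models.resnet50().cuda().eval()
x = torch.randn(16, 3, 224, 224, device="cuda")
compiled = torch.compile(model)

def bench(fn, iters=20):
    # warm-up; also triggers compilation for the compiled model
    with torch.no_grad():
        for _ in range(3):
            fn(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("eager:   ", bench(model))
print("compiled:", bench(compiled))
```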

6

CyberDainz t1_iyvj7ti wrote

So, with torch.compile, people can keep writing graph-unfriendly code with random dynamic shapes and direct Python code over tensors?

4

formalsystem t1_iz9wzhi wrote

That's the goal, yes, although dynamic shape support is still a work in progress.
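For what it's worth, a sketch of what opting in looks like, assuming the dynamic=True flag (still experimental at the time): shape-dependent Python over tensors keeps working, and anything the compiler can't capture falls back to eager.

```python
import torch

def fn(x):
    # shape-dependent Python control flow plus plain tensor code
    if x.shape[0] > 16:
        x = x[:16]
    return torch.relu(x) @ x.T

compiled = torch.compile(fn, dynamic=True)
print(compiled(torch.randn(8, 4)).shape)    # torch.Size([8, 8])
print(compiled(torch.randn(32, 4)).shape)   # torch.Size([16, 16])
```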

3

maybethrowawaybenice t1_iyo9r2w wrote

This is pretty cool; hoping they can give specific per-GPU benchmarks at some point.

3

netw0rkf10w t1_iyqbmun wrote

The new compiler is so cool!!

Though virtually no speed-up on ViT: https://pbs.twimg.com/media/Fi_CUQRWQAAL-rf?format=png&name=large. Does anyone have an idea why?

2

marcodena t1_iytiekm wrote

fixed-size sequences?

1

netw0rkf10w t1_iyzid5m wrote

That's a good point. Though it's still unclear to me why that would result in no speedup.

1

keisersoje988 t1_iyreywv wrote

What does it mean for my MNIST model?

2

alterframe t1_iz0ags7 wrote

I like how flexible they are about different compilation approaches. In TF2 the problem was that you always needed to wrap everything in tf.function to get the performance improvements. Debugging it was a nightmare, since for more complicated pipelines it could take several minutes just to compile the graph.
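One nice consequence of that flexibility (a sketch, assuming the documented backend argument): you can keep a pass-through backend for debugging and only switch the real compiler on for the actual runs.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU())

# "eager" runs the graph capture but skips the backend compiler,
# which keeps debugging close to plain PyTorch.
debug_model = torch.compile(model, backend="eager")

# default backend (TorchInductor) for the real runs
fast_model = torch.compile(model)

x = torch.randn(4, 10)
print(torch.allclose(debug_model(x), fast_model(x)))
```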

2

M4mb0 t1_iyqrp73 wrote

Will this finally allow JIT-compiling a custom backward (in Python)?

1

SirPandkok t1_iyt63wk wrote

Normally I build my models with TF, so I don't have a deep understanding of PyTorch and I don't understand why this .compile thing is important. Can someone explain it to me?

1

mankav t1_iywpn6o wrote

Basically, writing PyTorch is like writing TensorFlow 2 eager-execution code. Now, with compile, maybe they create a static computational graph, like TensorFlow 1.x or tf.function?
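Roughly, yes. torch.compile plays a similar role to tf.function, and it can even be used as a decorator on a plain function (a minimal sketch; the function here is just an arbitrary example):

```python
import torch

@torch.compile
def fused_op(x):
    # captured as a graph and fused by the backend on the first call,
    # instead of running op by op in eager mode
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x**3)))

print(fused_op(torch.randn(8)))
```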

2