Submitted by SleekEagle t3_ye0iw7 in MachineLearning

Hey everyone!

Over the past couple of years, we've all seen the incredible performance of diffusion models in Stable Diffusion, DALL-E 2, etc. A team at MIT recently published a new model that, much like diffusion models, is inspired by physics.

I wrote an introductory explainer on Poisson Flow Generative Models (PFGMs), which use concepts from electrodynamics to generate data.


Some highlights:

  • 10-20x faster than diffusion models on image generation with comparable performance
  • Invertible mapping akin to normalizing flows allows for explicit likelihood estimation and a well-structured latent space
  • Operates on a deterministic ODE with no stochastic element, unlike diffusion models and score-based SDE approaches

Here are some example CelebA images being generated with PFGMs:

https://i.redd.it/9f826fz726w91.gif

Note that the highest-resolution images explored in the paper are 256 x 256 LSUN bedroom images.

Looking forward to answering any questions!

138

Comments


Serverside t1_itw9tz2 wrote

Ok, I'll bite. It looks cool from what I see in the blog. How does the model being deterministic impact (or not impact) the generative capabilities? I would think that a deterministic mapping from the original data to uniform angles would not perform as well when interpolating or extrapolating (as with VAEs vs. normal autoencoders, for example).

12

SleekEagle OP t1_itwepjh wrote

u/Serverside There are a few things to note here.

First, the non-stochasticity allows for likelihood evaluation. Second, it lets the authors use ODE solvers (RK45 in the case of PFGMs) instead of SDE solvers, which for score-based models are potentially combined with application-specific methods like Langevin MCMC. Further, diffusion requires many discrete time steps to be evaluated in series (at least for the usual discrete-time diffusion models). The result is that PFGMs are faster than these stochastic methods. Lastly, the particular ODE for PFGMs has a weaker norm-time correlation than other ODEs, which in turn makes sampling more robust.
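
Roughly, deterministic sampling then looks something like the sketch below. This is a toy illustration, not the authors' code: `field` is a made-up placeholder for the learned Poisson field network, and the integration bounds and dimensions are invented for the example.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Made-up placeholder for the learned Poisson field network. It returns
    # dx/dz: how the flattened image moves as the augmented z coordinate
    # shrinks. Toy dynamics only, NOT the real field.
    def field(z, x):
        return -x / (z + 1.0)

    # Start from a point on the large-z prior (a uniform hemisphere in PFGMs)
    rng = np.random.default_rng(0)
    x_init = rng.standard_normal(32 * 32 * 3)
    x_init /= np.linalg.norm(x_init)

    # Integrate the deterministic ODE from large z down toward z = 0 with
    # RK45. No noise is injected at any step, unlike an SDE sampler.
    sol = solve_ivp(field, t_span=(40.0, 1e-3), y0=x_init,
                    method="RK45", rtol=1e-4, atol=1e-4)
    x_sample = sol.y[:, -1]  # approximate sample in data space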

As for the deterministic mapping, it is actually the reason (at least in part) that e.g. interpolations work for PFGMs. The mapping is a continuous transformation, sending "near points to near points" by definition. I think the determinism ensures that an interpolated path in the latent space transforms to a well-behaved path in the data space, whereas a stochastic element would very likely break this path. The stochasticity in VAEs is useful for learning the parameters of a distribution and is required to sample from that distribution, but once a point is sampled it is (usually) deterministically mapped back to the data space iirc.
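
To make the interpolation point concrete, here's a minimal sketch of interpolating latents under a deterministic map. Spherical interpolation (slerp) is just a common choice for sphere-like latents, not something prescribed by the paper, and the dimensions are illustrative:

    import numpy as np

    # Spherical interpolation between two latent points a and b
    def slerp(a, b, t):
        a_n = a / np.linalg.norm(a)
        b_n = b / np.linalg.norm(b)
        omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
        return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

    rng = np.random.default_rng(0)
    z1, z2 = rng.standard_normal(3072), rng.standard_normal(3072)

    # Each interpolated latent would then be pushed through the same
    # deterministic ODE solve (as in the sketch above); the determinism is
    # what keeps the decoded path in data space smooth.
    frames = [slerp(z1, z2, t) for t in np.linspace(0.0, 1.0, 8)]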

20

Serverside t1_itwm4o3 wrote

I see. Thanks for the in-depth response; your answers make sense. The last follow-up question I have is: do PFGMs preserve the distribution of the data, or, since the data is transformed to a uniform distribution, is the original distribution lost?

I know the other stochastic generative models usually try to match or preserve the distribution of data. Maybe you also somewhat answered this already in your second paragraph, but I just wanted to make sure I understood.

Again, your blog and code look neat. I look forward to toying with them on some data of my own.

7

SleekEagle OP t1_itxpvth wrote

My pleasure! I'm not sure I understand exactly what you're asking; could you try to rephrase it? In particular, I'm not sure what you mean by preservation of the data distribution.

Maybe this will help answer: Given an exact Poisson field generated by a continuous data distribution, PFGMs provide an exact deterministic mapping to/from a uniform hemisphere. While we do not know this Poisson field exactly, we can estimate it given many data points sampled from the distribution. PFGMs therefore provide a deterministic mapping between the uniform hemisphere and the distribution that corresponds to the learned empirical field, but not exactly to the data distribution itself. Given a lot of data, though, we expect this approximate distribution to be very close to the true distribution (this is universal to all generative models).
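
For intuition, here's a minimal sketch of drawing from a uniform-hemisphere prior of the kind the mapping targets (the dimensions and radius are illustrative, not the paper's choices):

    import numpy as np

    # Draw n points uniformly from the upper half of a sphere in `dim`
    # dimensions: normalizing a Gaussian gives a uniform point on the full
    # sphere, and reflecting the last coordinate to be non-negative
    # restricts it to the hemisphere without breaking uniformity.
    def sample_uniform_hemisphere(n, dim, radius=1.0):
        v = np.random.default_rng(0).standard_normal((n, dim))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        v[:, -1] = np.abs(v[:, -1])
        return radius * v

    # +1 for the augmented z coordinate that PFGMs add to the data dimensions
    prior_points = sample_uniform_hemisphere(4, 32 * 32 * 3 + 1)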

Thanks for reading! I did have a little trouble getting the repo working on my local machine, so you may run into some trouble as well. I reached out to the authors while writing this article, and I believe they are planning to continue research into PFGMs, so keep an eye out for future developments!

10

Serverside t1_itxsq3p wrote

Yeah, you essentially answered what I was asking. I was basically asking whether the output of a trained PFGM matches (or closely estimates) the empirical distribution of the training data. Since the end product of the “diffusion” was said to be a uniform distribution and the equations were ODEs, not SDEs, I was having trouble wrapping my head around how the PFGM could be empirically matching the distribution. Thanks for answering all the questions!

5

andrew21w t1_itxt7q6 wrote

This looks neat as hell. Is there more literature where I can learn about this?

2

ThatInternetGuy t1_itxvv27 wrote

I'm not surprised at all. Simulated annealing is an important optimization technique that is inspired by the metal annealing process.

4

Ulfgardleo t1_ityn2zg wrote

I am not sure I buy the 10-20x faster claim as stated in this post (though I have not had time to read the link yet). There are variants of diffusion models that claim to be 10-40x faster than standard diffusion models.

5

SleekEagle OP t1_itzk3mv wrote

That claim is pulled directly from the paper. While I have not verified it myself, I would not be surprised if it were true relative to discrete-time diffusion models. I'm not very familiar with continuous-time diffusion models, however, so the gap may be a lot smaller in that case!

1

SleekEagle OP t1_itzkagl wrote

This is the first paper on this approach! I spoke to the authors, and they're planning on continuing research down this avenue (personally, I think dropping a PFGM in as the base generator for Imagen and keeping diffusion models for the super-resolution chain would be very cool), so be on the lookout for more papers!

3

Ulfgardleo t1_itzmz4u wrote

My comment was just a remark that "diffusion model" by itself is not informative, because several approaches have already brought large speed-ups. The standard diffusion model is no longer state-of-the-art.

1

andrew21w t1_itzrk2g wrote

Honestly, if one could deal with that, in theory you could do the same with latent codes, essentially making a better GAN or whatnot (assuming I understood how these models work, of course).

2