
UncleVesem1r t1_it5rffe wrote

I see! I understand why DDPM is good now. I should go back to the paper and pay more attention to the KL divergence part of it.

If I could borrow a few more minutes of your time, could you explain more about what's not as good about score matching?

So to be explicit: my understanding of Langevin sampling is correct, i.e., if a model can accurately represent the score function, one should be able to recover the true data distribution. If that's true, then I guess the criticism of SM is about its objective function, i.e., there's no guarantee that it leads to an accurate score function? But aren't the score matching algorithms (denoising, projection) supposed to be able to solve the objective function involving grad_x log p(x)?

Or perhaps Langevin sampling is the problem. The paper does say that with a small enough step size and enough steps, we end up with an exact sample from the data distribution. But if the step size isn't small enough or we don't take enough steps, perhaps we end up somewhere, with no guarantee that it's a sample from the true data distribution?
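
To make the update rule concrete, here's a minimal sketch of (unadjusted) Langevin sampling, assuming we already have the exact score. The target is a standard Gaussian, whose true score is grad_x log p(x) = -x, and the step size and step count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # Exact score of a standard Gaussian target: grad_x log p(x) = -x
    return -x

def langevin_sample(n_samples=5000, n_steps=1000, eps=0.01):
    # Start far from the target to show convergence.
    x = rng.normal(loc=5.0, scale=1.0, size=n_samples)
    for _ in range(n_steps):
        # Langevin update: drift along the score plus injected Gaussian noise.
        x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.normal(size=n_samples)
    return x

samples = langevin_sample()
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

With a small step size and many steps the samples settle into the target distribution; with a large step size or too few steps the chain lands near some distribution, but not the target one, which is exactly the failure mode described above.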

I really appreciate this! Thanks again.

Red-Portal t1_it5v31k wrote

>there's no guarantee that it leads to an accurate score function? But aren't the score matching algorithms (denoising, projection) supposed to be able to solve the objective function involving grad_x log p(x)?

Oh no, it's not. All it's doing is minimizing the mean-squared error against the score function. Minimizing this objective doesn't guarantee that sampling with the learned score will work well, and in practice it doesn't: the score estimate is poor in low-density regions, which is exactly where Langevin chains have to pass through. This is exactly why score modelling has to rely on adding noise, and by doing this, they converged to DDPM.
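
As a minimal sketch of what that MSE objective looks like in the denoising variant, assume 1-D data x ~ N(0, 1), noise level sigma, and a linear score model s_a(x) = a * x (all illustrative choices, not from the thread). The denoising score matching loss is E || s_a(x + sigma*z) + z/sigma ||^2, whose minimizer is the score of the *noised* distribution N(0, 1 + sigma^2), i.e. a* = -1 / (1 + sigma^2):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

x = rng.normal(size=100_000)   # clean data
z = rng.normal(size=100_000)   # perturbation noise
x_tilde = x + sigma * z        # noised data

# For a linear model the DSM loss E[(a*x_tilde + z/sigma)^2] is quadratic
# in a, so the minimizer has a closed form instead of needing SGD:
# setting the derivative to zero gives
# a = -E[x_tilde * z / sigma] / E[x_tilde^2]
a_hat = -np.mean(x_tilde * z / sigma) / np.mean(x_tilde ** 2)
print(a_hat)  # close to -1 / (1 + sigma^2) = -0.8
```

Note the fitted score is the score of the noised data, not the clean data (-1.0 * x here), which illustrates the point: the objective is solvable, but what you get is an estimate tied to the noise you added, and its quality away from the data is not controlled by the training loss.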

UncleVesem1r t1_it5wdui wrote

Very cool! I think the pitfalls mentioned in the SM paper also make more sense now.

Thank you kind sir/madam
