
mongoosefist t1_j71dbhq wrote

Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most differential privacy methods rely on the data being discrete. The latent space of a diffusion model is completely continuous, so there is no clean way to tell similar images apart, and thus no way to tell which ones (if any) came from the training data.
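
Rough toy sketch of that contrast (not from the comment; the data and the `dp_count` helper are made up): a discrete count query has a fixed sensitivity, so the standard Laplace mechanism gives a clean guarantee, while distances in a continuous latent space just spread out smoothly with no cutoff that flags training membership.

```python
import numpy as np

# Toy discrete query: counting has sensitivity 1, so Laplace noise scaled to
# 1/epsilon gives the usual differential-privacy guarantee.
def dp_count(data, threshold, epsilon=1.0):
    true_count = int(np.sum(data > threshold))
    return true_count + np.random.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(0)

# Stand-ins for a continuous latent space: there is no discrete quantity to
# protect, and distances from a generated latent to training latents vary
# smoothly, with no gap that marks "this came from the training set".
train_latents = rng.normal(size=(1000, 64))   # pretend training-image latents
generated = rng.normal(size=(64,))            # pretend generated-sample latent
dists = np.linalg.norm(train_latents - generated, axis=1)

print(dp_count(rng.normal(size=100), 0.0))    # noisy, DP-protected count
print(dists.min(), np.percentile(dists, 5))   # smooth spread, no clear cutoff
```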

For example, if you're pretty sure the diffusion model has memorized an oil painting of Kermit the Frog, there is no way to say with any reasonable certainty whether the images you denoise that turn out to be oil paintings of Kermit come from an actual memorized training image, or simply from the region of latent space where the distribution of oil paintings overlaps with the distribution of pictures of Kermit. There is no hard point where one transitions into the other, and no meaningful difference in density between the two distributions.
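
A minimal one-dimensional sketch of that Kermit scenario (the Gaussians and their parameters are invented purely for illustration): the likelihood ratio between "near a memorized image" and "in the ordinary oil-painting/Kermit overlap" stays close to 1, so neither explanation can be favored with much confidence.

```python
from scipy.stats import norm

# Invented 1-D stand-ins for two regions of latent space: one around a
# memorized training image, one for the ordinary overlap of the "oil painting"
# and "Kermit" regions.
memorized = norm(loc=0.0, scale=1.0)
overlap_region = norm(loc=0.5, scale=1.2)

# A denoised latent that lands between the two modes gives a likelihood ratio
# near 1: the densities are similar, with no sharp transition to exploit.
z = 0.3
ratio = memorized.pdf(z) / overlap_region.pdf(z)
print(ratio)
```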

2