Submitted by pm_me_your_pay_slips t3_10r57pn in MachineLearning
NitroXSC t1_j70z581 wrote
Reply to comment by mongoosefist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
https://en.m.wikipedia.org/wiki/Differential_privacy
The differential privacy literature has multiple methods for recovering input data from output data, but most of them only work on quite simple models. Hence it might be possible.
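For intuition, here is a minimal sketch (toy data, made-up values, not from the paper) of the kind of input-recovery that motivates differential privacy: two aggregate releases that differ only in whether one individual is included are enough to reconstruct that individual's value exactly, a classic differencing attack.

```python
import numpy as np

# Toy dataset: each entry is one individual's private value (e.g., a salary).
rng = np.random.default_rng(0)
salaries = rng.integers(30_000, 120_000, size=100)

# Two innocent-looking aggregate releases...
sum_all = salaries.sum()                  # query over everyone
sum_without_target = salaries[1:].sum()   # query over everyone except person 0

# ...leak person 0's exact value (a "differencing attack").
recovered = sum_all - sum_without_target
assert recovered == salaries[0]
print(f"Recovered private value: {recovered}")
```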
mongoosefist t1_j71dbhq wrote
Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most differential privacy methods rely on the data being discrete. The latent space of a diffusion model is completely continuous, so there is no way to tell similar images apart, and thus no way to tell which ones, if any, come from the training data.
For example, suppose you're pretty sure the diffusion model has memorized an oil painting of Kermit the Frog. There is no way to say with any reasonable certainty whether the images you denoise that turn out to be oil paintings of Kermit come from an actual training image, or simply from the region of latent space where the distribution of oil paintings overlaps with the distribution of pictures of Kermit. There is no hard point where one transitions into the other, and no meaningful difference in density between the distributions.
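A toy 2D stand-in for that point (made-up Gaussians, not a real diffusion latent space): even with full knowledge of both densities, samples from a tight "memorized" mode and from the broader overlapping region can't be cleanly separated, because the continuous distributions blend into each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the latent space is 2D. "Memorized" points cluster tightly around one
# training latent; "generalized" points come from the broader overlap of two
# concepts. Both are continuous Gaussians with overlapping support.
memorized = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(5_000, 2))
generalized = rng.normal(loc=[0.2, 0.1], scale=0.5, size=(5_000, 2))

def log_density(points, mu, sigma):
    """Isotropic 2D Gaussian log-density (up to an additive constant)."""
    return -np.sum((points - mu) ** 2, axis=1) / (2 * sigma**2) - 2 * np.log(sigma)

# Classify each sample by whichever density is higher (the best possible rule):
# many samples still land on the "wrong" side, because there is no hard boundary.
mixed = np.vstack([memorized, generalized])
labels = np.array([0] * 5_000 + [1] * 5_000)
pred = (log_density(mixed, np.array([0.2, 0.1]), 0.5)
        > log_density(mixed, np.array([0.0, 0.0]), 0.3)).astype(int)
print(f"Accuracy even with both densities known exactly: {(pred == labels).mean():.2f}")
```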
WikiSummarizerBot t1_j70z6dw wrote
>Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy.
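For reference, a minimal sketch of the standard Laplace mechanism for ε-differential privacy on a counting query (toy data and names, not from the thread): because adding or removing one individual changes a count by at most 1, adding Laplace(1/ε) noise makes the released value nearly uninformative about any single person.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1: one individual changes the true count
    by at most 1, so Laplace noise with scale 1/epsilon is sufficient.
    """
    true_count = sum(predicate(x) for x in data)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
ages = [34, 29, 41, 52, 47, 38]

# With vs. without the last individual: the two releases differ mostly by noise,
# so the output can't be used to infer much about that one person.
with_person = laplace_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng)
without_person = laplace_count(ages[:-1], lambda a: a > 40, epsilon=0.5, rng=rng)
print(with_person, without_person)
```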