
NitroXSC t1_j6wdaje wrote

> Compute CLIP embeddings for the images in a training dataset.

A good follow-up question is whether it would be possible to recover a lot of the training data if you don't know it a priori.
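For context, the embedding step quoted above might look something like this (a minimal sketch assuming the open_clip package and a local folder of images; the file names and the `embed_images` helper are illustrative, not from the paper):

```python
import torch
from PIL import Image
import open_clip

# Load a pretrained CLIP model and its matching image preprocessing transform
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()

def embed_images(paths):
    """Return L2-normalized CLIP image embeddings for a list of file paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        feats = model.encode_image(batch)
    return feats / feats.norm(dim=-1, keepdim=True)

# Hypothetical training images whose embeddings we want to index
train_embeddings = embed_images(["img_001.jpg", "img_002.jpg"])
```

With the training embeddings in hand you can nearest-neighbor-search generated samples against them; the question in this thread is what you could do without that index.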

18

mongoosefist t1_j6zfxe8 wrote

How would you know that you had recovered it if you didn't know the training data a priori?

1

NitroXSC t1_j70z581 wrote

https://en.m.wikipedia.org/wiki/Differential_privacy

The differential privacy literature describes several attacks for recovering input data from a model's (or query mechanism's) outputs, but those mostly target quite simple models. So it might be possible.
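As a rough illustration of the kind of reconstruction attack that literature studies (a toy Dinur-Nissim-style sketch on a synthetic bit vector, not anything specific to diffusion models): given slightly noisy answers to enough random subset-sum queries, least squares recovers most of the private bits.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                          # size of the private database (bits)
secret = rng.integers(0, 2, n)   # the data we pretend not to know

# Answer m random subset-sum queries, each perturbed with a little noise
m = 4 * n
queries = rng.integers(0, 2, (m, n))
answers = queries @ secret + rng.normal(0, 1.0, m)

# Least-squares reconstruction from the noisy answers, rounded back to bits
estimate, *_ = np.linalg.lstsq(queries, answers, rcond=None)
recovered = (estimate > 0.5).astype(int)

print("fraction of bits recovered:", (recovered == secret).mean())
```

The attack works because the noise is small relative to the number of queries; whether anything analogous transfers to a large generative model is exactly what's in question here.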

3

mongoosefist t1_j71dbhq wrote

Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most of those attacks rely on the data being discrete. The latent space of a diffusion model is completely continuous, so there's no clean way to tell similar images apart, and thus no way to say which ones came from the training data, if any did at all.

For example, suppose you're pretty sure the diffusion model has memorized a particular oil painting of Kermit the Frog. When your denoised samples turn out to be oil paintings of Kermit, you can't say with any reasonable certainty whether they come from that memorized image or simply from the region of latent space where the distribution of oil paintings overlaps the distribution of Kermit pictures. There's no hard point where one transitions into the other, and no meaningful difference in density between the two distributions.
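To make that concrete: the obvious check is to embed a generated sample and compare it to candidate training images, but all you get back is a continuous similarity score with no principled cutoff. A sketch reusing the hypothetical `embed_images` helper from earlier in the thread (file names are made up):

```python
# Embed one generated sample and a couple of candidate training images
generated = embed_images(["generated_kermit.png"])           # shape (1, d), L2-normalized
candidate_names = ["kermit_oil_painting.jpg", "some_other_oil_painting.jpg"]
candidates = embed_images(candidate_names)                    # shape (k, d)

# Cosine similarity of the generated sample to each candidate
similarity = (generated @ candidates.T).squeeze(0)

for name, score in zip(candidate_names, similarity):
    print(f"{name}: {score:.3f}")
# A high score says "close", but nothing tells you where
# "looks like Kermit in oils" ends and "memorized this exact image" begins.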

2

WikiSummarizerBot t1_j70z6dw wrote

Differential privacy

>Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy.


1