ItsJustMeJerk t1_j6uqkv6 wrote
Reply to comment by Nhabls in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Actually, the data shows that past a certain size, larger models end up generalizing better than smaller ones. It's called double descent: test error gets worse as a model approaches the point where it can just barely memorize the training set, then improves again as the model grows well beyond it.
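If you want to see the effect for yourself, here's a minimal toy sketch (my own, not from the paper): minimum-norm linear regression on random Fourier features is the classic setting where the double-descent curve shows up as the feature count crosses the number of training points. Everything here, the sin target, the feature construction, the specific widths, is just an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: y = sin(x) + noise
n_train = 30
x_train = rng.uniform(-np.pi, np.pi, n_train)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-np.pi, np.pi, 200)
y_test = np.sin(x_test)

for n_feat in (5, 15, 25, 30, 40, 100, 1000):
    # Random Fourier features, shared by train and test
    w = rng.standard_normal(n_feat) * 2.0
    b = rng.uniform(0.0, 2.0 * np.pi, n_feat)
    phi_train = np.cos(np.outer(x_train, w) + b)
    phi_test = np.cos(np.outer(x_test, w) + b)

    # lstsq gives the minimum-norm fit once the model can interpolate
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"{n_feat:5d} features -> test MSE {mse:.3f}")
```

The classic signature is a spike in test MSE near n_feat ≈ n_train (the interpolation threshold) followed by a second descent at large widths; exact numbers will vary with the random seed.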
ItsJustMeJerk t1_j6uymag wrote
Reply to comment by Nhabls in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
You're right, it's not exclusive. But I believe that while the absolute amount of memorized data might go up with scale, it occupies a smaller fraction of the output, because it's only drawn on where verbatim recitation is necessary rather than used as a crutch (I could be wrong, though). Anyway, I don't think that crippling the model by removing all copyrighted data from the dataset is a good long-term solution. You don't keep students from plagiarizing by preventing them from ever looking at a source related to what they're writing.
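To make "fraction of the output" concrete, here's a rough sketch of how you might measure it (my own proxy, not the paper's method; the paper's memorization test is a more careful pixel-space distance, and the function name, the use of embeddings, and the 0.95 threshold are all my assumptions):

```python
import numpy as np

def memorized_fraction(generated, training, threshold=0.95):
    """Rough proxy: fraction of generated samples whose nearest
    training sample exceeds a cosine-similarity threshold.
    Inputs are (n, d) arrays of flattened images or embeddings."""
    g = generated / np.linalg.norm(generated, axis=1, keepdims=True)
    t = training / np.linalg.norm(training, axis=1, keepdims=True)
    nearest = (g @ t.T).max(axis=1)  # best training match per sample
    return float(np.mean(nearest >= threshold))

# Hypothetical usage with random vectors standing in for embeddings
gen = np.random.default_rng(1).standard_normal((100, 512))
train = np.random.default_rng(2).standard_normal((1000, 512))
print(memorized_fraction(gen, train))
```

My claim is basically that this number shrinks with scale even if the count of extractable training examples grows.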