sam__izdat t1_j9ngakh wrote on February 23, 2023 at 5:41 AM

Reply to comment by vyasnikhil96 in [R] Provable Copyright Protection for Generative Models by vyasnikhil96

I don't have any technical criticism that would be useful to you (and frankly it's above my pay grade), but to expand on what I meant when I called it a game of Calvinball: there's some history here worth considering. Copyright has gone through myriad justifications.

If we wanted to detect offending content by the original standards of the Stationers' Company, then it may be useful to look for signs of sedition and heresy, since the stated purpose was "to stem the flow of seditious and heretical texts."

By the justification of the liberals who came after, typesetting was a costly and error-prone process, and that forced their hand: the integrity of the text had to be protected. So, if for some reason we wanted to take that goal seriously, it might make sense to look for certain kinds of dissimilarity instead: errors and distortions in reproductions. After all, that was the social purpose of the monopoly right.

If the purpose of the copyright regime today is to secure the profits of private capital in perpetuity, then simple metrics of similarity aren't enough to guarantee a virtual Blackstonian land right either.

For example:

> In our discussions, we refer to C ∈ 𝒞 abstractly as a “piece of copyrighted data”, but do not specify it in more detail. For example, in an image generative model, does C correspond to a single artwork, or the full collected arts of some artists? The answer is the former. The reason is that if a generative model generates data that is influenced by the full collected artworks of X, but not by any single piece, then it is not considered a copyright violation. This is due to that it is not possible to copyright style or ideas, only a specific expression. Hence, we think of C as a piece of content that is of a similar scale to the outputs of the model.

That sounds reasonable. Is it true?

French and Belgian IP laws, for example, consider distributing an original photo of a public space that shows protected architecture a copyright violation. Prior to mid-2016, publishing a panoramic photo with the Atomium in the background was copyright infringement. Distributing a night photo of the Eiffel Tower is still copyright infringement today. So, how would you guarantee that a diffusion model falls within the boundaries of arbitrary rules when those tests of "substantial similarity" suddenly become a lot more ambiguous than anticipated?

vyasnikhil96 OP t1_j9oi593 wrote on February 23, 2023 at 1:20 PM

Thanks! This was an interesting read.

sam__izdat t1_j9q0nd6 wrote on February 23, 2023 at 7:21 PM

Likewise, thanks for sharing your work.