youre_a_pretty_panda t1_ixfharn wrote

I've said the same in similar articles but it bears repeating: This case will boil down to a few simple factors.

What is the output of the AI? Does it create something new or does it merely regurgitate copy-pasted output?

If it merely spits out pre-existing code then it is clearly a copyright infringement.

However, it should be very clearly noted that simply training an AI model on a dataset does not violate copyright law. The output is key. If the AI creates new versions of, say, paintings, then those are new and unique works if they are sufficiently distinct from the originals in the training data set (there is a long history of precedent for testing whether works of art are sufficiently distinct).

This is a fundamental point that courts will inevitably have to settle. Any other outcome would not only stifle innovation (because small AI teams could never afford to pay exorbitant licensing fees for data sets while big corps could easily do so), it would also be bad law that flies in the face of centuries of precedent regarding the creation of new and derivative works.

People need to use their brains and see that what Microsoft is doing can be illegal and bad (if code is regurgitated), but other projects that train their AI on publicly available data sets are not breaking the law. It all depends on the output.

You cannot copyright a style, and you can't police every AI in the world to ensure that no copyrighted work was ever used in its training. That would be a fool's errand.

Output is key.
