Comments

A1-Delta t1_iwkzn5f wrote

I’m not sure there are any laws around what can and cannot be used as training data. It is a sort of grey area right now, and the current precedent (think Copilot) is that you can use whatever you want without worrying about its source, so long as your model is generating something new (not just selecting and presenting data you gave it).

6

blunzegg t1_iwl0kcr wrote

As far as I know, you can use YouTube content as long as it is for educational purposes. However, you say you want to use a TV series, so that might cause some problems if you do it without permission from the TV channel. I suggest you don't go that way and try to find another solution.

2

Ronny_Jotten t1_iwl2wqw wrote

Buying a copy of a TV series doesn't give you any additional rights to use it for training a model. It just gives you a right to possess the copy, and to watch it. If you do train a model, and don't break any DRM / technological protection measures (in the US), and you don't distribute the model or anything generated by it, then it's ok. What you do with it at home is your business. If you do distribute it then...??? Nobody knows for sure about the legality, because it hasn't been tested thoroughly in the courts.

If your work is non-commercial, and has no potential impact on the sales of the TV series, or other economic damage to anyone, there is very little trouble you could get into. It may be seen as "fair use", though that's not a guarantee right now. The worst would be a "cease and desist" or DMCA takedown order, from the lawyers of the rights holders of the show. How likely that is to happen, or succeed if you challenged it in court, would depend on the details of your specific case.

5

Ronny_Jotten t1_iwl38kq wrote

I don't believe there is such a legal precedent as you describe. Regarding your specific example, there is currently a multi-billion dollar class-action lawsuit against Copilot, for commercial copyright infringement damages.

7

Ronny_Jotten t1_iwl4ai8 wrote

It's true that there are some exemptions for "fair use" of copyrighted material for educational purposes, but there are details to be aware of and rules to follow. There is no difference between a TV series and general YouTube content in terms of requiring permission (or not, if it's fair use), they are both copyrighted.

You are more likely to get away with infringing the copyright of some random YouTuber than that of a commercial TV show, but only because the latter has a much greater economic interest, and money to pay lawyers to stop you.

2

lfotofilter t1_iwl8uly wrote

Many academic datasets already use clips and dialogue from well-known TV shows. E.g. MELD. I think for academic use you should be fine.

8

A1-Delta t1_iwn3n9s wrote

Sorry for the confusion. I didn’t mean a legal precedent, I meant a precedent in practice. Specifically, I meant that the legality of these practices has not yet been determined. They are in a grey area. We’ll see if legal precedent is set by the lawsuit you referenced. It’s not at all obvious that current laws apply here.

1

ID4gotten t1_iwnkpnf wrote

Your university should provide legal consultation. The answer is probably yes under "fair use", but I think a lot of people here really don't know what they're talking about, so beware of legal advice on Reddit. "IANAL"

1

LetterRip t1_iwrvtt5 wrote

It may or may not be fair use. Academic usage is a fair use defense, but it will depend on the specific nature of the usage. What will the trained model be used for? Also, is the result transformative? Short version: talk to a lawyer.

Also, different countries have different copyright laws, so it could be very different if you are not in the US.

1