Submitted by DarrenTitor t3_ywrv3u in MachineLearning
For academic usage. I'm curious whether I will get into trouble by doing this.
As long as it is for educational purposes, you can use YouTube content, as far as I know. However, you say you want to use a TV series, so that might cause some problems if you do it without permission from the TV channel. I suggest you don't go that way and try to find another solution.
Buying a copy of a TV series doesn't give you any additional rights to use it for training a model. It just gives you a right to possess the copy, and to watch it. If you do train a model, and don't break any DRM / technological protection measures (in the US), and you don't distribute the model or anything generated by it, then it's ok. What you do with it at home is your business. If you do distribute it then...??? Nobody knows for sure about the legality, because it hasn't been tested thoroughly in the courts.
If your work is non-commercial, and has no potential impact on the sales of the TV series, or other economic damage to anyone, there is very little trouble you could get into. It may be seen as "fair use", though that's not a guarantee right now. The worst would be a "cease and desist" or DMCA takedown order, from the lawyers of the rights holders of the show. How likely that is to happen, or succeed if you challenged it in court, would depend on the details of your specific case.
I don't believe there is such a legal precedent as you describe. Regarding your specific example, there is currently a multi-billion dollar class-action lawsuit against Copilot, for commercial copyright infringement damages.
It's true that there are some exemptions for "fair use" of copyrighted material for educational purposes, but there are details to be aware of and rules to follow. There is no difference between a TV series and general YouTube content in terms of requiring permission (or not, if it's fair use), they are both copyrighted.
You are more likely to get away with infringing the copyright of some random YouTuber than that of a commercial TV show, but only because the latter has a much greater economic interest, and the money to pay lawyers to stop you.
Many academic datasets already use clips and dialogue from well-known TV shows. E.g. MELD. I think for academic use you should be fine.
Sorry for the confusion. I didn’t mean a legal precedent; I meant a precedent in practice. Specifically, I meant that the legality of these practices has not yet been determined. They are in a grey area. We’ll see if legal precedent is set by the lawsuit you referenced. It’s not at all obvious that current laws apply here.
You may be interested in this video from Zyte that talks about 'transformational' works: https://www.zyte.com/learn/is-web-scraping-legal/
The above video also mentions that if you have accepted T&Cs that may change things.
Yannic also mentioned last week how things can change if you have accepted T&Cs: https://www.youtube.com/watch?v=W5M-dvzpzSQ
It may or may not be fair use. Academic usage is a fair use defense, but it will depend on the specific nature of the usage. What will the trained model be used for? Also, is the result transformative? Short version: talk to a lawyer.
Also, different countries have different copyright laws, so things could be quite different if you are not in the US.
A1-Delta t1_iwkzn5f wrote
I’m not sure there are any laws around what can and cannot be used as training data. It is a sort of grey area right now, and current precedent (think Copilot) is that you can use whatever you want without worrying about its source, so long as your model is generating something new (not just selecting and presenting data you gave it).