trias10 t1_j7ibyhq wrote

As it should be. If the openness of the internet means a few people get rich by training on large swathes of data without explicit permission, then it should be stopped.

OpenAI should pay for its own labelled datasets rather than harvest data from the internet without explicit permission, then sell it back as GPT-3 and get rich off it. This absolutely has to be punished and stopped.

VeritaSimulacra t1_j7icqmp wrote

I agree with the goal, but I don’t think making the internet more closed is the way to go. The purpose of the internet is to be open. Making everything on the internet cost something would have a lot of negative effects on it. The solution to the powerful exploiting our openness isn’t to make it closed, but to regulate their usage of it.

trias10 t1_j7ifhdq wrote

I agree, which is why I support this lawsuit and hope that Getty wins. I hope that leads to laws that sharply restrict which data AI can be trained on, especially when that data comes from artists and creators, who are already some of the lowest-paid members of society (unless they're the lucky 0.01% of that group).

currentscurrents t1_j7ipcip wrote

OpenAI is doing a good thing. They've found a new and awesome way to use data from the open web, and they deserve their reward.

Getty's business model is outdated now, and the legal system shouldn't protect old industries from new inventions. Why search for a stock image that sorta kinda looks like what you want, when you could generate one that matches your exact specifications for free?

trias10 t1_j7irgui wrote

What good thing is OpenAI doing, exactly? I have yet to see any of their technologies used for any sort of societal good. So far the only things I have seen are cheating on homework and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.

Additionally, if you work in any kind of serious research division at a FAANG, you'd know there is a collective suspicion of OpenAI's work. Their recent papers (or lack thereof, in ChatGPT's case) no longer describe the exact data they used beyond saying "the internet", and they no longer release their training code. That makes independent peer review and verification impossible and leads many to question whether their data was legally obtained. At any FAANG, you need to bring Legal into any discussion about data sources long before you begin training. Most data you see on the internet isn't actually usable unless there is an explicit licence allowing it, so a lot of data is off limits. OpenAI seems to ignore that, which is why they never discuss their data specifics anymore.

We live in a world of laws and multiple social contracts; you can't just do as you please. Hopefully OpenAI is punished and restricted accordingly, and starts playing by the same rules as everyone else in the industry. Fanboys such as yourself aren't helpful to the progress of responsible, legal, and ethical AI research.

currentscurrents t1_j7ivm0a wrote

>the only thing I have seen is cheating on homeworks and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.

Well, that's just cherry-picking. LLMs could do very socially good things, like act as an oracle for all internet knowledge or automate millions of jobs (assuming the accuracy issues can be worked out, which tons of researchers are trying to do, some of whom are even on this sub).

By far the most promising use is allowing computers to understand and express complex ideas in plain English. We're already seeing uses of this: text-to-image generators, for example, use a language model to understand prompts and guide the generation process, and GitHub Copilot can turn instructions written in English into working code.

I expect we'll see them applied to many more applications in the years to come, especially once desktop computers get fast enough to run them locally.

>starts playing by the same rules as everyone else in the industry.

Everyone else in the industry is also training on copyrighted data, because there is no source of uncopyrighted data big enough to train these models.

Also, your brain is updating its weights based on the copyrighted data in my comment right now, and that doesn't violate my copyright. Why should AI be any different?