Ieris19 t1_jd2u1ei wrote

Say Google wants to develop an AI that writes books right? So they need a lot of text written by humans to train it right? Well, Google Docs is full of that.

Microsoft and OpenAI did the same thing with coding AIs. They trained a GPT-3-based model (Codex) on public GitHub code to get AIs to write code.

Google’s business is advertising after all, so just like they could train an AI to write, imagine how much data they can collect and feed to algorithms about you to target ads that they know will sway you. It’s not necessarily that an employee at Google is reading your emails. Or that the government is spying on you to catch criminals. The issue is more that an algorithm/AI is learning all about you and honing itself to recommend what you will consume, and thus, generate clicks and money for them.


BelgianBeerGuy t1_jd2wcva wrote

>> Say Google wants to develop an AI that writes books right? So they need a lot of text written by humans to train it right? Well, Google Docs is full of that.

I don’t think Google wants to train an AI on all the crap I wrote in Google Docs, let alone all the spelling and grammar errors people make in those docs.
For an AI that can write books, they probably just use actual books.


Ieris19 t1_jd2wndl wrote

Read my other comment. I was more trying to make an example rather than something anyone would actually wanna do.

It was more about illustrating that the use we have for data is not necessarily the same one a company has for it.

Never said it had to be a successful AI, or a good idea


kimbosdurag t1_jd2ukbd wrote

Interesting, I didn't think about that. It also wouldn't be tough for them to just scrape blogs, news sites, and sites like Wattpad that host writing. Lots of data out there for the taking. I'm very curious to see how AI evolves from this point on as a consumer product.


Ieris19 t1_jd2vf0v wrote

That was more an example, rather than something they would actually do. Of course there are a million other ways of doing it, but the more control you have over the data, the better you can develop an AI.

I mean, Google’s already mastered AI. People tend to think of natural intelligence (like humans) when they think about the development of AI.

AI is just a learning system. Google recommendations are a complex calculation on everything that you’ve recently interacted with to figure out the thing you’ll most likely want next.

Unless the function is completely static, which I doubt, it would be considered AI, even if it doesn’t attempt to imitate real intelligence. The function is probably given some weights from Google engineers (basically, what results are valuable), and through trial and error, the program is likely learning how to get more clicks. The more data it can process, the more users it has to try with, the faster it can advance.

This is of course pretty simplistic compared to the math behind how this works, but it gets the point across.
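
To make the "trial and error" idea concrete, here is a minimal sketch of an epsilon-greedy bandit, one simple way a program can learn which item gets the most clicks. It is only an illustration under made-up assumptions, not how Google's recommendations actually work: `run_bandit`, the click rates, and every parameter are hypothetical.

```python
import random


def run_bandit(true_click_rates, rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: estimate each item's click rate by trial and error.

    true_click_rates is a hypothetical list of hidden click probabilities,
    one per item. The learner never reads it directly; it only sees clicks.
    """
    rng = random.Random(seed)
    n = len(true_click_rates)
    shows = [0] * n        # how many times each item was shown
    estimates = [0.0] * n  # running estimate of each item's click rate

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally show a random item to gather data.
            item = rng.randrange(n)
        else:
            # Exploit: show the item with the best estimate so far.
            item = max(range(n), key=lambda i: estimates[i])

        # Simulate whether the user clicked.
        clicked = 1 if rng.random() < true_click_rates[item] else 0

        # Update the running average for the item that was shown.
        shows[item] += 1
        estimates[item] += (clicked - estimates[item]) / shows[item]

    return shows, estimates


shows, estimates = run_bandit([0.02, 0.05, 0.10])
print("times shown:", shows)
print("estimated click rates:", [round(e, 3) for e in estimates])
```

More rounds (more users to try with) means more samples per item, so the estimates get closer to the real click rates, which is the "more data, faster progress" point above.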


StateChemist t1_jd3aga6 wrote

There are some potential legal issues if you scrape someone else’s data to train your AI. If your own users agreed to the ToS, they have no legal recourse, so the company can use anything they upload.


IdlyOverthink t1_jd373mj wrote

This speculation borders on misinformation. According to Google's privacy policy they have no access to content you've saved in Google Drive except where required by law, or with your explicit permission.

I'm not trying to defend a big corporation; it's likely that Google is doing other questionably ethical things, but comments like this which point in patently false directions distract from the actually important transgressions.

This is entirely different from a model being trained on public GitHub code; it's not possible without Google making claims that open it up to litigation. (Companies won't do this... There's no reason to make themselves financially vulnerable like that.)


Ieris19 t1_jd39bbu wrote

Again, that is mostly an example. Of course, it wouldn’t even be a good idea to begin with.

But people seem confused, so now my question is: how would I make it more obvious that it is just a simplified example?


IdlyOverthink t1_jd3qa8e wrote

I think your point is that "Google likes having [the data in the services OP asks about] because it could mine that data."

Per their own site:

>We never use the content you create and store in apps like Drive, Gmail, and Photos for any ads purposes.

Here's their source for how they don't use it for training an ML model either.

I think I would choose a different example to support your point because it implies too many (false) conditions, and in doing so establishes a non-existent precedent.

>Of course, it wouldn’t even be a good idea to begin with.

This still entertains the premise that they'd try, and I think that's what I'm trying to address. It's not that it's not a good idea, it can't be an idea. Google has made commitments to making this impossible, so worrying about the ethics, whether it's worth the cost, whether it's a worthy source, etc is a distraction from the actual possibilities/answers.

As said by others, Google Drive is a gateway drug into Google's other services. Because of that, it can be private even from Google: Google uses data from those other services to train its models and provide ads data.

For example, when you're working on a research paper, Google can glean your area of study (adjacent to "your interests"), your level of education (and more) from your search keywords, the time you're searching, etc.


Ieris19 t1_jd3r6in wrote

The fact that they currently don’t need to, and the fact that they don’t plan to, doesn’t mean they can’t. They’re sitting on a huge stockpile of stuff they can use, and thinking a company will store my gigabytes of data for years on end, never delete it, and not even use it in hopes of getting me to use their other products is ridiculous. They’re clearly using it in one way or another, whichever that way turns out to be.

No one expected Microsoft to run the same shit on all their products yet here we are regardless.


alchippa t1_jd2yxlh wrote

Suppose I stored some code in GitHub, can anyone else just take it like that? Can GitHub use my code for training without my permission? Or did I already grant permission in their fine print?


Ieris19 t1_jd2z28i wrote

That is precisely why they’re getting sued. We’re not sure if it’s legal or ethical, or how copyright applies, since it’s not using your code but learning from it.