currentscurrents t1_j9ld7we wrote

>it spits out something that it mined from GitHub.

Having used GitHub Copilot a bunch, it's doing a lot more than just mining snippets. It learns patterns and can use them to creatively solve new problems.

It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data), but in the general case it comes up with new code to match your specifications.

>I set all of my github projects to private but I don't know if that helps.

Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.

Disastrous_Elk_6375 t1_j9nrm6w wrote

> It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data)

And, to be fair, how can it not? How many different ways can you write a simple for loop to print some objects, or match a regex, call an API, and so on?

visarga t1_j9qxgt2 wrote

If you go down to individual words or characters, everything is reused. If you go up, a random 10-word snippet usually appears nowhere else on the internet. But boilerplate and basic things get replicated in all shapes and forms.

1973DodgeChallenger t1_j9lgjq4 wrote

Just for example, say you work at a company that has spent millions investing in a proprietary software product. You're saying everyone should have access to the source code, through ChatGPT or otherwise?

Can I have all of your and your company's source code, please? I'll send you my email address.

currentscurrents t1_j9pb0by wrote

You had your source code public until you got freaked out by ChatGPT, so you were entirely okay with publishing it for everyone to see.

ChatGPT doesn't even allow direct access to source code, it's just learning how to solve problems using existing source code as training examples.

visarga t1_j9qxt97 wrote

Well, you can't, because it is really hard to extract verbatim copies of training data from ChatGPT. You need to put a considerable portion of the work into the prompt to steer the model to the right place, and then sample your way ahead. That doesn't work for most stuff, like 99% of it.

visarga t1_j9qwzlf wrote

> Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.

They took their little pebble from the beach back home, that'll show them.