Submitted by Far_Pineapple770 t3_zc5sg6 in MachineLearning
PromiseChain t1_iyw1m45 wrote
Wait until you find out it can simulate entire Linux machines.
It's confounding to watch everyone play with something so powerful and yet so little understood.
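If you haven't seen it, the whole trick is just a prompt. The version going around is roughly this (paraphrased from memory, so treat the exact wording as approximate):

```
I want you to act as a Linux terminal. I will type commands and you will
reply with what the terminal should show. Only reply with the terminal
output inside one unique code block, and nothing else. Do not write
explanations. Do not type commands unless I instruct you to do so. When I
need to tell you something in English, I will put it in curly brackets
{like this}. My first command is pwd.
```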
RomanRiesen t1_iyw8l9k wrote
That's quite funny.
CryptogeniK_ t1_iyyh76h wrote
That would make the coolest honeypot
yaosio t1_iyxrw70 wrote
They'll need to train an AI that can explain how it works.
VitaminD263 t1_iyzzhv1 wrote
Does anybody have thoughts on how they might have created the data for this? I'm completely stumped by the tech knowledge it has, and I don't think there's even remotely sufficient data on the web that would let it generate this kind of content. Did they use some self-learning environment in a terminal?!
PromiseChain t1_iz0a07b wrote
>and don't think there's even remotely sufficient data on the web that would allow it to generate this kind of content
Why do you think that
VitaminD263 t1_iz0dmzt wrote
Because it's making almost no errors on basically any kind of shell input, and there just isn't enough data on the web to allow current language models to generate such accurate output imo.
liquiddandruff t1_iz3c7zc wrote
Uh, how about all those guides and blogs on any number of command line utilities?
VitaminD263 t1_iz3p5wq wrote
There's still not enough data. I believe it must have had access to some environment in which it could execute commands.

Compare how well ChatGPT performs on computing topics with how badly it performs elsewhere. Is there really significantly more data on the web about some specific shell command (note that it generates the correct shell output for pretty much any input) than there is about, say, real analysis? If you query ChatGPT about real analysis definitions it fails abysmally, yet there should be far more text available on that topic than on some random shell command, and there definitely isn't enough data to cover every possible input. I really don't believe current-generation language models are capable of learning the semantics of terminal commands from text alone.
vino_and_data t1_iz957hr wrote
OMG.

>I believe it must have had access to some environment in which it could have executed commands.

Calm down maybe??!
VitaminD263 t1_izax31y wrote
Calm down?
It's not as if I'm the only one claiming that. https://twitter.com/yoavgo/status/1599886211656491008
baconninja0 t1_iz4uwd3 wrote
The shell commands found on websites are probably more similar from site to site than non-code topics, especially since I'm pretty sure a lot of code content-farm sites just steal each other's code anyway. That makes it much easier for the bot to learn than other topics, because it sees the exact same command many times instead of merely similar commands (which it would have to learn are similar).
VitaminD263 t1_iz4w5i9 wrote
Yeah, or you know, you could just make up inputs, execute them in a real environment, and collect the outputs to create your training data...
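A totally hypothetical sketch of what I mean (the command list, the pair format, and every name here are made up for illustration; a real pipeline would sample commands far more broadly and run them inside an isolated container, not on the host):

```python
import subprocess

# Hypothetical seed commands; a real pipeline would generate these at scale.
COMMANDS = [
    "pwd",
    "ls -la /tmp",
    "echo hello world",
    "uname -a",
    "df -h",
]

def make_pair(cmd: str, timeout: int = 5) -> dict:
    """Run one command and record its output as a (prompt, completion) pair."""
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=timeout
    )
    # Capture both streams so error behavior is also part of the training data.
    return {
        "prompt": f"$ {cmd}\n",
        "completion": result.stdout + result.stderr,
    }

if __name__ == "__main__":
    dataset = [make_pair(c) for c in COMMANDS]
    for pair in dataset:
        print(pair["prompt"], pair["completion"], sep="", end="")
```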
PromiseChain t1_iz90i6y wrote
You're just demonstrating you don't understand this technology. It is not piping anything into a terminal anywhere, and there is no 3080 that OpenAI actually installed to provide this data; they explain their data transparently. Most likely this is modeled from Stack Overflow answers.
VitaminD263 t1_izb0tz8 wrote
I'm not saying it's using a terminal behind the scenes; I'm saying the data used to train this was likely generated using an execution environment. There are serious NLP researchers who believe this as well: https://twitter.com/yoavgo/status/1599886211656491008
2b100k t1_izri10k wrote
Agree with you here. There are resources available online, but I wouldn't think they're enough to train an AI on their own.
I'm very impressed by ChatGPT: it immediately gave me the correct answer on how to resolve an issue when I accidentally skipped a step while installing Gentoo Linux. It also gives really detailed answers on troubleshooting all sorts of Linux programs.
It's hard to explain, but it feels too accurate a lot of the time for answers that would have to be trained from relatively small amounts of data (for an AI).
vino_and_data t1_iz9533f wrote
Hey!! This is the training data they used to train GPT-3. It's only 45 TB of web data: https://imgur.com/rktCj8q
YellowChickn t1_iyyn2tf wrote
Wow ok, was it also trained on code (snippets)?
vino_and_data t1_iz95cfc wrote
Here is the training dataset it was trained on. You can find the details in the GPT-3 paper: https://arxiv.org/pdf/2005.14165.pdf
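For convenience, the dataset mix from Table 2.2 of that paper (figures quoted from memory, so double-check against the paper itself):

| Dataset | Tokens | Weight in training mix |
|---|---|---|
| Common Crawl (filtered) | 410 billion | 60% |
| WebText2 | 19 billion | 22% |
| Books1 | 12 billion | 8% |
| Books2 | 55 billion | 8% |
| Wikipedia | 3 billion | 3% |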