VitaminD263 t1_iz0dmzt wrote

Because it's making almost no errors on basically any kind of shell input. There just isn't enough data on the web to allow current language models to generate such accurate output, imo.

0

liquiddandruff t1_iz3c7zc wrote

Uh, how about all those guides and blogs on any number of command line utilities?

7

VitaminD263 t1_iz3p5wq wrote

There's still not enough data. I believe it must have had access to some environment in which it could execute commands. Compare how well ChatGPT performs on computing stuff and how badly it performs on other topics. E.g., is there significantly more data available on the web about one specific kind of shell command (note that it generates the correct shell output for any kind of input) than, say, blog posts on real analysis? If you query ChatGPT for its understanding of real analysis definitions, it fails abysmally, yet there should be far more text available on that topic than on some random shell command, and there is definitely not enough shell data to cover every kind of input. I really don't believe that current-generation language models are capable of learning the semantics of terminal commands.

0

baconninja0 t1_iz4uwd3 wrote

The shell commands found on websites will probably be more similar from site to site than non-code topics, especially since I’m pretty sure a lot of code content farm sites just steal each other’s code anyway. This makes it much easier for the bot to learn than other topics, because it sees the exact same command so many times instead of just similar commands (which it has to learn are similar).

1

VitaminD263 t1_iz4w5i9 wrote

Yeah, or, you know, you could just make up inputs, run them, and use the outputs to create your training data...

−3
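
A minimal sketch of the idea in the comment above: make up inputs, run them, and keep the (input, output) pairs. This only illustrates the proposal and is not a claim about how ChatGPT was actually trained. The function names `make_example` and `build_dataset` and the choice of `expr` as the command are assumptions for illustration, and the sketch assumes a POSIX system where `expr` is on the PATH.

```python
import random
import subprocess


def make_example(rng):
    """Make up one shell input and record what the real command prints."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    # Hypothetical choice of command: POSIX `expr` doing integer addition.
    argv = ["expr", str(a), "+", str(b)]
    # check=False because `expr` exits with status 1 when the result is 0.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return {"input": " ".join(argv), "output": result.stdout.strip()}


def build_dataset(n, seed=0):
    """Collect n (input, output) pairs; seeded so the made-up inputs repeat."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for pair in build_dataset(3):
        print(pair)
```

Each pair holds the command line as typed and the text the command really produced, so the data is only as varied as the input generator.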

PromiseChain t1_iz90i6y wrote

You’re just demonstrating you don’t understand this technology. This is not piping anything into a terminal anywhere. There is no 3080 that actually got installed by OpenAI to provide this data; they explain their data transparently. This is most likely modeled from Stack Overflow answers.

3

2b100k t1_izri10k wrote

Agree with you here. There are resources available online, but I wouldn't think they're enough to train an AI on their own.

I am very impressed by ChatGPT. It immediately gave me the correct answer on how to resolve an issue when I accidentally skipped a step in installing Gentoo Linux. It also gives really detailed answers on troubleshooting all sorts of Linux programs.

It's hard to explain, but a lot of the time it feels too accurate for answers that would have to come from training on relatively small amounts of data (for an AI).

1