Viewing a single comment thread. View all comments

lambertb t1_jds24dr wrote

It cannot solve all coding problems, but it can solve many. And if the user is reasonably experienced, even code with errors is useful, because the errors can be corrected quickly. Preliminary evaluations show a 40% increase in developer productivity from GitHub Copilot, and that seems totally plausible to me.

57

enryu42 OP t1_jds3452 wrote

I absolutely agree that it is useful. Even Copilot is amazing at autocompleting "dumb" boilerplate code, which is a nontrivial share of code overall. However, these problems are designed to be challenging (these are competitions, after all) and require ideas/intelligence to solve. Apparently GPT-4 cannot do that at all, so IMO it would be a stretch to call whatever it is doing "intelligence".

18

dimsumham t1_jdsfpqb wrote

It's not. It gives you answers that appear intelligent, many times in almost magical ways, but it doesn't "think", especially not in steps.

The MSFT paper notes that this is one of its clearest shortcomings: it can't do long-range planning, at least not yet. But I think this is partly a matter of people expecting way too much of a single model.

14

Ciber_Ninja t1_jdu81zy wrote

It can in fact think in steps. All you have to do is ask it to. In fact, multiple papers have shown that asking it to think in steps significantly increases the accuracy of its answers.
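As a concrete sketch of what "asking it to think in steps" amounts to in practice: it is just a prompt transform. The exact wording below is illustrative (the zero-shot chain-of-thought papers use similar phrasings), not a quote from any specific paper:

```python
def with_steps(question: str) -> str:
    """Wrap a question in a zero-shot 'think step by step' prompt.

    The wording is illustrative; published chain-of-thought prompts
    use variations on the same idea.
    """
    return (
        f"{question}\n\n"
        "Let's think step by step. Write out each intermediate step "
        "before giving the final answer."
    )

# The wrapped prompt is what you actually send to the model.
prompt = with_steps("If I have 3 boxes with 4 apples each, how many apples?")
```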

1

audioen t1_jdujtbl wrote

Yes. Directly predicting the answer from a question in one step is a difficult ask. Decomposing the problem into discrete steps, writing those steps out, and then using the sub-answers to compose the final result is evidently simpler, and likely requires less outright memorization and network depth. I think it is also how humans work out answers: we can't just go from question to answer unless the question is simple or we have already memorized the answer.

Right now, we are asking the model to memorize everything and hoping it generalizes something like cognition or reasoning in the deep layers of the network, and to a degree this happens. But I think it will be easier to engineer a good practical Q&A system by being more intelligent about how the LLM is used: perhaps by having it query itself recursively, or by using the results of that recursive querying to generate vast synthetic datasets for training new networks designed to perform some kind of "LLM + scratchpad for temporary results = answer" behavior.

One way to do this today with something like GPT-4 might be to ask it to write its own prompt. When the model gets the human question, the first prompt actually executed by the AI could be: "Decompose the user's prompt into simpler, easier-to-evaluate subtasks if necessary, then perform these subtasks, then respond."

3

Trotskyist t1_jdt8tx6 wrote

It's still an extremely useful tool if you accept its limitations, and I think it's reductive to say it can only solve "dumb" problems or suggest boilerplate code.

I used GPT-4 the other day to refactor/optimize an extremely bespoke and fairly complicated geoprocessing script we use at work, written by a former employee who's no longer with the organization. Yes, it got some things wrong that had to be corrected (sometimes all it took was feeding it a stack trace; other times that wasn't enough and I had to figure out the issue myself).

But at the end of the day (literally: this was over the course of an afternoon), I'd managed to cut the runtime by more than half, using libraries I'd never touched before. It probably would have taken a week to implement otherwise.

11

SWESWESWEh t1_jdt1ly6 wrote

So my guess is whoever ran the test did a bad job with the prompting. One of the examples they share, https://github.com/enryu43/llm_coding/blob/main/atcoder_eval/gpt4_solutions/abc294_c.py, the author complains it's not an efficient algorithm. So I just gave it a shot with ChatGPT:

Me: What does this code do?

    def main():
        N, M = map(int, input().split())
        A = list(map(int, input().split()))
        B = list(map(int, input().split()))

        C = sorted(A + B)
        pos_A = [str(C.index(a) + 1) for a in A]
        pos_B = [str(C.index(b) + 1) for b in B]

        print(" ".join(pos_A))
        print(" ".join(pos_B))

    if __name__ == "__main__":
        main()

ChatGPT: This code takes two lists of integers as input, merges them, sorts the merged list, and then prints the positions of the elements from the original lists in the sorted merged list.

Here's a step-by-step explanation of the code:

1. The main function is defined.
2. Two integers, N and M, are read from the input and mapped to int using map(int, input().split()).
3. Two lists, A and B, are read from the input, with each element mapped to int using map(int, input().split()).
4. A new list C is created by merging lists A and B and sorting the result using sorted(A + B).
5. Two new lists, pos_A and pos_B, store the positions of the elements from A and B in the sorted list C. The positions are calculated using the index() method and are 1-indexed.
6. The elements in pos_A and pos_B are joined into space-separated strings and printed.
7. The if __name__ == "__main__": statement checks whether the script is being run directly (not imported as a module) and, if so, calls main.

Here's an example to illustrate how the code works:

Input:

    4 3
    2 1 4 3
    5 6 7

Output:

    2 1 4 3
    5 6 7

The sorted merged list C is [1, 2, 3, 4, 5, 6, 7]. The positions of the elements from the original lists in the sorted list are 2, 1, 4, 3 for list A and 5, 6, 7 for list B.

Me: Could this code be refactored to be more algorithmically efficient?

Yes, the code can be refactored to be more algorithmically efficient. The current implementation uses the index() method to find the positions of elements from lists A and B in the sorted list C. This approach has a time complexity of O(n^2) because index() is called for every element in both A and B.

To improve the algorithmic efficiency, you can use a dictionary to store the positions of the elements in the sorted list C. This will reduce the time complexity of finding the positions to O(n). Here's a refactored version of the code:

    def main():
        N, M = map(int, input().split())
        A = list(map(int, input().split()))
        B = list(map(int, input().split()))

        C = sorted(A + B)
        positions = {value: index + 1 for index, value in enumerate(C)}

        pos_A = [str(positions[a]) for a in A]
        pos_B = [str(positions[b]) for b in B]

        print(" ".join(pos_A))
        print(" ".join(pos_B))

    if __name__ == "__main__":
        main()

The main changes in the refactored code are:

1. A positions dictionary is created that maps the values in the sorted list C to their 1-indexed positions.
2. The pos_A and pos_B lists are created by looking the elements up directly in the positions dictionary.

This refactored version improves the algorithmic efficiency by reducing the time complexity of finding the positions from O(n^2) to O(n).
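A quick, self-contained way to sanity-check the claimed speedup (sizes here are arbitrary, chosen just to make the difference visible):

```python
import random
import timeit

# Build a sorted list of unique values and a sample to look up.
C = sorted(random.sample(range(10**6), 20000))
A = random.sample(C, 5000)

# O(n) per lookup: list.index scans the list on every call.
t_index = timeit.timeit(lambda: [C.index(a) + 1 for a in A], number=1)

# O(1) per lookup after building the dict once.
positions = {value: index + 1 for index, value in enumerate(C)}
t_dict = timeit.timeit(lambda: [positions[a] for a in A], number=1)

print(f"index(): {t_index:.4f}s  dict: {t_dict:.4f}s")
```

On typical hardware the dict version is orders of magnitude faster, and the gap widens as the input grows.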


So it accurately explains what the code does without knowing the problem definition, and it spots the inefficiency when asked for algorithmically efficient code. Seems fine to me.

9

SWESWESWEh t1_jdt2m8y wrote

It often has errors, but if you just paste the errors into the chat it will generally fix them. In early versions of ChatGPT, I had issues with things like writing networking code in C++, but it still got me a lot of the way there.

I recently worked through writing a high-throughput async data pipeline in Java, and it did a great job of writing the code and even taught me a new design pattern. I had to make a few small changes here and there, but it basically turned a week of work into a couple of hours. With the written code as context, I also had it write unit tests and documentation for me, and it added more unit tests and integration tests based on my feedback.

I'm fine with people underestimating how good ChatGPT is as a coding assistant, it just makes me look better because of how productive it makes me.

11

WokeAssBaller t1_jdvmmfp wrote

I don't buy that 40% number; I would love to see how they calculated it.

I've tried GPT-4 on a lot of problems, and it fails 9 times out of 10; I would be faster just googling it.

This stuff will be amazing; it's just not quite there yet.

1

WokeAssBaller t1_jdvodfn wrote

Yeah, I don't buy a survey; it could be heavily biased.

0

lambertb t1_je02zgn wrote

Have you used the tools yourself? I have, and a 40% increase in productivity is totally plausible, and often an underestimate considering I can now do things I would not have even tried previously. I encourage you to try them, with healthy skepticism and an open mind.

1

WokeAssBaller t1_je04bbu wrote

I'm an MLE and I've used it a bunch; it's hardly ever actually useful. It gets close but it's not there, and it's faster to google almost every time.

It will probably be useful in a year or two, but it needs to understand how to run its own experiments. Anyone who thinks this is actually useful right now is just buying the hype.

1

lambertb t1_je29sc5 wrote

Isn’t it possible that your experience is not representative? Are you using ChatGPT or GitHub copilot?

1

WokeAssBaller t1_je6fveg wrote

I doubt it. I do pretty standard engineering; what's more likely is that there's selection bias in the survey and people are overestimating it due to hype.

I'd love to see an actual double blind study.

1

lambertb t1_je6jjps wrote

There can't be a double-blind study, because the people using Copilot will know they're using it.

1

WokeAssBaller t1_je783go wrote

Fair enough; then give them problems to solve and measure their output. This feels like "90% of dentists claim Crest improves your dental health."

I’ll take an independent study into consideration but today I find it more of a novelty

1

lambertb t1_je802eu wrote

I agree the survey study is nothing close to definitive, and it does smack of marketing. Still, my own experience suggests these tools will be transformative. At the same time, I've gotten lost down an AI rabbit hole where it would have been more efficient to just do it myself. On balance, though, my assessment is that these are already very helpful tools, and they'll only get better.

1

WokeAssBaller t1_je80fbg wrote

They will absolutely reshape the world in the next 5 years; all I'm saying is that in its current state I haven't found it helpful. I'm sure in a couple of years it will be the main thing I use.

2