Submitted by IluvBsissa t3_11e7csf in singularity

We "know" GPT-4 may have a context window of at least 32 000 tokens. That's enough to write a 50-page short story or a small program.

How many tokens would it need to generate huge programs like a new social network, a web browser, specialized corporate software or even a whole operating system? That's around 20-50 million lines of code at most.

Of course, there might be a huge loss of consistency and a large number of bugs, but this is just a thought experiment.

While we're at it, how many tokens does an average line of code take up in an LLM's context?

It's just that I really want to emulate some non-free browser extensions that help me in my studies, and it's probably around a million lines of code.

Thank you.

EDIT: Assuming 32 000 tokens = 50 pages = 1250 lines of code, we would need a context window 40 000 times bigger to generate 50 million lines of code, so around 1 280 000 000 tokens. That's quite a lot.
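
For anyone who wants to redo the arithmetic with other numbers, here is a minimal Python sketch. It assumes the same rough ratio as above (32 000 tokens = 1250 lines of code) and that the cost scales linearly; `tokens_needed` is just a name made up for illustration.

```python
def tokens_needed(lines_of_code, tokens_per_window=32_000, lines_per_window=1_250):
    """Scale the assumed ratio (32 000 tokens = 1250 lines) linearly.

    Both defaults are the rough guesses from the post, not measured values.
    """
    # Integer arithmetic keeps the result exact for large inputs.
    return lines_of_code * tokens_per_window // lines_per_window


print(tokens_needed(50_000_000))  # 1280000000 (the 50 million line case)
print(tokens_needed(1_000_000))   # 25600000 (a million-line project)
```

Under this assumption one line of code costs about 25.6 tokens, which is only as good as the 50 pages = 1250 lines guess.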




basilgello t1_jacon0o wrote

Software is architecture defined in code. Minimal common sense reasoning is definitely not enough to write and maintain huge software codebases. And LLMs pass even these reasoning tests with "n-th order of understanding". Writing snippets is one thing, but forward- and reverse-engineering complex problems is another: the number of possible ways to achieve the same result grows exponentially, and evaluating the optimality of each solution is a different task from what an LLM does.


Lawjarp2 t1_jacugjc wrote

Context won't even matter. No single person wrote all those millions of lines of code. No single person needs to know all of it. Just the functionality of each module and how to use it is enough as context for others to use it and build their own module.

Essentially, a 32k or even an 8k context would be enough by itself. But ChatGPT as it is now is not robust.


challengethegods t1_jact6ez wrote

well, the context window is not as limiting as people seem to think. That's basically the range of text it can handle in a single instant - for example if someone asked you a trick question, and the predictable false answer pops into your head immediately - that's what a single call to an LLM is. Once people figure out how to recursively call the LLM inside a larger system that keeps track of long-term memory/goals/tools/modalities/etc., it will suddenly be a lot smarter, and that kind of system can have even GPT-3 write entire books.

The problem is that the overarching system also has to be AI, and sophisticated enough to complement the LLM, before the recursive calls become coherent. The context window also gets eaten very quickly by reminding it of relevant things, to the point where writing 1 more sentence/line might take the entire context window just to hold all the relevant information, or even need an additional pass afterwards to check the extra line against another entire block of text. All of which is to say that an 8k context window is not 2x as good as a 4k context window... it's much better, because all of the reminders are a flat subtraction.

real-world layman example:
suppose you have $3900/month in costs, and revenue of $4000/month =
$100/month you can spend on "something".
now increase revenue to $8000/month,
suddenly you have 41x as much to spend
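
The same "flat subtraction" idea as a small Python sketch. The function names and the 3500-token overhead in the second call are made up for illustration; only the $3900/$4000/$8000 figures come from the example above.

```python
def usable(window, overhead):
    """What is left of the window once the fixed 'reminder' overhead is paid."""
    return max(window - overhead, 0)


def relative_gain(small, large, overhead):
    """How many times more usable room the larger window gives."""
    base = usable(small, overhead)
    if base == 0:
        raise ValueError("the smaller window is fully consumed by the overhead")
    return usable(large, overhead) / base


# The money example: $3900 fixed costs, $4000 vs $8000 revenue.
print(relative_gain(4000, 8000, 3900))  # 41.0

# Same shape with tokens; the 3500-token overhead is a hypothetical number.
print(relative_gain(4096, 8192, 3500))
```

The closer the overhead sits to the smaller window, the bigger the jump from doubling it.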


RabidHexley t1_jad8r8t wrote

> for example if someone asked you a trick question, and the predictable false answer pops into your head immediately - that's what a single call to an LLM is

Yep. This is the biggest issue with current consumer LLM implementations. We basically force the AI to word-vomit the first thing it thinks of. It's very good at getting things right in spite of that, but if it gets it wrong the system has no recourse. Coming to a correct conclusion, well-reasoned response, or even just coming to the conclusion that we don't know something requires multiple passes.


Borrowedshorts t1_jadchbs wrote

Humans don't have anywhere close to a 32,000 token context window, at least in terms of producing useful output from learned context. You don't need that big a context window; you break the problem down into manageable steps and solve those.


Surur t1_jadaixl wrote

We already know that having an LLM break a task down into steps dramatically improves accuracy, so that would be the obvious choice for a large software project - break it down into steps and iterate down the project tree.
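
A minimal sketch of what "iterate down the project tree" could look like in Python. This is only an illustration under assumptions: `split` stands in for an LLM call that returns subtasks, and here it is replaced by a hard-coded toy lookup, so no real model or API is involved.

```python
from typing import Callable, Dict, List


def build_plan(task: str, split: Callable[[str], List[str]],
               max_depth: int = 3, depth: int = 0) -> dict:
    """Recursively break a task into subtasks.

    `split` is a placeholder for an LLM call; each call only sees one task,
    so no single call needs the whole project in its context.
    """
    subtasks = split(task) if depth < max_depth else []
    return {"task": task,
            "steps": [build_plan(s, split, max_depth, depth + 1) for s in subtasks]}


def leaves(plan: dict) -> List[str]:
    """Collect the smallest work items, in order."""
    if not plan["steps"]:
        return [plan["task"]]
    items: List[str] = []
    for step in plan["steps"]:
        items.extend(leaves(step))
    return items


# Toy stand-in for the model: a fixed lookup table (hypothetical data).
TOY_BREAKDOWN: Dict[str, List[str]] = {
    "browser extension": ["UI", "storage"],
    "UI": ["popup", "options page"],
}


def toy_split(task: str) -> List[str]:
    return TOY_BREAKDOWN.get(task, [])


plan = build_plan("browser extension", toy_split)
print(leaves(plan))  # ['popup', 'options page', 'storage']
```

The `max_depth` cap is there so a splitter that never stops decomposing cannot recurse forever.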


No_Ninja3309_NoNoYes t1_jacwzfq wrote

Purportedly Twitter has 20M LoC of Scala. Scala is a JVM language that is somewhat more concise than Java. IDK how much of that is unit tests, documentation, and acceptance tests. Anyway, style, programming language and culture matter. Some coders can be verbose, others just want to get the job done. You can write unreadable code in any language. This is fine for small projects because you can figure out what is going on through trial and error. For Twitter it will not work. The bigger the team, the more clearly and defensively you have to code. Defensive code is verbose since you are checking for preconditions that might rarely occur. Some languages are more verbose than others.

But anyway no one codes bottom up. You usually start with a global design and iterate multiple times using mock ups if something is still vague. I don't think your question has an answer right now. Someone has to try it and see what the issues are.


sunplaysbass t1_jadcy74 wrote

This is not radically optimistic for the near term - not appropriate for this sub.


dasnihil t1_jacu838 wrote

comparing applications by "lines of code" is okay for laymen to do, but software engineers know the challenge at hand in letting an AI model build a Chrome-like codebase (

LLMs are good now, they can do minuscule things in a small context. what we need now is a bigger thinking machine that gets the big picture and makes use of LLMs and other predictive networks to get things done while staying focused on the big picture and on bug fixes along the way. "bugs" are not just errors that the super intelligent AI will never make, but also adjustments and adaptations to technological improvements and improved algorithms.

but we can totally do the #lines of code vs tokens ->> LLM thing, it's a fun mental exercise but pointless.


TFenrir t1_jacuy0q wrote

I think this is really hard to predict, because there are many different paths forward. What if LLMs get good at writing minified code directly? What if they make their own software language? What happens with new architectures that maybe have something like... RETRO or similar memory stores built in? Heck, even current vector stores allow for some really impressive things. There are tons of architectures that could potentially come into play that make a maximum context window of 32k tokens more than enough, or maybe 100k is needed. There was a paper I read a while back that was experimenting with context windows that large.

Also, you should look into Google Pitchfork, the code name for a project Google is working on that is essentially an LLM tied to a codebase, which it can iteratively improve through natural language requests.

My gut is, by this summer we will start to see very interesting small apps built with unique architectures that are LLMs iteratively improving a codebase. I don't know where it will go from there.


IluvBsissa OP t1_jacxdtx wrote

Oooh I totally forgot about the "Top-Secret" Pitchfork project ! I really hope it gets somewhere.


JVM_ t1_jad8flc wrote

I think it's much less.

Give me a piece of graph paper, laid out like the game Battleship.

Now, you want me to draw all the roads around your house. You could tell me to draw a road that goes through A1, A2, A3, A4, A5. But if your road goes all the way to 100, you'd quickly switch to "Draw a road from A1 to A100".

I think this is where you can cut corners with the code generation as well.

"Make me a person object with a name and age. They can have friends who are also people" "Store this in a database that has scaling and load balancing based on X parameters"

I think the number of tokens required to generate software is much lower than you'd expect - but the way the LLM understands the previous context and tailors its response to what was previously generated would need to change from what we see today.
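
To give a feel for the ratio, here is one possible Python expansion of the short "person object" prompt above. It is only a sketch of what such a prompt might produce; the `befriend` method is an assumption of mine, and the database prompt is not covered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    age: int
    # Friends are other Person objects. They are left out of repr and
    # equality so that mutual friendships do not cause infinite recursion.
    friends: List["Person"] = field(default_factory=list, repr=False, compare=False)

    def befriend(self, other: "Person") -> None:
        """Record a two-way friendship, ignoring self and duplicates."""
        if other is self or other in self.friends:
            return
        self.friends.append(other)
        other.friends.append(self)


alice = Person("Alice", 30)
bob = Person("Bob", 28)
alice.befriend(bob)
print(alice)                           # Person(name='Alice', age=30)
print([f.name for f in bob.friends])   # ['Alice']
```

A one-sentence request turns into a couple of dozen lines, which is the "A1 to A100" shortcut in action.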