
genshiryoku t1_ja7pugt wrote

I agree with this. I'm a middle-aged engineer and, believe it or not, there used to be a time when assembly was considered "automation of programming".

Before assembly you had to hot-wire individual 1s and 0s into the hardware to program it, which was a labor-intensive job. You had to memorize the instruction and data sequences as strings of 1s and 0s.

Then assembly came along and suddenly a lot of that work was simplified to writing a mnemonic that was equivalent to those instructions.

Then there was another big paradigm shift with "high level languages" like C and C compilers.

Essentially, ever since C and other compiled languages existed, most people haven't truly programmed anymore, because you're just communicating to a computer program (the compiler) what that program should actually program for you.

The C/C++ or Python code you're writing today? That's not actually programming. It's just you telling the computer what it should program for you.
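To make that ladder concrete, here's a rough sketch of the same one-line addition at each level. It's my illustration only: the assembly and machine-code bytes are approximate x86-64 forms, not the exact output of any particular compiler.

```c
/* The same one-line addition, seen from each rung of the ladder.
   The assembly and byte encodings below are approximate x86-64
   (System V) forms, for illustration only. */
int add(int a, int b) {
    /* C (what you write today):      return a + b;                  */
    /* assembly (what a compiler      lea eax, [rdi + rsi]           */
    /* might emit):                   ret                            */
    /* machine code, in hex:          8D 04 37   C3                  */
    /* the hand-wired era, in bits:   10001101 00000100 00110111 ... */
    return a + b;
}
```

Each rung is just a more convenient way of telling the machine the same thing, which is the whole point.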

In a way, ChatGPT and other systems like it are just a newer, higher-level programming language: you're still communicating to the computer what it needs to program, just in a more intuitive, human way.

I don't think the job of programmer is going to go away at all, just like assembly didn't crash the occupation and C didn't crash it either. It's just yet another layer of abstraction on top.

As an old-school kind of guy I have to admit that I liked writing assembly more than C and I like C more than Python. And yet again I like Python more than typing into ChatGPT. But this is how software development has always been. You adapt to the new developments, you specialize into a very specific niche, or you exit the labor market and become a hobbyist.

Young people have too much anxiety about these things because the last ~15 years have been relatively stagnant in terms of big paradigm shifts within programming.

Big shifts like this used to happen every 2-3 years.

11

genshiryoku t1_ja6uw8w wrote

Not for Japanese. Because of how Japanese works, it's essentially impossible to translate it into English without having the full context, and that context isn't embedded within the language itself but conveyed through circumstance. Japanese routinely drops subjects and objects, so a sentence like 「行きました」 can mean "I went", "she went" or "they went" depending entirely on the situation. That's why AI models basically can't translate it properly: they tend to hallucinate the missing context and get it wrong.

3

genshiryoku t1_ja31syb wrote

As a Japanese person who speaks English, I agree; it's funny how extremely bad even the best AI tools are right now at translating Japanese into English. English to Japanese is a bit better but still not very good.

I recognize that it needs AGI to properly translate Japanese into English, because Japanese leaves so much context unstated that current AIs basically just "hallucinate" the missing context, like how ChatGPT bullshits code when it doesn't know what to do.

10

genshiryoku t1_ja2pugm wrote

I disagree with this, especially given the popularity of YouTube and TikTok, where everyone has a completely different video feed based on their own interests.

I think the recommendation engine just generating the media you want to watch is the clear next step and something that traditional media can't compete with.

I think you wanting to connect with others over shared media consumption is just a sign of our generation and not shared by Gen Z in the same way.

1

genshiryoku t1_j9svy3v wrote

No, the reason the median prediction barely moved is that we still have the exact same bottlenecks and issues on the path to AGI, and those haven't been solved over the past 6 years. So while we have made great strides scaling up transformers, and specifically large language models that display emergent properties, the real issue is still lurking behind the scenes.

The main issue and bottleneck is training data: we're rapidly running out of usable data on the internet, with the biggest models already being trained on 30% of all relevant data out there. If rates continue like this we might run out of usable data between 2025 and 2027.
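As a toy back-of-envelope (the ~30% figure is from above; the assumption that data appetite roughly doubles per model generation is mine, not a hard number):

```c
/* Toy projection of when training-data demand outgrows the supply of
   usable internet text. Assumes ~30% is consumed today and that demand
   roughly doubles per model generation (about one generation per year). */
#include <stdio.h>

int main(void) {
    double used = 0.30;              /* fraction of usable data consumed now */
    int year = 2023;
    while (used < 1.0) {
        printf("%d: ~%.0f%% of usable data consumed\n", year, used * 100);
        used *= 2.0;                 /* assumed doubling of data demand */
        year++;
    }
    printf("%d: demand exceeds the usable supply\n", year);
    return 0;
}
/* Prints 2023 (~30%) and 2024 (~60%), then crosses 100% in 2025; with a
   slower growth assumption the crossing lands in 2026-2027 instead. */
```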

We know we can't use synthetic or AI-generated data to train models on because of the overfitting problem that introduces. So we essentially need to either find some way to generate orders of magnitude more data (an extremely hard problem, if not outright impossible), or make breakthroughs in AI architecture so that models can be trained on less data (still a hard problem, and linear in nature).

The massive progress we're seeing right now is simply scaling models up and training them on more data, but once the data stops flowing these models will rapidly stagnate and we will enter a new AI winter.

This is why the median prediction barely changed. We'd need to solve these fundamental bottlenecks and issues before we'll be able to achieve AGI.

The outlier possibility of AGI already emerging before we run out of training data over the next 2-4 years is of course also a slight possibility.

So essentially, while the current progress and models are very cool and surprising, they are still within the realm of expected growth; nobody expected the AI boom to slow down before the training data ran out. What we're dreading is 2-4 years from now, when all usable internet data has essentially been exploited already.

8

genshiryoku t1_j6ahc38 wrote

I think the next 5 years will be a period of explosive AI progress, but sudden and rapid stagnation will follow, and then an AI winter.

The reason I think this is because we're rapidly running out of training data as bigger and bigger models essentially get trained on all the available data on the internet. After that data is used up there will be nothing new for bigger models to train on.

Since hardware is already stagnating and the data will be running out, the only way to make progress would be breakthroughs on the AI architecture front, which is going to be linear in nature again.

I'm a Computer Scientist by trade and while I work with AI systems on a daily basis and keep up with AI papers I'm not an AI expert so I could be wrong on this front.

13

genshiryoku t1_j6a85jx wrote

Because Moore's Law largely stopped paying off around ~2005, when Dennard scaling stopped being a thing. That's why clock speeds have hovered around 4-5 GHz for almost 20 years now.

We've started coping by going parallel with multi-core systems, but due to Amdahl's Law there are diminishing returns to adding more cores to your system.
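That diminishing return is just Amdahl's Law. A quick sketch with illustrative numbers (the 95%-parallel workload is an example of mine, not a measurement):

```c
/* Amdahl's Law: with parallel fraction p, n cores give a speedup of
   S(n) = 1 / ((1 - p) + p / n), capped at 1 / (1 - p) no matter how
   many cores you add. */
#include <stdio.h>

static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    const double p = 0.95;                 /* 95% of the work parallelizes */
    const int cores[] = {2, 4, 8, 64, 1024};
    for (int i = 0; i < 5; i++)
        printf("%5d cores -> %4.1fx speedup\n",
               cores[i], amdahl_speedup(p, cores[i]));
    return 0;                              /* tops out near 20x, never more */
}
```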

On the "Instructions Per Cycle" front we're only making slow linear progression similar to other non-IT industries so there's not a lot of gain to be had from this either.

The reason why 2003-2013 feels like a bigger step is because it was a bigger step than 2013-2023. At least from a hardware perspective.

The big innovation we have made, however, is using massively parallel GPU cores to accelerate machine learning on the enormous datasets that big social media sites sit on, which is what produced the current AI boom.

But yeah, you are correct in your assessment that computer technology has largely stagnated since about ~2005.

12

genshiryoku t1_j643mw8 wrote

I agree that AI models will become commodities over time, as we've already seen with Stable Diffusion essentially disrupting the entire business model of paid image generation like DALL-E and Midjourney.

I completely agree that the investment case and burn rate of these AI companies isn't worth it, and that it will play out just like the Industrial Revolution did historically: it won't be the AI companies benefiting from the creation of AI, it will be the companies that can rapidly scale up their production by using AI.

It wasn't the steam engine makers that benefited from the Industrial Revolution; it was the factories that could quickly scale up, with steam engines providing the labor.

It won't be the AI companies benefiting from AI. It will be companies that have lots of intellectual workers that can quickly scale up with AI providing intellectual labor.

I actually expect law firms, the medical field, education platforms and other almost purely intellectual businesses to benefit the most, from an economic windfall perspective.

30

genshiryoku t1_j5y03ci wrote

The problem is that AI benefits from economies of scale: bigger models reliably perform better, so the headline gains come from ever-larger training runs.

What this means is that it's a "winner-takes-all" situation: you can't compete as a smaller entity without a huge capital injection to buy the compute necessary to train large models.

The only alternative I can think of is distributed computing like SETI@home, where people volunteer their GPUs to collectively train large open-source AI models.

As cryptocurrency mining has shown us, most people won't do that on a volunteer basis, so there'd need to be some sort of financial incentive, but I wouldn't want to mix a neutral open-source AI model with perverse financial incentives like crypto.

So essentially that is not going to happen, and even StabilityAI is eventually going to have to go commercial like OpenAI to continue on its path, sadly enough.

8

genshiryoku t1_j57j6s1 wrote

It would be lower-quality data, but still usable if significantly altered. The question is: why would you do this instead of just using real data?

GPT is trained on human language; it needs real interactions to learn from, like the one we're having right now.

I'm also not saying this isn't possible. We are AGI-level intelligences and we absolutely consumed less data over our lifetimes than GPT-3 did, so we know it's possible to reach AGI with relatively little data.

My original argument was merely that it's impossible with current transformer models like GPT, and that we need another breakthrough in AI architecture to solve problems like this rather than merely scaling up current transformer models, because the training data is going to run out over the next couple of years as all of the internet gets used up.

0

genshiryoku t1_j57h1fb wrote

The "created data" is merely the AI mixing the training data in such a way that it "creates" something new. If the dataset is big enough this looks amazing and like the AI is actually creative and creating new things but from a mathematics perspective it's still just statistically somewhere in between the data it already has trained on.

Therefore it would be the same as feeding it its own data. To us it looks like completely new, genuinely usable data, which is why ChatGPT is so exciting, but for AI training purposes it's useless.

1

genshiryoku t1_j57dtsz wrote

Without going too deep into it: this is a symptom of transformer models, and my argument was about why transformer models like GPT can't keep scaling up.

It has to do with the mathematics behind training AI. Essentially, for every new piece of data the AI refines itself, but for copies of data it overcorrects, which results in inefficiency or worse performance. Synthetic data acts much the same as duplicate data: the model overcorrects and worsens its own performance.
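As a toy illustration of that overcorrection (my own example; it just estimates an average with stochastic gradient descent, it's not a real training run):

```c
/* Estimate the mean of a dataset with stochastic gradient descent.
   Duplicating one value drags the estimate toward it even though the
   duplicates carry no new information -- the same kind of
   overcorrection that duplicated or synthetic data causes in a model. */
#include <stdio.h>

static double sgd_mean(const double *x, int n, int epochs, double lr) {
    double m = 0.0;
    for (int e = 0; e < epochs; e++)
        for (int i = 0; i < n; i++)
            m += lr * (x[i] - m);   /* one gradient step toward each sample */
    return m;
}

int main(void) {
    const double clean[] = {1, 2, 3, 4, 5};             /* mean: 3.0      */
    const double duped[] = {1, 2, 3, 4, 5, 5, 5, 5};    /* "5" duplicated */
    printf("clean data -> %.2f\n", sgd_mean(clean, 5, 500, 0.01));
    printf("duped data -> %.2f\n", sgd_mean(duped, 8, 500, 0.01));
    return 0;
}
/* The estimate moves from roughly 3.0 to roughly 3.8 once the duplicates
   are added, without any genuinely new information being supplied. */
```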

If you are truly interested you can see for yourself here.

And yes, AI researchers are looking for models that can detect which data on the internet is synthetic, because it's inevitable that new data will be machine generated and can't be trained on. If we fail at that task we might even enter an "AI dark age" where models get worse and worse over time, because the internet will be filled with AI-generated garbage data that can't be trained on. That's the worst-case scenario.

4

genshiryoku t1_j56we7z wrote

The only reason Google doesn't have publicly usable models like ChatGPT is that Google rightly realizes it would cannibalize its ad-revenue-based search, which is still its core business and where most of its revenue comes from.

15

genshiryoku t1_j56btvq wrote

The problem is the total amount of data and the quality of that data. Humans using an AI like GPT-3 don't generate nearly enough data to properly train a new model, not even with decades of interaction.

The demand for training data grows roughly in step with the parameter count of a transformer model, while the returns on that data are only logarithmic. This essentially means that, mathematically, transformer models are a losing strategy and aren't going to lead to AGI unless you had an unlimited amount of training data, which we don't.
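To put rough numbers on it, here's a sketch using the published "Chinchilla" rule of thumb of roughly 20 training tokens per parameter; that ratio is an outside rule of thumb I'm borrowing for illustration, not a number from my argument above:

```c
/* Rough token budgets implied by a ~20-tokens-per-parameter rule of
   thumb (from the Chinchilla scaling work). Illustrative only. */
#include <stdio.h>

int main(void) {
    const double params_b[] = {1, 70, 500, 10000};     /* billions of parameters */
    for (int i = 0; i < 4; i++) {
        double tokens_t = 20.0 * params_b[i] / 1000.0; /* trillions of tokens */
        printf("%7.0fB params -> ~%6.2fT training tokens\n",
               params_b[i], tokens_t);
    }
    return 0;
}
/* A 10-trillion-parameter model would want on the order of 200T tokens,
   more curated text than the public internet holds by most estimates --
   hence the need for a different architecture. */
```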

We need a different architecture.

9

genshiryoku t1_j55te6y wrote

Because GPT-3 was trained on almost all publicly available data, and GPT-4 will be trained by transcribing all the video footage on the internet and feeding those transcripts in as well.

You can't scale the model up without scaling the training data with it. The bottleneck is the training data and we're running out of it.

It's not like the internet is suddenly going to 10x in size over the next couple of years, especially as population growth is slowing and most people are already connected online, so not a lot of new data is being created.

4