TwitchTvOmo1 t1_j9a8bq6 wrote

Eventually (and eventually is MUCH sooner than people realize) people will use AI to simulate their dead loved ones etc... Or simps will use it to simulate their e-girls. You give an LLM all the texts/online communication you had with that person, train it off them, give a 5 second voice recording, 1 picture, and boom. They'll have an avatar that looks just like them, their voice, and their style of talking. All of these are problems that have been solved already (except maybe the training speaking style from a text dataset, but judging from OpenAI's latest announcements it's on the near horizon). Maybe feed it some of your memories too (in text form of course), kind of like a diary, so you can talk about the past like the AI actually lived it and was there, which adds to the immersion.

How long ago was it that we were seeing stuff like this in Black Mirror? A couple of years? A couple of years from now it's already reality. How crazy is that?


TwitchTvOmo1 OP t1_j8vrlmj wrote

I agree. Checked almost every sample from VALL-E and Eleven Labs is simply more realistic. More varied and natural inflections in the tone of voice.

The 1 thing that VALL-E seems to do better is the voice cloning. It also keeps the original sound noisescape in the cloned result (noise profile, EQ profile, etc). But it's debatable whether that should be called a feature or a bug. One could argue that getting a crystal clear pro-level recording quality on the cloned voice is the desired outcome.

Of course if your scope of application is fooling people with the cloned voice, then yeah you care about preserving the noise/EQ profile of the original sample too.

I also didn't like the "emotions" settings much as the outputs weren't very natural.


TwitchTvOmo1 OP t1_j8tdung wrote

>i'm not sure if has to be in real time. if you think about it, people use all different ways to fill up some time before they finally, after innumerable little pauses, sidebars and parentheticals (like this) they get to the point

Definitely. What I'm saying is, if we want full immersion, that at the very least it will need to be able to respond as fast as a human. And that is often nearly instant in natural conversations.

And of course even when it gets to the point where it can have instant responses, to keep the realism it will have a logic system where it decides how long it should pretend that it's "thinking" before it starts voicing out its response, according to the nature of the conversation and how long a regular human would need to think to respond to a particular statement.
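
To make the "pretend it's thinking" idea concrete, here's a minimal sketch of what that delay logic could look like. Everything in it is an assumption for illustration: the function name `thinking_pause`, the constants, and the heuristic itself (longer statements and questions get a longer pause). A real system would presumably use something much smarter.

```python
def thinking_pause(statement: str, generation_seconds: float,
                   base: float = 0.2, per_word: float = 0.05,
                   question_bonus: float = 0.4, cap: float = 3.0) -> float:
    """Return how many extra seconds to wait before voicing the reply.

    All constants are made-up illustrative values, not measured human timings.
    """
    words = len(statement.split())
    # Longer statements plausibly need more time to "think about".
    target = base + per_word * words
    # Questions get a little extra pause.
    if statement.strip().endswith("?"):
        target += question_bonus
    # Never pause longer than the cap.
    target = min(target, cap)
    # If the model already took longer than the target, add no extra delay.
    return max(0.0, target - generation_seconds)
```

The point is that the pause is computed on top of however long the model actually took, so a slow response never gets slowed down further.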


TwitchTvOmo1 OP t1_j8taffb wrote

>No, that's not right. Nobody programmed the LLM how to respond, it is just based on training data. It is emergent behavior.

So if it was trained with no guidance/parameters whatsoever, what stops us from giving it parameters to follow certain styles? Nothing. It just makes more sense to start with a generalized model first before attempting to create fine-tunes of it that solve different problems. Many LLM providers like OpenAI already provide a "fine-tuning" API where you can submit labeled example completions to fine-tune your own version of their LLM.

And that's what I mean by fine-tuning. Fine-tuning isn't asking the default model to behave in a certain way, and you're not "editing" the model by hand. Fine-tuning is re-training the model on specific examples of the behavior you want.
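
For anyone wondering what "labeled example completions" look like in practice, here's a rough sketch that builds a small training file. The prompt/completion JSONL layout is modeled on the format OpenAI's fine-tuning endpoint accepted around the time of this comment; treat the field names as an assumption and check the provider's current docs. The helper `to_jsonl` and the example texts are made up for illustration, and nothing here actually calls an API.

```python
import json

def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": completion})
        for prompt, completion in examples
    )

# Hypothetical examples teaching a more casual speaking style.
examples = [
    ("Customer: My order is late.\nAgent:",
     " Ugh, sorry about that. Let me check what's going on."),
    ("Customer: Can I get a refund?\nAgent:",
     " Yeah, totally. Give me a sec to set that up."),
]

training_file = to_jsonl(examples)
```

The resulting text is what you'd upload as the training data; the style comes entirely from the completions you choose to include.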

Eventually larger models will be able to encompass different styles and you won't have to specifically create smaller fine-tuned versions of them. Technically you already could ask ChatGPT to act angry or talk like a Nazi or pretend it's X person in Y situation etc, but the devs specifically restrict you from doing so. An earlier example of a way more primitive chatbot that didn't have such restrictions is the shitstorm Twitter bot that started talking like an anti-semitic 4chan user.

Here's another article by OpenAI from just today, describing pretty much what I just said.

>We believe that AI should be a useful tool for individual people, and thus customizable by each user up to limits defined by society. Therefore, we are developing an upgrade to ChatGPT to allow users to easily customize its behavior.


TwitchTvOmo1 OP t1_j8t85gq wrote

You have to remember that LLMs currently talk that way because it's just the default way their creators thought they should respond. I don't see why it would be an issue at all to "fine-tune" any of these LLMs to write with a specific style that would sound more casual and normal. It's not that it's a limitation, they're just explicitly avoiding it for the current scope of applications.

In fact, in these AI LLM "games" that I'm envisioning, you would ask the AI to adopt certain styles to emulate certain social situations. Like ask it to pretend it's an angry customer and you have to convince it to come to a compromise (In the future I see AI services like these being used in job interviews for example to evaluate a candidate's skill). Or pretend it's your boss and you'll negotiate a salary increase. Pretend it's a girl that you're about to hit on, etc.

Social interaction and social engineering are about to be minmaxed just like you minmax your dps in a game by spending 10 hours in practice mode.

After a few years, practising social situations with an AI will be considered primitive as there'll be hardware "cheats" like let's say regular looking glasses with a mini processor and mic, which listen to what others around you are saying and generate the optimal response based on what they know about that person's personality, current emotional state, and your end goals.

Admittedly I know nothing about the field but I highly doubt this is currently outside what we can do. It's just that nobody has tried yet.


TwitchTvOmo1 OP t1_j8t7w4n wrote

The only limitation I see currently isn't how long it takes to generate audio. I'm sure that will be taken care of. It's how long it takes an LLM to generate a response. I haven't tried Bing yet but with ChatGPT it's always 5+ seconds.

For a "realistic" conversation with an AI to be immersive, you need realistic response time. Which would be under 0.5 seconds. Not sure if any LLM can handle that by the end of the year.
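
If you wanted to check a model against that 0.5 second bar, a rough sketch could look like this. The names `timed_reply`, `BUDGET_SECONDS` and `fake_llm` are hypothetical; `fake_llm` is just a stand-in you'd replace with a call to whatever model you're testing.

```python
import time

# The 0.5 second target from the comment above.
BUDGET_SECONDS = 0.5

def timed_reply(generate, prompt):
    """Call generate(prompt) and report the reply, elapsed time, and budget check."""
    start = time.perf_counter()
    reply = generate(prompt)
    elapsed = time.perf_counter() - start
    return reply, elapsed, elapsed <= BUDGET_SECONDS

def fake_llm(prompt):
    # Stand-in for a real model call; returns instantly.
    return "echo: " + prompt

reply, elapsed, within_budget = timed_reply(fake_llm, "hi")
```

Note this measures the whole round trip, so network time to a hosted model counts against the budget too.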


TwitchTvOmo1 t1_j87ys5j wrote

>but there is no reason that Chat GPT could understand and common Stable diffusion to make art.

Gonna make a giant leap here and assume you meant to type

>but there is no reason that Chat GPT could not understand and command Stable diffusion to make art.

This is already done and it's already been implemented in the most popular web-ui for Stable Diffusion too. Granted the results aren't perfect yet.


TwitchTvOmo1 t1_j67rg0b wrote

>My personal and humble opinion is tools like these others will help musicians flourish for a good while, before the tools become so helpful that they actually begin disrupting the industry.

I never said the opposite. Industries aren't gonna go "poof" and disappear from one moment to the other. But it's already begun. Diffusion models will be remembered as the beginning of the end of the digital art industry. MusicLM and other similar tools that will surface in the near future will be remembered as the beginning of the end of the music industry. And it's not a hell of a leap to say this is gonna happen within the current decade. Everything seems like a hell of a leap to our brains because we're not very good at grasping the concept of exponential growth. Our brains think linearly, but AI growth has been exponential for years now.


TwitchTvOmo1 t1_j67o8dg wrote

Reply to comment by CypherLH in Google not releasing MusicLM by Sieventer

Capitalism gonna capitalise mate. Every corporation will lobby billions doing their best to find a way to profit as much as possible off of AI. Even though it should be democratized.

Let's take this post for example. OP says he has no idea why they'd keep MusicLM private and how their usual argument of "it could be dangerous" doesn't really make sense here. It's because it's bullshit and that's not the reason they're keeping it private. It doesn't even have anything to do with the potential legal battles. The real reason is they know it's going to be a MASSIVE cash cow in the next 5 years and they'd be stupid not to milk it behind the scenes while acting like they're looking out for the world. Only chance of them releasing it is if a competitor releases something similar for free. Then they would be forced to release theirs too (not for free of course) before that competitor erodes the entire market and they can no longer make the trillions they dreamed of.

Free market competition is the only hope there is. And that still looks a bit grim, considering the huge amounts of capital needed to make progress in these areas. And we all know which are the companies with those huge amounts of capital. The same ones that wanna squeeze every profitable penny out of AI progress.


TwitchTvOmo1 t1_j67nccl wrote

Reply to comment by CypherLH in Google not releasing MusicLM by Sieventer

For what it's worth, no matter how much copyright advocates scream and cry, it won't stop AI from replacing entire industries. Like it or not the music industry is next. It might slow us down, but it's happening one way or another.


TwitchTvOmo1 t1_j64wsuc wrote

There are many ways to exploit it. The question is... Are we just gonna let that happen? Or are we going to demand that AGI is democratized? The history of the world so far is unfortunately not on our side. The means of production have always belonged to the rich. AGI will be the ultimate "means of production" if you wanna call it that. Will the rich get richer or will AGI solve inequality too?

Find out on the next episode of "capitalism vs AI"


TwitchTvOmo1 t1_j64qrta wrote

Same way every corporation exploits someone else's invention for profit.

Idea think tanks. Identify a problem. Then come up with an idea (leveraged by that invention, in this case AGI) that solves the problem. Package that idea into a product (software or hardware). Sell product.

Now you might ask, "But isn't AGI GENERAL intelligence? Meaning that with just 1 product that utilizes AGI to its fullest, you will never need another product ever again?"

You're not wrong. But the conspiracist in me tells me companies might try to gimp AGI intentionally so they can have multiple "different" products using the same technology, instead of 1 holy grail of all products that you only sell once. The only hope of eliminating this conspiracy is... You guessed it, competition. As long as more than 1 company gets their hands on it, the "ultimate product" of AGI is unavoidable, as one company will always try to outdo the other's product.