sumane12 t1_jds5lwr wrote

That delay kills me, far too long. I'm guessing GPT-5 will have to be multimodal with sound so it can recognise words directly and doesn't need to transcribe them into text first.


NWCoffeenut t1_jdsgb83 wrote

I think a good part of the latency was with the TTS system. The actual text response for the most part came back reasonably quickly.
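A quick way to check where the latency actually lives is to time each stage of the pipeline separately. This is just a sketch with stub functions standing in for the real components (the actual STT/LLM/TTS calls aren't shown in the thread, so the stage names here are hypothetical):

```python
import time

def timed(label, fn, *args):
    """Run fn, report how long it took, and return its result."""
    t0 = time.perf_counter()
    out = fn(*args)
    print(f"{label}: {(time.perf_counter() - t0) * 1000:.0f} ms")
    return out

# Stubs standing in for the real stages (hypothetical names):
def transcribe(audio):  # e.g. Whisper, local or API
    return "hello"

def complete(prompt):   # e.g. the GPT chat API
    return "hi there"

def synthesize(text):   # e.g. the TTS step
    return b"\x00" * 16000

text = timed("STT", transcribe, b"raw audio bytes")
reply = timed("LLM", complete, text)
audio = timed("TTS", synthesize, reply)
```

Swapping the stubs for the real calls would show at a glance whether the TTS leg dominates, as suspected above.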


illathon t1_jdsoud8 wrote

No, most implementations of Whisper are slow.


itsnotlupus t1_jdt280v wrote

Whisper is the speech recognition component.
I don't think he said what he's using for TTS, might be macOS' built-in thingy.


eggsnomellettes t1_jdt5dxl wrote

They're using ElevenLabs, which isn't local, hence a slow API call.


tortoise888 t1_jdtp8yj wrote

If we eventually get open source models of ElevenLabs quality running locally, it's gonna be insane.


ebolathrowawayy t1_jdvfmrk wrote

There's also Tortoise TTS which can be run locally but idk how fast it is.


stupidcasey t1_jdsff4l wrote

I expect GPT-5 or 6 to be super multimodal, trained on anything and everything we have data for: audio, sure; video, of course; crossword puzzles, hell yeah; Pong, yup; car driving, why not. I think the only thing stopping us is that it takes too long, and we'll have more processing power by then.


pokeuser61 t1_jdskrfs wrote

If you ran this on the hardware that gpt5 will require, it wouldn’t have a delay.


RedditLovingSun t1_jdtn0z9 wrote

It looks like from the title bar he's using the Whisper API to transcribe his audio into a text query. That has to send an API request with the audio out and wait for the text to come back over the internet. I'm sure a local audio-to-text transcriber would be considerably faster.

Edit: nvm, Whisper can be run locally, so he's probably doing that.


itsnotlupus t1_jdt2igm wrote

The model's text output is (or can be) a stream, so it ought to be possible to pipe that text stream into a warmed-up TTS system and start getting audio before the text is fully generated.
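A minimal sketch of that idea: buffer the streamed chunks and hand each complete sentence to the TTS engine as soon as it's ready. The token stream here is simulated with a plain list (a real one would come from the chat API with streaming enabled, which isn't shown in the thread):

```python
import re

def sentences_from_stream(token_stream):
    """Accumulate streamed text chunks and yield complete sentences,
    so TTS can start speaking before generation has finished."""
    buf = ""
    for chunk in token_stream:
        buf += chunk
        # Split off any complete sentence ending in . ! or ? plus whitespace
        while True:
            m = re.search(r"[.!?]\s+", buf)
            if not m:
                break
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():  # flush whatever trails after the stream ends
        yield buf.strip()

# Simulated stream of model output chunks:
tokens = ["Hel", "lo there. ", "How are ", "you today? ", "Good"]
for sentence in sentences_from_stream(tokens):
    print(sentence)  # in practice, send each sentence to TTS here
```

Sentence-level chunking is a reasonable granularity: smaller chunks cut latency further but can make the synthesized prosody choppy.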


Drown_The_Gods t1_jdww8zc wrote

Use Talon Voice. The developer has their own engine that blows Whisper out of the water. Never worry about speed again. Don’t thank me, but do chuck them a few dollars if you find it useful.


moonpumper t1_jdsxn21 wrote

I just want a screen free phone that's basically just Jarvis. Read my texts to me, look shit up for me, keep track of and make appointments for me, give me stock quotes, tell me the news, just don't suck me into an infinite scroll anymore. If I need to see something cast it to a screen in my house. Done with phone screens.


RedditLovingSun t1_jdtnafr wrote

Can't wait till we get there with a better Alpaca model + local transcription and audio generation + ChatGPT-style plugins for operating apps. All possible today; we just have to wait for it to be developed.


SkyeandJett t1_jdt2zli wrote

That was my thought. No more phone. Just the smart watch.


SnipingNinja t1_jdv68wu wrote

I actually have a concept in mind. I don't have all the skills needed, but I'll be learning them over the next few months; hopefully I'm not too late by the time I've made my idea into reality.


SkyeandJett t1_jdv6nju wrote

This is probably just my anxiety but I feel like anything we think of or try to execute is going to be eclipsed before it can be realized. We're going to go overnight from this moment to indistinguishable from human androids and FDVR. This past couple of weeks has been overwhelming in the extreme.


SnipingNinja t1_jdvg55n wrote

You're right, but I think that issue isn't relevant here: a locally running AI would be useful regardless of other innovations, and there's something to be said for the cyberpunkness of such a device.


[deleted] t1_jdv9fmu wrote



moonpumper t1_jdva8ow wrote

With ChatGPT-type stuff, how would it sound much different from a phone conversation? The whole idea is that the OS responds to natural language, like talking to a personal assistant or secretary.


czmax t1_jdwcbuq wrote

I was hoping that wearables (like a watch) could do this for me. Or at least force development in that direction.

(Seems not to be panning out… but I still have hope. I'd love to carry only a watch for most of my day. Initially I'd go through screen withdrawal, but in the long run I think life would be better.)


Dwanyelle t1_jdsjgfb wrote

Yeah, I'd be surprised if we don't have something like that available publicly before the end of the year (if only because big tech is slow and unwieldy and things need to work their way through the proper paperwork).


pokeuser61 t1_jdskvem wrote

It is both public and open source.


Dwanyelle t1_jdsmha6 wrote

I should clarify: it will be a packaged product from a big tech company.

I could do this, sure, I can putz around on computers a bit, but once you can just click an "install" button in the Microsoft Store, that's it.


micseydel t1_jdsr6vx wrote

Big tech will offer it as a service instead of a locally-running system. That will mean latency, increased data use, and other... differences 😅


Dwanyelle t1_jdss5uk wrote

Oh, there will definitely be a ton of downsides, but convenience will not be one of them.


imlaggingsobad t1_jdu555l wrote

I'm like 100% certain that Apple, Google and Meta are making a JARVIS assistant that connects to AR glasses. It would be a revolutionary product and it's actually feasible imo.


_dekappatated t1_jdt8e99 wrote

TIL there was a B programming language


GoSouthYoungMan t1_jdu54fg wrote

And before there was B, there was APL: A Programming Language. (This is not a joke.)


Grecu69 t1_jds1x7q wrote

This looks like a slightly better version of Siri, imo.


HarbingerDe t1_jdst2fh wrote

It's a significantly better version of Siri.

GPT-4 can borderline pass the Turing Test and Siri can barely do... anything?


kevinzvilt t1_jdsv83o wrote

Me: Siri, set my alarm for 7am.

Siri: Here is a list of videos titled Tom Tom Solo by River Banks!


JDP87 t1_jdt6uhc wrote

At least you're getting an answer.

Working on that. Something went wrong. Please try again.


DaffyDuck t1_jdu15vr wrote

13B-parameter LLaMA is not as good as GPT-4.


[deleted] t1_jdta7a5 wrote

Samantha >>>>>>>>>>


the_funambule t1_jdtjah6 wrote

ChatGPT says Samantha is the most accurate representation of AI in movies.


InfoOnAI t1_jdtab19 wrote

I've been trying to set something similar up.


_Alasdair t1_jdy9mfd wrote

I built something exactly like this back when the GPT-3 API came out. It was pretty cool, but I eventually got bored with it because it couldn't do anything. I tried hooking it up to external APIs to get real-world live data, but by the end everything was so complicated and slow that I gave up.

Hopefully with the GPT-4 plugins we can now make something actually useful. It's gonna be awesome.


HesThePianoMan t1_jdtr4iy wrote

This is nothing special, just sounds like Google assistant


Sigma_Atheist t1_jdswekv wrote

Marvel is cringe. Can we use some other name to compare stuff like this to?


Anjz t1_jdsynyi wrote

You mean you don't like MODOK?

How about we just name it Dan? Dan's a cool guy.