Submitted by minimaxir t3_11fbccz in MachineLearning
harharveryfunny t1_jamab7m wrote
Reply to comment by londons_explorer in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
The two pair up very well though - now that there's a natural language API, you could leverage that for speech->text->ChatGPT. From what I've seen of the Whisper demos, it seems to be the best out there by quite a margin. Does anything else perform as well?
fasttosmile t1_janaaex wrote
GCP, speechmatics, rev, otter.ai, assemblyai etc. etc. offer similar or better performance, as well as streaming and a much more rich output.
MonstarGaming t1_jap8605 wrote
That seems to be the gist of this entire thread. This is the first API most of /r/machinelearning have heard of so it must be best on the market. /s
To your point, there are companies who have been developing speech-to-text for decades. The capability is so unremarkable that most (all?) cloud providers have a speech-to-text offering already and it easily integrates with their other services.
I know this is a hot take, but I don't think OpenAI has a business strategy. They're deploying expensive models that directly compete with entrenched, big tech companies. They can't be thinking they're going to take market share away from GCP, AWS, Azure with technologies that all three offer already, right? Right???
fasttosmile t1_japaes4 wrote
To be fair, they are technically very competent and the pricing is very cheap. And their marketing is great.
But yeah dealing with B2B customers (where the money is) and integrating feedback from them is a very different thing than what they've been doing so far. They might be angling to serve as a platform for AI companies that then have to deal with average customers. That way they get to only deal with people who understand the limitations of AI. Could work. Will change the company to be less researchy though.
soobardo t1_japo5w5 wrote
Yes, they pair up perfectly. Whisper detects anything I babble to it, english or french and it's surprisingly fast. I've wrapped a loop that:
listens micro -> whisper STT -> chatgpt -> lang detect -> Google TTS -> speaker
With noise/silence detection, it's a complete hands-off experience, like chatting with a real person. Delay is ~ 5s for all calls. "Glueing" the APIs is straightforward and intuitive.
Viewing a single comment thread. View all comments