Educational-Net303 t1_jair4wf wrote on March 1, 2023 at 6:58 PM

#2,139,539

Definitely a loss-leader to cut off Claude/bard, electricity alone would cost more than that. Expect a rise in price in 1 or 2 months

harharveryfunny t1_jairuhd wrote on March 1, 2023 at 7:02 PM

#2,139,565

It says they've cut their costs by 90%, and are passing that saving onto the user. I'd have to guess that they are making money on this, not just treating it as a loss-leader for other more expensive models.

The way the API works is that you have to send the entire conversation each time, and the tokens you will be billed for include both those you send and the API's response (which you are likely to append to the conversation and send back to them, getting billed again and again as the conversation progresses). By the time you've hit the 4K token limit of this API, there will have been a bunch of back and forth - you'll have paid a lot more than 4K * 0.2c/1K for the conversation. It's easy to imagine chat-based API's becoming very widespread and the billable volume becoming huge. OpenAI are using Microsoft Azure compute, who may see a large spike in usage/profits out of this.

It'll be interesting to see how this pricing, and that of competitors evolves. Interesting to see also some of OpenAI's annual price plans outlined elsewhere such as $800K/yr for their 8K token limit "DV" model (DaVinci 4.0?), and $1.5M/yr for the 32K token limit "DV" model.

JackBlemming t1_jaisvp4 wrote on March 1, 2023 at 7:09 PM

#2,139,609

Replying to Educational-Net303 (#2,139,539)

Definitely. This is so they can become entrenched and collect massive amounts of data. It also discourages competition, since they won't be able to compete against these artificially low prices. This is not good for the community. This would be equivalent to opening up a restaurant and giving away food for free, then jacking up prices when the adjacent restaurants go bankrupt. OpenAI are not good guys.

I will rescind my comment and personally apologize if they release ChatGPT code, but we all know that will never happen, unless they have a better product lined up.

lostmsu t1_jaj0dw2 wrote on March 1, 2023 at 7:56 PM

#2,139,869

Replying to Educational-Net303 (#2,139,539)

I would love an electricity estimate for running GPT-3-sized models with optimal configuration.

According to my own estimate, electricity cost for a lifetime (~5y) of a 350W GPU is between $1k-$1.6k. Which means for enterprise-class GPUs electricity is dwarfed by the cost of the GPU itself.

LetterRip t1_jaj1kp3 wrote on March 1, 2023 at 8:04 PM

#2,139,908

> I have no idea how OpenAI can make money on this.

Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.

A*.3/3 = 10% of the cost.

Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.

So we are talking it taking about 1% of the resources and a 10x price reduction - they should be 90% more profitable compared to when they introduced GPT-3.

edit - see MS DeepSpeed MII - showing a 40x per token cost reduction for Bloom-176B vs default implementation

https://github.com/microsoft/DeepSpeed-MII

Also there are additional ways to reduce cost not covered above - pruning, graph optimization, teacher student distillation. I think teacher student distillation is extremely likely given reports that it has difficulty with more complex prompts.

jturp-sc t1_jaj2w4j wrote on March 1, 2023 at 8:12 PM

#2,139,960

Glad to see them make ChatGPT accessible via API and go back to update their documentation to be more clear on which model is which.

I had an exhausting number of conversations with confused product managers, engineers and marketing managers on "No, we're not using ChatGPT".

Timdegreat t1_jaj3gpr wrote on March 1, 2023 at 8:15 PM

#2,139,977

Will we be able to generate embeddings using the ChatGPT API?

jturp-sc t1_jaj45ek wrote on March 1, 2023 at 8:20 PM

#2,139,998

Replying to JackBlemming (#2,139,609)

The entry costs have always been so high that LLMs as a service was going to be a winner-take-most marketplace.

I think the best hope is to see other major players enter the space either commercially or as FOSS. I think the former is more likely, and I was really hoping that we would see PaLM on GCP or even something crazier like a Meta-Amazon partnership for LLaMa on AWS.

Unfortunately, I don't think any of those orgs will pivot fast enough until some damage is done.

visarga t1_jaj4bqs wrote on March 1, 2023 at 8:21 PM

#2,140,010

Replying to harharveryfunny (#2,139,565)

> $1.5M/yr

The inference cost is probably 10% of that.

visarga t1_jaj4lxx wrote on March 1, 2023 at 8:22 PM

#2,140,028

Replying to Timdegreat (#2,139,977)

Not this time. Still text-embedding-ada-002

harharveryfunny t1_jaj8bk2 wrote on March 1, 2023 at 8:45 PM

#2,140,151

Replying to Educational-Net303 (#2,139,539)

Could you put any numbers to that ?

What are the FLOPS per token inference for a given prompt length (for a given model)?

What do those FLOPS translate to in terms of run time on Azure's GPUs (V100's ?)

What is the GPU power consumption and data center electricity costs ?

Even with these numbers can we really relate this to their $/token pricing scheme ? The pricing page mentions this 90% cost reduction being for the "gpt-3.5-turbo" model vs the earlier davinci-text-3.5 (?) one - do we even know the architectural details to get the FLOPs ?

NoLifeGamer2 t1_jaj9i1b wrote on March 1, 2023 at 8:52 PM

#2,140,218

Replying to visarga (#2,140,028)

Gotta love getting those "Model currently busy" errors for only a single request

WarProfessional3278 t1_jaj9nnt wrote on March 1, 2023 at 8:53 PM

#2,140,223

Replying to harharveryfunny (#2,140,151)

Rough estimate: with one 400w gpu and $0.14/hr electricity, you are looking at ~0.00016/sec here. That's the price for running the GPU alone, not accounting server costs etc.

I'm not sure if there are any reliable estimate on FLOPS per token inference, though I will be happy to be proven wrong :)

luckyj t1_jajaz53 wrote on March 1, 2023 at 9:01 PM

#2,140,273

Replying to harharveryfunny (#2,139,565)

But that (sending the whole or part of the conversation history) is exactly what we had to do with text-davinci if we wanted to give it some type of memory. It's the same thing with a different format, and 10% of the price... And having tested it, it's more like chatgpt (I'm sorry, I'm a language model type of replies), which I'm not very fond of. But the price... Hard to resist. I've just ported my bot to this new model and will play with it for a few days

minimaxir OP t1_jajcf4s wrote on March 1, 2023 at 9:10 PM

#2,140,314

Replying to LetterRip (#2,139,908)

It's safe to assume that some of those techniques were already used in previous iterations of GPT-3/ChatGPT.

Purplekeyboard t1_jajcnb5 wrote on March 1, 2023 at 9:12 PM

#2,140,321

Replying to JackBlemming (#2,139,609)

> This is not good for the community.

When GPT-3 first came out and prices were posted, everyone complained about how expensive it was, and that it was prohibitively expensive for a lot of uses. Now it's too cheap? What is the acceptable price range?

badabummbadabing t1_jajdjmr wrote on March 1, 2023 at 9:17 PM

#2,140,354

Replying to jturp-sc (#2,139,998)

Honestly, I have become a lot more optimistic regarding the prospect of monopolies in this space.

When we were still in the phase of 'just add even more parameters', the future seemed to be headed that way. With Chinchilla scaling (and looking at results of e.g. LLaMA), things look quite a bit more optimistic. Consider that ChatGPT is reportedly much lighter than GPT3. At some point, the availability of data will be the bottleneck (which is where an early entry into the market can help getting an advantage in terms of collecting said data), whereas compute will become cheaper and cheaper.

The training costs lie in the low millions (10M was the cited number for GPT3), which is a joke compared to the startup costs of many, many industries. So while this won't be something that anyone can train, I think it's more likely that there will be a few big players (rather than a single one) going forward.

I think one big question is whether OpenAI can leverage user interaction for training purposes -- if that is the case, they can gain an advantage that will be much harder to catch up to.

LetterRip t1_jajezib wrote on March 1, 2023 at 9:26 PM

#2,140,426

Replying to minimaxir (#2,140,314)

June 11, 2020 is the date of the GPT-3 API was introduced. No int4 support and the Ampere architecture with int8 support had only been introduced weeks prior. So the pricing was set based on float16 architecture.

Memory efficient attention is from a few months ago.

ChatGPT was just introduced a few months ago.

The question was 'how OpenAI' could be making a profit, if they were making a profit on GPT-3 2020 pricing; then they should be making 90% more profit per token on the new pricing.

currentscurrents t1_jajfjr5 wrote on March 1, 2023 at 9:29 PM

#2,140,448

Replying to lostmsu (#2,139,869)

Problem is we don't actually know how big ChatGPT is.

I strongly doubt they're running the full 175B model, you can prune/distill a lot without affecting performance.

Derpy_Snout t1_jajfxrw wrote on March 1, 2023 at 9:32 PM

#2,140,456

Replying to JackBlemming (#2,139,609)

> This would be equivalent to opening up a restaurant and giving away food for free, then jacking up prices when the adjacent restaurants go bankrupt.

The good old Walmart strategy

JackBlemming t1_jajg4dz wrote on March 1, 2023 at 9:33 PM

#2,140,462

Replying to Purplekeyboard (#2,140,321)

It's not about the price, it's about the strategy. Google maps API was dirt cheap so nobody competed, then they cranked up prices 1400% once they had years of advantage and market lock in. That's not ok.

If OpenAI keeps prices stable, nobody will complain, but this is likely a market capturing play. They even said they were losing money on every request, but maybe that's not true anymore.

currentscurrents t1_jajg818 wrote on March 1, 2023 at 9:33 PM

#2,140,465

Replying to harharveryfunny (#2,139,565)

> It says they've cut their costs by 90%

Honestly this seems very possible. The original GPT-3 made very inefficient use of its parameters, and since then people have come up with a lot of ways to optimize LLMs.

[deleted] t1_jajgqsv wrote on March 1, 2023 at 9:37 PM

#2,140,474

Replying to JackBlemming (#2,139,609)

[deleted]

bmc2 t1_jajj03y wrote on March 1, 2023 at 9:50 PM

#2,140,566

Replying to Educational-Net303 (#2,139,539)

They raised $10B. They can afford to eat the costs.

bmc2 t1_jajjjvd wrote on March 1, 2023 at 9:54 PM

#2,140,585

Replying to [deleted] (#2,140,474)

Training based on submitted data is going to be curtailed according to their announcement:

“Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in”

VertexMachine t1_jajjq8b wrote on March 1, 2023 at 9:55 PM

#2,140,595

Replying to JackBlemming (#2,139,609)

Yea, but one thing is not adding up. It's not like I can go to a competitor and get access to similar level of quality API.

Plus if it's a price war... with Google.. that would be stupid. Even with Microsoft's money, Alphabet Inc is not someone you want to go to war on undercutting prices.

Also they updated their polices on using users data, so the data gathering argument doesn't seem valid as well (if you trust them)

Edit: ah, btw. I don't say that there is no ulterior motive here. I don't really trust "Open"AI since the "GPT2-is-to-dangerous-to-release" bs (and corporate restructuring). Just that I don't think is that simple.

[deleted] t1_jajmeil wrote on March 1, 2023 at 10:12 PM

#2,140,691

Replying to harharveryfunny (#2,139,565)

[deleted]

Thunderbird120 t1_jajok9y wrote on March 1, 2023 at 10:26 PM

#2,140,760

Replying to LetterRip (#2,139,908)

I'm curious which memory efficient transformer variant they've figured out how to leverage at scale. They're obviously using one of them since they're offering models with 32k context but it's not clear which one.

astrange t1_jajpps3 wrote on March 1, 2023 at 10:34 PM

#2,140,797

Replying to VertexMachine (#2,140,595)

"They're just gathering data" is literally never true. That kind of data isn't good for anything.

farmingvillein t1_jajtmly wrote on March 1, 2023 at 11:01 PM

#2,140,964

Replying to VertexMachine (#2,140,595)

> Plus if it's a price war... with Google.. that would be stupid

If it is a price war strategy...my guess is that they're not worried about Google.

Or, put another way, if it is Google versus OpenAI, openai is pretty happy about the resulting duopoly. Crushing everyone else in the womb, though, would be valuable.

andreichiffa t1_jajuk03 wrote on March 1, 2023 at 11:07 PM

#2,141,001

Replying to LetterRip (#2,139,908)

That, and the fact that OpenAI/MS want to completely dominate LLM market, in the same way Microsoft dominated OS/browser market in the late 90s/early 2000s.

Beli_Mawrr t1_jajvgax wrote on March 1, 2023 at 11:14 PM

#2,141,034

Replying to JackBlemming (#2,139,609)

I use the API as a dev. I can say that if Bard works anything like OpenAI, it will be super easy to switch.

farmingvillein t1_jajw0yj wrote on March 1, 2023 at 11:17 PM

#2,141,051

Replying to badabummbadabing (#2,140,354)

> The training costs lie in the low millions (10M was the cited number for GPT3), which is a joke compared to the startup costs of many, many industries. So while this won't be something that anyone can train, I think it's more likely that there will be a few big players (rather than a single one) going forward.

Yeah, I think there are two big additional unknowns here:

How hard is it to optimize inference costs? If--for sake of argument--for $100M you can drop your inference unit costs by 10x, that could end up being a very large and very hidden barrier to entry.
How much will SOTA LLMs really cost to train in, say, 1-2-3 years? And how much will SOTA matter?

The current generation will, presumably, get cheaper and easier to train.

But if it turns out that, say, multimodal training at scale is critical to leveling up performance across all modes, that could jack up training costs really, really quickly--e.g., think the costs to suck down and train against a large subset of public video. Potentially layer in synthetic data from agents exploring worlds (basically, videogames...), as well.

Now, it could be that the incremental gains to, say, language are not that high--in which case the LLM (at least as these models exist right now) business probably heavily commoditizes over the next few years.

[deleted] t1_jajwxlq wrote on March 1, 2023 at 11:24 PM

#2,141,081

[removed]

[deleted] t1_jak0est wrote on March 1, 2023 at 11:49 PM

#2,141,219

[removed]

peanutbutterjambread t1_jak2p3i wrote on March 2, 2023 at 12:05 AM

#2,141,306

Cool

elsrda t1_jak6drt wrote on March 2, 2023 at 12:32 AM

#2,141,448

Replying to [deleted] (#2,141,219)

Indeed, at least not for now.

EDIT: source

[deleted] t1_jak7jf3 wrote on March 2, 2023 at 12:41 AM

#2,141,485

Replying to elsrda (#2,141,448)

[removed]

lucidraisin t1_jakb7h4 wrote on March 2, 2023 at 1:08 AM

#2,141,625

Replying to Thunderbird120 (#2,140,760)

it is flash attention (Tri Dao et al)

Thunderbird120 t1_jakbyew wrote on March 2, 2023 at 1:14 AM

#2,141,659

Replying to lucidraisin (#2,141,625)

You're better qualified to know than nearly anyone who posts here, but is flash attention really all that's necessary to make that feasible?

caedin8 t1_jakcasg wrote on March 2, 2023 at 1:16 AM

#2,141,676

It's exciting to see that ChatGPT's cost is 1/10th that of GPT-3 API, which is a huge advantage for developers who are looking for high-quality language models at an affordable price. OpenAI's commitment to providing top-notch AI tools while keeping costs low is commendable and will undoubtedly attract more developers to the platform. It's clear that ChatGPT is a superior option for developers, and OpenAI's dedication to innovation and affordability is sure to make it a top choice for many in the AI community.

lucidraisin t1_jakdtf7 wrote on March 2, 2023 at 1:27 AM

#2,141,727

Replying to Thunderbird120 (#2,141,659)

yes

edit: it was also used to train Llama. there is no reason not to use it at this point, for both training and fine-tuning / inference

big_ol_tender t1_jakmlmc wrote on March 2, 2023 at 2:32 AM

#2,142,093

Replying to caedin8 (#2,141,676)

-totally not chatgpt

[deleted] t1_jako73i wrote on March 2, 2023 at 2:43 AM

#2,142,169

Replying to badabummbadabing (#2,140,354)

[removed]

TrueBirch t1_jakosce wrote on March 2, 2023 at 2:48 AM

#2,142,196

Replying to astrange (#2,140,797)

I worked in adtech. It's often true.

MonstarGaming t1_jakqs01 wrote on March 2, 2023 at 3:03 AM

#2,142,289

>I have no idea how OpenAI can make money on this.

Personally, I don't think they can. What is the main use case for chat bots? How many people are going to pay $20/month to talk to a chatbot? I mean, chatbots aren't exactly new... anybody who wanted to chat with one before ChatGPT could have and yet there wasn't an industry for it. Couple that with it not being possible to know whether its answers are fact or fiction and I just don't see the major value proposition.

I'm not overly concerned one way or another, I just don't think the business case is very strong.

GrumpyMcGillicuddy t1_jakqy81 wrote on March 2, 2023 at 3:04 AM

#2,142,297

Replying to caedin8 (#2,141,676)

Uhhhh

xGovernor t1_jaksctz wrote on March 2, 2023 at 3:15 AM

#2,142,347

I've been tinkering with DaVinci but even with turbo/premium using gpt3.5turbo api requires a credit card added to the account. Excited to fool with it, however I typically use 2048-4000 tokens on DaVinci 3.

xGovernor t1_jaksopw wrote on March 2, 2023 at 3:18 AM

#2,142,361

Replying to harharveryfunny (#2,139,565)

Oh boy what I got away with. I have been using hundreds of thousands of tokens, augmenting parameters and only ever spent 20 bucks. I feel pretty lucky.

bjergerk1ng t1_jakszgr wrote on March 2, 2023 at 3:20 AM

#2,142,374

Replying to LetterRip (#2,139,908)

Is it possible that they also switched from non-chinchilla-optimal davinci to chinchilla-optimal chatgpt? That is at least 4x smaller

cv4u t1_jakzhqj wrote on March 2, 2023 at 4:14 AM

#2,142,638

Replying to LetterRip (#2,139,908)

LLMs can quantize to 8 bit or 4 bit?

LetterRip t1_jal4vgs wrote on March 2, 2023 at 5:04 AM

#2,142,855

Replying to cv4u (#2,142,638)

Yep, or a mix between the two.

GLM-130B quantized to int4, OPT and BLOOM int8,

https://arxiv.org/pdf/2210.02414.pdf

Often you'll want to keep the first and last layer as int8 and can do everything else int4. You can quantize based on the layers sensitivity, etc. I also (vaguely) recall a mix of 8bit for weights, and 4bits for biases (or vice versa?),

Here is a survey on quantization methods, for mixed int8/int4 see the section IV. ADVANCED CONCEPTS: QUANTIZATION BELOW 8 BITS

https://arxiv.org/pdf/2103.13630.pdf

Here is a talk on auto48 (automatic mixed int4/int8 quantization)

https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41611/

LetterRip t1_jal4y8i wrote on March 2, 2023 at 5:05 AM

#2,142,859

Replying to bjergerk1ng (#2,142,374)

Certainly that is also a possibility. Or they might have done teacher student distillation.

MysteryInc152 t1_jal7d3p wrote on March 2, 2023 at 5:29 AM

#2,142,956

Replying to currentscurrents (#2,140,448)

Distillation doesn't work for token predicting language models for some reason.

currentscurrents t1_jalajj3 wrote on March 2, 2023 at 6:03 AM

#2,143,080

Replying to MysteryInc152 (#2,142,956)

DistillBERT worked though?

MysteryInc152 t1_jalau7e wrote on March 2, 2023 at 6:07 AM

#2,143,089

Replying to currentscurrents (#2,143,080)

Sorry i meant the really large scale models. Nobody has gotten a gpt-3/chinchilla etc scale model to actually distill properly.

Lychee7 t1_jalbr7l wrote on March 2, 2023 at 6:17 AM

#2,143,129

Criteria for tokens ? Complex, longer the prompt more tokens it'll use ?

fmai t1_jalcs0x wrote on March 2, 2023 at 6:29 AM

#2,143,164

Replying to lucidraisin (#2,141,625)

AFAIK, flash attention is just a very efficient implementation of attention, so still quadratic in the sequence length. Can this be a sustainable solution for when context windows go to 100s of thousands?

iTrooz_ t1_jale5ca wrote on March 2, 2023 at 6:45 AM

#2,143,208

I hope the API doesn't have the same restrictions as https://chat.openai.com

visarga t1_jalg9iu wrote on March 2, 2023 at 7:11 AM

#2,143,266

Replying to fmai (#2,143,164)

I think the main pain point was memory usage.

Trotskyist t1_jalk4j5 wrote on March 2, 2023 at 8:02 AM

#2,143,392

Replying to Lychee7 (#2,143,129)

A token is (roughly) 4 characters. Both prompt and result are counted.

jinnyjuice t1_jalkbvu wrote on March 2, 2023 at 8:04 AM

#2,143,404

Replying to LetterRip (#2,140,426)

How do we know these technical improvements result in 90% extra revenue? I feel I'm missing some link here.

[deleted] t1_jall6xi wrote on March 2, 2023 at 8:16 AM

#2,143,425

Replying to jinnyjuice (#2,143,404)

[deleted]

Hsemar t1_jalp8as wrote on March 2, 2023 at 9:12 AM

#2,143,532

Replying to lucidraisin (#2,141,625)

but does flash attention help with auto-regressive generation? My understanding was that it prevents materializing the large kv dot product during training. At inference (one token at a time) with kv caching this shouldn't be that relevant right?

WarAndGeese t1_jalq339 wrote on March 2, 2023 at 9:25 AM

#2,143,547

Replying to Educational-Net303 (#2,139,539)

Don't let it demotivate competitors. They are making money somehow, and planning to make massive amounts more. Hence the space is ripe for tons of competition, and those other companies would also be on track to make tons of money. Hence, jump in competitors, the market is waiting for you.

Stakbrok t1_jam0bpq wrote on March 2, 2023 at 11:42 AM

#2,143,823

Replying to iTrooz_ (#2,143,208)

You can edit what it replied of course (and then hope it builds off of that and keeps that specific vibe going, which always works in the playground) but damn, they locked it down tight. 😅

Even when you edit the primer/setup into something crazy (you are a grumpy or deranged or whatever assistant) and change some things it said into something crazy, it overrides the custom mood you set for it and goes right back to its ever serious ChatGPT mode. Even sometimes apologizing for saying something out of character (and by that it means the thing you 'made it say' by editing, so it believes it said that)

Smallpaul t1_jam673c wrote on March 2, 2023 at 12:45 PM

#2,144,011

Replying to jinnyjuice (#2,143,404)

I think you are using the word revenue when you mean profit.

Smallpaul t1_jam6et8 wrote on March 2, 2023 at 12:47 PM

#2,144,019

Replying to andreichiffa (#2,141,001)

They’ll need a stronger story around lock-in if that’s their strategy. One way would be to add structured and unstructured data storage to the APIs.

Smallpaul t1_jam6mjl wrote on March 2, 2023 at 12:49 PM

#2,144,032

Replying to Educational-Net303 (#2,139,539)

1 of 2 months??? How would that short time achieve the goal against well-funded competitors?

It would need to be multiple years of undercutting and even that might not be enough to lock google out.

londons_explorer t1_jam6oyr wrote on March 2, 2023 at 12:49 PM

#2,144,033

Replying to LetterRip (#2,142,855)

Aren't biases only a tiny tiny fraction of the total memory usage? Is it even worth trying to quantize them more than weights?

londons_explorer t1_jam6r8g wrote on March 2, 2023 at 12:50 PM

#2,144,037

Replying to LetterRip (#2,142,855)

Don't you mean the other way around?

Im2bored17 t1_jam6y5y wrote on March 2, 2023 at 12:52 PM

#2,144,044

Replying to xGovernor (#2,142,361)

$20.00 / ($0.002/ 1k tokens) = 10m tokens. If you only used a few hundred k, you got scammed hard lol

Smallpaul t1_jam7abr wrote on March 2, 2023 at 12:55 PM

#2,144,058

Replying to WarAndGeese (#2,143,547)

> Don't let it demotivate competitors. They are making money somehow,

What makes you so confident?

Smallpaul t1_jam83rb wrote on March 2, 2023 at 1:03 PM

#2,144,096

Replying to MonstarGaming (#2,142,289)

I guess you haven’t visited any B2C websites in the last 5 years.

But also: there is a world model behind the chatbot which can translate between human languages, between computer languages, can compose marketing copy, summarise text...

londons_explorer t1_jam8409 wrote on March 2, 2023 at 1:03 PM

#2,144,097

It was an interesting business decision to make a blog post announcing two rather different products (ChatGPT API and Whisper) at the same time...

ChatGPT is a best-in-class, or even only-in-class chatbot API... While Whisper is one of many hosted speech to text solutions.

harharveryfunny t1_jamab7m wrote on March 2, 2023 at 1:22 PM

#2,144,186

Replying to londons_explorer (#2,144,097)

The two pair up very well though - now that there's a natural language API, you could leverage that for speech->text->ChatGPT. From what I've seen of the Whisper demos, it seems to be the best out there by quite a margin. Does anything else perform as well?

ShowerVagina t1_jamiqb4 wrote on March 2, 2023 at 2:28 PM

#2,144,549

Replying to jturp-sc (#2,139,960)

> I had an exhausting number of conversations with confused product managers, engineers and marketing managers on “No, we’re not using ChatGPT”.

They use your conversations for further training which means if you use it to help you with proprietary code or documentation, you're effectively disclosing that.

---AI--- t1_jamo555 wrote on March 2, 2023 at 3:07 PM

#2,144,787

Replying to ShowerVagina (#2,144,549)

OpenAI updated their page to promise they will stop doing that.

Dekans t1_jamokhr wrote on March 2, 2023 at 3:10 PM

#2,144,807

Replying to fmai (#2,143,164)

> We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

...

> FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).

In the paper bold is done using the block-sparse version. The Path-X (16K length) is done using regular FlashAttention.

tomd_96 t1_jamp6kt wrote on March 2, 2023 at 3:14 PM

#2,144,842

Replying to LetterRip (#2,142,855)

Where was this introduced?

[deleted] t1_jamt0wc wrote on March 2, 2023 at 3:40 PM

#2,145,014

Replying to LetterRip (#2,142,859)

[deleted]

ShowerVagina t1_jamts00 wrote on March 2, 2023 at 3:45 PM

#2,145,058

Replying to ---AI--- (#2,144,787)

Is that for everyone or just API/Enterprise users?

lucidraisin t1_jamtx7b wrote on March 2, 2023 at 3:46 PM

#2,145,067

Replying to fmai (#2,143,164)

it cannot, the compute still scales quadratically although the memory bottleneck is now gone. however, i see everyone training at 8k or even 16k within two years, which is more than plenty for previously inaccessible problems. for context lengths at the next order of magnitude (say genomics at million basepairs), we will have to see if linear attention (rwkv) pans out, or if recurrent + memory architectures make a comeback.

ShowerVagina t1_jamyp12 wrote on March 2, 2023 at 4:17 PM

#2,145,271

Replying to Stakbrok (#2,143,823)

I might be in the minority but I strongly believe in unfiltered AI (or a minimal filter, only blocking thing like directions to cool drugs or make weapons). I know they filter it for liability reasons but I wish they didn't.

sebzim4500 t1_jan01xr wrote on March 2, 2023 at 4:26 PM

#2,145,346

Replying to Timdegreat (#2,139,977)

Would you even want to? Sounds like overkill to me, but maybe I am missing some use case of the embeddings.

Timdegreat t1_jan7sel wrote on March 2, 2023 at 5:16 PM

#2,145,740

Replying to sebzim4500 (#2,145,346)

You can use the embeddings to search through documents. First, create embeddings of your documents. Then create an embedding of your search query. Do a similarity measurement between the document embeddings and the search embedding. Surface the top N documents.

sebzim4500 t1_jan85s7 wrote on March 2, 2023 at 5:18 PM

#2,145,765

Replying to Timdegreat (#2,145,740)

Yeah, I get that's that embeddings are used for semantic search but would you really want to use a model as big as ChatGPT to compute the embeddings? (Given how cheap and effective Ada is)

fasttosmile t1_janaaex wrote on March 2, 2023 at 5:31 PM

#2,145,857

Replying to harharveryfunny (#2,144,186)

GCP, speechmatics, rev, otter.ai, assemblyai etc. etc. offer similar or better performance, as well as streaming and a much more rich output.

Pikalima t1_janc14v wrote on March 2, 2023 at 5:43 PM

#2,145,941

Replying to [deleted] (#2,145,014)

I’d say we need an /r/VXJunkies equivalent for statistical learning theory, but the real deal is close enough.

Timdegreat t1_jangbi7 wrote on March 2, 2023 at 6:10 PM

#2,146,100

Replying to sebzim4500 (#2,145,765)

You got a point there! I haven't given it too much thought really -- I def need to check out ada.

But wouldn't the ChatGPT embeddings still be better? Given that they're cheap, why not use the better option?

LetterRip t1_jani50o wrote on March 2, 2023 at 6:23 PM

#2,146,156

Replying to jinnyjuice (#2,143,404)

We don't know the supply demand curve, so we can't know for sure that the revenue increased.

LetterRip t1_janljeo wrote on March 2, 2023 at 6:49 PM

#2,146,311

Replying to lucidraisin (#2,145,067)

Ah, I'd not seen the Block Recurrent Transformers paper before, interesting.

Sea_Alarm_4725 t1_janmlir wrote on March 2, 2023 at 6:56 PM

#2,146,374

I can’t seem to find anywhere what the token limit per request is? With davinci is something like 4k tokens, what about this new chatgpt api?

minimaxir OP t1_jann3ze wrote on March 2, 2023 at 7:00 PM

#2,146,399

Replying to Sea_Alarm_4725 (#2,146,374)

4k

[deleted] t1_jao2iuo wrote on March 2, 2023 at 8:43 PM

#2,147,063

Replying to [deleted] (#2,141,485)

[removed]

MonstarGaming t1_jap3jzc wrote on March 3, 2023 at 12:59 AM

#2,148,442

Replying to Smallpaul (#2,144,096)

>I guess you haven’t visited any B2C websites in the last 5 years.

I have and that is exactly my point. The main use case is B2C websites, NOT individuals, and there are already very mature products in that space. OpenAI needs to develop a lot of bells, whistles, and integration points with existing technologies (salesforce, service now, etc.) before they can be competitive in that market.

>can translate between human languages

Very valuable, but Google and Microsoft both offer this for free.

>between computer languages

This is niche, but it does seem like an untapped, albeit small, market.

>can compose marketing

Also niche. That being said, would it save time? Marketing materials are highly curated.

>summarise text...

Is this a problem a regular person would pay to have fixed? The maximum input size is 2048 tokens / ~1,500 words / three pages. Assuming an average person pastes in the maximum input, they're summarizing material that would take them 6 minutes to read (Google is saying the average person reads 250 words per minutes). Mind you it isn't saving 6 minutes, they still need to read all of the content ChatGPT produces. Wouldn't the average person just skim the document if they wanted to save time?

To your point, it is clearly a capable technology, but that wasn't my argument. There have been troves of capable technologies that were ultimately unprofitable. While I believe it can be successful in the B2C market, I don't think the value proposition is nearly as strong for individuals.

Anyhow, only time will tell.

MonstarGaming t1_jap8605 wrote on March 3, 2023 at 1:34 AM

#2,148,613

Replying to fasttosmile (#2,145,857)

That seems to be the gist of this entire thread. This is the first API most of /r/machinelearning have heard of so it must be best on the market. /s

To your point, there are companies who have been developing speech-to-text for decades. The capability is so unremarkable that most (all?) cloud providers have a speech-to-text offering already and it easily integrates with their other services.

I know this is a hot take, but I don't think OpenAI has a business strategy. They're deploying expensive models that directly compete with entrenched, big tech companies. They can't be thinking they're going to take market share away from GCP, AWS, Azure with technologies that all three offer already, right? Right???

[deleted] t1_jap8ttt wrote on March 3, 2023 at 1:39 AM

#2,148,639

Replying to MonstarGaming (#2,148,442)

[removed]

[deleted] t1_jap9jyg wrote on March 3, 2023 at 1:44 AM

#2,148,661

Replying to [deleted] (#2,141,485)

[removed]

[deleted] t1_jap9wft wrote on March 3, 2023 at 1:47 AM

#2,148,676

Replying to Im2bored17 (#2,144,044)

[removed]

[deleted] t1_japa07x wrote on March 3, 2023 at 1:48 AM

#2,148,678

Replying to harharveryfunny (#2,139,565)

[removed]

[deleted] t1_japabmm wrote on March 3, 2023 at 1:50 AM

#2,148,693

Replying to jturp-sc (#2,139,998)

[removed]

fasttosmile t1_japaes4 wrote on March 3, 2023 at 1:51 AM

#2,148,697

Replying to MonstarGaming (#2,148,613)

To be fair, they are technically very competent and the pricing is very cheap. And their marketing is great.

But yeah dealing with B2B customers (where the money is) and integrating feedback from them is a very different thing than what they've been doing so far. They might be angling to serve as a platform for AI companies that then have to deal with average customers. That way they get to only deal with people who understand the limitations of AI. Could work. Will change the company to be less researchy though.

[deleted] t1_japaq3w wrote on March 3, 2023 at 1:53 AM

#2,148,709

Replying to farmingvillein (#2,141,051)

[removed]

[deleted] t1_japasem wrote on March 3, 2023 at 1:54 AM

#2,148,711

Replying to farmingvillein (#2,141,051)

[removed]

[deleted] t1_japauo6 wrote on March 3, 2023 at 1:54 AM

#2,148,715

Replying to farmingvillein (#2,141,051)

[removed]

MonstarGaming t1_japbd46 wrote on March 3, 2023 at 1:58 AM

#2,148,740

Replying to WarAndGeese (#2,143,547)

>They are making money somehow

Extremely doubtful. Microsoft went in for $10B at a $29B valuation. We have seen pre-revenue companies IPO for far more than that. Microsoft's $10B deal is probably the only thing keeping them afloat.

>Hence the space is ripe for tons of competition

I think you should look up which big tech companies already offer chatbots. You'll find the space is already very competitive. Sure, they aren't large, generative language models, but they target the B2C market that ChatGPT is attempting to compete in.

MonstarGaming t1_japjnn4 wrote on March 3, 2023 at 3:02 AM

#2,149,066

Replying to [deleted] (#2,148,639)

Nice, nothing demonstrates the Dunning-Kruger effect quite like a string of insults.

For whatever its worth, that argument is exceedingly weak. I'll let you brainstorm on why that might be. I don't have interest in debating with someone who so obviously lacks tact.

soobardo t1_japo5w5 wrote on March 3, 2023 at 3:39 AM

#2,149,253

Replying to harharveryfunny (#2,144,186)

Yes, they pair up perfectly. Whisper detects anything I babble to it, english or french and it's surprisingly fast. I've wrapped a loop that:

listens micro -> whisper STT -> chatgpt -> lang detect -> Google TTS -> speaker

With noise/silence detection, it's a complete hands-off experience, like chatting with a real person. Delay is ~ 5s for all calls. "Glueing" the APIs is straightforward and intuitive.

farmingvillein t1_japqcq1 wrote on March 3, 2023 at 3:58 AM

#2,149,339

Replying to Timdegreat (#2,146,100)

> But wouldn't the ChatGPT embeddings still be better? Given that they're cheap, why not use the better option?

Usually, to get the best embeddings, you need to train them somewhat differently than you do a "normal" LLM. So ChatGPT may not(?) be "best" right now, for that application.

qqYn7PIE57zkf6kn t1_japrx5u wrote on March 3, 2023 at 4:11 AM

#2,149,410

Replying to [deleted] (#2,141,485)

What does system message mean?

Bluebotlabs t1_jar58e4 wrote on March 3, 2023 at 1:39 PM

#2,151,101

Doesn't the number of tokens increase exponentially with chat history?

[deleted] t1_jarikhz wrote on March 3, 2023 at 3:21 PM

#2,151,891

Replying to Hsemar (#2,143,532)

[deleted]

[deleted] t1_jarj0kn wrote on March 3, 2023 at 3:24 PM

#2,151,908

Replying to [deleted] (#2,145,014)

[deleted]

[deleted] t1_jarkcfb wrote on March 3, 2023 at 3:33 PM

#2,151,978

Replying to qqYn7PIE57zkf6kn (#2,149,410)

[deleted]

[deleted] t1_jarmz1h wrote on March 3, 2023 at 3:51 PM

#2,152,116

Replying to Im2bored17 (#2,144,044)

[deleted]

minimaxir OP t1_jaru4ch wrote on March 3, 2023 at 4:38 PM

#2,152,433

Replying to Bluebotlabs (#2,151,101)

More cumulatively than exponentially but yes.

With the new prices that's not a big deal.

Bluebotlabs t1_jarufrq wrote on March 3, 2023 at 4:40 PM

#2,152,452

Replying to minimaxir (#2,152,433)

My mistake, I was confused with the system I was.using for chat history lol

---AI--- t1_jasgezh wrote on March 3, 2023 at 7:02 PM

#2,153,490

Replying to ShowerVagina (#2,145,058)

I only saw it mentioned in the context of API/Enterprise users.

xGovernor t1_jasx7r9 wrote on March 3, 2023 at 8:52 PM

#2,154,171

Replying to Im2bored17 (#2,144,044)

You needed the secret api key, included with the plus edition. Prior to Whispers I don't believe you could obtain a secret key. Also gave early access to new features and provides me turbo day one. Also I've used to much more and got turbo to work with my plus subscription.

Had to find a workaround. Don't feel scammed. Plus I've been having too much fun with it.

Thin_Sky t1_jav7a6e wrote on March 4, 2023 at 9:02 AM

#2,157,417

Replying to harharveryfunny (#2,139,565)

Where do you find info on these 8k and 32k token prices? Is this listed on their page or is it leaked from consultations?

CellWithoutCulture t1_javhjpc wrote on March 4, 2023 at 11:29 AM

#2,157,704

Replying to LetterRip (#2,139,908)

I mean... why were they not doing this already? They would have to code it but it seems like low hanging fruit

> memory efficient attention. 10x-20x increase in batch size.

That seems large, which paper has that?

harharveryfunny t1_javmsab wrote on March 4, 2023 at 12:34 PM

#2,157,887

Replying to Thin_Sky (#2,157,417)

It's a leak, but seems to be legitimate.

https://twitter.com/transitive_bs/status/1628118163874516992

LetterRip t1_javpxbv wrote on March 4, 2023 at 1:07 PM

#2,158,012

Replying to CellWithoutCulture (#2,157,704)

> I mean... why were they not doing this already? They would have to code it but it seems like low hanging fruit

GPT-3 came out in 2020 (they had their initial price then a modest price drop early on).

Flash attention is June of 2022.

Quantization we've only figured out how to do it fairly lossless recently (especially int4). Tim Dettmers LLM int8 is from August 2022.

https://arxiv.org/abs/2208.07339

> That seems large, which paper has that?

See

https://github.com/HazyResearch/flash-attention/raw/main/assets/flashattn_memory.jpg

>We show memory savings in this graph (note that memory footprint is the same no matter if you use dropout or masking). Memory savings are proportional to sequence length -- since standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. We see 10X memory savings at sequence length 2K, and 20X at 4K. As a result, FlashAttention can scale to much longer sequence lengths.

https://github.com/HazyResearch/flash-attention

CellWithoutCulture t1_javqw9s wrote on March 4, 2023 at 1:17 PM

#2,158,049

Replying to LetterRip (#2,158,012)

Fantastic reply, it's great to see all those concrete advances thst made it intro prod. Thanks for sharing.

Thin_Sky t1_jaxmuu6 wrote on March 4, 2023 at 9:24 PM

#2,161,326

Replying to harharveryfunny (#2,157,887)

Thanks!

earslap t1_jb0qamw wrote on March 5, 2023 at 3:32 PM

#2,165,503

Replying to qqYn7PIE57zkf6kn (#2,149,410)

When you feed messages into the API, there are different "roles" to tag each message ("assistant", "user", "system"). So you provide content and tell it from which "role" the content comes from. The model continues from there using the role "assistant". There is a token limit (limited by the model) so if your context exceeds that (combined token size of all roles), you'll need to inject salient context from the conversation using the appropriate role.

bdambrosio94563 t1_jb2ct4n wrote on March 5, 2023 at 10:16 PM

#2,168,001

I've spent the last week exploring gpt-3.5-turbo. Went back to text-davinci. (1) gpt-3.5-turbo is incredibly heavily censored. For example, good luck getting anything medical out of it other than 'consult your local medical professional'. It also is much more reluctant to play a role. (2) As is well documented, it is much more resistant to few-shot training. Since I use it in several roles, including google search information extraction and response-composition, I find it very dissappointing.

Luckily, my use case is as my personal companion / advisor / coach, so my usage is low enough I can afford text-davinci. Sure wish there was a middle-ground, though.

Smallpaul t1_jb5rab7 wrote on March 6, 2023 at 5:17 PM

#2,173,350

Replying to MonstarGaming (#2,149,066)

https://www.vox.com/technology/2023/3/6/23624015/silicon-valley-generative-ai-chat-gpt-crypto-hype-trend

Akbartus t1_jbs0hkp wrote on March 11, 2023 at 6:41 AM

#2,206,987

Cannot agree. It is not a deal at all. Such a pricing strategy in the long term is very profitable for its creators. But it does not matter for those who would like to use it, but due to financial situation cannot afford using such APIs for a longer period of time (think about people beyond rich countries). Moreover 1k tokens can be generated in just one small talk in a matter of a few seconds...

Comments