Submitted by GPUaccelerated t3_yf3vtt in MachineLearning
I serve the AI industry, primarily building, configuring and selling GPU-accelerated workstations/servers and cloud instances.
Most people and companies buy and rent these things out of necessity. *You can't really dig holes effectively if you don't have a shovel, kind of thing.*
I'm obviously not the only provider in the market, and I'm not one of the largest. Some choose me because I save them a lot of money, and some choose me because I'm really, really good at what I do (configuring and optimizing). (Yes, I'm confident enough to put that out there.)
When I'm taking care of an upgrade situation, it's usually because of one of two things.
- The hardware is outdated and needs a refresh to be able to support modern processing tools.
- The client's project is scaling and they need more compute power or VRAM (generally).
My question: is there anyone (or any company) out there who actually chooses to upgrade based on speed?
Like, is anyone going through the upgrade process simply because they want to train their models faster (save time)? Or to bring more value to their clients by having their models run inference faster?
I'd like anyone's opinion on this, but if you fit the description of this type of client, I'd especially like to know your thought process around upgrading, whether it's something you've been through in the past or something you're going through now.
LastVariation t1_iu1mbuj wrote
Inference speed is important because that's what typically goes to production. If you're already waiting hours to days on training, then it probably takes an order-of-magnitude improvement to make the investment worth it. As a private consumer, my bottleneck to upgrading is my Steam library.
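To put a rough number on that training-time argument, here's a minimal breakeven sketch. All inputs (run length, speedup, hourly value, upgrade cost) are hypothetical assumptions for illustration, not figures from the thread:

```python
# Back-of-the-envelope upgrade breakeven sketch.
# Every input below is a hypothetical assumption, not data from the thread.

def months_to_breakeven(
    current_run_hours: float,   # wall-clock hours per training run today
    speedup: float,             # e.g. 2.0 means the new hardware is ~2x faster
    runs_per_month: float,      # how often you retrain
    hourly_value: float,        # value of blocked engineer/researcher time ($/h)
    upgrade_cost: float,        # price of the new hardware ($)
) -> float:
    # Hours saved per run shrink as 1/speedup of the original runtime remains.
    hours_saved_per_run = current_run_hours * (1 - 1 / speedup)
    monthly_savings = hours_saved_per_run * runs_per_month * hourly_value
    return upgrade_cost / monthly_savings

# Hypothetical example: 24 h runs, 2x speedup, 10 runs/month,
# $100/h of blocked time, $15,000 upgrade.
print(f"{months_to_breakeven(24, 2.0, 10, 100, 15_000):.1f} months")  # ≈ 1.2 months
```

With these made-up numbers the upgrade pays for itself in about a month, but drop to one retraining run a month and the breakeven stretches past a year, which lines up with the "order of magnitude" intuition above.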