Comments

a4mula t1_ixpmkaz wrote

It's been a few months, but back when I looked into this, Colab+ offered access to the V100s on a priority basis. You're guaranteed at least 24 concurrent hours on one per month; anything past that is prioritized based on usage.

As to whether it matters? Sure. Good luck training on the P100s. Not only are they significantly slower (2-3x), they're also limited to 32GB of VRAM, where the V100s are extended to 53GB.

This can place limitations on training beyond just speed: it means you might have to break larger jobs into smaller tasks.

If you're doing this for more than just a passing interest, it's a great investment.

edit:

I missed the part where you said non-AI. What kind of coding are you doing that requires CUDA and GPU access, if not ML?

blry2468 OP t1_ixpntx4 wrote

My problem is just that my code has an estimated run time of 42 hours. It is completely conventional Python code. I was just wondering if Colab+ would be beneficial in this scenario, because I don't know what Colab+ actually does for conventional code.

a4mula t1_ixpoeud wrote

Have you stopped to consider that perhaps there's an alternative approach, or a more effective algorithm?

Unless you're doing something along the lines of SQL calls to the world's largest async database, your code probably shouldn't require 42 hours to complete.

Not that there isn't code like that. But it isn't being run on local PCs or Colab.

Can you explain in two sentences or less what the gist of this program is?

blry2468 OP t1_ixpouh4 wrote

The program runs a radar simulation and a signal-processing algorithm many times over to generate an ROC graph, to check the efficiency of a radar detection method. The base simulation-and-detection code takes 30s to run, and there are multiple for loops around it to generate the data points for a graph whose axes are probability of detection, probability of false alarm, and signal-to-noise ratio. That means three nested for loops: one with 25 repetitions, one with 30, and one with 10. This brings the total time to an estimated 42 hours.
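
Roughly, the structure looks like this (a minimal sketch; `run_simulation` and the exact sweep ranges are stand-ins, since the real simulation is spread across several files):

```python
import numpy as np

def run_simulation(snr_db, pfa, n_pulses):
    """Stand-in for the real ~30s simulation + detection run."""
    return 0.0  # placeholder probability of detection

results = []
for snr_db in np.linspace(-10, 14, 25):    # 25 SNR values
    for pfa in np.logspace(-6, -1, 30):    # 30 false-alarm probabilities
        for n_pulses in range(1, 11):      # 10 repetitions of the innermost sweep
            pd = run_simulation(snr_db, pfa, n_pulses)
            results.append((snr_db, pfa, n_pulses, pd))

# 25 * 30 * 10 = 7,500 runs of the ~30s inner simulation in total
```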

a4mula t1_ixppup5 wrote

Have you researched loopless (vectorized) coding at all? If nothing else, are you practicing sound early-exit strategies?

If it's not proprietary code, or if you can slap together a pseudo version that's okay for public consumption, you might post it to something like Stack Overflow.

Nested loops are standard practice on small datasets.

This is not that.

I'd take a peek at this wiki on nested optimization to get an idea of how you might get around it.

If not, again, Stack Overflow is a great resource, full of expertise in things like optimization.
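
For what it's worth, since each parameter combination sounds independent, one rough sketch (made-up names, same hypothetical `run_simulation` as above) is to flatten the nested loops and fan the runs out across CPU cores:

```python
import itertools
from multiprocessing import Pool

import numpy as np

def run_simulation(params):
    """Stand-in for one ~30s simulation + detection run."""
    snr_db, pfa, n_pulses = params
    return (snr_db, pfa, n_pulses, 0.0)  # placeholder result

if __name__ == "__main__":
    # Flatten the three nested loops into one stream of parameter tuples
    grid = itertools.product(
        np.linspace(-10, 14, 25),   # 25 SNR values
        np.logspace(-6, -1, 30),    # 30 false-alarm probabilities
        range(1, 11),               # 10 innermost repetitions
    )
    # Fan the 7,500 independent runs out across all CPU cores
    with Pool() as pool:
        results = pool.map(run_simulation, grid)
```

That doesn't fix the algorithm, but it divides the wall-clock time by roughly the number of cores.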

Pocok5 t1_ixpumno wrote

> It is completely conventional Python code.

So, you're not doing any GPU compute at all? Only CPU? See if your algorithm can be parallelized on a GPU. A good sign that you can is when you perform the same operation over the elements of a huge array and each result depends only on the input array: a convolution is like that; filling a vector with a Fibonacci sequence is not.
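
As a rough illustration (using CuPy here, one of several options; the arrays and operations are made up):

```python
import cupy as cp  # NumPy-compatible arrays that live on the GPU

# Data-parallel: each output element depends only on the inputs,
# so all 10 million elements can be computed at once on the GPU.
x = cp.random.standard_normal(10_000_000)
power = cp.abs(x) ** 2  # elementwise, trivially parallel

# Sequential dependency: fib[n] needs fib[n-1] and fib[n-2],
# so the elements cannot be computed independently.
fib = [0, 1]
for n in range(2, 100):
    fib.append(fib[n - 1] + fib[n - 2])
```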

blry2468 OP t1_ixpvkpt wrote

Unfortunately my code is not running the same operation over and over; it changes each time it loops, to plot different points on the graph. The code also draws from multiple other code files, not just an array, so I don't think it can be converted to use the GPU?

blry2468 OP t1_ixpvnjr wrote

What are the requirements for it to use the GPU, though? From what I know, my code only uses the GPU if it's an AI or machine-learning program, not normal conventional code?

blry2468 OP t1_ixpyuub wrote

Yes, except my for loops use the loop count and the loop step as variables, so they don't repeat the exact same operation over and over; it changes a bit each time it loops. So that's the issue.

Pocok5 t1_ixq2k5l wrote

If it's just that, then it's no issue; in fact, it's integral to how CUDA works (I'm assuming the loop step is constant over one run of a loop). You get the index of the current thread and you can use it. For example, the CUDA prime-check example is "check the first N integers for primes": start N threads and run a primality check on each thread's index. The only problem arises if loop #n+1 uses data calculated during loop #n.
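
A minimal sketch of that prime-check pattern in Python, using Numba's CUDA support (assumes a CUDA-capable GPU; N and the launch configuration are illustrative):

```python
import numpy as np
from numba import cuda

@cuda.jit
def prime_check(flags):
    n = cuda.grid(1)      # absolute thread index = the integer this thread tests
    if n >= flags.size:
        return            # surplus threads in the last block do nothing
    if n < 2:
        flags[n] = 0
        return
    d = 2
    while d * d <= n:     # trial division up to sqrt(n)
        if n % d == 0:
            flags[n] = 0  # found a divisor: not prime
            return
        d += 1
    flags[n] = 1          # no divisor found: prime

N = 1_000_000
flags = cuda.to_device(np.zeros(N, dtype=np.uint8))
threads_per_block = 256
blocks = (N + threads_per_block - 1) // threads_per_block
prime_check[blocks, threads_per_block](flags)  # one thread per integer
primes = np.flatnonzero(flags.copy_to_host())  # indices where flag == 1
```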
