Submitted by Emergency-Dig-5262 t3_10rsldp in MachineLearning

Hello people,

I am currently attending a Data Science course, and to finish it I have to write a paper about a project I am working on. I write the code in VS Code and I use .ipynb notebooks.

So I am basically training a few ML models after a long data preprocessing step, which worked out fine. But as soon as I run my hyperparameter tuning code, my PC takes a very long time. Right now I am running hyperparameter tuning for RandomForest and it has already been running for 21 hours.

Is there any possibility for me to run my code somewhere else? I read about Heroku, but that seems to be more than what I am looking for. I am getting a bit nervous because I want to get this paper done. The worst case is that I have to buy a new PC.

Thank you so much!

0

Comments

BlazeObsidian t1_j6xbu8f wrote

You can try Kaggle notebooks and Google Colab notebooks but they don't persist for that long. They typically shut down after 6 hours. You'll have to periodically save your best model/hyperparameters but that might be a viable free option.
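
For example, periodically dumping the current best model/params with joblib would look roughly like this (the estimator, data, and file name below are placeholders, not anything specific to Kaggle or Colab):

```python
# Sketch: checkpoint the current best model/hyperparameters so a notebook
# shutdown does not lose them. "rf_checkpoint.joblib" is a placeholder name.
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

joblib.dump({"model": model, "params": model.get_params()}, "rf_checkpoint.joblib")

# After the session restarts, reload and continue from the checkpoint.
checkpoint = joblib.load("rf_checkpoint.joblib")
print(checkpoint["params"]["n_estimators"])
```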

Google Colab also has a paid option where you can upgrade the RAM, GPU, etc. to meet your needs.

But I am curious as to why it's taking 21 hours. Have you checked your course forums/discussions for the expected time?

https://www.kaggle.com/

https://colab.research.google.com/

4

qalis t1_j6xkna5 wrote

If you are tuning hyperparams for RF for 21 hours, you are doing something wrong. RFs often do not require any tuning at all! Additionally, are you using all available cores? Are you using a better HPO algorithm like TPE in Optuna or Hyperopt?
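
Something like this with Optuna, for example (search space, trial count, and dataset here are just illustrative, assuming a classification task with sklearn's RandomForestClassifier):

```python
# Sketch: TPE-based hyperparameter search with Optuna instead of an
# exhaustive grid. Search space and n_trials are illustrative only.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 30),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    clf = RandomForestClassifier(**params, n_jobs=-1, random_state=0)
    # 3-fold CV accuracy as the optimization target.
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")  # TPE sampler is the default
study.optimize(objective, n_trials=30)
print(study.best_params)
```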

3

Emergency-Dig-5262 OP t1_j6xx6w6 wrote

I am using GridSearchCV. I don't know Hyperopt or TPE, but I will definitely do some research about them. Thank you!

Good call on the cores. That's the next thing I will check out!
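
For anyone finding this later, this is roughly what using all cores looks like with GridSearchCV (the estimator, grid, and data below are placeholders, not my real setup):

```python
# Sketch: the same kind of GridSearchCV call, but with n_jobs=-1 so every
# CPU core is used. The grid and dataset are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "n_estimators": [100, 300],
        "max_depth": [None, 10, 30],
    },
    cv=3,
    n_jobs=-1,  # parallelize the candidate fits across all cores
    verbose=1,  # print progress so long runs are visible
)
search.fit(X, y)
print(search.best_params_)
```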

1

godx119 t1_j6xywl5 wrote

I was able to get a $100 credit on Azure by signing up as a student; I would think that would cover whatever resources you need for your project.

2

thevillagersid t1_j6xgoi5 wrote

Look into getting set up with a cloud solution, like Amazon SageMaker or something similar. The potential gains from moving to a more powerful machine will depend strongly, however, on what is causing your code to take so long to execute. If it's just taking a long time because you're searching over a massive, high-dimensional grid, moving to a better machine might offer limited improvements, and you might need to look into splitting the workload across a cluster of machines.
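
For a rough sense of scale, a hypothetical 4-parameter grid already multiplies into a lot of fits no matter the machine:

```python
# Hypothetical grid: 4 hyperparameters with a handful of values each.
grid = {
    "n_estimators": [100, 200, 400, 800],
    "max_depth": [None, 10, 20, 30, 40],
    "min_samples_leaf": [1, 2, 5, 10],
    "max_features": ["sqrt", "log2", None],
}
cv_folds = 5

n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)

print(n_combinations)             # 4 * 5 * 4 * 3 = 240 candidate settings
print(n_combinations * cv_folds)  # 1200 model fits with 5-fold CV
```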

1

ggf31416 t1_j741sxn wrote

One possibility is GPU acceleration using the cuML framework, but if you must use a specific framework like sklearn it won't be feasible. https://medium.com/rapids-ai/accelerating-random-forests-up-to-45x-using-cuml-dfb782a31bea
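
Roughly, cuML mirrors the sklearn API but trains on the GPU; it needs a RAPIDS install and a supported NVIDIA GPU, so treat this as a sketch with made-up data:

```python
# Sketch: cuML's RandomForestClassifier follows the sklearn-style API but
# trains on the GPU. Requires RAPIDS/cuML and a supported NVIDIA GPU.
import numpy as np
from cuml.ensemble import RandomForestClassifier

X = np.random.rand(10_000, 20).astype(np.float32)  # cuML prefers float32
y = (np.random.rand(10_000) > 0.5).astype(np.int32)

clf = RandomForestClassifier(n_estimators=200, max_depth=16)
clf.fit(X, y)
print(clf.predict(X[:5]))
```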

There are some alternatives such as Google, AWS, and Gradient, and you may be able to get student credits. Also, even if you don't need a GPU, you can rent an instance with many CPU cores at Vast.ai for cheap (even with a GPU it's cheaper than a CPU-only AWS instance with the same number of cores); for example, the cheapest instance with 16 vCPUs is under $0.20/hour and only needs a credit card. The main issue with Vast.ai is that you should save your results before shutting down the instance, because they are tied to the machine, which may become unavailable.

1