grenouillefolle t1_j6i36hx wrote

I have a (seemingly) simple question about systematic studies for classification problems. Is there any literature (books, papers) describing an approach to systematic studies of classifiers, such as varying the training-sample size, the number of input variables, the strength of the correlation between input variables and classes on simulated data, the type of classifier, the configuration of the algorithm's parameters, etc.?

The goal is to demonstrate the robustness and limitations of the method before training on real data. While I have a good sense of what can and should be done, I want to point a beginner in the right direction for a project without doing all the hard work myself. To make the idea concrete, a rough sketch of the kind of factorial study I have in mind is below.
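(A minimal sketch only, assuming scikit-learn and using make_classification as stand-in simulated data; the factor levels and the two classifiers are placeholders, not recommendations.)

```python
# Minimal sketch of a systematic classifier study on simulated data.
# Factor levels and classifiers are illustrative placeholders.
import itertools
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Factors to vary: training-set size, number of features, class separation
# (a proxy for how strongly the features correlate with the class label).
sample_sizes = [200, 1000, 5000]
n_features_list = [5, 20, 50]
class_seps = [0.5, 1.0, 2.0]

results = []
for n, p, sep, (name, clf) in itertools.product(
        sample_sizes, n_features_list, class_seps, classifiers.items()):
    X, y = make_classification(
        n_samples=n, n_features=p, n_informative=max(2, p // 2),
        class_sep=sep, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    results.append((name, n, p, sep, scores.mean(), scores.std()))

for row in results:
    print(row)
```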

2

qalis t1_j6iqvql wrote

Somewhat narrower than your question, but I know of two such papers: "Tunability: Importance of Hyperparameters of Machine Learning Algorithms" by P. Probst et al. and "Hyperparameters and tuning strategies for random forest" by P. Probst et al.

Both are on arXiv. The first concerns the tunability of multiple ML algorithms, i.e. how sensitive they are in general to hyperparameter choice. The second delves deeper into the same area, but specifically for random forests, gathering results from many other works. Using those ideas, I was able to dramatically reduce the computational resources needed for tuning by designing better hyperparameter grids.
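For what it's worth, a rough sketch of what a narrowed search looks like in practice (assuming scikit-learn's RandomForestClassifier and RandomizedSearchCV; the ranges are my own illustrative choices, not values taken from the papers):

```python
# Sketch of a narrowed random-forest hyperparameter search.
# The specific ranges are illustrative, not values from the papers.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Focus the search on the hyperparameters reported as most influential
# (feature subsampling, leaf size, bootstrap sample fraction); keep
# n_estimators fixed at a value that is simply "large enough".
param_distributions = {
    "max_features": uniform(0.1, 0.8),   # fraction of features per split
    "min_samples_leaf": randint(1, 20),
    "max_samples": uniform(0.5, 0.5),    # bootstrap fraction in [0.5, 1.0]
}

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=0),
    param_distributions=param_distributions,
    n_iter=25, cv=5, scoring="accuracy", random_state=0, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The point is that a small random search over the few hyperparameters that actually matter is much cheaper than a dense grid over everything.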

1