BrisklyBrusque t1_j43dsux wrote

You might enjoy “Well-Tuned Simple Nets Excel on Tabular Data”

The authors wrote a routine that leverages BOHB (Bayesian optimization and Hyperband) to search an enormous space of possible neural network architectures. They allowed the routine to select different regularization techniques, including many ensemble techniques like dropout, snapshot ensembles, and others that render the choice of parameter initializations less critical. However, the authors used the same optimizer (AdamW) in all experiments.
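To give a feel for the Hyperband half of BOHB, here is a rough sketch of successive halving, the building block that Hyperband repeats with different starting budgets. This is not the paper's code: the search space, the `toy_loss` function, and all parameter names are made up for illustration, and configurations are sampled at random, whereas BOHB swaps in a model-based sampler.

```python
import math
import random


def sample_config(rng):
    # Hypothetical search space, chosen only for illustration.
    return {
        "lr": 10 ** rng.uniform(-4, -1),
        "dropout": rng.uniform(0.0, 0.5),
        "weight_decay": 10 ** rng.uniform(-6, -2),
    }


def toy_loss(config, budget, rng):
    # Stand-in for "train for `budget` epochs, return validation loss".
    # Lower is better; the noise shrinks as the budget grows.
    base = (math.log10(config["lr"]) + 2.5) ** 2 + (config["dropout"] - 0.2) ** 2
    return base + rng.gauss(0, 1.0 / budget)


def successive_halving(n_configs=27, min_budget=1, eta=3, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_configs)]
    budget = min_budget
    history = []  # (budget, number of configs evaluated) per round
    while True:
        scored = sorted(
            ((toy_loss(c, budget, rng), c) for c in configs),
            key=lambda pair: pair[0],
        )
        history.append((budget, len(configs)))
        if len(configs) == 1:
            break
        # Keep the best 1/eta of the configs and give them eta times the budget.
        keep = max(1, len(configs) // eta)
        configs = [c for _, c in scored[:keep]]
        budget *= eta
    return configs[0], history


best, history = successive_halving()
print(history)
print(best)
```

The idea is that cheap, noisy evaluations weed out most configurations early, so the expensive full-budget runs are spent only on the survivors.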

Not exactly what you are looking for but hopefully interesting.


BrisklyBrusque t1_j0hx440 wrote

Yes, lots. For example, in 2019 a paper introduced a new split rule for categorical variables that reduces computational complexity.

A lot of researchers are also exploring adjacent tree ensembles such as extremely randomized trees (2006) and Bayesian additive regression trees (2008). The former is very similar to random forests. There is a strong possibility other tree ensembles have yet to be discovered!
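For a sense of how close extremely randomized trees are to random forests, the main difference is in how a split threshold is picked for a feature. A minimal sketch, assuming binary labels and a single feature (the helper names and toy data are mine, not from any library):

```python
import random


def gini(labels):
    # Gini impurity for binary 0/1 labels.
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_impurity(xs, ys, threshold):
    # Weighted impurity of the two children produced by the split.
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(xs, ys):
    # Random-forest style: try every midpoint and keep the best one.
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))


def random_threshold(xs, rng):
    # Extra-trees style: draw the cut point uniformly within the feature's range.
    return rng.uniform(min(xs), max(xs))


xs = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7, 2.0, 2.2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
rng = random.Random(0)

t_best = best_threshold(xs, ys)
t_rand = random_threshold(xs, rng)
print(t_best, split_impurity(xs, ys, t_best))
print(t_rand, split_impurity(xs, ys, t_rand))
```

In the full algorithm, one random cut is drawn for each of several candidate features and the best of those random splits is used, so the extra randomness buys speed and diversity at the cost of weaker individual trees.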

If you’re a fan of computer science / optimized code, there is a great deal of research concerning making tree models faster. The ranger library in R was introduced as an improvement on the randomForest package. There is also interest in making random forests scale up to millions of variables, to deal with genetics data.

Hummingbird is a Microsoft project that seeks to refactor common machine learning methods using tensor algebra, so those methods can take advantage of GPUs. I don’t know if they got around to random forests yet.

Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.


BrisklyBrusque t1_iy3slot wrote

Well, sometimes dimension reduction is used to maintain the most important aspects of the data in as few vectors as possible, particularly when we want to visualize high-dimensional data or escape the curse of dimensionality. Other times dimension reduction is more of a regularization technique. Think of self-organizing maps, RBMs, autoencoders, and other neural nets that learn a representation of the data, which can then be passed to another neural net as the new training sample.

So dimension reduction is itself a technique with many distinct applications.


BrisklyBrusque t1_iv6ogqg wrote

2007-2010: Deep learning begins to win computer vision competitions. In my eyes, this is what put deep learning on the map for a lot of people, and kicked off the renaissance we see today.

2016ish: categorical embeddings/entity embeddings. For tabular data with categorical variables, categorical embeddings are faster and more accurate than one-hot encoding, and preserve the natural relationships between factors by mapping them to a low-dimensional space.
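A tiny sketch of what an embedding is mechanically: a lookup table with one short dense vector per category, which gives the same result as multiplying the one-hot vector by a weight matrix but skips the multiplication. The categories and the 2-dimensional size are made up, and the weights are random here, whereas in a real model they are learned during training.

```python
import random

categories = ["red", "green", "blue", "yellow"]
index = {c: i for i, c in enumerate(categories)}
embedding_dim = 2  # illustrative; much smaller than the number of categories in practice

rng = random.Random(0)
# One row per category. Random placeholders standing in for learned weights.
embedding_table = [
    [rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in categories
]


def one_hot(category):
    # Sparse representation: one slot per category.
    vec = [0.0] * len(categories)
    vec[index[category]] = 1.0
    return vec


def embed(category):
    # Dense representation: a direct row lookup.
    return embedding_table[index[category]]


def one_hot_times_table(category):
    # The same result via an explicit vector-matrix product.
    oh = one_hot(category)
    return [
        sum(oh[i] * embedding_table[i][d] for i in range(len(categories)))
        for d in range(embedding_dim)
    ]


for c in categories:
    print(c, one_hot(c), embed(c))
```

Because the rows are trained along with the rest of the network, categories that behave similarly can end up near each other in the embedding space, which a one-hot vector cannot express.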


BrisklyBrusque t1_is7g6gk wrote

SVMs prevailed against neural networks in a big image classification contest in 2006. Then they fell out of favor, along with other learning algorithms like:

•Decision stumps

•Multivariate adaptive regression splines

•Flexible discriminant analysis

Not sure which of these will come back, but it’s funny how often ideas are rediscovered (like neural networks themselves, which were branded as multilayer perceptrons initially).