Submitted by GraciousReformer t3_118pof6 in MachineLearning
1bir t1_j9jbi5e wrote
Apparently* decision trees are also capable of [universal function approximation](https://cstheory.stackexchange.com/a/46405).
Whether the algorithms for training them do that as well as the ones for deep NNs in practice is a separate issue.
*Haven't seen (& probably wouldn't understand) a proof.
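A toy illustration of the intuition (not the linked proof; the target function and scikit-learn's DecisionTreeRegressor are arbitrary choices for illustration):

```python
# An unbounded-depth tree can drive training error on a fixed dataset
# as low as you like by carving the input space into ever-finer cells.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 2 * np.pi, 1000).reshape(-1, 1)
y = np.sin(X).ravel()  # any reasonable target would do

for depth in (2, 5, 10, None):  # None = grow until leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    mse = np.mean((tree.predict(X) - y) ** 2)
    print(f"max_depth={depth}: train MSE = {mse:.2e}")
# MSE shrinks toward zero as depth grows: the piecewise-constant fit
# becomes arbitrarily fine, which is the intuition behind the claim.
```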
GraciousReformer OP t1_j9jg13p wrote
Then why not use decision trees instead of DL?
uhules t1_j9jpkun wrote
Before asking why, ask if: GBDTs (gradient-boosted decision trees) are very widely used.
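For concreteness, the standard workflow looks like this (a minimal sketch with scikit-learn's HistGradientBoostingClassifier standing in for the usual production libraries like LightGBM, XGBoost, or CatBoost; the dataset is an arbitrary choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# A small tabular dataset, the setting where GBDTs shine.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_iter = number of boosting rounds (i.e. trees added).
clf = HistGradientBoostingClassifier(max_iter=200).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```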
1bir t1_j9jnu2h wrote
>Whether the algorithms for training them do that as well as the ones for deep NNs in practice is a separate issue.
For supervised learning, the big problem with decision trees (RFs, GBTs, etc.) seems to be representation learning: they split on the raw input features as given, rather than learning intermediate representations the way deep nets do.
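A hedged sketch of what that means: a tree can only split on the features it is handed, so one common pattern is to let a neural net learn features and hand those to the tree. The dataset, layer size, and models below are arbitrary illustrative choices; the point is the workflow, not the scores on this toy problem:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A forest directly on raw 8x8 pixel intensities: no learned features.
rf_raw = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# A small MLP; its hidden layer is the learned representation.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)

def hidden(X):
    # relu(X @ W1 + b1): the MLP's learned features, computed by hand.
    return np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

# The same forest, but consuming the net's learned features.
rf_rep = RandomForestClassifier(random_state=0).fit(hidden(X_tr), y_tr)

print(f"RF on raw pixels:       {rf_raw.score(X_te, y_te):.3f}")
print(f"RF on learned features: {rf_rep.score(hidden(X_te), y_te):.3f}")
```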
Brudaks t1_j9k6mo0 wrote
Because being a universal function approximator is not sufficient to be useful in practice, and IMHO it isn't even a particularly interesting property. We don't care whether something can approximate *any* function; we care whether it approximates the thing needed for a particular task, and being able to approximate it at all is necessary but not sufficient.

We care about efficiency of approximation (e.g. a single-hidden-layer perceptron is a universal approximator only if you allow it an impractical number of neurons; see the sketch at the end of this comment). But even more important than how well the function can be approximated with a limited number of parameters is how well you can actually learn those parameters. This differs a lot between models: we don't care how well a model would fit the function with optimal parameters, we care how well it fits with the parameter values we can realistically find within a bounded amount of computation.
That being said, we do use decision trees instead of DL: for some types of tasks (notably tabular data) the former outperform the latter, and for other types it's the other way around.
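A minimal sketch of the width point above (toy target and scikit-learn's MLPRegressor; the specific widths and training budget are arbitrary):

```python
# How well a fixed training budget approximates a wiggly 1-D target
# depends heavily on the width of a single-hidden-layer net.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 1))
y = np.sin(8 * X).ravel()  # arbitrary moderately wiggly target

for width in (2, 8, 32, 128):
    net = MLPRegressor(hidden_layer_sizes=(width,), max_iter=2000,
                       random_state=0).fit(X, y)
    mse = np.mean((net.predict(X) - y) ** 2)
    print(f"width={width:4d}: train MSE = {mse:.3f}")
# Tiny widths underfit badly; widening helps, but the result also
# depends on whether the optimizer finds good parameters, not just
# on the existence of a good approximation.
```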
VirtualHat t1_j9lp6z3 wrote
There was a really good paper a few years ago that identified some biases in how DNNs learn, which might explain why they work so well in practice compared to alternatives. Essentially, they are biased towards smoother solutions, which is often what is wanted.
This is still an area of active research, though. I think it's fair to say we still don't quite know why DNNs work as well as they do.
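If the paper meant is the spectral-bias line of work (my guess: Rahaman et al., 2019; the comment doesn't say), the effect is easy to reproduce in miniature: with the same architecture and budget, a net fits a smooth target far more readily than a wiggly one. Targets and hyperparameters below are arbitrary:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-1, 1, 2000).reshape(-1, 1)
for freq in (1, 5, 25):  # low- to high-frequency sinusoids
    y = np.sin(freq * np.pi * X).ravel()
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                       random_state=0).fit(X, y)
    mse = np.mean((net.predict(X) - y) ** 2)
    print(f"frequency={freq:2d}: train MSE = {mse:.3f}")
# Low frequencies are learned easily; higher frequencies need far more
# training or capacity, consistent with a bias toward smooth solutions.
```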