
GraciousReformer OP t1_j9jr4i4 wrote

"for example on tabular data where discontinuities are common, DL performs worse than alternatives, even if with more data it would eventually approximate a discontinuity." True. Is there references on this issue?

1
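Not from the thread, but a minimal sketch of the discontinuity point quoted above, assuming scikit-learn is available: a shallow decision tree can place a split right at a step, while a small MLP has to smooth over it with limited data. Exact numbers will vary with the seed.

```python
# Illustrative sketch only: tree vs. small MLP on a step function (a hard discontinuity).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = (X[:, 0] > 0).astype(float)          # step function: discontinuity at 0

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

X_test = np.linspace(-1, 1, 1000).reshape(-1, 1)
y_test = (X_test[:, 0] > 0).astype(float)
print("tree MSE:", np.mean((tree.predict(X_test) - y_test) ** 2))
print("mlp  MSE:", np.mean((mlp.predict(X_test) - y_test) ** 2))
```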

yldedly t1_j9jr821 wrote

This one is pretty good: https://arxiv.org/abs/2207.08815

1

GraciousReformer OP t1_j9jrhjd wrote

This is a great point. Thank you. So do you mean that DL works for language models only when they get a large amount of data?

2

GraciousReformer OP t1_j9k1srq wrote

But then how is that different from the result that NNs work better on ImageNet?

1

yldedly t1_j9k3orr wrote

Not sure what you're asking. CNNs have inductive biases suited for images.

3
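As an aside (not from the thread), one concrete inductive bias of CNNs is weight sharing: the same filter is applied at every position, so translating the input translates the response. A minimal sketch, assuming PyTorch is available:

```python
# Illustrative sketch only: a conv layer's weight sharing gives translation equivariance.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

img = torch.zeros(1, 1, 8, 8)
img[0, 0, 2, 2] = 1.0                                   # single bright pixel
shifted = torch.roll(img, shifts=(3, 3), dims=(2, 3))   # same pixel, moved

out1 = conv(img)
out2 = conv(shifted)
# The response to the shifted input equals the shifted response (away from the borders).
print(torch.allclose(torch.roll(out1, shifts=(3, 3), dims=(2, 3)), out2, atol=1e-6))
```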

GraciousReformer OP t1_j9k4974 wrote

So it works for images but not for tabular data?

1

yldedly t1_j9k5n8n wrote

It depends a lot on what you mean by "works". You can get a low test error with NNs on tabular data if you have enough of it. For smaller datasets, you'll get a lower test error using tree ensembles. For low out-of-distribution error, neither will work.

3
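A minimal sketch of that last comment (not from the thread), assuming scikit-learn is available: on a small synthetic tabular task with a discontinuity, a gradient-boosted tree ensemble typically reaches a lower in-distribution test error than an MLP, and neither does well once the inputs move outside the training range. The dataset and numbers here are made up for illustration.

```python
# Illustrative sketch only: tree ensemble vs. MLP on synthetic tabular data,
# plus an out-of-distribution (OOD) test set to show that neither extrapolates well.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 5))
y = np.where(X[:, 0] > 0.5, 3.0, -3.0) + X[:, 1] + 0.1 * rng.standard_normal(500)

X_test = rng.uniform(0, 1, size=(500, 5))      # in-distribution test inputs
y_test = np.where(X_test[:, 0] > 0.5, 3.0, -3.0) + X_test[:, 1]
X_ood = rng.uniform(1, 2, size=(500, 5))       # out-of-distribution inputs
y_ood = np.where(X_ood[:, 0] > 0.5, 3.0, -3.0) + X_ood[:, 1]

gbm = HistGradientBoostingRegressor(random_state=0).fit(X, y)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0).fit(X, y)

for name, model in [("gbm", gbm), ("mlp", mlp)]:
    print(name,
          "test MSE:", mean_squared_error(y_test, model.predict(X_test)),
          "OOD MSE:", mean_squared_error(y_ood, model.predict(X_ood)))
```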