Submitted by dhruvnigam93 t3_zux783 in MachineLearning

Been an industry data scientist for 6 years in fintech and gaming.
In fintech, I sensed a need for interpretability and robustness. I also was not working with a lot of data (~500k observations to train models). Consequently, I got into the habit of building tree-based models by default, specifically XGBoost, and used explainability techniques such as SHAP to explain the models.

Since moving to online gaming, the scrutiny is lower and the scale is far greater. I now have the freedom to use deep learning. I need to be able to demonstrate effectiveness through experiments, but beyond that I do not need explainability at a granular level. Advantages I see with using deep learning:

  1. Custom loss functions - basically any differentiable loss function can be trained on. This is a huge advantage when the business goal is not aligned with the out-of-the-box loss functions (see the sketch below)
  2. Learning Embeddings - The ability to condense features into dense, latent representations which can be used for any number of use cases
  3. Multiple outputs per model - by tweaking the architecture, a single network can produce several outputs, e.g. multiple prediction heads sharing one trunk

Seeing all this, deep learning seems to offer a lot of advantages, even if raw performance might be similar to tree-based methods. What do you guys think?
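To make the three points concrete, here is a minimal PyTorch sketch (hypothetical network, loss, and data, not something from my production code): a shared trunk whose last hidden layer doubles as a reusable embedding, two output heads, and a hand-written differentiable loss that weights errors asymmetrically to mimic a business cost.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared trunk plus two prediction heads; the trunk output is a reusable embedding."""
    def __init__(self, n_features, emb_dim=16):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, emb_dim), nn.ReLU(),
        )
        self.head_spend = nn.Linear(emb_dim, 1)  # e.g. predicted spend (regression)
        self.head_churn = nn.Linear(emb_dim, 1)  # e.g. churn logit (classification)

    def forward(self, x):
        z = self.trunk(x)                        # dense embedding, reusable downstream
        return self.head_spend(z), self.head_churn(z), z

def asymmetric_mse(pred, target, under_weight=3.0):
    """Custom differentiable loss: under-prediction costs 3x more than over-prediction."""
    err = target - pred
    weight = torch.where(err > 0, torch.full_like(err, under_weight), torch.ones_like(err))
    return (weight * err ** 2).mean()

model = TwoHeadNet(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(256, 10)                         # dummy batch
y_spend = torch.randn(256, 1)
y_churn = torch.randint(0, 2, (256, 1)).float()

pred_spend, churn_logit, _ = model(x)
loss = asymmetric_mse(pred_spend, y_spend) + bce(churn_logit, y_churn)
opt.zero_grad()
loss.backward()
opt.step()
```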

31

Comments


blablanonymous t1_j1msyd5 wrote

  1. Can’t you also create custom loss functions for XGBoost? I’ve never used it myself, but it seems about as easy as doing it for an ANN.

  2. Is it always trivial to get meaningful embeddings? Does taking the last hidden layer of an ANN guarantee that the representation will be useful in many different contexts? I think it might need more work than you expect. I’m actually looking for a write-up about what conditions need to be met for a hidden layer to provide meaningful embeddings. I think using a triplet loss intuitively favors that, but I’m not sure in general.

  3. XGBoost allows for this too, doesn’t it? The scikit-learn API at least lets you create multi-output models very easily. Granted, it can be silly to have multiple models under the hood, but whatever works.
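For what it’s worth, both of those are indeed easy in practice. A minimal sketch (made-up data and loss) of a custom XGBoost objective, which only needs the gradient and hessian of the loss, plus the scikit-learn multi-output wrapper from point 3:

```python
import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

X = np.random.rand(1000, 10)
y = np.random.rand(1000)

# Point 1: a custom objective only needs the gradient and hessian of the loss
# w.r.t. the prediction. Here: squared error that penalises under-prediction 3x.
def asymmetric_obj(preds, dtrain):
    residual = dtrain.get_label() - preds
    weight = np.where(residual > 0, 3.0, 1.0)
    grad = -2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=asymmetric_obj)

# Point 3: multiple outputs via the scikit-learn wrapper
# (one booster per target under the hood).
Y_multi = np.random.rand(1000, 3)
multi = MultiOutputRegressor(xgb.XGBRegressor(n_estimators=100)).fit(X, Y_multi)
```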

Sorry, I’m playing devil’s advocate here, but the vibe I’m getting from your post is that you’re excited to finally get to play with DNNs, which I can relate to. But don’t get lost in that intellectual excitement: at the end of the day, people want you to solve a business problem. The faster you can get to a good solution, the better.

In the end it’s all about trade-offs. The people who employ you just want the best value for their money.

44

Naive-Progress4549 t1_j1mniyu wrote

I think you need a comprehensive benchmark; you might find that your deep learning model fails miserably even in a simple scenario. I would recommend double-checking the requirements: if your business does not particularly care about the occasional bad prediction, then it should be fine; otherwise I would look at more deterministic models.

13

Maximum-Ruin-9590 t1_j1mzp41 wrote

You can write your own custom loss functions for tree-based models.

Here is a pretty good industry showcase with LightGBM, found on Reddit:

https://doordash.engineering/2021/06/29/managing-supply-and-demand-balance-through-machine-learning/

It comments on their custom optimizer.
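The same pattern carries over to LightGBM’s scikit-learn API, where the objective can be a callable returning the gradient and hessian; a rough sketch with an illustrative asymmetric loss (not the actual objective from the DoorDash post):

```python
import numpy as np
import lightgbm as lgb

# Illustrative asymmetric objective (not the one from the DoorDash post):
# the callable returns gradient and hessian of the loss w.r.t. the raw score.
def asymmetric_obj(y_true, y_pred):
    residual = y_true - y_pred
    weight = np.where(residual > 0, 3.0, 1.0)
    grad = -2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess

X, y = np.random.rand(1000, 10), np.random.rand(1000)
model = lgb.LGBMRegressor(objective=asymmetric_obj, n_estimators=200).fit(X, y)
```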

11

Maximum-Ruin-9590 t1_j1mzyh7 wrote

Btw, depending on the task, you can also check out Temporal Fusion Transformers (forecasting, NLP, image recognition)

1

CriticalTemperature1 t1_j1nf8g1 wrote

Take a look at TabPFN, which uses meta-learned networks for tabular data prediction: https://www.automl.org/tabpfn-a-transformer-that-solves-small-tabular-classification-problems-in-a-second/
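For anyone who wants to try it, the authors publish a `tabpfn` package with a scikit-learn-style interface; a minimal sketch, assuming that package and a small classification dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumes the published `tabpfn` package is installed

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No per-dataset gradient training: the pre-trained transformer makes in-context
# predictions from the (small) training set in a single forward pass.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```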

5

blablanonymous t1_j1nno5z wrote

What’s the TLDR for this? I’ve been meaning to try it but never got around to it.

1

CriticalTemperature1 t1_j1nq72n wrote

The authors trained a transformer that takes a small tabular dataset and produces SoTA predictions on it in under a second

3

cthorrez t1_j1ossr4 wrote

for datasets with ~1000 rows and 10 columns iirc

3

PiracyPolicy2 t1_j1pe1fa wrote

Yep, exactly. Been keeping an eye out, but AFAIK that’s still the case. And I mean, what’s the point of that?

1

Ill-Branch-3323 t1_j1pu22w wrote

Small correction - 100 columns, and only classification (not regression) so yeah, not very practically useful yet, though a really cool idea

1

jbreezeai t1_j1n20mh wrote

One consistent piece of feedback I have gotten from financial services customers is the need for transparency and explainability. My understanding is that these two factors are why adoption of DL is low there, especially when you have tons of audit and regulatory requirements.

4

rshah4 t1_j1nbfkn wrote

I am with you. While I generally favor trees for tabular data, there are some advantages to deep learning, as you mentioned. I haven't heard many success stories from industry about moving from trees to deep learning, outside of Sean Taylor talking about using deep learning at Lyft. My guess is that the extra complexity of deep learning only pays off in a small set of use cases.

Deep learning is probably also useful in multimodal use cases. If people are using deep learning for tabular data because of these advantages, I would love to hear about it.

2

yunguta t1_j1rm1lp wrote

I agree with you; another benefit I might add is scalability to very large data.

To give an example, the team I work on processes point-cloud data, which easily runs to millions or billions of points per dataset. Random forest is popular for per-point classification of the point cloud, but you need distributed computing for large real-world datasets (think large geographic extents), whereas with a simple MLP you can train in batches on a GPU, with multi-GPU as the natural next step for scaling. Inference is blazing fast too.
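A rough sketch of that batched-MLP setup (feature and class counts are made up): per-point features stream through a DataLoader, so only one batch is ever resident on the GPU rather than the whole point cloud.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Per-point features (e.g. xyz + intensity + ...) and class labels; sizes are illustrative.
features = torch.randn(1_000_000, 8)
labels = torch.randint(0, 5, (1_000_000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=8192, shuffle=True)

mlp = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 5),                     # 5 per-point classes
).to(device)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for xb, yb in loader:                     # only one batch is on the GPU at a time
    xb, yb = xb.to(device), yb.to(device)
    loss = loss_fn(mlp(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```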

I personally see DL models as a flexible and modular way to build models, with the benefit of being able to improve a model through deeper networks, different activation functions, and network modules. If you need to go simple, just use fewer layers :-)

As others have mentioned, use the tool that fits the problem. But a neural network does have the advantages you mentioned and should also be considered.

2

zenonu t1_j1o62bm wrote

Don't get attached to any particular method. Train them all and combine them (ensemble learning) to get the best possible loss on your validation data.
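As a sketch, scikit-learn's stacking utilities make this a few lines; the base models below are placeholders, use whatever actually validates well:

```python
from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

stack = StackingRegressor(
    estimators=[
        ("xgb", XGBRegressor(n_estimators=300)),
        ("gbm", GradientBoostingRegressor()),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)),
    ],
    final_estimator=RidgeCV(),   # meta-model combines the base predictions
    cv=5,
)
# stack.fit(X_train, y_train); stack.score(X_valid, y_valid)
```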

1

Agitated-Ad-7202 t1_j1prnbh wrote

Yeah, good luck maintaining those ensembles in production!

2

Agreeable-Ad-7110 t1_j25sh94 wrote

Sorry, I’m not following: what’s the problem here? I’ve done this, but model inference only had to run once every morning, so it was a little different. Is there something else I’m missing about maintaining the ensemble?

1

acardosoj t1_j1sibr6 wrote

For me, the only time deep learning was worth all the work was when I had a large unlabelled dataset and did pre-training with it before the main task.
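One common recipe for this, as a sketch with placeholder sizes (not necessarily the exact setup described above): pre-train an autoencoder on the unlabelled rows, then reuse its encoder as the trunk of the supervised model.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 16))
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 20))

# 1) Unsupervised pre-training: reconstruct the unlabelled rows.
X_unlabelled = torch.randn(50_000, 20)            # placeholder data
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(10):
    recon = decoder(encoder(X_unlabelled))
    loss = nn.functional.mse_loss(recon, X_unlabelled)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) Fine-tune: reuse the pre-trained encoder under a small task head.
model = nn.Sequential(encoder, nn.Linear(16, 1))  # main (labelled) task, e.g. regression
```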

1