raman_boom t1_iu87p7v wrote on October 29, 2022 at 9:09 AM

I am a senior data analyst working in the NLP domain, mainly on chatbots.

Most of our current problems are not classical NLP problems like text classification or machine translation. We come up with a business problem and genuinely believe it can be solved with ML or statistics, but after the research we may not get results good enough to convince the product manager to implement it as a feature. Maybe our research is just poor quality, but the point is: is there any way I could know beforehand whether a particular problem can be solved with ML?

Another problem is dataset size. We have a limited dataset, and as usual ML models need more data to give good results. It would be great to have a scientific way of saying that if I get n data points, my algorithm will reach a particular accuracy.

FierceQuanta t1_iu9nz5z wrote on October 29, 2022 at 5:17 PM

This is fairly specific and depends on your setting, but it may help with the last problem. Hoeffding's inequality gives a theoretical result on the amount of data needed for testing (which, obviously, gives only a very general idea of the amount of training data).
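
For what it's worth, here is a minimal sketch of that calculation, assuming the test examples are i.i.d. and the per-example score is bounded in [0, 1] (e.g. 0/1 correctness). Under those assumptions Hoeffding's inequality says P(|measured accuracy - true accuracy| >= eps) <= 2 * exp(-2 * n * eps^2), so n >= ln(2 / delta) / (2 * eps^2) test examples are enough to be within eps of the true accuracy with probability at least 1 - delta. The function names are made up for illustration, and the bound says nothing about how much training data a model needs.

```python
import math


def hoeffding_test_size(eps: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta.

    Assumes i.i.d. test examples and a per-example score in [0, 1].
    eps   : tolerated gap between measured and true accuracy
    delta : allowed probability that the gap exceeds eps
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must both lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def hoeffding_margin(n: int, delta: float) -> float:
    """Gap eps guaranteed with probability >= 1 - delta for n test examples."""
    if n <= 0 or not (0 < delta < 1):
        raise ValueError("n must be positive and delta must lie in (0, 1)")
    return math.sqrt(math.log(2 / delta) / (2 * n))


if __name__ == "__main__":
    # Test examples needed for +/- 5 points of accuracy at 95% confidence.
    print(hoeffding_test_size(eps=0.05, delta=0.05))
    # Margin you can claim with 1000 held-out examples at 95% confidence.
    print(round(hoeffding_margin(n=1000, delta=0.05), 4))
```

The bound is distribution-free, so it tends to be conservative; with a small test set the margin it gives can be too wide to be useful for convincing anyone.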

Pancosmicpsychonaut t1_iua0cxl wrote on October 29, 2022 at 6:44 PM

Could you elaborate on what analysis you already do to determine if ML might be useful? I’m currently at that stage in a project myself.

raman_boom t1_iua2lfp wrote on October 29, 2022 at 7:00 PM

I don't have any formal analysis. We can do EDA with t-SNE and the like, but I still go ahead and build simple models to try out and see the results.