Hey, I’m a senior ML research engineer currently working at the intersection of the automotive and security industries. It’s the weekend and I’m just being curious. I’m wondering about other people who are building products with ML/AI at the core. What are the bottlenecks people in our field regularly face in their project development lifecycle? Is it data collection, QA, internal tooling, model development, real-world performance evaluation, etc.? What tools do you wish existed to help clear these bottlenecks? Tell me about them! Maybe they already exist and someone might be able to point you in the right direction!

How about this response template: What’s your role? What industry does most of your work fall into? How big is your team? What area of the entire ML project lifecycle do you think stops you from doing great work the most? If you found a genie lamp and could wish into existence three tools (no matter how technically difficult to create) to support you what would they be?

Comments

raman_boom t1_iu87p7v wrote on October 29, 2022 at 9:09 AM

I am a senior data analyst working in the NLP domain, mainly on chatbots.

Most of our current problems are not classical NLP problems like text classification or machine translation. We come up with a business problem and really think it is possible to solve it with ML or stats, but after research we may not be able to get results good enough to convince the product manager to implement it as a feature. Maybe it is down to poor-quality research on our part, but the point is: is there any way I could know beforehand whether a particular problem can be solved with ML?

Another problem is dataset size. We have a limited dataset and, as usual, ML models need more data to give good results. It would be great to have a scientific way of telling that if I get n data points, my algorithm would reach a particular accuracy.
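One rough, empirical way to approach the "n data points, what accuracy" question is to train on a few increasing subset sizes, record the validation error at each, and extrapolate the learning curve. The sketch below assumes the error follows a power law, error ≈ a * n^(-b), which is only a heuristic and often breaks down far outside the measured range. The function names `fit_power_law` and `predict_error` are made up for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * n**(-b) by least squares in log-log space.

    sizes  -- training-set sizes you actually measured (at least two distinct values)
    errors -- validation error at each size, all strictly positive
    Returns (a, b). The power-law form is an assumption, not a guarantee.
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs")
    if any(n <= 0 for n in sizes) or any(e <= 0 for e in errors):
        raise ValueError("sizes and errors must be positive")

    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                      # equals -b under the power-law model
    intercept = mean_y - slope * mean_x    # equals log(a)
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Extrapolated error at training-set size n under the fitted power law."""
    return a * n ** (-b)
```

Treat the extrapolated number as a sanity check for planning data collection rather than a promise; refitting as each new batch of labelled data arrives shows whether the curve is actually holding.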

FierceQuanta t1_iu9nz5z wrote on October 29, 2022 at 5:17 PM

It is a really specific thing and depends on your setting, but maybe it could help you with the last problem. You can get some theoretical results on the amount of data needed for testing (which obviously gives you only a very general idea of the amount of training data) using Hoeffding's inequality.
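To make the Hoeffding suggestion concrete: for a metric bounded in [0, 1] (accuracy or 0/1 error, say) measured on n i.i.d. test examples, the inequality gives P(|measured - true| >= eps) <= 2 * exp(-2 * n * eps^2). Solving for n gives n >= ln(2 / delta) / (2 * eps^2) for a failure probability of at most delta. A minimal sketch, assuming i.i.d. test data and a fixed model that was not tuned on that test set (the function names are made up):

```python
import math


def hoeffding_test_size(epsilon, delta):
    """Smallest n such that a [0, 1]-bounded test metric is within
    +/- epsilon of its true value with probability at least 1 - delta.

    Assumes i.i.d. test examples and a model chosen independently of them.
    """
    if not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("epsilon and delta must be in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_margin(n, delta):
    """Half-width of the two-sided Hoeffding interval for n test examples."""
    if n <= 0 or not 0 < delta < 1:
        raise ValueError("n must be positive and delta in (0, 1)")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Example: accuracy within +/- 5 points at 95% confidence
print(hoeffding_test_size(0.05, 0.05))  # 738
```

The bound is distribution-free and therefore conservative, and it only covers the size of the test set. It says nothing direct about how much training data the model needs.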

Pancosmicpsychonaut t1_iua0cxl wrote on October 29, 2022 at 6:44 PM

Could you elaborate on what analysis you already do to determine if ML might be useful? I’m currently at that stage in a project myself.

raman_boom t1_iua2lfp wrote on October 29, 2022 at 7:00 PM

I don't really have any. We can do EDA with t-SNE and so on, but I still go ahead and create simple models to try out and see the results.

DigThatData t1_iuadfxx wrote on October 29, 2022 at 8:18 PM

i'd like to see more researchers publish their code with a setup.py

slippu t1_iu9t2mu wrote on October 29, 2022 at 5:52 PM

i need a tool that allows me to use less tools please

biggieshiba t1_iub2r5m wrote on October 29, 2022 at 11:27 PM

Then you need an ML expert to write Python tools for you that wrap many tools with your preferred settings

biggieshiba t1_iub2sgl wrote on October 29, 2022 at 11:28 PM

(Or open source and a lot of effort)

Fine-Topic-6127 OP t1_iua1e31 wrote on October 29, 2022 at 6:51 PM

What tools do you use?

DigThatData t1_iue34mz wrote on October 30, 2022 at 5:08 PM

it's called "python"

biggieshiba t1_iu8524i wrote on October 29, 2022 at 8:29 AM

Hello, I'm learning as a hobbyist and want to go to production with my trained model.

I know front and back end coding but serving and scaling a model in production seems daunting. I'm looking at AWS right now but it doesn't seem like the easiest tool to deploy ML models. I thought it would be much easier to deploy a model! (real world performance is another problem I will have to study soon)

cantfindaname2take t1_iu9uyhv wrote on October 29, 2022 at 6:06 PM

In my experience it's much easier to serve models either through AWS Lambda with containers or by just uploading your model in a container to an EC2 instance.
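For a rough idea of what the Lambda route looks like, here is a minimal handler sketch. It assumes the function sits behind an API Gateway proxy integration (so the request body arrives as a JSON string in `event["body"]`) and that the payload has a `features` list. `predict` is a hypothetical stand-in for a real model, which you would normally load once at module level from a file baked into the container image.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear scorer.
# A real deployment would load model weights here, at import time,
# so that warm invocations reuse them.
_WEIGHTS = [0.5, -0.25]


def predict(features):
    """Toy 'model': dot product of the features with fixed weights."""
    return sum(w * x for w, x in zip(_WEIGHTS, features))


def handler(event, context):
    """Lambda entry point; returns an API Gateway proxy-style response."""
    try:
        body = json.loads(event.get("body") or "{}")
        features = [float(x) for x in body["features"]]
    except (KeyError, TypeError, ValueError):
        # Malformed JSON, a missing key, or non-numeric features all land here.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected JSON with a 'features' list"}),
        }

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(features)}),
    }
```

Because the handler is a plain function, you can call it locally with a dict and `None` as the context before packaging anything, which makes the AWS part mostly a matter of building the container image and pointing Lambda at `handler`.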

biggieshiba t1_iub2lcd wrote on October 29, 2022 at 11:26 PM

Thanks mate, I will learn this then. I find AWS always takes a while to learn, but let's go, in the long run it always pays. I didn't want to go for something more complicated than I needed; I thought easier (and more expensive) services existed.

Salt_Meat2979 t1_iu8glb0 wrote on October 29, 2022 at 11:14 AM

I’m a staff ML engineer working in the cyber security domain. Our product is B2B, and there are a lot of ML challenges. The primary one is finding potential use cases where ML can uniquely solve the problem. The second is validating your solution: as it’s B2B, we have a limited number of users and getting feedback is difficult. In the cyber security domain, false positives and false negatives are both costly, so we need to tune the algorithm for each customer. We have followed an augmented AI approach rather than a fully automated one, so that the customer can understand why some activity is flagged as suspicious and can take action if it’s genuine.

For the last two weeks I have been working on a new product, and I haven’t been able to come up with an ML use case that can convince my manager. I am now exploring competitor products to get more exposure.

ZombieRickyB t1_iufdnzi wrote on October 30, 2022 at 10:17 PM

Research scientist in industry, but I'll talk about something that I lead outside of my job: a project in computer vision designed to basically create a good open source alternative to a bunch of bio-imaging tools behind paywalls. I maintain the code base but work with my old lab to keep expanding and developing (at this point, I mostly do software work/troubleshoot methods from old papers that have weird issues). Currently focusing on anthropology, but long term I want to push into computational neuroscience because I have a bunch of problems with that field, both from the perspective of an adjacent researcher and as family of a neurology patient...

Biggest problem is by far lack of existing tools outside of MATLAB, mostly for visualization and signal processing. Python is okay but not quite there in my experience. MATLAB alternatives are more or less in the same boat. The main challenge is that "performance evaluation" is inevitably qualitative. The benchmarks used to publish results aren't quite meaningless but they're mostly for show/make ML people happy. Practitioners don't really seem to care. That leads to a situation where I need really good 3D visualization tools that are interactive. Python hasn't been good for that. Current "free" solution is to go into Javascript but other issues arise because things just aren't configured properly for the space I work in.

The other big challenge is a really high bar for data quality prior to subsequent analysis. Most of my actual work here is spent filling in niches of computer vision where almost nothing in top conferences/journals applies. This creates another challenge, since I have to do things from scratch. Say I'm registering 3D objects, because that's a lot of what I care about. Textures look mostly okay but there's a problem from using a state-of-the-art method? Rejection, because prior work indicates that major findings in the past have been heavily biased by tiny issues...very little margin for error.

Then there's compute. To get meaningful attention, I have to assume the users will have mediocre compute resources, most certainly no GPUs. Also a big limitation. Can't pre-train much of anything either.

If I had a genie, I'd mostly wish for interactivity in Python to be better because that would take care of a lot of headaches...or for Julia or something to be more mature. Lots of interesting research to do that could be done in short order, bottlenecked by lack of existing tools for intuitive interface design + lack of time to build them, even if I ended up getting paid.