Super-Martingale OP t1_isgacv9 wrote

In the past, I did fuzzy matching plus a manual selection for smaller lists like a few thousand strings. But for millions of rows, this is just impossible. So we are wondering whether AI-based approaches can help.
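For reference, the fuzzy matching plus manual selection workflow could look roughly like the sketch below. It uses Python's standard-library `difflib`; the function name `suggest_matches`, the 0.8 cutoff, and the sample strings are illustrative assumptions, not the setup actually used. Each input is compared against every canonical string, which is why this is workable for a few thousand strings but not for millions of rows.

```python
from difflib import get_close_matches


def suggest_matches(names, canonical, n=3, cutoff=0.8):
    """Map each raw name to up to n candidate canonical strings.

    Hypothetical helper: the candidates are meant for manual review,
    not automatic acceptance.
    """
    # Compare case-insensitively, but return the original spelling.
    lowered = {c.lower(): c for c in canonical}
    keys = list(lowered)
    suggestions = {}
    for name in names:
        # get_close_matches scores the name against every key, so the
        # total cost grows with len(names) * len(canonical).
        hits = get_close_matches(name.lower(), keys, n=n, cutoff=cutoff)
        suggestions[name] = [lowered[h] for h in hits]
    return suggestions


if __name__ == "__main__":
    raw = ["Gooogle Inc", "Zzz"]
    known = ["Google Inc", "Microsoft Corp"]
    for name, candidates in suggest_matches(raw, known).items():
        print(name, "->", candidates)
```

Names that come back with an empty candidate list, or with several close candidates, are the ones that would go to manual selection.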

2

hjmb t1_isgaxow wrote

I would be wary: AI approaches tend to give you plausible answers, not true answers. Also, it may be worth updating your post to make it clear that you're looking for AI solutions to your problem, rather than for data cleaning advice for a dataset that you are going to feed into a machine learning system (which is what I inferred).

1

Super-Martingale OP t1_isgey5g wrote

There is definitely a tradeoff between accuracy and efficiency. We are not sure which approach would be better, so we want to keep the discussion broad.

1