
Mozillah0096 t1_ivtqxxc wrote

Thanks for the effort. I read in a Medium article that the COCO dataset has a lot of errors in it. Is that true?

17

that_username__taken t1_ivts6r0 wrote

Yup, there are a lot of inconsistencies there. Engineers never learn about the importance of annotation/data quality; in the best case, they skim over the topic. In reality, successful companies spend most of their budget on data annotation.

22

iknowjerome OP t1_ivtti9x wrote

Every dataset has errors and inconsistencies. It is true that some have more than others, but what really matters is how that affects the end goal. Sometimes, the level of inconsistency doesn't impact model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, for a specific set of classes). I totally agree with you that companies that succeed in putting and maintaining AI models in production pay particular attention to the quality of the datasets that are created for training and testing purposes.

12

that_username__taken t1_ivttxzf wrote

Yeah, I agree, but finding those errors at the end of the cycle is extremely painful and time-consuming.

2

iknowjerome OP t1_ivtw0xs wrote

The trick is not to wait for the end of the cycle to make the appropriate adjustments. And there are now a number of solutions on the market that help with understanding and visualizing your image/video data and labels.

5

Mozillah0096 t1_ivtxgd3 wrote

u/iknowjerome, can you tell me which solutions you are talking about?

1

jonas__m t1_ix5ey4i wrote

cleanlab is an open-source Python library that checks data and label quality.
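
To give a rough idea of what a label-quality check can look like, here is a minimal pure-Python sketch. It is not cleanlab's actual API or algorithm, just a simplified thresholding heuristic: an example is flagged when a model confidently predicts a class other than its given label. The function name `flag_label_issues` and the toy probabilities are made up for illustration, and `pred_probs` is assumed to come from a model evaluated on held-out data.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of integer class ids (the possibly noisy annotations).
    pred_probs: list of per-class probability lists from some model
                (assumed to be out-of-sample predictions).
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average confidence in class k over the
    # examples that are labeled k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    flagged = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        predicted = max(range(n_classes), key=lambda k: probs[k])
        # Flag when the model prefers another class and is at least as
        # confident as it typically is for that class.
        if predicted != label and probs[predicted] >= thresholds[predicted]:
            flagged.append(i)
    return flagged


# Hypothetical toy data: example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = flag_label_issues(labels, pred_probs)
```

Flagged indices are only candidates for a human to review; a confident model can still be wrong.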

2

Mozillah0096 t1_ivtt7mq wrote

Do you have any recommendation for the best data annotation tool?
Something that supports features like selecting boxes of specific types or reporting details such as which boxes overlap.
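
For reference, the two features asked about here (filtering boxes by type and finding overlapping boxes) can be sketched in a few lines of plain Python, independent of any annotation tool. This assumes COCO-style annotation dicts with `id`, `image_id`, `category_id`, and a `bbox` in `[x, y, width, height]` format; the helper names and the 0.5 IoU threshold are illustrative choices, not taken from any specific tool.

```python
from itertools import combinations


def iou_xywh(a, b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def boxes_of_category(annotations, category_id):
    """Keep only the annotations of one category (box 'type')."""
    return [ann for ann in annotations if ann["category_id"] == category_id]


def overlapping_pairs(annotations, iou_threshold=0.5):
    """Return (id_a, id_b, iou) for boxes in the same image that overlap."""
    by_image = {}
    for ann in annotations:
        by_image.setdefault(ann["image_id"], []).append(ann)

    pairs = []
    for anns in by_image.values():
        for a, b in combinations(anns, 2):
            score = iou_xywh(a["bbox"], b["bbox"])
            if score >= iou_threshold:
                pairs.append((a["id"], b["id"], score))
    return pairs


# Hypothetical annotations for illustration only.
annotations = [
    {"id": 1, "image_id": 10, "category_id": 1, "bbox": [0, 0, 10, 10]},
    {"id": 2, "image_id": 10, "category_id": 1, "bbox": [0, 0, 10, 5]},
    {"id": 3, "image_id": 10, "category_id": 2, "bbox": [50, 50, 10, 10]},
    {"id": 4, "image_id": 11, "category_id": 1, "bbox": [0, 0, 10, 10]},
]
category_one = boxes_of_category(annotations, category_id=1)
overlaps = overlapping_pairs(annotations, iou_threshold=0.5)
```

The pairwise comparison is quadratic per image, which is usually fine for a few dozen boxes per image but would need a spatial index for very dense scenes.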

1

iknowjerome OP t1_ivu165k wrote

It really depends on what you are trying to achieve, what your budget is, and where you are in your model development cycle.
Nevertheless, I would recommend starting in self-service mode with the simplest tool you can find. This might be something like CVAT, though there are a number of other options (paid, free, SaaS, etc.) out there that a simple Google search will return. Once you're ready to scale, you might want to consider handing off your annotations to a specialized company like Sama. And yes, we also do 3D annotations. :)
(disclaimer: I work for Sama)

3

that_username__taken t1_ivttp7n wrote

Really depends on the size and the budget of the project. If both are large enough, you should really outsource this task. SuperAnnotate has a great platform (free for academics; I used it), and according to G2 they are the highest rated. A friend of mine told me that if you want automotive data, Scale is more specialized there.

2