iknowjerome OP t1_ivweoyi wrote
Reply to comment by Mozillah0096 in [R] A relabelling of the COCO 2017 dataset by iknowjerome
iknowjerome OP t1_ivuua0z wrote
Reply to comment by Gaudy_ in [R] A relabelling of the COCO 2017 dataset by iknowjerome
>company
Looking forward to hearing what you think. Just to be clear, I'm always reluctant to call one dataset better than another, because it always depends on what you're trying to achieve with it. With Sama-Coco, we were trying to fix some of the misclassification errors where possible, but we also put a significant amount of effort into drawing precise polygons around the objects of interest, because of experiments we are currently running. And, of course, we wanted to capture as many instances of the COCO classes as possible. This resulted in a dataset with close to 25% more object instances than the original COCO 2017 dataset. But that's not to say we solved all the "errors" in COCO. :)
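For anyone who wants to verify the instance-count difference themselves, here's a minimal sketch in plain Python (no pycocotools needed). The inline dict is a made-up stand-in for a real COCO-format file such as `instances_val2017.json`; point `json.load` at the actual files to compare.

```python
import json
from collections import Counter

def instance_counts(coco):
    """Count annotated instances per category in a COCO-format dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])

# Tiny inline example standing in for a real annotation file.
coco = {
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "category_id": 1},
        {"id": 11, "category_id": 1},
        {"id": 12, "category_id": 2},
    ],
}
print(instance_counts(coco))  # Counter({'person': 2, 'dog': 1})
```

Running this on the original COCO 2017 annotations and on the relabelled set, then comparing totals, is one quick way to see where the extra instances come from.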
iknowjerome OP t1_ivu165k wrote
Reply to comment by Mozillah0096 in [R] A relabelling of the COCO 2017 dataset by iknowjerome
It really depends on what you are trying to achieve, what your budget is, and where you are in your model development cycle.
Nevertheless, I would recommend starting in self-service mode with the simplest tool you can find. This might be something like CVAT, though there are a number of other options (paid, free, SaaS, etc.) out there that a simple Google search will return. Once you're ready to scale, you might want to consider handing off your annotations to a specialized company like Sama. And yes, we also do 3D annotations. :)
(disclaimer: I work for Sama)
iknowjerome OP t1_ivtytgc wrote
Reply to comment by DigThatData in [R] A relabelling of the COCO 2017 dataset by iknowjerome
That's a great suggestion. We will eventually post more details about this. It will make more sense when shared alongside the results of some data quality experiments we are currently running. Stay tuned! :)
iknowjerome OP t1_ivtw0xs wrote
Reply to comment by that_username__taken in [R] A relabelling of the COCO 2017 dataset by iknowjerome
The trick is not to wait for the end of the cycle to make the appropriate adjustments. And there are now a number of solutions on the market that help with understanding and visualizing your image/video data and labels.
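As a sketch of what such a mid-cycle label check can look like: the snippet below flags boxes that fall outside their image or have non-positive size. Field names follow the COCO format; the inline data is made up for illustration.

```python
def bbox_issues(coco):
    """Return ids of annotations whose box exceeds its image bounds or has non-positive size."""
    dims = {im["id"]: (im["width"], im["height"]) for im in coco["images"]}
    bad = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        img_w, img_h = dims[ann["image_id"]]
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            bad.append(ann["id"])
    return bad

coco = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 100, "image_id": 1, "bbox": [10, 10, 50, 50]},      # fine
        {"id": 101, "image_id": 1, "bbox": [600, 400, 100, 100]},  # runs off the edge
    ],
}
print(bbox_issues(coco))  # [101]
```

Cheap checks like this, run every iteration rather than at the end, catch a surprising share of labeling slips before they reach training.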
iknowjerome OP t1_ivtti9x wrote
Reply to comment by that_username__taken in [R] A relabelling of the COCO 2017 dataset by iknowjerome
Every dataset has errors and inconsistencies. It is true that some have more than others, but what really matters is how those errors affect the end goal. Sometimes, the level of inconsistency doesn't impact model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, for a specific set of classes). I totally agree with you that companies that succeed in putting and maintaining AI models in production pay particular attention to the quality of the datasets that are created for training and testing purposes.
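One concrete way label quality bites a specific class: a slightly loose ground-truth box can pull the IoU with a correct detection below the matching threshold, so the detection counts as a miss. A quick sketch of IoU in the COCO `[x, y, w, h]` box convention (the example boxes are made up):

```python
def iou(box_a, box_b):
    """Intersection-over-union for two [x, y, w, h] boxes (COCO convention)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0
print(iou([0, 0, 10, 10], [2, 2, 10, 10]))  # ~0.47 -- a box shifted by 2px each way
```

A shift of just a couple of pixels on a small object can drop IoU below the common 0.5 threshold, which is why boundary precision matters more for some classes than others.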
iknowjerome OP t1_ivylty6 wrote
Reply to comment by adamzhangchao in [R] A relabelling of the COCO 2017 dataset by iknowjerome
We had our annotation associates look at every image and label and correct any mistakes they found. In some cases, they started from the existing annotations. In other cases, they decided to start from scratch.
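A rough sketch of how one might see where the relabelling changed things, per image (the inline dicts are made up; in practice you would load the COCO 2017 and Sama-Coco JSON files):

```python
from collections import Counter

def per_image_delta(original, relabeled):
    """Change in annotation count per image between two COCO-format dicts."""
    before = Counter(a["image_id"] for a in original["annotations"])
    after = Counter(a["image_id"] for a in relabeled["annotations"])
    return {img: after[img] - before[img] for img in before.keys() | after.keys()}

original = {"annotations": [{"image_id": 1}, {"image_id": 2}]}
relabeled = {"annotations": [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}]}
print(sorted(per_image_delta(original, relabeled).items()))  # [(1, 1), (2, 0)]
```

Sorting images by the size of the delta is a quick way to find the ones where annotators chose to start over from scratch.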