jake_1001001 t1_iso7foq wrote on October 17, 2022 at 1:32 PM

Do you have a labeled dataset for training? (Bent or straight)

Why use segmentation? Please clarify the task definition; it is currently quite vague. Plain object detection should be adequate for cropping your object, since most DL frameworks take rectangular inputs, but even that may be unnecessary depending on your dataset and input. If you are worried about the background, you shouldn't be: that information may help the model judge relative shape, or it will simply be treated as noise if your training set is large enough and matches your expected input distribution.

For embeddings, you could use a pretrained contrastive supervised image encoder like ViT.

Clustering can be done by training a linear classifier with a CE loss on images labeled bent or straight, via finetuning, linear probing, or domain adaptation (adapters or retraining the norm layers). The loss will find the class centroids for you and provide a nice probability output. Of course, you could instead fit k-means on the embeddings if you'd like.
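To make the linear probing idea concrete, here is a minimal sketch, assuming you already have fixed embeddings from a frozen encoder. The 2-D toy vectors, the function names, and the learning rate are all made up for illustration; with two classes, CE loss on a linear layer reduces to plain logistic regression, which is what this fits.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that embedding x belongs to class 1 ("bent" in the toy data)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit a linear head with binary cross-entropy via full-batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = predict_proba(w, b, x) - y  # dCE/dlogit for a sigmoid output
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy stand-in for encoder outputs: two Gaussian blobs (hypothetical data).
rng = random.Random(0)
straight = [[rng.gauss(-1, 0.3), rng.gauss(-1, 0.3)] for _ in range(20)]
bent = [[rng.gauss(1, 0.3), rng.gauss(1, 0.3)] for _ in range(20)]
w, b = train_linear_probe(straight + bent, [0] * 20 + [1] * 20)
print(round(predict_proba(w, b, [1.0, 1.0]), 3))
```

In practice you would swap the toy blobs for real encoder outputs and probably use your DL framework's own linear layer and CE loss; the encoder stays frozen either way.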

vocdex OP t1_isofmvp wrote on October 17, 2022 at 2:35 PM

Thanks for the suggestions.

I don't have a labeled dataset but I can create one, for sure. The object here is asparagus in a greenhouse farm.

Here's the situation: I am using segmentation because, in the future, I want to combine the segmentation masks with depth maps to create point clouds. I have tried doing this with only bounding box detections, but because the boxes contain background and foreground pixels (with different depth values), I am getting quite bad point clouds. I then applied a simple depth-value-based filter to crop out only the object, without any background or foreground. This works but doesn't generalize well to all situations.

I thought that instance segmentation would give me only the object pixels and I can fuse this with depth values in order to get point clouds.
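Roughly what I have in mind for the fusion step, as a sketch only: it assumes a pinhole camera with known intrinsics, a depth image already aligned to the mask, and depth in metric units. The function name and the parameters `fx`, `fy`, `cx`, `cy` are placeholders for whatever the camera calibration provides.

```python
def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project only the masked pixels of a depth image into 3D points.

    depth and mask are row-major 2D lists of the same shape; pixels with
    a zero or negative depth are treated as invalid and skipped.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if keep and z > 0:
                x = (u - cx) * z / fx  # pinhole model: horizontal offset scaled by depth
                y = (v - cy) * z / fy  # pinhole model: vertical offset scaled by depth
                points.append((x, y, z))
    return points


# Tiny 2x2 example with made-up values.
depth = [[0.0, 2.0],
         [1.0, 3.0]]
mask = [[1, 1],
        [0, 1]]
print(masked_depth_to_points(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

The mask does the job my depth-threshold filter was doing, but per pixel, so background pixels inside the bounding box never reach the point cloud.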

Moreover, there could be different clusters other than bent vs straight. So, I want the clustering algorithm to find those clusters in an unsupervised fashion. If this doesn't work, then yes, I guess I'll have to create a dataset and train another bent vs straight classification model.
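For the unsupervised route, this is the kind of thing I mean, as a minimal k-means sketch on embedding vectors. The toy points and the farthest-point initialization are my own simplifications for illustration, and the number of clusters `k` still has to be chosen by hand.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    def assign(p):
        return min(range(k), key=lambda c: sq_dist(p, centroids[c]))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p)].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, [assign(p) for p in points]


# Toy embeddings: two well-separated groups (hypothetical data).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Whether the clusters it finds correspond to bent vs straight (or anything meaningful) is exactly what I would have to check by looking at samples from each cluster.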

Thanks for reading till here!

jake_1001001 t1_isoot8f wrote on October 17, 2022 at 3:43 PM

Aha, ok, using segmentation to extract the object point cloud seems good, and I have used a similar approach for face reconstruction.

Have you tried 3D approaches (rigid and non-rigid alignment)? How similar are the objects? You could use the dense alignment error to determine whether the object matches a straight one.

But if we go back to image-based methods: if your segmentation model is good, its encoder may already provide good embeddings. You could take those embeddings and compute their distance to the embeddings of templates (straight, bent, etc.). K-means may not cluster as you expect if there is high variance in the samples (shape, size, color, etc.), which is why supervised methods could be preferred. Templates provide a prototype for each class to compute distance/similarity to (Euclidean, cosine similarity). It is crude, but could work in constrained settings.
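A rough sketch of the template idea, assuming you have one embedding per template and one per query crop. The vectors, the template names, and the function names here are invented for illustration; real embeddings would come from your encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_template(embedding, templates):
    """Return the best-matching template name and all similarity scores."""
    scores = {name: cosine_similarity(embedding, t) for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical template embeddings, one prototype per class.
templates = {"straight": [1.0, 0.0], "bent": [0.0, 1.0]}
label, scores = nearest_template([0.9, 0.1], templates)
print(label)
```

Averaging several embeddings per class into each template would probably make the prototypes less sensitive to a single odd example, and you can threshold the best score to flag samples that match nothing well.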

vocdex OP t1_isrjl7v wrote on October 18, 2022 at 4:16 AM

Ah, I haven't considered 3D approaches, but I will definitely check them out. The objects are quite similar (green color, just the shape is different). Thank you for your help.