jake_1001001

jake_1001001 t1_isoot8f wrote

Aha, OK. Using segmentation to extract the object point cloud seems good; I have used a similar approach for face reconstruction.

Have you tried 3D approaches (rigid and non-rigid alignment)? How similar are the objects? You could use the dense alignment error to determine whether the object is the same as a straight one.
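To make the alignment-error idea concrete, here is a minimal sketch. It is deliberately simplified: it assumes 2D points with known one-to-one correspondences, so the best rotation has a closed form. Real 3D point clouds would need correspondence search (ICP or similar) and a 3D solver. The function name, the toy points, and any threshold you would put on the error are all hypothetical.

```python
import math

def rigid_alignment_rmse(source, target):
    """RMS residual after the best rotation + translation of source onto target.

    Assumption: 2D points, same length, listed in corresponding order.
    """
    n = len(source)
    # Center both sets; the optimal translation maps centroid to centroid.
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    src = [(x - sx, y - sy) for x, y in source]
    tgt = [(x - tx, y - ty) for x, y in target]
    # Closed-form least-squares rotation angle in 2D.
    dot = sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(src, tgt))
    cross = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(src, tgt))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Residual between the rotated source and the target.
    sq = 0.0
    for (ax, ay), (bx, by) in zip(src, tgt):
        rx, ry = c * ax - s * ay, s * ax + c * ay
        sq += (rx - bx) ** 2 + (ry - by) ** 2
    return math.sqrt(sq / n)

# Hypothetical toy data: a straight template, a moved copy of it, and a bent object.
template = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
angle = math.radians(30)
moved_straight = [
    (math.cos(angle) * x - math.sin(angle) * y + 5.0,
     math.sin(angle) * x + math.cos(angle) * y - 2.0)
    for x, y in template
]
bent = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]

err_straight = rigid_alignment_rmse(template, moved_straight)
err_bent = rigid_alignment_rmse(template, bent)
```

The straight copy aligns with near-zero error regardless of pose, while the bent object leaves a clearly larger residual. That residual is what you would threshold.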

But if we go back to image-based methods: if your segmentation model is good, its encoder may already provide good embeddings. You could take those embeddings and compute their distance to the embeddings of templates (straight, bent, etc.). K-means may not cluster as you expect if there is high variance in the samples (shape, size, color, etc.), which is why supervised methods could be preferred. Templates provide a prototype for each class to compute distance or similarity to (Euclidean distance, cosine similarity). It is crude, but could work in constrained settings.
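A rough sketch of the template comparison, assuming you already have one embedding vector per template and per query image (for example, pooled encoder features). The vectors and names below are made up for illustration, not taken from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_template(embedding, templates):
    """Return (label, similarity) for the most similar template embedding."""
    scored = ((label, cosine_similarity(embedding, vec))
              for label, vec in templates.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical toy embeddings; real ones would come from the encoder.
templates = {
    "straight": [1.0, 0.1, 0.0],
    "bent": [0.2, 1.0, 0.3],
}
query = [0.9, 0.2, 0.1]

label, score = nearest_template(query, templates)
print(label, round(score, 3))
```

With several templates per class you could average them into one prototype, or keep them all and take the best match.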

2

jake_1001001 t1_iso7foq wrote

Do you have a labeled dataset for training? (Bent or straight)

Why use segmentation? Please clarify the task definition; it is currently quite vague. Plain object detection should be adequate for cropping your object, since most DL frameworks take rectangular inputs, but even this may be unnecessary depending on your dataset and input. If you are worried about the background, you shouldn't be: such information may be important for the model to determine relative shape, or it will just be treated as noise if your training set is large enough and matches your expected input distribution.

For embeddings, you could use a pretrained contrastive or supervised image encoder such as ViT.

Clustering can be done by training a linear classifier with a cross-entropy (CE) loss on images labeled bent or straight, via fine-tuning, linear probing, or domain adaptation (adapters or retraining the norm layers). The loss will find the class centroids for you and provide a nice probability output. Of course, you could fit K-means on the embeddings instead if you'd like.
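As a sketch of the linear-probing option: the encoder stays frozen and only a linear layer is trained on its embeddings. The version below uses binary cross-entropy (the two-class case of CE) with plain gradient descent and no external libraries. The 2D "embeddings", the learning rate, and the epoch count are hypothetical placeholders.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(embeddings, labels, lr=0.5, epochs=500):
    """Fit weights and bias of a linear classifier with binary CE loss."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of binary CE with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient descent step on the mean loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Probability that embedding x belongs to class 1 (bent)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical frozen-encoder embeddings: label 0 = straight, 1 = bent.
train_x = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
           [0.2, 1.0], [0.1, 0.9], [0.3, 1.1]]
train_y = [0, 0, 0, 1, 1, 1]

w, b = train_linear_probe(train_x, train_y)
p_bent = predict_proba([0.15, 0.95], w, b)
p_straight = predict_proba([1.0, 0.05], w, b)
```

In practice you would do this with a `Linear` layer and a CE loss in your DL framework; the point is only that the probe is a few parameters on top of fixed embeddings.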

1