BreakingCiphers t1_j32mj4h wrote

For every image in your database, you could extract features from the penultimate layer of a CNN and index them.

Then to search over images, simply calculate the distance between the query image features and the database features.

This can be expensive, both computationally and memory-wise, if you have a lot of images. Some solutions could be to cluster your database embeddings, use sparse matrices, use approximate KNN, or add some explore-exploit heuristics (e.g., take the images with the lowest distance relative to the first 37% of images in the database; this cuts search time by up to 63%, but might not be great). There is possibly more out there in the SoTA, but I am not up to date there.
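
A minimal sketch of that pipeline, assuming a pretrained ResNet-50 as the feature extractor and brute-force L2 search over the index (the model choice, image paths, and shapes are just for illustration):

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ResNet-50 with the classification head removed,
# so the output is the penultimate-layer (pooled) feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> np.ndarray:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0).numpy()  # shape: (2048,)

# Index: one embedding per database image (paths are placeholders).
db_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
db = np.stack([embed(p) for p in db_paths])

# Search: L2 distance between the query embedding and every database embedding.
query = embed("query.jpg")
dists = np.linalg.norm(db - query, axis=1)
ranked = np.argsort(dists)  # closest first
print([db_paths[i] for i in ranked])
```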

19

Sepf1ns t1_j32rcxp wrote

> Some solutions could be to cluster your database embeddings, use sparse matrices, use approximate KNN, add some explore-exploit heuristics

Pretty sure Faiss can help with that
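
A minimal sketch of the usual Faiss flow, with placeholder data and dimensions (the flat index below is exact; `IndexIVFFlat` or `IndexHNSWFlat` are the approximate options for larger databases):

```python
import faiss
import numpy as np

d = 2048                                            # embedding dimension, e.g. a ResNet penultimate layer
db = np.random.rand(100_000, d).astype("float32")   # placeholder database embeddings

index = faiss.IndexFlatL2(d)   # exact L2 search; swap in an IVF/HNSW index to go approximate
index.add(db)

query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, 10)            # 10 nearest database images
print(ids[0], distances[0])
```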

Edit:

I'd recommend this Course to anyone who wants to try it out.

6

BreakingCiphers t1_j32sw4k wrote

Looks cool, thanks for pointing it out

2

DanTycoon t1_j37aaxd wrote

I've used Faiss before to retrieve similar images based on CLIP embeddings (so I could do text-to-image searches). It works okay, but it doesn't order the results very well; it had 'favorite' images it preferred returning over everything else. So, for my use case, I found Faiss worked best as a good first-pass tool rather than a complete solution.

If you take this approach, I would recommend asking Faiss to retrieve a few more images than you need, then calculating cosine similarity yourself on the images Faiss returns to get the 'best' matched images.
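
A rough sketch of that over-fetch-and-re-rank step (the function name and the 10x fetch multiplier are made up for illustration; `db_embeddings` is the same array the Faiss index was built from):

```python
import numpy as np

def cosine_rerank(index, db_embeddings, query, k=10, overfetch=10):
    """Ask Faiss for more candidates than needed, then re-rank them by cosine similarity."""
    q = query.reshape(1, -1).astype("float32")
    _, ids = index.search(q, k * overfetch)          # e.g. 100 candidates for k=10
    candidates = db_embeddings[ids[0]]

    # Cosine similarity between the query and each candidate
    sims = candidates @ q[0] / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(q[0]) + 1e-12
    )
    order = np.argsort(-sims)                        # highest similarity first
    return ids[0][order][:k], sims[order][:k]
```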

Edit: Also this was the tutorial I followed to get Faiss working. I found it pretty easy to follow and adapt to CLIP.

1

PHEEEEELLLLLEEEEP t1_j3mirpm wrote

>If you do this approach, I would recommend asking Faiss to retrieve a few more images than you need, then calculating cosine similarity yourself on the images Faiss retrieves to get the 'best' matched images.

Why not just index by cosine distance in the first place?

1

DanTycoon t1_j3mlz40 wrote

Well, if you're storing 1 million images in the database, it's going to take a long time to compute the cosine distance for all 1 million images. FAISS will give you, very roughly, the 1000 nearest, and you can do the cosine distance from there. My use case was that anybody could enter any text phrase and search my dataset, so I can't precompute the cosine distance for every query somebody might make.

1

PHEEEEELLLLLEEEEP t1_j3mtoyy wrote

What I mean is that Faiss can compute kNN for a variety of metrics, including cosine distance, so you can just index directly by cosine distance instead of L2.
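
For reference, the usual way to get cosine out of Faiss is to L2-normalize the embeddings and use an inner-product index, since inner product on unit vectors equals cosine similarity (placeholder data again):

```python
import faiss
import numpy as np

d = 2048
db = np.random.rand(100_000, d).astype("float32")   # placeholder embeddings

faiss.normalize_L2(db)            # in-place L2 normalization
index = faiss.IndexFlatIP(d)      # inner product on unit vectors == cosine similarity
index.add(db)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
sims, ids = index.search(query, 10)   # results already ranked by cosine similarity
```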

1

DanTycoon t1_j3mx169 wrote

Ah, I see. I didn’t know. I guess you could do it that way.

1

[deleted] t1_j33ai5r wrote

[deleted]

1

Sepf1ns t1_j33qv9r wrote

Yeah, I think so. However, I don't know if it can scale as well as Faiss.

1

BrotherAmazing t1_j34t0cm wrote

Depends on whether they want to match nearly exact images or images that just look similar to a human. If it's the latter, then the distances in these later layers need not be close for visually similar images; adversarial images are a popular example of this.

2