DanTycoon

DanTycoon t1_j3mlz40 wrote

Well, if you're storing 1 million images in the database, it's going to take a long time to do the cosine distance for all 1 million images. FAISS will give you very roughly the 1000 nearest and you can do the cosine distance from there. My usage was anybody could enter any text phrase and search my dataset. I can't precompute the cosine distance for every query somebody might make.

1

DanTycoon t1_j37aaxd wrote

I've used Faiss before to retrieve similar images based on CLIP embeddings (so I could do text-to-image searches). It works okay, but it doesn't order the results very well. It had 'favorite' images it preferred returning over everything else. So, for my use case, I found Faiss worked best as a good first-pass tool as opposed to a complete solution here.

If you do this approach, I would recommend asking Faiss to retrieve a few more images than you need, then calculating cosine similarity yourself on the images Faiss retrieves to get the 'best' matched images.

Edit: Also this was the tutorial I followed to get Faiss working. I found it pretty easy to follow and adapt to CLIP.

1