Submitted by silverstone1903 t3_10iucs0 in MachineLearning

Hi all,

I have e-commerce product data that contains product descriptions and product types. I'm using embeddings with ANN (Annoy) to find similar products. However, I don't know how to evaluate the vector search results. There are metrics such as hit rate and recall, but as I said above, I'm not sure how to apply them. Most of the examples I come across have labels (interaction data, explicit scores, etc.), so they can calculate those metrics. Any ideas or recommendations would be appreciated!

10

Comments


vwings t1_j5gowwn wrote

For such retrieval systems, you would usually use Top-1, Top-5, or Top-k accuracy. Concretely, you have a list of product type embeddings in your database (say 100 or 100,000, whatever). Then you take your product description, embed it, and compare it with all the product type embeddings. Then you check at which rank the correct product type ends up. From that you can calculate mean rank or top-k accuracy.
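
As a rough sketch of what that could look like with plain NumPy (the shapes and variable names here are placeholders, not OP's actual pipeline):

```python
import numpy as np

def rank_of_correct(type_embs: np.ndarray, desc_emb: np.ndarray, correct_idx: int) -> int:
    """1-based rank of the correct product type for one description embedding,
    scored by cosine similarity against every product type embedding."""
    sims = type_embs @ desc_emb / (
        np.linalg.norm(type_embs, axis=1) * np.linalg.norm(desc_emb) + 1e-12
    )
    order = np.argsort(-sims)  # indices sorted from most to least similar
    return int(np.where(order == correct_idx)[0][0]) + 1

# Toy data just to show the shapes: 100 product types, 32-dim embeddings,
# and 50 (description embedding, correct type index) evaluation pairs.
rng = np.random.default_rng(0)
type_embs = rng.normal(size=(100, 32))
eval_pairs = [(rng.normal(size=32), int(rng.integers(100))) for _ in range(50)]

ranks = np.array([rank_of_correct(type_embs, emb, idx) for emb, idx in eval_pairs])
mean_rank = ranks.mean()             # mean rank of the correct type
top5_accuracy = (ranks <= 5).mean()  # Top-5 accuracy
```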

8

Original_Rip_8182 t1_j5i0ol5 wrote

For top-k product search you could also follow this approach: index all product embeddings with Faiss. To get the top matches for a given product, take its embedding and query it against the built Faiss index; you'll get the top-k matches back. This is much faster than a brute-force comparison between each pair.

Faiss: https://github.com/facebookresearch/faiss
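
A minimal sketch of that flow (the dimensions and data are placeholders; `IndexFlatIP` is an exact index, and for large catalogs you'd switch to an approximate one such as `IndexIVFFlat`):

```python
import numpy as np
import faiss

d = 128                                         # embedding dimension (placeholder)
product_embs = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(product_embs)                # normalize so inner product == cosine similarity

index = faiss.IndexFlatIP(d)                    # exact inner-product index
index.add(product_embs)

# Query with a few product embeddings; since they come from the indexed set,
# each product's first hit will be itself.
query = product_embs[:5]
scores, neighbor_ids = index.search(query, 10)  # top-10 matches per query
```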

6

silverstone1903 OP t1_j5ivdmx wrote

Thank you for your answer. How is that different from using Annoy? I'm experimenting with Annoy, Faiss, and HNSW. Speed isn't my concern right now, because I can't measure the quality of the retrievals anyway 🤷🏻‍♂️

2

Kacper-Lukawski t1_j5itq5h wrote

You need some ground truth labels to evaluate the quality of semantic search. It might be a relevancy score or just binary information that a particular item is relevant. But you don't need to label all your data points.

There is a great article describing the metrics (https://neptune.ai/blog/recommender-systems-metrics); I use it as a reference quite often. And if you are interested in a more step-by-step introduction, here is an article I wrote: https://qdrant.tech/articles/qa-with-cohere-and-qdrant/. It's an end-to-end solution, but some basic quality measurement is also included.
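
To illustrate how far a small hand-labeled sample gets you, here is a rough sketch of hit rate and recall@k (the data structures are made up for the example: per query, a ranked list of retrieved product IDs plus a set of IDs you judged relevant):

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant item in the top-k results."""
    hits = sum(
        1 for topk, rel in zip(retrieved, relevant)
        if any(item in rel for item in topk[:k])
    )
    return hits / len(retrieved)

def recall_at_k(retrieved, relevant, k):
    """Average fraction of each query's relevant items that appear in its top-k results."""
    return sum(
        len(set(topk[:k]) & rel) / max(len(rel), 1)
        for topk, rel in zip(retrieved, relevant)
    ) / len(retrieved)

# retrieved[i]: ranked product IDs returned for query i; relevant[i]: IDs labeled relevant.
retrieved = [["p1", "p7", "p3"], ["p2", "p9", "p4"]]
relevant  = [{"p3"},            {"p5"}]
print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.5
print(recall_at_k(retrieved, relevant, k=3))    # 0.5
```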

3