Just_CurioussSss t1_izi8rhr wrote on December 9, 2022 at 8:13 AM

In your article, you mentioned that "The search is mainly based on a v0 semantic algorithm (using TfIdf model mainly).... So the usage was pretty slow and the models were heavy (not the best user experience)."

Quick question: have you heard of tensor search? It uses two key models, CLIP and SBERT, and every component of the tensor can be associated with a specific part of a document, image, or video. Not only can this improve search semantics, it can also provide other key information like localization and explainability, without using text as an intermediate representation.

You can look them up: https://github.com/marqo-ai/marqo
Website: https://www.marqo.ai

Just_CurioussSss t1_izi8sim wrote on December 9, 2022 at 8:13 AM

Also, TF-IDF is lexical search (aka keyword-based search). It's faster, but it tends to give less accurate and less relevant results than tensor-based search. Marqo, on the other hand, uses tensor-based search (where you can get the vectors from SBERT, for example), which allows semantic search by matching on the meaning of the text rather than on its keywords. Users can therefore search with questions, related terms, or directly with images, audio, or video (or any combination thereof), which makes for a better user experience and more relevant results.
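
To make the keyword limitation concrete, here is a minimal pure-Python sketch of TF-IDF ranking. It is not the code from the article and not Marqo's API; the function names, the smoothed IDF formula, and the three toy documents are all assumptions chosen for illustration. It shows that a query for "car" scores zero against a document that only says "automobile", which is the gap an embedding-based search is meant to close.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def build_tfidf(docs):
    # Returns the IDF table and one sparse TF-IDF vector (dict) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term present in every document keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, docs):
    # Scores every document against the query; unknown query terms are ignored.
    idf, vectors = build_tfidf(docs)
    q_tf = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_tf.items()}
    return [cosine(q_vec, v) for v in vectors]


# Toy corpus, made up for this example.
docs = [
    "cheap car rental in paris",
    "affordable automobile hire in paris",
    "recipe for apple pie",
]
scores = search("car", docs)
```

Only the first document shares a token with the query, so it is the only one with a non-zero score; the second document is about the same thing but is invisible to a purely lexical match.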

Cyalas OP t1_izihswz wrote on December 9, 2022 at 10:23 AM

Thanks for your comments :)

I've used tensor-based search before, with a Faiss index and fine-tuned BERT models (it's still in the code). As I mentioned in my article, that slowed the process down a bit: each time a field is chosen, the BERT model is loaded, which took about 4 extra seconds. That's why I switched to TF-IDF. But I plan to optimize the tensor-search part more (I'll check out Marqo!), hopefully with the help of the open source community :)
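
One common way to avoid paying the load time on every field selection is to cache the loaded model per field, so only the first selection is slow. The sketch below is an assumption about how that could look, not the project's actual code: `load_model_from_disk` is a hypothetical stand-in for the real BERT loader (it just sleeps briefly), and the field names are made up.

```python
import time
from functools import lru_cache

# Records every real load, so the effect of the cache is visible.
LOAD_CALLS = []


def load_model_from_disk(field):
    # Hypothetical stand-in for the expensive step (loading a fine-tuned
    # BERT model for one field). Here it only sleeps and returns a dict.
    LOAD_CALLS.append(field)
    time.sleep(0.01)
    return {"field": field}


@lru_cache(maxsize=None)
def get_model(field):
    # The first call per field hits the slow loader; later calls for the
    # same field return the cached object immediately.
    return load_model_from_disk(field)


# Simulate a user picking fields repeatedly.
for field in ["physics", "biology", "physics", "physics"]:
    model = get_model(field)
```

With four selections over two distinct fields, the loader runs only twice. The trade-off is memory: `maxsize=None` keeps every model loaded, so with many fields a bounded `maxsize` may be the safer choice.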