mostlyhydrogen

mostlyhydrogen OP t1_j7km5j2 wrote

Thanks for the offer! This is a work project, though. I'm working with images. I can't give too many details due to confidentiality, but we're at sub-billion-image scale.

Usability is determined by trained annotators. If they find an object of interest and want to harvest more training data, they do a reverse image search across the whole training data and tag true matches.

mostlyhydrogen OP t1_j7fxwyx wrote

>ScaNN interface features

Nope. Notice that the results have shape (10000, 20) instead of (20,). That is just a batched query, i.e., "for each of these 10k input vectors, find me 20 neighbors." What I need is a joint query, i.e., "given these 10k positive examples, give me 20 additional candidate samples."
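To make the shape difference concrete, here is a minimal brute-force sketch in NumPy. It does not use ScaNN at all; `batched_query` and `joint_query` are hypothetical helper names, the data is random, and scoring each candidate by its mean similarity to the positive set is only one assumed definition of a "joint" query.

```python
import numpy as np

def batched_query(index_vectors, queries, k):
    """For each query vector, return the indices of its k most similar rows."""
    sims = queries @ index_vectors.T             # (n_queries, n_index)
    return np.argsort(-sims, axis=1)[:, :k]      # (n_queries, k)

def joint_query(index_vectors, positives, k, exclude=None):
    """Return k indices ranked against the whole positive set at once.

    Assumption: a candidate's score is its mean similarity to all positives.
    """
    sims = positives @ index_vectors.T           # (n_positives, n_index)
    scores = sims.mean(axis=0)                   # (n_index,)
    if exclude is not None:
        # Do not return the positives we already have.
        scores[np.asarray(exclude, dtype=int)] = -np.inf
    return np.argsort(-scores)[:k]               # (k,)

# Random unit vectors stand in for image embeddings.
rng = np.random.default_rng(0)
index_vectors = rng.normal(size=(1000, 32))
index_vectors /= np.linalg.norm(index_vectors, axis=1, keepdims=True)

positive_ids = np.arange(100)
positives = index_vectors[positive_ids]

batched = batched_query(index_vectors, positives, k=20)
joint = joint_query(index_vectors, positives, k=20, exclude=positive_ids)
print(batched.shape, joint.shape)
```

With inner-product scoring, the mean similarity to the positives equals the similarity to their centroid, so this particular definition collapses to a single query with the mean vector. That may be too coarse if the positives are spread across several clusters.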

mostlyhydrogen OP t1_j7238p8 wrote

As you probably know, ANN search often returns irrelevant results. How might I iteratively refine the search with human feedback, i.e., by marking samples as "relevant" or "irrelevant" and repeating the search?

I've done a lit search and haven't found anything, maybe because I am using the wrong keywords.
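To make the loop concrete, here is a rough sketch of the kind of thing I mean. The update rule is a Rocchio-style query shift (move the query toward the mean of the relevant samples and away from the mean of the irrelevant ones), which is an assumption for illustration and not something established in this thread. The helper names, the weights `alpha`, `beta`, `gamma`, the synthetic clusters, and the label lookup standing in for the human annotator are all hypothetical, and the search is brute force rather than a real ANN index.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(index_vectors, query, k, seen):
    """Brute-force top-k by inner product, skipping already reviewed items."""
    scores = index_vectors @ query
    scores[np.array(sorted(seen), dtype=int)] = -np.inf
    return np.argsort(-scores)[:k]

def refine(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update; the weights are illustrative, not tuned."""
    new_query = alpha * query
    if len(relevant):
        new_query = new_query + beta * relevant.mean(axis=0)
    if len(irrelevant):
        new_query = new_query - gamma * irrelevant.mean(axis=0)
    return normalize(new_query)

# Synthetic embeddings: 5 clusters; cluster 0 plays the "object of interest".
rng = np.random.default_rng(0)
centers = normalize(rng.normal(size=(5, 32)))
labels = rng.integers(0, 5, size=2000)
index_vectors = normalize(centers[labels] + 0.1 * rng.normal(size=(2000, 32)))

target = 0
query = index_vectors[np.flatnonzero(labels == target)[0]]
seen = set()

for round_number in range(3):
    hits = search(index_vectors, query, k=20, seen=seen)
    is_relevant = labels[hits] == target   # stand-in for the annotator's tags
    query = refine(query,
                   index_vectors[hits[is_relevant]],
                   index_vectors[hits[~is_relevant]])
    seen.update(hits.tolist())
    print(round_number, int(is_relevant.sum()), "of", len(hits), "tagged relevant")
```

Each round only shows the annotator items they have not reviewed yet, and the refined query is an ordinary vector, so in principle it could be sent to whatever ANN index is already in place.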
