Description
Spinoff from #15836
A KNN query short-circuits the HNSW search if the "expected" number of nodes visited is >= number of filtered nodes.
A similarity-based vector query (i.e. [Byte|Float]VectorSimilarityQuery) attempts to find all vectors with a score above a threshold (for Euclidean similarity, this can be imagined as all vectors within a radius of the query vector).
Assuming document vectors are evenly spread out across the n-dimensional space, should vector similarity scores form a normal distribution?
If so, can we estimate the proportion of nodes visited using area under the curve (from resultSimilarity -> ∞) of a normal distribution? (and apply the same short circuit logic)