Are we undercounting RAM used by vectors buffered in an in-memory segment? #15901
Description
I'm looking at a fun segment trace of a nearly pure flat-vector (no HNSW graph) Lucene index (production index for Amazon product search) and see, weirdly, that the flushed segments show ~112% RAM efficiency.
RAM efficiency measures the size-on-disk of the newly flushed segment against its size-in-RAM before it was flushed. For postings-heavy indices it's usually poor, maybe 20-30%, because in-RAM postings storage has overhead: postings need to grow/realloc as new documents are indexed, appending integers to tons of tiny int[] virtual slices (one to three per unique term in the document being inverted). On disk those become well packed.
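To make the metric concrete, here is a minimal sketch (the function name and the byte counts are hypothetical, chosen only to illustrate the 20-30% range mentioned above):

```python
# Hypothetical illustration of the "RAM efficiency" metric described above:
# size-on-disk of a flushed segment divided by its in-RAM size before flush.
def ram_efficiency(disk_bytes: int, ram_bytes: int) -> float:
    """Return the disk/RAM size ratio as a percentage."""
    return 100.0 * disk_bytes / ram_bytes

# A postings-heavy segment: lots of slack in the tiny in-RAM int[] slices,
# while the on-disk postings are tightly packed.
print(ram_efficiency(disk_bytes=20_000_000, ram_bytes=80_000_000))  # 25.0
```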
But for flat vectors, I think we just buffer the vectors in RAM until flush, so the RAM accounting ought to be simple?
This is with 7-bit quantization (aside: why do we now have both signed 7-bit and unsigned 8-bit KnnVectorsFormat options? Is there really a difference? Oh, hmm, the javadocs say it's for backwards compatibility ... shouldn't the 7-bit one be marked deprecated, since new Lucene indices shouldn't choose it?), and @msokolov pointed out that we increase index size by storing the one-byte-per-dimension quantized vectors in addition to the float32 vectors. But then I would have expected 125% RAM efficiency, so I'm confused why I see only ~112%...
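The 125% expectation above follows from simple arithmetic, assuming only the float32 vectors (4 bytes per dimension) are buffered in RAM while disk stores both the float32 and the 1-byte-per-dimension quantized copies. The dimension below is a hypothetical stand-in; the ratio is dimension-independent:

```python
# Back-of-the-envelope check of the expected RAM efficiency for flat vectors
# with scalar quantization (all numbers assumed for illustration).
dim = 768                              # hypothetical vector dimension
float_bytes = 4 * dim                  # float32 vectors buffered in RAM
quantized_bytes = 1 * dim              # one extra byte per dimension on disk

ram = float_bytes                      # if only float32 vectors are buffered
disk = float_bytes + quantized_bytes   # disk stores both representations

print(100 * disk / ram)  # 125.0 -- expected efficiency; observed is ~112%
```

Note that if the quantized bytes were also buffered in RAM (5 bytes per dimension), the expected ratio would instead be 100%, so the observed ~112% sits between the two scenarios.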
