Panorama Optimizations #5041
Conversation
    }

    float lower_bound = exact_distances[idx] - cauchy_schwarz_bound;
    if constexpr (C::is_max) {
Unfortunately C::cmp() kills autovectorization here so we resort to this workaround.
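To illustrate the workaround, here is a minimal sketch (not the actual FAISS code; the comparator and function names are hypothetical): resolving the comparison direction at compile time with `if constexpr` leaves a plain float comparison in the hot loop, which compilers can autovectorize, whereas an opaque call through `C::cmp()` blocks that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical comparator mimicking the is_max trait discussed above.
template <bool IsMax>
struct Cmp {
    static constexpr bool is_max = IsMax;
};

// Count candidates whose lower bound already prunes them against `threshold`.
// The if constexpr branch is resolved at compile time, so each instantiation
// contains only a branch-free comparison + add in the loop body.
template <typename C>
size_t count_pruned(const std::vector<float>& lower_bounds, float threshold) {
    size_t pruned = 0;
    for (size_t i = 0; i < lower_bounds.size(); ++i) {
        if constexpr (C::is_max) {
            pruned += lower_bounds[i] > threshold;
        } else {
            pruned += lower_bounds[i] < threshold;
        }
    }
    return pruned;
}
```

A call like `C::cmp(threshold, lower_bounds[i])` in the same position would be semantically equivalent but typically defeats the vectorizer.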
Force-pushed from e9b483f to 0a4914d.
    write_ivf_header(ivfp, f);
    WRITE1(ivfp->n_levels);
    WRITE1(ivfp->batch_size);
    if (ivfp->batch_size == Panorama::kDefaultBatchSize) {
For backward compatibility.
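A hedged sketch of the compatibility idea (not the actual FAISS read path; all names here are illustrative): indexes serialized before this change predate the `batch_size` field, so a reader that detects a legacy file falls back to the historical default instead of reading the field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Historical default used by indexes written before batch_size was serialized.
constexpr size_t kDefaultBatchSize = 128;

// `has_batch_size_field` would in practice be derived from the file's
// fourcc/version; `stored_value` is the field read from a new-format file.
size_t resolve_batch_size(bool has_batch_size_field, uint32_t stored_value) {
    if (!has_batch_size_field) {
        return kDefaultBatchSize; // legacy file: field absent, use old default
    }
    return static_cast<size_t>(stored_value);
}
```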
     * accelerating the refinement stage.
     */
    struct Panorama {
        static constexpr size_t kDefaultBatchSize = 128;
I'm considering defining `kLegacyDefaultBatchSize = 128` and `kDefaultBatchSize = 1024` to update the default while keeping a fallback for old indexes that were created with 128. Is such a change in default behavior allowed? (`IVF128,FlatPanorama8` would then silently use a 1024 `batch_size` instead of 128.)
    }

    template <typename Lambda>
    inline auto with_bool(bool value, Lambda&& fn) {
I'm curious if there's a more appropriate location to define this.
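For context, a helper like this typically lifts a runtime bool into a compile-time constant so the callee can branch with `if constexpr`. A minimal sketch under that assumption (the actual body in the PR may differ):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Dispatch a runtime bool to a lambda as a std::bool_constant, so the lambda
// can use it in if constexpr or as a non-type template argument. Both
// instantiations must yield the same return type.
template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
    if (value) {
        return std::forward<Lambda>(fn)(std::true_type{});
    }
    return std::forward<Lambda>(fn)(std::false_type{});
}

// Example use: pick a code path at compile time from a runtime flag.
inline int scan(bool use_selector) {
    return with_bool(use_selector, [](auto flag) {
        if constexpr (decltype(flag)::value) {
            return 1; // selector-aware path
        } else {
            return 0; // fast path
        }
    });
}
```

This pattern avoids duplicating call sites for every bool template parameter while keeping the hot loops fully specialized.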
Hello @mnorris11, could we please get a review for this one? :) It's blocking the work of other PRs we are working on. Thanks!
Sorry for the delay. There were several recent changes to index_read; would you be able to resolve conflicts? I will review in the meantime.
Thanks, @mnorris11, conflicts should be resolved now :)
@aknayar It seems like 1 hopefully quick error:
@mnorris11 Should be fixed now :) |
@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D101753364. |
@aknayar this is merging. There are a few small changes I had to make to get it to build and pass lints internally too, so you might want to double check the actual committed code in an hour or two:
@mnorris11 merged this pull request in 0622841. |
Thanks @mnorris11. We will make a similar PR for HNSW in the coming days that likewise reduces the overhead to near-negligible levels.
@mnorris11 Thanks, just tested the merged version locally and it seems the optimizations are still intact. |
Note: Should be merged before #4970 (IVFPQPanorama).
Changes
Performance
This PR implements various optimizations to Panorama (L2Flat and IVFFlat).
- `if constexpr (C::is_max)` instead of `C::cmp` for autovectorized pruning.
- `_pext_u64`.
- (`active_indices` indirection) to let it autovectorize.
- `IndexFlat`/`IVFFlatScannerPanorama`.
- `batch_size` as a parameter for IVFFlatPanorama (for consistency with `IndexFlatPanorama`, but also because a 1024 `batch_size` can improve performance).

Other
- `kDefaultBatchSize` defined once in `Panorama.h` (previously defined in 5 separate locations).
- `bench_flat_l2_panorama.py` and `bench_ivf_flat_panorama.py` updated to accept `gist1M` or `sift1M` as the dataset to bench on.

Results
Together, these optimizations enable substantial additional speedups, especially on lower-dimensional datasets like SIFT (128d), by dramatically reducing Panorama's overhead:
GIST1M (IVF128, nlist=128, nlevels=16)
SIFT1M (IVF128, nlist=128, nlevels=8)
Raw Data
Collected by running the new benches on `main` and this branch. On `main`, you cannot specify `batch_size`, so remove the `{1024}` from the factory string in the new benches to run them there. The results above are calculated from the raw data as follows: for each `nprobe`, speedup = (original ms per query) / (pano ms per query).

Before (`main`)

GIST1M:
SIFT1M:
After (`optimize-pano`)

GIST1M:
SIFT1M: