Panorama Optimizations #5041

Closed
aknayar wants to merge 27 commits into facebookresearch:main from aknayar:optimize-pano

Conversation

@aknayar
Contributor

@aknayar aknayar commented Apr 3, 2026

Note: Should be merged before #4970 (IVFPQPanorama).

Changes

Performance

This PR implements various optimizations to Panorama (L2Flat and IVFFlat).

  1. Disaggregate distance computation from pruning decisions to avoid branches in distance computation hotpath.
  2. Early batch processing termination when no points are remaining.
  3. Manually unrolled distance kernel.
  4. Template distance computation on level width for autovectorization.
  5. if constexpr (C::is_max) instead of C::cmp for autovectorized pruning.
  6. Byteset for vectorized compacting of active indices using _pext_u64.
  7. Template distance computation and pruning on first level (no active_indices indirection) to let it autovectorize.
  8. Hoist buffer allocations into IndexFlat/IVFFlatScannerPanorama.
  9. Expose batch_size as a parameter for IVFFlatPanorama (for consistency with IndexFlatPanorama but also because 1024 batch_size can improve performance).
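The interplay of items 1, 2, and 6 can be sketched as follows. This is an illustrative Python sketch of the prune-then-compact pattern, not faiss code; all names here (`scan_batch`, `level_terms`, `remaining_bound`) are hypothetical, and the real implementation is branchless C++ that compacts survivors with a byteset and `_pext_u64`:

```python
def scan_batch(level_terms, remaining_bound, threshold):
    """Refine per-point distances level by level, compacting survivors.

    level_terms[i][l]: the exact-distance contribution of level l for point i.
    remaining_bound[i][l]: upper bound on the distance mass in levels > l
    (in Panorama, a Cauchy-Schwarz bound). Hypothetical names for illustration.
    """
    n_points = len(level_terms)
    n_levels = len(level_terms[0])
    partial = [0.0] * n_points
    active = list(range(n_points))
    for level in range(n_levels):
        # (1) Pure distance refinement: no pruning branches inside this loop,
        # so the equivalent C++ loop remains autovectorizable.
        for i in active:
            partial[i] += level_terms[i][level]
        # (6) Separate compaction pass: keep only points whose lower bound
        # (partial distance minus the remaining bound) still beats the
        # pruning threshold of a min-search.
        active = [i for i in active
                  if partial[i] - remaining_bound[i][level] < threshold]
        # (2) Early batch termination once no points survive.
        if not active:
            break
    return active, partial
```

Separating steps (1) and (6) is what lets the compiler vectorize the hot accumulation loop; the branchy survivor selection runs over an already-compacted index list.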

Other

  • Define kDefaultBatchSize once in Panorama.h (previously defined in 5 separate locations).
  • Allow bench_flat_l2_panorama.py and bench_ivf_flat_panorama.py to accept gist1M or sift1M as dataset to bench on.

Results

Together, these optimizations yield substantial additional speedups, especially on lower-dimensional datasets like SIFT (128d), by sharply reducing Panorama's overhead:

GIST1M (IVF128, nlist=128, nlevels=16)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
|-------:|----------:|------------:|------------:|-------------------:|
| 1 | 0.1439 | 3.92x | 3.93x | 1.00x |
| 2 | 0.2605 | 4.71x | 5.19x | 1.10x |
| 4 | 0.4369 | 5.53x | 6.75x | 1.22x |
| 8 | 0.6470 | 6.37x | 8.21x | 1.29x |
| 16 | 0.8780 | 7.30x | 9.74x | 1.33x |
| 32 | 0.9764 | 8.33x | 11.29x | 1.36x |
| 64 | 0.9868 | 9.38x | 12.74x | 1.36x |

SIFT1M (IVF128, nlist=128, nlevels=8)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
|-------:|----------:|------------:|------------:|-------------------:|
| 1 | 0.2678 | 1.20x | 1.81x | 1.52x |
| 2 | 0.4584 | 1.38x | 2.23x | 1.62x |
| 4 | 0.6855 | 1.59x | 2.70x | 1.70x |
| 8 | 0.8760 | 1.83x | 3.44x | 1.88x |
| 16 | 0.9679 | 2.11x | 4.72x | 2.24x |
| 32 | 0.9855 | 2.44x | 5.61x | 2.30x |
| 64 | 0.9861 | 2.74x | 6.39x | 2.33x |

Raw Data

Collected by running the new benches on main and on this branch. On main you cannot specify batch_size, so remove the {1024} from the factory string in the new benches to run them there. The results above are derived from the raw data below as follows:

  1. For each experiment (e.g., GIST (old) or SIFT (new)), calculate the Panorama speedup at each nprobe as (original ms per query) / (pano ms per query).
  2. For each pairing of (old) and (new) results, calculate the additional speedup as (new speedup) / (old speedup).
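As a sanity check, the two steps above can be reproduced for the first GIST1M row from the raw ms/query figures quoted below (rounding to two decimals is an assumption about how the table was produced):

```python
# Recompute the nprobe=1 GIST1M row from the raw data in this PR description.
def speedup(base_ms, pano_ms):
    """Step 1: Panorama speedup = (original ms/query) / (pano ms/query)."""
    return round(base_ms / pano_ms, 2)

# main: IVF128,Flat vs PCA960,IVF128,FlatPanorama16
old = speedup(2.705442, 0.689507)
# optimize-pano: IVF128,Flat vs PCA960,IVF128,FlatPanorama16_1024
new = speedup(2.625779, 0.668800)
# Step 2: additional speedup = (new speedup) / (old speedup).
additional = round(new / old, 2)

print(old, new, additional)  # matches the 3.92x / 3.93x / 1.00x table row
```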

Before (main)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.705442 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.456891 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.895120 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.676788 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 43.142261 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 84.498397 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 160.092644 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16
	nprobe   1, Recall@10: 0.143900, speed: 0.689507 ms/query, dims scanned: 12.96%
	nprobe   2, Recall@10: 0.260500, speed: 1.158416 ms/query, dims scanned: 11.18%
	nprobe   4, Recall@10: 0.436900, speed: 1.968814 ms/query, dims scanned: 9.90%
	nprobe   8, Recall@10: 0.647000, speed: 3.401469 ms/query, dims scanned: 8.91%
	nprobe  16, Recall@10: 0.878000, speed: 5.912757 ms/query, dims scanned: 8.10%
	nprobe  32, Recall@10: 0.976400, speed: 10.147847 ms/query, dims scanned: 7.44%
	nprobe  64, Recall@10: 0.986800, speed: 17.074573 ms/query, dims scanned: 6.93%

SIFT1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.267480, speed: 0.285990 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.457520, speed: 0.564067 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.685320, speed: 1.111833 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.877210, speed: 2.195088 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.967730, speed: 4.338444 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.985400, speed: 8.500538 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986100, speed: 16.349893 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8
	nprobe   1, Recall@10: 0.267670, speed: 0.239243 ms/query, dims scanned: 27.97%
	nprobe   2, Recall@10: 0.458320, speed: 0.408590 ms/query, dims scanned: 24.42%
	nprobe   4, Recall@10: 0.685480, speed: 0.699694 ms/query, dims scanned: 21.50%
	nprobe   8, Recall@10: 0.875930, speed: 1.197310 ms/query, dims scanned: 19.06%
	nprobe  16, Recall@10: 0.967760, speed: 2.055968 ms/query, dims scanned: 16.98%
	nprobe  32, Recall@10: 0.985370, speed: 3.481555 ms/query, dims scanned: 15.26%
	nprobe  64, Recall@10: 0.985980, speed: 5.977346 ms/query, dims scanned: 14.02%

After (optimize-pano)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.625779 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.285007 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.555867 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.012494 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 41.794143 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 81.865038 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 155.067333 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16_1024
	nprobe   1, Recall@10: 0.143900, speed: 0.668800 ms/query, dims scanned: 20.33%
	nprobe   2, Recall@10: 0.260500, speed: 1.018440 ms/query, dims scanned: 14.81%
	nprobe   4, Recall@10: 0.436900, speed: 1.563622 ms/query, dims scanned: 11.72%
	nprobe   8, Recall@10: 0.647000, speed: 2.557981 ms/query, dims scanned: 9.82%
	nprobe  16, Recall@10: 0.878000, speed: 4.292616 ms/query, dims scanned: 8.56%
	nprobe  32, Recall@10: 0.976400, speed: 7.248832 ms/query, dims scanned: 7.68%
	nprobe  64, Recall@10: 0.986800, speed: 12.171319 ms/query, dims scanned: 7.06%

SIFT1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.267480, speed: 0.295904 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.457520, speed: 0.583204 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.685320, speed: 1.150055 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.877210, speed: 2.425575 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.967730, speed: 5.509365 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.985400, speed: 10.794491 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986100, speed: 20.727924 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8_1024
	nprobe   1, Recall@10: 0.267750, speed: 0.163266 ms/query, dims scanned: 34.97%
	nprobe   2, Recall@10: 0.458370, speed: 0.261109 ms/query, dims scanned: 27.97%
	nprobe   4, Recall@10: 0.685540, speed: 0.425977 ms/query, dims scanned: 23.30%
	nprobe   8, Recall@10: 0.875990, speed: 0.704580 ms/query, dims scanned: 19.98%
	nprobe  16, Recall@10: 0.967860, speed: 1.167465 ms/query, dims scanned: 17.45%
	nprobe  32, Recall@10: 0.985470, speed: 1.925296 ms/query, dims scanned: 15.50%
	nprobe  64, Recall@10: 0.986080, speed: 3.245793 ms/query, dims scanned: 14.14%

@meta-cla meta-cla Bot added the CLA Signed label Apr 3, 2026
@aknayar aknayar marked this pull request as draft April 3, 2026 22:43
Comment thread faiss/impl/Panorama.h
}

float lower_bound = exact_distances[idx] - cauchy_schwarz_bound;
if constexpr (C::is_max) {
Contributor Author

@aknayar aknayar Apr 4, 2026

Unfortunately C::cmp() kills autovectorization here so we resort to this workaround.

@aknayar aknayar force-pushed the optimize-pano branch 2 times, most recently from e9b483f to 0a4914d, on April 4, 2026 18:52
write_ivf_header(ivfp, f);
WRITE1(ivfp->n_levels);
WRITE1(ivfp->batch_size);
if (ivfp->batch_size == Panorama::kDefaultBatchSize) {
Contributor Author

For backward compatibility.

@aknayar aknayar marked this pull request as ready for review April 4, 2026 19:34
Comment thread faiss/impl/Panorama.h
* accelerating the refinement stage.
*/
struct Panorama {
static constexpr size_t kDefaultBatchSize = 128;
Contributor Author

I'm considering defining kLegacyDefaultBatchSize = 128 and kDefaultBatchSize = 1024 to update the default and have a fallback for the old indexes which were created with 128. Is such a change in default behavior allowed (IVF128,FlatPanorama8 would then silently use 1024 batch_size instead of 128)?

Comment thread faiss/impl/Panorama.h
}

template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
Contributor Author

I'm curious if there's a more appropriate location to define this.

Comment thread faiss/CMakeLists.txt
@AlSchlo
Contributor

AlSchlo commented Apr 20, 2026

Hello @mnorris11 , could we please get a review for this one :)

It's blocking the work of other PRs we are working on. Thanks!

@mnorris11
Contributor

mnorris11 commented Apr 20, 2026

Sorry for the delay. There were several recent changes to index_read; would you be able to resolve conflicts? I will review in the meantime.
Edit: it seems I cannot import the PR into our internal codebase for tests until conflicts are resolved.

@aknayar
Contributor Author

aknayar commented Apr 21, 2026

Thanks, @mnorris11, conflicts should be resolved now :)

@mnorris11
Contributor

mnorris11 commented Apr 21, 2026

@aknayar It seems like one hopefully quick error:

/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp: In function 'std::unique_ptr<faiss::InvertedLists> faiss::read_InvertedLists_up(IOReader*, int)':
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:528:62: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  528 |                     ((sizes[i] + ArrayInvertedListsPanorama::kBatchSize - 1) /
      |                                                              ^~~~~~~~~~
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:529:50: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  529 |                      ArrayInvertedListsPanorama::kBatchSize) *
      |                                                  ^~~~~~~~~~
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:530:49: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  530 |                     ArrayInvertedListsPanorama::kBatchSize;
      |                                                 ^~~~~~~~~~

@aknayar
Contributor Author

aknayar commented Apr 21, 2026

@mnorris11 Should be fixed now :)

@meta-codesync
Contributor

meta-codesync Bot commented Apr 21, 2026

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D101753364.

@mnorris11
Contributor

@aknayar this is merging. There are a few small changes I had to make to get it to build and pass lints internally, so you might want to double-check the actual committed code in an hour or two:

  • there is an additional check for COMPILE_SIMD_AVX2
  • static inline size_t compact_active_kernel( -> inline size_t compact_active_kernel(

@meta-codesync
Contributor

meta-codesync Bot commented Apr 22, 2026

@mnorris11 merged this pull request in 0622841.

@AlSchlo
Contributor

AlSchlo commented Apr 22, 2026

Thanks @mnorris11. We will make a similar PR for HNSW in the coming days that similarly reduces the overhead to near-negligible levels.

@aknayar
Contributor Author

aknayar commented Apr 22, 2026

@mnorris11 Thanks, just tested the merged version locally and it seems the optimizations are still intact.
