Panorama Optimizations #5041

Closed
aknayar wants to merge 27 commits into facebookresearch:main from aknayar:optimize-pano

Conversation

@aknayar
Contributor

@aknayar aknayar commented Apr 3, 2026

Note: Should be merged before #4970 (IVFPQPanorama).

Changes

Performance

This PR implements various optimizations to Panorama (L2Flat and IVFFlat).

  1. Disaggregate distance computation from pruning decisions to avoid branches in distance computation hotpath.
  2. Early batch processing termination when no points are remaining.
  3. Manually unrolled distance kernel.
  4. Template distance computation on level width for autovectorization.
  5. if constexpr (C::is_max) instead of C::cmp for autovectorized pruning.
  6. Byteset for vectorized compacting of active indices using _pext_u64.
  7. Template distance computation and pruning on first level (no active_indices indirection) to let it autovectorize.
  8. Hoist buffer allocations into IndexFlat/IVFFlatScannerPanorama.
  9. Expose batch_size as a parameter for IVFFlatPanorama (for consistency with IndexFlatPanorama but also because 1024 batch_size can improve performance).
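The interplay of items 1, 2, and 6 can be sketched as follows. This is an illustrative Python sketch of the prune-then-compact pattern, not faiss code; all names here (`scan_batch`, `level_terms`, `remaining_bound`) are hypothetical, and the real implementation is branchless C++ that compacts survivors with a byteset and `_pext_u64`:

```python
def scan_batch(level_terms, remaining_bound, threshold):
    """Refine per-point distances level by level, compacting survivors.

    level_terms[i][l]: the exact-distance contribution of level l for point i.
    remaining_bound[i][l]: upper bound on the distance mass in levels > l
    (in Panorama, a Cauchy-Schwarz bound). Hypothetical names for illustration.
    """
    n_points = len(level_terms)
    n_levels = len(level_terms[0])
    partial = [0.0] * n_points
    active = list(range(n_points))
    for level in range(n_levels):
        # (1) Pure distance refinement: no pruning branches inside this loop,
        # so the equivalent C++ loop remains autovectorizable.
        for i in active:
            partial[i] += level_terms[i][level]
        # (6) Separate compaction pass: keep only points whose lower bound
        # (partial distance minus the remaining bound) still beats the
        # pruning threshold of a min-search.
        active = [i for i in active
                  if partial[i] - remaining_bound[i][level] < threshold]
        # (2) Early batch termination once no points survive.
        if not active:
            break
    return active, partial
```

Separating steps (1) and (6) is what lets the compiler vectorize the hot accumulation loop; the branchy survivor selection runs over an already-compacted index list.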

Other

  • Define kDefaultBatchSize once in Panorama.h (previously defined in 5 separate locations).
  • Allow bench_flat_l2_panorama.py and bench_ivf_flat_panorama.py to accept gist1M or sift1M as dataset to bench on.

Results

Together, these optimizations yield substantial additional speedups, especially on lower-dimensional datasets like SIFT (128d), by sharply reducing Panorama's overhead:

GIST1M (IVF128, nlist=128, nlevels=16)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
|-------:|----------:|------------:|------------:|-------------------:|
| 1 | 0.1439 | 3.92x | 3.93x | 1.00x |
| 2 | 0.2605 | 4.71x | 5.19x | 1.10x |
| 4 | 0.4369 | 5.53x | 6.75x | 1.22x |
| 8 | 0.6470 | 6.37x | 8.21x | 1.29x |
| 16 | 0.8780 | 7.30x | 9.74x | 1.33x |
| 32 | 0.9764 | 8.33x | 11.29x | 1.36x |
| 64 | 0.9868 | 9.38x | 12.74x | 1.36x |

SIFT1M (IVF128, nlist=128, nlevels=8)

| nprobe | Recall@10 | Old Speedup | New Speedup | Additional Speedup |
|-------:|----------:|------------:|------------:|-------------------:|
| 1 | 0.2678 | 1.20x | 1.81x | 1.52x |
| 2 | 0.4584 | 1.38x | 2.23x | 1.62x |
| 4 | 0.6855 | 1.59x | 2.70x | 1.70x |
| 8 | 0.8760 | 1.83x | 3.44x | 1.88x |
| 16 | 0.9679 | 2.11x | 4.72x | 2.24x |
| 32 | 0.9855 | 2.44x | 5.61x | 2.30x |
| 64 | 0.9861 | 2.74x | 6.39x | 2.33x |

Raw Data

Collected by running the new benches on main and on this branch. On main you cannot specify batch_size, so remove the {1024} from the factory string in the new benches to run them there. The results above are derived from the raw data below as follows:

  1. For each experiment (e.g., GIST (old) or SIFT (new)), calculate the Panorama speedup at each nprobe as (original ms per query) / (pano ms per query).
  2. For each pairing of (old) and (new) results, calculate the additional speedup as (new speedup) / (old speedup).
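As a sanity check, the two steps above can be reproduced for the first GIST1M row from the raw ms/query figures quoted below (rounding to two decimals is an assumption about how the table was produced):

```python
# Recompute the nprobe=1 GIST1M row from the raw data in this PR description.
def speedup(base_ms, pano_ms):
    """Step 1: Panorama speedup = (original ms/query) / (pano ms/query)."""
    return round(base_ms / pano_ms, 2)

# main: IVF128,Flat vs PCA960,IVF128,FlatPanorama16
old = speedup(2.705442, 0.689507)
# optimize-pano: IVF128,Flat vs PCA960,IVF128,FlatPanorama16_1024
new = speedup(2.625779, 0.668800)
# Step 2: additional speedup = (new speedup) / (old speedup).
additional = round(new / old, 2)

print(old, new, additional)  # matches the 3.92x / 3.93x / 1.00x table row
```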

Before (main)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.705442 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.456891 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.895120 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.676788 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 43.142261 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 84.498397 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 160.092644 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16
	nprobe   1, Recall@10: 0.143900, speed: 0.689507 ms/query, dims scanned: 12.96%
	nprobe   2, Recall@10: 0.260500, speed: 1.158416 ms/query, dims scanned: 11.18%
	nprobe   4, Recall@10: 0.436900, speed: 1.968814 ms/query, dims scanned: 9.90%
	nprobe   8, Recall@10: 0.647000, speed: 3.401469 ms/query, dims scanned: 8.91%
	nprobe  16, Recall@10: 0.878000, speed: 5.912757 ms/query, dims scanned: 8.10%
	nprobe  32, Recall@10: 0.976400, speed: 10.147847 ms/query, dims scanned: 7.44%
	nprobe  64, Recall@10: 0.986800, speed: 17.074573 ms/query, dims scanned: 6.93%

SIFT1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.267480, speed: 0.285990 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.457520, speed: 0.564067 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.685320, speed: 1.111833 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.877210, speed: 2.195088 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.967730, speed: 4.338444 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.985400, speed: 8.500538 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986100, speed: 16.349893 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8
	nprobe   1, Recall@10: 0.267670, speed: 0.239243 ms/query, dims scanned: 27.97%
	nprobe   2, Recall@10: 0.458320, speed: 0.408590 ms/query, dims scanned: 24.42%
	nprobe   4, Recall@10: 0.685480, speed: 0.699694 ms/query, dims scanned: 21.50%
	nprobe   8, Recall@10: 0.875930, speed: 1.197310 ms/query, dims scanned: 19.06%
	nprobe  16, Recall@10: 0.967760, speed: 2.055968 ms/query, dims scanned: 16.98%
	nprobe  32, Recall@10: 0.985370, speed: 3.481555 ms/query, dims scanned: 15.26%
	nprobe  64, Recall@10: 0.985980, speed: 5.977346 ms/query, dims scanned: 14.02%

After (optimize-pano)

GIST1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.145200, speed: 2.625779 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.260800, speed: 5.285007 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.441900, speed: 10.555867 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.648200, speed: 21.012494 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.878000, speed: 41.794143 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.975400, speed: 81.865038 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986800, speed: 155.067333 ms/query, dims scanned: 100.00%
======PCA960,IVF128,FlatPanorama16_1024
	nprobe   1, Recall@10: 0.143900, speed: 0.668800 ms/query, dims scanned: 20.33%
	nprobe   2, Recall@10: 0.260500, speed: 1.018440 ms/query, dims scanned: 14.81%
	nprobe   4, Recall@10: 0.436900, speed: 1.563622 ms/query, dims scanned: 11.72%
	nprobe   8, Recall@10: 0.647000, speed: 2.557981 ms/query, dims scanned: 9.82%
	nprobe  16, Recall@10: 0.878000, speed: 4.292616 ms/query, dims scanned: 8.56%
	nprobe  32, Recall@10: 0.976400, speed: 7.248832 ms/query, dims scanned: 7.68%
	nprobe  64, Recall@10: 0.986800, speed: 12.171319 ms/query, dims scanned: 7.06%

SIFT1M:

======IVF128,Flat
	nprobe   1, Recall@10: 0.267480, speed: 0.295904 ms/query, dims scanned: 100.00%
	nprobe   2, Recall@10: 0.457520, speed: 0.583204 ms/query, dims scanned: 100.00%
	nprobe   4, Recall@10: 0.685320, speed: 1.150055 ms/query, dims scanned: 100.00%
	nprobe   8, Recall@10: 0.877210, speed: 2.425575 ms/query, dims scanned: 100.00%
	nprobe  16, Recall@10: 0.967730, speed: 5.509365 ms/query, dims scanned: 100.00%
	nprobe  32, Recall@10: 0.985400, speed: 10.794491 ms/query, dims scanned: 100.00%
	nprobe  64, Recall@10: 0.986100, speed: 20.727924 ms/query, dims scanned: 100.00%
======PCA128,IVF128,FlatPanorama8_1024
	nprobe   1, Recall@10: 0.267750, speed: 0.163266 ms/query, dims scanned: 34.97%
	nprobe   2, Recall@10: 0.458370, speed: 0.261109 ms/query, dims scanned: 27.97%
	nprobe   4, Recall@10: 0.685540, speed: 0.425977 ms/query, dims scanned: 23.30%
	nprobe   8, Recall@10: 0.875990, speed: 0.704580 ms/query, dims scanned: 19.98%
	nprobe  16, Recall@10: 0.967860, speed: 1.167465 ms/query, dims scanned: 17.45%
	nprobe  32, Recall@10: 0.985470, speed: 1.925296 ms/query, dims scanned: 15.50%
	nprobe  64, Recall@10: 0.986080, speed: 3.245793 ms/query, dims scanned: 14.14%

@meta-cla meta-cla Bot added the CLA Signed label Apr 3, 2026
@aknayar aknayar marked this pull request as draft April 3, 2026 22:43
Comment thread faiss/impl/Panorama.h
}

float lower_bound = exact_distances[idx] - cauchy_schwarz_bound;
if constexpr (C::is_max) {
Contributor Author

@aknayar aknayar Apr 4, 2026

Unfortunately C::cmp() kills autovectorization here so we resort to this workaround.

@aknayar aknayar force-pushed the optimize-pano branch 2 times, most recently from e9b483f to 0a4914d, on April 4, 2026 18:52
write_ivf_header(ivfp, f);
WRITE1(ivfp->n_levels);
WRITE1(ivfp->batch_size);
if (ivfp->batch_size == Panorama::kDefaultBatchSize) {
Contributor Author

For backward compatibility.

@aknayar aknayar marked this pull request as ready for review April 4, 2026 19:34
Comment thread faiss/impl/Panorama.h
* accelerating the refinement stage.
*/
struct Panorama {
static constexpr size_t kDefaultBatchSize = 128;
Contributor Author

I'm considering defining kLegacyDefaultBatchSize = 128 and kDefaultBatchSize = 1024 to update the default and have a fallback for the old indexes which were created with 128. Is such a change in default behavior allowed (IVF128,FlatPanorama8 would then silently use 1024 batch_size instead of 128)?

Comment thread faiss/impl/Panorama.h
}

template <typename Lambda>
inline auto with_bool(bool value, Lambda&& fn) {
Contributor Author

I'm curious if there's a more appropriate location to define this.

Comment thread faiss/CMakeLists.txt
@AlSchlo
Contributor

AlSchlo commented Apr 20, 2026

Hello @mnorris11 , could we please get a review for this one :)

It's blocking the work of other PRs we are working on. Thanks!

@mnorris11
Contributor

mnorris11 commented Apr 20, 2026

Sorry for the delay. There were several recent changes to index_read; would you be able to resolve conflicts? I will review in the meantime.
Edit: it seems I cannot import the PR into our internal codebase for tests until conflicts are resolved.

@aknayar
Contributor Author

aknayar commented Apr 21, 2026

Thanks, @mnorris11, conflicts should be resolved now :)

@mnorris11
Contributor

mnorris11 commented Apr 21, 2026

@aknayar It seems like one hopefully quick error:

/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp: In function 'std::unique_ptr<faiss::InvertedLists> faiss::read_InvertedLists_up(IOReader*, int)':
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:528:62: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  528 |                     ((sizes[i] + ArrayInvertedListsPanorama::kBatchSize - 1) /
      |                                                              ^~~~~~~~~~
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:529:50: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  529 |                      ArrayInvertedListsPanorama::kBatchSize) *
      |                                                  ^~~~~~~~~~
/home/runner/work/faiss/faiss/faiss/impl/index_read.cpp:530:49: error: 'kBatchSize' is not a member of 'faiss::ArrayInvertedListsPanorama'
  530 |                     ArrayInvertedListsPanorama::kBatchSize;
      |                                                 ^~~~~~~~~~

@aknayar
Contributor Author

aknayar commented Apr 21, 2026

@mnorris11 Should be fixed now :)

@meta-codesync
Contributor

meta-codesync Bot commented Apr 21, 2026

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D101753364.

@mnorris11
Contributor

@aknayar this is merging. There are a few small changes I had to make to get it to build and pass lints internally, so you might want to double-check the actual committed code in an hour or two:

  • there is an additional check for COMPILE_SIMD_AVX2
  • static inline size_t compact_active_kernel( -> inline size_t compact_active_kernel(

@meta-codesync
Contributor

meta-codesync Bot commented Apr 22, 2026

@mnorris11 merged this pull request in 0622841.

@AlSchlo
Contributor

AlSchlo commented Apr 22, 2026

Thanks @mnorris11. We will make a similar PR for HNSW in the coming days that similarly reduces the overhead to near-negligible levels.

@aknayar
Contributor Author

aknayar commented Apr 22, 2026

@mnorris11 Thanks, just tested the merged version locally and it seems the optimizations are still intact.
