Add database-parallel flat search for few-query workloads by ivan-digital · Pull Request #5000 · facebookresearch/faiss

ivan-digital · 2026-03-27T21:07:11Z

Summary

When the number of queries (nx) is smaller than the available thread count, the current flat search underutilizes CPU cores. The search is parallelized over queries, so at most nx threads do any work, and each of them scans the entire database for its query. This PR adds a database-parallel code path that instead divides the database into disjoint slices across threads, with per-thread single-threaded BLAS calls and per-thread heaps that are merged at the end.

Activates automatically when nx < omp_get_max_threads() and the database is large enough (ny >= max(10000, nthreads * blas_database_bs))
Falls back to the existing path when an IDSelector is active or the database is too small to benefit
Works for both METRIC_INNER_PRODUCT (CMin heaps, which keep the k largest scores) and METRIC_L2 (CMax heaps, which keep the k smallest distances)
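The idea can be illustrated with a minimal pure-Python sketch. It is not the faiss C++ implementation: `ThreadPoolExecutor` stands in for OpenMP, plain loops stand in for the per-thread BLAS calls, the function names are hypothetical, and contiguous, near-equal slices per worker are assumed. Each worker keeps its own bounded heap per query over its slice, and a final pass merges the per-worker candidates into the global top-k. Inner product keeps the largest scores (the role CMin plays) and L2 keeps the smallest squared distances (the role CMax plays).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def _score(q, y, metric):
    """Inner product (higher is better) or squared L2 distance (lower is better)."""
    if metric == "ip":
        return sum(a * b for a, b in zip(q, y))
    return sum((a - b) ** 2 for a, b in zip(q, y))


def _search_slice(xq, xb, start, end, k, metric):
    """One worker: scan database rows [start, end) for every query.

    Each worker allocates its own heaps, so no locking is needed.
    Heap entries are (key, -id) where a larger tuple is a better result for
    both metrics, so the heap root is always the worst candidate kept so far.
    """
    heaps = []
    for q in xq:
        heap = []
        for i in range(start, end):
            s = _score(q, xb[i], metric)
            entry = (s if metric == "ip" else -s, -i)  # ties favor smaller ids
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
        heaps.append(heap)
    return heaps


def db_parallel_search(xq, xb, k, metric="ip", nthreads=4):
    """Split the database (not the queries) across workers, then merge."""
    ny = len(xb)
    bounds = [(t * ny // nthreads, (t + 1) * ny // nthreads) for t in range(nthreads)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(
            lambda b: _search_slice(xq, xb, b[0], b[1], k, metric), bounds))
    D, I = [], []
    for qi in range(len(xq)):
        # The global top-k is contained in the union of the per-worker top-k sets.
        best = heapq.nlargest(k, (e for heaps in partials for e in heaps[qi]))
        D.append([key if metric == "ip" else -key for key, _ in best])
        I.append([-neg_id for _, neg_id in best])
    return D, I
```

Because every worker keeps only its own top-k per query, the merge touches at most nthreads × k candidates per query regardless of ny. Unlike faiss, this sketch does not pad the output when ny < k.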

Benchmarks

12 threads, d=128, k=10:

Index	ny	nx	Before	After	Speedup
FlatIP	1,000,000	4	56.7 ms	8.4 ms	6.7x
FlatIP	1,000,000	1	14.1 ms	5.0 ms	2.8x
FlatL2	1,000,000	4	56.7 ms	14.0 ms	4.0x
FlatL2	1,000,000	1	14.1 ms	9.5 ms	1.5x
FlatIP	200,000	4	11.5 ms	1.6 ms	7.0x
FlatL2	200,000	4	11.4 ms	2.8 ms	4.2x

No regression for nx >= nthreads (existing path is unchanged).

Motivation

Addresses #4121. The original issue demonstrated that for few-query workloads over large databases, parallelizing over database segments instead of queries can yield multi-fold speedups. This is especially relevant for serving workloads where queries arrive one at a time.

Test plan

All existing TestIndexFlat tests pass (15/15)
Full test_index.py passes (47/47 including 9 new tests)
test_search_params.py + test_index_composite.py pass (81/81)
New TestDbParallelSearch covers: IP, L2, k=1, k=200, single query, few queries, thread scaling consistency
Cross-validated against NumPy brute-force for IP and L2

When the number of queries (nx) is small relative to the thread count, the existing query-parallel approach underutilizes CPU cores: each thread processes one query against the entire database. This change adds a database-parallel code path that instead gives each thread a disjoint slice of the database to scan, using single-threaded BLAS and per-thread heaps that are merged after the parallel region.

The new path activates automatically when:
- nx < omp_get_max_threads()
- ny >= max(10000, nthreads * blas_database_bs)
- No IDSelector is active

Benchmarks (12 threads, d=128):
- IndexFlatIP, ny=1M, nx=4: 56.7ms → 8.4ms (6.7x)
- IndexFlatIP, ny=1M, nx=1: 14.1ms → 5.0ms (2.8x)
- IndexFlatL2, ny=1M, nx=4: 56.7ms → 14.0ms (4.0x)
- IndexFlatL2, ny=1M, nx=1: 14.1ms → 9.5ms (1.5x)

Addresses facebookresearch#4121.
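The activation conditions above amount to a small predicate. In this sketch, `use_db_parallel` is a hypothetical name, and `blas_database_bs` is assumed to refer to faiss's `distance_compute_blas_database_bs` global with a default of 1024; check the value in your own build.

```python
# Assumed stand-in for faiss's distance_compute_blas_database_bs global.
BLAS_DATABASE_BS = 1024


def use_db_parallel(nx, ny, nthreads, has_id_selector,
                    blas_database_bs=BLAS_DATABASE_BS):
    """Return True if the database-parallel path would be chosen.

    nx: number of queries, ny: database size,
    nthreads: the value of omp_get_max_threads() in the C++ code.
    """
    if has_id_selector:   # an active IDSelector falls back to the existing path
        return False
    if nx >= nthreads:    # enough queries to keep every thread busy
        return False
    # i.e. every thread's slice is at least one BLAS database block (and ny >= 10000)
    return ny >= max(10000, nthreads * blas_database_bs)
```

Under that assumption, with 12 threads the cutoff is 12 × 1024 = 12,288 vectors, so both benchmarked database sizes (200,000 and 1,000,000) take the new path.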

meta-cla · 2026-03-27T21:07:17Z

Hi @ivan-digital!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

meta-cla · 2026-03-27T21:13:04Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


ivan-digital marked this pull request as draft March 27, 2026 21:08

ivan-digital marked this pull request as ready for review March 27, 2026 21:10

ivan-digital marked this pull request as draft March 27, 2026 21:13

meta-cla bot added the CLA Signed label Mar 27, 2026

Optimize db-parallel impl: parallel heap init and merge

c26d7e0

- Move heap initialization inside the parallel region so each thread only initializes its own heaps
- Parallelize the merge phase over queries with omp parallel for
- Trim verbose comments
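The merge can run independently for each query, which is what lets it be parallelized over queries without locking. A minimal pure-Python sketch, assuming each worker's candidates for a query are already sorted best-first as (distance, id) pairs: `ThreadPoolExecutor` stands in for `omp parallel for`, `heapq.merge` performs a k-way merge (the PR may merge heaps differently), and `merge_query` / `merge_all` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def merge_query(per_thread_lists, k, larger_is_better):
    """k-way merge of already-sorted per-thread candidate lists for one query."""
    merged = heapq.merge(*per_thread_lists, key=lambda p: p[0],
                         reverse=larger_is_better)
    return list(islice(merged, k))  # merge is lazy: stop after k results


def merge_all(partials, k, larger_is_better, nthreads=4):
    """partials[t][q] holds thread t's sorted (distance, id) list for query q.

    Queries are independent, so the loop over queries is distributed
    across workers.
    """
    nq = len(partials[0])
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return list(pool.map(
            lambda q: merge_query([p[q] for p in partials], k, larger_is_better),
            range(nq)))
```

For inner product pass `larger_is_better=True`; for L2 pass `False`.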

ivan-digital marked this pull request as ready for review March 28, 2026 05:42

mnorris11 added enhancement Implementation labels Apr 4, 2026

ivan-digital wants to merge 2 commits into facebookresearch:main from ivan-digital:fix/flat-search-parallelization