SIMD optimization RaBitQ #4515
Closed
alibeklfc wants to merge 1 commit into facebookresearch:main from
Contributor: This pull request was exported from Phabricator. Differential Revision: D79301607
Branch updated: 2c3f45f to 49bc911
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request (Aug 7, 2025)
Summary:
This diff introduces a new file, rabitq_simd.h, with multiple SIMD-optimized implementations of the dot product computed with population count (popcnt) operations:
1. AVX-512 implementation with AVX512VPOPCNTDQ: Processes data in 512-bit (64-byte) chunks using dedicated AVX-512 popcnt instructions, falling back to smaller vector sizes for the remaining data.
2. AVX-512 fallback implementation without AVX512VPOPCNTDQ: Uses AVX512F instructions with a lookup-based popcount method for 512-bit vectors, falling back to smaller vector sizes for the remaining data.
3. AVX2 implementation: Uses a lookup-based popcount method with 256-bit (32-byte) AVX2 instructions, handling leftover bytes with 128-bit SSE operations and then scalar code.
4. Scalar fallback: Processes data in 64-bit chunks using built-in popcount operations, for systems without SIMD support.
Differential Revision: D79301607
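As a rough illustration of the dispatch structure described above, here is a minimal C++ sketch. It assumes the operation is the dot product of two packed {0,1} bit vectors, i.e. popcount(a AND b) summed over the buffer; the real signatures in rabitq_simd.h are not shown in this PR page. The `popcount_and_dot*` names are hypothetical (not the faiss API), dispatch is compile-time via the compiler-defined macros `__AVX512VPOPCNTDQ__` and `__AVX2__`, and the scalar path assumes GCC/Clang (`__builtin_popcountll`). For brevity, AVX2 leftovers go straight to scalar code instead of through a 128-bit SSE stage, and the AVX-512-without-VPOPCNTDQ variant is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__) || defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helpers (NOT the faiss API): each returns the number of bit
// positions set in both `a` and `b`, i.e. the dot product of two {0,1}
// vectors packed into `n_bytes` bytes.

// Scalar fallback: 64-bit words plus the GCC/Clang popcount builtin.
inline uint64_t popcount_and_dot_scalar(const uint8_t* a, const uint8_t* b, size_t n_bytes) {
    uint64_t sum = 0;
    size_t i = 0;
    for (; i + 8 <= n_bytes; i += 8) {
        uint64_t x, y;
        std::memcpy(&x, a + i, 8);  // memcpy avoids unaligned/aliasing UB
        std::memcpy(&y, b + i, 8);
        sum += static_cast<uint64_t>(__builtin_popcountll(x & y));
    }
    for (; i < n_bytes; ++i) {  // leftover bytes
        sum += static_cast<uint64_t>(__builtin_popcount(static_cast<unsigned>(a[i] & b[i])));
    }
    return sum;
}

#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
// 512-bit chunks with the dedicated per-64-bit-lane popcount instruction.
inline uint64_t popcount_and_dot_avx512(const uint8_t* a, const uint8_t* b, size_t n_bytes) {
    __m512i acc = _mm512_setzero_si512();
    size_t i = 0;
    for (; i + 64 <= n_bytes; i += 64) {
        __m512i v = _mm512_and_si512(_mm512_loadu_si512(a + i), _mm512_loadu_si512(b + i));
        acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(v));
    }
    uint64_t sum = static_cast<uint64_t>(_mm512_reduce_add_epi64(acc));
    return sum + popcount_and_dot_scalar(a + i, b + i, n_bytes - i);
}
#endif

#ifdef __AVX2__
// Lookup-based popcount: split each byte into two nibbles and use
// _mm256_shuffle_epi8 as a 16-entry table of nibble bit counts.
inline uint64_t popcount_and_dot_avx2(const uint8_t* a, const uint8_t* b, size_t n_bytes) {
    // The table is repeated in both 128-bit lanes because the shuffle is per-lane.
    const __m256i lut = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low_mask = _mm256_set1_epi8(0x0f);
    __m256i acc = _mm256_setzero_si256();  // four 64-bit partial sums
    size_t i = 0;
    for (; i + 32 <= n_bytes; i += 32) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        __m256i v = _mm256_and_si256(va, vb);
        __m256i lo = _mm256_and_si256(v, low_mask);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low_mask);
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lut, lo), _mm256_shuffle_epi8(lut, hi));
        // Per-byte counts are <= 8; SAD against zero sums each group of 8 bytes into a 64-bit lane.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(cnt, _mm256_setzero_si256()));
    }
    uint64_t lanes[4];
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), acc);
    uint64_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    return sum + popcount_and_dot_scalar(a + i, b + i, n_bytes - i);
}
#endif

// Compile-time dispatch to the widest path the build enables.
inline uint64_t popcount_and_dot(const uint8_t* a, const uint8_t* b, size_t n_bytes) {
#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
    return popcount_and_dot_avx512(a, b, n_bytes);
#elif defined(__AVX2__)
    return popcount_and_dot_avx2(a, b, n_bytes);
#else
    return popcount_and_dot_scalar(a, b, n_bytes);
#endif
}
```

Every path returns the same value; the wider paths only change how many bytes are processed per instruction, and each hands its tail to the scalar loop. Widening the per-byte counts into 64-bit lanes with `_mm256_sad_epu8` is one common way to keep the AVX2 accumulator from overflowing.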
Branch updated: 49bc911 to f6d81f6
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request (Aug 7, 2025)
Pull Request resolved: facebookresearch#4515
Differential Revision: D79301607
Contributor: looks good
Branch updated: f6d81f6 to 7a26eae
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request (Aug 7, 2025)
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request (Aug 7, 2025)
Branch updated: 7a26eae to bbdce36
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request (Aug 7, 2025)
Branch updated: bbdce36 to 4ec4fe6
Reviewed By: mnorris11
Differential Revision: D79301607
Branch updated: 4ec4fe6 to 7f675f5
Contributor: This pull request has been merged in 7af24e5.
samanthawaters8882michaeldonovan added a commit to samanthawaters8882michaeldonovan/faiss that referenced this pull request (Oct 12, 2025)
Pull Request resolved: facebookresearch/faiss#4515
Reviewed By: mnorris11
Differential Revision: D79301607
fbshipit-source-id: 4a5277c333ef75aaa14734b59bbe65b986cae025
dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request (Mar 8, 2026)
dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request (Mar 16, 2026)