SIMD optimization RaBitQ #4515

Closed
alibeklfc wants to merge 1 commit into facebookresearch:main from alibeklfc:export-D79301607

@alibeklfc
Contributor

Summary:
This diff introduces a new file rabitq_simd.h with multiple SIMD-optimized implementations of the dot product calculation using population count (popcnt) operations:

  1. AVX-512 implementation with AVX512VPOPCNTDQ: Processes data in 512-bit (64-byte) chunks using dedicated AVX-512 popcnt instructions, with fallbacks to smaller vector sizes for remaining data.

  2. AVX-512 fallback implementation without AVX512VPOPCNTDQ: Uses AVX512F instructions with a lookup-based popcount method for 512-bit vectors, falling back to smaller vectors for remaining data.

  3. AVX2 implementation: Uses a lookup-based popcount method with 256-bit (32-byte) AVX2 instructions, handling leftovers with 128-bit SSE operations and scalar processing.

  4. Scalar fallback: Processes data in 64-bit chunks with builtin popcount operations for systems without SIMD support.

Differential Revision: D79301607
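
For readers unfamiliar with popcount-based dot products, here is a minimal sketch of what the scalar fallback (item 4) could look like. It is not the code from rabitq_simd.h: the function names are hypothetical, and it assumes the dot product of two packed bit vectors is the popcount of their bitwise AND. It relies on the GCC/Clang builtins `__builtin_popcountll` and `__builtin_popcount`. The second helper shows a 16-entry nibble lookup table, the per-byte idea that lookup-based SIMD popcounts typically vectorize; whether the AVX2 and AVX512F paths in this diff use exactly this table is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical name, not taken from rabitq_simd.h.
// Assumption: the dot product of two packed bit vectors is popcount(a AND b).
inline uint64_t binary_dot_product_scalar(
        const uint8_t* a,
        const uint8_t* b,
        size_t n_bytes) {
    uint64_t sum = 0;
    size_t i = 0;
    // Main loop: one 64-bit (8-byte) chunk per iteration.
    for (; i + 8 <= n_bytes; i += 8) {
        uint64_t wa, wb;
        // memcpy keeps unaligned loads well-defined.
        std::memcpy(&wa, a + i, sizeof(wa));
        std::memcpy(&wb, b + i, sizeof(wb));
        sum += __builtin_popcountll(wa & wb);
    }
    // Tail: the remaining 0..7 bytes, one byte at a time.
    for (; i < n_bytes; ++i) {
        sum += __builtin_popcount(static_cast<unsigned>(a[i] & b[i]));
    }
    return sum;
}

// Hypothetical helper: popcount of one byte via a 16-entry nibble table.
// SIMD lookup-based popcounts commonly apply the same table to every byte
// of a vector register with a byte-shuffle instruction.
inline uint32_t popcount_byte_lookup(uint8_t x) {
    static const uint8_t table[16] = {
            0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
    return table[x & 0x0F] + table[x >> 4];
}
```

Calling `binary_dot_product_scalar(a, b, n_bytes)` on two buffers of equal length returns the number of bit positions set in both. The wider implementations listed above would be expected to return the same value for the same inputs, which makes a scalar version like this a convenient reference when testing them.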

@meta-cla meta-cla Bot added the CLA Signed label Aug 7, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D79301607

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Aug 7, 2025
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Aug 7, 2025
@alexanderguzhva
Contributor

looks good

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Aug 7, 2025
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Aug 7, 2025
alibeklfc added a commit to alibeklfc/faiss that referenced this pull request Aug 7, 2025
Reviewed By: mnorris11

Differential Revision: D79301607

@facebook-github-bot
Contributor

This pull request has been merged in 7af24e5.

samanthawaters8882michaeldonovan added a commit to samanthawaters8882michaeldonovan/faiss that referenced this pull request Oct 12, 2025
Summary:
Pull Request resolved: facebookresearch/faiss#4515

Reviewed By: mnorris11

Differential Revision: D79301607

fbshipit-source-id: 4a5277c333ef75aaa14734b59bbe65b986cae025
dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request Mar 8, 2026
dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request Mar 16, 2026