SIMD optimization RaBitQ#4515

Closed

alibeklfc wants to merge 1 commit into facebookresearch:main from alibeklfc:export-D79301607

Contributor

alibeklfc commented Aug 7, 2025

Summary:
This diff adds a new header, rabitq_simd.h, with several SIMD-optimized implementations of the dot product computed via population count (popcnt) operations:

1. AVX-512 with AVX512VPOPCNTDQ: processes data in 512-bit (64-byte) chunks using the dedicated AVX-512 popcnt instructions, falling back to smaller vector sizes for the remaining data.
2. AVX-512 without AVX512VPOPCNTDQ: uses AVX512F instructions with a lookup-based popcount on 512-bit vectors, falling back to smaller vectors for the remaining data.
3. AVX2: uses a lookup-based popcount with 256-bit (32-byte) AVX2 instructions, handling leftovers with 128-bit SSE operations and then scalar processing.
4. Scalar fallback: processes data in 64-bit chunks with builtin popcount operations, for systems without SIMD support.
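
For readers who have not opened the header, here is a minimal sketch of what the scalar fallback and the "lookup-based popcount" idea could look like. It assumes the dot product of two bit-packed vectors is the popcount of their bitwise AND, summed over all bytes. The function names `popcnt_dot_scalar` and `popcnt_dot_lut` are hypothetical and are not taken from rabitq_simd.h, and `__builtin_popcountll` assumes GCC or Clang. The nibble table is the scalar analogue of the byte-shuffle lookup that SIMD popcount routines typically use; it is shown here only to illustrate the technique, not as the diff's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (name not from the PR): dot product of two bit-packed
// binary vectors, computed as popcount(q & x) summed over `size` bytes.
inline uint64_t popcnt_dot_scalar(const uint8_t* q, const uint8_t* x, size_t size) {
    assert(size == 0 || (q != nullptr && x != nullptr));
    uint64_t sum = 0;
    size_t i = 0;
    // Main loop: 64-bit (8-byte) chunks. memcpy sidesteps alignment and
    // aliasing issues and compiles to a plain load on common targets.
    for (; i + 8 <= size; i += 8) {
        uint64_t a = 0;
        uint64_t b = 0;
        std::memcpy(&a, q + i, sizeof(a));
        std::memcpy(&b, x + i, sizeof(b));
        sum += static_cast<uint64_t>(__builtin_popcountll(a & b));
    }
    // Tail: remaining bytes (fewer than 8), one at a time.
    for (; i < size; ++i) {
        sum += static_cast<uint64_t>(
                __builtin_popcount(static_cast<unsigned>(q[i] & x[i])));
    }
    return sum;
}

// Lookup-based popcount, scalar form: split each byte into two 4-bit nibbles
// and look up the bit count of each nibble in a 16-entry table.
inline uint64_t popcnt_dot_lut(const uint8_t* q, const uint8_t* x, size_t size) {
    assert(size == 0 || (q != nullptr && x != nullptr));
    static const uint8_t kNibbleBits[16] = {
            0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
    uint64_t sum = 0;
    for (size_t i = 0; i < size; ++i) {
        const uint8_t v = static_cast<uint8_t>(q[i] & x[i]);
        sum += static_cast<uint64_t>(kNibbleBits[v & 0x0F]) + kNibbleBits[v >> 4];
    }
    return sum;
}
```

Both functions should return the same count for any input, which makes the table version a handy reference when checking a vectorized path on sizes that are not a multiple of the vector width.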

Differential Revision: D79301607

meta-cla Bot added the CLA Signed label

Contributor

facebook-github-bot commented Aug 7, 2025

This pull request was exported from Phabricator. Differential Revision: D79301607

facebook-github-bot added the fb-exported label

alibeklfc force-pushed the export-D79301607 branch from 2c3f45f to 49bc911 Compare

August 7, 2025 17:20

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

49bc911


alibeklfc force-pushed the export-D79301607 branch from 49bc911 to f6d81f6 Compare

August 7, 2025 17:24

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

f6d81f6


Contributor

alexanderguzhva commented Aug 7, 2025

looks good

alibeklfc force-pushed the export-D79301607 branch from f6d81f6 to 7a26eae Compare

August 7, 2025 21:03

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

7a26eae


alibeklfc added a commit to alibeklfc/faiss that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

bbdce36


alibeklfc force-pushed the export-D79301607 branch from 7a26eae to bbdce36 Compare

August 7, 2025 21:10

alibeklfc added a commit to alibeklfc/faiss that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

4ec4fe6


alibeklfc force-pushed the export-D79301607 branch from bbdce36 to 4ec4fe6 Compare

August 7, 2025 21:23


          SIMD optimization RaBitQ (facebookresearch#4515)

7f675f5

Reviewed By: mnorris11

Differential Revision: D79301607

alibeklfc force-pushed the export-D79301607 branch from 4ec4fe6 to 7f675f5 Compare

August 8, 2025 17:04

facebook-github-bot closed this in

7af24e5

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Aug 8, 2025

This pull request has been merged in 7af24e5.

samanthawaters8882michaeldonovan added a commit to samanthawaters8882michaeldonovan/faiss that referenced this pull request


          SIMD optimization RaBitQ (#4515)

10b90e2

Summary:
Pull Request resolved: facebookresearch/faiss#4515

Reviewed By: mnorris11

Differential Revision: D79301607

fbshipit-source-id: 4a5277c333ef75aaa14734b59bbe65b986cae025

dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

4bc5422


dimitraseferiadi pushed a commit to dimitraseferiadi/SuCo that referenced this pull request


          SIMD optimization RaBitQ (facebookresearch#4515)

17df715


Labels: CLA Signed, fb-exported, Merged