perf(encoding): varint encoding - use manual vectorization/simd in decodeSingleByteRun (#579) by srsuryadev · Pull Request #579 · facebookincubator/nimble

srsuryadev · 2026-03-18T06:36:09Z

Summary:

Replace scalar byte expansion and reinterpret_cast-based uint64_t loads in
decodeSingleByteRun with xsimd-based operations.

Reviewed By: xiaoxmeng

Differential Revision: D96628007
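
For readers unfamiliar with the technique, the sketch below illustrates the general idea of vectorized byte expansion: load 8 input bytes, reject the group if any continuation bit is set, and zero-extend each byte into a 32-bit output lane. It is not the PR's code. The PR uses xsimd, whereas this sketch swaps in raw SSE2 intrinsics (with a scalar fallback) so that it compiles without extra dependencies. The helper name `widen8SingleByteVarints` and its signature are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper, not taken from the PR. Widens 8 single-byte varints
// into 8 uint32_t values. Returns false and writes nothing if any of the
// 8 input bytes has its continuation (high) bit set.
inline bool widen8SingleByteVarints(const uint8_t* in, uint32_t* out) {
#if defined(__SSE2__)
  // Unaligned 8-byte load into the low half of a 128-bit register; the
  // upper half is zeroed. The pointer cast is the intrinsic's calling
  // convention, not a dereference through a uint64_t*.
  const __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(in));
  // movemask gathers the top bit of every byte; bits 0-7 cover our input.
  if ((_mm_movemask_epi8(bytes) & 0xFF) != 0) {
    return false;
  }
  const __m128i zero = _mm_setzero_si128();
  // Interleave with zero bytes: 8 x uint8 -> 8 x uint16.
  const __m128i u16 = _mm_unpacklo_epi8(bytes, zero);
  // Interleave with zero words: 8 x uint16 -> two groups of 4 x uint32.
  _mm_storeu_si128(
      reinterpret_cast<__m128i*>(out), _mm_unpacklo_epi16(u16, zero));
  _mm_storeu_si128(
      reinterpret_cast<__m128i*>(out + 4), _mm_unpackhi_epi16(u16, zero));
  return true;
#else
  // Scalar fallback: memcpy-based load avoids a reinterpret_cast dereference.
  uint64_t word;
  std::memcpy(&word, in, sizeof(word));
  if ((word & 0x8080808080808080ULL) != 0) {
    return false;
  }
  for (int i = 0; i < 8; ++i) {
    out[i] = in[i];
  }
  return true;
#endif
}
```

A portable SIMD wrapper such as xsimd would express the same load, test, and widen steps without hard-coding one instruction set, which is presumably part of the motivation for using it here.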

…Width, MainlyConstant for faster iteration for SST workload

Summary: Add v2 encoding scaffolding for the Varint, RLE, FixedBitWidth, and MainlyConstant encodings for faster iteration or perf tuning.

Differential Revision: D96684714

Summary: Add a `decodeSingleByteRun` fast path to `bulkVarintDecode32` and `bulkVarintDecode64` that processes leading runs of single-byte varints (values 0-127) using 8-byte word reads before falling through to the BMI2 switch-based decoder. For each 8-byte word where no continuation bits are set (`(word & 0x8080808080808080) == 0`), all 8 varints are decoded with simple shifts, avoiding the `_pext_u64` and 64-case switch overhead.

The fast path is placed in the caller functions rather than inside `bulkVarintDecodeBmi2` to preserve the BMI2 function's code layout and icache behavior for mixed-width data.

Benchmark results (1M elements, mode/opt):

| Scenario              | Before    | After     | Speedup    |
|-----------------------|-----------|-----------|------------|
| 1-byte (32-bit)       | 465us     | 260us     | 1.79x      |
| 5-byte (32-bit)       | slower    | 1.22ms    | fixed      |
| 3-byte (32-bit)       | 1.04ms    | 864us     | 1.20x      |
| 4-byte (32-bit)       | 1.50ms    | 1.04ms    | 1.44x      |
| 64-bit 1-byte         | 294us     | 232us     | 1.27x      |
| batch1024             | 1.96us    | 1.20us    | 1.63x      |
| Uniform/2-byte/8-byte | unchanged | unchanged | no regress |

Also enhances the varint benchmark with fixed byte-width benchmarks (1-5 byte for 32-bit, 1/4/8 byte for 64-bit), skip benchmarks, and batch size benchmarks.

Differential Revision: D96617939
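
The fast path described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the function names, signatures, and the constant name `kContinuationMask` are hypothetical, the varint format is assumed to be the usual 7-bits-per-byte layout with the high bit as continuation flag, and a plain scalar decoder stands in for the BMI2 switch-based decoder. The sketch copies bytes directly rather than shifting the loaded word so that it does not depend on byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// One continuation bit per byte of a 64-bit word.
constexpr uint64_t kContinuationMask = 0x8080808080808080ULL;

// Stand-in for the BMI2 switch-based decoder (hypothetical): a plain scalar
// decode of `n` varints. Assumes well-formed input of at most 5 bytes per
// value; performs no bounds checking.
inline const uint8_t* decodeScalar(
    const uint8_t* pos, size_t n, uint32_t* out) {
  for (size_t i = 0; i < n; ++i) {
    uint32_t value = 0;
    int shift = 0;
    uint8_t byte;
    do {
      byte = *pos++;
      value |= static_cast<uint32_t>(byte & 0x7F) << shift;
      shift += 7;
    } while ((byte & 0x80) != 0);
    out[i] = value;
  }
  return pos;
}

// Hypothetical caller: consume the leading run of single-byte varints
// 8 at a time, then hand the remainder to the general decoder.
inline const uint8_t* bulkVarintDecode32Sketch(
    const uint8_t* pos, const uint8_t* end, size_t n, uint32_t* out) {
  size_t done = 0;
  while (n - done >= 8 && static_cast<size_t>(end - pos) >= 8) {
    uint64_t word;
    std::memcpy(&word, pos, sizeof(word)); // alignment-safe 8-byte read
    if ((word & kContinuationMask) != 0) {
      break; // at least one multi-byte varint in this group
    }
    for (int i = 0; i < 8; ++i) {
      out[done + i] = pos[i]; // each byte is already the decoded value
    }
    pos += 8;
    done += 8;
  }
  return decodeScalar(pos, n - done, out + done);
}
```

Note the parentheses in `(word & kContinuationMask) != 0`: in C++ the comparison operators bind more tightly than `&`, so the unparenthesized form would compare the mask against zero first.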

… single-byte varints

Summary: Manually loop-unroll `decodeSingleByteRun` with a 3-tier approach:

1. 32-element (4-word) unrolled loop with combined high-bit check `(w0 | w1 | w2 | w3) & kHighBits` to minimize branch overhead
2. 8-element (1-word) loop for smaller runs
3. Single-element trailing loop to pick up individual single-byte varints before multi-byte values

Also extracts the byte-expansion logic into a reusable `expandWord()` helper for clarity.

Differential Revision: D96619597
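
A rough sketch of the 3-tier structure is shown below. It borrows the names `expandWord` and `kHighBits` from the commit message, but the bodies, the signatures, the `loadWord` helper, and the `Sketch` suffix are assumptions made for illustration. The shift-based `expandWord` assumes a little-endian target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint64_t kHighBits = 0x8080808080808080ULL;

// Hypothetical helper: alignment-safe 8-byte load.
inline uint64_t loadWord(const uint8_t* p) {
  uint64_t w;
  std::memcpy(&w, p, sizeof(w));
  return w;
}

// Expands the 8 bytes of `word` into 8 outputs. Assumes little-endian, so
// the byte at memory offset i occupies bits [8i, 8i + 8) of the word.
template <typename T>
inline void expandWord(uint64_t word, T* out) {
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<T>((word >> (8 * i)) & 0xFF);
  }
}

// Decodes the leading run of single-byte varints, at most `n` values.
// Advances `pos` past the consumed bytes and returns the number decoded.
template <typename T>
inline size_t decodeSingleByteRunSketch(
    const uint8_t*& pos, const uint8_t* end, size_t n, T* out) {
  size_t done = 0;
  // Tier 1: 32 values per iteration, one combined high-bit check.
  while (n - done >= 32 && static_cast<size_t>(end - pos) >= 32) {
    const uint64_t w0 = loadWord(pos);
    const uint64_t w1 = loadWord(pos + 8);
    const uint64_t w2 = loadWord(pos + 16);
    const uint64_t w3 = loadWord(pos + 24);
    if (((w0 | w1 | w2 | w3) & kHighBits) != 0) {
      break;
    }
    expandWord(w0, out + done);
    expandWord(w1, out + done + 8);
    expandWord(w2, out + done + 16);
    expandWord(w3, out + done + 24);
    pos += 32;
    done += 32;
  }
  // Tier 2: 8 values per iteration.
  while (n - done >= 8 && static_cast<size_t>(end - pos) >= 8) {
    const uint64_t w = loadWord(pos);
    if ((w & kHighBits) != 0) {
      break;
    }
    expandWord(w, out + done);
    pos += 8;
    done += 8;
  }
  // Tier 3: one value at a time until the first multi-byte varint.
  while (done < n && pos < end && (*pos & 0x80) == 0) {
    out[done++] = static_cast<T>(*pos++);
  }
  return done;
}
```

OR-ing the four words before masking means a single branch covers 32 input bytes; when the check fails, the later tiers pick up whatever single-byte prefix remains.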

meta-codesync · 2026-03-18T06:36:42Z

@srsuryadev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96628007.

…codeSingleByteRun (#579)

Summary: Pull Request resolved: #579

Replace scalar byte expansion and reinterpret_cast-based uint64_t loads in decodeSingleByteRun with xsimd-based operations.

Reviewed By: xiaoxmeng

Differential Revision: D96628007

srsuryadev added 3 commits March 15, 2026 22:32

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Mar 18, 2026

meta-codesync bot added the fb-exported and meta-exported labels Mar 18, 2026

meta-codesync bot changed the title ~~perf(encoding): varint encoding - use manual vectorization/simd in decodeSingleByteRun~~ perf(encoding): varint encoding - use manual vectorization/simd in decodeSingleByteRun (#579) Mar 18, 2026

srsuryadev force-pushed the export-D96628007 branch 2 times, most recently from 892911b to 82e9336 Compare March 19, 2026 17:24

srsuryadev force-pushed the export-D96628007 branch from 82e9336 to 272ecf1 Compare March 19, 2026 22:33

srsuryadev force-pushed the export-D96628007 branch from 272ecf1 to e059552 Compare March 19, 2026 22:38

srsuryadev force-pushed the export-D96628007 branch from e059552 to aae125c Compare March 20, 2026 03:39

srsuryadev force-pushed the export-D96628007 branch from aae125c to 5fb223c Compare March 20, 2026 03:43

srsuryadev wants to merge 4 commits into main from export-D96628007
