Add contrib intdiv: fast integer division by invariant scalars using multiplication #2875

Open
abhishek-iitmadras wants to merge 1 commit into google:master from abhishek-iitmadras:abhishekk_intdiv

@abhishek-iitmadras
This change adds a contrib module implementing fast integer division by invariant (loop-constant) divisors using multiplication and shifts, following Granlund & Montgomery, “Division by Invariant Integers Using Multiplication” (PLDI 1994).

  • Supports all scalar lane widths and signs:

    • Unsigned: uint8_t, uint16_t, uint32_t, uint64_t
    • Signed: int8_t, int16_t, int32_t, int64_t
  • This contrib module provides a general-purpose, cross-architecture implementation of division by invariant scalars using multiplication, suitable for vectorized code built on Highway. It follows the Granlund-Montgomery (GM) scheme and is conceptually similar to the integer SIMD division intrinsics in NumPy’s npyv_intdiv, but is expressed purely in Highway’s portable SIMD API.
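
For readers unfamiliar with the scheme, here is a minimal scalar sketch of the unsigned 32-bit case as described in the Granlund-Montgomery paper. It is not the PR's code: the names `DivParamsU32Sketch`, `ComputeParamsU32Sketch`, and `DivideU32Sketch` are hypothetical, and the actual module works on Highway vectors rather than scalars.

```cpp
#include <cstdint>

// Hypothetical parameter bundle; the PR's real struct may differ.
struct DivParamsU32Sketch {
  uint32_t multiplier;  // m' in the paper
  int shift1;           // min(l, 1)
  int shift2;           // max(l - 1, 0)
};

// Step 1: precompute once per divisor. Precondition: divisor != 0.
inline DivParamsU32Sketch ComputeParamsU32Sketch(uint32_t divisor) {
  // l = ceil(log2(divisor)): the smallest l with 2^l >= divisor.
  int l = 0;
  while ((uint64_t{1} << l) < divisor) ++l;
  // m' = floor(2^32 * (2^l - d) / d) + 1. The shifted value fits in
  // 64 bits because 2^l - d < d < 2^32.
  const uint64_t pow2l = uint64_t{1} << l;
  const uint64_t m = ((pow2l - divisor) << 32) / divisor + 1;
  return {static_cast<uint32_t>(m), l < 1 ? l : 1, l > 1 ? l - 1 : 0};
}

// Step 2: per-element work is one high multiply, one subtract, one add
// and two shifts; no division instruction is needed.
inline uint32_t DivideU32Sketch(uint32_t n, const DivParamsU32Sketch& p) {
  // High 32 bits of the 64-bit product m' * n.
  const uint32_t t1 =
      static_cast<uint32_t>((uint64_t{p.multiplier} * n) >> 32);
  // The (n - t1) correction keeps the intermediate sum within 32 bits.
  return (t1 + ((n - t1) >> p.shift1)) >> p.shift2;
}
```

The parameters are computed once outside the loop, and `DivideU32Sketch` is then applied to every element; a vectorized version would presumably replace the high multiply with a per-lane equivalent.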

Member

@jan-wassenberg left a comment

A bunch of comments :) Some influence others, so please read them all before addressing any.

*
* We split the work into two steps:
* 1) Precompute parameters from the scalar divisor (multiplier + shifts).
* DivisorParams{U,S}<T> ComputeDivisorParams(T divisor);
Member

Can we reuse any of the existing logic from base.h Divisor[64]?

Author

@abhishek-iitmadras Mar 16, 2026

I checked base.h Divisor / Divisor64, which solve a different, scalar-only problem using a simpler reciprocal-multiply scheme. I don't think we can reuse that logic here. contrib/intdiv implements the GM invariant-division algorithm, including the correction step, signed handling, and edge cases. It also supports vector lane division, with separate logic for signed vs. unsigned types, a widened-multiply path for 8/16-bit lanes, vector MulHigh, and a target-specific scalar fallback for 64-bit lanes. Direct reuse of the existing logic is therefore unlikely to fit.

Correct me if I am wrong.

};

template <>
struct MulType<uint8_t> {
Member

We can use existing functionality: base.h also has

template <>
struct Relations<uint8_t> {
  using Unsigned = uint8_t;
  using Signed = int8_t;
  using Wide = uint16_t;
};

etc, so we could use MulType = Relations::Wide.

Author

I tried it, but it is only partially applicable and not a direct replacement, because the mappings diverge for 32-bit and 64-bit types.
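
To illustrate why a "wide type" mapping is relevant here, the following scalar sketch computes the high half of a product by widening. `WideOfSketch` and `MulHighViaWideSketch` are hypothetical names used only for illustration; they are not Highway's `Relations` trait or the PR's `MulType`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a trait that maps T to a type twice as wide.
template <typename T>
struct WideOfSketch;
template <>
struct WideOfSketch<uint8_t> { using type = uint16_t; };
template <>
struct WideOfSketch<uint16_t> { using type = uint32_t; };
template <>
struct WideOfSketch<uint32_t> { using type = uint64_t; };

// Returns the upper sizeof(T) * 8 bits of the full-width product a * b.
template <typename T>
T MulHighViaWideSketch(T a, T b) {
  using W = typename WideOfSketch<T>::type;
  // Widen both operands first so the product cannot overflow.
  const W product = static_cast<W>(static_cast<W>(a) * static_cast<W>(b));
  return static_cast<T>(product >> (sizeof(T) * 8));
}
```

This sketch deliberately has no `uint64_t` specialization: standard C++ offers no 128-bit integer type, which is one plausible reason a 64-bit lane needs a different path than simple widening.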

return HWY_NAMESPACE::ComputeDivisorParams<T>(d);
}

template <typename T, HWY_IF_T_SIZE(T, 1), HWY_IF_SIGNED(T)>
Member

Rather than SFINAE for T size 1..8, we can HWY_IF_T_SIZE_ONE_OF(T, (1 << 1) | (1 << 2) | (1 << 4) | (1 << 8)), or better yet, just static_assert IsPow2(sizeof(T)) within one function.

Author

OK, got it. I have now collapsed the repeated size-based SFINAE overloads into a single signed overload and a single unsigned overload, using static_assert for the supported integer sizes.
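
A minimal sketch of the pattern discussed in this thread: one function template with a `static_assert` on the lane size instead of one overload per size. The function `CeilLog2Sketch` and the constant `kSupportedSizesSketch` are hypothetical. The mask assumes that bit i is set when `sizeof(T) == i` is supported, as the reviewer's expression suggests.

```cpp
#include <cstdint>
#include <type_traits>

// Bit i set means sizeof(T) == i is accepted (assumed convention).
constexpr unsigned kSupportedSizesSketch =
    (1u << 1) | (1u << 2) | (1u << 4) | (1u << 8);

// One template for all lane sizes. Precondition: divisor >= 1.
// Returns the smallest l such that 2^l >= divisor.
template <typename T>
int CeilLog2Sketch(T divisor) {
  static_assert(std::is_integral<T>::value, "T must be an integer type");
  static_assert(((1u << sizeof(T)) & kSupportedSizesSketch) != 0,
                "unsupported lane size");
  const uint64_t d = static_cast<uint64_t>(divisor);
  int l = 0;
  // The l < 64 check avoids an out-of-range shift for d > 2^63.
  while (l < 64 && (uint64_t{1} << l) < d) ++l;
  return l;
}
```

Compared with SFINAE overloads, an unsupported type here produces a single readable error message at the point of use rather than an overload-resolution failure.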

return HWY_NAMESPACE::ComputeDivisorParams<T>(d);
}

template <class D, class V = VecD<D>, typename T = TFromD_<D>, HWY_IF_UNSIGNED_D(D)>
Member

Here also we could static_assert(IsUnsigned()) inside the function, given that you have a DivisorParamsU argument.

Author

Same here; I applied the same change.
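
The reviewer's point can be sketched as follows: when the parameter struct already distinguishes the signed and unsigned cases, overload resolution needs no SFINAE, and a `static_assert` inside the body documents the requirement. All names below are hypothetical, and the bodies use plain division as a placeholder for the real multiply-and-shift sequence.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical parameter structs; the PR's DivisorParamsU/S hold a
// multiplier and shifts rather than the raw divisor.
template <typename T>
struct ParamsUSketch { T divisor; };
template <typename T>
struct ParamsSSketch { T divisor; };

// Selected whenever the second argument is a ParamsUSketch<T>.
template <typename T>
T DivideSketch(T n, const ParamsUSketch<T>& p) {
  static_assert(std::is_unsigned<T>::value, "ParamsUSketch needs unsigned T");
  return static_cast<T>(n / p.divisor);  // placeholder for multiply + shift
}

// Selected whenever the second argument is a ParamsSSketch<T>.
template <typename T>
T DivideSketch(T n, const ParamsSSketch<T>& p) {
  static_assert(std::is_signed<T>::value, "ParamsSSketch needs signed T");
  return static_cast<T>(n / p.divisor);  // placeholder for multiply + shift
}
```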

if constexpr (sizeof(T) <= 4) {
return static_cast<T>(Random32(&rng));
} else {
const uint64_t hi = Random32(&rng);
Member

We do have a Random64().

Author

Done, I use Random64() now.
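
As a rough illustration of why a 64-bit generator simplifies the test helper, the sketch below draws one 64-bit value and truncates it to the lane type, so no branch on `sizeof(T)` and no combining of two 32-bit halves is needed. It uses `std::mt19937_64` as a stand-in and is not Highway's `Random64()`; `RandomLaneSketch` is a hypothetical name.

```cpp
#include <cstdint>
#include <random>

// Returns a pseudo-random value of any integer lane type up to 64 bits.
template <typename T>
T RandomLaneSketch(std::mt19937_64& rng) {
  static_assert(sizeof(T) <= 8, "lane type must fit in 64 bits");
  // One 64-bit draw covers every lane width; the cast keeps the low bits.
  return static_cast<T>(rng());
}
```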

}

template <typename T>
bool IsPow2(T x) {
Member

Already defined in intdiv-inl.h?

Author

Removed from here

Signed-off-by: Abhishek Kumar <abhishek.r.kumar@fujitsu.com>
@abhishek-iitmadras marked this pull request as ready for review March 16, 2026 16:13

template <>
struct MultiplierType<uint32_t> {
using type = uint32_t;
Member

We can use using type = If<(sizeof(T) < 4), Relations::Wide, T>.
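
A minimal sketch of that suggestion in standard C++, using `std::conditional_t` as a stand-in for Highway's `If` and a hypothetical `WideSketch` trait in place of `Relations`. The fallback in the primary template is an assumption made so that both branches of the conditional name a valid type for every T.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: maps T to a type twice as wide where one is defined
// here, otherwise falls back to T itself.
template <typename T>
struct WideSketch { using type = T; };
template <>
struct WideSketch<uint8_t> { using type = uint16_t; };
template <>
struct WideSketch<uint16_t> { using type = uint32_t; };
template <>
struct WideSketch<uint32_t> { using type = uint64_t; };

// Lanes narrower than 32 bits use the wide type for the multiplier;
// 32-bit and 64-bit lanes keep their own type.
template <typename T>
using MultiplierTypeSketch =
    std::conditional_t<(sizeof(T) < 4), typename WideSketch<T>::type, T>;
```

Note that `std::conditional_t` names both alternatives eagerly, so the trait must provide a member for every T it is used with, even when that branch is not selected.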
