[cuda] fix half-size allocation of discretized gradient buffer#7254
BelixRogner wants to merge 2 commits into lightgbm-org:master from
Conversation
`CUDAGradientDiscretizer::Init` resizes `discretized_gradients_and_hessians_` (a `CUDAVector<int8_t>`) to `num_data * 2` elements. The `DiscretizeGradientsKernel` writes a *pair* of int16 values per data row (gradient + hessian) at byte offsets `4 * index` and `4 * index + 2`, so it needs `num_data * 4` bytes, not `num_data * 2`. The current allocation is half the required size, and the kernel writes past the end of the buffer for the upper half of the data. `compute-sanitizer` flags this as `Invalid __global__ write of size 2 bytes` for any `use_quantized_grad=true` run on more than roughly 3M rows on a 32-bit-aligned device. Resize to `num_data * 4` so the buffer holds the full int16 pairs without overrunning. No effect when `use_quantized_grad` is false. Signed-off-by: Felix Jonas Kroner <fksnake@gmail.com>
jameslamb (Member) reviewed on May 3, 2026 and left a comment:
@shiyu1994 could you review this one and see if it makes sense?
Could this help with #6703 ?
Summary
`CUDAGradientDiscretizer::Init` resizes `discretized_gradients_and_hessians_` (a `CUDAVector<int8_t>`) to `num_data * 2` elements. The `DiscretizeGradientsKernel` writes a pair of int16 values per data row (gradient + hessian) at byte offsets `4 * index` and `4 * index + 2`, so it needs `num_data * 4` bytes, not `num_data * 2`. The current allocation is half the required size; the kernel writes past the end of the buffer for the upper half of the data.
Fix
`Resize(num_data * 4)` so the buffer holds the full int16 pairs without overrunning. No effect when `use_quantized_grad` is false. One line.
Reproducer
Any `device='cuda'` training with `use_quantized_grad=True` on more than a few million rows. `compute-sanitizer` flags it as `Invalid __global__ write of size 2 bytes ... is N bytes after the nearest allocation` at `cuda_gradient_discretizer.cu:115`.
Test plan
Built with `-DUSE_CUDA=1` on sm_120 (RTX 5090). `compute-sanitizer` shows no writes-past-end after the fix.