
[c++] add path_smooth_hessian parameter for hessian-based path smoothing#7242

Open
amqadmiakur8 wants to merge 1 commit into lightgbm-org:master from amqadmiakur8:feature/path-smooth-hessian

Conversation


@amqadmiakur8 amqadmiakur8 commented Apr 22, 2026

Summary

Adds a new path_smooth_hessian parameter, a Hessian-based variant of path_smooth (implemented in #2950) that is better suited to datasets where samples have different weights.

Motivation

The existing path_smooth uses sample counts as the smoothing weight. This works well for unweighted data, but when samples have different weights (e.g. via the weight column), sample count does not reflect the actual importance of the data in each leaf. A leaf with 10 high-weight samples and a leaf with 10 low-weight samples get the same smoothing, even though they carry very different amounts of information.

path_smooth_hessian uses the sum of Hessians instead, which naturally incorporates sample weights (since h_i = w_i * h_i_unweighted). This makes the smoothing weight-aware: leaves with more weighted evidence are trusted more, and leaves with less evidence are pulled harder toward the parent.
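To make the motivation concrete, here is a small sketch (not LightGBM internals) showing that for a weighted squared-error loss the sum of Hessians tracks total sample weight while the raw count does not; `leaf_stats` is an illustrative helper, not part of the PR:

```python
import numpy as np

def leaf_stats(weights):
    """Return (sample count, sum of Hessians) for one hypothetical leaf."""
    hess_unweighted = np.ones_like(weights)  # L2 loss: h_i_unweighted = 1
    hess = weights * hess_unweighted         # h_i = w_i * h_i_unweighted
    return len(weights), hess.sum()

n_hi, h_hi = leaf_stats(np.array([10.0] * 10))  # 10 high-weight samples
n_lo, h_lo = leaf_stats(np.array([0.1] * 10))   # 10 low-weight samples

print(n_hi, n_lo)  # same count: 10 10
print(h_hi, h_lo)  # very different evidence: 100.0 vs ~1.0
```

Count-based smoothing treats both leaves identically; Hessian-based smoothing shrinks the low-weight leaf much harder.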

Theory

The current path_smooth looks like an implementation of Bühlmann credibility, which assumes all rows have the same exposure/weight.

A natural extension is Bühlmann-Straub credibility, which uses weights rather than raw counts when observations have different exposures. Here, however, we use the Hessian instead, both to stay consistent with the existing min_data_in_leaf vs min_sum_hessian_in_leaf distinction, and because sample weights are not available at the point where smoothing is applied, so using them directly would require larger code changes.

Smoothing formula

Same structure as path_smooth, with h (sum of hessians) replacing n (sample count):

w_L = w*_L * (h / α) / (h / α + 1) + w_parent / (h / α + 1)

where α = path_smooth_hessian, h = the sum of Hessians in the leaf, w*_L = the unsmoothed leaf output, and w_parent = the parent's smoothed output.
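The formula can be sketched in a few lines of Python (an illustrative reimplementation, not the PR's C++ code; `smooth_leaf_output` is a hypothetical name):

```python
def smooth_leaf_output(raw_leaf_output, parent_smoothed_output,
                       sum_hessian, alpha):
    """Hessian-based path smoothing: blend the raw leaf output with the
    parent's smoothed output, weighting the leaf by h / alpha."""
    if alpha <= 0.0:  # smoothing disabled
        return raw_leaf_output
    k = sum_hessian / alpha
    return (raw_leaf_output * k + parent_smoothed_output) / (k + 1.0)

# A leaf with abundant Hessian mass keeps its own value; a leaf with
# little mass is pulled toward the parent.
print(smooth_leaf_output(2.0, 0.0, 1e6, 1.0))  # ~2.0
print(smooth_leaf_output(2.0, 0.0, 1.0, 1.0))  # 1.0 (halfway to parent)
```

Note that in the real algorithm w_parent is itself a smoothed value, so the shrinkage compounds along the path from the root.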

Note: the current implementation of path_smooth actually uses a Hessian-based approximation of the sample count (via RoundInt(bin_hessian * num_data / leaf_sum_hessian)), not the true count. This means path_smooth already sits somewhere between its stated definition and path_smooth_hessian.
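For clarity, the approximation referred to above can be sketched as follows (variable names are illustrative, not LightGBM's actual identifiers):

```python
def approx_num_data(bin_sum_hessian, num_data, leaf_sum_hessian):
    """Approximate count path_smooth uses: scale the bin's Hessian mass
    by the leaf's data-per-Hessian ratio, i.e.
    RoundInt(bin_hessian * num_data / leaf_sum_hessian)."""
    return int(round(bin_sum_hessian * num_data / leaf_sum_hessian))

# With uniform Hessians (h_i = 1) the approximation recovers the count:
print(approx_num_data(bin_sum_hessian=25.0, num_data=100,
                      leaf_sum_hessian=100.0))  # 25
# With skewed weights it already behaves like a Hessian-based count:
print(approx_num_data(bin_sum_hessian=90.0, num_data=100,
                      leaf_sum_hessian=100.0))  # 90
```

So under heterogeneous weights, path_smooth's "count" is already driven by Hessian mass rather than rows.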

Results

I tested this change on some of my datasets with Poisson, Gamma, and logistic losses; it performs as well as or better than path_smooth, depending on the dataset.
Here is an example of the performance on a Poisson dataset with heterogeneous weights. The smoothing ranges are centered around the empirically optimal values, and results are cross-validated to reduce noise.

image

I also compared it with min_data_in_leaf and min_sum_hessian_in_leaf (labeled mcs and mcw on the graph). It appears to improve on path_smooth in the same way that min_sum_hessian_in_leaf improves on min_data_in_leaf.

image

Also, out of the 8 business cases I tested, the soft smoothing methods (path_smooth and path_smooth_hessian) outperformed the hard ones (min_data_in_leaf and min_sum_hessian_in_leaf) on 6, while the hard methods had a slight edge on the other 2 (the two smallest datasets I had, 4k and 30k rows). This is one of the reasons I prefer soft smoothing methods over hard ones.

Summary of changes

  • Separate parameter (path_smooth_hessian, double, default 0) rather than a boolean flag on path_smooth, following the min_data_in_leaf / min_sum_hessian_in_leaf pattern.
  • Same dimension as min_sum_hessian_in_leaf.
  • Mutually exclusive with path_smooth: if both are set, path_smooth is ignored with a warning.
  • The min_data_in_leaf >= 2 guard only applies to the count-based path_smooth. The Hessian-based path uses sum_hessians directly from the histogram, so the rounding issue that motivated the guard does not apply.
  • Two helper methods on Config (effective_path_smooth(), use_hessian_smoothing()) to avoid repeated branching logic across call sites.
  • Full CUDA support, mirroring the CPU implementation.
  • New test: test_path_smoothing_hessian.
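The two Config helpers and the mutual-exclusivity rule described above could look roughly like this Python sketch (the actual PR implements them in C++ on Config; the class and warning text here are illustrative, only the parameter names come from the PR):

```python
import warnings

class Config:
    """Illustrative stand-in for the relevant slice of LightGBM's Config."""

    def __init__(self, path_smooth=0.0, path_smooth_hessian=0.0):
        if path_smooth_hessian > 0.0 and path_smooth > 0.0:
            # Mutually exclusive: the Hessian-based variant wins.
            warnings.warn("path_smooth is ignored because "
                          "path_smooth_hessian is set")
        self.path_smooth = path_smooth
        self.path_smooth_hessian = path_smooth_hessian

    def use_hessian_smoothing(self):
        return self.path_smooth_hessian > 0.0

    def effective_path_smooth(self):
        # One smoothing strength, whichever variant is active, so call
        # sites don't repeat the branching logic.
        if self.use_hessian_smoothing():
            return self.path_smooth_hessian
        return self.path_smooth

cfg = Config(path_smooth=1.0, path_smooth_hessian=4.0)  # emits the warning
print(cfg.use_hessian_smoothing(), cfg.effective_path_smooth())  # True 4.0
```

Centralizing the choice in two helpers keeps every call site (CPU and CUDA) reading a single smoothing strength plus a boolean, rather than re-deriving which parameter applies.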

@amqadmiakur8 force-pushed the feature/path-smooth-hessian branch from cf17946 to 35c4808 on April 22, 2026 07:02
Member

@jameslamb jameslamb left a comment


Thanks for your interest in LightGBM. I personally don't understand this submission, but I hope that maybe @shiyu1994 or @guolinke will be able to comment when they have time.

If they think this is a useful addition to LightGBM, I'd be happy to help with the tactical parts of getting it merge-ready.


4 participants