Add LightGBM dtreeviz tree visualization support (#346)

oegedijk · web-flow · commit 4efff76ab42e · 2026-02-08T17:39:46.000+01:00
* Add LightGBM dtreeviz dashboard support (closes #118)
* Document LightGBM decision-tree visualization support
diff --git a/README.md b/README.md
@@ -104,7 +104,7 @@ The library includes:
 - *Permutation importances* (how much does the model metric deteriorate when you shuffle a feature?)
 - *Partial dependence plots* (how does the model prediction change when you vary a single feature?)
 - *Shap interaction values* (decompose the shap value into a direct effect and interaction effects)
-- For Random Forests and xgboost models: visualisation of individual decision trees
+- For Random Forest, XGBoost, and LightGBM models: visualisation of individual decision trees
 - Plus for classifiers: precision plots, confusion matrix, ROC AUC plot, PR AUC plot, etc
 - For regression models: goodness-of-fit plots, residual plots, etc.
 
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -12,6 +12,8 @@
 - Fix XGBoost multiclass decision-path summary wording to display `prediction (logodds)` when explainer `model_output='logodds'`.
 - Fix issue #256: add robust multiclass probability fallback for classifiers that expose `decision_function` but not `predict_proba` (e.g. `LinearSVC`), and use it consistently across kernel SHAP, prediction helpers, PDP, and permutation scorer paths.
 - Prevent multiclass class-count mismatches when user-provided/broken `predict_proba` outputs do not match model class count by falling back to `decision_function`-based probabilities.
+- Fix issue #118: add LightGBM decision-tree visualization support (dtreeviz) across explainer auto-detection, tree plotting, and decision-path rendering in dashboard tree tabs.
+- Fix dtreeviz callback rendering on macOS: switch matplotlib to a non-interactive backend when trees are rendered off the main thread, which prevents dashboard 500 errors.
 
 ### Tests
 - Add regression tests for LightGBM with string categorical features covering dashboard initialization, `get_shap_row(...)`, unseen categorical values in `X_row`, and regression dashboard initialization.
@@ -22,6 +24,7 @@
 - Add explainer-method unit tests for binary-like onehot detection, transformed feature-name deduping, inferred pipeline cats, and pipeline extraction warning text.
 - Add regression tests for issue #256 covering multiclass `LinearSVC` with kernel SHAP, PDP, and permutation-importances flows using `decision_function` fallback.
 - Add guard tests to confirm multiclass `predict_proba` models (logistic regression) keep working for PDP and permutation-importances paths.
+- Add LightGBM tree-visualization regression tests (shadow trees, decision paths, plot_trees, and dtreeviz render contracts) in the boosting-model test suite.
 
 ### Improvements
 - Add pipeline feature-name cleanup options: `strip_pipeline_prefix=True` and `feature_name_fn=...` for sklearn/imblearn pipeline transformed output columns.
diff --git a/TODO.md b/TODO.md
@@ -12,22 +12,23 @@
 - [S][Hub][#146/#342] hub.to_yaml integrate_dashboard_yamls honors pickle_type and dumps integrated explainer artifacts.
 - [M][Explainers][#294] align/explain multiclass logodds between Contributions Plot and Prediction Box (+ PDP highlight and XGBoost decision path wording alignment).
 - [M][Explainers/Methods/Docs][#213] improve sklearn/imblearn pipeline support: feature-name cleanup (`strip_pipeline_prefix`, `feature_name_fn`), auto-detect onehot groups (`auto_detect_pipeline_cats`), accept binary-like scaled onehot columns in `cats`, preserve transformed index, add warnings/docs/tests.
+- [M][Explainers/Methods/Tests/Docs][#256] improve multiclass LinearSVC support/docs with decision_function probability fallback and regression coverage for SHAP/PDP/permutation flows.
+- [M][Explainers/Methods/Components/Tests][#118] add LightGBM tree visualization support (dtreeviz), including tree explainer wiring, dashboard tree tabs, and regression coverage.
 
 **Now**
-- [M][Explainers][#118] add LightGBM tree visualization support (dtreeviz).
+- [M][Dashboard][#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
 
 **Next**
-- [M][Dashboard][#263/#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
+- [M] add ExtraTrees and GradientBoostingClassifier to tree visualizers.
 
 **Backlog: Explainers**
 - [M] add plain language explanations for plots (in_words + UI toggle).
 - [S] pass n_jobs to pdp_isolate.
 - [M] add ExtraTrees and GradientBoostingClassifier to tree visualizers.
-- [M][#118] add LightGBM tree visualization support (dtreeviz).
 
 **Backlog: Dashboard**
 - [S] make poweredby right-aligned.
-- [M][#263/#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
+- [M][#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
 - [M] add TablePopout.
 - [M][#247] add EDA-style feature histograms/bar charts/correlation graphs.
 - [M/L] add cost calculator/optimizer for classifier models (confusion matrix weights, Youden J).
@@ -54,7 +55,6 @@
 - [M] support SamplingExplainer, PartitionExplainer, PermutationExplainer, AdditiveExplainer.
 - [M] support LimeTabularExplainer.
 - [M] investigate method from https://arxiv.org/abs/2006.04750.
-- [M][#256] improve multiclass LinearSVC support/docs (class-count mismatch with SHAP output).
 - [M][#229] clarify/add support path for Poisson and Gamma regression explainers.
 
 **Backlog: Plots**
diff --git a/docs/source/deployment.rst b/docs/source/deployment.rst
@@ -126,7 +126,7 @@ And you need to tell heroku how to start your server in ``Procfile``::
 Graphviz buildpack
 ------------------
 
-If you want to visualize individual trees inside your ``RandomForest`` or ``xgboost``
+If you want to visualize individual trees inside your ``RandomForest``, ``xgboost``, or ``lightgbm``
 model using the ``dtreeviz`` package you will
 need to make sure that ``graphviz`` is installed on your ``heroku`` dyno by
 adding the following buildstack (as well as the ``python`` buildpack):
diff --git a/docs/source/explainers.rst b/docs/source/explainers.rst
@@ -456,10 +456,10 @@ plot_residuals_vs_feature
 DecisionTree Plots
 ------------------
 
-There are additional mixin classes specifically for ``sklearn`` ``RandomForests``
-and for xgboost models that define additional methods and plots to investigate and visualize
-individual decision trees within the ensemblke. These
-uses the ``dtreeviz`` library to visualize individual decision trees.
+There are additional mixin classes specifically for ``sklearn`` ``RandomForests``,
+``xgboost``, and ``lightgbm`` models that define additional methods and plots to
+investigate and visualize individual decision trees within the ensemble. These
+use the ``dtreeviz`` library to visualize individual decision trees.
 
 You can get a pd.DataFrame summary of the path that a specific index row took
 through a specific decision tree.
@@ -476,9 +476,9 @@ And for dtreeviz visualization of individual decision trees (svg format)::
     explainer.decisiontree_file(tree_idx, index)
     explainer.decisiontree_encoded(tree_idx, index)
 
-These methods are part of the ``RandomForestExplainer`` and XGBExplainer`` mixin
-classes that get automatically loaded when you pass either a RandomForest
-or XGBoost model.
+These methods are part of the ``RandomForestExplainer``, ``XGBExplainer``, and
+``LGBMExplainer`` mixin classes that get automatically loaded when you pass a
+RandomForest, XGBoost, or LightGBM model.
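The ``explainers.py`` changes are not shown in this diff, so the following is only a hypothetical sketch of how this kind of auto-detection can work. It dispatches on the model's fully qualified class name, similar in spirit to the `safe_isinstance(lgbmodel, "lightgbm.sklearn.LGBMClassifier")` checks in `get_lgbm_preds_df`. `TREE_MIXINS` and `pick_tree_mixin` are illustrative names, not library API, and the sklearn module path assumes a recent scikit-learn layout.

```python
# Hypothetical sketch: map a model to a tree-visualization mixin name by its
# fully qualified class name, so sklearn/xgboost/lightgbm never need importing.
TREE_MIXINS = {
    "sklearn.ensemble._forest.RandomForestClassifier": "RandomForestExplainer",
    "sklearn.ensemble._forest.RandomForestRegressor": "RandomForestExplainer",
    "xgboost.sklearn.XGBClassifier": "XGBExplainer",
    "xgboost.sklearn.XGBRegressor": "XGBExplainer",
    "lightgbm.sklearn.LGBMClassifier": "LGBMExplainer",
    "lightgbm.sklearn.LGBMRegressor": "LGBMExplainer",
}


def pick_tree_mixin(model):
    """Return the mixin name for a supported tree ensemble, else None."""
    # Walk the MRO so that subclasses of a supported model also match.
    for cls in type(model).__mro__:
        name = f"{cls.__module__}.{cls.__name__}"
        if name in TREE_MIXINS:
            return TREE_MIXINS[name]
    return None


# Stand-in class with the same qualified name as lightgbm's LGBMClassifier.
FakeLGBMClassifier = type("LGBMClassifier", (), {"__module__": "lightgbm.sklearn"})
print(pick_tree_mixin(FakeLGBMClassifier()))
print(pick_tree_mixin(object()))
```

Matching on names rather than imported classes keeps optional dependencies optional: a user without xgboost installed never triggers an import error.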
 
 
 plot_trees
@@ -661,12 +661,12 @@ restrict candidate rows by feature values before selecting a random index::
 .. automethod:: explainerdashboard.explainers.RegressionExplainer.random_index
 
 
-RandomForest and XGBoost outputs
---------------------------------
+RandomForest, XGBoost, and LightGBM outputs
+-------------------------------------------
 
-For RandomForest and XGBoost models mixin classes that visualize individual
-decision trees will be loaded: ``RandomForestExplainer`` and ``XGBExplainer``
-with the following additional methods::
+For RandomForest, XGBoost, and LightGBM models, mixin classes that visualize
+individual decision trees will be loaded: ``RandomForestExplainer``,
+``XGBExplainer``, and ``LGBMExplainer``, with the following additional methods::
 
     decisiontree_df(tree_idx, index, pos_label=None)
     decisiontree_summary_df(tree_idx, index, round=2, pos_label=None)
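As a rough, library-independent illustration of what a decision-path summary such as `decisiontree_summary_df(tree_idx, index)` reports, the sketch below walks one row through a tiny tree and records each split it passes. The parallel-list tree layout, the `<=`-goes-left split rule, the helper name `decision_path_summary`, and the step fields are all assumptions for illustration, not the method's actual output columns.

```python
# Hypothetical sketch: summarize the path one row takes through a single tree.
# The tree is stored as parallel lists (sklearn-style layout, -1 marks a leaf);
# this layout is an assumption, not what dtreeviz or explainerdashboard use.


def decision_path_summary(tree, row, feature_names, round_to=2):
    """Return (steps, leaf_value); one step per split node visited, root first."""
    node, steps = 0, []
    while tree["left"][node] != -1:
        feat = tree["feature"][node]
        cutoff = round(tree["threshold"][node], round_to)
        goes_left = row[feat] <= tree["threshold"][node]  # assumed "<=" split rule
        steps.append({
            "node": node,
            "feature": feature_names[feat],
            "value": row[feat],
            "condition": f"<= {cutoff}" if goes_left else f"> {cutoff}",
            "node_mean": round(tree["value"][node], round_to),
        })
        node = tree["left"][node] if goes_left else tree["right"][node]
    return steps, round(tree["value"][node], round_to)


tree = {
    "feature": [0, -2, 1, -2, -2],
    "threshold": [30.0, 0.0, 2.5, 0.0, 0.0],
    "left": [1, -1, 3, -1, -1],
    "right": [2, -1, 4, -1, -1],
    "value": [22.0, 10.0, 28.0, 20.0, 35.0],
}
steps, leaf = decision_path_summary(tree, [42.0, 3.0], ["age", "rooms"])
for step in steps:
    print(step["feature"], step["condition"], "| node mean:", step["node_mean"])
print("leaf prediction:", leaf)
```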
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -12,7 +12,8 @@ with just two lines of code.
 
 It allows you to investigate SHAP values, permutation importances,
 interaction effects, partial dependence plots, all kinds of performance plots,
-and even individual decision trees inside a random forest. With ``explainerdashboard`` any data
+and even individual decision trees inside random forest, XGBoost, and LightGBM models.
+With ``explainerdashboard`` any data
 scientist can create an interactive explainable AI web app in minutes,
 without having to know anything about web development or deployment.
 
diff --git a/explainerdashboard/dashboard_components/decisiontree_components.py b/explainerdashboard/dashboard_components/decisiontree_components.py
@@ -8,7 +8,7 @@
 from dash.exceptions import PreventUpdate
 import dash_bootstrap_components as dbc
 
-from ..explainers import RandomForestExplainer, XGBExplainer
+from ..explainers import RandomForestExplainer, XGBExplainer, LGBMExplainer
 from ..dashboard_methods import *
 from .. import to_html
 
@@ -94,12 +94,20 @@ def __init__(
         elif isinstance(self.explainer, XGBExplainer):
             if self.description is None:
                 self.description = """
-            Shows the marginal contributions of each decision tree in an 
+            Shows the marginal contributions of each decision tree in an
             xgboost ensemble to the final prediction. This demonstrates that
             an xgboost model is simply a sum of individual decision trees.
             """
             if self.subtitle == "Displaying individual decision trees":
                 self.subtitle += " inside xgboost model"
+        elif isinstance(self.explainer, LGBMExplainer):
+            if self.description is None:
+                self.description = """
+            Shows the marginal contributions of each decision tree in a
+            LightGBM ensemble to the final prediction.
+            """
+            if self.subtitle == "Displaying individual decision trees":
+                self.subtitle += " inside LightGBM model"
         else:
             if self.description is None:
                 self.description = ""
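The LightGBM description above refers to each tree's contribution to the final prediction. In a boosted ensemble the raw score is the base score plus the sum of the per-tree outputs, so a tree's marginal contribution is the change in the running total. Below is a minimal pure-Python sketch of that bookkeeping, assuming the per-tree raw outputs are already known. It mirrors the `tree`/`pred`/`pred_diff` columns that `get_lgbm_preds_df` builds (row `tree=-1` holds the base score), but `cumulative_tree_table` is a hypothetical helper.

```python
from itertools import accumulate


def cumulative_tree_table(tree_outputs, base_score=0.0):
    """Rows of tree/pred/pred_diff, starting with tree=-1 for the base score."""
    # Running total: base_score, base+t0, base+t0+t1, ...
    preds = list(accumulate(tree_outputs, initial=base_score))
    rows = []
    for k, pred in enumerate(preds):
        # The base row's "diff" is the base score itself, as in get_lgbm_preds_df.
        diff = pred if k == 0 else pred - preds[k - 1]
        rows.append({"tree": k - 1, "pred": pred, "pred_diff": diff})
    return rows


rows = cumulative_tree_table([1.5, -0.25, 0.5])
for row in rows:
    print(row)
```

By construction the `pred_diff` values sum to the final prediction, which is what the dashboard's per-tree bar chart visualizes.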
diff --git a/explainerdashboard/explainer_methods.py b/explainerdashboard/explainer_methods.py
@@ -40,6 +40,7 @@
     "get_xgboost_path_df",
     "get_xgboost_path_summary_df",
     "get_xgboost_preds_df",
+    "get_lgbm_preds_df",
     "get_multiclass_logodds_scores",
     "get_xgboost_output_label",
     "_ensure_numeric_predictions",  # Internal helper for XGBoost 3.0+ compatibility
@@ -2165,7 +2166,14 @@ def node_pred_proba(node):
     else:
 
         def node_mean(node):
-            return decision_tree.tree_model.tree_.value[node.id].item()
+            try:
+                return decision_tree.tree_model.tree_.value[node.id].item()
+            except Exception:  # e.g. LightGBM shadow trees have no sklearn-style tree_.value
+                node_samples = decision_tree.get_node_samples()
+                sample_idxs = node_samples.get(node.id, [])
+                if len(sample_idxs) == 0:  # no training rows reach this node
+                    return np.nan
+                return float(np.asarray(decision_tree.y_train)[sample_idxs].mean())
 
         for node in nodes:
             if not node.isleaf():
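The fallback in `node_mean` estimates a node's value as the mean target of the training rows routed to it. The sketch below shows the same idea on plain Python data. `node_means` is a hypothetical helper, and `node_samples` stands in for the node-id-to-row-indices mapping that the diff obtains from `decision_tree.get_node_samples()`.

```python
import math


def node_means(node_samples, y_train):
    """Map node id -> mean target of the training rows that reach that node.

    node_samples: dict of node id -> list of training-row indices, standing in
    for the mapping the diff gets from decision_tree.get_node_samples().
    """
    means = {}
    for node_id, idxs in node_samples.items():
        if len(idxs) == 0:
            means[node_id] = math.nan  # no training rows reached this node
        else:
            means[node_id] = sum(y_train[i] for i in idxs) / len(idxs)
    return means


y_train = [10.0, 20.0, 30.0, 40.0]
node_samples = {0: [0, 1, 2, 3], 1: [0, 1], 2: [2, 3], 3: []}
print(node_means(node_samples, y_train))
```

Building the whole mapping once, as here, also sidesteps the `get_node_samples()` call that the fallback repeats for every node; caching it may be worth considering for large trees.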
@@ -2549,3 +2557,93 @@ def get_xgboost_preds_df(xgbmodel, X_row, pos_label=1):
             0, "pred_proba"
         ]
     return xgboost_preds_df
+
+
+def get_lgbm_preds_df(lgbmodel, X_row, pos_label=1):
+    """Returns cumulative per-tree predictions for a LightGBM model.
+
+    Args:
+        lgbmodel: fitted LightGBM sklearn-compatible model
+            (i.e. LGBMClassifier or LGBMRegressor)
+        X_row: a one-row DataFrame, e.g. X_train.iloc[[0]]
+        pos_label: for classifiers, the class index to use as positive label.
+            Defaults to 1.
+
+    Returns:
+        pd.DataFrame: tree, pred, pred_diff (+ pred_proba, pred_proba_diff for classifiers)
+    """
+    if safe_isinstance(lgbmodel, "lightgbm.sklearn.LGBMClassifier"):
+        is_classifier = True
+        n_classes = len(lgbmodel.classes_)
+        n_trees = lgbmodel.booster_.num_trees()
+        if n_classes > 2:
+            n_trees = n_trees // n_classes  # num_trees() holds one tree per class per iteration
+    elif safe_isinstance(lgbmodel, "lightgbm.sklearn.LGBMRegressor"):
+        is_classifier = False
+        n_trees = lgbmodel.booster_.num_trees()
+    else:
+        raise ValueError("Pass either an LGBMClassifier or LGBMRegressor!")
+
+    if is_classifier:
+        if n_classes == 2:
+            if pos_label not in (0, 1):
+                raise ValueError("pos_label should be either 0 or 1!")
+
+            margins = []
+            for i in range(1, n_trees + 1):
+                margin_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
+                margin_raw = _ensure_numeric_predictions(margin_raw)
+                if isinstance(margin_raw, np.ndarray):
+                    margin_raw = (
+                        margin_raw.item()
+                        if margin_raw.ndim == 0
+                        else float(margin_raw[0])
+                    )
+                margin = float(margin_raw)
+                margins.append(margin if pos_label == 1 else -margin)
+
+            pred_probas = (np.exp(margins) / (1 + np.exp(margins))).tolist()
+            base_score = 0.0
+            base_proba = 0.5
+            preds = margins
+        else:
+            if pos_label < 0 or pos_label >= n_classes:
+                raise ValueError(
+                    f"pos_label={pos_label}, but should be >= 0 and <= {n_classes - 1}!"
+                )
+            margins = []
+            for i in range(1, n_trees + 1):
+                margin_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
+                margin_raw = _ensure_numeric_predictions(margin_raw)
+                margin = np.asarray(margin_raw, dtype=float)
+                margins.append(margin)
+
+            preds = [float(margin[pos_label]) for margin in margins]
+            pred_probas = [
+                float((np.exp(margin) / np.exp(margin).sum())[pos_label])
+                for margin in margins
+            ]
+            base_score = 0.0
+            base_proba = 1.0 / n_classes
+    else:
+        preds = []
+        for i in range(1, n_trees + 1):
+            pred_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
+            pred_raw = _ensure_numeric_predictions(pred_raw)
+            if isinstance(pred_raw, np.ndarray):
+                pred_raw = pred_raw.item() if pred_raw.ndim == 0 else float(pred_raw[0])
+            preds.append(float(pred_raw))
+        base_score = 0.0
+
+    lgbm_preds_df = pd.DataFrame(
+        dict(tree=range(-1, n_trees), pred=[base_score] + preds)
+    )
+    lgbm_preds_df["pred_diff"] = lgbm_preds_df.pred.diff()
+    lgbm_preds_df.loc[0, "pred_diff"] = lgbm_preds_df.loc[0, "pred"]
+
+    if is_classifier:
+        lgbm_preds_df["pred_proba"] = [base_proba] + pred_probas
+        lgbm_preds_df["pred_proba_diff"] = lgbm_preds_df.pred_proba.diff()
+        lgbm_preds_df.loc[0, "pred_proba_diff"] = lgbm_preds_df.loc[0, "pred_proba"]
+
+    return lgbm_preds_df
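`get_lgbm_preds_df` turns the cumulative raw margins into `pred_proba` values. Binary models use a logistic transform (with the margin negated for `pos_label=0`), multiclass models use a softmax, and the base row is 0.5 or `1 / n_classes`. A dependency-free sketch of that conversion follows; the helper names are hypothetical. Unlike the diff, the softmax subtracts the maximum margin before exponentiating and the sigmoid branches on the sign of the margin. Both changes are mathematically equivalent but avoid overflow for very large margins.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def margins_to_probas(margins, pos_label=1):
    """Convert cumulative raw margins to probabilities for pos_label.

    Binary: each margin is a float (log-odds of class 1).
    Multiclass: each margin is a list with one raw score per class.
    """
    probas = []
    for m in margins:
        if isinstance(m, (int, float)):
            # Negating the log-odds of class 1 gives the log-odds of class 0.
            probas.append(sigmoid(m if pos_label == 1 else -m))
        else:
            top = max(m)
            exps = [math.exp(v - top) for v in m]  # softmax, shifted by max for stability
            probas.append(exps[pos_label] / sum(exps))
    return probas


def proba_columns(margins, n_classes, pos_label=1):
    """Return (pred_proba, pred_proba_diff) lists, prefixed by the base row."""
    base = 0.5 if n_classes == 2 else 1.0 / n_classes
    probas = [base] + margins_to_probas(margins, pos_label)
    diffs = [probas[0]] + [b - a for a, b in zip(probas, probas[1:])]
    return probas, diffs


probas, diffs = proba_columns([0.0, math.log(3)], n_classes=2, pos_label=1)
print(probas)
print(diffs)
```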
diff --git a/explainerdashboard/explainer_plots.py b/explainerdashboard/explainer_plots.py
@@ -2930,6 +2930,7 @@ def plotly_xgboost_trees(
     target="",
     units="",
     higher_is_better=True,
+    model_name="xgboost",
 ):
     """Generate a plot showing the prediction of every single tree inside an XGBoost model
 
@@ -2944,6 +2945,8 @@ def plotly_xgboost_trees(
         units (str, optional): Units of target variable. Defaults to "".
         higher_is_better (bool, optional): up is green, down is red. If False then
             flip the colors.
+        model_name (str, optional): model family label used in chart titles.
+            Defaults to "xgboost".
 
     Returns:
         Plotly fig
@@ -3041,10 +3044,10 @@ def plotly_xgboost_trees(
     )
 
     if target:
-        title = f"Individual xgboost decision trees predicting {target}"
+        title = f"Individual {model_name} decision trees predicting {target}"
         yaxis_title = f"Predicted {target} {f'({units})' if units else ''}"
     else:
-        title = "Individual xgboost decision trees"
+        title = f"Individual {model_name} decision trees"
         yaxis_title = f"Predicted outcome ({units})" if units else "Predicted outcome"
 
     layout = go.Layout(
diff --git a/explainerdashboard/explainers.py b/explainerdashboard/explainers.py
diff --git a/scripts/run_lgbm_dashboard.py b/scripts/run_lgbm_dashboard.py
diff --git a/tests/test_boosting_models.py b/tests/test_boosting_models.py