[python-package] fix misleading feature name warning on sklearn 1.6+ predict() #7232

Open
FranciscoRMendes wants to merge 3 commits into lightgbm-org:master from FranciscoRMendes:fix/issue-6798

@FranciscoRMendes FranciscoRMendes commented Apr 19, 2026

Summary

Fixes #6798.

Starting with scikit-learn 1.6, LGBMClassifier/LGBMRegressor/etc. emit a spurious warning during predict() when fitted on data without feature names (e.g. a numpy array):

UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names

This happens because LightGBM auto-generates feature names (Column_0, Column_1, ...) for such inputs and previously exposed them via feature_names_in_, causing sklearn to believe the model was fitted with named features.

Fix: feature_names_in_ now raises AttributeError when training data had no feature names, matching sklearn's own convention. Auto-generated names remain accessible via the LightGBM-specific feature_name_ property.
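The gating described above can be sketched in plain Python. This is a minimal illustration, not the code in python-package/lightgbm/sklearn.py: the class `SketchEstimator` and the `feature_names` argument to `fit()` are hypothetical, and only the names `_fitted_with_feature_names`, `feature_names_in_`, and `feature_name_` come from this PR.

```python
class SketchEstimator:
    """Hypothetical stand-in for an sklearn-style estimator; not LightGBM code."""

    def fit(self, X, feature_names=None):
        n_features = len(X[0])
        # Remember whether the caller supplied names (as a DataFrame would).
        self._fitted_with_feature_names = feature_names is not None
        # Fall back to auto-generated placeholder names when none were given.
        self._feature_name = (
            list(feature_names)
            if feature_names is not None
            else [f"Column_{i}" for i in range(n_features)]
        )
        return self

    @property
    def feature_name_(self):
        # Always available after fit(), including auto-generated names.
        return self._feature_name

    @property
    def feature_names_in_(self):
        # Raising AttributeError makes hasattr() report the attribute as absent.
        if not getattr(self, "_fitted_with_feature_names", False):
            raise AttributeError(
                "feature_names_in_ is only defined when fit() received feature names"
            )
        return self._feature_name


unnamed = SketchEstimator().fit([[1.0, 2.0], [3.0, 4.0]])
named = SketchEstimator().fit([[1.0, 2.0]], feature_names=["a", "b"])
```

Because the property raises `AttributeError`, `hasattr(unnamed, "feature_names_in_")` is `False`, while `unnamed.feature_name_` still returns the generated `Column_0`, `Column_1` names.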

Changes

  • python-package/lightgbm/sklearn.py: track _fitted_with_feature_names in fit(); gate feature_names_in_ on that flag
  • tests/python_package_test/test_sklearn.py: update test_getting_feature_names_in_np_input to assert feature_names_in_ is absent for numpy input; add regression test test_no_spurious_feature_name_warning_on_np_predict

Test plan

  • test_getting_feature_names_in_np_input — asserts feature_names_in_ is not set after numpy fit, feature_name_ still works
  • test_no_spurious_feature_name_warning_on_np_predict — asserts no warnings raised during predict() on numpy input
  • test_getting_feature_names_in_pd_input — unchanged; DataFrame input still exposes feature_names_in_
  • Full test_sklearn.py suite: 483 passed, 0 failures

…predict()

When fitting on data without feature names (e.g. numpy arrays), LightGBM
auto-generates names like Column_0, Column_1, etc. Previously these were
exposed via feature_names_in_, causing sklearn 1.6+ to emit a spurious
UserWarning during predict() ("X does not have valid feature names, but
... was fitted with feature names").

feature_names_in_ now raises AttributeError when the training data had no
feature names, matching sklearn's own convention. Auto-generated names
remain accessible via the LightGBM-specific feature_name_ property.

Fixes lightgbm-org#6798

Development

Successfully merging this pull request may close these issues.

[python-package] on scikit-learn 1.6+, predict() raises misleading warning "X does not have valid feature names"
