[python-package] fix misleading feature name warning on sklearn 1.6+ predict()#7232
Open
FranciscoRMendes wants to merge 3 commits into lightgbm-org:master
…predict()
When fitting on data without feature names (e.g. numpy arrays), LightGBM
auto-generates names like Column_0, Column_1, etc. Previously these were
exposed via feature_names_in_, causing sklearn 1.6+ to emit a spurious
UserWarning during predict() ("X does not have valid feature names, but
... was fitted with feature names").
feature_names_in_ now raises AttributeError when the training data had no
feature names, matching sklearn's own convention. Auto-generated names
remain accessible via the LightGBM-specific feature_name_ property.
Fixes lightgbm-org#6798
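A minimal sketch of the gating described above, assuming a hypothetical `SketchEstimator` class and a hypothetical `feature_names` argument to `fit()`. It is not LightGBM's actual implementation; only the names `_fitted_with_feature_names`, `feature_name_`, and `feature_names_in_` come from this PR. It illustrates why raising `AttributeError` from the property makes `hasattr()` report the attribute as absent.

```python
class SketchEstimator:
    """Hypothetical stand-in for an estimator; not LightGBM's real class."""

    def fit(self, X, feature_names=None):
        n_features = len(X[0])
        # Remember whether the caller supplied real feature names.
        self._fitted_with_feature_names = feature_names is not None
        # Fall back to auto-generated Column_0, Column_1, ... otherwise.
        if feature_names is not None:
            self._feature_name = list(feature_names)
        else:
            self._feature_name = [f"Column_{i}" for i in range(n_features)]
        return self

    @property
    def feature_name_(self):
        # Always available after fit, including auto-generated names.
        return list(self._feature_name)

    @property
    def feature_names_in_(self):
        # Only exposed when the training data carried real names.
        if not self._fitted_with_feature_names:
            raise AttributeError(
                "feature_names_in_ is not defined: training data had no feature names"
            )
        return list(self._feature_name)


unnamed = SketchEstimator().fit([[1.0, 2.0], [3.0, 4.0]])
named = SketchEstimator().fit([[1.0, 2.0]], feature_names=["a", "b"])

print(hasattr(unnamed, "feature_names_in_"))
print(unnamed.feature_name_)
print(named.feature_names_in_)
```

Because `hasattr()` treats an `AttributeError` raised by a property as "attribute missing", a caller that probes for `feature_names_in_` this way would see the unnamed-fit estimator as having no feature names.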
beaa0b2 to c4e1c39
Summary
Fixes #6798.
Starting with scikit-learn 1.6, LGBMClassifier/LGBMRegressor/etc. emit a spurious warning during predict() when fitted on data without feature names (e.g. a numpy array):
"X does not have valid feature names, but ... was fitted with feature names"
This happens because LightGBM auto-generates feature names (Column_0, Column_1, ...) for such inputs and previously exposed them via feature_names_in_, causing sklearn to believe the model was fitted with named features.
Fix: feature_names_in_ now raises AttributeError when training data had no feature names, matching sklearn's own convention. Auto-generated names remain accessible via the LightGBM-specific feature_name_ property.
Changes
- python-package/lightgbm/sklearn.py: track _fitted_with_feature_names in fit(); gate feature_names_in_ on that flag
- tests/python_package_test/test_sklearn.py: update test_getting_feature_names_in_np_input to assert feature_names_in_ is absent for numpy input; add regression test test_no_spurious_feature_name_warning_on_np_predict
Test plan
- test_getting_feature_names_in_np_input: asserts feature_names_in_ is not set after numpy fit, feature_name_ still works
- test_no_spurious_feature_name_warning_on_np_predict: asserts no warnings raised during predict() on numpy input
- test_getting_feature_names_in_pd_input: unchanged; DataFrame input still exposes feature_names_in_
- test_sklearn.py suite: 483 passed, 0 failures
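The "no warnings raised during predict()" check in the test plan can be sketched with the standard library alone. This is an illustration of the pattern, not the test added in this PR: `predict_stub` and `call_without_warnings` are hypothetical helpers, and the stub only mimics the name-mismatch warning quoted in the summary.

```python
import warnings


def predict_stub(X, fitted_with_names, input_has_names):
    """Hypothetical stand-in for predict(); mimics the name-mismatch check."""
    if fitted_with_names and not input_has_names:
        warnings.warn(
            "X does not have valid feature names, but ... was fitted with feature names",
            UserWarning,
        )
    return [0 for _ in X]


def call_without_warnings(func, *args, **kwargs):
    """Call func and turn any warning it emits into an exception."""
    with warnings.catch_warnings():
        # Inside this block every warning is raised as an error,
        # so a spurious warning fails the call immediately.
        warnings.simplefilter("error")
        return func(*args, **kwargs)


# Fitted without names, predicting without names: no warning expected.
clean_result = call_without_warnings(predict_stub, [[1, 2]], False, False)

# Fitted with names, predicting without names: the warning surfaces as an error.
try:
    call_without_warnings(predict_stub, [[1, 2]], True, False)
    mismatch_raised = False
except UserWarning:
    mismatch_raised = True
```

Escalating warnings to errors inside a `catch_warnings()` block keeps the strict filter local to the call under test, so other tests in the same process are unaffected.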