docs/Advanced-Topics.rst
Lines changed: 9 additions & 1 deletion
@@ -23,7 +23,7 @@ Categorical Feature Support
 - Use ``categorical_feature`` to specify the categorical features.
   Refer to the parameter ``categorical_feature`` in `Parameters <./Parameters.rst#categorical_feature>`__.
 
-- Categorical features will be cast to ``int32`` (integer codes will be extracted from pandas categoricals in the Python-package) so they must be encoded as non-negative integers (negative values will be treated as missing)
+- Categorical features will be cast to ``int32`` so they must be encoded as non-negative integers (negative values will be treated as missing)
   less than ``Int32.MaxValue`` (2147483647).
   It is best to use a contiguous range of integers started from zero.
   Floating point numbers in categorical features will be rounded towards 0.
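
The encoding rules in this hunk can be illustrated with a small pure-Python sketch. It is not LightGBM code: the helper names ``encode_categories`` and ``truncate_toward_zero`` are hypothetical, and using ``-1`` for a missing label is an assumption based on the statement that negative values are treated as missing.

```python
import math

INT32_MAX = 2147483647  # upper bound quoted in the documentation text


def encode_categories(values):
    """Map labels to contiguous integer codes starting from zero.

    None is encoded as -1 (assumption: any negative value is read as missing).
    """
    mapping = {}
    codes = []
    for value in values:
        if value is None:
            codes.append(-1)
            continue
        if value not in mapping:
            # The next unused code keeps the range contiguous: 0, 1, 2, ...
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    # Every code must stay below Int32.MaxValue.
    assert len(mapping) < INT32_MAX
    return codes, mapping


def truncate_toward_zero(x):
    """Illustrate 'rounded towards 0' for a floating point category value."""
    return math.trunc(x)


codes, mapping = encode_categories(["red", "green", None, "red", "blue"])
truncated = [truncate_toward_zero(x) for x in (2.9, 0.4)]
```

Here ``codes`` is ``[0, 1, -1, 0, 2]`` and ``truncated`` is ``[2, 0]``, so a float such as ``2.9`` would land in the same category as ``2``.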
@@ -34,6 +34,14 @@ Categorical Feature Support
   treat the feature as numeric, either by simply ignoring the categorical interpretation of the integers or
   by embedding the categories in a low-dimensional numeric space.
 
+.. note::
+
+   When using the Python package with a pandas ``DataFrame`` and columns of dtype ``category``,
+   LightGBM stores the category labels observed during training and re-aligns categories at
+   prediction time before converting them to integer codes. This ensures consistent encoding
+   even if category order or subsets differ between training and prediction data. Categories
+   not seen during training are treated as missing values.
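
The re-alignment described in the added note can be sketched in pure Python. This is only an illustration of the idea, not the Python package's implementation: ``fit_categories`` and ``align_codes`` are hypothetical helpers, first-seen ordering of labels is an assumption of the sketch, and ``-1`` stands in for "missing".

```python
def fit_categories(train_values):
    """Record the distinct labels seen during training, in first-seen order."""
    return list(dict.fromkeys(train_values))


def align_codes(train_categories, values):
    """Encode values with the training-time label order.

    Labels not seen during training get -1 (stand-in for a missing value).
    """
    code_of = {label: i for i, label in enumerate(train_categories)}
    return [code_of.get(value, -1) for value in values]


train_categories = fit_categories(["a", "b", "c", "a"])
predict_values = ["c", "a", "d"]

# Codes derived from the prediction data alone disagree with training.
naive_codes = align_codes(fit_categories(predict_values), predict_values)

# Codes derived from the stored training labels stay consistent.
aligned_codes = align_codes(train_categories, predict_values)
```

With these inputs ``naive_codes`` is ``[0, 1, 2]`` while ``aligned_codes`` is ``[2, 0, -1]``: ``"c"`` and ``"a"`` keep their training codes, and the unseen label ``"d"`` is marked as missing.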