
Feature fraction gives different column selections when using a custom objective #6053

@jrvmalik

Description

I ran into this because I got different column-subsampling behaviour with a built-in objective like MSE than with an equivalent custom objective. I suspect either that the seed reported in booster.txt is wrong, or that columns are subsampled twice per iteration when using a custom objective but only once with a built-in objective (perhaps via an extra call to 'ResetByTree').
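The second hypothesis can be illustrated with a toy column sampler (an analogy only, not LightGBM's actual implementation): if the custom-objective code path makes one extra RNG draw per iteration, the sequence of selected column subsets diverges even though the seed is identical.

```python
import numpy as np

# Toy stand-in for per-tree column sampling: one RNG draw selects the
# column subset for each tree; draws_per_iter=2 models a hypothetical
# extra call to the sampler each iteration.
def column_selections(seed, n_iters, n_cols=10, k=5, draws_per_iter=1):
    rng = np.random.RandomState(seed)
    out = []
    for _ in range(n_iters):
        for _ in range(draws_per_iter):
            sel = tuple(sorted(rng.choice(n_cols, k, replace=False)))
        out.append(sel)  # only the last draw of the iteration is kept
    return out

single = column_selections(seed=0, n_iters=5)
double = column_selections(seed=0, n_iters=5, draws_per_iter=2)
# same seed, but the double-draw sequence is shifted within the RNG
# stream, so the selected column subsets no longer match
```

Because both runs share one RNG stream, the double-draw selections are exactly the odd-indexed draws of the single-draw stream, which matches the observed behaviour of "same seed, different columns".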

Reproducible example

import lightgbm as lgb
import numpy as np

np.random.seed(1)
X = np.random.randn(300, 300)
y = np.random.randn(300)

data = lgb.Dataset(X, label=y, init_score=np.zeros_like(y))
params = {'max_depth': 1, 'learning_rate': 0.001, 'verbose': -1, 'objective': 'l2', 'feature_fraction': 1.0, 'seed': 0}

# built-in L2 objective
model = lgb.train(params, data, num_boost_round=1000)
model_dict = model.dump_model(num_iteration=3)['tree_info']

# equivalent custom L2 objective: gradient and hessian of 0.5 * (pred - label)^2
def fobj(predictions, dataset):
    return predictions - dataset.get_label(), np.ones_like(predictions)

model_custom = lgb.train(params, data, num_boost_round=1000, fobj=fobj)
model_custom_dict = model_custom.dump_model(num_iteration=3)['tree_info']

model_custom_dict == model_dict

This comparison is True only when feature_fraction is equal to 1.0. It breaks when feature_fraction = 0.5, for instance.

I can hack around the seeds to make it work:

import lightgbm as lgb
import numpy as np

np.random.seed(1)
X = np.random.randn(300, 300)
y = np.random.randn(300)

data = lgb.Dataset(X, label=y, init_score=np.zeros_like(y))
# feature_fraction_seed chosen so the built-in run matches the custom run below
params = {'max_depth': 1, 'learning_rate': 0.001, 'verbose': -1, 'objective': 'l2', 'feature_fraction': 0.5, 'seed': 0, 'feature_fraction_seed': 974891790}

model = lgb.train(params, data, num_boost_round=1000)
model_dict = model.dump_model(num_iteration=3)['tree_info']

def fobj(predictions, dataset):
    return predictions - dataset.get_label(), np.ones_like(predictions)

params['feature_fraction_seed'] = 2
model_custom = lgb.train(params, data, num_boost_round=1000, fobj=fobj)
model_custom_dict = model_custom.dump_model(num_iteration=3)['tree_info']

model_custom_dict == model_dict
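To pinpoint exactly where the two models first pick different columns, a small helper can walk the two dumps. This is a sketch: first_differing_tree is hypothetical, and it assumes max_depth=1 stumps, so each entry of dump_model()['tree_info'] exposes its single split as 'split_feature' at the root of 'tree_structure'.

```python
# Hypothetical helper (not part of LightGBM): report the index of the
# first tree whose root split feature differs between two dumps.
def first_differing_tree(tree_info_a, tree_info_b):
    for i, (a, b) in enumerate(zip(tree_info_a, tree_info_b)):
        fa = a['tree_structure'].get('split_feature')
        fb = b['tree_structure'].get('split_feature')
        if fa != fb:
            return i, fa, fb
    return None  # the compared prefixes agree

# Synthetic dumps standing in for model_dict / model_custom_dict,
# mimicking the shape of dump_model()['tree_info'] entries.
dump_a = [{'tree_structure': {'split_feature': f}} for f in (12, 7, 12)]
dump_b = [{'tree_structure': {'split_feature': f}} for f in (12, 7, 40)]
```

Running first_differing_tree(model_dict, model_custom_dict) on the real dumps would show at which boosting round the column selections diverge.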

Environment info

LightGBM version or commit hash: 8d01d648942a427f6bb4962dc3f4330e005fa495

Command(s) you used to install LightGBM

pip install lightgbm
