Description
I ran into this because I got different column-subsampling behaviour when using a built-in objective like MSE than when using a custom objective. I suspected that either the seed reported in booster.txt was wrong, or that columns are subsampled twice per iteration when using a custom objective but only once when using a built-in objective (perhaps via an extra call to 'ResetByTree').
Reproducible example
import lightgbm as lgb
import numpy as np
import pandas as pd
np.random.seed(1)
X = np.random.randn(300, 300)
y = np.random.randn(300)
data = lgb.Dataset(X, label=y, init_score=np.zeros_like(y))
params = {'max_depth': 1, 'learning_rate': 0.001, 'verbose': -1, 'objective': 'l2', 'feature_fraction': 1.0, 'seed': 0}
model = lgb.train(params, data, num_boost_round=1000)
model_dict = model.dump_model(num_iteration=3)['tree_info']
def fobj(predictions, dataset):
    # Gradient and hessian of the l2 loss, matching the built-in objective.
    return predictions - dataset.get_label(), np.ones_like(predictions)
model_custom = lgb.train(params, data, num_boost_round=1000, fobj=fobj)
model_custom_dict = model_custom.dump_model(num_iteration=3)['tree_info']
print(model_custom_dict == model_dict)  # True: the first three trees are identical
This works only when feature_fraction is 1.0. It breaks when feature_fraction = 0.5, for instance.
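That pattern is consistent with the hypothesis above: with feature_fraction = 1.0 no columns are sampled, so an extra RNG call per iteration is invisible, while with subsampling enabled it shifts every subsequent draw. A minimal sketch (my own toy model, not LightGBM's actual sampler) of why one wasted draw per iteration changes all later column subsets:

```python
import random

def column_subsets(seed, draws_per_iter, n_iters=3, n_cols=6, k=3):
    """Toy per-iteration column sampler. Only the last draw of each
    iteration is kept; extra draws model a wasted RNG call per tree."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_iters):
        for _ in range(draws_per_iter):
            subset = sorted(rng.sample(range(n_cols), k))
        subsets.append(subset)
    return subsets

# One draw per iteration vs two: the same seed consumes the RNG stream
# at different rates, so the kept subsets will generally diverge.
print(column_subsets(0, draws_per_iter=1))
print(column_subsets(0, draws_per_iter=2))
```

With k = n_cols (the analogue of feature_fraction = 1.0), every draw returns all columns and the divergence is invisible, which matches the observed behaviour.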
I can hack around the seeds to make it work:
import lightgbm as lgb
import numpy as np
import pandas as pd
np.random.seed(1)
X = np.random.randn(300, 300)
y = np.random.randn(300)
data = lgb.Dataset(X, label=y, init_score=np.zeros_like(y))
params = {'max_depth': 1, 'learning_rate': 0.001, 'verbose': -1, 'objective': 'l2', 'feature_fraction': 0.5, 'seed': 0, 'feature_fraction_seed': 974891790}
model = lgb.train(params, data, num_boost_round=1000)
model_dict = model.dump_model(num_iteration=3)['tree_info']
def fobj(predictions, dataset):
    # Gradient and hessian of the l2 loss, matching the built-in objective.
    return predictions - dataset.get_label(), np.ones_like(predictions)
params['feature_fraction_seed'] = 2
model_custom = lgb.train(params, data, num_boost_round=1000, fobj=fobj)
model_custom_dict = model_custom.dump_model(num_iteration=3)['tree_info']
print(model_custom_dict == model_dict)  # True once the seeds are patched
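To see exactly which columns each run picked, a small helper (my own, not part of LightGBM's API) can pull the root split out of each dumped tree; since max_depth=1 here, the root's 'split_feature' is the single column used per iteration:

```python
def root_split_features(tree_info):
    """Given dump_model()['tree_info'] (a list of per-tree dicts),
    return the root node's split feature for each depth-1 tree.
    .get() is used because pure-leaf trees have no 'split_feature'."""
    return [tree['tree_structure'].get('split_feature') for tree in tree_info]
```

Comparing `root_split_features(model_dict)` against `root_split_features(model_custom_dict)` shows directly whether the two objectives sampled the same columns, which is easier to eyeball than comparing the full dumped dicts.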
Environment info
LightGBM version or commit hash: 8d01d648942a427f6bb4962dc3f4330e005fa495
Command(s) you used to install LightGBM