insurance-causal


Causal inference for insurance pricing, built on Double Machine Learning.


Every UK pricing team has the same argument in some form: "Is this factor causing the claims, or is it a proxy for something else?" For telematics, is harsh braking causing accidents or is it just correlated with urban driving? For renewal pricing, is the price increase causing lapse or are the customers receiving large increases systematically more likely to lapse anyway?

These are causal questions. GLM coefficients and GBM feature importances do not answer them - they measure correlation. The standard actuarial response ("we use educated judgment and check for factor stability") is honest but leaves money on the table.

Double Machine Learning (DML), introduced by Chernozhukov et al. (2018), solves this. It estimates causal treatment effects from observational data using ML to handle high-dimensional confounders, while preserving valid frequentist inference on the parameter that matters: how much does X causally affect Y?

insurance-causal wraps DoubleML with an interface designed for pricing actuaries. You specify the treatment (price change, channel flag, telematics score) and the confounders (rating factors), and it gives you a causal estimate with a confidence interval.


The killer feature: confounding bias report

A pricing team has a GLM coefficient on price change of -0.045. This is the naive estimate: price sensitivity looks very high. They fit DML and get:

report = model.confounding_bias_report(naive_coefficient=-0.045)
print(report)

  treatment         outcome  naive_estimate  causal_estimate     bias  bias_pct  ...
  pct_price_change  renewal         -0.0450          -0.0230  -0.0220    -95.7%

The naive estimate is roughly double the causal effect. The confounding mechanism: high-risk customers receive larger price increases, and those customers have lower baseline renewal rates. The price change is correlated with risk quality, so the naive regression attributes some of the risk-driven lapse to price sensitivity.

The correct causal elasticity is -0.023. Pricing decisions made using -0.045 overstate price sensitivity by roughly a factor of two. High-risk customers may still end up paying more at renewal (defensible on risk grounds), but the justification is wrong, and any retention or rate optimisation built on the naive elasticity will be systematically off.
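The mechanism is easy to reproduce in a toy simulation, independent of this library (numpy and scikit-learn only; the coefficients below are made up to mirror the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000

# Confounder: underlying risk quality.
risk = rng.normal(size=n)

# High-risk customers receive larger price increases.
d = 0.5 * risk + rng.normal(scale=0.5, size=n)

# Latent renewal propensity: the true causal price effect is -0.023,
# and risk independently depresses renewal.
theta = -0.023
y = theta * d - 0.022 * risk + rng.normal(scale=0.05, size=n)

# Naive regression of Y on D alone attributes risk-driven lapse to price.
naive = LinearRegression().fit(d[:, None], y).coef_[0]

# Residual-on-residual (the DML idea, here with a linear nuisance model):
d_res = d - LinearRegression().fit(risk[:, None], d).predict(risk[:, None])
y_res = y - LinearRegression().fit(risk[:, None], y).predict(risk[:, None])
causal = LinearRegression().fit(d_res[:, None], y_res).coef_[0]

print(f"naive:  {naive:+.4f}")   # approx -0.045
print(f"causal: {causal:+.4f}")  # approx -0.023
```

The naive slope absorbs the risk effect because risk is correlated with the price change; residualising both sides on the confounder removes it.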


Installation

uv add insurance-causal

Dependencies: doubleml, catboost, polars, pandas, scikit-learn, scipy, numpy.


Quick start

from insurance_causal import CausalPricingModel
from insurance_causal.treatments import PriceChangeTreatment

model = CausalPricingModel(
    outcome="renewal",
    outcome_type="binary",
    treatment=PriceChangeTreatment(
        column="pct_price_change",  # proportional change: 0.05 = 5% increase
        scale="log",                # transform to log(1+D); theta is semi-elasticity
    ),
    confounders=["age_band", "ncb_years", "vehicle_age", "prior_claims"],
    cv_folds=5,
)

model.fit(df)  # accepts polars or pandas DataFrame

ate = model.average_treatment_effect()
print(ate)

Output:

Average Treatment Effect
  Treatment: pct_price_change
  Outcome:   renewal
  Estimate:  -0.0231
  Std Error: 0.0041
  95% CI:    (-0.0311, -0.0151)
  p-value:   0.0000
  N:         15,000

Confounding bias report

# Compare to a naive GLM/OLS estimate
report = model.confounding_bias_report(naive_coefficient=-0.045)
print(report)

# Or pass a fitted sklearn/glum/statsmodels model directly
report = model.confounding_bias_report(glm_model=fitted_glm)

The report returns a DataFrame with: naive_estimate, causal_estimate, bias, bias_pct, and a plain-English interpretation.


Treatment types

Price change (continuous)

from insurance_causal.treatments import PriceChangeTreatment

treatment = PriceChangeTreatment(
    column="pct_price_change",   # proportional: 0.05 = 5% increase
    scale="log",                 # "log" or "linear"
    clip_percentiles=(0.01, 0.99),  # optional: clip extreme values
)

Binary treatment (channel, discount flag, product type)

from insurance_causal.treatments import BinaryTreatment

treatment = BinaryTreatment(
    column="is_aggregator",
    positive_label="aggregator",
    negative_label="direct",
)

Generic continuous (telematics score, credit score)

from insurance_causal.treatments import ContinuousTreatment

treatment = ContinuousTreatment(
    column="harsh_braking_score",
    standardise=True,  # coefficient = effect of 1 SD change
)

Outcome types

CausalPricingModel(outcome_type="binary")      # renewal indicator, conversion
CausalPricingModel(outcome_type="poisson")     # claim count (divide by exposure if exposure_col set)
CausalPricingModel(outcome_type="continuous")  # log loss cost, any symmetric continuous outcome
CausalPricingModel(outcome_type="gamma")       # claim severity (log-transformed internally)

For Poisson frequency, set exposure_col:

model = CausalPricingModel(
    outcome="claim_count",
    outcome_type="poisson",
    exposure_col="earned_years",
    ...
)

CATE by segment

Average treatment effects within subgroups. Fits a separate DML model per segment - computationally expensive but gives segment-level inference.

cate = model.cate_by_segment(df, segment_col="age_band")
# Returns DataFrame: segment, cate_estimate, ci_lower, ci_upper, std_error, p_value, n_obs

Or by decile of a risk score:

from insurance_causal.diagnostics import cate_by_decile

cate = cate_by_decile(model, df, score_col="predicted_frequency", n_deciles=10)

Sensitivity analysis

How strong would an unobserved confounder need to be to overturn the result?

from insurance_causal.diagnostics import sensitivity_analysis

ate = model.average_treatment_effect()
report = sensitivity_analysis(
    ate=ate.estimate,
    se=ate.std_error,
    gamma_values=[1.0, 1.25, 1.5, 2.0, 3.0],
)
print(report[["gamma", "conclusion_holds", "ci_lower", "ci_upper"]])

The Rosenbaum parameter gamma is the odds ratio of treatment for two units with identical observed confounders. Gamma = 1 is no unobserved confounding; Gamma = 2 means an unobserved factor doubles the treatment odds for some units. If conclusion_holds becomes False at Gamma = 1.25, the result is fragile. If it holds to Gamma = 2.0, the result is robust.
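To build intuition for what "an unobserved confounder strong enough to overturn the result" means, a small numpy-only simulation (independent of the library, with made-up coefficients) can sweep the strength of an omitted variable and watch a confounder-blind estimate drift away from a true effect of -0.023:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
theta = -0.023

# u is an unobserved confounder that pushes the price change up
# and renewal down; the estimate below never sees it.
ests = {}
for strength in [0.0, 0.25, 0.5, 1.0]:
    u = rng.normal(size=n)
    d = strength * u + rng.normal(size=n)
    y = theta * d - 0.02 * strength * u + rng.normal(scale=0.05, size=n)
    ests[strength] = (d @ y) / (d @ d)  # OLS slope of Y on D, omitting u
    print(f"strength={strength:.2f}  estimate={ests[strength]:+.4f}")
```

At strength zero the estimate recovers -0.023; as the omitted confounder strengthens, the estimate drifts steadily more negative, which is exactly the fragility the gamma sweep quantifies.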


The maths, briefly

DML estimates the partially linear model:

Y = theta_0 * D + g_0(X) + epsilon
D = m_0(X) + V

Where theta_0 is the causal effect of treatment D on outcome Y, g_0(X) is an unknown nonlinear confounder effect, and m_0(X) is the conditional expectation of treatment given confounders.

The estimation procedure:

  1. Fit E[Y|X] using CatBoost (with 5-fold cross-fitting). Compute residuals Y_tilde = Y - E_hat[Y|X].
  2. Fit E[D|X] using CatBoost (with 5-fold cross-fitting). Compute residuals D_tilde = D - E_hat[D|X].
  3. Regress Y_tilde on D_tilde via OLS. The coefficient is theta_hat.

Step 3 is just OLS, which gives valid standard errors and confidence intervals. The cross-fitting in steps 1-2 prevents the overfitting bias that comes from estimating the nuisances and theta on the same observations, and the residual-on-residual score is Neyman orthogonal: first-order errors in the nuisance estimates do not propagate into theta_hat. This is what makes DML valid even when the nuisance models are regularised ML estimators.

The result: theta_hat is root-n-consistent and asymptotically normal, with a valid 95% CI. This is not possible with naive ML plug-in estimators.
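The three-step procedure can be sketched end to end in scikit-learn, with GradientBoostingRegressor standing in for CatBoost (a minimal illustration of the algorithm, not this library's implementation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_plm(y, d, X, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in Y = theta*D + g(X) + eps."""
    y_res = np.empty_like(y)
    d_res = np.empty_like(d)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Steps 1-2: fit nuisances on the other folds, residualise on this one.
        gy = GradientBoostingRegressor(random_state=seed).fit(X[train], y[train])
        gd = GradientBoostingRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - gy.predict(X[test])
        d_res[test] = d[test] - gd.predict(X[test])
    # Step 3: OLS of residualised Y on residualised D, sandwich-style SE.
    theta = (d_res @ y_res) / (d_res @ d_res)
    eps = y_res - theta * d_res
    se = np.sqrt(np.sum((d_res * eps) ** 2)) / (d_res @ d_res)
    return theta, se

# Synthetic check: nonlinear confounding through X[:, 0], true theta = 0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 2))
d = np.sin(X[:, 0]) + rng.normal(size=4000)
y = 0.5 * d + X[:, 0] ** 2 + rng.normal(size=4000)
theta, se = dml_plm(y, d, X)
print(f"theta = {theta:.3f} +/- {1.96 * se:.3f}")
```

Despite the nonlinear confounding, the cross-fitted estimate lands close to 0.5 with an honest standard error; a naive regression of y on d alone would not.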


Why CatBoost for nuisance models?

The nuisance models E[Y|X] and E[D|X] need to be flexible nonlinear estimators that converge at n^{-1/4} or faster - a condition satisfied by well-tuned gradient boosted trees. A 2024 systematic evaluation (ArXiv 2403.14385) found that gradient boosted trees outperform LASSO in the DML nuisance step when confounding is genuinely nonlinear - as it is for insurance data, with postcode effects and age-by-vehicle-type interactions.

CatBoost is the default because it handles categorical features natively (postcode band, vehicle group, occupation class) without label encoding, and its ordered boosting reduces target leakage from high-cardinality categoricals. The nuisance model architecture: 500 trees, depth 6, learning rate 0.05. This is more conservative than a typical predictive model but appropriate for the debiasing goal.


Limitations

Unobserved confounders. DML is only as good as the assumption that all relevant confounders are in the confounders list. If attitude to risk, actual annual mileage, or claim reporting behaviour are confounders and you do not observe them, the estimate is biased. Use sensitivity_analysis() to understand how fragile the result is to this assumption.

Near-deterministic treatment. If price changes are almost entirely determined by the pricing model (i.e. D is very close to a deterministic function of X), the residualised treatment D_tilde will have near-zero variance. The DML estimate will be imprecise and the confidence interval wide. This is correct behaviour - the data genuinely contain little exogenous variation to identify the causal effect. The solution is to include genuinely exogenous sources of variation: manual underwriting decisions, competitive environment shocks, or timing effects.
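A quick pre-flight check for this failure mode (a sketch, not part of this library's API - residual_treatment_share is a hypothetical helper): residualise the treatment on the confounders and see how much of its variance survives.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

def residual_treatment_share(d, X, n_folds=5):
    """Fraction of treatment variance surviving residualisation on X (cross-fitted)."""
    d_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, d, cv=n_folds)
    return np.var(d - d_hat) / np.var(d)

# Price changes almost fully determined by rating factors: little variance survives.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
d = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.05, size=2000)
share = residual_treatment_share(d, X)
print(f"{share:.1%} of treatment variance is exogenous")
```

If the share is close to zero, DML cannot say much: there is no quasi-experimental variation left to identify the effect.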

Mediators vs. confounders. Including a mediator (a variable causally downstream of treatment) as a confounder is the "bad controls" problem - it blocks the causal channel you are trying to measure. If NCB is partly caused by the claim experience that is itself caused by the risk factors you are studying, including NCB as a confounder will attenuate your estimate. Think carefully about the causal graph before specifying confounders.

Large datasets. DML with CatBoost and 5-fold cross-fitting is moderately expensive. On 100k observations with 10 confounders, expect 5-15 minutes on a standard Databricks cluster. Use fewer CV folds (cv_folds=3) for exploratory work.


References

  1. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters." The Econometrics Journal, 21(1): C1-C68. ArXiv: 1608.00060

  2. Bach, P., Chernozhukov, V., Kurz, M.S., Spindler, M. and Klaassen, S. (2024). "DoubleML: An Object-Oriented Implementation of Double Machine Learning in R." Journal of Statistical Software, 108(3): 1-56. docs.doubleml.org

  3. Guelman, L. and Guillen, M. (2014). "A causal inference approach to measure price elasticity in automobile insurance." Expert Systems with Applications, 41(2): 387-396.

  4. Chernozhukov, V. et al. (2024). "Applied Causal Inference Powered by ML and AI." causalml-book.org


Other Burning Cost libraries

Model building

Library                  Description
shap-relativities        Extract rating relativities from GBMs using SHAP
insurance-interactions   Automated GLM interaction detection via CANN and NID scores
insurance-cv             Walk-forward cross-validation respecting IBNR structure

Uncertainty quantification

Library              Description
insurance-conformal  Distribution-free prediction intervals for Tweedie models
bayesian-pricing     Hierarchical Bayesian models for thin-data segments
credibility          Bühlmann-Straub credibility weighting

Deployment and optimisation

Library           Description
rate-optimiser    Constrained rate change optimisation with FCA PS21/5 compliance
insurance-demand  Conversion, retention, and price elasticity modelling

Governance

Library               Description
insurance-fairness    Proxy discrimination auditing for UK insurance models
insurance-monitoring  Model monitoring: PSI, A/E ratios, Gini drift test

Spatial

Library            Description
insurance-spatial  BYM2 spatial territory ratemaking for UK personal lines

All libraries →


Licence

MIT. Part of the Burning Cost insurance pricing toolkit.
