Meta-Learner Example Notebook

CausalML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of an intervention T on an outcome Y for users with observed features X, without strong assumptions on the model form. The package supports a range of methods, including the meta-learner algorithms demonstrated in this notebook.
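For reference, in the standard potential-outcomes notation the CATE is the expected treatment effect conditional on features,

$$\tau(x) = E\left[Y(1) - Y(0) \mid X = x\right],$$

and the ITE is its individual-level analogue $Y_i(1) - Y_i(0)$, which the meta-learners below approximate with $\hat{\tau}(X_i)$.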
In this notebook, we will generate some synthetic data to demonstrate how to use the various Meta-Learner algorithms in order to estimate Individual Treatment Effects (and Average Treatment Effects with confidence intervals).
import os
# Change the working directory to the repository root so the local package imports resolve
base_path = os.path.abspath("../")
os.chdir(base_path)
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from xgboost import XGBRegressor
import warnings
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
from causalml.match import NearestNeighborMatch, MatchOptimizer, create_table_one
from causalml.propensity import ElasticNetPropensityModel
from causalml.dataset import *
from causalml.metrics import *
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
%matplotlib inline
import causalml
print(causalml.__version__)
0.6.0
The data generation process is selected via the mode argument of synthetic_data(). Refer to the References section for more detail on these data generation processes.
# Generate synthetic data using mode 1
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=8, sigma=1.0)
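Here synthetic_data() returns the outcome y, the feature matrix X, the treatment assignment vector, the true per-sample treatment effect tau, the expected outcome b, and the true propensity score e. A quick sanity check (a minimal sketch based on the call above):
# Shapes follow the n and p arguments above; tau holds the known ground-truth effects
print(X.shape, y.shape, treatment.shape)
print('True ATE: {:.3f}'.format(tau.mean()))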
A meta-learner can be instantiated by calling a base learner class and providing an sklearn/xgboost regressor class as input. Alternatively, we have provided some ready-to-use learners that have already inherited their respective base learner class capabilities. This is more abstracted and allows these tools to be quickly and readily usable.
# Ready-to-use S-Learner using LinearRegression
learner_s = LRSRegressor()
ate_s = learner_s.estimate_ate(X=X, treatment=treatment, y=y)
print(ate_s)
print('ATE estimate: {:.03f}'.format(ate_s[0][0]))
print('ATE lower bound: {:.03f}'.format(ate_s[1][0]))
print('ATE upper bound: {:.03f}'.format(ate_s[2][0]))
(array([0.67948682]), array([0.63168778]), array([0.72728586]))
ATE estimate: 0.679
ATE lower bound: 0.632
ATE upper bound: 0.727
# Ready-to-use T-Learner using XGB
learner_t = XGBTRegressor()
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the ready-to-use XGBTRegressor class')
print(ate_t)
# Calling the Base Learner class and feeding in XGB
learner_t = BaseTRegressor(learner=XGBRegressor())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using XGB (same result):')
print(ate_t)
# Calling the Base Learner class and feeding in LinearRegression
learner_t = BaseTRegressor(learner=LinearRegression())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using Linear Regression (different result):')
print(ate_t)
Using the ready-to-use XGBTRegressor class
(array([0.58515988]), array([0.54731816]), array([0.6230016]))

Using the BaseTRegressor class and using XGB (same result):
(array([0.58515988]), array([0.54731816]), array([0.6230016]))

Using the BaseTRegressor class and using Linear Regression (different result):
(array([0.66952597]), array([0.62869069]), array([0.71036124]))
# X Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(learner=XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseXRegressor class and using XGB:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('\nUsing the BaseXRegressor class and using Linear Regression:')
print(ate_x)
Using the BaseXRegressor class and using XGB:
(array([0.48938315]), array([0.45231978]), array([0.52644652]))

Using the BaseXRegressor class and using Linear Regression:
(array([0.65372048]), array([0.61345327]), array([0.69398768]))
# X Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseXRegressor class and using XGB without propensity score input:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseXRegressor class and using Linear Regression without propensity score input:')
print(ate_x)
Using the BaseXRegressor class and using XGB without propensity score input:
(array([0.49405919]), array([0.45699449]), array([0.53112388]))

Using the BaseXRegressor class and using Linear Regression without propensity score input:
(array([0.65372048]), array([0.61345327]), array([0.69398768]))
# R Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using XGB:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using Linear Regression:')
print(ate_r)
Using the BaseRRegressor class and using XGB:
(array([0.4625999]), array([0.46219818]), array([0.46300161]))
Using the BaseRRegressor class and using Linear Regression:
(array([0.45658713]), array([0.45617783]), array([0.45699643]))
# R Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using XGB without propensity score input:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using Linear Regression without propensity score input:')
print(ate_r)
Using the BaseRRegressor class and using XGB without propensity score input:
(array([0.43475727]), array([0.43435639]), array([0.43515815]))
Using the BaseRRegressor class and using Linear Regression without propensity score input:
(array([0.53967039]), array([0.53926028]), array([0.5400805]))
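When p is not supplied, the learner estimates the propensity score internally. A propensity model can also be fit explicitly with the ElasticNetPropensityModel imported above and its scores passed in; a minimal sketch, assuming fit_predict(X, treatment) returns the estimated treatment probabilities:
# Fit a regularized logistic regression propensity model and reuse its scores (illustrative sketch)
pm = ElasticNetPropensityModel()
p_hat = pm.fit_predict(X, treatment)
learner_x = BaseXRegressor(learner=XGBRegressor())
print(learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=p_hat))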
CATE stands for Conditional Average Treatment Effect.
# S Learner
learner_s = LRSRegressor()
cate_s = learner_s.fit_predict(X=X, treatment=treatment, y=y)
# T Learner
learner_t = BaseTRegressor(learner=XGBRegressor())
cate_t = learner_t.fit_predict(X=X, treatment=treatment, y=y)
# X Learner with propensity score input
learner_x = BaseXRegressor(learner=XGBRegressor())
cate_x = learner_x.fit_predict(X=X, treatment=treatment, y=y, p=e)
# X Learner without propensity score input
learner_x_no_p = BaseXRegressor(learner=XGBRegressor())
cate_x_no_p = learner_x_no_p.fit_predict(X=X, treatment=treatment, y=y)
# R Learner with propensity score input
learner_r = BaseRRegressor(learner=XGBRegressor())
cate_r = learner_r.fit_predict(X=X, treatment=treatment, y=y, p=e)
# R Learner without propensity score input
learner_r_no_p = BaseRRegressor(learner=XGBRegressor())
cate_r_no_p = learner_r_no_p.fit_predict(X=X, treatment=treatment, y=y)
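Each fit_predict() call above returns an array of per-sample effect estimates (one column per treatment group). As a quick sanity check (a minimal sketch using the variables computed above), the mean of each learner's CATE predictions should be close to its ATE estimate from the previous section:
# Per-sample CATE arrays: one row per sample, one column per treatment group
print(cate_t.shape)
for name, cate in [('S', cate_s), ('T', cate_t), ('X', cate_x), ('R', cate_r)]:
    print('{} Learner mean CATE: {:.3f}'.format(name, np.mean(cate)))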
alpha=0.2
bins=30
plt.figure(figsize=(12,8))
plt.hist(cate_t, alpha=alpha, bins=bins, label='T Learner')
plt.hist(cate_x, alpha=alpha, bins=bins, label='X Learner')
plt.hist(cate_x_no_p, alpha=alpha, bins=bins, label='X Learner (no propensity score)')
plt.hist(cate_r, alpha=alpha, bins=bins, label='R Learner')
plt.hist(cate_r_no_p, alpha=alpha, bins=bins, label='R Learner (no propensity score)')
plt.vlines(cate_s[0], 0, plt.gca().get_ylim()[1], label='S Learner',
           linestyles='dotted', colors='green', linewidth=2)
plt.title('Distribution of CATE Predictions by Meta Learner')
plt.xlabel('Individual Treatment Effect (ITE/CATE)')
plt.ylabel('# of Samples')
_=plt.legend()
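Note that the S-Learner with linear regression estimates a single, constant treatment effect (the coefficient on the treatment indicator), which is why it appears as a dotted vertical line rather than a distribution above. A quick check (a minimal sketch using cate_s from above):
# The LRSRegressor CATE is constant across samples, so only one unique (rounded) value is expected
print(np.unique(np.round(cate_s, 6)))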
We will validate the meta-learners' performance based on the same synthetic data generation method as in Part A (simulate_nuisance_and_easy_treatment).
train_summary, validation_summary = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment,
n=10000,
valid_size=0.2,
k=10)
train_summary
|                 | Abs % Error of ATE | MSE      | KL Divergence |
|-----------------|--------------------|----------|---------------|
| Actuals         | 0.000000           | 0.000000 | 0.000000      |
| S Learner (LR)  | 0.363843           | 0.075464 | 3.703878      |
| S Learner (XGB) | 0.134768           | 0.023479 | 0.165504      |
| T Learner (LR)  | 0.349172           | 0.035558 | 0.312561      |
| T Learner (XGB) | 0.158056           | 0.072879 | 0.274262      |
| X Learner (LR)  | 0.349172           | 0.035558 | 0.312561      |
| X Learner (XGB) | 0.074852           | 0.033084 | 0.077669      |
| R Learner (LR)  | 0.299947           | 0.036066 | 0.297235      |
| R Learner (XGB) | 0.064613           | 0.061360 | 0.066290      |
validation_summary
|                 | Abs % Error of ATE | MSE      | KL Divergence |
|-----------------|--------------------|----------|---------------|
| Actuals         | 0.000000           | 0.000000 | 0.000000      |
| S Learner (LR)  | 0.369756           | 0.076275 | 3.741560      |
| S Learner (XGB) | 0.151102           | 0.026515 | 0.207891      |
| T Learner (LR)  | 0.347122           | 0.034516 | 0.330129      |
| T Learner (XGB) | 0.174249           | 0.071294 | 0.323597      |
| X Learner (LR)  | 0.347122           | 0.034516 | 0.330129      |
| X Learner (XGB) | 0.075413           | 0.033826 | 0.094754      |
| R Learner (LR)  | 0.299680           | 0.031474 | 0.296118      |
| R Learner (XGB) | 0.054303           | 0.051122 | 0.077068      |
scatter_plot_summary_holdout(train_summary,
validation_summary,
k=10,
label=['Train', 'Validation'],
drop_learners=[],
drop_cols=[])
bar_plot_summary_holdout(train_summary,
validation_summary,
k=10,
drop_learners=['S Learner (LR)'],
drop_cols=[])
# Single simulation
train_preds, valid_preds = get_synthetic_preds_holdout(simulate_nuisance_and_easy_treatment,
n=50000,
valid_size=0.2)
# Distribution plot for a single simulation of training data
distr_plot_single_sim(train_preds, kind='kde', linewidth=2, bw_method=0.5,
                      drop_learners=['S Learner (LR)', 'S Learner (XGB)'])
# Distribution plot for a single simulation of validation data
distr_plot_single_sim(valid_preds, kind='kde', linewidth=2, bw_method=0.5,
                      drop_learners=['S Learner (LR)', 'S Learner (XGB)'])
# Scatter Plots for a Single Simulation of Training Data
scatter_plot_single_sim(train_preds)
# Scatter Plots for a Single Simulation of Validation Data
scatter_plot_single_sim(valid_preds)
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(train_preds, drop_learners=['S Learner (LR)'])
|   | Learner         | cum_gain_auuc |
|---|-----------------|---------------|
| 0 | Actuals         | 12387.871130  |
| 2 | T Learner (LR)  | 12379.946821  |
| 4 | X Learner (LR)  | 12379.946821  |
| 6 | R Learner (LR)  | 12379.688978  |
| 5 | X Learner (XGB) | 12160.326688  |
| 1 | S Learner (XGB) | 12091.486764  |
| 3 | T Learner (XGB) | 12084.763954  |
| 7 | R Learner (XGB) | 12019.181076  |
| 8 | Random          | 10054.192265  |
# Cumulative Gain AUUC values for a Single Simulation of Validation Data
get_synthetic_auuc(valid_preds, drop_learners=['S Learner (LR)'])
|   | Learner         | cum_gain_auuc |
|---|-----------------|---------------|
| 0 | Actuals         | 3081.614293   |
| 2 | T Learner (LR)  | 3079.658839   |
| 4 | X Learner (LR)  | 3079.658839   |
| 6 | R Learner (LR)  | 3079.581622   |
| 5 | X Learner (XGB) | 3024.527826   |
| 1 | S Learner (XGB) | 3007.237997   |
| 3 | T Learner (XGB) | 3005.787684   |
| 7 | R Learner (XGB) | 2987.524469   |
| 8 | Random          | 2491.192935   |