Meta-Learner Example Notebook

CausalML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of an intervention T on an outcome Y for users with observed features X, without strong assumptions on the model form. The package supports a range of methods, including the meta-learner algorithms demonstrated in this notebook.
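For reference, in the standard potential-outcomes notation the CATE is the expected treatment effect conditional on features,

$$\tau(x) = E\left[Y(1) - Y(0) \mid X = x\right],$$

and the ITE is its individual-level analogue $Y_i(1) - Y_i(0)$, which the meta-learners below approximate with $\hat{\tau}(X_i)$.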
In this notebook, we will generate some synthetic data to demonstrate how to use the various Meta-Learner algorithms in order to estimate Individual Treatment Effects (and Average Treatment Effects with confidence intervals).
import os
# Change the working directory to the repository root so the local package imports resolve
base_path = os.path.abspath("../")
os.chdir(base_path)
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from xgboost import XGBRegressor
import warnings
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
from causalml.match import NearestNeighborMatch, MatchOptimizer, create_table_one
from causalml.propensity import ElasticNetPropensityModel
from causalml.dataset import *
from causalml.metrics import *
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
%matplotlib inline
import causalml
print(causalml.__version__)
0.6.0
The data generation process is selected via the mode argument of synthetic_data(). Refer to the References section for more detail on these data generation processes.
# Generate synthetic data using mode 1
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=8, sigma=1.0)
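Here synthetic_data() returns the outcome y, the feature matrix X, the treatment assignment vector, the true per-sample treatment effect tau, the expected outcome b, and the true propensity score e. A quick sanity check (a minimal sketch based on the call above):
# Shapes follow the n and p arguments above; tau holds the known ground-truth effects
print(X.shape, y.shape, treatment.shape)
print('True ATE: {:.3f}'.format(tau.mean()))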
A meta-learner can be instantiated by calling a base learner class and providing an sklearn/xgboost regressor class as input. Alternatively, we have provided some ready-to-use learners that have already inherited their respective base learner class capabilities. This is more abstracted and allows these tools to be quickly and readily usable.
# Ready-to-use S-Learner using LinearRegression
learner_s = LRSRegressor()
ate_s = learner_s.estimate_ate(X=X, treatment=treatment, y=y)
print(ate_s)
print('ATE estimate: {:.03f}'.format(ate_s[0][0]))
print('ATE lower bound: {:.03f}'.format(ate_s[1][0]))
print('ATE upper bound: {:.03f}'.format(ate_s[2][0]))
(array([0.67948682]), array([0.63168778]), array([0.72728586]))
ATE estimate: 0.679
ATE lower bound: 0.632
ATE upper bound: 0.727
# Ready-to-use T-Learner using XGB
learner_t = XGBTRegressor()
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the ready-to-use XGBTRegressor class')
print(ate_t)
# Calling the Base Learner class and feeding in XGB
learner_t = BaseTRegressor(learner=XGBRegressor())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using XGB (same result):')
print(ate_t)
# Calling the Base Learner class and feeding in LinearRegression
learner_t = BaseTRegressor(learner=LinearRegression())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using Linear Regression (different result):')
print(ate_t)
Using the ready-to-use XGBTRegressor class
(array([0.58515988]), array([0.54731816]), array([0.6230016]))

Using the BaseTRegressor class and using XGB (same result):
(array([0.58515988]), array([0.54731816]), array([0.6230016]))

Using the BaseTRegressor class and using Linear Regression (different result):
(array([0.66952597]), array([0.62869069]), array([0.71036124]))
# X Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(learner=XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseXRegressor class and using XGB:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('\nUsing the BaseXRegressor class and using Linear Regression:')
print(ate_x)
Using the BaseXRegressor class and using XGB:
(array([0.48938315]), array([0.45231978]), array([0.52644652]))

Using the BaseXRegressor class and using Linear Regression:
(array([0.65372048]), array([0.61345327]), array([0.69398768]))
# X Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseXRegressor class and using XGB without propensity score input:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseXRegressor class and using Linear Regression without propensity score input:')
print(ate_x)
Using the BaseXRegressor class and using XGB without propensity score input:
(array([0.49405919]), array([0.45699449]), array([0.53112388]))

Using the BaseXRegressor class and using Linear Regression without propensity score input:
(array([0.65372048]), array([0.61345327]), array([0.69398768]))
# R Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using XGB:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using Linear Regression:')
print(ate_r)
Using the BaseRRegressor class and using XGB:
(array([0.4625999]), array([0.46219818]), array([0.46300161]))
Using the BaseRRegressor class and using Linear Regression:
(array([0.45658713]), array([0.45617783]), array([0.45699643]))
# R Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using XGB without propensity score input:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using Linear Regression without propensity score input:')
print(ate_r)
Using the BaseRRegressor class and using XGB without propensity score input:
(array([0.43475727]), array([0.43435639]), array([0.43515815]))
Using the BaseRRegressor class and using Linear Regression without propensity score input:
(array([0.53967039]), array([0.53926028]), array([0.5400805]))
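When p is not supplied, the learner estimates the propensity score internally. A propensity model can also be fit explicitly with the ElasticNetPropensityModel imported above and its scores passed in; a minimal sketch, assuming fit_predict(X, treatment) returns the estimated treatment probabilities:
# Fit a regularized logistic regression propensity model and reuse its scores (illustrative sketch)
pm = ElasticNetPropensityModel()
p_hat = pm.fit_predict(X, treatment)
learner_x = BaseXRegressor(learner=XGBRegressor())
print(learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=p_hat))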
CATE stands for Conditional Average Treatment Effect.
# S Learner
learner_s = LRSRegressor()
cate_s = learner_s.fit_predict(X=X, treatment=treatment, y=y)
# T Learner
learner_t = BaseTRegressor(learner=XGBRegressor())
cate_t = learner_t.fit_predict(X=X, treatment=treatment, y=y)
# X Learner with propensity score input
learner_x = BaseXRegressor(learner=XGBRegressor())
cate_x = learner_x.fit_predict(X=X, treatment=treatment, y=y, p=e)
# X Learner without propensity score input
learner_x_no_p = BaseXRegressor(learner=XGBRegressor())
cate_x_no_p = learner_x_no_p.fit_predict(X=X, treatment=treatment, y=y)
# R Learner with propensity score input
learner_r = BaseRRegressor(learner=XGBRegressor())
cate_r = learner_r.fit_predict(X=X, treatment=treatment, y=y, p=e)
# R Learner without propensity score input
learner_r_no_p = BaseRRegressor(learner=XGBRegressor())
cate_r_no_p = learner_r_no_p.fit_predict(X=X, treatment=treatment, y=y)
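Each fit_predict() call above returns an array of per-sample effect estimates (one column per treatment group). As a quick sanity check (a minimal sketch using the variables computed above), the mean of each learner's CATE predictions should be close to its ATE estimate from the previous section:
# Per-sample CATE arrays: one row per sample, one column per treatment group
print(cate_t.shape)
for name, cate in [('S', cate_s), ('T', cate_t), ('X', cate_x), ('R', cate_r)]:
    print('{} Learner mean CATE: {:.3f}'.format(name, np.mean(cate)))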
alpha=0.2
bins=30
plt.figure(figsize=(12,8))
plt.hist(cate_t, alpha=alpha, bins=bins, label='T Learner')
plt.hist(cate_x, alpha=alpha, bins=bins, label='X Learner')
plt.hist(cate_x_no_p, alpha=alpha, bins=bins, label='X Learner (no propensity score)')
plt.hist(cate_r, alpha=alpha, bins=bins, label='R Learner')
plt.hist(cate_r_no_p, alpha=alpha, bins=bins, label='R Learner (no propensity score)')
plt.vlines(cate_s[0], 0, plt.gca().get_ylim()[1], label='S Learner',
           linestyles='dotted', colors='green', linewidth=2)
plt.title('Distribution of CATE Predictions by Meta Learner')
plt.xlabel('Individual Treatment Effect (ITE/CATE)')
plt.ylabel('# of Samples')
_=plt.legend()
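Note that the S-Learner with linear regression estimates a single, constant treatment effect (the coefficient on the treatment indicator), which is why it appears as a dotted vertical line rather than a distribution above. A quick check (a minimal sketch using cate_s from above):
# The LRSRegressor CATE is constant across samples, so only one unique (rounded) value is expected
print(np.unique(np.round(cate_s, 6)))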
We will validate the meta-learners' performance based on the same synthetic data generation method as in Part A (simulate_nuisance_and_easy_treatment).
train_summary, validation_summary = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment,
n=10000,
valid_size=0.2,
k=10)
train_summary
|                 | Abs % Error of ATE | MSE      | KL Divergence |
|-----------------|--------------------|----------|---------------|
| Actuals         | 0.000000           | 0.000000 | 0.000000      |
| S Learner (LR)  | 0.363843           | 0.075464 | 3.703878      |
| S Learner (XGB) | 0.134768           | 0.023479 | 0.165504      |
| T Learner (LR)  | 0.349172           | 0.035558 | 0.312561      |
| T Learner (XGB) | 0.158056           | 0.072879 | 0.274262      |
| X Learner (LR)  | 0.349172           | 0.035558 | 0.312561      |
| X Learner (XGB) | 0.074852           | 0.033084 | 0.077669      |
| R Learner (LR)  | 0.299947           | 0.036066 | 0.297235      |
| R Learner (XGB) | 0.064613           | 0.061360 | 0.066290      |
validation_summary
|                 | Abs % Error of ATE | MSE      | KL Divergence |
|-----------------|--------------------|----------|---------------|
| Actuals         | 0.000000           | 0.000000 | 0.000000      |
| S Learner (LR)  | 0.369756           | 0.076275 | 3.741560      |
| S Learner (XGB) | 0.151102           | 0.026515 | 0.207891      |
| T Learner (LR)  | 0.347122           | 0.034516 | 0.330129      |
| T Learner (XGB) | 0.174249           | 0.071294 | 0.323597      |
| X Learner (LR)  | 0.347122           | 0.034516 | 0.330129      |
| X Learner (XGB) | 0.075413           | 0.033826 | 0.094754      |
| R Learner (LR)  | 0.299680           | 0.031474 | 0.296118      |
| R Learner (XGB) | 0.054303           | 0.051122 | 0.077068      |
scatter_plot_summary_holdout(train_summary,
validation_summary,
k=10,
label=['Train', 'Validation'],
drop_learners=[],
drop_cols=[])
bar_plot_summary_holdout(train_summary,
validation_summary,
k=10,
drop_learners=['S Learner (LR)'],
drop_cols=[])
# Single simulation
train_preds, valid_preds = get_synthetic_preds_holdout(simulate_nuisance_and_easy_treatment,
n=50000,
valid_size=0.2)
# Distribution plot for a single simulation of training data
distr_plot_single_sim(train_preds, kind='kde', linewidth=2, bw_method=0.5,
                      drop_learners=['S Learner (LR)', 'S Learner (XGB)'])
# Distribution plot for a single simulation of validation data
distr_plot_single_sim(valid_preds, kind='kde', linewidth=2, bw_method=0.5,
                      drop_learners=['S Learner (LR)', 'S Learner (XGB)'])
# Scatter Plots for a Single Simulation of Training Data
scatter_plot_single_sim(train_preds)
# Scatter Plots for a Single Simulation of Validation Data
scatter_plot_single_sim(valid_preds)
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(train_preds, drop_learners=['S Learner (LR)'])
|   | Learner         | cum_gain_auuc |
|---|-----------------|---------------|
| 0 | Actuals         | 12387.871130  |
| 2 | T Learner (LR)  | 12379.946821  |
| 4 | X Learner (LR)  | 12379.946821  |
| 6 | R Learner (LR)  | 12379.688978  |
| 5 | X Learner (XGB) | 12160.326688  |
| 1 | S Learner (XGB) | 12091.486764  |
| 3 | T Learner (XGB) | 12084.763954  |
| 7 | R Learner (XGB) | 12019.181076  |
| 8 | Random          | 10054.192265  |
# Cumulative Gain AUUC values for a Single Simulation of Validation Data
get_synthetic_auuc(valid_preds, drop_learners=['S Learner (LR)'])
|   | Learner         | cum_gain_auuc |
|---|-----------------|---------------|
| 0 | Actuals         | 3081.614293   |
| 2 | T Learner (LR)  | 3079.658839   |
| 4 | X Learner (LR)  | 3079.658839   |
| 6 | R Learner (LR)  | 3079.581622   |
| 5 | X Learner (XGB) | 3024.527826   |
| 1 | S Learner (XGB) | 3007.237997   |
| 3 | T Learner (XGB) | 3005.787684   |
| 7 | R Learner (XGB) | 2987.524469   |
| 8 | Random          | 2491.192935   |