#!/usr/bin/env python
# coding: utf-8
# In[1]:
from IPython.display import Image
Image('../../Python_probability_statistics_machine_learning_2E.png',width=200)
# # Monte Carlo Sampling Methods
#
# So far, we have studied analytical ways to
# transform random variables and how
# to augment these methods using Python. In
# spite of all this, we frequently must
# resort to purely numerical methods to
# solve real-world problems. Hopefully,
# now that we have seen the deeper theory,
# these numerical methods will feel more
# concrete. Suppose we want to generate
# samples of a given density, $f(x)$,
# given we already can generate samples from a
# uniform distribution,
# $\mathcal{U}[0,1]$. How do we know a random sample $v$
# comes from the $f(x)$
# distribution? One approach is to look at how a histogram
# of samples of $v$
# approximates $f(x)$. Specifically,
#
#
#
#
# $$
# \begin{equation}
# \mathbb{P}( v \in N_{\Delta}(x) ) = f(x) \Delta x
# \end{equation}
# \label{eq:mc01} \tag{1}
# $$
#
#
# **Figure:** The histogram approximates the target probability density.
#
# which says that the probability that a sample is in some $N_\Delta$
# neighborhood
# of $x$ is approximately $f(x)\Delta x$. [Figure](#fig:Sampling_Monte_Carlo_000)
# shows the target probability density function
# $f(x)$ and a histogram that
# approximates it. The histogram is generated from
# samples $v$. The hatched
# rectangle in the center illustrates Equation
# [1](#eq:mc01). The area of this
# rectangle is approximately $f(x)\Delta x$ where
# $x=0$, in this case. The base of the rectangle is the neighborhood
# $N_{\Delta}(x)$, whose width is $\Delta x$. The quality
# of the approximation may be
# clear visually, but to know that $v$ samples are
# characterized by $f(x)$, we
# need the statement of Equation [1](#eq:mc01), which
# says that the proportion of
# samples $v$ that fill the hatched rectangle is
# approximately equal to
# $f(x)\Delta x$.
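#
# We can check the claim in Equation [1](#eq:mc01) numerically. The following is
# a quick sketch, using `scipy.stats.norm` as a stand-in density $f(x)$ (any
# density with a known `pdf` would do): the fraction of samples landing in a
# small neighborhood of $x$ should be close to $f(x)\Delta x$.
# In[ ]:
import numpy as np
import scipy.stats
rv = scipy.stats.norm()                      # stand-in density f(x)
samples = rv.rvs(100000)                     # samples v
x, dx = 0, 0.1                               # neighborhood of x with width dx
frac = np.mean(np.abs(samples - x) < dx/2)   # proportion of samples in the neighborhood
print(frac, rv.pdf(x)*dx)                    # both should be close to f(0)*dx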
#
# Now that we know how to evaluate samples $v$ that are
# characterized by the density
# $f(x)$, let's consider how to create these samples
# for both discrete and
# continuous random variables.
#
# ## Inverse CDF Method for Discrete Variables
#
# Suppose we want to generate samples from a fair six-sided
# die. Our workhorse
# uniform random variable is defined continuously over the
# unit interval and the
# fair six-sided die is discrete. We must first create a
# mapping between the
# continuous random variable $u$ and the discrete outcomes of
# the die. This
# mapping is shown in [Figure](#fig:Sampling_Monte_Carlo_0001) where
# the unit
# interval is broken up into segments, each of length $1/6$. Each
# individual
# segment is assigned to one of the die outcomes. For example, if $u
# \in
# [1/6,2/6)$, then the outcome for the die is $2$. Because the die is fair,
# all
# segments on the unit interval are the same length. Thus, our new random
# variable $v$ is derived from $u$ by this assignment.
#
#
#
# **Figure:** A uniformly distributed random variable on the unit interval is
# assigned to the six outcomes of a fair die using these segments.
#
#
# For
# example, for $v=2$, we have,
#
# $$
# \mathbb{P}(v=2) = \mathbb{P}(u\in [1/6,2/6)) = 1/6
# $$
#
# where, in the language of Equation [1](#eq:mc01), $f(x)=1$
# (uniform
# distribution), $\Delta x = 1/6$, and $N_\Delta (2)=[1/6,2/6)$.
# Naturally, this
# pattern holds for all the other die outcomes in
# $\left\{1,2,\ldots,6\right\}$.
# Let's consider a quick simulation to make this
# concrete. The following code
# generates uniform random samples and stacks them
# in a Pandas dataframe.
# In[2]:
import pandas as pd
import numpy as np
from pandas import DataFrame
u= np.random.rand(100)
df = DataFrame(data=u,columns=['u'])
# The next block uses `pd.cut` to map the individual samples to
# the set
# $\left\{1,2,\ldots,6\right\}$ labeled `v`.
# In[3]:
labels = [1,2,3,4,5,6]
df['v']=pd.cut(df.u,np.linspace(0,1,7),
include_lowest=True,labels=labels)
# This is what the dataframe contains. The `v` column contains
# the samples drawn
# from the fair die.
# In[4]:
df.head()
# The following is a count of the number of samples in each group. There
# should
# be roughly the same number of samples in each group because the die is fair.
# In[5]:
df.groupby('v').count()
# So far, so good. We now have a way to simulate a fair
# die from a uniformly
# distributed random variable.
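#
# For the fair die there is also a closed-form version of this piecewise
# assignment: because every segment has length $1/6$, the outcome is just
# $\lceil 6u \rceil$. A quick sketch of this check (the `v_ceil` column name is
# only illustrative); it agrees with the `pd.cut` assignment except on
# measure-zero boundary cases.
# In[ ]:
df['v_ceil'] = np.ceil(6*df.u).astype(int).clip(1,6) # clip guards the u=0 edge case
(df.v.astype(int) == df.v_ceil).all()                # agrees with the pd.cut assignment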
#
# To extend this to an unfair die, we need only make
# some small adjustments to this
# code. For example, suppose that we want an
# unfair die so that
# $\mathbb{P}(1)=\mathbb{P}(2)=\mathbb{P}(3)=1/12$ and
# $\mathbb{P}(4)=\mathbb{P}(5)=\mathbb{P}(6)=1/4$. The only change we have to
# make
# is with `pd.cut` as follows,
# In[6]:
df['v']=pd.cut(df.u,[0,1/12,2/12,3/12,2/4,3/4,1],
include_lowest=True,labels=labels)
df.groupby('v').count()/df.shape[0]
# where now these are the individual probabilities of each die face. You
# can take
# more than `100` samples to get a clearer view of the individual
# probabilities
# but the mechanism for generating them is the same. The method is
# called the
# inverse CDF [^CDF] method because the CDF
# (namely, $\texttt{[0,1/12,2/12,3/12,2/4,3/4,1]}$) in the last example has been
# inverted (using the `pd.cut` method) to generate the samples.
# The inversion is
# easier to see for continuous variables, which we consider
# next.
#
# [^CDF]:
# Cumulative distribution function. Namely, $F(x)=\mathbb{P}(X \le x)$.
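#
# Before moving on, note that the bin edges passed to `pd.cut` above are just
# the cumulative sums of the outcome probabilities, i.e., the CDF evaluated at
# each outcome. A minimal sketch, assuming the unfair-die probabilities from
# the last example:
# In[ ]:
p = [1/12,1/12,1/12,1/4,1/4,1/4]             # unfair die probabilities
edges = np.concatenate(([0], np.cumsum(p)))  # CDF values used as pd.cut bin edges
print(edges)                                 # [0, 1/12, 2/12, 3/12, 2/4, 3/4, 1]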
#
#
# ## Inverse CDF Method for Continuous Variables
#
# The method above applies to continuous random
# variables, but now we have to
# squeeze the intervals down to individual points.
# In the example above, our
# inverse function was a piecewise function that
# operated on uniform random
# samples. In this case, the piecewise function
# collapses to a continuous inverse
# function. We want to generate random samples
# for a CDF that is invertible.
# As before, the criterion for generating an
# appropriate sample $v$ is the
# following,
#
# $$
# \mathbb{P}(x < v < x+\Delta x) = F(x+\Delta x) - F(x) =
# \int_x^{x+\Delta x} f(u) du \approx f(x) \Delta x
# $$
#
# which says that the probability that the sample $v$ is contained in a
# $\Delta
# x$ interval is approximately equal to $f(x) \Delta x$, at that point.
# Once
# again, the trick is to use a uniform random sample $u$ and an invertible
# CDF
# $F(x)$ to construct these samples. Note that for a uniform random variable
# $u
# \sim \mathcal{U}[0,1]$, we have,
#
# $$
# \begin{align*}
# \mathbb{P}(x < F^{-1}(u) < x+\Delta x) & = \mathbb{P}(F(x) < u < F(x+\Delta x)) \\
# & = F(x+\Delta x) - F(x) \\
# & = \int_x^{x+\Delta x} f(p) dp \approx f(x) \Delta x
# \end{align*}
# $$
#
# This means that $ v=F^{-1}(u) $ is distributed according to $f(x)$,
# which is
# what we want.
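#
# Note that `scipy.stats` frozen distributions expose the inverse CDF directly
# as the `ppf` (percent point function) method, so for any distribution that
# `scipy.stats` implements, $F^{-1}$ is already available. A minimal sketch,
# using the normal distribution as an example:
# In[ ]:
import numpy as np
import scipy.stats
u = np.random.rand(1000)        # uniform samples
w = scipy.stats.norm().ppf(u)   # ppf is the inverse CDF, so w is normally distributed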
#
# Let's try this to generate samples from the
# exponential
# distribution,
#
# $$
# f_{\alpha}(x) = \alpha e^{ -\alpha x }
# $$
#
# which has the following CDF,
#
# $$
# F(x) = 1-e^{ -\alpha x }
# $$
#
# and corresponding inverse, obtained by setting $u = F(x)$ and solving for $x$,
#
# $$
# F^{-1}(u) = \frac{1}{\alpha}\ln \frac{1}{(1-u)}
# $$
#
# Now, all we have to do is generate some uniformly distributed
# random samples
# and then feed them into $F^{-1}$.
# In[7]:
from numpy import array, log
import scipy.stats
alpha = 1. # distribution parameter
nsamp = 1000 # num of samples
# define uniform random variable
u=scipy.stats.uniform(0,1)
# define inverse function
Finv=lambda u: 1/alpha*log(1/(1-u))
# apply inverse function to samples
v = array(list(map(Finv,u.rvs(nsamp))))
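# Because the `log` and division operations from `numpy` broadcast over arrays,
# the `map` call is not strictly necessary; an equivalent vectorized sketch is
# the following.
# In[ ]:
v = Finv(u.rvs(nsamp))   # same construction; the inverse is applied to the whole array at once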
# Now, we have the samples from the exponential distribution, but how do we
# know the method is correct, that is, that the samples really are distributed
# accordingly? Fortunately, `scipy.stats` already has an exponential
# distribution, so we can check our work against the reference using a
# *probability plot* (also known as a *quantile-quantile* plot). The following code sets up the
# probability plot from
# `scipy.stats`.
# In[8]:
get_ipython().run_line_magic('matplotlib', 'inline')
from matplotlib.pylab import setp, subplots
fig,ax = subplots()
fig.set_size_inches((7,5))
_=scipy.stats.probplot(v,(1,),dist='expon',plot=ax)
line=ax.get_lines()[0]
_=setp(line,'color','k')
_=setp(line,'alpha',.1)
line=ax.get_lines()[1]
_=setp(line,'color','gray')
_=setp(line,'lw',3.0)
_=setp(ax.yaxis.get_label(),'fontsize',18)
_=setp(ax.xaxis.get_label(),'fontsize',18)
_=ax.set_title('Probability Plot',fontsize=18)
_=ax.grid()
fig.tight_layout()
fig.savefig('fig-probability/Sampling_Monte_Carlo_005.png')
# In[9]:
fig,ax=subplots()
scipy.stats.probplot(v,(1,),dist='expon',plot=ax)
# Note that we have to supply an axes object (`ax`) for it to draw on.
# The result
# is [Figure](#fig:Sampling_Monte_Carlo_005). The more closely the samples
# follow the diagonal line, the better they match the reference distribution
# (i.e., the exponential distribution in this case). You may also want to try
# `dist='norm'` in the code above to see what happens when the normal
# distribution is the reference distribution.
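#
# Besides this visual check, a quantitative alternative is a goodness-of-fit
# test such as the Kolmogorov-Smirnov test available in `scipy.stats`; a sketch
# (here against the standard exponential, since $\alpha=1$):
# In[ ]:
print(scipy.stats.kstest(v,'expon'))  # a large p-value is consistent with the exponential reference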
#
#
# **Figure:** The samples created using the inverse CDF method match the
# exponential reference distribution.
#
#
# ## Rejection Method
#
# In some cases, inverting the CDF may be impossible.
# The *rejection*
# method can handle this situation. The idea is to pick two
# uniform random
# variables $u_1 \sim \mathcal{U}[a,b]$ and $u_2 \sim \mathcal{U}[0,1]$ so that
#
# $$
# \mathbb{P}\left(u_1 \in N_{\Delta}(x) \bigwedge u_2 < \frac{f(u_1)}{M}
# \right) \hspace{0.5em} \approx \frac{\Delta x}{b-a} \frac{f(u_1)}{M}
# $$
#
# where we take $x=u_1$ and $f(x) < M $. This is a two-step process.
# First, draw
# $u_1$ uniformly from the interval $[a,b]$. Second, feed it into
# $f(x)$ and if
# $u_2 < f(u_1)/M$, then you have a valid sample for $f(x)$. Thus,
# $u_1$ is the
# proposed sample from $f$ that may or may not be rejected depending
# on $u_2$. The
# only job of the $M$ constant is to scale down the $f(x)$ so that
# the $u_2$
# variable can span the range. The *efficiency* of this method is the
# probability
# of accepting $u_1$ which comes from integrating out the above
# approximation,
#
# $$
# \int \frac{f(x)}{M(b-a)} dx = \frac{1}{M(b-a)} \int f(x)dx =\frac{1}{M(b-a)}
# $$
#
# This means that we don't want an unnecessarily large $M$ because that
# makes it
# more likely that samples will be discarded.
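#
# The two-step procedure is easy to package as a reusable function. The
# following is a minimal sketch (the function name and signature are only
# illustrative), assuming $f(x) < M$ holds on $[a,b]$:
# In[ ]:
import numpy as np
def rejection_sample(f, M, a, b, n):
    u1 = np.random.rand(n)*(b-a) + a   # proposals drawn uniformly from [a,b]
    u2 = np.random.rand(n)             # uniform acceptance variables on [0,1]
    return u1[u2 < f(u1)/M]            # keep only proposals that fall under f/M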
#
# Let's try this method for a
# density whose CDF is not easily inverted [^normalization].
# [^normalization]: Note that this example density does not *exactly* integrate
# out to one like a probability density function should, but the normalization
# constant for this is distracting for our purposes here.
#
# $$
# f(x) = \exp\left(-\frac{(x-1)^2}{2x} \right) (x+1)/12
# $$
#
# where $x>0$. The following code implements the rejection plan.
# In[10]:
import numpy as np
x = np.linspace(0.001,15,100)
f= lambda x: np.exp(-(x-1)**2/2./x)*(x+1)/12.
fx = f(x)
M=0.3 # scale factor
u1 = np.random.rand(10000)*15 # uniform random samples scaled out
u2 = np.random.rand(10000) # uniform random samples
idx,= np.where(u2<=f(u1)/M) # rejection criterion
v = u1[idx]
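# The observed acceptance rate should be close to the $1/(M(b-a))$ estimate
# derived above; with $M=0.3$ and $b-a=15$ that is roughly 22%. The match is
# only approximate here because, as noted in the footnote, this example density
# does not integrate exactly to one.
# In[ ]:
print(len(v)/len(u1))  # empirical acceptance rate
print(1/(M*15))        # predicted efficiency 1/(M(b-a))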
# In[11]:
fig,ax=subplots()
fig.set_size_inches((9,5))
_=ax.hist(v,density=True,bins=40,alpha=.3,color='gray')
_=ax.plot(x,fx,'k',lw=3.,label='$f(x)$')
_=ax.set_title('Estimated Efficiency=%3.1f%%'%(100*len(v)/len(u1)),
fontsize=18)
_=ax.legend(fontsize=18)
_=ax.set_xlabel('$x$',fontsize=24)
fig.tight_layout()
fig.savefig('fig-probability/Sampling_Monte_Carlo_007.png')
# [Figure](#fig:Sampling_Monte_Carlo_007) shows a histogram of the
# so-generated
# samples that nicely fits the probability density function. The
# title in the
# figure shows the efficiency (the fraction of proposed samples that were accepted),
# which is poor. It
# means that we threw away most of the proposed samples. Thus,
# even though there
# is nothing conceptually wrong with this result, the low
# efficiency must be
# fixed, as a practical matter. [Figure](#fig:Sampling_Monte_Carlo_008) shows
# where the proposed samples were
# rejected. Samples under the curve were retained
# (i.e., $u_2 <
# \frac{f(u_1)}{M}$) but the vast majority of the samples are
# outside this
# umbrella.
#
#
# **Figure:** The rejection method generates samples whose histogram nicely
# matches the target distribution. Unfortunately, the efficiency is not so good.
#
#
# In[12]:
fig,ax=subplots()
fig.set_size_inches((9,5))
_=ax.plot(u1,u2,'+',label='rejected',alpha=.3,color='gray')
_=ax.plot(u1[idx],u2[idx],'.',label='accepted',alpha=.3,color='k')
_=ax.legend(fontsize=22)
fig.tight_layout()
fig.savefig('fig-probability/Sampling_Monte_Carlo_008.png')
#
#
# **Figure:** The proposed samples under the curve were accepted and the others
# were not. This shows that the majority of samples were rejected.
#
#
#
# The rejection method uses $u_1$ to select
# along the domain of $f(x)$ and the
# other $u_2$ uniform random variable decides
# whether to accept or not. One idea
# would be to choose $u_1$ so that $x$ values
# are coincidentally those that are
# near the peak of $f(x)$, instead of uniformly
# anywhere in the domain,
# especially near the tails, which are low probability
# anyway. Now, the trick is
# to find a new density function $g(x)$ to sample from
# that has a similar
# concentration of probability density. One way is to become familiar with
# the probability density functions that already have adjustable parameters
# and fast random sample generators. There are lots of places to look and,
# chances are, there is already such a generator for your
# problem. Otherwise, the
# family of $\beta$ densities is a good place to start.
# To be explicit, what we want is $u_1 \sim g(x)$ so that, returning to our
# earlier argument,
#
# $$
# \mathbb{P}\left( u_1 \in N_{\Delta}(x) \bigwedge u_2 < \frac{f(u_1)}{M}
# \right) \approx g(x) \Delta x \frac{f(u_1)}{M}
# $$
#
# but this is *not* what we need here. The problem is with the
# second part of the
# logical $\bigwedge$ conjunction. We need to put
# something there that will give
# us something proportional to $f(x)$.
# Let us define the following,
#
#
#
#
# $$
# \begin{equation}
# h(x) = \frac{f(x)}{g(x)}
# \end{equation}
# \label{eq:rej01}
# \tag{2}
# $$
#
# with corresponding maximum on the domain as $h_{\max}$ and
# then go back and
# construct the second part of the clause as
#
# $$
# \mathbb{P}\left(u_1 \in N_{\Delta}(x) \bigwedge u_2 < \frac{h(u_1)}{h_{\max}}
# \right) \approx g(x) \Delta x \frac{h(u_1)}{h_{\max}} = \frac{f(x) \Delta x}{h_{\max}}
# $$
#
# Recall that satisfying this criterion means that $u_1=x$. As before,
# we can
# estimate the probability of acceptance of the $u_1$ as $1/h_{\max}$.
#
# Now, how
# to construct the $g(x)$ function in the denominator of Equation
# [2](#eq:rej01)?
# Here's where familiarity with some standard probability densities
# pays off. For
# this case, we choose the $\chi^2$ distribution. The following
# plots the $g(x)$
# and $f(x)$ (left plot) and the corresponding $h(x)=f(x)/g(x)$
# (right plot). Note
# that $g(x)$ and $f(x)$ have peaks that almost coincide,
# which is what we are
# looking for.
# In[13]:
ch=scipy.stats.chi2(4) # chi-squared
h = lambda x: f(x)/ch.pdf(x) # h-function
# In[14]:
fig,axs=subplots(1,2,sharex=True)
fig.set_size_inches(12,4)
_=axs[0].plot(x,fx,label='$f(x)$',color='k')
_=axs[0].plot(x,ch.pdf(x),'--',lw=2,label='$g(x)$',color='gray')
_=axs[0].legend(loc=0,fontsize=24)
_=axs[0].set_xlabel(r'$x$',fontsize=22)
_=axs[1].plot(x,h(x),'-k',lw=3)
_=axs[1].set_title('$h(x)=f(x)/g(x)$',fontsize=24)
_=axs[1].set_xlabel(r'$x$',fontsize=22)
fig.tight_layout()
fig.savefig('fig-probability/Sampling_Monte_Carlo_009.png')
#
#
# **Figure:** The plot on the right shows $h(x)=f(x)/g(x)$ and the one on the
# left shows $f(x)$ and $g(x)$ separately.
#
#
# Now, let's generate some samples of $f$ using this $\chi^2$
# distribution as the proposal
# with the rejection method.
# In[15]:
hmax=h(x).max()
u1 = ch.rvs(5000) # samples from chi-square distribution
u2 = np.random.rand(5000)# uniform random samples
idx = (u2 <= h(u1)/hmax) # rejection criterion
v = u1[idx] # keep these only
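# As before, the empirical acceptance rate can be compared against the
# $1/h_{\max}$ estimate (again only approximately, since the example $f$ is not
# exactly normalized and $h_{\max}$ is estimated on a grid):
# In[ ]:
print(idx.mean())  # empirical acceptance rate
print(1/hmax)      # predicted acceptance probability 1/h_max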
# In[16]:
fig,ax=subplots()
fig.set_size_inches((7,3))
_=ax.hist(v,density=True,bins=40,alpha=.3,color='gray')
_=ax.plot(x,fx,color='k',lw=3.,label='$f(x)$')
_=ax.set_title('Estimated Efficiency=%3.1f%%'%(100*len(v)/len(u1)))
_=ax.axis(xmax=15)
_=ax.legend(fontsize=18)
fig.savefig('fig-probability/Sampling_Monte_Carlo_010.png')
#
#
# **Figure:** Using the updated method, the histogram matches the target
# probability density function with high efficiency.
#
#
# Using the $\chi^2$ distribution with the
# rejection method results in throwing
# away less than 10% of the generated samples
# compared with our prior example
# where we threw out at least 80%. This is
# dramatically more
# efficient! [Figure](#fig:Sampling_Monte_Carlo_010) shows that
# the histogram
# and the probability density function match. For completeness,
# [Figure](#fig:Sampling_Monte_Carlo_011) shows the samples with the corresponding
# threshold $h(x)/h_{\max}$ that was used to select them.
# In[17]:
fig,ax=subplots()
fig.set_size_inches((7,4))
_=ax.plot(u1,u2,'+',label='rejected',alpha=.3,color='gray')
_=ax.plot(u1[idx],u2[idx],'g.',label='accepted',alpha=.3,color='k')
_=ax.plot(x,h(x)/hmax,color='k',lw=3.,label='$h(x)$')
_=ax.legend(fontsize=16,loc=0)
_=ax.set_xlabel('$x$',fontsize=24)
_=ax.set_ylabel('$h(x)$',fontsize=24)
_=ax.axis(xmax=15,ymax=1.1)
fig.tight_layout()
fig.savefig('fig-probability/Sampling_Monte_Carlo_011.png')
#
#
# **Figure:** Fewer proposed points were rejected in this case, which means
# better efficiency.
#
#
# ## Sampling Importance Resampling
#
# An alternative to the Rejection Method that does
# not involve rejecting samples
# or coming up with $M$ bounds or bounding functions
# is the Sampling Importance
# Resampling (SIR) method. Choose a tractable $g$
# probability density function
# and draw $n$ samples from it, $\lbrace x_i
# \rbrace_{i=1}^n$. Our objective is
# to derive samples from $f$. Next, compute the
# following,
#
# $$
# q_i = \frac{w_i}{\sum w_i}
# $$
#
# where
#
# $$
# w_i = \frac{f(x_i)}{g(x_i)}
# $$
#
# The $q_i$ define a probability mass function whose samples
# approximate samples
# from $f$. To see this, consider,
#
# $$
# \begin{align*}
# \mathbb{P}(X\le a) =& \sum_{i=1}^n q_i
# \mathbb{I}_{(-\infty,a]}(x_i) \\
# =& \frac{\sum_{i=1}^n w_i
# \mathbb{I}_{(-\infty,a]}(x_i)}{\sum_{i=1}^n w_i} \\
# =&
# \frac{\frac{1}{n}\sum_{i=1}^n \frac{f(x_i)}{g(x_i)}
# \mathbb{I}_{(-\infty,a]}(x_i)}{\frac{1}{n}\sum_{i=1}^n \frac{f(x_i)}{g(x_i)}}
# \end{align*}
# $$
#
# Because the samples are generated from the $g$ probability distribution,
# the numerator is approximately,
#
# $$
# \mathbb{E}_g\left(\frac{f(x)}{g(x)} \mathbb{I}_{(-\infty,a]}(x)\right) = \int_{-\infty}^a f(x) dx
# $$
#
# and the denominator is approximately
# $\mathbb{E}_g\left(\frac{f(x)}{g(x)}\right) = \int_{-\infty}^{\infty} f(x) dx = 1$,
# which gives
#
# $$
# \mathbb{P}(X\le a) = \int_{-\infty}^a f(x) dx
# $$
#
# which shows that the samples generated this way are $f$-distributed.
# Note that the further
# away $g$ is from the desired $f$, the more samples have to be generated
# from this probability mass function. Further, because there is no
# rejection step, we no longer have the issue of efficiency.
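#
# The whole procedure is short enough to package as a function. The following is
# a minimal sketch (the function and argument names are only illustrative),
# assuming `f` is the target density function and `g` is a frozen `scipy.stats`
# proposal distribution defined on the same support:
# In[ ]:
import numpy as np
def sir_sample(f, g, n_proposals, n_samples):
    xi = g.rvs(n_proposals)                      # draw from the tractable proposal g
    w = f(xi)/g.pdf(xi)                          # importance weights w_i = f(x_i)/g(x_i)
    return np.random.choice(xi, n_samples, p=w/w.sum())  # resample from the normalized weights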
#
# For example, let us
# choose a beta distribution for $g$, as in the following code,
# In[18]:
g = scipy.stats.beta(2,3)
# This distribution does not bear a strong resemblance to our desired
# $f$
# function from the last section, as shown in
# [Figure](#fig:Sampling_Monte_Carlo_012) below. Note that we scaled the domain
# of the
# beta distribution to get it close to the support of $f$.
# In[19]:
fig,ax=subplots(figsize=[10,5])
g = scipy.stats.beta(2,3)
_=ax.plot(x,fx,label='$f(x)$',color='k')
_=ax.plot(np.linspace(0,1,100)*15,1/15*g.pdf(np.linspace(0,1,100)),label='$g(x)$',color='r')
_=ax.legend(fontsize=16)
_=ax.set_xlabel('$x$',fontsize=18)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.savefig('fig-probability/Sampling_Monte_Carlo_012.png')
#
#
# **Figure:** The scaled beta proposal density $g(x)$ compared to the target
# probability density function $f(x)$.
#
#
#
# In the next block, we sample from the $g$
# distribution and compute
# the weights as described above. The final step is to
# sample from this new
# probability mass function. The resulting normalized
# histogram is shown
# compared to the target $f$ probability density function in
# [Figure](#fig:Sampling_Monte_Carlo_013).
# In[20]:
xi = g.rvs(500)                     # draw proposals from the beta distribution on [0,1]
# importance weights for the scaled samples 15*xi; the constant 1/15 Jacobian of
# the scaling cancels when the weights are normalized in the next line
w = np.array([f(i*15)/g.pdf(i) for i in xi])
fsamples=np.random.choice(xi*15,5000,p = w/w.sum()) # resample according to the normalized weights
# In[21]:
fig,ax=subplots(figsize=[10,5])
_=ax.hist(fsamples,rwidth=.8,density=True,bins=20,color='gray')
_=ax.plot(x,fx,label='$f(x)$',color='k')
_=ax.legend(fontsize=14)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.savefig('fig-probability/Sampling_Monte_Carlo_013.png')
#
#
# **Figure:** Histogram and probability density function using SIR.
#
#
#
# In this section, we investigated how to
# generate random samples from a given
# distribution, be it discrete or continuous.
# For the continuous case, the key
# issue was whether or not the cumulative distribution
# function had a tractable
# inverse. If not, we had to turn to the rejection
# method, and find an
# appropriate related density that we could easily sample from
# to use as part of
# a rejection threshold. Finding such a function is an art, but
# many families of
# probability densities have been studied over the years that
# already have fast
# random number generators.
#
# The rejection method has many
# complicated extensions that involve careful
# partitioning of the domains and lots
# of special methods for corner cases.
# Nonetheless, all of these advanced
# techniques are still variations on the same
# fundamental theme we illustrated
# here [[dunn2011exploring]](#dunn2011exploring),
# [[johnson1995continuous]](#johnson1995continuous).