#!/usr/bin/env python
# coding: utf-8
# In[1]:
from IPython.display import Image
Image('../../Python_probability_statistics_machine_learning_2E.png',width=200)
# # Projection Methods
#
#
# The concept of
# projection is key to developing an intuition about conditional
# probability. We
# already have a natural intuition of projection from looking at
# the shadows of
# objects on a sunny day. As we will see, this simple idea
# consolidates many
# abstract ideas in optimization and mathematics. Consider
# [Figure](#fig:probability_001) where we want to find a point along the
# line (namely, $\mathbf{x}$) that is closest to the black square (namely,
# $\mathbf{y}$). In other words, we want to inflate the gray circle until it just
# touches the black line. Recall that the circle boundary is the set of points for
# which
#
# $$
# \sqrt{(\mathbf{y}-\mathbf{x})^T(\mathbf{y}-\mathbf{x})}
# =\|\mathbf{y}-\mathbf{x} \| = \epsilon
# $$
#
# for some value of $\epsilon$. So we want a point $\mathbf{x}$ along
# the line
# that satisfies this for the smallest $\epsilon$. Then, that point
# will be the
# closest point on the black line to the black square.
# It may be obvious from the
# diagram, but the closest point on the line
# occurs where the line segment from
# the black square to the black line is
# perpendicular to the line. At this point,
# the gray circle just touches the black
# line. This is illustrated below in
# [Figure](#fig:probability_002).
#
# **Figure** `fig:probability_001`: Given the point $\mathbf{y}$ (black
# square), we want to find the point $\mathbf{x}$ along the line that is
# closest to it. The gray circle is the locus of points within a fixed
# distance from $\mathbf{y}$.
#
# **Programming Tip.**
#
# [Figure](#fig:probability_001) uses
# the `matplotlib.patches` module. This
# module contains primitive shapes like
# circles, ellipses, and rectangles that
# can be assembled into complex graphics.
# After importing a particular shape, you
# can apply that shape to an existing axis
# using the `add_patch` method. The
# patches themselves can be styled using the
# usual formatting keywords like
# `color` and `alpha`.
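#
# As a minimal sketch of this pattern (with illustrative coordinates,
# not those used to generate the original figure), we can draw a line
# and add a tangent gray circle to an axis:

# In[2]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
ax.plot([-1, 3], [-1, 3], color='k')   # the line defined by v = [1,1]
# circle centered at y=(1,2); radius chosen so it just touches the line
ax.add_patch(Circle((1, 2), 1/np.sqrt(2), color='gray', alpha=0.3))
ax.plot(1, 2, 'ks')                    # the point y as a black square
ax.set_aspect('equal')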
#
#
# **Figure** `fig:probability_002`: The closest point on the line occurs
# where the line is tangent to the circle. At that point, the black line
# and the line segment of minimum distance are perpendicular.
#
# Now that we
# can see what's going on, we can construct the solution
# analytically. We can
# represent an arbitrary point along the black line as:
#
# $$
# \mathbf{x}=\alpha\mathbf{v}
# $$
#
# where $\alpha\in\mathbb{R}$ slides the point up and down the line with
#
# $$
# \mathbf{v} = \left[ 1,1 \right]^T
# $$
#
# Formally, $\mathbf{v}$ spans the *subspace* onto which we want to
# *project* $\mathbf{y}$. At the closest point, the vector between
# $\mathbf{y}$ and
# $\mathbf{x}$ (the *error* vector above) is
# perpedicular to the line. This means
# that
#
# $$
# (\mathbf{y}-\mathbf{x} )^T \mathbf{v} = 0
# $$
#
# and by substituting $\mathbf{x}=\alpha\mathbf{v}$ and solving for
# $\alpha$, we obtain
#
# $$
# \alpha = \frac{\mathbf{y}^T\mathbf{v}}{ \|\mathbf{v} \|^2}
# $$
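#
# As a quick numerical check (a sketch assuming `numpy`, with the
# illustrative values $\mathbf{y}=[1,2]^T$ and $\mathbf{v}=[1,1]^T$), we
# can compute $\alpha$ and confirm that the error vector is
# perpendicular to $\mathbf{v}$:

# In[3]:
import numpy as np
y = np.array([1., 2.])      # the point y (black square)
v = np.array([1., 1.])      # direction of the line
alpha = (y @ v) / (v @ v)   # alpha = y^T v / ||v||^2
x = alpha * v               # closest point on the line
print((y - x) @ v)          # error is perpendicular to v: prints 0.0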
#
# The *error* is the distance between $\alpha\mathbf{v}$ and
# $\mathbf{y}$. The vectors $\alpha\mathbf{v}$, $\mathbf{y}$, and the error
# $\mathbf{y}-\alpha\mathbf{v}$ form a right triangle, so we can use the
# Pythagorean theorem to compute the squared length of this error as
#
# $$
# \epsilon^2 = \| \mathbf{y}-\mathbf{x} \|^2 = \|\mathbf{y}\|^2 - \alpha^2
# \|\mathbf{v}\|^2 = \|\mathbf{y}\|^2 -
# \frac{(\mathbf{y}^T\mathbf{v})^2}{\|\mathbf{v}\|^2}
# $$
#
# where $ \|\mathbf{v}\|^2 = \mathbf{v}^T \mathbf{v} $. Note that since
# $\epsilon^2 \ge 0 $, this also shows that
#
# $$
# | \mathbf{y}^T\mathbf{v} | \le \|\mathbf{y}\| \|\mathbf{v}\|
# $$
#
# which is the famous and useful Cauchy-Schwarz inequality that we will
# exploit later. Finally, we can assemble all of this into the *projection*
# operator
#
# $$
# \mathbf{P}_v = \frac{1}{\|\mathbf{v}\|^2 } \mathbf{v v}^T
# $$
#
# With this operator, we can take any $\mathbf{y}$ and find the closest
# point on
# $\mathbf{v}$ by doing
#
# $$
# \mathbf{P}_v \mathbf{y} = \mathbf{v} \left( \frac{ \mathbf{v}^T \mathbf{y}
# }{\|\mathbf{v}\|^2} \right)
# $$
#
# where we recognize the term in parentheses as the $\alpha$ we
# computed earlier.
# It's called an *operator* because it takes a vector
# ($\mathbf{y}$) and produces
# another vector ($\alpha\mathbf{v}$). Thus,
# projection unifies geometry and
# optimization.
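#
# Continuing the same numerical sketch, we can build $\mathbf{P}_v$ as
# the outer product above, check that it reproduces $\alpha\mathbf{v}$,
# and confirm the Cauchy-Schwarz inequality for these values:

# In[4]:
Pv = np.outer(v, v) / (v @ v)   # P_v = v v^T / ||v||^2
print(Pv @ y)                   # same as alpha*v: [1.5 1.5]
print(abs(y @ v) <= np.linalg.norm(y)*np.linalg.norm(v))   # prints True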
#
# ## Weighted distance
#
# We can easily extend this projection
# operator to cases where the measure of
# distance between $\mathbf{y}$ and the
# subspace $\mathbf{v}$ is weighted. We can
# accommodate these weighted distances
# by re-writing the projection operator as
#
#
#
#
# $$
# \begin{equation}
# \mathbf{P}_v=\mathbf{v}\frac{\mathbf{v}^T\mathbf{Q}^T}{\mathbf{v}^T\mathbf{Q v}}
# \end{equation}
# \label{eq:weightedProj} \tag{1}
# $$
#
# where $\mathbf{Q}$ is a positive definite matrix. In the previous
# case, we started with a point $\mathbf{y}$ and inflated a circle
# centered at $\mathbf{y}$ until it just touched the line defined by
# $\mathbf{v}$, and that point of tangency was the closest point on the
# line to $\mathbf{y}$. The same thing happens
# in the general
# case with a weighted distance except now we inflate an
# ellipse, not a circle,
# until the ellipse touches the line.
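#
# A short numerical sketch of the weighted case (using an example
# diagonal positive definite $\mathbf{Q}$; any symmetric positive
# definite matrix would do) confirms that the error is perpendicular to
# $\mathbf{v}$ in the $\mathbf{Q}$-weighted inner product:

# In[5]:
Q = np.diag([2., 1.])                       # example positive definite weight
Pv_w = np.outer(v, v) @ Q.T / (v @ Q @ v)   # Equation (1)
e = y - Pv_w @ y                            # error of the weighted projection
print(e @ Q @ v)                            # Q-weighted perpendicularity: 0.0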
#
# **Figure** `fig:probability_003`: In the weighted case, the ellipse is
# tangent to the line at the closest point, and the error is still
# perpendicular in the sense of the weighted distance.
#
# Note that the
# error vector ($\mathbf{y}-\alpha\mathbf{v}$) in [Figure](#fig:probability_003)
# is still perpendicular to the line (subspace
# $\mathbf{v}$), but in the space of
# the weighted distance. The difference between the first projection
# (with the uniform circular distance) and the general case (with the
# elliptical weighted distance) is the inner product used in each case.
# For example, in the first
# case we have $\mathbf{y}^T
# \mathbf{v}$ and in the weighted case we have
# $\mathbf{y}^T \mathbf{Q}^T
# \mathbf{v}$. To move from the uniform circular case
# to the weighted ellipsoidal
# case, all we had to do was change all of the vector
# inner products. Before we
# finish, we need a formal property of projections:
#
# $$
# \mathbf{P}_v \mathbf{P}_v = \mathbf{P}_v
# $$
#
# known as the *idempotent* property, which says that once we
# have
# projected onto a subspace, subsequent projections leave us in the
# same subspace.
# You can verify this by computing Equation [1](#eq:weightedProj).
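#
# For example, continuing the sketch above, the idempotent property
# holds for the weighted operator as well:

# In[6]:
print(np.allclose(Pv_w @ Pv_w, Pv_w))   # prints True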
#
# Thus,
# projection ties a minimization problem (closest point to a line) to an
# algebraic
# concept (inner product). It turns out that these same geometric ideas
# from
# linear algebra [[strang2006linear]](#strang2006linear) can be translated to the
# conditional
# expectation. How this works is the subject of our next section.