#!/usr/bin/env python
# coding: utf-8
# In[1]:
from IPython.display import Image
Image('../../Python_probability_statistics_machine_learning_2E.png',width=200)
# # Projection Methods
#
#
# The concept of
# projection is key to developing an intuition about conditional
# probability. We
# already have a natural intuition of projection from looking at
# the shadows of
# objects on a sunny day. As we will see, this simple idea
# consolidates many
# abstract ideas in optimization and mathematics. Consider
# [Figure](#fig:probability_001) where we want to find a point along the
# line (namely, $\mathbf{x}$) that is closest to the black square (namely,
# $\mathbf{y}$). In other words, we want to inflate the gray circle until it just
# touches the black line. Recall that the circle boundary is the set of points for
# which
#
# $$
# \sqrt{(\mathbf{y}-\mathbf{x})^T(\mathbf{y}-\mathbf{x})}
# =\|\mathbf{y}-\mathbf{x} \| = \epsilon
# $$
#
# for some value of $\epsilon$. So we want a point $\mathbf{x}$ along
# the line
# that satisfies this for the smallest $\epsilon$. Then, that point
# will be the
# closest point on the black line to the black square.
# It may be obvious from the
# diagram, but the closest point on the line
# occurs where the line segment from
# the black square to the black line is
# perpendicular to the line. At this point,
# the gray circle just touches the black
# line. This is illustrated below in
# [Figure](#fig:probability_002).
#
# **Figure** `fig:probability_001`: Given the point $\mathbf{y}$ (black
# square), we want to find the point $\mathbf{x}$ along the line that is
# closest to it. The gray circle is the locus of points within a fixed
# distance from $\mathbf{y}$.
#
# **Programming Tip.**
#
# [Figure](#fig:probability_001) uses
# the `matplotlib.patches` module. This
# module contains primitive shapes like
# circles, ellipses, and rectangles that
# can be assembled into complex graphics.
# After importing a particular shape, you
# can apply that shape to an existing axis
# using the `add_patch` method. The
# patches themselves can be styled using the
# usual formatting keywords like
# `color` and `alpha`.
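#
# As a minimal sketch of this pattern (with illustrative coordinates,
# not those used to generate the original figure), we can draw a line
# and add a tangent gray circle to an axis:

# In[2]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
ax.plot([-1, 3], [-1, 3], color='k')   # the line defined by v = [1,1]
# circle centered at y=(1,2); radius chosen so it just touches the line
ax.add_patch(Circle((1, 2), 1/np.sqrt(2), color='gray', alpha=0.3))
ax.plot(1, 2, 'ks')                    # the point y as a black square
ax.set_aspect('equal')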
#
#
# **Figure** `fig:probability_002`: The closest point on the line occurs
# where the line is tangent to the circle. At that point, the black line
# and the line segment of minimum distance are perpendicular.
#
# Now that we
# can see what's going on, we can construct the solution
# analytically. We can
# represent an arbitrary point along the black line as:
#
# $$
# \mathbf{x}=\alpha\mathbf{v}
# $$
#
# where $\alpha\in\mathbb{R}$ slides the point up and down the line with
#
# $$
# \mathbf{v} = \left[ 1,1 \right]^T
# $$
#
# Formally, $\mathbf{v}$ spans the *subspace* onto which we want to
# *project* $\mathbf{y}$. At the closest point, the vector between
# $\mathbf{y}$ and
# $\mathbf{x}$ (the *error* vector above) is
# perpedicular to the line. This means
# that
#
# $$
# (\mathbf{y}-\mathbf{x} )^T \mathbf{v} = 0
# $$
#
# and by substituting $\mathbf{x}=\alpha\mathbf{v}$ and solving for
# $\alpha$, we obtain
#
# $$
# \alpha = \frac{\mathbf{y}^T\mathbf{v}}{ \|\mathbf{v} \|^2}
# $$
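#
# As a quick numerical check (a sketch assuming `numpy`, with the
# illustrative values $\mathbf{y}=[1,2]^T$ and $\mathbf{v}=[1,1]^T$), we
# can compute $\alpha$ and confirm that the error vector is
# perpendicular to $\mathbf{v}$:

# In[3]:
import numpy as np
y = np.array([1., 2.])      # the point y (black square)
v = np.array([1., 1.])      # direction of the line
alpha = (y @ v) / (v @ v)   # alpha = y^T v / ||v||^2
x = alpha * v               # closest point on the line
print((y - x) @ v)          # error is perpendicular to v: prints 0.0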
#
# The *error* is the distance between $\alpha\mathbf{v}$ and
# $\mathbf{y}$. The vectors $\alpha\mathbf{v}$, $\mathbf{y}$, and the error
# $\mathbf{y}-\alpha\mathbf{v}$ form a right triangle, so we can use the
# Pythagorean theorem to compute the squared length of this error as
#
# $$
# \epsilon^2 = \| \mathbf{y}-\mathbf{x} \|^2 = \|\mathbf{y}\|^2 - \alpha^2
# \|\mathbf{v}\|^2 = \|\mathbf{y}\|^2 -
# \frac{(\mathbf{y}^T\mathbf{v})^2}{\|\mathbf{v}\|^2}
# $$
#
# where $ \|\mathbf{v}\|^2 = \mathbf{v}^T \mathbf{v} $. Note that since
# $\epsilon^2 \ge 0 $, this also shows that
#
# $$
# | \mathbf{y}^T\mathbf{v} | \le \|\mathbf{y}\| \|\mathbf{v}\|
# $$
#
# which is the famous and useful Cauchy-Schwarz inequality that we will
# exploit later. Finally, we can assemble all of this into the *projection*
# operator
#
# $$
# \mathbf{P}_v = \frac{1}{\|\mathbf{v}\|^2 } \mathbf{v v}^T
# $$
#
# With this operator, we can take any $\mathbf{y}$ and find the closest
# point on
# $\mathbf{v}$ by doing
#
# $$
# \mathbf{P}_v \mathbf{y} = \mathbf{v} \left( \frac{ \mathbf{v}^T \mathbf{y}
# }{\|\mathbf{v}\|^2} \right)
# $$
#
# where we recognize the term in parentheses as the $\alpha$ we
# computed earlier.
# It's called an *operator* because it takes a vector
# ($\mathbf{y}$) and produces
# another vector ($\alpha\mathbf{v}$). Thus,
# projection unifies geometry and
# optimization.
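#
# Continuing the same numerical sketch, we can build $\mathbf{P}_v$ as
# the outer product above, check that it reproduces $\alpha\mathbf{v}$,
# and confirm the Cauchy-Schwarz inequality for these values:

# In[4]:
Pv = np.outer(v, v) / (v @ v)   # P_v = v v^T / ||v||^2
print(Pv @ y)                   # same as alpha*v: [1.5 1.5]
print(abs(y @ v) <= np.linalg.norm(y)*np.linalg.norm(v))   # prints True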
#
# ## Weighted distance
#
# We can easily extend this projection
# operator to cases where the measure of
# distance between $\mathbf{y}$ and the
# subspace $\mathbf{v}$ is weighted. We can
# accommodate these weighted distances
# by re-writing the projection operator as
#
#
#
#
# $$
# \begin{equation}
# \mathbf{P}_v=\mathbf{v}\frac{\mathbf{v}^T\mathbf{Q}^T}{\mathbf{v}^T\mathbf{Q v}}
# \end{equation}
# \label{eq:weightedProj} \tag{1}
# $$
#
# where $\mathbf{Q}$ is a positive definite matrix. In the previous
# case, we started with a point $\mathbf{y}$ and inflated a circle
# centered at $\mathbf{y}$ until it just touched the line defined by
# $\mathbf{v}$, and that point of tangency was the closest point on the
# line to $\mathbf{y}$. The same thing happens
# in the general
# case with a weighted distance except now we inflate an
# ellipse, not a circle,
# until the ellipse touches the line.
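#
# A short numerical sketch of the weighted case (using an example
# diagonal positive definite $\mathbf{Q}$; any symmetric positive
# definite matrix would do) confirms that the error is perpendicular to
# $\mathbf{v}$ in the $\mathbf{Q}$-weighted inner product:

# In[5]:
Q = np.diag([2., 1.])                       # example positive definite weight
Pv_w = np.outer(v, v) @ Q.T / (v @ Q @ v)   # Equation (1)
e = y - Pv_w @ y                            # error of the weighted projection
print(e @ Q @ v)                            # Q-weighted perpendicularity: 0.0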
#
# **Figure** `fig:probability_003`: In the weighted case, the ellipse is
# tangent to the line at the closest point, and the error is still
# perpendicular in the sense of the weighted distance.
#
# Note that the
# error vector ($\mathbf{y}-\alpha\mathbf{v}$) in [Figure](#fig:probability_003)
# is still perpendicular to the line (subspace
# $\mathbf{v}$), but in the space of
# the weighted distance. The difference between the first projection
# (with the uniform circular distance) and the general case (with the
# elliptical weighted distance) is the inner product used in each case.
# For example, in the first
# case we have $\mathbf{y}^T
# \mathbf{v}$ and in the weighted case we have
# $\mathbf{y}^T \mathbf{Q}^T
# \mathbf{v}$. To move from the uniform circular case
# to the weighted ellipsoidal
# case, all we had to do was change all of the vector
# inner products. Before we
# finish, we need a formal property of projections:
#
# $$
# \mathbf{P}_v \mathbf{P}_v = \mathbf{P}_v
# $$
#
# known as the *idempotent* property, which says that once we
# have
# projected onto a subspace, subsequent projections leave us in the
# same subspace.
# You can verify this by computing Equation [1](#eq:weightedProj).
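#
# For example, continuing the sketch above, the idempotent property
# holds for the weighted operator as well:

# In[6]:
print(np.allclose(Pv_w @ Pv_w, Pv_w))   # prints True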
#
# Thus,
# projection ties a minimization problem (closest point to a line) to an
# algebraic
# concept (inner product). It turns out that these same geometric ideas
# from
# linear algebra [[strang2006linear]](#strang2006linear) can be translated to the
# conditional
# expectation. How this works is the subject of our next section.