%autosave 10
Autosaving every 10 seconds
%matplotlib inline
from matplotlib import pyplot
import numpy
Suppose you observe $m$ data points $(x^{(i)}, y^{(i)})$ and you develop the hypothesis that these data points were generated by the following model:
$$h_{\theta}(x) = <\theta,x> = \theta_0 + \theta_1 x_1$$A mathematically tractable measure of how well your model reproduces the observed data is the following cost function
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$We now wish to choose the best hypothesis from our pool of hypotheses $h_{\theta}$ as measured by our cost function $J$. To do so we need to modify parameter values $\theta$ so that $J$ assumes its global minimum.
Given the fact that we chose $J$ as a quadratic function we know that there must be one unique minimum (this type of function has a special name ...).
One way to approximate the minimum of $J$ in $\theta$-space numerically is to start in a random selected point in $\theta$-space and then take small successive steps that always move us in the direction of greatest change (the gradient).
The gradient of $J$ (with respect to $\theta$) is
\begin{equation} \nabla J = \frac{1}{m} \sum_{i=1}^m \begin{pmatrix} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) \\ \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_1^{(i)} \end{pmatrix}. \end{equation}This now allows us to take small successive steps in the direction of $\nabla J$ and continue doing so until $\nabla J = (0, 0)$ or is numerically close to the zero vector:
$$\theta^{k+1} = \theta^k - \alpha \nabla J.$$import csv
data = []
with open('data/ex1data1.txt', 'r') as f:
data_reader = csv.reader(f, delimiter=',')
for row in data_reader:
data.append((float(row[0]), float(row[1])))
data = numpy.asarray(data)
data[:5]
array([[ 6.1101, 17.592 ], [ 5.5277, 9.1302], [ 8.5186, 13.662 ], [ 7.0032, 11.854 ], [ 5.8598, 6.8233]])
pyplot.scatter(data[:,0], data[:,1])
<matplotlib.collections.PathCollection at 0x44e4650>
theta = numpy.zeros(shape=(2,))
alpha = 0.001
m = len(data[:,0])
h = lambda theta, x: numpy.add(theta[0], theta[1]*x)
J = lambda theta: (1./(2.*m))*numpy.sum(numpy.power(h(theta,data[:,0])-data[:,1], 2))
feature_vector = numpy.column_stack((numpy.ones(shape=(m,)), data[:,0]))
nabla_J = lambda theta: (1./m)*numpy.sum(numpy.column_stack((h(theta,data[:,0])-data[:,1],h(theta,data[:,0])-data[:,1]))*feature_vector)
feature_vector[:5]
array([[ 1. , 6.1101], [ 1. , 5.5277], [ 1. , 8.5186], [ 1. , 7.0032], [ 1. , 5.8598]])
print('Before GD, J = %g' % J(theta))
for i in range(10000):
theta = theta - alpha*nabla_J(theta)
print('After GD, J = %g' % J(theta))
Before GD, J = 32.0727 After GD, J = 6.42089
theta
array([ 0.72088159, 0.72088159])