In [1]:

%autosave 10

Autosaving every 10 seconds

In [2]:

%matplotlib inline
from matplotlib import pyplot

In [3]:

import numpy

Linear Regression With One Variable¶

Problem¶

Suppose you observe $m$ data points $(x^{(i)}, y^{(i)})$ and you develop the hypothesis that these data points were generated by the following model:

$$h_{\theta}(x) = <\theta,x> = \theta_0 + \theta_1 x_1$$

A mathematically tractable measure of how well your model reproduces the observed data is the following cost function

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

We now wish to choose the best hypothesis from our pool of hypotheses $h_{\theta}$ as measured by our cost function $J$. To do so we need to modify parameter values $\theta$ so that $J$ assumes its global minimum.

Given the fact that we chose $J$ as a quadratic function we know that there must be one unique minimum (this type of function has a special name ...).

Gradient Descent¶

One way to approximate the minimum of $J$ in $\theta$-space numerically is to start in a random selected point in $\theta$-space and then take small successive steps that always move us in the direction of greatest change (the gradient).

The gradient of $J$ (with respect to $\theta$) is

\begin{equation} \nabla J = \frac{1}{m} \sum_{i=1}^m \begin{pmatrix} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) \\ \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_1^{(i)} \end{pmatrix}. \end{equation}

This now allows us to take small successive steps in the direction of $\nabla J$ and continue doing so until $\nabla J = (0, 0)$ or is numerically close to the zero vector:

$$\theta^{k+1} = \theta^k - \alpha \nabla J.$$

In [4]:

import csv

In [5]:

data = []
with open('data/ex1data1.txt', 'r') as f:
    data_reader = csv.reader(f, delimiter=',')
    for row in data_reader:
        data.append((float(row[0]), float(row[1])))
data = numpy.asarray(data)

In [6]:

data[:5]

Out[6]:

array([[  6.1101,  17.592 ],
       [  5.5277,   9.1302],
       [  8.5186,  13.662 ],
       [  7.0032,  11.854 ],
       [  5.8598,   6.8233]])

In [7]:

pyplot.scatter(data[:,0], data[:,1])

Out[7]:

<matplotlib.collections.PathCollection at 0x44e4650>

In [8]:

theta = numpy.zeros(shape=(2,))
alpha = 0.001

m = len(data[:,0])
h = lambda theta, x: numpy.add(theta[0], theta[1]*x)
J = lambda theta: (1./(2.*m))*numpy.sum(numpy.power(h(theta,data[:,0])-data[:,1], 2))

feature_vector = numpy.column_stack((numpy.ones(shape=(m,)), data[:,0]))
nabla_J = lambda theta: (1./m)*numpy.sum(numpy.column_stack((h(theta,data[:,0])-data[:,1],h(theta,data[:,0])-data[:,1]))*feature_vector)

In [9]:

feature_vector[:5]

Out[9]:

array([[ 1.    ,  6.1101],
       [ 1.    ,  5.5277],
       [ 1.    ,  8.5186],
       [ 1.    ,  7.0032],
       [ 1.    ,  5.8598]])

In [10]:

print('Before GD, J = %g' % J(theta))
for i in range(10000):
    theta = theta - alpha*nabla_J(theta)
print('After GD, J = %g' % J(theta))

Before GD, J = 32.0727
After GD, J = 6.42089

In [11]:

theta

Out[11]:

array([ 0.72088159,  0.72088159])