Symbolic differentiation is what we learned in school, and software like SymPy, Wolfram, and Mathematica does this well.
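For a few nested sines, a symbolic package is happy enough. Here's a small SymPy sketch (the toy expression is mine; the exact output is a rapidly growing closed form):

import sympy

x_sym = sympy.symbols('x')
expr = sympy.sin(x_sym + sympy.sin(x_sym + sympy.sin(x_sym)))
sympy.diff(expr, x_sym)  # already a sprawling expression after three levels of nesting

But I want to differentiate the following: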
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

x = np.linspace(-2, 2, 100)
plt.plot(x, [f(x_) for x_ in x])
Mathematically, it's something like:

$$f(x) = \sin(x + \sin(x + \sin(x + \cdots + \sin(x))\cdots))$$

Good luck differentiating that and getting a nice closed form. If this is not complicated enough for you, feel free to add some if statements.
We can use autograd, an automatic differentiation package, to compute exact, pointwise derivatives.
from autograd import grad
from autograd import numpy as np  # autograd's thin numpy wrapper, so the operations are traceable

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

grad(f)(1.)
# magic!

-0.26242653107144726
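And since autograd traces ordinary Python, control flow is fine too. Here's a small sketch (my own toy variant of f, call it g) with an if statement inside the loop:

def g(x):
    y = 0.
    for i in range(100):
        if i % 2 == 0:
            y = np.sin(y + x)
        else:
            y = np.cos(y + x)
    return y

grad(g)(1.)  # still works, no symbolic gymnastics required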
Of course, you can string together these pointwise derivatives into a function:
grad_f = grad(f)
x = np.linspace(-2, 2, 100)
plt.plot(x, [grad_f(x_) for x_ in x])
To use this in our optimization routines, we first need some data and a log-likelihood:
T = (np.random.exponential(size=1000)/1.5) ** 2.3   # simulated durations
E = np.random.binomial(1, 0.95, size=1000)           # 1 = event observed, 0 = censored

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

def log_likelihood(params, t, e):
    return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))

def negative_log_likelihood(params, t, e):
    return -log_likelihood(params, t, e)
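For context, a sketch of the math the code implements: cumulative_hazard is the Weibull cumulative hazard $H(t) = (t/\lambda)^\rho$, log_hazard is the log of its derivative $h(t)$, and log_likelihood is the usual right-censored log-likelihood built from them:

$$\ell(\lambda, \rho) = \sum_i e_i \log h(t_i \mid \lambda, \rho) - \sum_i H(t_i \mid \lambda, \rho)$$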
So grad(negative_log_likelihood) returns a function that computes the gradient of negative_log_likelihood with respect to its first argument, params.
grad_negative_log_likelihood = grad(negative_log_likelihood)
print(grad_negative_log_likelihood(np.array([1., 1.]), T, E))
[-199.44295882 2906.3619735 ]
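As a quick sanity check, we can compare this against a crude central finite-difference approximation (finite_difference_grad below is just an illustrative helper, not part of autograd):

def finite_difference_grad(fun, params, *args, h=1e-6):
    # central differences, one coordinate at a time
    grads = []
    for i in range(len(params)):
        step = np.zeros(len(params))
        step[i] = h
        grads.append((fun(params + step, *args) - fun(params - step, *args)) / (2 * h))
    return np.array(grads)

print(finite_difference_grad(negative_log_likelihood, np.array([1., 1.]), T, E))
# should closely match the autograd gradient printed above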
from scipy.optimize import minimize

results = minimize(negative_log_likelihood,
                   x0=np.array([1.0, 1.0]),
                   method=None,  # with bounds and no constraints, SciPy picks L-BFGS-B
                   args=(T, E),
                   jac=grad_negative_log_likelihood,
                   bounds=((0.00001, None), (0.00001, None)))
print(results)

      fun: 243.1558229286153
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.00101051, -0.00293405])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 11
      nit: 7
   status: 0
  success: True
        x: array([0.4693 , 0.4273751])
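Even better: autograd's value_and_grad returns both the objective value and its gradient in a single call, and scipy.optimize.minimize will accept that pair directly if we set jac=True.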
from autograd import value_and_grad

results = minimize(
    value_and_grad(negative_log_likelihood),
    x0=np.array([1.0, 1.0]),
    method=None,
    args=(T, E),
    jac=True,  # notice this is set to True now
    bounds=((0.00001, None), (0.00001, None)))
results

      fun: 243.1558229286153
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.00101051, -0.00293405])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 11
      nit: 7
   status: 0
  success: True
        x: array([0.4693 , 0.4273751])
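As a small follow-up sketch, the fitted parameters live in results.x, and the gradient there is (numerically) near zero, as we'd expect at a maximum of the likelihood:

lambda_hat, rho_hat = results.x   # maximum likelihood estimates
print(lambda_hat, rho_hat)
print(grad_negative_log_likelihood(results.x, T, E))  # approximately 0 at the optimum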
Let's continue this analytical train 🚂 to Part 4!