Symbolic differentiation is what we learned in school, and software like SymPy, Wolfram, and Mathematica does this well.
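For a few nested sines, a symbolic package is happy enough. Here's a small SymPy sketch (the toy expression is mine; the exact output is a rapidly growing closed form):

import sympy

x_sym = sympy.symbols('x')
expr = sympy.sin(x_sym + sympy.sin(x_sym + sympy.sin(x_sym)))
sympy.diff(expr, x_sym)  # already a sprawling expression after three levels of nesting

But I want to differentiate the following: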
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

x = np.linspace(-2, 2, 100)
plt.plot(x, [f(x_) for x_ in x])
Mathematically, it's something like:

$$f(x) = \sin(x + \sin(x + \sin(x + \cdots + \sin(x))\cdots))$$

Good luck differentiating that and getting a nice closed form. If this is not complicated enough for you, feel free to add some if statements.
We can use autograd, an automatic differentiation package, to compute exact, pointwise derivatives.
from autograd import grad
from autograd import numpy as np  # autograd's thin numpy wrapper, so the operations are traceable

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

grad(f)(1.)
# magic!

-0.26242653107144726
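And since autograd traces ordinary Python, control flow is fine too. Here's a small sketch (my own toy variant of f, call it g) with an if statement inside the loop:

def g(x):
    y = 0.
    for i in range(100):
        if i % 2 == 0:
            y = np.sin(y + x)
        else:
            y = np.cos(y + x)
    return y

grad(g)(1.)  # still works, no symbolic gymnastics required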
Of course, you can string together these pointwise derivatives into a function:
grad_f = grad(f)
x = np.linspace(-2, 2, 100)
plt.plot(x, [grad_f(x_) for x_ in x])
To use this in our optimization routines, we first need some data and a log-likelihood:
T = (np.random.exponential(size=1000)/1.5) ** 2.3   # simulated durations
E = np.random.binomial(1, 0.95, size=1000)           # 1 = event observed, 0 = censored

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

def log_likelihood(params, t, e):
    return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))

def negative_log_likelihood(params, t, e):
    return -log_likelihood(params, t, e)
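For context, a sketch of the math the code implements: cumulative_hazard is the Weibull cumulative hazard $H(t) = (t/\lambda)^\rho$, log_hazard is the log of its derivative $h(t)$, and log_likelihood is the usual right-censored log-likelihood built from them:

$$\ell(\lambda, \rho) = \sum_i e_i \log h(t_i \mid \lambda, \rho) - \sum_i H(t_i \mid \lambda, \rho)$$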
So grad(negative_log_likelihood) returns a function that computes the gradient of negative_log_likelihood with respect to its first argument, params.
grad_negative_log_likelihood = grad(negative_log_likelihood)
print(grad_negative_log_likelihood(np.array([1., 1.]), T, E))
[-199.44295882 2906.3619735 ]
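As a quick sanity check, we can compare this against a crude central finite-difference approximation (finite_difference_grad below is just an illustrative helper, not part of autograd):

def finite_difference_grad(fun, params, *args, h=1e-6):
    # central differences, one coordinate at a time
    grads = []
    for i in range(len(params)):
        step = np.zeros(len(params))
        step[i] = h
        grads.append((fun(params + step, *args) - fun(params - step, *args)) / (2 * h))
    return np.array(grads)

print(finite_difference_grad(negative_log_likelihood, np.array([1., 1.]), T, E))
# should closely match the autograd gradient printed above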
from scipy.optimize import minimize

results = minimize(negative_log_likelihood,
                   x0=np.array([1.0, 1.0]),
                   method=None,  # with bounds and no constraints, SciPy picks L-BFGS-B
                   args=(T, E),
                   jac=grad_negative_log_likelihood,
                   bounds=((0.00001, None), (0.00001, None)))
print(results)

      fun: 243.1558229286153
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.00101051, -0.00293405])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 11
      nit: 7
   status: 0
  success: True
        x: array([0.4693 , 0.4273751])
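Even better: autograd's value_and_grad returns both the objective value and its gradient in a single call, and scipy.optimize.minimize will accept that pair directly if we set jac=True.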
from autograd import value_and_grad

results = minimize(
    value_and_grad(negative_log_likelihood),
    x0=np.array([1.0, 1.0]),
    method=None,
    args=(T, E),
    jac=True,  # notice this is set to True now
    bounds=((0.00001, None), (0.00001, None)))
results

      fun: 243.1558229286153
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.00101051, -0.00293405])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 11
      nit: 7
   status: 0
  success: True
        x: array([0.4693 , 0.4273751])
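As a small follow-up sketch, the fitted parameters live in results.x, and the gradient there is (numerically) near zero, as we'd expect at a maximum of the likelihood:

lambda_hat, rho_hat = results.x   # maximum likelihood estimates
print(lambda_hat, rho_hat)
print(grad_negative_log_likelihood(results.x, T, E))  # approximately 0 at the optimum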
Let's continue this analytical train 🚂 to Part 4!