# The approximation $h_{\alpha}(x)$ (dotted line) moves the next iteration from $x=1$
# to the indicated point near the minimum of $f(x)$ by finding an appropriate step size ($\alpha$).
#
#
# For stochastic gradient descent, the above code changes to the following:
# In[77]:
import random
sgdWK = winit # initialize
Jout=[] # container for output
# don't sum along all data as before
grads = np.array([grad.subs({x0:i,x1:j,y:y_i})
                  for (i,j),y_i in zip(X,labels)])
for i in range(niter):
    gradsf = sm.lambdify((b0,b1,bias),random.choice(grads))
    sgdWK = sgdWK - alpha * np.array(gradsf(*sgdWK))
    Jout.append(Jf(*sgdWK))
#
#
# The main difference here is that the gradient calculation no longer
# sums across all of the input data (i.e., the `grads` list); instead, a single
# gradient is randomly chosen by the `random.choice` function in the body of the
# loop above. The extension to mini-batch gradient descent from this code just
# requires averaging over a random sub-selection of the gradients in the
# `batch` variable.
# In[78]:
mbsgdWK = winit # initialize
Jout=[] # container for output
mb = 10 # number of elements in batch
for i in range(niter):
    batch = np.vstack([random.choice(grads)
                       for i in range(mb)]).mean(axis=0)
    gradsf = sm.lambdify((b0,b1,bias),batch)
    mbsgdWK = mbsgdWK - alpha*np.array(gradsf(*mbsgdWK))
    Jout.append(Jf(*mbsgdWK))
# It is straightforward to incorporate momentum into this loop using a
# Python `deque`, as in the following,
#
#
# In[79]:
from collections import deque
momentum = deque([winit,winit],2)
mbsgdWK = winit # initialize
Jout=[] # container for output
mb = 10 # number of elements in batch
for i in range(niter):
    batch = np.vstack([random.choice(grads)
                       for i in range(mb)]).mean(axis=0)
    gradsf = sm.lambdify((b0,b1,bias),batch)
    mbsgdWK = (mbsgdWK - alpha*np.array(gradsf(*mbsgdWK))
               + 0.5*(momentum[1]-momentum[0]))
    momentum.append(mbsgdWK) # keep the last two iterates for the momentum term
    Jout.append(Jf(*mbsgdWK))
#
#
# [Figure](#fig:gradient_descent_006) shows the three variants of the gradient
# descent algorithm. Notice that the stochastic gradient descent algorithm is the
# most erratic, as it takes a new direction for every randomly selected data
# element. Mini-batch gradient descent smooths these out by averaging across
# multiple data elements. The momentum variant lies somewhere in between the
# two, as the effect of the momentum term is not pronounced in this example.
#
#
#
#
#
# Different variations of gradient descent.
#
#
#
#
#
# ### Python Example Using Theano
#
# The code shown makes each step of the gradient descent algorithms explicit
# using Sympy, but this implementation is far too slow for practical use. The
# `theano` module provides thoughtful and powerful high-level abstractions for
# algorithm implementation that rely upon underlying C/C++ and GPU execution
# models. This means that calculations prototyped with `theano` can be executed
# downstream outside of the Python interpreter, which makes them much faster.
# The downside of this approach is that calculations can become much harder to
# debug because of the multiple levels of abstraction. Nonetheless, `theano` is
# a powerful tool for algorithm development and execution.
#
# To get started we need some basics from `theano`.
# In[80]:
import theano
import theano.tensor as T
from theano import function, shared
# The next step is to define variables, which are essentially
# placeholders for values that will be computed later. The next block
# defines two named variables as a double-sized float matrix and vector. Note
# that we did not have to specify the dimensions of each at this point.
# In[81]:
x = T.dmatrix("x") # double matrix
y = T.dvector("y") # double vector
# The parameters of our implementation of gradient descent come next,
# as follows:
# In[82]:
w = shared(np.random.randn(2), name="w") # parameters to fit
b = shared(0.0, name="b") # bias term
# Variables that are `shared` are ones whose values can be set
# separately via other computations or directly via the `set_value()` method.
# These values can also be retrieved using the `get_value()` method. Now, we need
# to define the probability of obtaining a `1` from the given data as `p`. The
# cross-entropy function and the `T.dot` function are already present (along with
# a wide range of other related functions) in `theano`. The conformability of
# the constituent arguments is the responsibility of the user.
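# Before defining `p`, here is a quick illustration (separate from the model's
# `w` and `b` variables) of the shared-variable interface described above; the
# `tmp` name is used only for this sketch.
tmp = shared(np.zeros(2), name="tmp")
print(tmp.get_value())               # array([0., 0.])
tmp.set_value(np.array([0.1, -0.2])) # overwrite the stored value directly
print(tmp.get_value())               # the values we just assigned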
# In[83]:
p=1/(1+T.exp(-T.dot(x,w)-b)) # probability of 1
error = T.nnet.binary_crossentropy(p,y)
loss = error.mean()
gw, gb = T.grad(loss, [w, b])
# The `error` variable is a `TensorVariable` type, which has many
# built-in methods such as `mean`. The so-derived `loss` function is therefore
# also a `TensorVariable`. The last `T.grad` line is the best part of Theano because
# it can compute these gradients automatically.
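# As a minimal sketch of this automatic differentiation (separate from the model
# above; the scalar variable `a` is used only for illustration), Theano can
# differentiate a symbolic expression and compile the derivative into a callable:
a = T.dscalar('a')           # scalar placeholder
z = a**2 + 3*a               # symbolic expression
dz_da = T.grad(z, a)         # symbolic derivative, 2*a + 3
deriv = function([a], dz_da) # compile into a callable function
print(deriv(2.0))            # approximately 7.0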
# In[84]:
train = function(inputs=[x,y],
outputs=[error],
updates=((w, w - alpha * gw),
(b, b - alpha * gb)))
# The last step is to set up training by defining the training function
# in `theano`. The user will supply the previously defined and named input
# variables (`x` and `y`) and `theano` will return the previously defined `error`
# variable. Recall that the `w` and `b` variables were defined as `shared`
# variables. This means that the function `train` can update their values between
# calls using the update formula specified in the `updates` keyword variable. In
# this case, the update is just plain gradient descent with the previously
# defined `alpha` step-size variable.
#
# We can execute the training using the `train` function in the following loop:
# In[85]:
training_steps=1000
for i in range(training_steps):
    error = train(X, labels)
# The `train(X,labels)` call is where the `X` and `labels` arrays we
# defined earlier replace the placeholder variables. The update step refreshes
# all of the `shared` variables at each iterative step. At the end of the
# iteration, the so-computed parameters are in the `w` and `b` variables with
# values available via `get_value()`. The implementation for stochastic gradient
# descent requires just a little modification to this loop, as in the following:
# In[86]:
for i in range(training_steps):
    idx = np.random.randint(0,X.shape[0])
    error = train([X[idx,:]], [labels[idx]])
# where the `idx` variable selects a random data element from the set and
# uses that for the update step at every iteration. Likewise, batch stochastic
# gradient descent follows with the following modification,
# In[87]:
batch_size = 50
indices = np.arange(X.shape[0])
for i in range(training_steps):
    idx = np.random.permutation(indices)[:batch_size]
    error = train(X[idx,:], labels[idx])
print(w.get_value())
print(b.get_value()) # bias term
# Here, we set up an `indices` variable that is used for randomly
# selecting subsets in the `idx` variable that are passed to the `train`
# function. All of these implementations parallel the corresponding previous
# implementations in Sympy, but these are many orders of magnitude faster due to
# `theano`.
#
#
#
# ## Image Processing Using Convolutional Neural Networks
# In[88]:
import numpy as np
from matplotlib.pylab import subplots, cm
def text_grid(res,i,j,t,ax,**kwds):
    '''
    put text `t` on grid `i,j` position
    passing down `kwds` to `ax.text`
    '''
    assert isinstance(t,str)
    assert isinstance(res,np.ndarray)
    color = kwds.pop('color','r')
    ax.text(i-0.25,j+0.25,t,color=color,**kwds)
def text_grid_array(res,ax=None,fmt='%d',title=None,title_kw=dict(),**kwds):
    '''
    put values of `res` array as text on grid
    '''
    assert isinstance(res,np.ndarray)
    if ax is None:
        fig, ax = subplots()
    ax.imshow(res,cmap=cm.gray_r)
    ii,jj = np.where(res.T)
    for i,j in zip(ii,jj):
        text_grid(res,i,j,fmt%(res[j,i]),ax,**kwds)
    if title:
        ax.set_title(title,**title_kw)
    try:
        return fig
    except:
        pass
def draw_ndimage(c,m=4,n=5,figsize=[10,10]):
    t,mt,nt = c.shape
    assert m*n == t
    fig,axs=subplots(m,n,figsize=figsize)
    for ax,i in zip(axs.flatten(),c):
        text_grid_array(i,fontsize=6,fmt='%.1f',ax=ax)
        _= ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
    return fig
#
#
# In this section, we develop the Convolutional Neural Network (CNN), which is the
# fundamental deep learning image processing application. We deconstruct every
# layer of this network to develop insight into the purpose of the individual
# operations. CNNs take images as inputs and images can be represented as
# Numpy arrays, which makes them fast and easy to use with any of the scientific
# Python tools. The individual entries of the Numpy array are the pixels and the
# row/column dimensions are the height/width of the image, respectively. The
# array values are between `0` and `255` and correspond to the intensity of
# the pixel at that location. Color images add a third
# depth-dimension for the color channel (e.g., red, green,
# blue). Two-dimensional image arrays are grayscale.
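# The following small sketch (synthetic data, not MNIST) shows how these
# conventions map onto a Numpy array: rows index height, columns index width,
# and a color image adds a trailing channel dimension.
img = np.random.randint(0, 256, size=(4, 5), dtype=np.uint8)    # 4x5 grayscale image
print(img.shape)     # (4, 5) -> height, width
print(img[0, 0])     # intensity of the top-left pixel
rgb = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8) # add RGB channels
print(rgb.shape)     # (4, 5, 3)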
#
# **Programming Tip.**
#
# Matplotlib makes it easy to draw images using the underlying Numpy arrays. For
# instance, we can draw [Figure](#fig:image_processing_001) using the following
# MNIST image from `sklearn.datasets`, which represents grayscale hand-drawn
# digits (the number zero in this case).
# In[89]:
from matplotlib.pylab import subplots, cm
from sklearn import datasets
mnist = datasets.load_digits()
fig, ax = subplots()
ax.imshow(mnist.images[0],
interpolation='nearest',
cmap=cm.gray)
# In[90]:
fig.savefig('fig-machine_learning/image_processing_001.png')
# The `cmap` keyword argument specifies the colormap as gray. The
# `interpolation` keyword means that the resulting image from `imshow` does not
# try to visually smooth out the data, which can be confusing when working at the
# pixel level. The other hand-drawn digits are shown in [Figure](#fig:image_processing_002).
#
#
#
#
#
#
#
# Image of a hand-drawn number zero from the MNIST dataset.
#
#
#
# In[119]:
fig, axs = subplots(2,5, constrained_layout=False)
fig.tight_layout()
for i,(ax,j) in enumerate(zip(axs.flatten(),mnist.images)):
    _=ax.imshow(j,interpolation='nearest',cmap=cm.gray)
    _=ax.set_title('digit %d'%(i))
    _=ax.spines['top'].set_visible(False)
    _=ax.spines['bottom'].set_visible(False)
    _=ax.spines['left'].set_visible(False)
    _=ax.spines['right'].set_visible(False)
    _=ax.spines['top'].axis.set_ticks_position('none')
    _=ax.spines['bottom'].axis.set_ticks_position('none')
    _=ax.spines['left'].axis.set_ticks_position('none')
    _=ax.spines['right'].axis.set_ticks_position('none')
    _=ax.xaxis.set_visible(False)
    _=ax.yaxis.set_visible(False)
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_002.png')
#
#
#
#
# Samples of the other hand-drawn digits from MNIST.
#
#
#
#
#
# ### Convolution
#
# Convolution is an intensive calculation, and it is the core of convolutional
# neural networks. The purpose of convolution is to create alternative
# representations of the input image that emphasize or deemphasize certain
# features represented by the *kernel*. The convolution operation consists of a
# kernel and an *input matrix*. The convolution operation is a way of aligning
# and comparing image data with the corresponding data in an image kernel. You
# can think of an image kernel as a template for a canonical feature that the
# convolution operation will uncover. To keep it simple, suppose we have the
# following `3x3` kernel matrix,
# In[120]:
import numpy as np
kern = np.eye(3,dtype=int)
kern
# Using this kernel, we want to find anything in an input image that
# looks like a diagonal line. Let's suppose we have the following input Numpy image
# In[121]:
tmp = np.hstack([kern,kern*0])
x = np.vstack([tmp,tmp])
x
# Note that this image is just the kernel stacked into a larger Numpy
# array. We want to see if the convolution can pull out the kernel that is
# embedded in the image. Of course, in a real application we would not know
# whether or not the kernel is present in the image, but this example
# helps us understand the convolution operation step-by-step. There is a
# convolution function available in the `scipy` module.
# In[122]:
from scipy.ndimage import convolve
res = convolve(x,kern,mode='constant',cval=0)
res
# Each step of the convolution operation is represented in [Figure](#fig:image_processing_003). The `kern` matrix (light blue square) is
# overlaid upon the `x` matrix and the element-wise product is computed and
# summed. For example, when the kernel is centered on the top-left 3x3 slice of
# the input, this operation results in `3`. The convolution
# operation is sensitive to boundary conditions. For this example, we have chosen
# `mode='constant'` and `cval=0`, which means that the input image is bordered by
# zeros when the kernel sweeps outside of the input image boundary. This is the
# simplest option for managing the edge conditions and
# `scipy.ndimage.convolve` provides other practical alternatives. It is also
# common to normalize the output of the convolution operation by dividing by the
# number of nonzero pixels in the kernel (i.e., `3` in this example). Another way to
# think about the convolution operation is as a matched filter that peaks when it
# finds a compatible sub-feature. The final output of the convolution operation
# is shown in [Figure](#fig:image_processing_004). The values of the
# individual pixels are shown in color. Notice that the maximum values of the
# output image are located on the diagonals.
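# As a quick numerical check (a hypothetical verification, not from the original
# text), the window centered at output position `1,1` is the top-left 3x3 slice
# of `x`, so overlaying the kernel, multiplying element-wise, and summing should
# reproduce `res[1,1]`. Because the identity kernel is unchanged by a 180-degree
# rotation, convolution and this direct correlation coincide here.
window = x[0:3, 0:3]         # top-left 3x3 slice of the input
print((window * kern).sum()) # 3
print(res[1, 1])             # 3, matching the manual calculation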
#
#
#
#
#
# The convolution process that produces the `res` array. As shown in the
# sequence, the light blue `kern` array is slid around, overlaid, multiplied, and
# summed upon the `x` array to generate the values shown in the title. The output
# of the convolution is shown in [Figure](#fig:image_processing_004).
#
#
#
#
#
#
#
#
#
# The `res` array output of the convolution process shown in
# [Figure](#fig:image_processing_003). The values (in red) shown are the
# individual outputs of the convolution operation. The grayscale indicates the
# relative magnitude of the shown values (darker is greater).
#
#
#
#
#
# However, the convolution operation is not a perfect detector and results in
# nonzero values for other cases. For example, suppose the input image is a
# forward-slash diagonal line. The step-by-step convolution with the kernel is
# shown in [Figure](#fig:image_processing_005) with corresponding output in
# [Figure](#fig:image_processing_006) that looks nothing like the kernel or the
# input image.
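# The following sketch reproduces this effect with an assumed input (a
# left-right flip of the earlier `x` array, which may differ from the exact
# figure input): the backslash kernel still produces nonzero outputs, but the
# response is weaker than for the matching diagonal input.
x_fwd = np.fliplr(x)  # forward-slash diagonals
res_fwd = convolve(x_fwd, kern, mode='constant', cval=0)
res_fwd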
#
#
#
#
#
# The input array is a forward slash diagonal. This sequence shows the step-by-step convolution operation. The output of this convolution is shown in [Figure](#fig:image_processing_006).
#
#
#
#
#
#
#
#
#
# The output of the convolution operation shown in [Figure](#fig:image_processing_005). Note that the output has nonzero elements where there is no match between the input image and the kernel.
#
#
#
#
#
# We can use multiple kernels to explore an input image.
# For example, suppose we have the input image shown on the left in [Figure](#fig:image_processing_007). The two kernels are shown in the upper row, with
# corresponding outputs on the bottom row. Each kernel is able to emphasize its
# particular feature but extraneous features appear in both outputs. We can have
# as many outputs as we have kernels but because each output image is as large as the
# input image, we need a way to reduce the size of this data.
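# The following sketch (with an assumed composite input, not the exact figure
# image) shows the idea: two kernels, one per slash orientation, each applied to
# the same input image.
kern_back = np.eye(3, dtype=int)             # backslash template
kern_fwd = np.fliplr(kern_back)              # forward-slash template
composite = np.hstack([kern_back, kern_fwd]) # image containing both features
out_back = convolve(composite, kern_back, mode='constant', cval=0)
out_fwd = convolve(composite, kern_fwd, mode='constant', cval=0)
# each output emphasizes its own orientation, but neither is zero elsewhere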
#
#
#
#
#
# Given two kernels (upper row) and the input image on the left, the output images are shown on the bottom row. Note that each kernel is able to emphasize its feature on the input composite image but other extraneous features appear in the outputs.
#
#
#
#
#
# ### Maximum Pooling
#
# To reduce the size of the output images, we can apply *maximum pooling*
# to replace a tiled subset of the image with the maximum pixel value
# in that particular subset. The following Python code illustrates maximum
# pooling,
# In[123]:
def max_pool(res,width=2,height=2):
    m,n = res.shape
    xi = [slice(i,i+width) for i in range(0,m,width)]
    yi = [slice(i,i+height) for i in range(0,n,height)]
    out = np.zeros((len(xi),len(yi)),dtype=res.dtype)
    for ni,i in enumerate(xi):
        for nj,j in enumerate(yi):
            out[ni,nj] = res[i,j].max()
    return out
# **Programming Tip.**
#
# The `slice` object provides programmatic array slicing. For
# example, `x[0:3]` is the same as `x[slice(0,3)]`. This means you can
# separate the `slice` from the array, which makes it easier to manage.
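# For example, the following shows the equivalence on a small hypothetical array
# named `arr`,
arr = np.arange(16).reshape(4, 4)
rows, cols = slice(0, 2), slice(2, 4)           # same as 0:2 and 2:4
print(arr[rows, cols])                          # upper-right 2x2 block
print(np.all(arr[rows, cols] == arr[0:2, 2:4])) # True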
#
#
#
# Pooling reduces the dimensionality of the output of the convolution
# and makes stacking convolutions together computationally feasible. [Figure](#fig:image_processing_008) shows the output of the `max_pool` function on
# the indicated input images.
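# As a quick check (using the `res` array from the earlier convolution), a 2x2
# pool halves each dimension, so the 6x6 convolution output becomes 3x3.
pooled = max_pool(res)
print(res.shape, '->', pooled.shape) # (6, 6) -> (3, 3)
pooled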
#
#
#
#
#
# The `max_pool` function reduces the size of the output images (left column) to
# the images in the right column. Note that the pool size is `2x2` so that the
# resulting pooled images are half the size of the original images in each
# dimension.
#
#
#
#
#
# ### Rectified Linear Activation
#
# Rectified Linear Activation Units (ReLUs) are neural network units that
# implement the following activation function,
# $$
# r(x) = \begin{cases}
# x & \mbox{if } x>0 \\
# 0 & \mbox{otherwise }
# \end{cases}
# $$
# To use this activation properly, the kernels in the convolutional
# layer must be scaled to the $\lbrace -1,1 \rbrace$ range. We
# can implement our own rectified linear activation function using the
# following code,
# In[96]:
def relu(x):
    'rectified linear activation function'
    out = np.zeros(x.shape,dtype=x.dtype)
    idx = x>=0
    out[idx] = x[idx]
    return out
# Now that we understand the basic building blocks, let us investigate how the
# operations fit together. To create some training image data, we use the
# following function to create some random backwards and forwards slashed images
# as shown in [Figure](#fig:image_processing_009). As before, we have the
# scaled kernels shown in [Figure](#fig:image_processing_010). We are going to
# apply the convolution, max-pooling, and rectified linear activation function
# sequence step-by-step and observe the outputs at each step.
# In[97]:
def gen_rand_slash(m=6,n=6,direction='back'):
    '''generate random forward/backslash images.
    Must have at least two pixels'''
    assert direction in ('back','forward')
    assert n>=2 and m>=2
    import numpy as np
    import random
    out = -np.ones((m,n),dtype=float)
    i = random.randint(2,min(m,n))
    j = random.randint(-i,max(m,n)-1)
    t = np.diag([1,]*i,j)
    if direction == 'forward':
        t = np.flipud(t)
    try:
        assert t.sum().sum()>=2
        out[np.where(t)]=1
        return out
    except:
        return gen_rand_slash(m=m,n=n,direction=direction)
# create slash-images training data with classification id 1 or 0
training=[(gen_rand_slash(),1) for i in range(10)] + \
[(gen_rand_slash(direction='forward'),0) for i in range(10)]
# In[98]:
fig,axs=subplots(4,5,figsize=[8,8])
for ax,i in zip(axs.flatten(),training):
    _=ax.imshow(i[0],cmap='gray_r')
    _=ax.set_title(f'category={i[1]}')
    _=ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_009.png')
#
#
#
#
# The training data set for our convolutional neural network. The forward slash
# images are labelled category `0` and the backward slash images are category `1`.
#
#
#
#
#
#
#
#
#
# The two scaled feature kernels for the convolutional neural network.
#
#
#
#
#
# [Figure](#fig:image_processing_011) shows the output of convolving the
# training data in [Figure](#fig:image_processing_009) with `kern1`, as shown on
# the left panel of [Figure](#fig:image_processing_010). Note that
# the following code defines each of these kernels,
# In[99]:
kern1 = (np.eye(3,dtype=int)*2-1)/9. # scale
kern2 = np.flipud(kern1)
# The next operation is the activation function for the rectified linear unit
# with output shown in [Figure](#fig:image_processing_012). Note that all of the
# negative terms have been replaced by zeros. The next step is the maximum
# pooling operation as shown in [Figure](#fig:image_processing_013). Notice that
# the total number of pixels in the training data has been reduced from thirty-six
# per image to nine per image. With these processed images, we have the inputs we
# need for the final classification step.
# In[100]:
fig,axs=subplots(4,5,figsize=[10,10])
for ax,(i,cat) in zip(axs.flatten(),training):
    res1 = convolve(i,kern1,mode='constant',cval=0)
    _= text_grid_array(res1,fontsize=6,fmt='%.1f',ax=ax)
    _= ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
_=fig.suptitle('Training Set Convolution with kern1',fontsize=22)
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_011.png')
# In[101]:
fig,axs=subplots(4,5,figsize=[10,10])
for ax,(i,cat) in zip(axs.flatten(),training):
    res1 = convolve(i,kern1,mode='constant',cval=0)
    _=text_grid_array(relu(res1),fontsize=6,fmt='%.1f',ax=ax)
    _=ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
_=fig.suptitle('ReLU of Convolution with kern1',fontsize=22)
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_012.png')
# In[102]:
fig,axs=subplots(4,5,figsize=[8,8])
for ax,(i,cat) in zip(axs.flatten(),training):
    res1 = convolve(i,kern1,mode='constant',cval=0)
    tmp = max_pool(relu(res1))
    _=text_grid_array(tmp,fontsize=6,fmt='%.1f',ax=ax)
    _=ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
_=fig.suptitle('Max-pool of ReLU Output for kern1',fontsize=22)
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_013.png')
#
#
#
#
# The output of convolving the training data in
# [Figure](#fig:image_processing_009) with `kern1`, as shown in the left panel of
# [Figure](#fig:image_processing_010).
#
#
#
#
#
#
#
#
#
# The output of the rectified linear unit activation function with the input shown in [Figure](#fig:image_processing_011).
#
#
#
#
#
#
#
#
#
# The output of the maximum pooling operation with the input shown in
# [Figure](#fig:image_processing_012) for the fixed image kernel `kern1`.
#
#
#
#
#
# ### Convolutional Neural Network Using Keras
#
# Now that we have experimented with the individual operations using our own
# Python code, we can construct the convolutional neural network using Keras. In
# particular, we use the Keras functional interface to define this neural
# network because that makes it easy to unpack the operations at the individual
# layers.
# In[103]:
from keras import metrics
from keras.models import Model
from keras.layers.core import Dense, Activation, Flatten
from keras.layers import Input
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.optimizers import SGD
from keras import backend as K
from keras.utils import to_categorical
# Note that the names of the modules are consistent with their
# operations. We also need to tell `Keras` how to manage the input images,
# In[104]:
K.set_image_data_format('channels_first') # image data format
inputs = Input(shape=(1,6,6)) # input data shape
# Now we can build the individual convolutional layers. Note the specification of
# the activations at each layer and the placement of the `inputs`.
# In[105]:
clayer = Conv2D(2,(3,3),padding='same',
input_shape=(1,6,6),name='conv',
use_bias=False,
trainable=False)(inputs)
relu_layer= Activation('relu')(clayer)
maxpooling = MaxPooling2D(pool_size=(2,2),
name='maxpool')(relu_layer)
flatten = Flatten()(maxpooling)
softmax_layer = Dense(2,
activation='softmax',
name='softmax')(flatten)
model = Model(inputs=inputs, outputs=softmax_layer)
# inject fixed kernels into convolutional layer
fixed_kernels = [np.dstack([kern1,kern2]).reshape(3,3,1,2)]
model.layers[1].set_weights(fixed_kernels)
# Observe that the functional interface means that each layer is
# explicitly a function of the previous one. Note that `trainable=False` for the
# convolutional layer because we want to inject our fixed kernels into it and
# keep them fixed. The `flatten` layer reshapes the data so that the entire
# processed image at that point is fed into the `softmax_layer`, whose output is
# proportional to the probability that the image belongs to either class. The
# `set_weights()` function is where we inject our fixed kernels. These are not
# going to be updated by the optimization algorithm because of the prior
# `trainable=False` option. With the topology of the neural network defined, we
# now have to choose the optimization algorithm and pack all of this
# configuration into the model with the `compile` step.
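# As a quick sanity check (not part of the original text), the standard
# `model.summary()` method prints the layer names and output shapes of the
# network we just defined, which helps confirm the wiring of the functional
# interface.
model.summary()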
# In[106]:
lr = 0.01 # learning rate
sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy',
metrics.categorical_crossentropy])
# The `metrics` specification means that we want the training process to
# keep track of those named items. Next, we generate some training data using our
# `gen_rand_slash` function with the associated class of each image (`1` or `0`).
# Most of this code is just shaping the tensors for Keras. The final
# `model.fit()` step is where the internal weights of the neural network are
# adjusted according to the given inputs.
# In[107]:
# generate some training data
ntrain = len(training)
t=np.dstack([training[i][0].T
for i in range(ntrain)]).T.reshape(ntrain,1,6,6)
y_binary=to_categorical(np.hstack([np.ones(ntrain//2),
np.zeros(ntrain//2)]))
# fit the configured model
h=model.fit(t,y_binary,epochs=500,verbose=0)
# With that completed, we can investigate the functional mapping of each layer
# with `K.function`. The following creates a mapping between the input layer and
# the convolutional layer,
# In[108]:
convFunction = K.function([inputs],[clayer])
# Now, we can feed the training data into this function and see the output of
# just the convolutional layer, which is shown next for each of the two kernels,
# In[109]:
fig=draw_ndimage(convFunction([t])[0][:,0,:,:],4,5)
_=fig.suptitle('Keras convolution layer output given kern1',fontsize=22);
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_015.png')
# In[110]:
fig=draw_ndimage(convFunction([t])[0][:,1,:,:],4,5)
_=fig.suptitle('Keras convolution layer output given kern2',fontsize=22);
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_016.png')
#
#
#
#
# Compare this to [Figure](#fig:image_processing_011). This shows our hand-tooled convolution is the same as that implemented by Keras.
#
#
#
#
#
# We can do this again for the pooling layer by creating another Keras function,
# In[111]:
maxPoolingFunction = K.function([inputs],[maxpooling])
# whose output is shown in [Figure](#fig:image_processing_017). We can examine the
# final output of this network using the `predict` function,
# In[112]:
fixed_kernels = model.predict(t)
fixed_kernels
# and we can see the class probabilities assigned to each training image. Taking
# the maximum of these across the columns gives the following,
# In[113]:
np.argmax(fixed_kernels,axis=1)
# which means that our convolutional neural network with the fixed kernels
# did well predicting the classes of each of our input images. Recall that our model
# configuration prevented our fixed kernels from updating in the training process. Thus, the
# main work of model training was changing the weights of the final output layer. We can
# re-do this exercise by removing this constraint and see how the network performs if
# it is able to adaptively re-weight the kernel terms as part of training by changing the `trainable`
# keyword argument and then re-build and train the model, as shown next.
# In[114]:
clayer = Conv2D(2,(3,3),padding='same',
input_shape=(1,6,6),name='conv',
use_bias=False)(inputs)
relu_layer= Activation('relu')(clayer)
maxpooling = MaxPooling2D(pool_size=(2,2),
name='maxpool')(relu_layer)
flatten = Flatten()(maxpooling)
softmax_layer = Dense(2,
activation='softmax',
name='softmax')(flatten)
model = Model(inputs=inputs, outputs=softmax_layer)
model.compile(loss='categorical_crossentropy',
optimizer=sgd)
h=model.fit(t,y_binary,epochs=500,verbose=0)
new_kernels = model.predict(t)
new_kernels
# with corresponding max output,
# In[115]:
np.argmax(new_kernels,axis=1)
# In[116]:
fig=draw_ndimage(maxPoolingFunction([t])[0][:,0,:,:],4,5)
_=fig.suptitle('Keras Pooling layer output given kern1',fontsize=22);
fig.tight_layout()
fig.savefig('fig-machine_learning/image_processing_017.png')
#
#
#
#
# Output of the max-pooling layer for fixed kernel `kern1`. Compare this to
# [Figure](#fig:image_processing_013). This shows our hand-tooled implementation
# is equivalent to that of Keras.
#
#
#
# In[117]:
fig,axs=subplots(1,2,sharey=True,sharex=True)
text_grid_array(model.layers[1].get_weights()[0][:,:,0,0],fmt='%.1f',title='Updated kern1',ax=axs[0])
text_grid_array(model.layers[1].get_weights()[0][:,:,0,1],fmt='%.1f',title='Updated kern2',ax=axs[1])
for ax in axs:
    ax.tick_params(labelleft=False,left=False,labelbottom=False,bottom=False)
fig.savefig('fig-machine_learning/image_processing_018.png')
#
#
#
#
# Kernels updated during the training process. Compare to [Figure](#fig:image_processing_010).
#
#
#
#
#
# The newly updated kernels are shown in [Figure](#fig:image_processing_018).
# Note how different these are from the original fixed kernels. We can see the
# change in the respective predictions in [Figure](#fig:image_processing_019).
# Thus, the benefit of updating the kernels in the training process is to improve
# the overall accuracy, but at the cost of interpretability of the kernels
# themselves. It is seldom the case that the kernels are known ahead of time, as
# in our artificial example here, so in practice there may be nothing to really
# interpret anyway. Nonetheless, for other problems where there is a target
# feature in the data for which good a priori exemplars exist that could serve as
# kernels, priming these kernels early in training may help to tune into those
# target features, especially if they are rare in the training data.
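# As a quick numerical comparison (a hypothetical check, not from the original
# text), we can compute the overall training accuracy of both models from the
# predictions and the known labels,
true_cat = np.argmax(y_binary, axis=1)
print('fixed kernels:', (np.argmax(fixed_kernels, axis=1) == true_cat).mean())
print('updated kernels:', (np.argmax(new_kernels, axis=1) == true_cat).mean())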
# In[118]:
fig,ax=subplots()
_=ax.plot(fixed_kernels[:,0],label='fixed kernels')
_=ax.plot(new_kernels[:,0],label='updated kernels')
_=ax.set_xlabel('Training index',fontsize=16)
_=ax.set_ylabel('Probability of Category 1',fontsize=16)
_=ax.spines['top'].set_visible(False)
_=ax.spines['right'].set_visible(False)
_=ax.legend()
_=ax.set_title('Updated kernels improve classification',fontsize=22)
fig.savefig('fig-machine_learning/image_processing_019.png')
#
#
#
#
# Recall that the second half of the training set was classified as category `1`.
# The updated kernels provide a wider margin for classification than our fixed
# kernels, even though the ultimate performance is very similar between them.
#
#
#