Solutions to Exercises¶
Module 1¶
Exercise 1: Motivating Automatic differentiation¶
A. Here is an example python function:
import numpy as np
import matplotlib.pyplot as plt
def numerical_diff(f, h):
def num_der(x):
return (f(x+h)-f(x))/h
return num_der
nd = numerical_diff(np.log, 1E-1)
nd2 = numerical_diff(np.log, 1E-7)
nd3 = numerical_diff(np.log, 1E-15)
x = np.linspace(.2, .4, 20)
plt.plot(x, nd(x), label = 'h=.1')
plt.plot(x, nd2(x), label = 'h=1e-7')
plt.plot(x, nd3(x), label = 'h=1e-15')
plt.plot(x, 1/x, '--', label='true')
plt.legend()
plt.xlabel('x')
plt.ylabel('df/dx')
- The corresponding graph output is:
- The numerical differentiation performs best for h=1e-7. For h too small, we encounter round off errors as a result of limited machine precision. For h too large, the numerical approximation is not exact. Automatic differentiation addresses this by evaluating derivatives exactly to machine precision without taking numerical estimates.
Exercise 2: Basic Graph Structure of Calculations¶
You should notice that while your labelling of the nodes may be different, the overall connectivity structure and edges should be the same.
Exercise 3: Looking Toward Multiple Inputs¶
Module 2¶
Exercise 1: Neural Network Problem¶
The corresponding computational graph is given by:
The table corresponding to this graph is given by:
| Trace | Elementary Function | Current Value | Elementary Function Derivative | \(\nabla_x\) Evaluated at (x,y) | \(\nabla_y\) Evaluated at (x,y) |
|---|---|---|---|---|---|
| \(x_1\) | \(x\) | \(x\) | 1 | 1 | 0 |
| \(x_2\) | \(y\) | \(y\) | 1 | 0 | 1 |
| \(x_3\) | \(w_{11} \times x_1\) | \(w_{11}x\) | \(w_{11}\dot{x_1}\) | \(w_{11}\) | 0 |
| \(x_4\) | \(w_{21} \times x_1\) | \(w_{21}x\) | \(w_{21}\dot{x_1}\) | \(w_{21}\) | 0 |
| \(x_5\) | \(w_{12} \times x_2\) | \(w_{12}y\) | \(w_{12}\dot{x_2}\) | 0 | \(w_{12}\) |
| \(x_6\) | \(w_{22} \times x_2\) | \(w_{22}y\) | \(w_{22}\dot{x_2}\) | 0 | \(w_{22}\) |
| \(x_7\) | \(x_3+x_5\) | \(w_{11}x+w_{12}y\) | \(\dot{x_3}+\dot{x_5}\) | \(w_{11}\) | \(w_{12}\) |
| \(x_8\) | \(x_4+x_6\) | \(w_{21}x+w_{22}y\) | \(\dot{x_4}+\dot{x_6}\) | \(w_{21}\) | \(w_{22}\) |
| \(x_9\) | \(z(x_7)\) | \(z(w_{11}x+w_{12}y)\) | \(z^\prime (x_7)\dot{x_7}\) | \(z^\prime (w_{11}x+w_{12}y)w_{11}\) | \(z^\prime (w_{11}x+w_{12}y)w_{12}\) |
| \(x_{10}\) | \(z(x_8)\) | \(z(w_{21}x+w_{22}y)\) | \(z^\prime (x_8)\dot{x_8}\) | \(z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(z^\prime (w_{21}x+w_{22}y)w_{22}\) |
| \(x_{11}\) | \(w_{out,1}\times x_9\) | \(w_{out,1}z(w_{11}x+w_{12}y)\) | \(w_{out, 1}\dot{x_9}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}\) |
| \(x_{12}\) | \(w_{out,2}\times x_{10}\) | \(w_{out,2}z(w_{21}x+w_{22}y)\) | \(w_{out, 2}\dot{x_{10}}\) | \(w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}\) |
| \(x_{13}\) | \(x_{11}+x_{12}\) | \(w_{out,1}z(w_{11}x+w_{12}y)+w_{out,2}z(w_{21}x+w_{22}y)\) | \(\dot{x_{11}}+\dot{x_{12}}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}\) |
Exercise 2: Operation Count Problem¶
Computing the derivative at the six nodes \(x_3, x_4, x_5, x_6, x_{11}, x_{12}\) requires one multiplication in each of the 2 components, contributing \(6 \times 1 \times 2 = 12\) operations.
Computing the derivative at the 3 nodes \(x_7, x_8, x_{13}\) requires one addition in each of the 2 components, contributing \(3\times 1 \times 2 = 6\) operations.
Computing the derivative at the nodes \(x_9, x_{10}\) requires 2 operations (an elementary function evaluation of \(z^\prime\) and a multiplication) in each of the 2 components, contributing \(2\times 2 \times 2 = 8\) operations.
This gives us a total of 26 operations.
Module 3¶
Exercise 1: Reverse Mode Computational Table and Derivatives¶
The reverse computational table is given by:
| Node | Current Value | Numerical Value | \(\partial_1\) | \(\partial_1\) Value | \(\partial_2\) | \(\partial_2\) Value |
|---|---|---|---|---|---|---|
| \(x_1\) | \(x\) | 1 | 1 | 1 | 0 | 0 |
| \(x_2\) | \(y\) | 2 | 0 | 0 | 1 | 1 |
| \(x_3\) | \(x_1x_2\) | 2 | \(x_2\) | 2 | \(x_1\) | 1 |
| \(x_4\) | \(\exp(x_3)\) | \(e^2\) | \(\exp(x_3)\) | \(e^2\) | ||
| \(x_5\) | \(x_3+x_4\) | 2+:math:e^2 | 1 | 1 | 1 | 1 |
We can now trace back through the table to find the adjoints.
We note that in our bar notation we have \(\bar{x_1} = \frac{\partial f}{\partial x}\) and \(\bar{x_2} = \frac{\partial f}{\partial y}\).
Beyond the Basics¶
Exercise 1: Dual Numbers¶
To find the derivative, we look to the dual part when we replace x with \(a+b\epsilon\). For our function,
Since \(\epsilon^2=0\),
Expanding \(e^{2ab\epsilon}\) using a Taylor series,
So we have that the derivative evaluated at a is \(2ae^{a^2}\).
Exercise 2: Toy AD Example¶
An example AutoDiffToy class could look like:
class AutoDiffToy():
""" Creates an object for autodifferentiation.
ATTRIBUTES
==========
val : the value of the object
der : the derivative of the object
EXAMPLES
========
>>> x = AutoDiffToy(4)
>>> x.val
4
>>> x.der
1
"""
def __init__(self, a, d=1.0):
self.val = a
self.der = d
def __add__(self, other): #overload addition
try:
return AutoDiffToy(self.val+other.val, self.der+other.der)
except AttributeError:
other = AutoDiffToy(other, 0) # derivative of a constant is zero
return AutoDiffToy(self.val+other.val, self.der+other.der)
def __radd__(self, other): #ensure commutativity of addition
return self.__add__(other)
def __mul__(self, other): #overload multiplication
try:
return AutoDiffToy(self.val*other.val, self.val*other.der+other.val*self.der)
except AttributeError:
other = AutoDiffToy(other, 0)
return AutoDiffToy(self.val*other.val, self.val*other.der+other.val*self.der)
def __rmul__(self, other):
return self.__mul__(other)