Solutions to Exercises

Module 1

Exercise 1: Motivating Automatic Differentiation

A. Here is an example Python function:

import numpy as np
import matplotlib.pyplot as plt

def numerical_diff(f, h):
    """Return a function that approximates f' with a forward difference of step h."""
    def num_der(x):
        return (f(x + h) - f(x)) / h
    return num_der

nd = numerical_diff(np.log, 1E-1)
nd2 = numerical_diff(np.log, 1E-7)
nd3 = numerical_diff(np.log, 1E-15)

x = np.linspace(.2, .4, 20)

plt.plot(x, nd(x), label = 'h=.1')
plt.plot(x, nd2(x), label = 'h=1e-7')
plt.plot(x, nd3(x), label = 'h=1e-15')
plt.plot(x, 1/x, '--', label='true')
plt.legend()
plt.xlabel('x')
plt.ylabel('df/dx')
B. The corresponding graph output is:

[Figure Mod1Ex1Sol.PNG: numerical derivative of log(x) for h=1e-1, 1e-7, 1e-15, compared with the true derivative 1/x]
C. The numerical differentiation performs best for h=1e-7. When h is too small, round-off error from limited machine precision dominates; when h is too large, the truncation error of the finite-difference approximation dominates. Automatic differentiation avoids this trade-off by evaluating derivatives exactly (to machine precision) rather than through finite-difference estimates.
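
As a quick check of this trade-off, one can print the error against the true derivative \(1/x\) at a single point. The sketch below reuses the same forward-difference formula; the test point \(x_0 = 0.3\) is an arbitrary choice.

import numpy as np

# Forward-difference error at a single test point for several step sizes.
# x0 = 0.3 is arbitrary; the true derivative of log is 1/x.
x0 = 0.3
for h in (1e-1, 1e-7, 1e-15):
    approx = (np.log(x0 + h) - np.log(x0)) / h
    print(f"h={h:.0e}: abs error = {abs(approx - 1/x0):.3e}")

The h=1e-7 line should show the smallest error, while the h=1e-15 line is dominated by round-off.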

Exercise 2: Basic Graph Structure of Calculations

[Figure Mod1Ex2Sol.PNG: computational graph for Exercise 2]

Note that while your labelling of the nodes may differ, the overall connectivity structure (which nodes are joined by edges) should be the same.

Exercise 3: Looking Toward Multiple Inputs

[Figure Mod1Ex3Sol.PNG: computational graph for Exercise 3]

Module 2

Exercise 1: Neural Network Problem

The corresponding computational graph is given by:

[Figure Mod2Ex1Sol.png: computational graph for the neural network]

The table corresponding to this graph is given by:

| Trace | Elementary Function | Current Value | Elementary Function Derivative | \(\nabla_x\) Evaluated at (x,y) | \(\nabla_y\) Evaluated at (x,y) |
|---|---|---|---|---|---|
| \(x_1\) | \(x\) | \(x\) | 1 | 1 | 0 |
| \(x_2\) | \(y\) | \(y\) | 1 | 0 | 1 |
| \(x_3\) | \(w_{11} \times x_1\) | \(w_{11}x\) | \(w_{11}\dot{x_1}\) | \(w_{11}\) | 0 |
| \(x_4\) | \(w_{21} \times x_1\) | \(w_{21}x\) | \(w_{21}\dot{x_1}\) | \(w_{21}\) | 0 |
| \(x_5\) | \(w_{12} \times x_2\) | \(w_{12}y\) | \(w_{12}\dot{x_2}\) | 0 | \(w_{12}\) |
| \(x_6\) | \(w_{22} \times x_2\) | \(w_{22}y\) | \(w_{22}\dot{x_2}\) | 0 | \(w_{22}\) |
| \(x_7\) | \(x_3+x_5\) | \(w_{11}x+w_{12}y\) | \(\dot{x_3}+\dot{x_5}\) | \(w_{11}\) | \(w_{12}\) |
| \(x_8\) | \(x_4+x_6\) | \(w_{21}x+w_{22}y\) | \(\dot{x_4}+\dot{x_6}\) | \(w_{21}\) | \(w_{22}\) |
| \(x_9\) | \(z(x_7)\) | \(z(w_{11}x+w_{12}y)\) | \(z^\prime (x_7)\dot{x_7}\) | \(z^\prime (w_{11}x+w_{12}y)w_{11}\) | \(z^\prime (w_{11}x+w_{12}y)w_{12}\) |
| \(x_{10}\) | \(z(x_8)\) | \(z(w_{21}x+w_{22}y)\) | \(z^\prime (x_8)\dot{x_8}\) | \(z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(z^\prime (w_{21}x+w_{22}y)w_{22}\) |
| \(x_{11}\) | \(w_{out,1}\times x_9\) | \(w_{out,1}z(w_{11}x+w_{12}y)\) | \(w_{out, 1}\dot{x_9}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}\) |
| \(x_{12}\) | \(w_{out,2}\times x_{10}\) | \(w_{out,2}z(w_{21}x+w_{22}y)\) | \(w_{out, 2}\dot{x_{10}}\) | \(w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}\) |
| \(x_{13}\) | \(x_{11}+x_{12}\) | \(w_{out,1}z(w_{11}x+w_{12}y)+w_{out,2}z(w_{21}x+w_{22}y)\) | \(\dot{x_{11}}+\dot{x_{12}}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}\) | \(w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}\) |
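
To check the last two columns numerically, here is a minimal forward-mode sketch of the same network in Python. The activation \(z = \tanh\), the weight values, and the input point are arbitrary placeholder choices (the exercise does not fix them), and the nodes \(x_3\) through \(x_6\) and \(x_{11}\) through \(x_{13}\) are folded into the lines that use them.

import numpy as np

# Each node carries a tuple (value, d/dx, d/dy), propagated exactly as in
# the table. z = tanh and the numerical weights/inputs are placeholders.
z, dz = np.tanh, lambda t: 1 - np.tanh(t)**2
w11, w12, w21, w22 = 0.5, -0.3, 0.8, 0.2
w_out1, w_out2 = 1.0, -1.5
x, y = 1.0, 2.0

x1 = (x, 1.0, 0.0)                                   # seed (1, 0) for d/dx
x2 = (y, 0.0, 1.0)                                   # seed (0, 1) for d/dy
x7 = tuple(w11*a + w12*b for a, b in zip(x1, x2))    # x3 + x5
x8 = tuple(w21*a + w22*b for a, b in zip(x1, x2))    # x4 + x6
x9  = (z(x7[0]), dz(x7[0])*x7[1], dz(x7[0])*x7[2])
x10 = (z(x8[0]), dz(x8[0])*x8[1], dz(x8[0])*x8[2])
x13 = tuple(w_out1*a + w_out2*b for a, b in zip(x9, x10))
print(x13)   # (value, df/dx, df/dy)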

Exercise 2: Operation Count Problem

Computing the derivative at the six nodes \(x_3, x_4, x_5, x_6, x_{11}, x_{12}\) requires one multiplication in each of the 2 components, contributing \(6 \times 1 \times 2 = 12\) operations.

Computing the derivative at the 3 nodes \(x_7, x_8, x_{13}\) requires one addition in each of the 2 components, contributing \(3\times 1 \times 2 = 6\) operations.

Computing the derivative at the nodes \(x_9, x_{10}\) requires 2 operations (an elementary function evaluation of \(z^\prime\) and a multiplication) in each of the 2 components, contributing \(2\times 2 \times 2 = 8\) operations.

Summing these contributions gives a total of \(12 + 6 + 8 = 26\) operations.

Module 3

Exercise 1: Reverse Mode Computational Table and Derivatives

The reverse computational table is given by:

| Node | Current Value | Numerical Value | \(\partial_1\) | \(\partial_1\) Value | \(\partial_2\) | \(\partial_2\) Value |
|---|---|---|---|---|---|---|
| \(x_1\) | \(x\) | 1 | 1 | 1 | 0 | 0 |
| \(x_2\) | \(y\) | 2 | 0 | 0 | 1 | 1 |
| \(x_3\) | \(x_1x_2\) | 2 | \(x_2\) | 2 | \(x_1\) | 1 |
| \(x_4\) | \(\exp(x_3)\) | \(e^2\) | \(\exp(x_3)\) | \(e^2\) | | |
| \(x_5\) | \(x_3+x_4\) | \(2+e^2\) | 1 | 1 | 1 | 1 |

We can now trace back through the table to find the adjoints.

\[\bar{x_5} = \frac{\partial f}{\partial x_5} = 1\]
\[\bar{x_4} = \frac{\partial f}{\partial x_5}\frac{\partial x_5}{\partial x_4} = 1 \cdot 1 = 1\]
\[\bar{x_3} = \frac{\partial f}{\partial x_4}\frac{\partial x_4}{\partial x_3}+\frac{\partial f}{\partial x_5}\frac{\partial x_5}{\partial x_3} = 1\cdot e^2 + 1\cdot 1 = 1+e^2\]
\[\bar{x_2} = \frac{\partial f}{\partial x_3}\frac{\partial x_3}{\partial x_2} = (1+e^2)x_1 = 1+e^2\]
\[\bar{x_1} = \frac{\partial f}{\partial x_3}\frac{\partial x_3}{\partial x_1} = (1+e^2)x_2 = 2+2e^2\]

We note that in our bar notation we have \(\bar{x_1} = \frac{\partial f}{\partial x}\) and \(\bar{x_2} = \frac{\partial f}{\partial y}\).
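
To make the trace concrete, here is a small Python sketch of the same computation, read directly off the table: the function is \(f(x,y) = xy + e^{xy}\), the forward pass fills in the values at \((x,y)=(1,2)\), and the backward sweep accumulates the adjoints above.

import numpy as np

# Forward pass for f(x, y) = x*y + exp(x*y) at (x, y) = (1, 2),
# following the trace x3 = x1*x2, x4 = exp(x3), x5 = x3 + x4.
x1, x2 = 1.0, 2.0
x3 = x1 * x2
x4 = np.exp(x3)
x5 = x3 + x4

# Backward (adjoint) sweep.
x5_bar = 1.0                                   # seed: df/dx5 = 1
x4_bar = x5_bar * 1.0                          # x5 = x3 + x4
x3_bar = x4_bar * np.exp(x3) + x5_bar * 1.0    # contributions from x4 and x5
x2_bar = x3_bar * x1                           # x3 = x1 * x2
x1_bar = x3_bar * x2

print(x1_bar, x2_bar)   # 2 + 2*e**2 and 1 + e**2, respectively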

Beyond the Basics

Exercise 1: Dual Numbers

To find the derivative, we look at the dual part after replacing \(x\) with \(a+b\epsilon\). For our function \(y = e^{x^2}\),

\[y= e^{(a+b\epsilon)^2} = e^{a^2}e^{2ab\epsilon}e^{b^2\epsilon^2}\]

Since \(\epsilon^2=0\),

\[=e^{a^2}e^{2ab\epsilon}\]

Expanding \(e^{2ab\epsilon}\) as a Taylor series and again using \(\epsilon^2=0\) to drop the higher-order terms,

\[= e^{a^2}\left( 1+ 2ab\epsilon + \frac{(2ab\epsilon)^2}{2} + \cdots \right) = e^{a^2}+2ae^{a^2}b\epsilon\]

So the derivative evaluated at \(x=a\) is \(2ae^{a^2}\), read off as the coefficient of \(b\epsilon\) in the dual part.
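
The same calculation can be carried out programmatically with a bare-bones dual-number class. This is only a sketch: it implements just the multiplication and exponential rules needed here, and the evaluation point a = 1.5 is arbitrary.

import numpy as np

class Dual:
    """Dual number real + dual*eps, with eps**2 = 0."""
    def __init__(self, real, dual=0.0):
        self.real, self.dual = real, dual

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

def exp(d):
    # exp(a + b*eps) = exp(a) + exp(a)*b*eps
    return Dual(np.exp(d.real), np.exp(d.real) * d.dual)

a = 1.5
x = Dual(a, 1.0)        # seed the dual part with b = 1
y = exp(x * x)          # e**(x**2)
print(y.dual, 2 * a * np.exp(a**2))   # both equal the derivative 2*a*e**(a**2)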

Exercise 2: Toy AD Example

An example AutoDiffToy class could look like:

class AutoDiffToy():
    """ Creates an object for autodifferentiation.

    ATTRIBUTES
    ==========
    val : the value of the object
    der : the derivative of the object

    EXAMPLES
    ========
    >>> x = AutoDiffToy(4)
    >>> x.val
    4
    >>> x.der
    1.0
    """

    def __init__(self, a, d=1.0):
        self.val = a
        self.der = d

    def __add__(self, other): #overload addition
        try:
            return AutoDiffToy(self.val+other.val, self.der+other.der)
        except AttributeError:
            other = AutoDiffToy(other, 0) # derivative of a constant is zero
            return AutoDiffToy(self.val+other.val, self.der+other.der)

    def __radd__(self, other): #ensure commutativity of addition
        return self.__add__(other)

    def __mul__(self, other): #overload multiplication
        try:
            return AutoDiffToy(self.val*other.val, self.val*other.der+other.val*self.der)
        except AttributeError:
            other = AutoDiffToy(other, 0)
            return AutoDiffToy(self.val*other.val, self.val*other.der+other.val*self.der)

    def __rmul__(self, other): #ensure commutativity of multiplication
        return self.__mul__(other)
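
As a usage sketch, the class above is already enough to differentiate functions built from additions and multiplications; the coefficients and the evaluation point below are arbitrary.

# f(x) = 3*x**2 + 2*x + 5 evaluated at x = 4 (arbitrary coefficients)
x = AutoDiffToy(4.0)
f = 3.0 * x * x + 2.0 * x + 5.0
print(f.val)   # 61.0
print(f.der)   # 26.0  (= 6*4 + 2)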