Solutions to Exercises
======================

Module 1
--------

Exercise 1: Motivating Automatic Differentiation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A. Here is an example Python function::

    import numpy as np
    import matplotlib.pyplot as plt

    def numerical_diff(f, h):
        def num_der(x):
            return (f(x + h) - f(x)) / h
        return num_der

    nd = numerical_diff(np.log, 1E-1)
    nd2 = numerical_diff(np.log, 1E-7)
    nd3 = numerical_diff(np.log, 1E-15)

    x = np.linspace(.2, .4, 20)
    plt.plot(x, nd(x), label='h=.1')
    plt.plot(x, nd2(x), label='h=1e-7')
    plt.plot(x, nd3(x), label='h=1e-15')
    plt.plot(x, 1/x, '--', label='true')
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('df/dx')
    plt.show()

B. The corresponding graph output is:

.. image:: Mod1Ex1Sol.PNG

C. The numerical differentiation performs best for h=1e-7. When h is too small, round-off error from limited machine precision dominates; when h is too large, the truncation error of the finite-difference approximation dominates. Automatic differentiation avoids this trade-off by evaluating derivatives exactly to machine precision, without any finite-difference approximation.

Exercise 2: Basic Graph Structure of Calculations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: Mod1Ex2Sol.PNG

While your labelling of the nodes may differ, the overall connectivity structure and edges should be the same.

Exercise 3: Looking Toward Multiple Inputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: Mod1Ex3Sol.PNG

Module 2
--------

Exercise 1: Neural Network Problem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The corresponding computational graph is given by:

.. image:: Mod2Ex1Sol.png

The table corresponding to this graph is given by:

.. list-table::
   :widths: 5 20 20 20 20 20
   :header-rows: 1

   * - Trace
     - Elementary Function
     - Current Value
     - Elementary Function Derivative
     - :math:`\nabla_x` Evaluated at (x,y)
     - :math:`\nabla_y` Evaluated at (x,y)
   * - :math:`x_1`
     - :math:`x`
     - :math:`x`
     - 1
     - 1
     - 0
   * - :math:`x_2`
     - :math:`y`
     - :math:`y`
     - 1
     - 0
     - 1
   * - :math:`x_3`
     - :math:`w_{11} \times x_1`
     - :math:`w_{11}x`
     - :math:`w_{11}\dot{x_1}`
     - :math:`w_{11}`
     - 0
   * - :math:`x_4`
     - :math:`w_{21} \times x_1`
     - :math:`w_{21}x`
     - :math:`w_{21}\dot{x_1}`
     - :math:`w_{21}`
     - 0
   * - :math:`x_5`
     - :math:`w_{12} \times x_2`
     - :math:`w_{12}y`
     - :math:`w_{12}\dot{x_2}`
     - 0
     - :math:`w_{12}`
   * - :math:`x_6`
     - :math:`w_{22} \times x_2`
     - :math:`w_{22}y`
     - :math:`w_{22}\dot{x_2}`
     - 0
     - :math:`w_{22}`
   * - :math:`x_7`
     - :math:`x_3+x_5`
     - :math:`w_{11}x+w_{12}y`
     - :math:`\dot{x_3}+\dot{x_5}`
     - :math:`w_{11}`
     - :math:`w_{12}`
   * - :math:`x_8`
     - :math:`x_4+x_6`
     - :math:`w_{21}x+w_{22}y`
     - :math:`\dot{x_4}+\dot{x_6}`
     - :math:`w_{21}`
     - :math:`w_{22}`
   * - :math:`x_9`
     - :math:`z(x_7)`
     - :math:`z(w_{11}x+w_{12}y)`
     - :math:`z^\prime (x_7)\dot{x_7}`
     - :math:`z^\prime (w_{11}x+w_{12}y)w_{11}`
     - :math:`z^\prime (w_{11}x+w_{12}y)w_{12}`
   * - :math:`x_{10}`
     - :math:`z(x_8)`
     - :math:`z(w_{21}x+w_{22}y)`
     - :math:`z^\prime (x_8)\dot{x_8}`
     - :math:`z^\prime (w_{21}x+w_{22}y)w_{21}`
     - :math:`z^\prime (w_{21}x+w_{22}y)w_{22}`
   * - :math:`x_{11}`
     - :math:`w_{out,1}\times x_9`
     - :math:`w_{out,1}z(w_{11}x+w_{12}y)`
     - :math:`w_{out,1}\dot{x_9}`
     - :math:`w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}`
     - :math:`w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}`
   * - :math:`x_{12}`
     - :math:`w_{out,2}\times x_{10}`
     - :math:`w_{out,2}z(w_{21}x+w_{22}y)`
     - :math:`w_{out,2}\dot{x_{10}}`
     - :math:`w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}`
     - :math:`w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}`
   * - :math:`x_{13}`
     - :math:`x_{11}+x_{12}`
     - :math:`w_{out,1}z(w_{11}x+w_{12}y)+w_{out,2}z(w_{21}x+w_{22}y)`
     - :math:`\dot{x_{11}}+\dot{x_{12}}`
     - :math:`w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{11}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{21}`
     - :math:`w_{out,1}z^\prime (w_{11}x+w_{12}y)w_{12}+w_{out,2}z^\prime (w_{21}x+w_{22}y)w_{22}`
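As a sanity check on the last row of the table, here is a minimal sketch (not part of the original solution) that compares the tabulated gradient against central finite differences. The weights, the choice of activation :math:`z = \tanh`, and the evaluation point are arbitrary assumptions made for the check::

    import numpy as np

    # Arbitrary weights, activation, and evaluation point (assumptions for this check).
    w11, w12, w21, w22 = 0.3, -1.2, 0.5, 0.7
    wout1, wout2 = 2.0, -0.4
    z = np.tanh
    dz = lambda t: 1 - np.tanh(t)**2
    x, y = 0.8, -0.5

    # Intermediate traces x7, x8 and the gradients from the final row of the table.
    x7 = w11*x + w12*y
    x8 = w21*x + w22*y
    grad_x = wout1*dz(x7)*w11 + wout2*dz(x8)*w21
    grad_y = wout1*dz(x7)*w12 + wout2*dz(x8)*w22

    # Compare against central finite differences of the full network output.
    f = lambda x, y: wout1*z(w11*x + w12*y) + wout2*z(w21*x + w22*y)
    h = 1e-6
    print(np.isclose(grad_x, (f(x + h, y) - f(x - h, y)) / (2*h)))  # True
    print(np.isclose(grad_y, (f(x, y + h) - f(x, y - h)) / (2*h)))  # True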
Exercise 2: Operation Count Problem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Computing the derivative at the six nodes :math:`x_3, x_4, x_5, x_6, x_{11}, x_{12}` requires one multiplication in each of the 2 gradient components, contributing :math:`6 \times 1 \times 2 = 12` operations. Computing the derivative at the 3 nodes :math:`x_7, x_8, x_{13}` requires one addition in each of the 2 components, contributing :math:`3 \times 1 \times 2 = 6` operations. Computing the derivative at the nodes :math:`x_9, x_{10}` requires 2 operations (an elementary function evaluation of :math:`z^\prime` and a multiplication) in each of the 2 components, contributing :math:`2 \times 2 \times 2 = 8` operations. This gives a total of :math:`12 + 6 + 8 = 26` operations.

Module 3
--------

Exercise 1: Reverse Mode Computational Table and Derivatives
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The reverse computational table is given by:

.. list-table::
   :widths: 5 10 10 10 10 10 10
   :header-rows: 1

   * - Node
     - Current Value
     - Numerical Value
     - :math:`\partial_1`
     - :math:`\partial_1` Value
     - :math:`\partial_2`
     - :math:`\partial_2` Value
   * - :math:`x_1`
     - :math:`x`
     - 1
     - 1
     - 1
     - 0
     - 0
   * - :math:`x_2`
     - :math:`y`
     - 2
     - 0
     - 0
     - 1
     - 1
   * - :math:`x_3`
     - :math:`x_1x_2`
     - 2
     - :math:`x_2`
     - 2
     - :math:`x_1`
     - 1
   * - :math:`x_4`
     - :math:`\exp(x_3)`
     - :math:`e^2`
     - :math:`\exp(x_3)`
     - :math:`e^2`
     - --
     - --
   * - :math:`x_5`
     - :math:`x_3+x_4`
     - :math:`2+e^2`
     - 1
     - 1
     - 1
     - 1

We can now trace back through the table to find the adjoints.

.. math::

   \bar{x_5} = \frac{\partial f}{\partial x_5} = 1

   \bar{x_4} = \frac{\partial f}{\partial x_5}\frac{\partial x_5}{\partial x_4} = 1 \cdot 1 = 1

   \bar{x_3} = \frac{\partial f}{\partial x_4}\frac{\partial x_4}{\partial x_3}+\frac{\partial f}{\partial x_5}\frac{\partial x_5}{\partial x_3} = 1\cdot e^2 + 1\cdot 1 = 1+e^2

   \bar{x_2} = \frac{\partial f}{\partial x_3}\frac{\partial x_3}{\partial x_2} = (1+e^2)x_1 = 1+e^2

   \bar{x_1} = \frac{\partial f}{\partial x_3}\frac{\partial x_3}{\partial x_1} = (1+e^2)x_2 = 2+2e^2

We note that in our bar notation we have :math:`\bar{x_1} = \frac{\partial f}{\partial x}` and :math:`\bar{x_2} = \frac{\partial f}{\partial y}`.
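Reading :math:`f(x, y) = xy + e^{xy}` off the trace (:math:`x_3 = x_1 x_2`, :math:`x_4 = \exp(x_3)`, :math:`x_5 = x_3 + x_4`), here is a quick numerical check of the two input adjoints at :math:`(x, y) = (1, 2)`; the finite-difference comparison is an addition of ours, not part of the original solution::

    import numpy as np

    # f reconstructed from the trace: x3 = x*y, x4 = exp(x3), f = x5 = x3 + x4.
    f = lambda x, y: x*y + np.exp(x*y)

    x, y = 1.0, 2.0
    xbar1 = (1 + np.exp(2)) * y   # adjoint of x1: 2 + 2e^2
    xbar2 = (1 + np.exp(2)) * x   # adjoint of x2: 1 + e^2

    h = 1e-6
    print(np.isclose(xbar1, (f(x + h, y) - f(x - h, y)) / (2*h)))  # True
    print(np.isclose(xbar2, (f(x, y + h) - f(x, y - h)) / (2*h)))  # True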
Beyond the Basics
-----------------

Exercise 1: Dual Numbers
^^^^^^^^^^^^^^^^^^^^^^^^

To find the derivative, we look to the dual part when we replace :math:`x` with :math:`a+b\epsilon`. For our function,

.. math::

   y = e^{(a+b\epsilon)^2} = e^{a^2}e^{2ab\epsilon}e^{b^2\epsilon^2}.

Since :math:`\epsilon^2=0`,

.. math::

   y = e^{a^2}e^{2ab\epsilon}.

Expanding :math:`e^{2ab\epsilon}` as a Taylor series, where every term beyond the linear one again vanishes because it contains :math:`\epsilon^2`,

.. math::

   y = e^{a^2}\left( 1 + 2ab\epsilon + \frac{(2ab\epsilon)^2}{2} + \cdots \right) = e^{a^2} + 2ae^{a^2}b\epsilon.

So the derivative evaluated at :math:`a` is :math:`2ae^{a^2}`.

Exercise 2: Toy AD Example
^^^^^^^^^^^^^^^^^^^^^^^^^^

An example AutoDiffToy class could look like::

    class AutoDiffToy:
        """
        Creates an object for automatic differentiation.

        ATTRIBUTES
        ==========
        val : the value of the object
        der : the derivative of the object

        EXAMPLES
        ========
        >>> x = AutoDiffToy(4)
        >>> x.val
        4
        >>> x.der
        1.0
        """

        def __init__(self, a, d=1.0):
            self.val = a
            self.der = d

        def __add__(self, other):
            # Overload addition
            try:
                return AutoDiffToy(self.val + other.val, self.der + other.der)
            except AttributeError:
                # other is a constant; the derivative of a constant is zero
                other = AutoDiffToy(other, 0)
                return AutoDiffToy(self.val + other.val, self.der + other.der)

        def __radd__(self, other):
            # Ensure commutativity of addition (constant + AutoDiffToy)
            return self.__add__(other)

        def __mul__(self, other):
            # Overload multiplication using the product rule
            try:
                return AutoDiffToy(self.val * other.val,
                                   self.val * other.der + other.val * self.der)
            except AttributeError:
                other = AutoDiffToy(other, 0)
                return AutoDiffToy(self.val * other.val,
                                   self.val * other.der + other.val * self.der)

        def __rmul__(self, other):
            # Ensure commutativity of multiplication (constant * AutoDiffToy)
            return self.__mul__(other)
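As a short usage sketch (the polynomial and evaluation point are arbitrary choices, not part of the exercise), the reflected operators ``__radd__`` and ``__rmul__`` let constants appear on either side of ``+`` and ``*``::

    x = AutoDiffToy(4)
    f = 3*x*x + 2*x + 5      # f(x) = 3x^2 + 2x + 5
    print(f.val, f.der)      # 61 26.0, matching f(4) = 61 and f'(4) = 26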