Preamble

This notebook explores the backpropagation algorithm and the use of PyTorch for neural networks.

Last updated by Ethan Chau, November 2020.

Backpropagation and Computation Graphs in PyTorch

This section visualizes the backpropagation algorithm as it occurs in PyTorch. We begin with an example of the computation graph for simple functions, then apply it to a neural network with more complicated derivatives.

Side note: if we don't want gradients, we can switch them off with the torch.no_grad() context manager.
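
For example (a minimal sketch; the values are arbitrary):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Inside the context manager, autograd does not track these operations.
with torch.no_grad():
    y = x * 3
print(y.requires_grad)  # False

# Outside of it, gradient tracking works as usual.
z = x * 3
print(z.requires_grad)  # True
```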

A More Complex Function

Recall the following function: $$f(x, y) = \frac{6 \exp (-y)}{1 + x^2 + y^2} + 2x^3$$ Let's see what its computation graph looks like in action.

First, we'll see that breaking up the computation yields the same results.
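
Here is a sketch of what breaking up the computation might look like (the values of x and y are arbitrary assumptions):

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

# Compute f(x, y) in one expression.
f_direct = 6 * torch.exp(-y) / (1 + x**2 + y**2) + 2 * x**3

# Compute f(x, y) as a chain of intermediate nodes, mirroring the graph.
a = torch.exp(-y)        # exp(-y)
b = 1 + x**2 + y**2      # denominator
c = 6 * a / b            # first term
d = 2 * x**3             # second term
f_steps = c + d

print(torch.isclose(f_direct, f_steps))  # tensor(True)
```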

Now, let's visualize what the computation graph looks like.
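
One way to do this is with the third-party torchviz package (an assumption here; the original notebook may use a different tool), which renders the autograd graph of a tensor:

```python
import torch
# torchviz is a third-party package (pip install torchviz).
from torchviz import make_dot

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
f = 6 * torch.exp(-y) / (1 + x**2 + y**2) + 2 * x**3

# Returns a graphviz Digraph of the autograd graph; it displays inline in a notebook cell.
make_dot(f, params={"x": x, "y": y})
```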

Finally, we'll see that the gradients PyTorch computes w.r.t. each parameter can be used to reconstruct the autograd result, which also matches the closed-form solutions: $$\nabla_x f(x, y) = 6x^2 - \frac{12x \exp(-y)}{(x^2 + y^2 + 1)^2}$$ $$\nabla_y f(x, y) = - \frac{6 \exp (-y) (x^2 + (y + 1)^2)}{(x^2 + y^2 + 1)^2}$$
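
A sketch of this check, with illustrative values for x and y:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

f = 6 * torch.exp(-y) / (1 + x**2 + y**2) + 2 * x**3
f.backward()  # autograd fills in x.grad and y.grad

# Closed-form gradients for comparison (no need to track these).
with torch.no_grad():
    grad_x = 6 * x**2 - 12 * x * torch.exp(-y) / (x**2 + y**2 + 1)**2
    grad_y = -6 * torch.exp(-y) * (x**2 + (y + 1)**2) / (x**2 + y**2 + 1)**2

print(torch.isclose(x.grad, grad_x))  # tensor(True)
print(torch.isclose(y.grad, grad_y))  # tensor(True)
```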

Neural Network

Let's implement the neural network from class.

Note that we can write this as follows (why?): $$h_\theta (x) = g(b_2 + W_2^T g(b_1 + W_1^T x))$$ where $g$ is the activation function, $x \in \mathbb{R}^3$, $W_1 \in \mathbb{R}^{3 \times 3}$, $W_2 \in \mathbb{R}^{3 \times 1}$, $b_1 \in \mathbb{R}^3$, and $b_2 \in \mathbb{R}$.
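
A minimal sketch of the forward pass, written exactly as the formula above (the choice of sigmoid for the activation $g$ and the random initial values are assumptions, not the settings from class):

```python
import torch

g = torch.sigmoid  # activation g; an assumption, the class network may use tanh/ReLU

x = torch.randn(3)                          # x in R^3
W1 = torch.randn(3, 3, requires_grad=True)  # W_1 in R^{3x3}
b1 = torch.randn(3, requires_grad=True)     # b_1 in R^3
W2 = torch.randn(3, 1, requires_grad=True)  # W_2 in R^{3x1}
b2 = torch.randn(1, requires_grad=True)     # b_2 in R

# Forward pass, mirroring h_theta(x) = g(b_2 + W_2^T g(b_1 + W_1^T x)).
h = g(b2 + W2.T @ g(b1 + W1.T @ x))
print(h)
```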

Now, let's backpropagate the gradients through our graph, automagically.
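
Continuing the sketch above, a single call to .backward() walks the graph in reverse and accumulates a gradient into .grad for every leaf tensor that has requires_grad=True:

```python
# h, W1, b1, W2, b2 are the tensors from the previous sketch.
h.sum().backward()  # reduce to a scalar so .backward() needs no argument

print(W1.grad.shape, b1.grad.shape)  # torch.Size([3, 3]) torch.Size([3])
print(W2.grad.shape, b2.grad.shape)  # torch.Size([3, 1]) torch.Size([1])
```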

Training a Neural Network

This section demonstrates how we can train a neural network from scratch with PyTorch.

It is adapted from the PyTorch tutorial and condensed for compactness.

First, let's generate some data based on the function: $$y(x) = 4 \sin(x \pi) \cos(6\pi x^2)$$ for random values of $x$.
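
A minimal sketch of the data generation (the number of samples and the range of $x$ are arbitrary choices):

```python
import math
import torch

N = 1000              # number of samples; an arbitrary choice
x = torch.rand(N, 1)  # random inputs, here drawn uniformly from [0, 1)
y = 4 * torch.sin(math.pi * x) * torch.cos(6 * math.pi * x**2)
```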

Here we define a simple neural network with two hidden layers and Tanh activations. There are a few hyperparameters to play with to get a feel for how they change the results.
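
A sketch of such a model and training loop, assuming the x and y tensors generated above (the layer width, learning rate, optimizer, and epoch count are illustrative choices, not the tutorial's exact settings):

```python
import torch
import torch.nn as nn

# Hyperparameters to play with (the values here are illustrative).
hidden_size = 64
learning_rate = 1e-3
n_epochs = 2000

# Two hidden layers with Tanh activations, mapping a scalar x to a scalar y.
model = nn.Sequential(
    nn.Linear(1, hidden_size),
    nn.Tanh(),
    nn.Linear(hidden_size, hidden_size),
    nn.Tanh(),
    nn.Linear(hidden_size, 1),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(n_epochs):
    y_pred = model(x)          # x, y are the tensors generated above
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()      # clear gradients accumulated in the previous step
    loss.backward()            # backpropagate
    optimizer.step()           # update the parameters
```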

CrossEntropyLoss

So far, we have been considering regression tasks and have used the MSELoss module. For the homework, we will be performing a classification task and will use the cross entropy loss.

PyTorch implements a version of the cross entropy loss in one module called CrossEntropyLoss. Its usage is slightly different from MSELoss, so we will break it down here.
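
The key differences: the input to CrossEntropyLoss is a batch of raw, unnormalized scores (logits) of shape (batch_size, num_classes), and the target is typically a vector of integer class indices of shape (batch_size,), not one-hot vectors or continuous values. For example (the logits and labels below are made up):

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Raw, unnormalized scores (logits), one row per example: shape (batch_size, num_classes).
# No softmax is applied beforehand; CrossEntropyLoss combines log-softmax and
# negative log-likelihood internally.
logits = torch.randn(4, 3)

# Targets are integer class indices, shape (batch_size,).
targets = torch.tensor([0, 2, 1, 2])

loss = loss_fn(logits, targets)
print(loss)
```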