Defining the Network

Here we define a simple one-hidden-layer neural network for classification on MNIST. It takes a single parameter that determines the number of units in the hidden layer.
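A minimal sketch of such a network is shown below. The class name `SimpleNet` and the use of ReLU are assumptions, not taken from the original code; the key point is that the hidden size is a constructor parameter.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """One-hidden-layer MLP for 28x28 MNIST images, 10 output classes."""

    def __init__(self, hidden_size):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)  # input -> hidden
        self.fc2 = nn.Linear(hidden_size, 10)       # hidden -> class logits

    def forward(self, x):
        x = x.view(x.size(0), -1)    # flatten each image to a 784-vector
        x = torch.relu(self.fc1(x))  # hidden layer with ReLU nonlinearity
        return self.fc2(x)           # unnormalized class scores
```

Passing a batch of shape `(batch, 1, 28, 28)` through the network yields logits of shape `(batch, 10)`.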

Instantiating the Networks

We will consider three networks.

  1. One that only has a single hidden unit and all of its weights are initialized to 0.
  2. One that has 64 hidden units and all of its weights are initialized to 0.
  3. One that has 64 hidden units and the weights are initialized randomly.
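The three configurations above could be set up as follows. This is a hedged sketch: `make_net` is a hypothetical stand-in for the network class defined earlier, and the variable names are assumptions. PyTorch initializes `nn.Linear` weights randomly by default, so only the first two networks need their parameters explicitly zeroed.

```python
import torch
import torch.nn as nn

def make_net(hidden_size):
    # Minimal stand-in for the one-hidden-layer network defined above.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 10),
    )

net_zero_1 = make_net(1)    # 1 hidden unit, weights zeroed below
net_zero_64 = make_net(64)  # 64 hidden units, weights zeroed below
net_rand_64 = make_net(64)  # 64 hidden units, default random init

# Zero out every parameter (weights and biases) of the first two networks.
with torch.no_grad():
    for net in (net_zero_1, net_zero_64):
        for p in net.parameters():
            p.zero_()
```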

Training

We will train all three networks simultaneously using the same learning rate. After each epoch, we print the current loss of each network.

Tensor and Layer sizes

Below is an implementation of the network from the section handout. The forward pass includes print statements that report the size of the data as it flows through the network, along with the sizes of each layer's weights and biases. Note that this network is just for demonstration and would not work well at all in practice.
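Without the handout at hand, a plausible sketch of such an instrumented network looks like this; the class name `VerboseNet` and the layer sizes are assumptions, but the pattern of printing `x.shape` at each step is the technique being demonstrated.

```python
import torch
import torch.nn as nn

class VerboseNet(nn.Module):
    """Prints tensor and parameter sizes at each step of the forward pass."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)
        self.fc2 = nn.Linear(hidden_size, 10)

    def forward(self, x):
        print("input:", x.shape)
        x = x.view(x.size(0), -1)
        print("flattened:", x.shape)
        print("fc1 weight:", self.fc1.weight.shape, "bias:", self.fc1.bias.shape)
        x = torch.relu(self.fc1(x))
        print("after fc1 + ReLU:", x.shape)
        print("fc2 weight:", self.fc2.weight.shape, "bias:", self.fc2.bias.shape)
        x = self.fc2(x)
        print("output:", x.shape)
        return x

out = VerboseNet()(torch.zeros(8, 1, 28, 28))
```

Running this on a batch of 8 images traces the shapes from `(8, 1, 28, 28)` down to the final `(8, 10)` logits.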