CSEP 546: Data Mining (Autumn 2017)
Assignment 3: Neural Networks & Ensemble Methods

Due: Sunday, November 19, 2017 at 11:59pm PST.

Problem 1: Neural Networks and Backpropagation

For this question you will implement backpropagation and train a multi-layer perceptron (MLP) to classify digit images using the classical MNIST dataset.
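To fix ideas, here is a minimal sketch of backpropagation for a one-hidden-layer sigmoid MLP trained with squared loss on toy data. This is illustrative only, not a tuned MNIST solution: the layer sizes, learning rate, number of epochs, and initialization scale below are placeholder assumptions, and it uses full-batch gradient descent where the assignment expects mini-batches.

```python
import numpy as np

# Toy setup (hypothetical sizes, NOT the MNIST dimensions):
# 100 examples, 4 features, 3 classes with one-hot targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)
Y = np.eye(3)[y]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_h, n_out = 4, 16, 3
# Small random initialization (one common choice; the writeup in 1.1
# should describe whatever scheme you actually use).
W1 = rng.normal(scale=0.1, size=(n_in, n_h)); b1 = np.zeros(n_h)
W2 = rng.normal(scale=0.1, size=(n_h, n_out)); b2 = np.zeros(n_out)

lr, losses = 0.5, []
for epoch in range(200):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)          # hidden activations
    O = sigmoid(H @ W2 + b2)          # output activations
    losses.append(np.mean(np.sum((O - Y) ** 2, axis=1)))
    # Backward pass for L = mean_i ||O_i - Y_i||^2, using the chain
    # rule and sigma'(z) = sigma(z) * (1 - sigma(z)).
    dO = 2.0 * (O - Y) / len(X) * O * (1 - O)
    dW2, db2 = H.T @ dO, dO.sum(0)
    dH = (dO @ W2.T) * H * (1 - H)
    dW1, db1 = X.T @ dH, dH.sum(0)
    # Full-batch gradient step; replace with mini-batch SGD for MNIST.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Swapping the hidden-layer sigmoid for a ReLU only changes the activation and its derivative in the forward and backward passes; the rest of the loop is unchanged.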

Problem writeup:

1.1 Write a paragraph describing your design choices. In particular, specify all your parameter choices: your learning rate (or learning rate scheme), mini-batch size, and initialization scheme.

1.2 For each of the 6 networks trained (5 different values of $n_h$ and the ReLU network), plot the squared loss after every half epoch (starting with your initial squared error). Please label your axes in a readable way. Plot both the training and test losses on the same plot.

1.3 Do the same as 1.2 for the 0/1 loss (i.e. 1 - accuracy), but this time start plotting once the loss drops below 7% (or once $\frac{2}{3}$ of the epochs have elapsed, whichever comes first) to make the plot more readable.
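The two quantities logged in 1.2 and 1.3 can be computed as below; this is a sketch, and the function names are illustrative rather than required by the assignment.

```python
import numpy as np

def squared_loss(probs, targets_onehot):
    # Squared error summed over output units, averaged over examples.
    return np.mean(np.sum((probs - targets_onehot) ** 2, axis=1))

def zero_one_loss(probs, labels):
    # 0/1 loss = 1 - accuracy: fraction of examples whose argmax
    # prediction disagrees with the true label.
    return np.mean(np.argmax(probs, axis=1) != labels)

# Tiny worked example with 3 examples and 2 classes.
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]])
labels = np.array([0, 1, 0])
onehot = np.eye(2)[labels]
sq = squared_loss(probs, onehot)    # (0.02 + 0.32 + 0.08) / 3 = 0.14
zo = zero_one_loss(probs, labels)   # all argmax predictions correct
```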

1.4 What is your final squared loss and 0/1 loss for both the training and test sets for each network?

1.5 How does using ReLUs compare to using the sigmoid function? Why?

Problem 2: Ensemble Methods

Consider an ensemble learning algorithm that uses simple majority voting among $K$ learned hypotheses. Suppose that each hypothesis has error $\epsilon$ and that the errors made by each hypothesis are independent of the others'.
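The setup can be checked empirically with a Monte Carlo sketch: each of the $K$ hypotheses errs independently with probability $\epsilon$, and the ensemble errs when a strict majority is wrong (odd $K$ avoids ties). This simulation is only an illustration of the setup; 2.1 asks for the exact closed-form expression.

```python
import numpy as np

def ensemble_error_mc(K, eps, trials=200_000, seed=0):
    # Simulate `trials` voting rounds: entry [t, k] is True when
    # hypothesis k errs on round t (independently, w.p. eps).
    rng = np.random.default_rng(seed)
    wrong = rng.random((trials, K)) < eps
    # Majority vote errs when more than K/2 hypotheses err.
    return np.mean(wrong.sum(axis=1) > K / 2)

est = ensemble_error_mc(5, 0.1)  # roughly 0.0086 for K=5, eps=0.1
```

For $\epsilon < 0.5$ the estimate falls well below the individual error $\epsilon$, which is the intuition the exact formula in 2.1 should capture.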

2.1 Calculate a formula for the error of the ensemble algorithm in terms of $K$ and $\epsilon$, and evaluate it for each combination of $K = 5, 10, 20$ and $\epsilon = 0.1, 0.2, 0.4$.

2.2 If the independence assumption is removed, is it possible for the ensemble error to be worse than $\epsilon$? Justify your answer.