Objectives:
To practice solving problems by modeling them as binary search problems.
Due date: Sunday, January 21st by 11:00pm.
Materials: starter code
What to turn in:
How to submit:
Use the Canvas page to submit your solution.
In this assignment, you will write code to fit a model of NFL team offensive and defensive quality so that it best matches the data from a collection of NFL games. To do so, you will implement the coordinate descent algorithm described in lecture, which solves the problem of fitting the model by using a sequence of calls to ternary search.
When you are done, you can use your model to simulate games between the teams using this web site (link). By performing a large number of simulations, the web site can also predict the win probability for each team (and the corresponding Vegas point spread).
You are provided with data from the entire 2016 NFL season along with the first six weeks of the 2017 NFL season. We will use the 2016 data to determine the penalty coefficient for our model (see below) that makes the best predictions. You will then fit the model with that penalty coefficient using the 2017 data in order to make predictions for weeks seven and later of the 2017 season.
Each row in the two provided data files describes a single "drive" that took place during an NFL game. The first three columns identify the team on offense, the team on defense, and the week of the NFL season (from 1 to 17). The last two columns give the "expected points" for the positions on the field where the drive started and ended. The expected points are the historical average number of points scored by an offense in that particular situation. A drive that starts closer to the opponent's goal line will have higher expected points at the start.
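For concreteness, a single row could be represented by a small value class like the sketch below. The class and field names here are illustrative only, since the starter code defines its own drive type.

    // Illustrative only: one data row as an object. The starter code defines its
    // own drive class; the names here are assumptions, not its actual API.
    public class Drive {
        public final String offense;      // team on offense
        public final String defense;      // team on defense
        public final int week;            // week of the season (1 to 17)
        public final double startPoints;  // expected points where the drive started
        public final double endPoints;    // expected points where the drive ended

        public Drive(String offense, String defense, int week,
                     double startPoints, double endPoints) {
            this.offense = offense;
            this.defense = defense;
            this.week = week;
            this.startPoints = startPoints;
            this.endPoints = endPoints;
        }

        /** The change in expected points from the start of the drive to the end. */
        public double change() {
            return endPoints - startPoints;
        }
    }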
The goal of the model is to explain the change in expected points on each drive from where it starts to where it ends. Our model will have one parameter for each team's offense and one for each team's defense. There will also be one constant term, C, included for every drive. With 32 NFL teams, that is a total of 65 parameters.
For a drive with team A on offense and team B on defense, the model tries to explain the change in expected points as:
change ~ offenseA - defenseB + C
The quality of the fit of the model is described by the loss function. We will use a loss function that is the sum of the squared errors on each drive — i.e., the square of the difference between the actual change in expected points and the amount predicted by the formula above — and an additional term that is proportional to the sum of the absolute values of the parameters. The proportionality constant on the second term can be set by the user. The larger it is, the fewer non-zero parameters will appear in the model that best fits the data. (This model is just linear regression plus an "L1 regularization" term.)
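To make the loss concrete, here is a minimal sketch of how it could be computed, using the Drive sketch above and java.util.List. The starter code already provides this computation as evalLoss; the names below, including the teamIndex helper, are illustrative assumptions rather than the actual API.

    // Sketch of the loss described above, not the starter code's actual evalLoss.
    // params holds all 65 values: entries 0-31 are the offensive parameters,
    // entries 32-63 are the defensive parameters, and entry 64 is the constant C.
    // teamIndex is a hypothetical helper that maps a team name to 0..31.
    double lossSketch(List<Drive> drives, double[] params, double penalty) {
        double loss = 0.0;
        for (Drive d : drives) {
            double predicted = params[teamIndex(d.offense)]
                             - params[32 + teamIndex(d.defense)]
                             + params[64];
            double error = d.change() - predicted;   // actual minus predicted change
            loss += error * error;                   // sum of squared errors
        }
        for (double p : params) {
            loss += penalty * Math.abs(p);           // L1 regularization term
        }
        return loss;
    }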
To complete the assignment, you need to fill in the bodies of the three empty methods in TeamModeler.java. You can do so in any order. However, we recommend the following approach.
Implement findBestModel using coordinate descent. You can start from a model with all parameters set to zero. You can stop when the change in the model, as measured by the norm0 method applied to the difference between the two models, is less than the passed-in tol parameter. On each iteration, call ternary search once for each of the 65 model parameters to find its best value with all other parameters fixed. You can do so in any order you choose, although you may find that some orders give better performance.
Note that the provided implementation of ternary search requires you to give the range over which to search for the minimum. Hence, you will need to decide what range to use in order to call the function. In this case, it is possible to use domain knowledge to limit the range. In particular, from the description above, you can put an absolute bound on how large or small any parameter could be.
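Putting those two paragraphs together, one possible shape for the coordinate descent loop is sketched below. It assumes the 65 parameters live in a double array, that the provided ternary search can be called as ternarySearch(f, lo, hi) and returns the value of x in [lo, hi] that minimizes f, and that norm0OfDifference stands in for applying norm0 to the difference of two models. All of these names, signatures, and the PARAM_BOUND value are assumptions, not the starter code's actual API.

    // Illustrative coordinate descent sketch; names, signatures, and the bound
    // are assumptions. Uses lossSketch from the previous sketch.
    double[] findBestModelSketch(List<Drive> drives, double penalty, double tol) {
        final double PARAM_BOUND = 10.0;        // placeholder; pick a bound you can justify
        double[] params = new double[65];       // start from the all-zero model
        while (true) {
            double[] previous = params.clone();
            for (int i = 0; i < params.length; i++) {
                final int index = i;
                // Minimize the loss as a function of parameter i alone, holding
                // every other parameter fixed at its current value.
                params[index] = ternarySearch(
                    x -> { params[index] = x; return lossSketch(drives, params, penalty); },
                    -PARAM_BOUND, PARAM_BOUND);
            }
            // Stop once the model has (nearly) stopped changing.
            if (norm0OfDifference(previous, params) < tol)
                return params;
        }
    }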
To execute your code, you can use the run target of the build.xml file provided.
Implement train as described below and use it to figure out how sparse a model (specifically, how many non-zero parameters) gives the best predictions with six weeks of data.
Your method should examine each period of WEEKS+1 consecutive weeks in the passed-in (2016) data, not including week 17, where WEEKS is a static constant in the code. WEEKS is currently set to six, so these are seven-week periods. You will use the first WEEKS weeks to fit the model using the penalty (described below) and the last week to test the fitted model produced.
For each period, you will also try a number of different penalties. Specifically, try each penalty coefficient from 0.050 down to 0.000 in steps of 0.001. In total, there are 10 different periods (testing in weeks 7 to 16) and 51 different penalties, for a total of 510 different period/penalty combinations to try.
You can load all of the drives during a particular sequence of weeks using the loadDrives method, which is provided for you. To find the best model, use the findBestModel method you wrote in problem 1, passing in the drives from the first WEEKS weeks of the period only. To test the model on the last week, just call model.evalLoss(drives, 0.0), passing in the drives from the last week only. We'll call this last number the "test error". Note that the penalty is used to find the model, but not to test it.
To get the final numbers that you will output, we need to normalize the test errors. For each period, compute the minimum of all the test errors from that period. Subtract this from each of the test errors to get the normalized error for that period/penalty combination. (Since we are subtracting the minimum, one penalty from each period will have a normalized error of zero and every normalized error will be non-negative.)
For each period/penalty combination, you also need to record the number of non-zero parameters. (That's the quantity we're trying to optimize.) You can get that by calling model.countNonZeroParameters(0.005).
Finally, print out one line of output for each period/penalty combination, showing the number of non-zero parameters and the normalized error. Use the printf format "%2d %g", where the "%2d" is filled in with the number of non-zero coefficients and the "%g" is filled in with the normalized error.
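One possible shape for train, pulling the steps above together, is sketched below. WEEKS, loadDrives, findBestModel, evalLoss, and countNonZeroParameters are the names mentioned above, but the exact signatures (and the TOL and TeamModel names) are assumptions about the starter code, not its actual API.

    // Sketch of the training loop described above; signatures are assumptions.
    void trainSketch(String dataFile2016) {
        for (int firstWeek = 1; firstWeek + WEEKS <= 16; firstWeek++) {   // test weeks 7..16
            int testWeek = firstWeek + WEEKS;
            List<Drive> fitDrives  = loadDrives(dataFile2016, firstWeek, testWeek - 1);
            List<Drive> testDrives = loadDrives(dataFile2016, testWeek, testWeek);

            // Fit and test the model for each penalty in this period.
            int numPenalties = 51;                         // 0.050 down to 0.000 by 0.001
            double[] testErrors = new double[numPenalties];
            int[] nonZeroCounts = new int[numPenalties];
            for (int j = 0; j < numPenalties; j++) {
                double penalty = 0.050 - 0.001 * j;
                TeamModel model = findBestModel(fitDrives, penalty, TOL);
                testErrors[j] = model.evalLoss(testDrives, 0.0);          // no penalty at test time
                nonZeroCounts[j] = model.countNonZeroParameters(0.005);
            }

            // Normalize by subtracting the smallest test error in this period.
            double minError = Double.POSITIVE_INFINITY;
            for (double e : testErrors)
                minError = Math.min(minError, e);
            for (int j = 0; j < numPenalties; j++)
                System.out.printf("%2d %g%n", nonZeroCounts[j], testErrors[j] - minError);
        }
    }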
To execute your code, you can use the train target of the build.xml file provided.
Now answer the following questions:
How does the speed of coordinate descent relate to the penalty used? That is, does it seem faster or slower in certain circumstances?
(Hint: The easiest way to figure this out is to temporarily add a line of code that prints after each model is computed. You should see a fairly clear pattern in how quickly these are printed after watching the computations for just a couple of periods.)
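For example, inside the penalty loop of the train sketch above, a temporary line like the following makes the timing pattern easy to see (remove it before submitting):

    // Temporary debugging output only; firstWeek and penalty refer to the
    // variables in the train sketch above.
    System.err.printf("fit model: period starting week %d, penalty %.3f%n", firstWeek, penalty);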
Create a scatter plot with the number of non-zero parameters on the X axis and the normalized errors on the Y axis.
What number of non-zero parameters seems to give the least error?
Answer the following question:
Extra credit: Implement your solution in the method findBestSparseModel of TeamModeler.
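The extra-credit description above does not spell out the intended approach, so the sketch below assumes one natural strategy that fits the binary search theme of this assignment: binary search over the penalty coefficient, using the fact that larger penalties yield fewer non-zero parameters, until the fitted model has the requested number of them. Treat this as one possible design rather than the required solution; all names and signatures are assumptions.

    // One possible (assumed) approach: binary search on the penalty coefficient.
    // Names, signatures, and the search range are illustrative only.
    TeamModel findBestSparseModelSketch(List<Drive> drives, int targetNonZero) {
        double lo = 0.0, hi = 0.050;                 // penalty range to search
        TeamModel best = null;
        for (int iter = 0; iter < 20; iter++) {      // fixed number of halvings
            double penalty = (lo + hi) / 2;
            TeamModel model = findBestModel(drives, penalty, TOL);
            int nonZero = model.countNonZeroParameters(0.005);
            if (nonZero > targetNonZero) {
                lo = penalty;                        // too many parameters: penalize more
            } else {
                best = model;                        // few enough: remember it, penalize less
                hi = penalty;
            }
        }
        return best;
    }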
The starter code is organized into the following classes:
The starter code also includes a build.xml file that includes these targets:
If you are working from the command line, you can invoke findBestModel by running the command ant -Dpenalty=X run, and you can invoke findBestSparseModel by running the command ant -Dnum-nonzero=Y run.