Due: June 1st on Canvas
The purpose of this assignment is to introduce two sides of the problem of running neural networks efficiently: fast software and fast hardware. The first half focuses on the low-level software side of the challenge (scheduling a computation). In this portion, we will implement an operator commonly found in vision models: 2D convolution. (You can refer to these online articles [cs231n, TDS] to understand the 2D convolution operator used in this assignment.) The critical concept in this portion is the schedule, an abstraction that defines the ordering of the compute operations used to perform the convolution. For example, a nested loop is one representation of a schedule, and reordering the loop nest is one schedule transformation. First, we will define the arithmetic expression describing convolution to produce a baseline schedule that lets us run the code on a CPU. Next, we will manipulate the schedule using TVM schedule primitives to improve the performance of 2D convolution. To understand the basic usage and power of TVM, we recommend reading this tutorial.
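To make the "loop nest as schedule" idea concrete, here is a plain NumPy reference implementation of direct 2D convolution (no padding, stride 1). This is only an illustrative sketch, not the TVM code you will write for the assignment; the loop order shown is exactly the kind of thing TVM schedule primitives let you reorder, tile, and vectorize.

```python
import numpy as np

def conv2d_reference(data, kernel):
    """Naive direct 2D convolution (no padding, stride 1).

    data:   (H, W) input feature map
    kernel: (KH, KW) filter
    Returns an (H - KH + 1, W - KW + 1) output.
    """
    H, W = data.shape
    KH, KW = kernel.shape
    OH, OW = H - KH + 1, W - KW + 1
    out = np.zeros((OH, OW), dtype=data.dtype)
    # This loop nest is one concrete schedule: the iteration order
    # (y, x, ky, kx) is what a schedule transformation would change.
    for y in range(OH):
        for x in range(OW):
            for ky in range(KH):
                for kx in range(KW):
                    out[y, x] += data[y + ky, x + kx] * kernel[ky, kx]
    return out
```

Swapping, splitting, or fusing these loops leaves the arithmetic unchanged but can dramatically change cache behavior and vectorization, which is why the schedule is treated as a separate object from the computation.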
The second half focuses on the hardware side of the challenge (choosing a good hardware design).
We will provide a data log containing performance profiles of a collection of VTA hardware designs.
The goal is to build a reasonably accurate statistical performance model of each hardware design from the collected data.
The file vta.py contains a skeleton implementation that uses a simple linear performance model, which you will extend and improve.
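As a rough sketch of what "a simple linear performance model" means here, the snippet below fits latency as a linear function of hardware/workload features with ordinary least squares. The feature names (operation count, bytes moved) and the synthetic data are made up for illustration; the real features come from the columns of vta_data.csv and the skeleton in vta.py.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical features: total multiply-adds and DRAM traffic.
ops = rng.uniform(1e6, 1e9, n)
bytes_moved = rng.uniform(1e5, 1e8, n)
# Synthetic "measured" latency drawn from a known linear law plus
# noise, so we can check that the fit recovers the coefficients.
latency = 2e-9 * ops + 5e-8 * bytes_moved + rng.normal(0, 1e-3, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([ops, bytes_moved, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, latency, rcond=None)
pred = X @ coef
rel_err = float(np.mean(np.abs(pred - latency) / latency))
```

Improving on the skeleton might mean adding nonlinear feature transforms (e.g. products or logs of features) or a richer regression model, while still evaluating with the error metric the skeleton defines.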
This assignment requires a TVM installation and a working Python3 environment.
We recommend working on a Linux or macOS platform.
TVM requires an installation of LLVM (version 4.0 or newer). You can install LLVM via Homebrew on macOS or by using the nightly packages. TVM can be downloaded by cloning the repository: git clone --recursive https://github.com/apache/incubator-tvm.
To build TVM, first create a build directory build/ in the project directory. Copy the cmake config cmake/config.cmake to the build directory and set the USE_LLVM variable to the path of your llvm-config binary (you can set it to ON if llvm-config is in your path). Run cmake .. from within the build directory to generate the Makefile for building TVM. Build TVM with make -j$(nproc), or substitute the amount of usable parallelism on your system (e.g. make -j4). Install the TVM Python dependencies: pip3 install --user numpy decorator attrs.
Finally, set the correct environment variables for using TVM: export LD_LIBRARY_PATH=/path/to/tvm/build:$LD_LIBRARY_PATH and export PYTHONPATH=/path/to/tvm/python:/path/to/tvm/topi/python:$PYTHONPATH. You can verify that the setup is correct by opening a python3 interpreter and running import tvm. For further reference, see the install-TVM-from-source docs here.
Download the Python files and data you need for the assignment here: hw6-specialized-hardware-compiler.tar.gz.
Check conv2d.py and follow the directions in the file.
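Since the write-up asks for your conv2d speedup, here is one plausible way to measure it. The two functions below are only stand-ins for your baseline and optimized TVM schedules (TVM has its own timing utilities; this is a generic sketch using timeit):

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_baseline(data, kernel):
    # Stand-in for the unscheduled baseline: plain Python loop nest.
    KH, KW = kernel.shape
    OH = data.shape[0] - KH + 1
    OW = data.shape[1] - KW + 1
    out = np.empty((OH, OW))
    for y in range(OH):
        for x in range(OW):
            out[y, x] = (data[y:y + KH, x:x + KW] * kernel).sum()
    return out

def conv2d_optimized(data, kernel):
    # Stand-in for the tuned schedule: vectorized sliding windows.
    windows = sliding_window_view(data, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

data = np.random.default_rng(1).standard_normal((64, 64))
kernel = np.random.default_rng(2).standard_normal((3, 3))
# Always check correctness before timing.
assert np.allclose(conv2d_baseline(data, kernel),
                   conv2d_optimized(data, kernel))

# Take the best of several repeats to reduce timer noise.
t_base = min(timeit.repeat(lambda: conv2d_baseline(data, kernel),
                           number=3, repeat=3))
t_opt = min(timeit.repeat(lambda: conv2d_optimized(data, kernel),
                          number=3, repeat=3))
speedup = t_base / t_opt
```

Whatever timing method you use, verify that both versions produce the same output before comparing their runtimes, and report the median or best of several runs rather than a single measurement.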
Check vta.py and follow the directions in the file. Note that vta.py takes the evaluation data vta_data.csv as input.
Canvas:
Zip your modified conv2d.py, vta.py, and your write-up report.pdf into a single file hw6.zip, e.g. with the command zip hw6.zip conv2d.py vta.py report.pdf.
Note: You will receive full credit if your implementation is correct and achieves a reasonable speedup and accuracy for the conv2d and vta models. The TA will award 10 bonus points if your vta model achieves less than a 15% error rate.
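One plausible reading of "error rate" here is the mean relative error between predicted and measured performance; check the skeleton in vta.py for the metric the grader actually uses. A minimal sketch of that definition:

```python
import numpy as np

def error_rate(predicted, measured):
    """Mean relative error of predictions against measurements.

    One plausible definition of the assignment's "error rate";
    the authoritative metric is whatever vta.py computes.
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(predicted - measured) / measured))
```

For example, predicting 110 and 90 against measurements of 100 and 100 gives a 10% error rate under this definition.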
Writeup:
Please explain what you did in conv2d.py and vta.py in a single PDF file report.pdf. Remember to include your speedup for conv2d and the error rate for vta. Please also describe your process during this assignment, e.g. what happened when you tried A, and what you learned when you did B.