Due: May 31st via Canvas

Introduction

The purpose of this assignment is to introduce two sides of the problem of running neural networks efficiently: fast software and fast hardware. The first half focuses on the low-level software side of the challenge (scheduling a computation). In this portion, we will implement an operator commonly found in vision models: 2D convolution. The critical concept in this portion is the schedule, an abstraction that defines the ordering of the compute operations used to perform the convolution. For example, a nested loop is one representation of a schedule, and reordering the loop nest is one possible schedule transformation. First, we will define the arithmetic expression describing convolution to produce a baseline schedule that lets us run the code on a CPU. Next, we will manipulate the schedule using TVM schedule primitives to improve the performance of 2D convolution. You can find a very detailed tutorial for a similar operator (matrix multiplication) on the TVM website.
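To make the schedule concept concrete, here is a minimal sketch of a single-channel convolution expressed in TVM, with one schedule transformation applied. The sizes and names are illustrative only (conv2d.py defines the exact interface you should use), and on newer TVM releases these functions live under tvm.te rather than directly under tvm:

    import tvm

    # Illustrative sizes only; conv2d.py specifies the actual shapes.
    H, W, KH, KW = 64, 64, 3, 3

    data = tvm.placeholder((H, W), name="data")
    kernel = tvm.placeholder((KH, KW), name="kernel")

    ry = tvm.reduce_axis((0, KH), name="ry")
    rx = tvm.reduce_axis((0, KW), name="rx")

    # The arithmetic expression: each output element is a sum of products.
    conv = tvm.compute(
        (H - KH + 1, W - KW + 1),
        lambda y, x: tvm.sum(data[y + ry, x + rx] * kernel[ry, rx],
                             axis=[ry, rx]),
        name="conv",
    )

    # The baseline schedule is the naive loop nest implied by the expression.
    s = tvm.create_schedule(conv.op)

    # One schedule transformation: swap the two reduction loops.
    y, x = s[conv].op.axis
    s[conv].reorder(y, x, rx, ry)

    # Inspect the resulting loop nest.
    print(tvm.lower(s, [data, kernel, conv], simple_mode=True))

Printing the lowered form before and after a transformation like reorder is a handy way to see exactly how a schedule primitive changes the loop nest.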

The second half focuses on the hardware side of the challenge (choosing a good hardware design). We will provide a data log containing performance profiles of a collection of VTA hardware designs. The goal is to build a reasonably accurate statistical performance model of each hardware design from the collected data. The file vta.py contains a skeleton implementation that uses a simple linear performance model, which you will extend to improve its accuracy.
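As a rough sketch of what fitting a simple linear performance model can look like, consider the least-squares fit below. The random features are stand-ins for illustration; in the assignment, the features and measured performance numbers come from vta_data.csv, and vta.py defines the actual interface:

    import numpy as np

    # Stand-in data: in the assignment, X would hold per-design features
    # extracted from vta_data.csv and y the measured performance values.
    rng = np.random.default_rng(0)
    X = rng.random((100, 4))
    true_w = np.array([3.0, 1.5, 0.2, 7.0])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    # Append a bias column and fit by least squares: y ~ [X, 1] @ w
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

    # Mean relative error is one reasonable accuracy metric.
    pred = Xb @ w
    print("mean relative error:", np.mean(np.abs(pred - y) / y))

Extending such a model might mean engineering better features or moving beyond a purely linear fit; the skeleton in vta.py is the authoritative starting point.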

Prerequisites: TVM & Tool Installation

This assignment requires a TVM installation and a working Python3 environment. TVM requires an installation of LLVM (version 4.0 or newer); you can install LLVM via Homebrew on macOS or by using the nightly packages.

1. Download TVM by cloning the repository: git clone --recursive https://github.com/dmlc/tvm.git
2. To build TVM, first create a build directory build/ in the project directory.
3. Copy the cmake config config.cmake to the build directory and set the USE_LLVM variable to the path of your llvm-config binary. (You can just set it to ON if llvm-config is in your PATH.)
4. Run cmake .. from within the build directory to generate the makefile for building TVM.
5. Build TVM with make -j$(nproc), or with the amount of usable parallelism on your system.
6. Install the TVM Python dependencies: pip3 install --user numpy decorator attrs
7. Set the environment variables needed to use TVM: export LD_LIBRARY_PATH=/path/to/tvm/build:$LD_LIBRARY_PATH and export PYTHONPATH=/path/to/tvm/python:/path/to/tvm/topi/python:$PYTHONPATH

You can verify that the setup is correct by opening a python3 interpreter and running import tvm.
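For example, a quick sanity check (assuming the environment variables above are set in the current shell; printing tvm.__file__ is just a convenient way to confirm which installation was picked up):

    import tvm
    print(tvm.__file__)  # should point into /path/to/tvm/python/tvm/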

For further reference, see the install-from-source documentation on the TVM website.

First Half:

Download conv2d.py and follow the directions in the file.

Second Half:

Download vta.py and follow the directions in the file. Note that you will also need the evaluation data: vta_data.csv.

Turn-in

Code: Tar and gzip your modified conv2d.py, along with your vta.py.

Writeup: Inline your comments in the conv2d.py and vta.py files.