Fall, 2022
 

Lecture time and place: Mondays / Wednesdays, 1:30-2:50pm, Bill and Melinda Gates Center (CSE2) G04

Instructor: Byron Boots
office: Bill and Melinda Gates Center (CSE2) 210
office hours: Wednesdays 3:00pm-4:00pm

TA: Anqi Li, Office Hours: Tuesdays 11:10am-12:10pm in Allen (CSE) 218, Fridays 2:00-3:00pm on Zoom

TA: Adam Fishman, Office Hours: Mondays 5:00-6:00pm in Gates (CSE2) 121, Thursdays 2:00-3:00pm on Zoom

Announcements will be posted via Canvas.

Ed discussion board (link): All questions that are not of a personal nature should be posted to the discussion board.

Contact: course staff can be reached at cse579-staff@cs.washington.edu. Please communicate with the instructor and TAs only through this email.

Submit anonymous feedback here

Description: A growing number of state-of-the-art systems including field robots, acrobatic aerial vehicles, walking robots, and computer programs for games (Chess, Hex, Go, StarCraft) rely on machine learning to make decisions. The machine learning problems in these domains represent a fundamental departure from traditional classification and regression problems. The learner must contend with: a) the effect of its own actions on the world; b) sequential decision making and credit assignment; and c) the tradeoffs between exploration and exploitation. In the past ten years, the understanding of these problems has developed dramatically. One key to the advance of learning methods has been a tight integration with optimization techniques, and we will focus on this throughout the course.

This course is intended for graduate students who want to build adaptive software that interacts with the world. Although much of the material will be driven by robotics applications, anyone interested in applying learning to decision-making or in complex adaptive systems is welcome.

Suggested readings will be posted in the schedule below.

Prerequisites
 

This is an advanced course, so familiarity with basic ideas from probability, machine learning, and decision making/control will be helpful. As the course will be project driven, prototyping skills in C, C++, and Python will also be important. Creative thought and enthusiasm are required.

Schedule
| Date | Due | Topic | Reading Material | HW |
|---|---|---|---|---|
| 09/28/22 | | Course Overview | Course Conduct | |
| 10/03/22 | | MDPs, Value Iteration | Notes on Markov Decision Problems; MDP Slides -- Dan Klein; How to Design Good Tetris Players; Probabilistic Robotics, Chapter 14 | HW1 |
| 10/05/22 | | Q-Functions, Policy Iteration | Notes on Policy Iteration; Policy Iteration Slides -- Dan Klein | Think about projects!! |
| 10/10/22 | | The Linear Quadratic Regulator | Notes on Linear Quadratic Regulators; LQR Slides -- Pieter Abbeel; RL for Helicopter Flight | |
| 10/12/22 | | No Class (Instructor Travel) | | |
| 10/17/22 | HW1 | Time Varying Systems, Affine Quadratic Regulation, Tracking with LQR | Notes on Linear Quadratic Regulators; Sequential Compositions of Behaviors; Speeding Up Dynamic Programming; LQR Trees | HW2 |
| 10/19/22 | | Iterative LQR, Receding Horizon Control / Model Predictive Control | Notes on Linear Quadratic Regulators; Receding Horizon DDP; Differentiable MPC in PyTorch; An Online Learning Approach to Model Predictive Control | |
| 10/24/22 | | Inverse Optimal Control | Notes on Imitation Learning; Maximum Entropy IOC | |
| 10/26/22 | | Fitted Q-Iteration | Notes on Approximate Dynamic Programming; Learning to Drive a Real Car; Generalization in RL; Stable Function Approximation | |
| 10/31/22 | HW2 | Approximate Policy Iteration | Notes on Approximate Dynamic Programming; API Survey | HW3 |
| 11/02/22 | | TD Learning, Eligibility Traces | Notes on TD, Q-Learning; Sutton & Barto: Ch. 6 | |
| 11/07/22 | Project Proposal | SARSA, Q-Learning, Replay Buffers | Notes on TD, Q-Learning; Deep Q-Learning | |
| 11/09/22 | | Brute Force Simulation-Based Policy Search: Cross Entropy, Nelder Mead | Notes on Black Box Optimization; Nelder Mead -- Wikipedia; PEGASUS; CEM; Optimization Stories | |
| 11/14/22 | | Backpropagation | Notes on Backpropagation; Deep Learning: Ch. 6; Blog Post on the Adjoint Method; Fluid Control | |
| 11/16/22 | | Policy Gradients, Actor Critic | Notes on Policy Gradients; Sutton & Barto: Ch. 13; REINFORCE; Policy Gradient Methods -- Sutton et al.; Policy Gradient Slides -- Levine | |
| 11/21/22 | HW3 | Natural Policy Gradient | Notes on Policy Gradients; Natural Policy Gradient; Covariant Policy Search; Natural Actor Critic; Trust Region Policy Optimization; Actor Critic Slides -- Levine | |
| 11/23/22 | | No Class (Instructor Travel/Thanksgiving) | | |
| 11/28/22 | | Online Learning, Imitation Learning, DAgger, AggreVaTeD | Notes on Imitation Learning; DAgger; AggreVaTeD | |
| 11/30/22 | | Iterative Learning Control | Notes on Iterative Learning Control; Using Inaccurate Models in RL; DAgger for SysID | |
| 12/05/22 | | Student Project Presentations | | |
| 12/07/22 | | Student Project Presentations | | |
| 12/12/22 | | Project Report Due at 11:59pm | | |

Grading
 

Final grades will be based on course projects (40%) and homework assignments (60%).

Typesetting your homework solutions in LaTeX is required.

Late homework policy: Assignments are due at the beginning of class on their due date. You will be allowed 3 total late days without penalty for the entire quarter. Please use these wisely, and plan ahead for conferences, travel, deadlines, unanticipated emergencies, etc. Once those days are used, you will be penalized according to the following policy:

  • Homework is worth full credit at the beginning of class on the due date.
  • It is worth half credit for the next 48 hours.
  • It is worth zero credit after that.

Collaboration on homework: I expect that each student will conduct themselves with integrity. You are researchers-in-training, and I expect that you understand proper attribution and the importance of intellectual honesty. While you are certainly encouraged to read outside sources for a deeper understanding of the course material, if you do use outside materials in the preparation of an assignment, they must be acknowledged clearly with an appropriate citation. Unless otherwise specified, homework will be done individually and each student must hand in their own assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other understand the underlying concepts. When collaborating, the "whiteboard policy" is in effect: You may discuss assignments on a whiteboard, but at the end of a discussion the whiteboard must be erased, and you must not transcribe or take with you anything that has been written on the board during your discussion. You must be able to reproduce the results solely on your own after any such discussion. Finally, you must write the names of the students you collaborated with on each homework.

Audit policy: If you wish to audit the course, you must do one of the following:

  • Do two homework assignments.
  • Do the course project.

Disclaimer: I reserve the right to modify any of these plans as need be during the course of the class; however, I won't do anything capriciously, anything I do change won't be too drastic, and you'll be informed as far in advance as possible.

Projects
 

The course project is an opportunity for you to deeply explore one (or several) of the techniques covered in class and apply them to a problem that is of interest to you. Since the projects require a substantial amount of work, you may form groups of up to three students. The research topic is up to you, as long as it makes use of adaptive control or RL methods.

Project proposals: Your proposal should be 2-3 pages, and it should introduce the problem you are trying to solve, the approach you will take, and also address the following questions:

  • What are some impacts of this research?
  • What is novel about the approach you are taking?
  • How do learning and/or probabilistic inference techniques play a key role?
  • What is your metric for success?
  • What are key technical issues you will have to confront? Are there any other big challenges?
  • What software or datasets will you use?
  • What is your timeline? Include specific targets for the progress report.

Note on current research: You may use your current research as a course project, as long as you explore a new area of the problem; you cannot reuse previous results. Your proposal should clearly state what novel part you will be tackling in your course project.

Final presentations: You’ll present your findings to the class at the end of the quarter. The presentation must follow these guidelines:

  • No more than 5 minutes! There will be a hard cutoff.
  • No more than 5 slides, excluding the title slide.
  • Every group member must speak.
  • You must send me a copy of the slides in advance (noon on Monday).
  • Don't "decorate" your slides with equations. If there is an equation, I expect you to explain every variable.
  • Don't read your slides / show lots of text. Slides should contain brief, salient points.

Final Report: The final report will consist of one deliverable:

  1. Written report: This is the detailed report of your approach and findings. You should re-state the problem you are solving and your approach, and summarize your results. The report should be no longer than a NeurIPS paper in size (8 pages including figures and tables), but a shorter and more concrete report is preferred.

Sample Projects: You should connect RL to your own research, if possible. If you don't want to do that or the connection is hard to make, here are some examples of projects that might be appropriate. These are just suggestions!

  • Train autonomous cars to navigate in CARLA
  • Learn how to race in OpenAI Gym with deep Q-learning
  • Imitation learning for aerial vehicle control
  • Train a StarCraft II agent to win a minigame
  • Train a reactive controller to avoid obstacles in FlightGoggles
  • Consider how to safely explore in RL
  • Control a MuSHR car with MPC
  • Show how curriculum learning can help with difficult games

Example Environments:

Here are some environments that you can use for training an RL agent. You are by no means required to use any of these simulation environments. If you find other environments, feel free to share them on Ed.

Course Conduct
 

We take academic integrity very seriously. Behaving with integrity is part of our responsibility to our shared learning community. Please read the UW Student Conduct Code: Academic Misconduct for more information. If you’re uncertain whether something is academic misconduct, please ask the instructor or TAs. We are happy to discuss any questions you may have.

This course welcomes all students of all backgrounds. The computer science and computer engineering industries have a significant lack of diversity. This is due to a lack of sufficient past efforts by the field toward even greater diversity, equity, and inclusion. The Allen School seeks to create a more diverse, inclusive, and equitable environment for our community and our field. You should expect and demand to be treated by your classmates and the course staff with respect. If any incident occurs that challenges this commitment to a supportive, diverse, inclusive, and equitable environment, please let the instructor know so the issue can be addressed.

University policy prohibits all forms of sexual harassment. If you feel you have been a victim of sexual harassment or if you feel you have been discriminated against, you may speak with your instructor, teaching assistant, or the chair of the department, or you can file a complaint with the UW Ombudsman's Office for Sexual Harassment. Their office is located at 339 HUB, (206) 543-6028. There is a second office, the University Complaint Investigation and Resolution Office (UCIRO), which also investigates complaints. The UCIRO is located at 22 Gerberding Hall. Please see additional resources at:
http://www.washington.edu/about/ombudsman/role.html and http://f2.washington.edu/treasury/riskmgmt/UCIRO .

Accessibility & Accommodations
 

Embedded in the core values of the University of Washington is a commitment to ensuring access to a quality higher education experience for a diverse student population. Disability Resources for Students (DRS) recognizes disability as an aspect of diversity that is integral to society and to our campus community. DRS serves as a partner in fostering an inclusive and equitable environment for all University of Washington students. The DRS office is in 011 Mary Gates Hall. Please see the UW resources at: http://depts.washington.edu/uwdrs/current-students/accommodations/.

Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy:
(https://registrar.washington.edu/staffandfaculty/religious-accommodations-policy/).
Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form: (https://registrar.washington.edu/students/religious-accommodations-request/).

Acknowledgements
 

Assignments, lectures, and ideas on this syllabus are partially adapted from Drew Bagnell's course at Carnegie Mellon University. I would like to thank Drew for helpful discussions and access to his course materials.

The University of Washington acknowledges the Coast Salish peoples of this land, the land which touches the shared waters of all tribes and bands within the Suquamish, Tulalip and Muckleshoot nations.