CSE 542, Spring 2026 Statistical Reinforcement Learning

Lecture: Monday, Wednesday 10:00–11:20 AM, ECE 045

Contact: cse542-staff@cs.washington.edu

TA office hours:

Mars Gao (marsgao@cs.washington.edu): TBD
Kevin Huang (kehuang@cs.washington.edu): TBD

Instructor office hours:

Kevin Jamieson: Tuesday 2:30–3:30, CSE 340

About the Course and Prerequisites

Reinforcement learning (RL) is the study of how an agent should act in an uncertain, sequential environment to maximize cumulative reward. This course develops the mathematical and algorithmic foundations of RL, with an emphasis on provably efficient methods and their theoretical guarantees. We will progress from the classical tabular setting to modern function approximation, studying both offline (batch) and online (interactive) learning paradigms along the way.

The field of RL theory has matured rapidly in recent years, yielding a rich set of tools spanning Markov decision process (MDP) theory, statistical learning, and online learning. This course will equip you with the core ideas and proof techniques that underlie this literature. By the end of the course, you will be positioned to read and contribute to research in statistical reinforcement learning.

We will cover selected topics from [AgarwalBrantleyJiangKakadeSun]:

MDP fundamentals: value functions, Bellman equations, policy and value iteration
Offline RL: fitted value iteration, offline policy evaluation and optimization
Online RL (tabular): model-based exploration, UCB-style algorithms, sample complexity
Linear function approximation: linear MDPs, LSVI-UCB
General function approximation: Bellman rank, Eluder dimension, provably efficient algorithms

Prerequisites: The course will make frequent reference to introductory machine learning concepts (e.g., from CSE 446/546), but such a course is not a prerequisite. Fluency in basic concepts from linear algebra, statistics, and calculus will be assumed (see HW0). Some review materials:

Linear Algebra Review by Zico Kolter and Chuong Do.
Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron. Introductory linear algebra text.
Probability Review by Arian Maleki and Tom Do.

Students should also be familiar with concentration inequalities (e.g., Hoeffding, Bernstein, Azuma–Hoeffding); see Chapter 5 of [SzepesvariLattimore] and Chapter 1 of [Jamieson] for a review. Some background in online learning and multi-armed bandits is strongly recommended. Students who have not taken CSE 541 or equivalent may wish to consult the CSE 541 (Winter 2026) course website and the textbook [SzepesvariLattimore] for a self-contained introduction to bandits. You are strongly encouraged to complete the self-test of fundamental prerequisites on your own (not to be turned in or graded). You should be able to complete most of these in your head or with minimal computation.

Class Materials

The course will pull from the following textbooks and course notes.

[AgarwalBrantleyJiangKakadeSun] Reinforcement Learning: Theory and Algorithms, Alekh Agarwal, Kianté Brantley, Nan Jiang, Sham M. Kakade, Wen Sun
[SzepesvariLattimore] Bandit Algorithms, Csaba Szepesvári and Tor Lattimore (recommended background reference)
[Jamieson] Informal lecture notes on bandits, Kevin Jamieson

Additional resources (not used directly in the course, but useful especially for practical implementation of RL):

[SuttonBarto] Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
[Szepesvari] Algorithms for Reinforcement Learning, Csaba Szepesvári
[Murphy] Reinforcement Learning: An Overview, Kevin Murphy

Discussion Forum and Email Communication

We will use Ed as a discussion board (you should have received an invite if you are registered for the course; otherwise, email the instructor). We will not be using the Canvas discussion board. Ed is your first resource for questions. For private or confidential questions, email cse542-staff@cs.washington.edu or the instructor directly. You may also send messages to the instructor through anonymous course feedback (though I cannot respond to you personally, so this is far from ideal).

Grading and Evaluation

There will be 3 homeworks (each worth 20%) and a final project worth 40%. The final project involves choosing a topic in reinforcement learning and writing a summary and literature review of that area — think of it as preparing a primer for someone about to enter research in that field. Details forthcoming.

Submission Guidelines

Each homework assignment will be submitted as a single PDF to Gradescope. Any code for a programming problem should come at the end of the problem, after any requested figures for the problem. You will receive a Gradescope email invite once you join the course; if not, please let me know! We expect all assignments to be typeset (i.e., no photos or scans of written work). This can be done in an editor like Microsoft Word or LaTeX (highly recommended). There exist convenient packages for listing Python code in LaTeX.

Regrades: If you feel that we have made an error in grading your homework, please submit a regrade request via Gradescope, and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.
Here is Gradescope help.
You will automatically be enrolled in Gradescope.

LaTeX resources:

Learn LaTeX in 30 minutes
Overleaf. An online LaTeX editor.
Standalone LaTeX editor on your local machine
LaTeX Math symbols
Detexify LaTeX handwritten symbol recognition
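
For the code-listing packages mentioned above, one common option is the `listings` package. The following is a minimal sketch, not a course requirement: the style settings and the file name `hw1_problem2.py` are illustrative assumptions, and other packages would work equally well.

```latex
\documentclass{article}
\usepackage{listings}  % one of several packages for typesetting source code

% Illustrative style settings (an assumption, not a course requirement).
\lstset{
  language=Python,
  basicstyle=\ttfamily\small,
  breaklines=true,
  frame=single
}

\begin{document}

% Option 1: paste the code directly into the document.
\begin{lstlisting}
import numpy as np
print(np.mean([1.0, 2.0, 3.0]))
\end{lstlisting}

% Option 2: include the code from a file. The file name is hypothetical;
% uncomment this line once a file with that name sits next to the .tex file.
% \lstinputlisting{hw1_problem2.py}

\end{document}
```

Including code from a file keeps the PDF in sync with the code you actually ran, which is convenient when a problem asks for the code to appear after its figures.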

Collaboration Policy

Homeworks must be done individually: each student must hand in their own answers. In addition, each student must write their own code in a programming part of the assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. You must also indicate on each homework with whom you collaborated. If you ever find yourself copying and pasting code / LaTeX / whatever, you have crossed the line of the collaboration policy.

You may use LLMs (e.g., ChatGPT) while completing homeworks, and you are encouraged to use them as a learning aid. However, if you use an LLM, you must attach a link to the full transcript (supported by all major chatbots). This is to protect you: a transcript you are comfortable sharing is a reasonable check that the LLM helped you learn rather than did the work for you. If you find yourself copying and pasting parts of the assignment into the prompt or receiving substantial, non-boilerplate code or derivations from the LLM, you have crossed the line.

Violations of the collaboration or LLM policy are taken very seriously. If a violation is even suspected, I am obligated to report it to the Community Standards and Student Conduct office, and I will err on the side of caution. If LLM use is suspected and no transcript is attached, a report will be filed. You know the difference between an LLM helping you learn and an LLM doing it for you, but if in doubt, ask on Ed!

The homework problems have been carefully chosen for their pedagogical value and hence might be similar or identical to those given out in past offerings of this course at UW, or similar courses at other schools. Using any pre-existing solutions from these sources, from the Web or other textbooks constitutes a violation of the academic integrity expected of you and is strictly prohibited.

Late Policy (Homeworks Only)

If you need an extra 24 hours on any homework assignment, that's no big deal and you do not need prior permission. If you need multiple days due to personal reasons, please email the instructor before the due date. I will try to be accommodating and will evaluate requests as I get them, but please don't abuse it. Any homework turned in more than 24 hours after the due date without prior permission from the instructor will be considered late and may receive zero credit. The final project has its own deadlines and the late policy does not apply, so please plan accordingly.

Regrading Requests

All regrade requests should be submitted directly through Gradescope. Office hours and in-person discussions are limited solely to knowledge-related questions, not grade-related questions. If you feel that we have made an error in grading your homework, please let us know with a written explanation, and we will consider the request. Please note that a regrade means the entire assignment may be regraded, which may cause your grade on the entire homework set to go up or down. Regrade requests must be submitted within 7 days (168 hours) of when grades are released.

Assignments

Homework 0: (Self-examination; not due, but we recommend completing it within the first week) PDF
Homework 1: Due date TBD.
Homework 2: Due date TBD.
Homework 3: Due date TBD.
Final Project: Details forthcoming.

Schedule

Lecture 1: 3/30

Welcome, logistics, overview of course topics, MDP basics
Review the prerequisites and complete the "self-test" above on your own (not to be turned in)
Lecture notes: PDF