Schedule

This schedule is tentative and subject to change.

Date | Topic | Due
January 4 Introduction: Why is Deep Learning a good tool for Robotics? [slides] [recording]
※ Intelligence without representation
※ The free-energy principle: a unified brain theory?
※ Computing Machinery and Intelligence
 Optional Readings
   On the measure of intelligence
   From Socrates to expert systems: The limits of calculative rationality
   A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
   Reinforcement Learning in the brain
   Does intelligence require a body?
January 9 Reinforcement Learning - Policy Gradient [slides] [recording]
※ What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
※ Approximately Optimal Approximate RL
※ Mirage of Action Dependent Baselines
※ Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-factored Approximation
 Optional Readings
   Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
   Sample Efficient Actor Critic with Experience Replay
   Implementation Matters in Deep RL: A Case Study on PPO and TRPO
   Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
   Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
January 11 Student-led Discussion 1
※ A Closer Look at Deep Policy Gradients
※ Backpropagation Through the Void
January 16 Holiday: Martin Luther King Jr. Day
January 18 Reinforcement Learning - Off-policy Methods [slides] [recording]
※ QT-Opt
※ Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
※ DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
※ Soft Actor-Critic
 Optional Readings
   TD3: Addressing Function Approximation Error in Actor-Critic Methods
   Deep Reinforcement Learning with Double Q-learning
   MPO: Maximum a Posteriori Policy Optimisation
   REDQ: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
   A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning
   Continuous Deep Q-Learning with Model-based Acceleration
January 23 Model-based Reinforcement Learning [slides]
※ Information Theoretic MPC for Model-Based Reinforcement Learning
※ Generative Temporal Difference Learning for Infinite-Horizon Prediction
※ Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
※ Blending MPC and Value Function Approximation for Efficient RL
 Optional Readings
   PILCO
   Dreamer
   TD-models
   Successor features
   STEVE
   Guided Policy Search
 Optional Readings (Application)
   AlphaGo
   Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects
   Deep Dynamics Models for Learning Dexterous Manipulation
   DayDreamer
January 25 Student-led Discussion 2
※ Diagnosing Bottlenecks in Deep Q-learning Algorithms
※ When to Trust Your Model: Model-Based Policy Optimization
January 30 Imitation Learning [slides] Project Proposal
※ Feedback in Imitation Learning: The Three Regimes of Covariate Shift
※ Towards the Fundamental Limits of Imitation Learning
※ Discriminator Actor Critic
 Optional Readings
   An Invitation to Imitation Learning
   An Algorithmic Perspective on Imitation Learning
   Imitation Learning as F-Divergence Minimization
   Of Moments and Matching
   Provably Efficient Imitation Learning from Observations Alone
   DAgger
   DART: Noise Injection
   Zero Shot Visual Imitation
February 1 Inverse Reinforcement Learning [slides]
※ Adversarial Inverse Reinforcement Learning
※ Bayesian IRL (Ramachandran and Amir)
※ A Connection Between Max Entropy IRL and GAN
 Optional Readings
   Guided Cost Learning
   Deep Imitative Models
   Max Margin Planning
   Max Entropy Deep IRL
   GAIL
   InfoGAIL
February 6 Inverse RL and other forms of supervision [slides]
February 8 Student-led Discussion 3
※ Causal Confusion in Imitation Learning
※ Cooperative Inverse Reinforcement Learning
February 13 Learning from Prior Data and Offline Reinforcement Learning [slides]
※ Conservative Q-Learning
※ Learning Latent Plans from Play
※ Decision Transformer
※ Implicit Q-Learning
 Optional Readings
   BEAR: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
   BRAC: Behavior Regularized Offline Reinforcement Learning
   GenDICE: Generalized Offline Estimation of Stationary Values
   Doubly Robust Off-Policy Value Estimation
   Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem
   BCQ: Off-Policy Deep Reinforcement Learning without Exploration
   TD3 + BC
   MOReL: Model-Based Offline Reinforcement Learning
February 15 Multi-task and Meta Learning [slides] Project Milestone
※ RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
※ Model-Agnostic Meta-Learning
※ Gradient Surgery for Multi-Task RL
※ MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
 Optional Readings
   Human-Timescale Adaptation in an Open-Ended Task Space
   MELD: Meta-Reinforcement Learning from Images via Latent State Models
   VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
   DREAM: Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices
   VIMA: General Robot Manipulation with Multimodal Prompts
February 20 Holiday: Presidents' Day
February 22 Student-led Discussion 4
※ Implicit Q-Learning
※ Ray Interference: A Source of Plateaus in Deep RL
February 27 Simulator and Domain Transfer [slides]
March 1 Guest Lecture: Deep Learning in Robot Perception by Pete Florence
March 6 Frontiers and Perspectives
March 8 Student-led Discussion 5
※ VariBAD
※ EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
March 13 Final Project Presentation | Project Report