| January 4 |
Introduction: Why is Deep Learning a good tool for Robotics? [slides] [recording] |
※ Intelligence without representation ※ The free-energy principle: a unified brain theory? ※ Computing Machinery and Intelligence
Optional Readings: ※ On the Measure of Intelligence ※ From Socrates to Expert Systems: The Limits of Calculative Rationality ※ A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence ※ Reinforcement Learning in the Brain ※ Does Intelligence Require a Body?
| January 9 |
Reinforcement Learning - Policy Gradient [slides] [recording] |
※ What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study ※ Approximately Optimal Approximate RL ※ Mirage of Action Dependent Baselines ※ Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-factored Approximation
Optional Readings: ※ Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning ※ Sample Efficient Actor-Critic with Experience Replay ※ Implementation Matters in Deep RL: A Case Study on PPO and TRPO ※ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ※ Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
| January 11 |
Student-led Discussion 1 |
※ A Closer Look at Deep Policy Gradients ※ Backpropagation Through the Void
| January 16 |
Holiday: Martin Luther King Jr. Day |
| January 18 |
Reinforcement Learning - Off-policy Methods [slides] [recording] |
※ QT-Opt ※ Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning ※ DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction ※ Soft Actor-Critic
Optional Readings: ※ TD3: Addressing Function Approximation Error in Actor-Critic Methods ※ Deep Reinforcement Learning with Double Q-learning ※ MPO: Maximum a Posteriori Policy Optimisation ※ REDQ: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model ※ A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning ※ Continuous Deep Q-Learning with Model-based Acceleration
| January 23 |
Model-based Reinforcement Learning [slides] |
※ Information Theoretic MPC for Model-Based Reinforcement Learning ※ Generative Temporal Difference Learning for Infinite-Horizon Prediction ※ Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models ※ Blending MPC and Value Function Approximation for Efficient RL
Optional Readings: ※ PILCO ※ Dreamer ※ TD-models ※ Successor Features ※ STEVE ※ Guided Policy Search
Optional Readings (Application): ※ AlphaGo ※ Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects ※ Deep Dynamics Models for Learning Dexterous Manipulation ※ DayDreamer
| January 25 |
Student-led Discussion 2 |
※ Diagnosing Bottlenecks in Deep Q-learning Algorithms ※ When to Trust Your Model: Model-Based Policy Optimization
| January 30 |
Imitation Learning |
Project Proposal |
| February 1 |
Reward Inference and Specification |
| February 6 |
Student-led Discussion 3 |
| February 8 |
Learning from Prior Data and Offline Reinforcement Learning |
| February 13 |
Student-led Discussion 4 |
| February 15 |
Multi-Task and Meta-Learning |
Project Milestone |
| February 20 |
Holiday: Presidents' Day |
| February 22 |
Simulator and Domain Transfer |
| February 27 |
Student-led Discussion 5 |
| March 1 |
Deep Learning for Perception |
| March 6 |
Frontiers and Perspectives |
| March 8 |
Student-led Discussion 6 |
| March 13 |
Final Project Presentation |
Project Report |