January 4
Introduction: Why is Deep Learning a good tool for Robotics? [slides] [recording]
※ Intelligence without representation ※ The free-energy principle: a unified brain theory? ※ Computing Machinery and Intelligence
Optional Readings: On the measure of intelligence ※ From Socrates to expert systems: The limits of calculative rationality ※ A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence ※ Reinforcement Learning in the brain ※ Does intelligence require a body?

January 9
Reinforcement Learning - Policy Gradient [slides] [recording]
※ What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study ※ Approximately Optimal Approximate RL ※ The Mirage of Action-Dependent Baselines ※ Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-factored Approximation
Optional Readings: Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning ※ Sample Efficient Actor-Critic with Experience Replay ※ Implementation Matters in Deep RL: A Case Study on PPO and TRPO ※ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ※ Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

January 11
Student-led Discussion 1
※ A Closer Look at Deep Policy Gradients ※ Backpropagation Through the Void

January 16
Holiday: Martin Luther King Jr. Day

January 18
Reinforcement Learning - Off-policy Methods [slides] [recording]
※ QT-Opt ※ Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning ※ DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction ※ Soft Actor-Critic
Optional Readings: TD3: Addressing Function Approximation Error in Actor-Critic Methods ※ Deep Reinforcement Learning with Double Q-learning ※ MPO: Maximum a Posteriori Policy Optimisation ※ REDQ: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model ※ A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning ※ Continuous Deep Q-Learning with Model-based Acceleration

January 23
Model-based Reinforcement Learning [slides]
※ Information Theoretic MPC for Model-Based Reinforcement Learning ※ Generative Temporal Difference Learning for Infinite-Horizon Prediction ※ Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models ※ Blending MPC and Value Function Approximation for Efficient RL
Optional Readings: PILCO ※ Dreamer ※ TD-models ※ Successor features ※ STEVE ※ Guided Policy Search
Optional Readings (Application): AlphaGo ※ Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects ※ Deep Dynamics Models for Learning Dexterous Manipulation ※ DayDreamer

January 25
Student-led Discussion 2
※ Diagnosing Bottlenecks in Deep Q-learning Algorithms ※ When to Trust Your Model: Model-Based Policy Optimization

January 30
Imitation Learning [slides]
Project Proposal
※ Feedback in Imitation Learning: The Three Regimes of Covariate Shift ※ Towards the Fundamental Limits of Imitation Learning ※ Discriminator Actor Critic
Optional Readings: An Invitation to Imitation Learning ※ An Algorithmic Perspective on Imitation Learning ※ Imitation Learning as F-Divergence Minimization ※ Of Moments and Matching ※ Provably Efficient Imitation Learning from Observations Alone ※ DAgger ※ DART: Noise Injection ※ Zero-Shot Visual Imitation

February 1
Inverse Reinforcement Learning [slides]
※ Adversarial Inverse Reinforcement Learning ※ Bayesian IRL (Ramachandran & Amir) ※ A Connection Between Max Entropy IRL and GAN
Optional Readings: Guided Cost Learning ※ Deep Imitative Models ※ Max Margin Planning ※ Max Entropy Deep IRL ※ GAIL ※ InfoGAIL

February 6
Inverse RL and other forms of supervision [slides]

February 8
Student-led Discussion 3
※ Causal Confusion in Imitation Learning ※ Cooperative Inverse Reinforcement Learning

February 13
Learning from Prior Data and Offline Reinforcement Learning [slides]
※ Conservative Q-Learning ※ Learning Latent Plans from Play ※ Decision Transformer ※ Implicit Q-Learning
Optional Readings: BEAR: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction ※ BRAC: Behavior Regularized Offline Reinforcement Learning ※ GenDICE: Generalized Offline Estimation of Stationary Values ※ Doubly Robust Off-Policy Value Estimation ※ Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem ※ BCQ: Off-Policy Deep Reinforcement Learning without Exploration ※ TD3+BC ※ MOReL: Model-Based Offline Reinforcement Learning

February 15
Multi-task and Meta Learning [slides]
Project Milestone
※ RL2: Fast Reinforcement Learning via Slow Reinforcement Learning ※ Model-Agnostic Meta-Learning ※ Gradient Surgery for Multi-Task RL ※ MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
Optional Readings: Human-Timescale Adaptation in an Open-Ended Task Space ※ MELD: Meta-Reinforcement Learning from Images via Latent State Models ※ VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning ※ DREAM: Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices ※ VIMA: General Robot Manipulation with Multimodal Prompts

February 20
Holiday: Presidents' Day

February 22
Student-led Discussion 4
※ Implicit Q-Learning ※ Ray Interference: A Source of Plateaus in Deep RL

February 27
Simulator and Domain Transfer [slides]

March 1
Guest Lecture: Deep Learning in Robot Perception by Pete Florence

March 6
Frontiers and Perspectives

March 8
Student-led Discussion 5
※ VariBAD ※ EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

March 13
Final Project Presentation
Project Report