| January 4 |
Introduction: Why is Deep Learning a good tool for Robotics? [slides] [recording] |
※ Intelligence without representation ※ The free-energy principle: a unified brain theory? ※ Computing Machinery and Intelligence
Optional Readings: ※ On the Measure of Intelligence ※ From Socrates to Expert Systems: The Limits of Calculative Rationality ※ A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence ※ Reinforcement Learning in the Brain ※ Does Intelligence Require a Body?
| January 9 |
Reinforcement Learning - Policy Gradient [slides] [recording] |
※ What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study ※ Approximately Optimal Approximate RL ※ Mirage of Action Dependent Baselines ※ Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-factored Approximation
Optional Readings: ※ Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning ※ Sample Efficient Actor-Critic with Experience Replay ※ Implementation Matters in Deep RL: A Case Study on PPO and TRPO ※ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ※ Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
| January 11 |
Student-led Discussion 1 |
※ A Closer Look at Deep Policy Gradients ※ Backpropagation Through the Void
| January 16 |
Holiday: Martin Luther King Jr. Day |
| January 18 |
Reinforcement Learning - Off-policy Methods [slides] [recording] |
※ QT-Opt ※ Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning ※ DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction ※ Soft Actor-Critic
Optional Readings: ※ TD3: Addressing Function Approximation Error in Actor-Critic Methods ※ Deep Reinforcement Learning with Double Q-learning ※ MPO: Maximum a Posteriori Policy Optimisation ※ REDQ: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model ※ A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning ※ Continuous Deep Q-Learning with Model-based Acceleration
| January 23 |
Model-based Reinforcement Learning [slides] |
※ Information Theoretic MPC for Model-Based Reinforcement Learning ※ Generative Temporal Difference Learning for Infinite-Horizon Prediction ※ Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models ※ Blending MPC and Value Function Approximation for Efficient RL
Optional Readings: ※ PILCO ※ Dreamer ※ TD-models ※ Successor Features ※ STEVE ※ Guided Policy Search
Optional Readings (Application): ※ AlphaGo ※ Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects ※ Deep Dynamics Models for Learning Dexterous Manipulation ※ DayDreamer
| January 25 |
Student-led Discussion 2 |
※ Diagnosing Bottlenecks in Deep Q-learning Algorithms ※ When to Trust Your Model: Model-Based Policy Optimization
| January 30 |
Imitation Learning |
Project Proposal |
| February 1 |
Reward Inference and Specification |
| February 6 |
Student-led Discussion 3 |
| February 8 |
Learning from Prior Data and Offline Reinforcement Learning |
| February 13 |
Student-led Discussion 4 |
| February 15 |
Multi-Task and Meta-Learning |
Project Milestone |
| February 20 |
Holiday: Presidents' Day |
| February 22 |
Simulator and Domain Transfer |
| February 27 |
Student-led Discussion 5 |
| March 1 |
Deep Learning for Perception |
| March 6 |
Frontiers and Perspectives |
| March 8 |
Student-led Discussion 6 |
| March 13 |
Final Project Presentation |
Project Report |