Reinforcement Learning

The problem is to learn the model, and ultimately the optimal policy, based only on observed state and reward information:
- Transition probabilities
- Reward function and discount factor
- Optimal policy
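
As a rough illustration of learning the transition probabilities and reward function from observations, the sketch below forms count-based (maximum-likelihood) estimates from a list of observed (state, action, reward, next state) tuples. The function name `estimate_model`, the tabular integer encoding of states and actions, and the uniform fallback for unvisited state-action pairs are all assumptions made for this example, not part of the original notes.

```python
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Estimate P(s' | s, a) and R(s, a) from (s, a, r, s') tuples by counting."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
    visits = counts.sum(axis=2)  # N(s, a): how often each pair was observed

    # Assumption: unvisited (s, a) pairs get a uniform transition guess and zero reward.
    P = np.full_like(counts, 1.0 / n_states)
    R = np.zeros_like(reward_sums)
    seen = visits > 0
    P[seen] = counts[seen] / visits[seen][:, None]
    R[seen] = reward_sums[seen] / visits[seen]
    return P, R

# Hypothetical observations from a 2-state, 1-action problem.
observed = [(0, 0, 1.0, 1), (0, 0, 0.0, 0), (1, 0, 2.0, 1)]
P_hat, R_hat = estimate_model(observed, n_states=2, n_actions=1)
# P_hat[0, 0] is [0.5, 0.5] and R_hat[0, 0] is 0.5 for this data.
```

The optimal policy is not estimated this way; it is obtained afterwards by planning in the estimated model, or learned directly without one.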

Two main approaches:
- learn the model, then infer the policy (model-based)
- learn the policy without explicitly learning the model parameters (model-free)
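
The two approaches can be contrasted on a toy problem. The sketch below is only illustrative: value iteration stands in for "infer the policy" once a model is available, and tabular Q-learning stands in for the model-free route. The two-state problem, the `step` function, the hyperparameters, and the assumption that the discount factor `gamma` is known are all invented for this example.

```python
import numpy as np

def value_iteration(P, R, gamma, n_iters=500):
    """Model-based: plan in a (learned) model P[s, a, s'], R[s, a]."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        V = Q.max(axis=1)          # greedy state values
        Q = R + gamma * P @ V      # Bellman optimality backup
    return Q.argmax(axis=1), Q

def q_learning(step, n_states, n_actions, gamma,
               n_steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Model-free: update Q from sampled transitions; P and R are never stored."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        r, s_next = step(s, a, rng)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q.argmax(axis=1), Q

# Hypothetical toy problem: action 1 moves to (or stays in) state 1,
# action 0 moves to (or stays in) state 0; only (s=1, a=1) is rewarded.
P_true = np.zeros((2, 2, 2))
P_true[:, 0, 0] = 1.0
P_true[:, 1, 1] = 1.0
R_true = np.array([[0.0, 0.0], [0.0, 1.0]])

def step(s, a, rng):
    s_next = int(rng.choice(2, p=P_true[s, a]))
    return R_true[s, a], s_next

gamma = 0.9  # assumed known here
policy_mb, Q_mb = value_iteration(P_true, R_true, gamma)
policy_mf, Q_mf = q_learning(step, 2, 2, gamma)
```

In practice the model-based route would plan in an estimated model rather than the true `P_true` and `R_true` used here for brevity. On this toy problem both routes should select action 1 in both states.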

Continue studying infinite-horizon, discounted, fully observable problems
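
For reference, the objective in this setting is usually written as the expected discounted return. The notation below (policy \pi, discount factor \gamma, reward r_t at time t, initial state s_0) is assumed for illustration and may differ from the notation used elsewhere in these notes.

```latex
V^{\pi}(s) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s,\ \pi \right],
\qquad 0 \le \gamma < 1
```

"Fully observable" means the agent sees the state s_t itself at every step, so a policy can be a mapping from states to actions.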