Reinforcement Learning
Continue studying infinite-horizon, discounted, fully observable problems.
We make an implicit assumption that “models are expensive, trials are cheap.”
The problem is to learn the model and the optimal policy based only on observed state and reward information:
- Transition probabilities
- Reward function and discount factor
- Optimal policy
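As a rough illustration of learning transition probabilities and rewards from observations alone, the sketch below forms count-based (maximum-likelihood) estimates from observed (state, action, reward, next state) samples. The function name `estimate_model`, the integer encoding of states and actions, and the sample data are all assumptions made for this example, not something specified in these notes.

```python
def estimate_model(samples, n_states, n_actions):
    """Count-based estimates of P(s' | s, a) and R(s, a) from (s, a, r, s') samples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in samples:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r

    P = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            n = sum(counts[s][a])
            if n == 0:
                continue  # unvisited (s, a) pair: estimates stay at zero
            P[s][a] = [c / n for c in counts[s][a]]  # relative frequencies
            R[s][a] = reward_sum[s][a] / n           # average observed reward
    return P, R


# Hypothetical observations from a 2-state, 1-action problem.
samples = [
    (0, 0, 0.0, 0), (0, 0, 0.0, 1), (0, 0, 0.0, 1), (0, 0, 0.0, 1),
    (1, 0, 1.0, 1), (1, 0, 1.0, 1),
]
P_hat, R_hat = estimate_model(samples, n_states=2, n_actions=1)
```

The estimates improve only for state-action pairs that are actually visited, which is why cheap trials matter under the assumption stated above.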
Two main approaches:
- learn the model, then infer the policy (model-based)
- learn the policy without learning the explicit model parameters (model-free)
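To make the second approach concrete, here is a minimal sketch of tabular Q-learning, one well-known way to learn a policy without estimating transition probabilities or rewards explicitly. It is offered as an example of the idea, not necessarily the method these notes go on to develop. The three-state chain environment in `step`, the learning rate, the purely random exploration, and the known discount factor are all assumptions chosen for illustration. The first approach would instead estimate the model from the same experience and then solve it, for example with value iteration.

```python
import random

GAMMA = 0.9                  # discount factor (assumed known in this sketch)
ALPHA = 0.5                  # learning rate (assumed)
N_STATES, N_ACTIONS = 3, 2   # actions: 0 = left, 1 = right


def step(state, action):
    """Hypothetical deterministic chain: reward 1 whenever the agent lands in state 2."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return reward, next_state


def q_learning(n_steps=10000, seed=0):
    """Learn action values from observed (s, a, r, s') only; step() is treated as a black box."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_ACTIONS)  # uniformly random exploration
        reward, next_state = step(state, action)
        # Move Q(s, a) toward the one-step bootstrapped target.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
    return q


q = q_learning()
# Greedy policy with respect to the learned action values.
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

In this toy chain the greedy policy should end up choosing "right" in every state, since that is the only way to keep collecting reward. The learner never builds a transition table; it only consults the environment through trials.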