Properties of the Model

Optimal policy is stationary
- The choice of action a_i depends only on the current state s_i
- Optimal policy is of the form π(s) = a
  - which is a mapping of fixed size |S|, regardless of the number of stages
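
As a rough illustration of the points above, a stationary policy can be stored as a lookup table with one entry per state. The state names, action names, and helper functions below are hypothetical and chosen only for this sketch; they are not part of the model in these notes.

```python
# Hypothetical toy example: the states, actions, and table entries are
# illustrative assumptions, not taken from the notes.
STATES = ["low", "medium", "high"]

# A stationary policy pi(s) = a: one table entry per state, so |S| entries.
policy = {"low": "recharge", "medium": "search", "high": "search"}


def act(state, stage):
    """Return the action for `state`; `stage` is deliberately ignored."""
    return policy[state]


def actions_along(trajectory):
    """Apply the same table at every stage of a given state sequence."""
    return [act(s, i) for i, s in enumerate(trajectory)]


# The same 3-entry table serves a 2-stage run and a 100-stage run.
short_run = actions_along(["high", "low"])
long_run = actions_along(["high", "low"] * 50)
```

The table never grows with the horizon: only the number of lookups changes, not the size of the policy.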

Assuming