MDP Model (continued)
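First simplifying assumption: the value function is time separable, i.e., the value of a trajectory is the sum of the rewards earned at each step: $V = \sum_t r_t$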
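Discounting: rewards earned early are worth more than rewards earned late (a reward received t steps in the future is weighted by $\gamma^t$, with $0 \le \gamma < 1$)
- for economic reasons: a reward received now can be invested or used immediately
- because there is some chance that the agent will be terminated before it can collect later rewards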
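A minimal sketch of the discounted, time-separable value, assuming rewards are given as a finite list r_0, r_1, ... and a discount factor gamma. The second function illustrates the termination argument: if the agent survives each step with probability gamma (an assumption made here for illustration), the expected undiscounted total reward equals the discounted return. Both function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Time-separable, discounted value: sum over t of gamma**t * r_t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += gamma ** t * r
    return total


def expected_return_with_termination(rewards, survive_prob):
    """Expected UNdiscounted total reward when, after each step, the agent
    survives to the next step with probability survive_prob (hypothetical model)."""
    T = len(rewards)
    expected = 0.0
    collected = 0.0  # running sum r_0 + ... + r_k
    for k, r in enumerate(rewards):
        collected += r
        if k < T - 1:
            # survived k transitions, then terminated right after step k
            p_stop_here = survive_prob ** k * (1 - survive_prob)
        else:
            # survived all transitions and reached the last step
            p_stop_here = survive_prob ** k
        expected += p_stop_here * collected
    return expected


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]
    gamma = 0.9
    print(discounted_return(rewards, gamma))
    print(expected_return_with_termination(rewards, gamma))
```

The two numbers agree because reward r_t is collected only if the agent has survived t transitions, which happens with probability gamma**t.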
Infinite-horizon discounted problems
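A short worked bound showing why discounting keeps infinite-horizon values well defined, assuming every per-step reward is bounded in magnitude by some constant $R_{\max}$ (a symbol introduced here for illustration): the geometric series in $\gamma$ converges whenever $0 \le \gamma < 1$.

```latex
V = \sum_{t=0}^{\infty} \gamma^t r_t, \qquad |r_t| \le R_{\max},\; 0 \le \gamma < 1
\;\Longrightarrow\;
|V| \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = \frac{R_{\max}}{1-\gamma}.
```

Without discounting ($\gamma = 1$) the same sum can diverge, so values of different policies may not be comparable.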