MDP Model (continued)

Discounting: rewards earned early are better than rewards earned late
- because of economics: a reward in hand now can be invested and earn interest
- because there is some chance that the agent will be terminated before later rewards arrive
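
A minimal sketch of both motivations, assuming a geometric discount factor gamma in (0, 1). The notes have not fixed a specific discounting form at this point, so gamma, the function names, and the reward sequences below are illustrative assumptions only. The second function treats gamma as a per-step survival probability and gives the same number as the first, which is one way to read the termination argument.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t for t = 0, 1, 2, ... (assumed geometric discounting)."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def expected_return_with_termination(rewards, survive_prob):
    """Expected undiscounted return when the agent survives each step
    with probability survive_prob and collects r_t only if still alive at t."""
    total = 0.0
    alive_prob = 1.0  # the agent is alive at t = 0
    for r in rewards:
        total += alive_prob * r
        alive_prob *= survive_prob
    return total


# Hypothetical reward sequences: the same unit reward, earned early vs. late.
early = [1.0, 0.0, 0.0]
late = [0.0, 0.0, 1.0]
gamma = 0.9  # assumed value, for illustration only

print(discounted_return(early, gamma))
print(discounted_return(late, gamma))
print(expected_return_with_termination(late, gamma))
```

With any gamma below 1 the early sequence scores higher than the late one, and the termination model reproduces the discounted value of the late sequence.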

First simplifying assumption: value function is time separable: