MDP Model (continued)
The agent has a value function that measures how good its course of action is.
- The value function might depend arbitrarily on the entire history: v({s0, a0, s1, a1, ...}) ∈ ℝ
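As a rough sketch (the trajectory type and the particular function are my assumptions, not notation from the text), a history-dependent value function is just an arbitrary real-valued map over trajectories:

```python
from typing import Sequence, Tuple

State, Action = str, str
History = Sequence[Tuple[State, Action]]  # {s0, a0, s1, a1, ...}

def v(history: History) -> float:
    """An arbitrary real-valued function of the entire trajectory;
    here, purely for illustration, the number of visits to "s1"."""
    return float(sum(1 for s, _ in history if s == "s1"))
```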
The agent’s behavior is evaluated over a finite horizon or in the limit over an infinite horizon.
The agent’s task is to construct a policy that maximizes the expectation of the value function over the specified horizon.
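To make the objective concrete, here is a minimal sketch, assuming a toy two-state MDP whose states, actions, transition probabilities, and rewards are all invented for illustration. It fixes one stationary policy, takes the value of a history to be a sum of per-state rewards (one special case of an arbitrary trajectory function, as above), and estimates the expectation of that value over a finite horizon by Monte Carlo simulation:

```python
import random

# Toy two-state MDP; every name and number is an assumption for illustration.
TRANSITIONS = {
    ("s0", "a0"): [("s0", 0.7), ("s1", 0.3)],
    ("s0", "a1"): [("s0", 0.2), ("s1", 0.8)],
    ("s1", "a0"): [("s0", 0.5), ("s1", 0.5)],
    ("s1", "a1"): [("s0", 0.1), ("s1", 0.9)],
}
REWARDS = {"s0": 0.0, "s1": 1.0}  # per-step reward for occupying a state

def policy(state: str) -> str:
    """A stationary deterministic policy: always choose a1."""
    return "a1"

def v(history) -> float:
    """Value of a history: here, the sum of per-state rewards."""
    return sum(REWARDS[s] for s, _ in history)

def estimate_policy_value(start: str = "s0", horizon: int = 10,
                          episodes: int = 10_000) -> float:
    """Monte Carlo estimate of E[v(history)] under the policy
    over a finite horizon of fixed length."""
    total = 0.0
    for _ in range(episodes):
        state, history = start, []
        for _ in range(horizon):
            action = policy(state)
            history.append((state, action))
            # Sample the next state from the transition distribution.
            nxt, probs = zip(*TRANSITIONS[(state, action)])
            state = random.choices(nxt, weights=probs)[0]
        total += v(history)
    return total / episodes

if __name__ == "__main__":
    print(f"estimated expected value: {estimate_policy_value():.3f}")
```

Finding the policy that maximizes this estimate, rather than merely evaluating a fixed one, is the agent's task described above.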