Computing Optimal Policies
We can define the expected value of being in state s and acting according to a fixed policy π.
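One common way to write this quantity, assuming an infinite-horizon discounted setting, is sketched below. The symbol v_π, the discount factor γ in [0, 1), and the reward r_t received at step t are assumptions introduced here for illustration; the slide may use a different convention.

```latex
% Assumed notation: \gamma is a discount factor, r_t the reward at step t,
% and actions are chosen by the fixed policy \pi.
v_\pi(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s,\ a_t = \pi(s_t) \right]
```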
A fundamental result is that the optimal value function v*(s) is a solution to the following equation (the Bellman equation):
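In one common notation (an assumption, since P, R and γ are not defined on this slide), the Bellman equation reads v*(s) = max_a [ R(s, a) + γ Σ_s' P(s' | s, a) v*(s') ]. The following is a minimal sketch of value iteration, one standard way to solve that equation numerically, run on a hypothetical two-state problem invented purely for illustration. The function name, the table layout and the toy numbers are all assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Approximate v* by repeatedly applying the Bellman backup.

    Assumed layout (hypothetical, not from the slide):
      P[s][a][s2] = probability of moving from s to s2 under action a
      R[s][a]     = immediate reward for taking action a in state s
    """
    n_states = len(R)
    n_actions = len(R[0])
    v = [0.0] * n_states

    def q(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(
            P[s][a][s2] * values[s2] for s2 in range(n_states)
        )

    for _ in range(max_iters):
        new_v = [max(q(s, a, v) for a in range(n_actions)) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:  # stop once the values have essentially stopped changing
            break

    # A greedy policy with respect to the converged values.
    policy = [max(range(n_actions), key=lambda a: q(s, a, v)) for s in range(n_states)]
    return v, policy


# Toy problem (illustrative only): action 0 = stay, action 1 = move.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, move -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, move -> 0
]
R = [
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
]

v, policy = value_iteration(P, R, gamma=0.9)
print(v, policy)
```

With γ = 0.9 the values should approach v*(1) = 1 / (1 - 0.9) = 10 and v*(0) = 0.9 × 10 = 9, and the greedy policy moves out of state 0 and stays in state 1.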