Summary of MDP Solution Techniques

All of these techniques are variants of dynamic programming: they start at stage 0 and use an optimal policy for n stages to build an optimal policy for n+1 stages.
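The stage-by-stage construction can be sketched as finite-horizon dynamic programming. The two-state MDP below (the transition table `P`, the rewards `R`, and the helper `finite_horizon_dp`) is a hypothetical example, not something from the slide. It is undiscounted, and it assumes every state has at least one action.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P: Dict[int, Dict[int, List[Tuple[int, float]]]] = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R: Dict[int, Dict[int, float]] = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def finite_horizon_dp(P, R, horizon):
    """Build optimal (n+1)-stage values and policies from optimal n-stage values."""
    states = list(P)
    V = {s: 0.0 for s in states}  # stage 0: no steps left, so every value is 0
    policies = []  # policies[n] is the policy to use with n+1 steps remaining
    for _ in range(horizon):
        # One-step lookahead on top of the optimal n-stage values.
        Q = {
            s: {a: R[s][a] + sum(p * V[s2] for s2, p in P[s][a]) for a in P[s]}
            for s in states
        }
        policy = {s: max(Q[s], key=Q[s].get) for s in states}
        V = {s: Q[s][policy[s]] for s in states}
        policies.append(policy)
    return V, policies


V, policies = finite_horizon_dp(P, R, horizon=3)
print(V)  # {0: 5.0, 1: 6.0}
```

Each pass through the loop reuses only the previous stage's values. That reuse is what lets an n-stage solution grow into an (n+1)-stage one.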

This stage-by-stage backup depends crucially on a time-separable value function.
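To see why separability matters, suppose the value is a (possibly discounted) sum of per-step rewards. The notation here is an assumption, not the slide's: reward R(s,a), transition probability P(s'|s,a), discount factor γ (γ = 1 in the undiscounted finite-horizon case), and V_n as the optimal n-stage value. The (n+1)-stage value then splits into "this step" plus "the optimal n-stage remainder":

```latex
V^{\pi}(s_0) = \mathbb{E}\Big[\sum_{t=0}^{n} \gamma^{t} R(s_t, a_t)\Big],
\qquad
V_{n+1}(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s') \Big],
\qquad
V_0(s) = 0 .
```

If the objective did not decompose additively over time, the second equation would not follow and the backup would not be valid.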

The convergence guarantee depends crucially on the discount factor.
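The role of the discount factor can be seen in a minimal infinite-horizon value-iteration sketch. It uses the same hypothetical two-state MDP (`P`, `R`), and `value_iteration` is an illustrative helper. For γ < 1 the backup is a contraction, so the largest change between successive iterates shrinks by at least a factor of γ each step. With γ = 1 on this MDP, the values grow without bound.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P: Dict[int, Dict[int, List[Tuple[int, float]]]] = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R: Dict[int, Dict[int, float]] = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Repeat the discounted Bellman backup; record the max change per sweep."""
    states = list(P)
    V = {s: 0.0 for s in states}
    deltas = []
    for _ in range(max_iter):
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in P[s])
            for s in states
        }
        delta = max(abs(V_new[s] - V[s]) for s in states)
        deltas.append(delta)
        V = V_new
        if delta < tol:  # successive iterates have (numerically) stopped changing
            break
    return V, deltas


V, deltas = value_iteration(P, R, gamma=0.9)
print(round(V[0], 6), round(V[1], 6))  # 19.0 20.0
```

With `gamma=1.0` the same call never meets the tolerance. On this MDP the per-sweep change stays at 2.0 and the loop stops only at `max_iter`.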

Tractability depends crucially on full observability.

Current work:
