Summary of MDP Solution Techniques
All are variants of dynamic programming, starting at stage 0 and using an optimal policy for n stages to build an optimal policy for n+1 stages
The use of this backup technique depends crucially on a time-separable value function.
Convergence guarantees depend crucially on the discount factor.
Tractability depends crucially on full observability.
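The stage-by-stage dynamic-programming backup summarized above can be illustrated with value iteration, one standard instance of this family. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the two-state MDP `P`, the helpers `backup` and `value_iteration`, and the parameters `tol` and `max_stages` are all hypothetical. Each backup turns optimal n-stage values into (n+1)-stage values by adding an immediate reward to the discounted value of the remaining stages, which is exactly where the time-separable value function is used.

```python
"""Minimal value-iteration sketch on a hypothetical two-state MDP."""

from typing import Dict, List, Tuple

# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward) triples.
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def backup(P: MDP, V: Dict[int, float], gamma: float) -> Tuple[Dict[int, float], Dict[int, int]]:
    """One DP backup: optimal n-stage values V -> (n+1)-stage values and a greedy policy."""
    V_next: Dict[int, float] = {}
    policy: Dict[int, int] = {}
    for s, actions in P.items():
        # Time separability: immediate reward plus discounted value of the remaining n stages.
        q = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
             for a, outcomes in actions.items()}
        best = max(q, key=q.get)
        policy[s], V_next[s] = best, q[best]
    return V_next, policy


def value_iteration(P: MDP, gamma: float, tol: float = 1e-8, max_stages: int = 10_000):
    """Start at stage 0 (all values zero); back up until successive stages differ by < tol."""
    V = {s: 0.0 for s in P}
    policy: Dict[int, int] = {}
    for n in range(1, max_stages + 1):
        V_next, policy = backup(P, V, gamma)
        delta = max(abs(V_next[s] - V[s]) for s in P)
        V = V_next
        if delta < tol:
            return V, policy, n  # converged after n stages
    return V, policy, max_stages  # hit the stage limit without converging


V, policy, n = value_iteration(P, gamma=0.9)
print(policy, V)
```

With `gamma=0.9` the backups contract and settle on staying in state 1 (value 2 / (1 - 0.9) = 20). With `gamma=1.0` the reward-2 self-loop makes values grow by 2 every stage, so the loop only stops at `max_stages`, which illustrates why the convergence guarantee hinges on discounting.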
Current work:
using structured representations and approximation methods to avoid having to examine the entire state space
working with undiscounted “planning-like” problems
extension to models with partial observability