Policy Construction and Dynamic Programming
This suggests a dynamic programming approach to solving the problem:
start with some v_0(s)
compute v_{i+1}(s) using the recurrence relationship
stop when computation converges to
convergence guarantee is
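The steps above can be sketched as value iteration. This is a minimal illustration under stated assumptions, not the slide's own formulation: the recurrence is assumed to be the standard Bellman optimality backup with a discount factor gamma, and the stopping test (largest change between successive iterates below tol) is a common choice that stands in for the criterion whose formula is not shown here. The names value_iteration, greedy_policy, transition, reward, gamma, and tol are all hypothetical, as is the two-state example problem.

```python
def q_value(s, a, v, transition, reward, gamma):
    # One-step lookahead: immediate reward plus discounted expected value.
    return reward(s, a) + gamma * sum(
        p * v[s2] for s2, p in transition(s, a).items()
    )


def value_iteration(states, actions, transition, reward,
                    gamma=0.9, tol=1e-8, max_iters=10_000):
    # Start with some v_0(s); zero everywhere is an arbitrary choice.
    v = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # Compute v_{i+1}(s) from v_i(s) using the (assumed) Bellman backup.
        v_next = {
            s: max(q_value(s, a, v, transition, reward, gamma) for a in actions)
            for s in states
        }
        # Assumed stopping rule: largest change across all states.
        delta = max(abs(v_next[s] - v[s]) for s in states)
        v = v_next
        if delta < tol:
            break
    return v


def greedy_policy(states, actions, v, transition, reward, gamma=0.9):
    # Policy construction: in each state pick the action with the best lookahead.
    return {
        s: max(actions, key=lambda a: q_value(s, a, v, transition, reward, gamma))
        for s in states
    }


# Hypothetical two-state problem: "move" switches state, "stay" does not,
# and only staying in state "b" pays a reward of 1.
states = ["a", "b"]
actions = ["stay", "move"]


def transition(s, a):
    if a == "stay":
        return {s: 1.0}
    return {"b" if s == "a" else "a": 1.0}


def reward(s, a):
    return 1.0 if (s == "b" and a == "stay") else 0.0


v = value_iteration(states, actions, transition, reward, gamma=0.9)
policy = greedy_policy(states, actions, v, transition, reward, gamma=0.9)
```

In this toy problem the values approach v(b) = 1 / (1 - 0.9) = 10 and v(a) = 0.9 * 10 = 9, and the greedy policy moves from "a" to "b" and then stays there. With gamma < 1 the backup is a contraction, which is the usual basis for a convergence guarantee in this setting.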