Policy Construction and Dynamic Programming

This suggests a dynamic programming approach to solving the problem:
- start with some v_0(s)
- compute v_{i+1}(s) using the recurrence relationship
- stop when computation converges to
- convergence guarantee is
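
The steps above can be sketched in code. This is a minimal illustration and not the notes' own implementation: the recurrence is not written out in this section, so the sketch assumes the standard Bellman optimality backup, v_{i+1}(s) = max_a [ R(a, s) + gamma * sum_{s'} P(a, s, s') v_i(s') ]. The names value_iteration, P, R, gamma and tol, the max-change stopping test, and the two-state toy problem are all hypothetical choices made for the example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterate an assumed Bellman optimality backup until values stop changing.

    P[a][s][t] -- probability of moving from state s to state t under action a
    R[a][s]    -- expected immediate reward for taking action a in state s
    (Both layouts are assumptions made for this sketch.)
    """
    n_actions, n_states = len(P), len(P[0])

    def backup(v, a, s):
        # One-step lookahead value of taking action a in state s.
        return R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))

    v = [0.0] * n_states  # v_0(s): an arbitrary starting guess
    for _ in range(max_iters):
        # v_{i+1}(s) from v_i(s) via the assumed recurrence
        v_next = [max(backup(v, a, s) for a in range(n_actions))
                  for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(v_next, v))
        v = v_next
        if delta < tol:  # assumed stopping rule: largest change below tol
            break

    # Policy construction: act greedily with respect to the final values.
    policy = [max(range(n_actions), key=lambda a: backup(v, a, s))
              for s in range(n_states)]
    return v, policy


# Hypothetical toy problem: action 0 stays put, action 1 switches state;
# the only reward is 1 for staying in state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [0.0, 1.0],  # action 0 rewards in states 0 and 1
    [0.0, 0.0],  # action 1 rewards in states 0 and 1
]
v, policy = value_iteration(P, R)
```

In this toy problem the values approach 9 for state 0 and 10 for state 1 (with gamma = 0.9), and the greedy policy switches out of state 0 and stays in state 1. The policy is read off only after the values have settled, which is one common way to do policy construction on top of the value recursion.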
