Q Learning (cont.)
What is the appropriate update from the current estimate Q̂_n to the next estimate Q̂_{n+1}
- to ensure that, for all s and a, Q̂_n(s,a) converges to Q(s,a) as n goes to infinity?
The key is to adjust the Q̂ values gradually with each iteration:
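- $\hat{Q}_{n+1}(s,a) = (1 - \alpha_n)\,\hat{Q}_n(s,a) + \alpha_n \left[ r + \gamma \max_{a'} \hat{Q}_n(s',a') \right]$
- where one possible function for $\alpha_n$ is $\alpha_n = \dfrac{1}{1 + \mathit{visits}_n(s,a)}$, with $\mathit{visits}_n(s,a)$ the number of times the pair (s,a) has been visited so far (r is the immediate reward, s' the resulting state, and $\gamma$ the discount factor)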
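
The gradual adjustment described above can be sketched in Python. This is a minimal sketch, not a definitive implementation. It assumes the standard tabular Q-learning rule, in which the new estimate is a weighted average of the old estimate and the one-step target r + gamma * max over a' of Q̂(s',a'). It also assumes a step size of 1 / (1 + visits(s,a)) as one possible choice. The function name q_update, the dictionaries q and visits, the state and action labels, and gamma = 0.9 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def q_update(q, visits, s, a, r, s_next, actions, gamma=0.9):
    """Apply one gradual Q-learning update in place; return the new estimate.

    q       maps (state, action) -> current estimate (hypothetical table)
    visits  maps (state, action) -> number of updates so far
    """
    # Count this visit, then derive a step size that shrinks with each visit
    # (assumed schedule: alpha = 1 / (1 + visits)).
    visits[(s, a)] += 1
    alpha = 1.0 / (1.0 + visits[(s, a)])

    # One-step target: immediate reward plus discounted best next estimate.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    target = r + gamma * best_next

    # Move the old estimate only part of the way toward the target.
    q[(s, a)] = (1.0 - alpha) * q[(s, a)] + alpha * target
    return q[(s, a)]


# Usage: two updates of the same (state, action) pair with reward 1.
q = defaultdict(float)      # all estimates start at 0
visits = defaultdict(int)   # no pair has been visited yet
actions = ["left", "right"]

first = q_update(q, visits, "s0", "right", 1.0, "s1", actions)
second = q_update(q, visits, "s0", "right", 1.0, "s1", actions)
print(first, second)
```

With these assumed values the first update moves the estimate halfway to the target (0.5) and the second moves it one third of the remaining way (about 0.667). Because the step size shrinks as a pair is visited more often, later observations change the estimate less and less.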