Q Learning (cont.)

What is the appropriate update from the estimate $\hat{Q}_n$ to the updated estimate $\hat{Q}_{n+1}$,
- so that for all $s$ and $a$, $\hat{Q}_n(s,a)$ converges to $Q(s,a)$ as $n \to \infty$?
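The decaying learning rate suggests that rewards and transitions may be nondeterministic. If so, the target $Q(s,a)$ is usually defined as an expectation. The block below gives that standard definition for reference; it is an assumption here, since this slide does not restate it:

$$Q(s,a) = E\left[r(s,a)\right] + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a')$$

A single observed sample $r + \gamma \max_{a'} \hat{Q}_n(s',a')$ is a noisy estimate of this quantity. That is why $\hat{Q}$ must average over many samples rather than overwrite itself with the latest one.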

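The key is to adjust the $\hat{Q}$ values gradually with each iteration:

$$\hat{Q}_{n+1}(s,a) \leftarrow (1-\alpha_n)\,\hat{Q}_n(s,a) + \alpha_n\left[r + \gamma \max_{a'} \hat{Q}_n(s',a')\right]$$

- where $\alpha_n$ is the learning rate, and one possible choice for $\alpha_n$ is

$$\alpha_n = \frac{1}{1 + \mathit{visits}_n(s,a)}$$

- with $\mathit{visits}_n(s,a)$ the number of times the pair $(s,a)$ has been visited up to and including iteration $n$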
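The following is a minimal Python sketch of this gradual update. It assumes the decaying-average form $\hat{Q} \leftarrow (1-\alpha_n)\hat{Q} + \alpha_n[r + \gamma \max_{a'} \hat{Q}(s',a')]$ with $\alpha_n = 1/(1 + \mathit{visits}_n(s,a))$. Several details are illustrative choices: the function names `q_update` and `run_demo`, the dictionary-based $\hat{Q}$ table, the default $\gamma = 0.9$, and the one-step toy problem (reward 10 or 0 with equal probability, then a terminal state). None of them come from the slides.

```python
import random
from collections import defaultdict


def q_update(Q, visits, s, a, r, s_next, next_actions, gamma=0.9):
    """Apply one Q-learning update to Q[(s, a)] in place (hypothetical helper).

    Learning rate: alpha_n = 1 / (1 + visits_n(s, a)), where visits_n counts
    visits to (s, a) up to and including this update.
    """
    visits[(s, a)] += 1
    alpha = 1.0 / (1.0 + visits[(s, a)])
    # max over a' of Q_n(s', a'); a terminal s' (no actions) contributes 0
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    target = r + gamma * best_next
    # Move Q(s, a) only part of the way toward the noisy sampled target
    Q[(s, a)] = (1.0 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]


def run_demo(n_updates=20000, seed=0):
    """Hypothetical one-step problem: reward is 10 or 0 with equal probability."""
    rng = random.Random(seed)
    Q = defaultdict(float)      # Q_0(s, a) = 0 for every pair
    visits = defaultdict(int)
    for _ in range(n_updates):
        r = rng.choice([10.0, 0.0])
        q_update(Q, visits, "s", "a", r, "end", next_actions=[])
    return Q[("s", "a")]


if __name__ == "__main__":
    print(run_demo())  # close to the expected reward 5.0
```

With $\hat{Q}_0 = 0$ and this choice of $\alpha_n$, the estimate after $n$ visits equals the sum of the $n$ sampled targets divided by $n+1$. It therefore approaches the expected target. By contrast, an overwrite rule ($\alpha = 1$) would simply jump between 0 and 10 on each sample.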
