Convergence of Q update
The Q̂ update converges to the true Q(s,a) function (and thus to an optimal policy) if
- rewards are bounded and discounted
- initial Q values are finite
- each (s,a) pair is visited infinitely often
- 0 ≤ α_n < 1
- α_n(s,a) decreases with the number of times (s,a) is visited
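
The conditions above can be illustrated with a minimal tabular sketch. Everything named here is an assumption made for illustration, not part of the original material: the two-state chain environment `step`, the noisy bounded reward, the uniformly random exploration (a finite stand-in for "visited infinitely often"), and the learning-rate schedule α_n(s,a) = 1 / (1 + visits_n(s,a)), which is one common choice that stays in [0, 1) and decreases with the visit count.

```python
import random

GAMMA = 0.9            # discount factor (assumed value)
STAY, RIGHT = 0, 1     # hypothetical action names
TERMINAL = 2           # hypothetical terminal state id


def step(state, action, rng):
    """Toy chain: 0 -> 1 -> TERMINAL. Rewards are bounded."""
    if action == STAY:
        return state, 0.0
    if state == 0:
        return 1, 0.0
    # Leaving state 1 ends the episode with a noisy reward of mean 1.
    return TERMINAL, 1.0 + rng.uniform(-0.5, 0.5)


def q_learning(num_steps=40000, seed=0):
    rng = random.Random(seed)
    # Finite initial Q values and per-pair visit counters.
    q = {(s, a): 0.0 for s in (0, 1) for a in (STAY, RIGHT)}
    visits = {key: 0 for key in q}
    state = 0
    for _ in range(num_steps):
        # Uniform random exploration keeps visiting every (s, a) pair.
        action = rng.choice((STAY, RIGHT))
        next_state, reward = step(state, action, rng)

        visits[(state, action)] += 1
        # Learning rate in [0, 1) that shrinks as (s, a) is revisited.
        alpha = 1.0 / (1 + visits[(state, action)])

        if next_state == TERMINAL:
            future = 0.0
        else:
            future = max(q[(next_state, a)] for a in (STAY, RIGHT))
        target = reward + GAMMA * future
        q[(state, action)] += alpha * (target - q[(state, action)])

        # Start a new episode after reaching the terminal state.
        state = 0 if next_state == TERMINAL else next_state
    return q, visits


if __name__ == "__main__":
    q, visits = q_learning()
    for key in sorted(q):
        print(key, round(q[key], 3), visits[key])
```

In this toy chain the true values are Q(1, RIGHT) = 1, Q(1, STAY) = 0.9, Q(0, RIGHT) = 0.9 and Q(0, STAY) = 0.81, so the printed estimates can be compared against them. Keeping α fixed instead of decaying it would leave the estimate of Q(1, RIGHT) fluctuating with the reward noise rather than settling.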