Lesson plan - Nov 7 - Probability and games of chance

1. Pascal's Wager
2. Probabilistic planning: The Hungry Monkey
3. Calculating expected utility of an action
4. Expectimax
5. Backgammon

---------------------------------------------------------------------

So far we have talked about deterministic games. Today we will use
probabilistic games to introduce probability theory.

1654 - Blaise Pascal - probabilities of dice games
1669 - Pascal's Wager:
    1. God either exists or does not.
    2. If he exists and you do not believe, you are damned.
    3. If he exists and you believe, you are saved.
    4. If he does not exist and you believe, you may have somewhat less fun.
    5. If he does not exist and you do not believe, you may have somewhat more fun.

DECISION problem: elements
    - choice to be made (believe or not believe)
    - probabilities (50/50?)
    - rewards (utilities)
----> decision rule: maximize expected reward!
      How do we calculate it?

---------------------------------------------------------------------

STRIPS planning = single-player deterministic game
probabilistic planning = single-player probabilistic game

Example: The Hungry Monkey

shake:
    if (~ontable)   1/6 -> +1 banana
                    5/6 -> no change
    if (ontable)    2/3 -> +1 banana
                    1/3 -> no change
jump:
    if (~ontable)   2/3 -> ontable
                    1/3 -> ~ontable
    if (ontable)    ontable

Problem: compute the expected reward of each of the following plans.
(If an action's preconditions do not hold, performing it has no effect.)

    [1] shake
    [2] shake; shake
    [3] jump; shake
    [4] jump; shake; shake
    [5] jump; if (~ontable) { jump; shake } else { shake; shake }

Method I: Simulation!
    Distribute dice.
    Collect statistics for each plan:
        collect total number of bananas
        divide by number of people in class
        result is (an estimate of) the expected reward!
    Suppose each action has a cost of 1/4 of a banana.
    Now what is the expected reward?

---------------------------------------------------------------------

Method II: Calculation!

Probability: a function assigning any assertion (sentence) a number in the
range [0, 1].
    - must satisfy the axioms of probability:
        0 <= P(a) <= 1
        P(true) = 1
        P(false) = 0
        P(a v b) = P(a) + P(b) - P(a & b)

What is it?
    - a degree of belief
    - a summary of historical statistics
    - a property of an object (e.g. a die)

RESULT(action) = random variable ranging over possible worlds
Utility: assigns each possible world a real number (reward)

Expected utility of an action a =
    SUM  P(RESULT(a) = s) U(s)
     s

Draw game trees!

---------------------------------------------------------------------

Making decisions: ExpectiMax

ExpectiMax(n) =
    U(n)                                          if n is a terminal state
    max { ExpectiMax(s) | s in successors(n) }    if n is a max node
    SUM  P(s) ExpectiMax(s)                       if n is a chance node
     s

---------------------------------------------------------------------

Two-player games of chance

ExpectiMiniMax(n) =
    U(n)                                              if n is a terminal state
    max { ExpectiMiniMax(s) | s in successors(n) }    if n is a max node
    min { ExpectiMiniMax(s) | s in successors(n) }    if n is a min node
    SUM  P(s) ExpectiMiniMax(s)                       if n is a chance node
     s

Backgammon
    Appears simpler - but branching factor of 21 at each chance node!
    Number of MAX choices is around 20.
    Size of tree: O(c^k m^k)
        (c = branching factor at chance nodes, m = number of move
         choices, k = search depth)
    Result: can only search 3 plies, so we must have a really good
    static evaluation function.

Neurogammon and TD-Gammon - learned evaluation function by self-play.
(Tesauro 1995)
    See slides from yesterday about Othello!
    Use results of games to optimize weights:
        - "reward" features that were "on" in winning games
        - "punish" features that were "on" in losing games
    Reinforcement learning.
    Became best player, human or machine!

---------------------------------------------------------------------

Book: Computers Challenging Experts
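
---------------------------------------------------------------------

Worked sketch: Hungry Monkey by calculation

The Hungry Monkey plans and the ExpectiMax recurrence can be checked with a
short Python sketch. It is an illustration, not part of the original lesson:
it assumes the monkey starts off the table with zero bananas (the notes do not
state the start state), that reward is the number of bananas collected, and
that every action in a plan is paid for even when it has no effect. The names
MODEL, PLANS, expected_reward and expectimax are made up for this example.

```python
from fractions import Fraction as F

# Outcome model copied from the notes.
# action -> ontable? -> list of (probability, banana gain, new ontable)
MODEL = {
    "shake": {False: [(F(1, 6), 1, False), (F(5, 6), 0, False)],
              True:  [(F(2, 3), 1, True),  (F(1, 3), 0, True)]},
    "jump":  {False: [(F(2, 3), 0, True),  (F(1, 3), 0, False)],
              True:  [(F(1), 0, True)]},
}

# A plan is a list of steps. A step is an action name, or a conditional
# ("if_not_ontable", then_plan, else_plan) as in plan [5].
PLANS = {
    1: ["shake"],
    2: ["shake", "shake"],
    3: ["jump", "shake"],
    4: ["jump", "shake", "shake"],
    5: ["jump", ("if_not_ontable", ["jump", "shake"], ["shake", "shake"])],
}

def expected_reward(plan, ontable=False, cost=F(0)):
    """Expected bananas, minus `cost` per action, of running `plan`.

    Assumption: the default start state is off the table.
    """
    if not plan:
        return F(0)
    step, rest = plan[0], plan[1:]
    if isinstance(step, tuple):
        # Conditional step: pick the branch that matches the current state.
        _, then_plan, else_plan = step
        branch = else_plan if ontable else then_plan
        return expected_reward(list(branch) + list(rest), ontable, cost)
    # Chance node: probability-weighted sum over the action's outcomes.
    return -cost + sum(p * (gain + expected_reward(rest, new, cost))
                       for p, gain, new in MODEL[step][ontable])

def expectimax(ontable=False, depth=3, cost=F(0)):
    """Value of choosing the best action at each of `depth` remaining steps.

    Max over actions, expectation over outcomes. Stopping early is not
    modelled: exactly `depth` actions are taken.
    """
    if depth == 0:
        return F(0)
    return max(-cost + sum(p * (gain + expectimax(new, depth - 1, cost))
                           for p, gain, new in outcomes[ontable])
               for outcomes in MODEL.values())

for number, plan in PLANS.items():
    print(number, expected_reward(plan), expected_reward(plan, cost=F(1, 4)))
print("best 3-action value:", expectimax(depth=3))
```

Under these assumptions the five plans come out to 1/6, 1/3, 1/2, 1 and 19/18
bananas, and to -1/12, -1/6, 0, 1/4 and 11/36 once each action costs 1/4 of a
banana. The three-action ExpectiMax value is also 19/18, so plan [5] is as
good as any three-action policy in this model. Exact fractions are used so the
results can be compared by hand with the game trees drawn in class.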