lesson plan - Nov 7 - probability and games of chance

1. Pascal's Wager 
2. Probabilistic planning: The hungry monkey
3. Calculating expected utility of an action
4. Expectimax
5. Backgammon

---------------------------------------------------------------------
So far we have talked about deterministic games.
Today we will use probabilistic games to introduce probability theory.

1654 - Blaise Pascal - probabilities of dice games

1669 - Pascal's Wager: 
	1. God either exists or does not
	2. If he exists and you do not believe, you are damned.
	3. If he exists and you believe, you are saved.
	4. If he does not exist and you believe, you may have somewhat
        less fun.
	5. If he does not exist and you do not believe, you may have somewhat
        more fun.

DECISION problem: elements
	- choice to be made (believe or not believe)
	- probabilities (50/50?)
	- rewards (utilities)
	----> decision rule: maximize expected reward!
		how to calculate?
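
A minimal sketch of the decision rule applied to the wager. The 50/50
prior is the one floated in the notes; the utility numbers (+1000 saved,
-1000 damned, -1 or +1 for somewhat less or more fun) are hypothetical
values chosen only to make the arithmetic concrete.

```python
# Expected reward of each choice in Pascal's Wager.
# ASSUMPTION: the utilities below are made-up illustration values.
P_EXISTS = 0.5  # the "50/50?" prior from the notes

# UTILITY[choice][world]
UTILITY = {
    "believe":     {"exists": 1000.0,  "not_exists": -1.0},
    "not_believe": {"exists": -1000.0, "not_exists": 1.0},
}

def expected_reward(choice, p_exists=P_EXISTS):
    """Sum over worlds of P(world) * utility(choice, world)."""
    u = UTILITY[choice]
    return p_exists * u["exists"] + (1.0 - p_exists) * u["not_exists"]

def best_choice(p_exists=P_EXISTS):
    """Decision rule: pick the choice with maximum expected reward."""
    return max(UTILITY, key=lambda c: expected_reward(c, p_exists))

if __name__ == "__main__":
    for choice in UTILITY:
        print(choice, expected_reward(choice))
    print("best:", best_choice())
```

With these numbers the answer depends on the prior: try
best_choice(0.0005) to see the decision flip.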
---------------------------------------------------------------------

STRIPS planning = single player deterministic game
probabilistic planning = single player probabilistic game

Example: The Hungry Monkey

shake:  if (~ontable) 
             1/6 -> +1 banana
             5/6 -> no change
        if (ontable)
             2/3 -> +1 banana
             1/3 -> no change

jump:   if (~ontable)
	     2/3 -> ontable
	     1/3 -> ~ontable
        if (ontable)
             ontable


Problem: compute the expected reward of each of the following plans.
(If preconditions do not hold, performing action has no effect)

[1] shake
[2] shake; shake
[3] jump; shake
[4] jump; shake; shake
[5] jump; if (~ontable) { jump; shake}
          else { shake; shake }

Method I: Simulation!
Distribute dice
Collect statistics for each: 
	collect total number of bananas
	divide by number of people in class
	result is expected reward!	
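
The dice exercise can also be run on a computer. A rough sketch,
assuming the monkey starts on the floor (~ontable) with 0 bananas; the
notes do not state the start state, so that is an assumption, and the
function names are made up for this example.

```python
import random

# State is (on_table, bananas).
# ASSUMPTION: the monkey starts at (False, 0).

def shake(state, rng):
    on_table, bananas = state
    p = 2 / 3 if on_table else 1 / 6   # chance of +1 banana
    if rng.random() < p:
        bananas += 1
    return (on_table, bananas)

def jump(state, rng):
    on_table, bananas = state
    if not on_table and rng.random() < 2 / 3:
        on_table = True                # jump succeeded
    return (on_table, bananas)         # already on table: no change

def plan1(s, rng): return shake(s, rng)
def plan2(s, rng): return shake(shake(s, rng), rng)
def plan3(s, rng): return shake(jump(s, rng), rng)
def plan4(s, rng): return shake(shake(jump(s, rng), rng), rng)

def plan5(s, rng):
    s = jump(s, rng)
    if not s[0]:                       # still on the floor: try again
        return shake(jump(s, rng), rng)
    return shake(shake(s, rng), rng)

def simulate(plan, trials=50_000, seed=0):
    """Average banana count over many independent runs of the plan."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, bananas = plan((False, 0), rng)
        total += bananas
    return total / trials

if __name__ == "__main__":
    for name, plan in [("[1]", plan1), ("[2]", plan2), ("[3]", plan3),
                       ("[4]", plan4), ("[5]", plan5)]:
        print(name, round(simulate(plan), 3))
```

Each trial plays the role of one student; more trials give an average
closer to the true expected reward.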

Suppose each action has cost 1/4 of a banana
Now what is expected reward?
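
A sketch of the arithmetic with action costs, again assuming the monkey
starts on the floor (~ontable) and that every action in a plan is
charged 1/4 banana whether or not it succeeds. Both points are
assumptions; the names below are made up for illustration.

```python
from fractions import Fraction as F

COST = F(1, 4)           # cost per action, in bananas
P_JUMP = F(2, 3)         # jump succeeds from the floor
SHAKE_FLOOR = F(1, 6)    # expected bananas from one shake on the floor
SHAKE_TABLE = F(2, 3)    # expected bananas from one shake on the table

# Expected bananas from one shake after a single jump attempt.
shake_after_jump = P_JUMP * SHAKE_TABLE + (1 - P_JUMP) * SHAKE_FLOOR

# plan -> (expected bananas, number of actions performed)
PLANS = {
    "[1]": (SHAKE_FLOOR, 1),
    "[2]": (2 * SHAKE_FLOOR, 2),
    "[3]": (shake_after_jump, 2),
    "[4]": (2 * shake_after_jump, 3),
    # [5]: on the table -> two table shakes; else jump again, then shake
    "[5]": (P_JUMP * 2 * SHAKE_TABLE + (1 - P_JUMP) * shake_after_jump, 3),
}

def net_reward(plan):
    """Expected bananas minus total action cost."""
    bananas, n_actions = PLANS[plan]
    return bananas - COST * n_actions

if __name__ == "__main__":
    for plan in PLANS:
        print(plan, net_reward(plan))
```

Under these assumptions only plans [4] and [5] come out ahead.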

---------------------------------------------------------------------

Method II: Calculation!

Probability: function assigning any assertion (sentence) a number in
the range [0,1].
	- must satisfy axioms of probability	
		0<=P(a)<=1
		P(true)=1
		P(false)=0
		P(a v b) = P(a) + P(b) - P(a&b)
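
A quick check of the axioms on a fair six-sided die. The events "even"
and "high" are made-up examples, not from the notes.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair six-sided die

def P(event):
    """Probability that the predicate `event` holds for one roll."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), 6)

def even(o):
    return o % 2 == 0   # {2, 4, 6}

def high(o):
    return o >= 4       # {4, 5, 6}

p_or = P(lambda o: even(o) or high(o))    # {2, 4, 5, 6}
p_and = P(lambda o: even(o) and high(o))  # {4, 6}

if __name__ == "__main__":
    print(P(lambda o: True), P(lambda o: False))     # P(true), P(false)
    print(p_or, "=", P(even) + P(high) - p_and)      # inclusion-exclusion
```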

What is it?
	- a degree of belief
	- a summary of historical statistics
	- a property of an object (e.g. a die)

RESULT(action) = random variable ranging over possible worlds

Utility: assigns each possible world a real number (reward)

Expected Utility of an action = SUM P(RESULT(a)=s) U(s)
                                 s
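
The sum can be computed exactly by pushing a probability distribution
through each action. A sketch for the hungry monkey, assuming the start
world is (~ontable, 0 bananas) and utility = number of bananas; both are
assumptions, and only unconditional plans ([1] to [4]) are handled here.

```python
from collections import defaultdict
from fractions import Fraction as F

# A world is (on_table, bananas). Each action maps a world to a list of
# (probability, next_world) pairs.

def shake(state):
    on_table, bananas = state
    p = F(2, 3) if on_table else F(1, 6)
    return [(p, (on_table, bananas + 1)), (1 - p, state)]

def jump(state):
    on_table, bananas = state
    if on_table:
        return [(F(1), state)]
    return [(F(2, 3), (True, bananas)), (F(1, 3), state)]

def result(actions, start=(False, 0)):
    """Distribution P(RESULT(a) = s) over final worlds s."""
    dist = {start: F(1)}
    for act in actions:
        nxt = defaultdict(F)               # missing worlds start at 0
        for s, p in dist.items():
            for q, s2 in act(s):
                nxt[s2] += p * q
        dist = dict(nxt)
    return dist

def utility(state):
    return state[1]                        # ASSUMPTION: reward = bananas

def expected_utility(actions):
    """SUM over s of P(RESULT(a) = s) * U(s)."""
    return sum(p * utility(s) for s, p in result(actions).items())

if __name__ == "__main__":
    print(expected_utility([shake]))
    print(expected_utility([shake, shake]))
    print(expected_utility([jump, shake]))
    print(expected_utility([jump, shake, shake]))
```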

Draw game trees!


---------------------------------------------------------------------
Making decisions: ExpectiMax

ExpectiMax(n) = U(n) if n is a terminal state

                max { ExpectiMax(s) | s in successors(n) } if n is a max node

		SUM P(s) ExpectiMax(s) if n is a chance node
                 s
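
A minimal sketch of the recursion over an explicit tree. The tuple
encoding of nodes ("leaf", "max", "chance") is an assumption made for
this example, and the sample tree assumes the monkey starts on the
floor and chooses between shaking now and jumping first.

```python
from fractions import Fraction as F

# Node encodings (hypothetical, for illustration only):
#   ("leaf", utility)
#   ("max", [child, ...])
#   ("chance", [(probability, child), ...])

def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectimax(c) for c in node[1])
    if kind == "chance":
        return sum(p * expectimax(c) for p, c in node[1])
    raise ValueError("unknown node kind: %r" % (kind,))

# One shake on the floor, and one shake on the table.
shake_floor = ("chance", [(F(1, 6), ("leaf", 1)), (F(5, 6), ("leaf", 0))])
shake_table = ("chance", [(F(2, 3), ("leaf", 1)), (F(1, 3), ("leaf", 0))])

# Jump first, then shake from wherever the monkey ends up.
jump_then_shake = ("chance", [(F(2, 3), shake_table),
                              (F(1, 3), shake_floor)])

root = ("max", [shake_floor, jump_then_shake])

if __name__ == "__main__":
    print(expectimax(root))
```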


---------------------------------------------------------------------
Two player games of chance


ExpectiMiniMax(n) = U(n) if n is a terminal state

                max { ExpectiMiniMax(s) | s in successors(n) } if n is a max node

                min { ExpectiMiniMax(s) | s in successors(n) } if n is a min node

		SUM P(s) ExpectiMiniMax(s) if n is a chance node
                 s
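
A small sketch of the two-player recursion. The tuple encoding of nodes
and the sample tree (MAX moves, a fair coin is flipped, MIN replies) are
made up for illustration.

```python
# Node encodings (hypothetical, for illustration only):
#   ("leaf", utility)
#   ("max", [child, ...])      MAX player to move
#   ("min", [child, ...])      MIN player to move
#   ("chance", [(probability, child), ...])

def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in node[1])
    raise ValueError("unknown node kind: %r" % (kind,))

def leaf(u):
    return ("leaf", u)

# MAX picks move A or B, then a fair coin decides which MIN node is reached.
move_a = ("chance", [(0.5, ("min", [leaf(3), leaf(5)])),
                     (0.5, ("min", [leaf(1), leaf(9)]))])
move_b = ("chance", [(0.5, ("min", [leaf(4), leaf(6)])),
                     (0.5, ("min", [leaf(2), leaf(2)]))])
root = ("max", [move_a, move_b])

if __name__ == "__main__":
    print(expectiminimax(move_a), expectiminimax(move_b),
          expectiminimax(root))
```

Move A is worth 0.5*3 + 0.5*1 = 2 and move B is worth 0.5*4 + 0.5*2 = 3,
so MAX prefers B.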

Backgammon

Appears simpler - but branching factor of 21 (distinct dice rolls) at
each chance node!
Number of MAX choices is around 20.
Size of tree: O(c^k m^k), where c = branching factor at chance nodes,
m = number of move choices, k = search depth
Result: can only search 3 plies, must have a really good static
evaluation function.
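
To see why the search stalls so quickly, plug the backgammon numbers
into the tree-size estimate. This sketch assumes each level of depth
contributes one chance node (21 rolls) and one move choice (about 20);
the function name is made up for this example.

```python
CHANCE_BRANCHING = 21   # distinct dice rolls
MOVE_BRANCHING = 20     # rough number of legal moves per roll

def tree_size(c, m, k):
    """Number of leaves in a tree with c^k * m^k branching to depth k."""
    return (c * m) ** k

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, tree_size(CHANCE_BRANCHING, MOVE_BRANCHING, k))
```

At depth 3 this is already 74,088,000 leaves, and depth 4 is about
31 billion.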

Neurogammon (trained on expert games) and TD-Gammon (learned its
evaluation function by self-play).
  (Tesauro 1995)
See slides from yesterday about Othello!
Use results of games to optimize weights:
	- "reward" features that were "on" in winning games
	- "punish" features that were "on" in losing games
Reinforcement learning.  
Became best player, human or machine!
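
A toy sketch of the reward/punish idea for a linear evaluation function
with on/off features. This is a deliberately simplified illustration,
not the training procedure TD-Gammon actually used; the function names
and the learning rate are assumptions.

```python
def evaluate(weights, features):
    """Linear static evaluation: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))

def update_weights(weights, positions, won, lr=0.5):
    """Nudge weights after one finished game.

    positions: feature vectors (0/1 entries) seen during the game.
    won: True rewards the features that were on, False punishes them.
    """
    sign = 1.0 if won else -1.0
    new_weights = list(weights)
    for features in positions:
        for i, f in enumerate(features):
            new_weights[i] += lr * sign * f
    return new_weights

if __name__ == "__main__":
    w = [0.0, 0.0, 0.0]
    game = [[1, 0, 1], [1, 1, 0]]      # two positions from one game
    w = update_weights(w, game, won=True)
    print(w, evaluate(w, [1, 0, 0]))
```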

---------------------------------------------------------------------
Book: Computers Challenging Experts