CSE 473 - Artificial Intelligence - Autumn 2011

Written Assignment 2: MDPs, RL, and Markov Chains

Due Date: Due Wed Nov 23, 9:30am in class or through the online dropbox.

Problem 1: Simple Blackjack [30 points]

Consider a simplification of the card game Blackjack. There is an infinite deck of cards. Each card can be a 1, 2, or 3 (distributed in equal proportions). Each turn, the player can choose to draw another card or stop. At the end of the game, the player's score is equal to the sum of the numbers on all of the cards that have been flipped, if this sum is less than 5. If the sum is 5 or greater, the player receives a score of 0.

A. [10pts] Formulate the Simple Blackjack problem as an MDP.

B. [10pts] Do three rounds of value iteration, showing all of your work. Be sure to specify the settings for all relevant parameters (any reasonable values are acceptable).

C. [10pts] Simulate Q-learning for 10 steps. Be sure to specify the settings for all relevant parameters (any reasonable values are acceptable). Any time you would need to sample from a distribution, write the result you picked and label it with the associated probability.

Problem 2: Value Iteration vs. Policy Iteration [10 points]

A. [5pts] Briefly, in your own words, describe the difference between value iteration (VI) and policy iteration (PI).

B. [5pts] In general, would you prefer to use VI or PI to solve an MDP. Describe cases, if any, where one should be preferred to the other. Consider both the per-iteration complexity as well as the number of iterations required to find a good policy.

Problem 3: Markov Chains [20 points]

Consider the following process. You can be employed or unemployed. At each time step, you have a 1% chance of loosing your job and a 15% chance of moving from unemployment to employment (finding a new job!). Unless otherwise specified, assume that you have a 50% chance of being employed at time 1.

A. [5pts] Model this employment process as a Markov chain.

B. [5pts] If you start out employed, what is the probability of still being employed after three time steps? Show all of your work.

B. [5pts] If you are unemployed at time 3, what is the probability that you started out employed at time 1?

C. [5pts] What is the stationary distribution of the chain you defined?