Due Date: Wed Nov 23, 9:30am, in class or through the online dropbox.
A. [10pts] Formulate the Simple Blackjack problem as an MDP.
B. [10pts] Do three rounds of value iteration, showing all of your work. Be sure to specify the settings for all relevant parameters (any reasonable values are acceptable).
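As a reminder of what one round involves, here is a minimal sketch of synchronous value iteration. The two-state MDP, its action names, its rewards, and the discount factor `GAMMA` are hypothetical placeholders chosen for illustration; they are not the Simple Blackjack problem.

```python
# Hypothetical 2-state, 2-action MDP (NOT the Simple Blackjack problem).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.5, 0, 0.0), (0.5, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 0, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed value)


def value_iteration_round(V):
    """One synchronous Bellman backup: V'(s) = max_a sum_s' p * (r + GAMMA * V(s'))."""
    return {
        s: max(
            sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in P.items()
    }


def run_value_iteration(rounds=3):
    """Start from V = 0 everywhere and record the value table after each round."""
    V = {s: 0.0 for s in P}
    history = [V]
    for _ in range(rounds):
        V = value_iteration_round(V)
        history.append(V)
    return history


history = run_value_iteration(rounds=3)
```

Each entry of `history` is the value table after that many rounds, which mirrors the "show all of your work" layout asked for above.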
C. [10pts] Simulate Q-learning for 10 steps. Be sure to specify the settings for all relevant parameters (any reasonable values are acceptable). Any time you would need to sample from a distribution, write the result you picked and label it with the associated probability.
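For reference, a single tabular Q-learning step can be sketched as follows. The learning rate `ALPHA`, the discount `GAMMA`, and the state and action labels are assumed values for illustration only, not part of the assignment.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(Q, s, a, r, s_next, next_actions):
    """Apply Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a)).

    An empty next_actions list marks s_next as terminal, so the
    bootstrap term is taken to be 0.
    """
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical transitions: every Q-value starts at 0.
Q = defaultdict(float)
first = q_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"])  # non-terminal next state
second = q_update(Q, "s1", "a1", 2.0, "end", [])          # terminal next state
third = q_update(Q, "s0", "a0", 0.0, "s1", ["a0", "a1"])  # bootstraps from Q(s1, a1)
```

In a hand simulation, each of the 10 steps would record the sampled transition (with its probability), the reward, and the updated table entry in the same way.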
B. [5pts] In general, would you prefer to use value iteration (VI) or policy iteration (PI) to solve an MDP? Describe cases, if any, where one should be preferred to the other. Consider both the per-iteration complexity and the number of iterations required to find a good policy.
A. [5pts] Model this employment process as a Markov chain.
B. [5pts] If you start out employed, what is the probability of still being employed after three time steps? Show all of your work.
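The n-step probabilities of a Markov chain are the entries of the n-th power of its transition matrix. The sketch below illustrates this on a two-state chain; the transition probabilities are hypothetical placeholders, not the values from the employment process.

```python
# Hypothetical transition matrix (placeholder numbers, not the assignment's).
# Row = current state, column = next state.
# State 0 = employed, state 1 = unemployed.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def n_step(P, n):
    """Return P raised to the n-th power, starting from the identity matrix."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


P3 = n_step(P, 3)
prob_still_employed = P3[0][0]  # employed now, employed three steps later
```

Writing out the intermediate products by hand (P, then P squared, then P cubed) is one way to show the work.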
B. [5pts] If you are unemployed at time 3, what is the probability that you started out employed at time 1?
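Reasoning backward in time calls for Bayes' rule. The identity below is a sketch using assumed notation: $X_t$ is the state at time $t$, $E$ and $U$ stand for employed and unemployed, and $P$ is the transition matrix. Note that it needs a prior distribution over $X_1$, which has to be stated as an assumption if the problem does not supply one.

```latex
\Pr(X_1 = E \mid X_3 = U)
  = \frac{\Pr(X_3 = U \mid X_1 = E)\,\Pr(X_1 = E)}
         {\sum_{x \in \{E,\,U\}} \Pr(X_3 = U \mid X_1 = x)\,\Pr(X_1 = x)},
\qquad
\Pr(X_3 = U \mid X_1 = x) = \left(P^{2}\right)_{x,\,U}
```

Time 1 to time 3 spans two transitions, which is why the likelihood term uses the second power of $P$.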
C. [5pts] What is the stationary distribution of the chain you defined?
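A stationary distribution pi satisfies pi = pi P with entries summing to 1. One way to approximate it numerically is power iteration, sketched below on a hypothetical two-state chain (placeholder probabilities, not the assignment's). This approach assumes the chain is irreducible and aperiodic so that the iteration converges.

```python
# Hypothetical transition matrix (placeholder numbers, not the assignment's).
# State 0 = employed, state 1 = unemployed.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def stationary_distribution(P, iters=1000):
    """Repeatedly apply pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


pi = stationary_distribution(P)
```

For a two-state chain the result can be checked by hand by solving pi = pi P together with the normalization constraint.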