Assignment 6 for CSE 415 (Spring 2016)

Assignment 6: Bayes' Rule, Markov Decision Processes, and Reinforcement Learning

CSE 415: Introduction to Artificial Intelligence
The University of Washington, Seattle, Spring 2016

The reading on Bayes' rule is Section 7.2.3 of Probabilistic Reasoning. The reading for the MDP and Reinforcement Learning part of this assignment is Chapter 3 of Sutton and Barto (see the Readings webpage).

Due Wednesday, May 18 at 11:59 PM via Catalyst CollectIt.

Part I. Written Answers (40 points).

"Fishing at the Bay of Bayes" (15 points)
Every year, when the fishing season opens, anglers converge on the Bay of Bayes to try their luck. Some of the more scientifically minded fishing enthusiasts have compiled the following statistics. Out of 100 observations of individual fishermen fishing during 10-minute periods in the opening week of the season, one or more fish were caught in 20 of these attempts. In 5 of the 20 cases in which fish were caught, it had just stopped raining. In just 2 of the 80 cases in which fish were not caught, it had just stopped raining.
1. Determine the prior probability of catching a fish during a 10-minute attempt.
2. Determine the conditional probability that it has just stopped raining given that one or more fish were caught during a single attempt.
3. Determine the conditional probability that it has just stopped raining given that no fish were caught.
4. Determine the probability of one fisherman catching one or more fish during a single 10-minute period given that it has just stopped raining (“sr”), using Bayes’ rule.
5. Determine the joint probability distribution for these two random variables, F: {fish, none} and W: {stopped-raining, other}.
6. Write down the marginal distributions for each of F and W. Then compute the product distribution for the two marginals. (Use a calculator.) Compare the tables for P(F, W) and P(F) P(W) and comment on the possible independence of F and W.
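
For checking hand-computed answers to the questions above, here is a minimal Python sketch. It assumes the four counts in the problem statement are the only data, and it uses exact fractions so that the comparison of P(F, W) with P(F) P(W) is not affected by rounding. The variable names are illustrative only and are not part of the assignment.

```python
from fractions import Fraction

# Counts taken from the problem statement.
total = 100          # observed 10-minute attempts
fish = 20            # attempts in which one or more fish were caught
fish_sr = 5          # fish caught, and it had just stopped raining
none_sr = 2          # no fish caught, and it had just stopped raining
none = total - fish  # 80 attempts in which no fish were caught

# Prior and likelihoods as exact fractions.
p_fish = Fraction(fish, total)
p_none = 1 - p_fish
p_sr_given_fish = Fraction(fish_sr, fish)
p_sr_given_none = Fraction(none_sr, none)

# Bayes' rule: P(fish | sr) = P(sr | fish) P(fish) / P(sr),
# where P(sr) comes from the law of total probability.
p_sr = p_sr_given_fish * p_fish + p_sr_given_none * p_none
p_fish_given_sr = p_sr_given_fish * p_fish / p_sr

# Joint distribution P(F, W), read directly from the counts.
joint = {
    ("fish", "stopped-raining"): Fraction(fish_sr, total),
    ("fish", "other"): Fraction(fish - fish_sr, total),
    ("none", "stopped-raining"): Fraction(none_sr, total),
    ("none", "other"): Fraction(none - none_sr, total),
}

# Marginals, obtained by summing out the other variable.
p_f = {f: sum(p for (fi, _), p in joint.items() if fi == f)
       for f in ("fish", "none")}
p_w = {w: sum(p for (_, wi), p in joint.items() if wi == w)
       for w in ("stopped-raining", "other")}

# Product of the marginals. F and W are independent only if
# every entry of this table equals the matching joint entry.
product = {(f, w): p_f[f] * p_w[w] for f in p_f for w in p_w}
independent = all(joint[k] == product[k] for k in joint)

print("P(fish | sr) =", float(p_fish_given_sr))
for k in joint:
    print(k, "joint:", float(joint[k]), "product:", float(product[k]))
print("independent:", independent)
```

The script prints the joint and product tables side by side, which is the comparison that question 6 asks for.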

"The Mecha-Mouse at the Hostel for Travelling Droids" (25 points)

The Hostel for Travelling Droids has four rooms: Dormitory (D), Lavatory (L), Pantry (P), and Mess Hall (M). There is a mechanical mouse ("Mecha-mouse") that inhabits the hostel, typically looking for a meal. The mouse has three actions: X (exit current room), Y (alternative action), and Z (remain as is). There is some danger that the "Compu-Cat" will ambush the mouse at any time, putting it in the Ambushed state, from which it can only go to the dead-end Kaput state. The activities in this hostel are governed by a Markov Decision Process with the following dynamics. Each row gives the probability of moving from state s under action a to each of the states named in the column headings.
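| s, a         | Dormitory | Lavatory | Pantry | Mess Hall | Ambushed | Kaput |
|--------------|-----------|----------|--------|-----------|----------|-------|
| Dormitory, X | 0         | 0.4      | 0      | 0.6       | 0        | 0     |
| Dormitory, Y | 0         | 0.6      | 0      | 0.4       | 0        | 0     |
| Dormitory, Z | 0.75      | 0        | 0      | 0         | 0.25     | 0     |
| Lavatory, X  | 0.4       | 0        | 0.6    | 0         | 0        | 0     |
| Lavatory, Y  | 0.6       | 0        | 0.4    | 0         | 0        | 0     |
| Lavatory, Z  | 0         | 0.75     | 0      | 0         | 0.25     | 0     |
| Pantry, X    | 0         | 0.6      | 0      | 0.4       | 0        | 0     |
| Pantry, Y    | 0         | 0.4      | 0      | 0.6       | 0        | 0     |
| Pantry, Z    | 0         | 0        | 0.75   | 0         | 0.25     | 0     |
| Mess Hall, X | 0.4       | 0        | 0.6    | 0         | 0        | 0     |
| Mess Hall, Y | 0.6       | 0        | 0.4    | 0         | 0        | 0     |
| Mess Hall, Z | 0         | 0        | 0      | 0.75      | 0.25     | 0     |
| Ambushed, *  | 0         | 0        | 0      | 0         | 0        | 1.0   |
| Kaput, *     | 0         | 0        | 0      | 0         | 0        | 1.0   |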
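
Before working with the table by hand, it can help to confirm that every row is a valid probability distribution. The following is a minimal Python sketch of such a check. The names `STATES`, `T`, and `bad_rows` are illustrative, not part of the assignment, and the Dormitory, Z row is entered in what is assumed to be its corrected form (0.75 for staying in the Dormitory, 0.25 for Ambushed).

```python
# Column order of the transition table.
STATES = ["Dormitory", "Lavatory", "Pantry", "Mess Hall", "Ambushed", "Kaput"]

# T[(s, a)] is the row of probabilities P(s' | s, a), in STATES order.
# Assumption: the Dormitory, Z row uses the corrected values.
T = {
    ("Dormitory", "X"): [0, 0.4, 0, 0.6, 0, 0],
    ("Dormitory", "Y"): [0, 0.6, 0, 0.4, 0, 0],
    ("Dormitory", "Z"): [0.75, 0, 0, 0, 0.25, 0],
    ("Lavatory", "X"): [0.4, 0, 0.6, 0, 0, 0],
    ("Lavatory", "Y"): [0.6, 0, 0.4, 0, 0, 0],
    ("Lavatory", "Z"): [0, 0.75, 0, 0, 0.25, 0],
    ("Pantry", "X"): [0, 0.6, 0, 0.4, 0, 0],
    ("Pantry", "Y"): [0, 0.4, 0, 0.6, 0, 0],
    ("Pantry", "Z"): [0, 0, 0.75, 0, 0.25, 0],
    ("Mess Hall", "X"): [0.4, 0, 0.6, 0, 0, 0],
    ("Mess Hall", "Y"): [0.6, 0, 0.4, 0, 0, 0],
    ("Mess Hall", "Z"): [0, 0, 0, 0.75, 0.25, 0],
    ("Ambushed", "*"): [0, 0, 0, 0, 0, 1.0],
    ("Kaput", "*"): [0, 0, 0, 0, 0, 1.0],
}


def bad_rows(table, tol=1e-9):
    """Return the (state, action) keys whose row is not a distribution."""
    return [
        key for key, row in table.items()
        if len(row) != len(STATES)
        or any(p < 0 for p in row)
        or abs(sum(row) - 1.0) > tol
    ]


print("rows that fail the check:", bad_rows(T))
```

An empty list means every row has six non-negative entries that sum to 1.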

The reward here depends only on the current state s.
| s         | R(s) |
|-----------|------|
| Dormitory | 0    |
| Lavatory  | 4    |
| Pantry    | 10   |
| Mess Hall | 2    |
| Ambushed  | -50  |
| Kaput     | 0    |

1. Give the number of different policies that are possible for Mecha-mouse in the hostel.
2. Manually apply the value iteration method to this problem for six iterations. Show the value at each state in each iteration. Assume that the discount factor is 0.5.
3. Based on your analysis, give the optimal policy as an action for each state.
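
To check a manual value iteration against a program, a minimal Python sketch follows. It rests on several assumptions that the assignment does not state: all values start at 0, the update is V_{k+1}(s) = R(s) + gamma * max_a sum_{s'} T(s, a, s') V_k(s'), the Dormitory, Z row takes its corrected values (0.75, 0, 0, 0, 0.25, 0), and ties between actions go to the first action in the list. If the course uses a different form of the Bellman update, adjust `q_value` accordingly. All names here are illustrative.

```python
STATES = ["Dormitory", "Lavatory", "Pantry", "Mess Hall", "Ambushed", "Kaput"]
ACTIONS = ["X", "Y", "Z"]
GAMMA = 0.5  # discount factor given in the assignment

# Reward depends only on the current state.
R = {"Dormitory": 0, "Lavatory": 4, "Pantry": 10,
     "Mess Hall": 2, "Ambushed": -50, "Kaput": 0}

# T[s][a] is the row P(s' | s, a) in STATES order.
# Assumption: Dormitory, Z uses the corrected row.
T = {
    "Dormitory": {"X": [0, 0.4, 0, 0.6, 0, 0],
                  "Y": [0, 0.6, 0, 0.4, 0, 0],
                  "Z": [0.75, 0, 0, 0, 0.25, 0]},
    "Lavatory": {"X": [0.4, 0, 0.6, 0, 0, 0],
                 "Y": [0.6, 0, 0.4, 0, 0, 0],
                 "Z": [0, 0.75, 0, 0, 0.25, 0]},
    "Pantry": {"X": [0, 0.6, 0, 0.4, 0, 0],
               "Y": [0, 0.4, 0, 0.6, 0, 0],
               "Z": [0, 0, 0.75, 0, 0.25, 0]},
    "Mess Hall": {"X": [0.4, 0, 0.6, 0, 0, 0],
                  "Y": [0.6, 0, 0.4, 0, 0, 0],
                  "Z": [0, 0, 0, 0.75, 0.25, 0]},
}
# Ambushed and Kaput lead to Kaput under every action (the "*" rows).
for s in ("Ambushed", "Kaput"):
    T[s] = {a: [0, 0, 0, 0, 0, 1.0] for a in ACTIONS}


def q_value(s, a, V):
    """One-step lookahead value of taking action a in state s."""
    return R[s] + GAMMA * sum(p * V[s2] for p, s2 in zip(T[s][a], STATES))


def value_iteration(n_iters):
    """Return the list [V_0, V_1, ..., V_n] of value tables."""
    V = {s: 0.0 for s in STATES}
    history = [V]
    for _ in range(n_iters):
        V = {s: max(q_value(s, a, V) for a in ACTIONS) for s in STATES}
        history.append(V)
    return history


def greedy_policy(V):
    """Pick, for each state, the action with the largest lookahead value."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}


history = value_iteration(6)
for k, V in enumerate(history):
    print(k, [round(V[s], 4) for s in STATES])
print(greedy_policy(history[-1]))
```

Each printed row lists the values in the order Dormitory, Lavatory, Pantry, Mess Hall, Ambushed, Kaput, so it can be compared line by line with a hand-built table. For Ambushed and Kaput every action has the same value, so the action reported for those two states is arbitrary.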

Part II. 60 Points. (See this separate page.)

Updates and Corrections

Please note the corrected probability values in row 3 of the transition table for Problem 2 (these changes were made on May 13 at 11:57 PM). If necessary, further updates and corrections will be posted here and/or mentioned in class or on GoPost.

Feedback Survey

After submitting your solution, please answer this survey.