Part I. Written Answers (40 points).
- "Fishing at the Bay of Bayes" (15 points)
Every year, when the fishing season opens, anglers converge at Bay of Bayes to try their luck.
Some of the more scientifically-minded fishing enthusiasts have found the following statistics.
Out of 100 observations of individual fishermen fishing during 10-minute periods in the opening
week of the season, one or more fish were caught in 20 of these attempts.
In 5 of the 20 cases in which fish were caught, it had just stopped raining. In just 2 of the 80 cases in
which fish were not caught, it had just stopped raining.
1. Determine the prior probability of catching a fish during a 10-minute attempt.
2. Determine the conditional probability that it has just stopped raining given that one or more fish were caught
during a single attempt.
3. Determine the conditional probability that it has just stopped raining given that no fish were caught.
4. Determine the probability of one fisherman catching one or more fish during a single 10-minute period given
that it has just stopped raining (“sr”), using Bayes’ rule.
5. Determine the joint probability distribution for these two random variables:
F: {fish, none} and W: {stopped-raining, other}.
6. Write down the marginal distributions for each of F and W. Then compute the product distribution for the two
marginals. (Use a calculator.)
Compare the tables for P(F, W) and P(F) P(W) and comment on the possible independence of F and W.
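As a way of checking hand calculations for questions 1 to 6, the counts given above can be tabulated mechanically. What follows is a minimal Python sketch and not part of the original problem: the names `counts`, `joint`, `p_f`, `p_w`, `product`, and `bayes_posterior` are illustrative, and the two "other" counts (15 and 78) are derived by subtraction from the stated totals. Exact fractions are used so that rounding cannot hide a difference between P(F, W) and P(F) P(W).

```python
from fractions import Fraction

# Counts from the problem statement: 100 observed 10-minute attempts.
# The "other" entries are derived: 20 - 5 = 15 and 80 - 2 = 78.
counts = {
    ("fish", "stopped-raining"): 5,
    ("fish", "other"): 15,
    ("none", "stopped-raining"): 2,
    ("none", "other"): 78,
}
total = sum(counts.values())

# Joint distribution P(F, W) as exact fractions.
joint = {key: Fraction(n, total) for key, n in counts.items()}

# Marginals P(F) and P(W), obtained by summing out the other variable.
p_f, p_w = {}, {}
for (f, w), p in joint.items():
    p_f[f] = p_f.get(f, 0) + p
    p_w[w] = p_w.get(w, 0) + p

# Product distribution P(F) P(W), for comparison against the joint table.
product = {(f, w): p_f[f] * p_w[w] for f in p_f for w in p_w}


def bayes_posterior(f, w):
    """P(F = f | W = w) via Bayes' rule: P(w | f) P(f) / P(w)."""
    likelihood = joint[(f, w)] / p_f[f]  # P(w | f)
    return likelihood * p_f[f] / p_w[w]


if __name__ == "__main__":
    for key in joint:
        print(key, "joint =", joint[key], "product =", product[key])
    print("P(fish | sr) =", bayes_posterior("fish", "stopped-raining"))
```

If F and W were independent, every entry of `joint` would equal the matching entry of `product`; comparing the two dictionaries entry by entry is one way to support the comment asked for in the last question.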
- "The Mecha-Mouse at the Hostel for Travelling Droids" (25 points)
The Hostel for Travelling Droids has four rooms: Dormitory (D), Lavatory (L), Pantry (P), and Mess Hall (M).
There is a mechanical mouse ("Mecha-mouse") that inhabits the hostel, typically looking for a meal.
The mouse has three actions: (X: exit current room; Y: alternative action; Z: remain as is).
There is some danger that the "Compu-Cat" will ambush the mouse at any time, putting it in the Ambushed state, from which it can only go to the dead-end Kaput state.
The activities in this hostel are governed by a Markov Decision Process with the following
dynamics.
s, a | Dormitory | Lavatory | Pantry | Mess Hall | Ambushed | Kaput |
Dormitory, X | 0 | 0.4 | 0 | 0.6 | 0 | 0 |
Dormitory, Y | 0 | 0.6 | 0 | 0.4 | 0 | 0 |
Dormitory, Z | 0.75 | 0 | 0 | 0 | 0.25 | 0 |
Lavatory, X | 0.4 | 0 | 0.6 | 0 | 0 | 0 |
Lavatory, Y | 0.6 | 0 | 0.4 | 0 | 0 | 0 |
Lavatory, Z | 0 | 0.75 | 0 | 0 | 0.25 | 0 |
Pantry, X | 0 | 0.6 | 0 | 0.4 | 0 | 0 |
Pantry, Y | 0 | 0.4 | 0 | 0.6 | 0 | 0 |
Pantry, Z | 0 | 0 | 0.75 | 0 | 0.25 | 0 |
Mess Hall, X | 0.4 | 0 | 0.6 | 0 | 0 | 0 |
Mess Hall, Y | 0.6 | 0 | 0.4 | 0 | 0 | 0 |
Mess Hall, Z | 0 | 0 | 0 | 0.75 | 0.25 | 0 |
Ambushed, * | 0 | 0 | 0 | 0 | 0 | 1.0 |
Kaput, * | 0 | 0 | 0 | 0 | 0 | 1.0 |
The reward here depends only on the current state s.
s | R(s) |
Dormitory | 0 |
Lavatory | 4 |
Pantry | 10 |
Mess Hall | 2 |
Ambushed | -50 |
Kaput | 0 |
- Give the number of different policies that are possible for Mecha-mouse in the hostel.
- Manually apply the value iteration method to this problem for six iterations.
Show the value at each state in each iteration. Assume that the discount factor is 0.5.
- Based on your analysis, give the optimal policy as an action for each state.
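A hand-computed table of values can be cross-checked with a short program. The sketch below is an illustration under stated assumptions, not a prescribed solution method: it assumes all values start at 0 and uses the update V_{k+1}(s) = R(s) + gamma * max_a sum_{s'} T(s, a, s') V_k(s'), which matches a reward that depends only on the current state. Other textbook conventions (for example, counting the all-zero table as the first iteration) shift the numbers by one step. The names `T`, `q_value`, `value_iteration`, and `greedy_policy` are hypothetical, and the "*" rows for Ambushed and Kaput are expanded to all three actions.

```python
GAMMA = 0.5  # discount factor given in the problem
STATES = ["Dormitory", "Lavatory", "Pantry", "Mess Hall", "Ambushed", "Kaput"]
ACTIONS = ["X", "Y", "Z"]
REWARD = {"Dormitory": 0, "Lavatory": 4, "Pantry": 10,
          "Mess Hall": 2, "Ambushed": -50, "Kaput": 0}

# Sparse transition model T[(s, a)] = {s': probability}; zero entries omitted.
T = {
    ("Dormitory", "X"): {"Lavatory": 0.4, "Mess Hall": 0.6},
    ("Dormitory", "Y"): {"Lavatory": 0.6, "Mess Hall": 0.4},
    ("Dormitory", "Z"): {"Dormitory": 0.75, "Ambushed": 0.25},
    ("Lavatory", "X"): {"Dormitory": 0.4, "Pantry": 0.6},
    ("Lavatory", "Y"): {"Dormitory": 0.6, "Pantry": 0.4},
    ("Lavatory", "Z"): {"Lavatory": 0.75, "Ambushed": 0.25},
    ("Pantry", "X"): {"Lavatory": 0.6, "Mess Hall": 0.4},
    ("Pantry", "Y"): {"Lavatory": 0.4, "Mess Hall": 0.6},
    ("Pantry", "Z"): {"Pantry": 0.75, "Ambushed": 0.25},
    ("Mess Hall", "X"): {"Dormitory": 0.4, "Pantry": 0.6},
    ("Mess Hall", "Y"): {"Dormitory": 0.6, "Pantry": 0.4},
    ("Mess Hall", "Z"): {"Mess Hall": 0.75, "Ambushed": 0.25},
}
# The "*" rows: every action leads to Kaput with probability 1.
for a in ACTIONS:
    T[("Ambushed", a)] = {"Kaput": 1.0}
    T[("Kaput", a)] = {"Kaput": 1.0}


def q_value(s, a, v):
    """R(s) plus the discounted expected value of the next state under action a."""
    return REWARD[s] + GAMMA * sum(p * v[s2] for s2, p in T[(s, a)].items())


def value_iteration(n_iters):
    """Return [V_0, V_1, ..., V_n], with V_0 = 0 everywhere (an assumption)."""
    v = {s: 0.0 for s in STATES}
    history = [v]
    for _ in range(n_iters):
        v = {s: max(q_value(s, a, v) for a in ACTIONS) for s in STATES}
        history.append(v)
    return history


def greedy_policy(v):
    """Pick, for each state, an action maximizing the one-step lookahead on v."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, v)) for s in STATES}


if __name__ == "__main__":
    history = value_iteration(6)
    for k, v in enumerate(history):
        print(k, {s: round(x, 4) for s, x in v.items()})
    print(greedy_policy(history[-1]))
```

For Ambushed and Kaput all three actions tie, so `greedy_policy` simply reports the first action in `ACTIONS` there; the choice in those two states does not affect the values.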