Due Monday, November 8
Four generations of laboratory mice have been bred as follows:
Initial generation: Alice, Bob, Cindy, Dave, Ellen
Second generation:
- Fred has parents Alice and Bob
- Gwen has parents Cindy and Bob
- Henry has parents Cindy and Dave
- Iona has parents Ellen and Bob
Third generation:
- John has parents Gwen and Fred
- Katherine has parents Henry and Iona
Fourth generation:
- Louis has parents Katherine and John
The mouse Louis suffers from a life-threatening disease and dies soon after birth. The disease is hereditary and carried by a recessive gene, meaning that both parents have to carry the gene for the offspring to be diseased. Since the disease is recessive, the parents need only be carriers of the disease; they do not have to be diseased themselves. A completely healthy animal has genotype pp (pure), a carrier is pd or dp (but hereafter written only as pd), and a diseased animal is dd.
We solve the following problem using Bayesian inference: Which members of the ancestry of Louis are likely to be carriers of the disease?
According to the laws of genetics (and basic probability theory), each combination of parent genotypes gives a probability triple (P(dd), P(pd), P(pp)) for the genotype of the descendant, as follows (rows are one parent's genotype, columns the other's):
| | dd | pd | pp |
|---|---|---|---|
| dd | (1, 0, 0) | (0.5, 0.5, 0) | (0, 1, 0) |
| pd | (0.5, 0.5, 0) | (0.25, 0.5, 0.25) | (0, 0.5, 0.5) |
| pp | (0, 1, 0) | (0, 0.5, 0.5) | (0, 0, 1) |
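As a sanity check on the table, the triples can be rederived by letting each parent pass on one of its two alleles with equal probability. The following is a minimal sketch, not part of the assignment; the names `ALLELES` and `offspring_distribution` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Alleles carried by each genotype; "pd" stands for both pd and dp.
ALLELES = {"dd": ("d", "d"), "pd": ("p", "d"), "pp": ("p", "p")}

def offspring_distribution(parent1, parent2):
    """Return (P(dd), P(pd), P(pp)) for a child of the two given genotypes."""
    counts = {"dd": 0, "pd": 0, "pp": 0}
    # Each parent contributes one allele; the four pairings are equally likely.
    for a, b in product(ALLELES[parent1], ALLELES[parent2]):
        n_d = (a + b).count("d")
        counts[{2: "dd", 1: "pd", 0: "pp"}[n_d]] += 1
    return tuple(Fraction(counts[g], 4) for g in ("dd", "pd", "pp"))
```

For example, `offspring_distribution("pd", "pd")` evaluates to the fractions 1/4, 1/2, 1/4, which is the center entry of the table.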
You can make the following assumptions: None of the mice other than Louis are actually diseased, otherwise they would not have lived long enough to procreate. This means that you can reduce the probability table above by eliminating the row and column marked dd. Therefore the conditional probability table P(x|parents(x)) for Louis is:
| | pd | pp |
|---|---|---|
| pd | (0.25, 0.5, 0.25) | (0, 0.5, 0.5) |
| pp | (0, 0.5, 0.5) | (0, 0, 1) |
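For all the other mice you can make an additional simplifying assumption: Since we know that none of the other mice are diseased, you need to calculate only probability pairs (P(pd), P(pp)) as the entries of the conditional probability tables P(x|parents(x)). Each pair is obtained from the corresponding triple by dropping P(dd) and renormalizing the remaining two entries so that they sum to 1 (0.67 and 0.33 are 2/3 and 1/3, rounded). Therefore, the CPTs for all the mice in the second and third generations are: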
| | pd | pp |
|---|---|---|
| pd | (0.67, 0.33) | (0.5, 0.5) |
| pp | (0.5, 0.5) | (0, 1) |
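The entries of this table come from conditioning the full triples on the offspring not being dd. A short sketch of that renormalization, assuming exact fractions instead of the rounded decimals; `FULL_TRIPLES` and `condition_on_not_diseased` are illustrative names:

```python
from fractions import Fraction

# (P(dd), P(pd), P(pp)) for non-diseased parents, copied from the table for Louis.
FULL_TRIPLES = {
    ("pd", "pd"): (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    ("pd", "pp"): (Fraction(0), Fraction(1, 2), Fraction(1, 2)),
    ("pp", "pd"): (Fraction(0), Fraction(1, 2), Fraction(1, 2)),
    ("pp", "pp"): (Fraction(0), Fraction(0), Fraction(1)),
}

def condition_on_not_diseased(triple):
    """Drop P(dd) and rescale so that (P(pd), P(pp)) sums to 1."""
    _p_dd, p_pd, p_pp = triple
    surviving = p_pd + p_pp  # probability that the offspring is not dd
    return (p_pd / surviving, p_pp / surviving)

REDUCED_CPT = {parents: condition_on_not_diseased(t) for parents, t in FULL_TRIPLES.items()}
```

With two carrier parents this gives (2/3, 1/3), which the table rounds to (0.67, 0.33); the other entries are unchanged because their P(dd) is already 0.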
Finally, you will need prior probabilities that the first-generation mice are disease carriers. Assume that P(pd) = 0.01 and P(pp) = 0.99.
Model the above genealogy in a Bayesian network in JavaBayes to calculate the disease carrier probabilities for all the mice conditioned on the fact that Louis has the disease.
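If you want to cross-check the numbers JavaBayes reports, the network is small enough for brute-force enumeration over all carrier assignments. The sketch below is only a checking aid under stated assumptions, not a substitute for the JavaBayes model: it uses the exact values 2/3 and 1/3 in place of 0.67 and 0.33, and all function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

PRIOR_CARRIER = Fraction(1, 100)  # P(pd) for first-generation mice

FOUNDERS = ["Alice", "Bob", "Cindy", "Dave", "Ellen"]
PARENTS = {
    "Fred": ("Alice", "Bob"),
    "Gwen": ("Cindy", "Bob"),
    "Henry": ("Cindy", "Dave"),
    "Iona": ("Ellen", "Bob"),
    "John": ("Gwen", "Fred"),
    "Katherine": ("Henry", "Iona"),
}
LOUIS_PARENTS = ("Katherine", "John")

def p_carrier(n_carrier_parents):
    """P(child is pd | child is not dd), by number of carrier parents."""
    return {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(2, 3)}[n_carrier_parents]

def p_louis_dd(n_carrier_parents):
    """P(Louis is dd): 1/4 if both parents are carriers, otherwise 0."""
    return Fraction(1, 4) if n_carrier_parents == 2 else Fraction(0)

def carrier_posteriors():
    """Return P(mouse is pd | Louis is dd) for every ancestor of Louis."""
    names = FOUNDERS + list(PARENTS)
    carrier_mass = {name: Fraction(0) for name in names}
    evidence = Fraction(0)
    # True means carrier (pd), False means pure (pp).
    for bits in product([True, False], repeat=len(names)):
        state = dict(zip(names, bits))
        weight = Fraction(1)
        for founder in FOUNDERS:
            weight *= PRIOR_CARRIER if state[founder] else 1 - PRIOR_CARRIER
        for child, (p1, p2) in PARENTS.items():
            pc = p_carrier(state[p1] + state[p2])
            weight *= pc if state[child] else 1 - pc
        weight *= p_louis_dd(sum(state[p] for p in LOUIS_PARENTS))
        if weight == 0:
            continue
        evidence += weight
        for name in names:
            if state[name]:
                carrier_mass[name] += weight
    return {name: carrier_mass[name] / evidence for name in names}

if __name__ == "__main__":
    for name, prob in carrier_posteriors().items():
        print(f"{name}: {float(prob):.4f}")
```

John and Katherine must come out as certain carriers, since Louis can only be dd if both of his parents carry the gene; that makes a quick consistency check for both this sketch and your JavaBayes network.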
JavaBayes can be downloaded from: http://www.pmr.poli.usp.br/ltd/Software/javabayes/. You should use the command-line version of JavaBayes so that you can save your work. Before attempting to do this assignment you should work through the "dog problem" example in the on-line manual in order to understand how to use the system. Note that you need to have Java installed on your computer.
On Monday, November 8 turn in:
Please have JavaBayes installed on your laptop for that class, in case we use it for an in-class exercise.