CSE 473 - Introduction to Artificial Intelligence

Assignment 7: Bayesian Inference for Genetics

Four generations of laboratory mice have been bred as follows:

Initial generation: Alice, Bob, Cindy, Dave, Ellen

Second generation:

Third generation:

Fourth generation:

The mouse Louis suffers from a life-threatening disease and dies soon after birth. The disease is hereditary and carried by a recessive gene, meaning that both parents must carry the gene for an offspring to be diseased. Because the disease is recessive, the parents need only be carriers of the disease; they need not be diseased themselves. A completely healthy animal has genotype pp (pure), a carrier is pd or dp (but hereafter written only as pd), and a diseased animal is dd.

We solve the following problem using Bayesian inference: Which ancestors of Louis are likely to be carriers of the disease?

According to the laws of genetics (and basic probability theory), each combination of parent genotypes yields the following probability triple (P(dd), P(pd), P(pp)) for the genotype of the offspring:

      dd               pd                   pp
dd    (1, 0, 0)        (0.5, 0.5, 0)        (0, 1, 0)
pd    (0.5, 0.5, 0)    (0.25, 0.5, 0.25)    (0, 0.5, 0.5)
pp    (0, 1, 0)        (0, 0.5, 0.5)        (0, 0, 1)
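
The triples in this table can be checked mechanically. The following is a minimal sketch, assuming each parent passes on one of its two alleles with probability 1/2, independently of the other parent; the function name offspring_distribution is illustrative and not part of the assignment.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ["dd", "pd", "pp"]


def offspring_distribution(mother, father):
    """Return (P(dd), P(pd), P(pp)) for a child of the given parent genotypes.

    Assumption: each parent contributes one of its two alleles with
    probability 1/2, independently of the other parent.
    """
    probs = {g: Fraction(0) for g in GENOTYPES}
    # Four equally likely allele pairings, one allele from each parent.
    for a, b in product(mother, father):
        # Normalise the order so that 'dp' and 'pd' are both written 'pd'.
        child = "".join(sorted((a, b), reverse=True))
        probs[child] += Fraction(1, 4)
    return tuple(probs[g] for g in GENOTYPES)


if __name__ == "__main__":
    for mother in GENOTYPES:
        row = [offspring_distribution(mother, father) for father in GENOTYPES]
        print(mother, [tuple(float(p) for p in triple) for triple in row])
```

Running the script prints one row per parent genotype, in the same row and column order as the table.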

You can make the following assumption: none of the mice other than Louis is actually diseased; otherwise they would not have lived long enough to procreate. This means that you can reduce the probability table above by eliminating the row and column marked dd. Therefore the conditional probability table P(x|parents(x)) for Louis is:

      pd                   pp
pd    (0.25, 0.5, 0.25)    (0, 0.5, 0.5)
pp    (0, 0.5, 0.5)        (0, 0, 1)

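For all the other mice you can make an additional simplifying assumption: since we know that none of the other mice is diseased, you need to calculate only probability pairs (P(pd), P(pp)) as the entries of the conditional probability tables P(x|parents(x)). Each pair is obtained by dropping P(dd) from the corresponding triple and renormalizing the remaining two entries; for example, (0.25, 0.5, 0.25) becomes (0.5/0.75, 0.25/0.75), or approximately (0.67, 0.33). Therefore, the CPTs for all the mice in the second and third generations are: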

      pd              pp
pd    (0.67, 0.33)    (0.5, 0.5)
pp    (0.5, 0.5)      (0, 1)
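
The pairs in this table follow from the triples in Louis's table by conditioning on the mouse not being diseased. A minimal sketch of that renormalization is below; the names carrier_cpt_entry and LOUIS_CPT are illustrative only.

```python
def carrier_cpt_entry(p_dd, p_pd, p_pp):
    """Condition a (P(dd), P(pd), P(pp)) triple on the mouse not being dd.

    Returns the pair (P(pd), P(pp)) renormalised to sum to 1.
    """
    healthy = p_pd + p_pp  # probability that the mouse is not diseased
    return (p_pd / healthy, p_pp / healthy)


# Triples from the reduced table for Louis, keyed by the two parent genotypes.
LOUIS_CPT = {
    ("pd", "pd"): (0.25, 0.5, 0.25),
    ("pd", "pp"): (0.0, 0.5, 0.5),
    ("pp", "pd"): (0.0, 0.5, 0.5),
    ("pp", "pp"): (0.0, 0.0, 1.0),
}

if __name__ == "__main__":
    for parents, triple in LOUIS_CPT.items():
        pair = carrier_cpt_entry(*triple)
        print(parents, tuple(round(p, 2) for p in pair))
```

Only the (pd, pd) entry changes under renormalization, because it is the only parent combination that can produce a diseased offspring.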

Finally, you will need the prior probabilities that the first-generation mice are disease carriers. Assume that P(pd) = 0.01 and P(pp) = 0.99.

Model the above genealogy in a Bayesian network in JavaBayes to calculate the disease carrier probabilities for all the mice conditioned on the fact that Louis has the disease.

JavaBayes can be downloaded from:  http://www-2.cs.cmu.edu/~javabayes/.  You should use the command-line version of JavaBayes so that you can save your work.  Before attempting to do this assignment you should work through the "dog problem" example in the on-line manual in order to understand how to use the system.  Note that you need to have Java installed on your computer.
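
Before building the full network, it may help to check the method by hand on a much smaller case. The sketch below does inference by enumeration on a hypothetical pedigree that is NOT the genealogy of this assignment: founders A, B and C, a mouse X bred from A and B, and a diseased mouse bred from X and C. All names here are made up for illustration, and the exact value 2/3 is used where the table above shows the rounded 0.67.

```python
from itertools import product

# Prior for first-generation mice, as given in the assignment.
PRIOR = {"pd": 0.01, "pp": 0.99}

# P(child genotype | parents) for a non-diseased child.
CARRIER_CPT = {
    ("pd", "pd"): {"pd": 2 / 3, "pp": 1 / 3},
    ("pd", "pp"): {"pd": 0.5, "pp": 0.5},
    ("pp", "pd"): {"pd": 0.5, "pp": 0.5},
    ("pp", "pp"): {"pd": 0.0, "pp": 1.0},
}

# P(child = dd | parents), the first entry of each triple in Louis's table.
P_DISEASED = {
    ("pd", "pd"): 0.25,
    ("pd", "pp"): 0.0,
    ("pp", "pd"): 0.0,
    ("pp", "pp"): 0.0,
}


def carrier_posteriors():
    """Return P(mouse = pd | diseased offspring of X and C) for A, B, C, X.

    Hypothetical toy pedigree: X is bred from A and B; the diseased
    mouse is bred from X and C.
    """
    names = ["A", "B", "C", "X"]
    weight_pd = {n: 0.0 for n in names}
    total = 0.0
    # Sum the joint probability over every assignment of genotypes.
    for a, b, c, x in product(PRIOR, repeat=4):
        joint = (PRIOR[a] * PRIOR[b] * PRIOR[c]
                 * CARRIER_CPT[(a, b)][x]
                 * P_DISEASED[(x, c)])
        total += joint
        for name, genotype in zip(names, (a, b, c, x)):
            if genotype == "pd":
                weight_pd[name] += joint
    # Normalise by the probability of the evidence.
    return {n: weight_pd[n] / total for n in names}


if __name__ == "__main__":
    for name, p in carrier_posteriors().items():
        print(f"P({name} = pd | diseased offspring) = {p:.4f}")
```

In this toy case both parents of the diseased mouse come out as certain carriers, while each grandparent is a carrier with probability just above one half. The same normalize-over-all-assignments computation is what a Bayesian network tool carries out on the full genealogy.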

On Friday, November 21 at the start of class turn in the following (hardcopy):