Bayes rule
In terms of our problem:
what we measure
(likelihood)
domain knowledge
(prior)
what we want
(posterior)
normalization term
The prior:  P(skin)
Could use domain knowledge
P(skin) may be larger if we know the image contains a person
for a portrait, P(skin) may be higher for pixels in the center
Could learn the prior from the training set.  How?
P(skin) may be proportion of skin pixels in training set