I should mention that there's an error in the paper.  The RHS of Equation 4 marginalizes in the wrong way.  It should be

p(I|\theta) = \prod_t \int_{z_t} p(I, z_t | \theta) dz_t
            = \prod_t \int_{z_t} \prod_{w,j} p(I, z_t, W_{wjt} | \theta) dz_t

If you work through the derivation, the only impact is that we're giving way too much weight to the prior (in effect, reducing the prior variance).  We haven't tried fixing it.
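
As a generic sketch of why over-weighting a prior acts like shrinking its variance (assuming, purely for illustration, a Gaussian prior on z_t with mean \mu and variance \sigma^2 that gets counted k times instead of once; \mu, \sigma and k are hypothetical symbols, not taken from the paper):

p(z_t)^k \propto \exp( -k (z_t - \mu)^2 / (2 \sigma^2) )
         \propto N(z_t; \mu, \sigma^2 / k)

Under that assumption, a prior factor that enters k times behaves like a single Gaussian prior with variance \sigma^2 / k.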

Aaron