From: gupta@ee.washington.edu
Date: Fri Apr 23 2004 - 18:18:12 PDT
I'm an EE, so I know the continuous world a little better. In the continuous
case, let's say you threw down N points uniformly randomly on a continuous line
between 0 and a. Then the distances between each pair of adjacent N points
(and the endpoints) would be your 'sequence lengths' and they would be
distributed exponentially, so that the probability of short ones is much
greater than the probabilty of long ones.
Since this is a discrete case (your sequence lengths are discrete not
continuous) it changes things slightly. I'd have to go back over the math, but
the Poisson is the binomial distribution in the limit, and the exponential
distribution should be a limiting case of 'how many trials until the first
success', so a geometric random variable (also the only memoryless discrete
r.v.).
So I think your sequence lengths would just be geometric r.v.'s.
Someone have this proved already? Otherwise I suppose I could spend another few
minutes verifying my assertion :)
-Maya Gupta
-- Maya Gupta gupta@ee.washington.edu (206)616-2303 Assistant Professor Dept. of Electrical Engineering University of Washington Quoting William Noble <noble@gs.washington.edu>: > > Hello, theory group. > > I posed a question to Martin Tompa this morning that is relevant to my > analysis of the human genome, and he suggested that I try running it > by y'all. So here goes ... > > You are given a sequence of letters of length L. You select > uniformly at random n distinct positions within the sequence, and > break the sequence into n+1 fragments. You then count the number X > of occurrences of all fragments of a specified length m. I'd like > to know the distribution of that resulting count X. In particular, > I need to compute the p-value of observing at least p length-m > fragments, so I need a cumulative density function. > > Actually, to make it a bit more complex, the specified value m will > be a collection of values (e.g., not just fragments of length 13, > but fragments of length 11, 12, 13, 17 or 22). But I assume that if > I can solve the problem above, it will be straightforward to > generalize to this case. > > I also assume that somewhere, in some form, someone has solved this or > a closely related problem before. If anyone knows the answer or can > point me toward a solution, I'd be grateful. > > Thanks. > Bill Noble > > ----- > William Stafford Noble > Assistant Professor > Department of Genome Sciences > University of Washington > Health Sciences Center, Box 357730 > 1705 NE Pacific Street > Seattle, WA 98195 > Tel: (206) 543-8930 > Fax: (206) 685-7301 > Office: J-205 > http://www.gs.washington.edu/~noble > _______________________________________________ > Theory-group mailing list > Theory-group@cs.washington.edu > http://mailman.cs.washington.edu/mailman/listinfo/theory-group > _______________________________________________ Theory-group mailing list Theory-group@cs.washington.edu http://mailman.cs.washington.edu/mailman/listinfo/theory-group
This archive was generated by hypermail 2.1.6 : Fri Apr 23 2004 - 18:18:36 PDT