From: William Noble (noble@gs.washington.edu)
Date: Fri Apr 23 2004 - 16:28:11 PDT
Hello, theory group.
I posed a question to Martin Tompa this morning that is relevant to my
analysis of the human genome, and he suggested that I try running it
by y'all. So here goes ...
You are given a sequence of letters of length L. You select n
distinct positions within the sequence uniformly at random, and
break the sequence at those positions into n+1 fragments. You then
count the number X of fragments that have a specified length m. I'd
like to know the distribution of the resulting count X. In particular,
I need to compute the p-value of observing at least k length-m
fragments, so I need the cumulative distribution function of X.
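One quick way to sanity-check any closed-form answer to this question is simulation. The sketch below is a Monte Carlo estimate, not an exact solution. It assumes that a "position" is one of the L - 1 gaps between adjacent letters, so every fragment has length at least 1. The function names and the threshold name `k` (the "at least" count in the question) are made up for illustration.

```python
import random

def fragment_lengths(L, n, rng):
    """Cut a length-L sequence at n distinct gaps chosen uniformly at random.

    Assumption: a "position" is one of the L - 1 gaps between adjacent
    letters, so all n + 1 fragments have length >= 1 (requires n <= L - 1).
    """
    cuts = sorted(rng.sample(range(1, L), n))
    bounds = [0] + cuts + [L]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def count_length_m(L, n, m, rng):
    """One random draw of X: the number of fragments of length exactly m."""
    return sum(1 for length in fragment_lengths(L, n, rng) if length == m)

def empirical_p_value(L, n, m, k, trials=10000, seed=0):
    """Estimate P(X >= k) as the fraction of random draws with X >= k."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if count_length_m(L, n, m, rng) >= k)
    return hits / trials
```

The estimate has sampling error on the order of 1/sqrt(trials), so very small p-values would need many trials or an exact formula.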
Actually, to make it a bit more complex, m will be a set of
allowed lengths rather than a single value (e.g., not just fragments
of length 13, but fragments of length 11, 12, 13, 17 or 22). But I
assume that if I can solve the problem above, it will be
straightforward to generalize to this case.
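For a set of lengths, at least the mean of X is easy to write down, by linearity of expectation. The sketch below assumes the n cuts are a uniformly random subset of the L - 1 gaps between letters. Under that assumption, a given fragment has length m in comb(L - m - 1, n - 1) of the comb(L - 1, n) equally likely cut sets. `expected_count` is a hypothetical name, and this gives only E[X], not the full distribution.

```python
from math import comb

def expected_count(L, n, lengths):
    """Expected number of fragments whose length is in `lengths`.

    Assumption: the n cuts are a uniformly random n-subset of the L - 1
    gaps between adjacent letters, so fragment lengths are >= 1.
    """
    wanted = set(lengths)
    if n == 0:
        # No cuts: the single fragment is the whole sequence.
        return 1.0 if L in wanted else 0.0
    total = comb(L - 1, n)  # number of equally likely cut sets
    # With n >= 1 cuts, a fragment can only have length 1 .. L - n.
    favorable = sum(comb(L - m - 1, n - 1) for m in wanted if 1 <= m <= L - n)
    # All n + 1 fragments have the same length distribution (exchangeability).
    return (n + 1) * favorable / total
```

For example, `expected_count(4, 1, {1, 3})` covers the three possible cuts of a 4-letter sequence, which give fragment lengths (1, 3), (2, 2) and (3, 1).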
I also assume that somewhere, in some form, someone has solved this or
a closely related problem before. If anyone knows the answer or can
point me toward a solution, I'd be grateful.
Thanks.
Bill Noble
-----
William Stafford Noble
Assistant Professor
Department of Genome Sciences
University of Washington
Health Sciences Center, Box 357730
1705 NE Pacific Street
Seattle, WA 98195
Tel: (206) 543-8930
Fax: (206) 685-7301
Office: J-205
http://www.gs.washington.edu/~noble