Re: sequence fragments

From: gupta@ee.washington.edu
Date: Fri Apr 23 2004 - 18:18:12 PDT

  • Next message: Dimitris Achlioptas: "RE: sequence fragments"

    I'm an EE, so I know the continuous world a little better. In the continuous
    case, let's say you threw down N points uniformly randomly on a continuous line
    between 0 and a. Then the distances between each pair of adjacent N points
    (and the endpoints) would be your 'sequence lengths' and they would be
    distributed exponentially, so that the probability of short ones is much
    greater than the probabilty of long ones.

    Since this is a discrete case (your sequence lengths are discrete not
    continuous) it changes things slightly. I'd have to go back over the math, but
    the Poisson is the binomial distribution in the limit, and the exponential
    distribution should be a limiting case of 'how many trials until the first
    success', so a geometric random variable (also the only memoryless discrete
    r.v.).

    So I think your sequence lengths would just be geometric r.v.'s.

    Someone have this proved already? Otherwise I suppose I could spend another few
    minutes verifying my assertion :)

    -Maya Gupta

    -- 
    Maya Gupta
    gupta@ee.washington.edu
    (206)616-2303
    Assistant Professor
    Dept. of Electrical Engineering
    University of Washington
    Quoting William Noble <noble@gs.washington.edu>:
    > 
    > Hello, theory group.
    > 
    > I posed a question to Martin Tompa this morning that is relevant to my
    > analysis of the human genome, and he suggested that I try running it
    > by y'all.  So here goes ...
    > 
    >   You are given a sequence of letters of length L.  You select
    >   uniformly at random n distinct positions within the sequence, and
    >   break the sequence into n+1 fragments.  You then count the number X
    >   of occurrences of all fragments of a specified length m.  I'd like
    >   to know the distribution of that resulting count X.  In particular,
    >   I need to compute the p-value of observing at least p length-m
    >   fragments, so I need a cumulative density function.
    > 
    >   Actually, to make it a bit more complex, the specified value m will
    >   be a collection of values (e.g., not just fragments of length 13,
    >   but fragments of length 11, 12, 13, 17 or 22).  But I assume that if
    >   I can solve the problem above, it will be straightforward to
    >   generalize to this case.
    > 
    > I also assume that somewhere, in some form, someone has solved this or
    > a closely related problem before.  If anyone knows the answer or can
    > point me toward a solution, I'd be grateful.
    > 
    > Thanks.
    > Bill Noble
    > 
    > -----
    > William Stafford Noble
    > Assistant Professor
    > Department of Genome Sciences
    > University of Washington
    > Health Sciences Center, Box 357730
    > 1705 NE Pacific Street
    > Seattle, WA 98195
    > Tel: (206) 543-8930
    > Fax: (206) 685-7301
    > Office: J-205
    > http://www.gs.washington.edu/~noble
    > _______________________________________________
    > Theory-group mailing list
    > Theory-group@cs.washington.edu
    > http://mailman.cs.washington.edu/mailman/listinfo/theory-group
    > 
    _______________________________________________
    Theory-group mailing list
    Theory-group@cs.washington.edu
    http://mailman.cs.washington.edu/mailman/listinfo/theory-group
    

  • Next message: Dimitris Achlioptas: "RE: sequence fragments"

    This archive was generated by hypermail 2.1.6 : Fri Apr 23 2004 - 18:18:36 PDT