sequence fragments

From: William Noble (noble@gs.washington.edu)
Date: Fri Apr 23 2004 - 16:28:11 PDT

  • Next message: gupta@ee.washington.edu: "Re: sequence fragments"

    Hello, theory group.

    I posed a question to Martin Tompa this morning that is relevant to my
    analysis of the human genome, and he suggested that I try running it
    by y'all. So here goes ...

      You are given a sequence of letters of length L. You select
      uniformly at random n distinct positions within the sequence, and
      break the sequence at those positions into n+1 fragments. You then
      count the number X of fragments that have a specified length m. I'd
      like to know the distribution of that resulting count X. In
      particular, I need to compute the p-value of observing at least p
      length-m fragments, so I need a cumulative distribution function.
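
    One possible way to get this distribution exactly is inclusion-exclusion
    over which fragments have length m. The sketch below is not a known
    published solution; it assumes the n break points are a uniformly random
    subset of the L-1 gaps between adjacent letters, so that every split into
    n+1 non-empty fragments is equally likely. The function names and the
    threshold argument `k` are made up for illustration.

```python
from fractions import Fraction
from math import comb


def fragment_count_pmf(L, n, m):
    """Return [P(X = 0), ..., P(X = n+1)] as exact fractions.

    Assumption: the n cuts are a uniformly random n-subset of the L-1
    gaps between adjacent letters (requires 0 <= n <= L-1).
    """
    total = comb(L - 1, n)  # number of equally likely cut sets
    parts = n + 1           # number of fragments

    def ways_rest(j):
        # Ways to choose lengths (each >= 1) for the other parts-j
        # fragments once j chosen fragments are fixed at length m.
        rest_len, rest_parts = L - j * m, parts - j
        if rest_parts == 0:
            return 1 if rest_len == 0 else 0
        if rest_len < rest_parts:
            return 0
        return comb(rest_len - 1, rest_parts - 1)

    # S[j] = sum over all j-subsets of fragments of
    #        P(every fragment in the subset has length m)
    S = [Fraction(comb(parts, j) * ways_rest(j), total)
         for j in range(parts + 1)]

    # Inclusion-exclusion: P(X = k) = sum_{j >= k} (-1)^(j-k) C(j, k) S[j]
    return [sum((-1) ** (j - k) * comb(j, k) * S[j]
                for j in range(k, parts + 1))
            for k in range(parts + 1)]


def p_value(L, n, m, k):
    """P(X >= k): upper tail of the exact distribution."""
    return sum(fragment_count_pmf(L, n, m)[k:])
```

    For example, `p_value(5, 2, 1, 2)` asks how likely it is that at least
    two of the three fragments of a length-5 sequence have length 1. Exact
    fractions avoid cancellation error in the alternating sum, but for
    genome-scale L and n the big-integer arithmetic may be slow, so this is
    best treated as a reference for small cases.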

      Actually, to make it a bit more complex, the specified value m will
      be a collection of values (e.g., not just fragments of length 13,
      but fragments of length 11, 12, 13, 17 or 22). But I assume that if
      I can solve the problem above, it will be straightforward to
      generalize to this case.
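
    For a collection of lengths, a simple fallback is to estimate the tail
    probability by simulation. The sketch below makes the same assumption
    that cuts fall uniformly in the L-1 gaps between adjacent letters; the
    function names, the default number of trials, and the seed are
    illustrative choices, not anything established.

```python
import random


def simulate_counts(L, n, lengths, trials=10000, seed=0):
    """Simulate the count of fragments whose length is in `lengths`.

    Assumption: the n cuts are a uniformly random n-subset of the L-1
    gaps between adjacent letters (requires n <= L-1).
    """
    rng = random.Random(seed)
    wanted = set(lengths)
    counts = []
    for _ in range(trials):
        # A cut at position c separates letters c-1 and c (0-based).
        cuts = sorted(rng.sample(range(1, L), n))
        bounds = [0] + cuts + [L]
        frag_lengths = [b - a for a, b in zip(bounds, bounds[1:])]
        counts.append(sum(1 for f in frag_lengths if f in wanted))
    return counts


def empirical_p_value(counts, k):
    """Fraction of simulated counts that are at least k."""
    return sum(1 for c in counts if c >= k) / len(counts)
```

    Usage would look like
    `empirical_p_value(simulate_counts(L, n, [11, 12, 13, 17, 22]), k)`.
    The estimate cannot resolve p-values much smaller than 1/trials, so
    very small tails would need more trials or an exact method.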

    I also assume that somewhere, in some form, someone has solved this or
    a closely related problem before. If anyone knows the answer or can
    point me toward a solution, I'd be grateful.

    Thanks.
    Bill Noble

    -----
    William Stafford Noble
    Assistant Professor
    Department of Genome Sciences
    University of Washington
    Health Sciences Center, Box 357730
    1705 NE Pacific Street
    Seattle, WA 98195
    Tel: (206) 543-8930
    Fax: (206) 685-7301
    Office: J-205
    http://www.gs.washington.edu/~noble


