Automatically choosing good FootPrinter parameters.
Doing this by hand is something of an art, so what I'd like you to do first is come up
with a list of criteria you will use to compare two FootPrinter
outputs to decide which is better. Obvious symptoms of bad output
include too few motifs or too many motifs, but you have to decide
what's too few or too many. You should prefer motifs that span more
of the tree over ones that span less. You should prefer sets of
motifs that occur in (roughly) the same order in many species. Take a
look at the examples on the project FootPrinter page to get a feel
for how different outputs compare.
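To make the criteria above concrete, here is one rough sketch of how two outputs might be scored and compared. Everything in it is an assumption for illustration: the Motif record, the acceptable range of 3 to 15 motifs, the use of "fraction of species" as a stand-in for how much of the tree a motif spans, and the equal weighting in score(). Your own criteria and weights may well differ.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Motif:
    # Hypothetical record: species name -> start position of the motif
    # in that species' sequence (parsed from FootPrinter output by you).
    positions: dict


def count_penalty(n_motifs, lo=3, hi=15):
    """0 inside the acceptable range [lo, hi]; grows with distance outside it.
    The defaults are placeholders, not recommended values."""
    if n_motifs < lo:
        return lo - n_motifs
    if n_motifs > hi:
        return n_motifs - hi
    return 0


def mean_span(motifs, n_species):
    """Average fraction of species each motif occurs in.
    A crude proxy for 'spans more of the tree'."""
    if not motifs:
        return 0.0
    return sum(len(m.positions) for m in motifs) / (len(motifs) * n_species)


def order_agreement(motifs):
    """For each pair of motifs, the fraction of shared species that agree
    with the majority on which motif comes first; averaged over pairs."""
    scores = []
    for a, b in combinations(motifs, 2):
        shared = a.positions.keys() & b.positions.keys()
        if len(shared) < 2:
            continue  # order cannot be compared across species
        a_first = sum(1 for s in shared if a.positions[s] < b.positions[s])
        scores.append(max(a_first, len(shared) - a_first) / len(shared))
    return sum(scores) / len(scores) if scores else 0.0


def score(motifs, n_species):
    """Higher is better. Equal weights are an arbitrary starting point."""
    return (mean_span(motifs, n_species)
            + order_agreement(motifs)
            - count_penalty(len(motifs)))
```

Given two parsed outputs, the one with the larger score(motifs, n_species) would be preferred under these particular assumptions.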
Once you have the list of criteria, what I expect you'll want your
program to do is automatically try various settings of FootPrinter's
parameters, selecting the 1 or 2 settings that lead to the best
output according to your criteria. You might want to try motif sizes
10 and 8. You might want to try subregion change costs 1 and 0.
You'll want to allow for losses, but the biggest challenge may be to
get the config file right. One thing I'd try is a simple rescaling
of FootPrinter's 3 "universal" config files. (From the manual: "For
a motif of size X, three files are provided: universalXloose.config ,
universalX.config and universalXtight.config, which will respectively
report motifs that are somewhat significant, significant or very
significant, approximatively corresponding to p-values of 0.2, 0.1
and 0.05 respectively.") My first guess would be to scale all the
spans in these files by C/F, where C is the sum of all ClustalW's
branch lengths and F is the sum of all FootPrinter's branch
lengths if you were to use the -compute_branch_lengths option (which
you won't in your real program). Just these few suggestions already
give 2x2x3 = 12 settings of FootPrinter's parameters (2 motif sizes,
2 subregion change costs, 3 config files) to be compared according to
your criteria. You may want to try more.
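As a sketch of the bookkeeping only, the settings suggested above and the C/F scale factor could be enumerated like this. The dictionary keys and function names are made up for illustration; they are not FootPrinter option names, and the sketch deliberately says nothing about how spans are laid out inside the config files or how FootPrinter is invoked.

```python
from itertools import product

MOTIF_SIZES = (10, 8)
SUBREGION_CHANGE_COSTS = (1, 0)
# Suffixes of the three "universal" config files quoted from the manual:
# universalXloose.config, universalX.config, universalXtight.config
STRINGENCIES = ("loose", "", "tight")


def config_name(size, stringency):
    """Name of the universal config file for a given motif size."""
    return f"universal{size}{stringency}.config"


def scale_factor(clustalw_lengths, footprinter_lengths):
    """C/F: sum of ClustalW's branch lengths over sum of FootPrinter's."""
    c = sum(clustalw_lengths)
    f = sum(footprinter_lengths)
    if f == 0:
        raise ValueError("FootPrinter branch lengths sum to zero")
    return c / f


def settings():
    """All combinations of motif size, change cost, and config file."""
    return [
        {"size": s, "change_cost": c, "config": config_name(s, t)}
        for s, c, t in product(MOTIF_SIZES, SUBREGION_CHANGE_COSTS, STRINGENCIES)
    ]


if __name__ == "__main__":
    for setting in settings():
        print(setting)
```

Each setting would then be run through FootPrinter with its spans rescaled by scale_factor(...), and the resulting outputs ranked by your criteria.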
Use your team's wiki to post good FootPrinter outputs for the folC
data set, and say whether each was found manually or automatically by
your phase 2 program.