|
CMU-CS-05-181
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-05-181
FPT Algorithms for Binary Near-Perfect Phylogenetic Trees
Srinath Sridhar, Kedar Dhamdhere*, Guy E. Blelloch,
Eric Halperin**, R. Ravi++, Russell Schwartz+
September 2005
CMU-CS-05-181.ps
CMU-CS-05-181.pdf
Keywords: Phylogenetic trees, Parsimony, Near-perfect phylogeny
We consider the problem of reconstructing near-perfect phylogenetic
trees using binary character states
(referred to as BNPP). A perfect
phylogeny assumes that every character mutates at most once in the
evolutionary tree, yielding an algorithm for binary character states
that is computationally e cient but not
robust to imperfections in real
data. A near-perfect phylogeny relaxes the perfect phylogeny assumption
by allowing at most a constant number of additional mutations. In this
paper, we present a simple lower bound for the size of an optimal
phylogeny, develop two algorithms for constructing optimal phylogenies
and show experimental results for one of the variants. The first
algorithm is intuitive and reconstructs an optimal near-perfect
phylogenetic tree in time
(q + k)O(q)nm + O(nm2) where
k is the number f characters that share
four gametes with
some other character. A second, more involved algorithm shows the problem
to be fixed parameter tractable in q by solving it in time
qO(q)nm + O(nm2)
where n is the number of taxa and m is the number of characters.
This is a significant improvement over the previous best result of
nmO(q)2O(q2s2), where
s is the number of states per character (2 for binary).
We implement the first algorithm and show that it
finds the optimal solution quickly for a selection of population
datasets including mitochondrial and Y chromosome samples from
humans and other primates. Our results describe the first practical
phylogenetic tree reconstruction algorithm that finds guaranteed
optimal solutions while being easily implemented and computationally
feasible for data sets of biologically meaningful size and complexity.
37 pages
*Google, Inc. Mountain View, CA
**ICSI, University of California, Berkeley
++Tepper School of Business, Carnegie Mellon University
+Department of Biological Sciences, Carnegie Mellon Un
iversity
|