SEPP: SATe-Enabled Phylogenetic Placement
Tandy Warnow
Phylogenetic placement arises in the analysis of metagenomic data, where the objective is to insert short molecular sequences (called "query sequences") into an existing phylogenetic tree and alignment on full-length sequences for the same gene. Phylogenetic placement has the potential to provide information beyond pure species identification (i.e., the association of metagenomic reads with existing species), because it can also give information about the evolutionary relationships between these query sequences and known species. We present SEPP, a general "boosting" technique to improve the accuracy and/or speed of phylogenetic placement techniques. The key algorithmic aspect of this booster is a dataset decomposition technique used in SATe (Liu et al., Science 2009), a method that uses an iterative divide-and-conquer technique to co-estimate alignments and trees on large molecular sequence datasets. We show that SEPP improves current phylogenetic placement methods: it places metagenomic sequences more accurately when the set of input sequences has a large evolutionary diameter, and it produces placements of comparable accuracy in a fraction of the time for easier cases. Joint work with Siavash Mirarab and Nam Nguyen, PhD students at UT-Austin.
