Nathanael Fillmore: Evaluation of de novo Transcriptome Assemblies from RNA-Seq Data
Computation and Informatics in Biology and Medicine (CIBM) seminar:
To understand a wide variety of cellular processes, it is useful to study a cell's transcriptome, i.e., the collection of RNA transcripts present in the cell. High-throughput RNA sequencing (RNA-Seq) is a powerful tool toward this end. To overcome some of the limitations of current approaches to de novo RNA-Seq assembly, we have developed a transcriptome assembly scoring function that can be used to choose the best assembly from a collection of candidate assemblies. Our score does not require any knowledge of the organism's genome or of the true set of RNA transcripts. Instead, it is based on a probability model of the process of RNA-Seq read generation and the process of ideal transcriptome assembly. We have also developed several simple reference-based scores, and we have used these to carry out a large-scale meta-evaluation of our scoring function on real and simulated data.
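
As a rough illustration of how a reference-free, model-based score could be used to pick among candidate assemblies, the Python sketch below scores each candidate with a toy log-likelihood and keeps the highest-scoring one. This is not the scoring function presented in the talk. The generative model (uniform read start positions, a fixed per-base error rate, a length penalty of log 4 per assembled base) and all function and parameter names are assumptions made only for illustration.

```python
import math


def read_log_likelihood(read, contigs, error_rate=0.01):
    """Log-probability of one read under a toy model (an assumption):
    a start position is drawn uniformly over all valid positions in the
    assembly, and each base is miscopied with probability error_rate."""
    n = len(read)
    positions = 0
    prob = 0.0
    for contig in contigs:
        for start in range(len(contig) - n + 1):
            positions += 1
            p = 1.0
            for a, b in zip(read, contig[start:start + n]):
                p *= (1.0 - error_rate) if a == b else error_rate / 3.0
            prob += p
    if positions == 0 or prob == 0.0:
        return float("-inf")  # the assembly cannot explain this read
    return math.log(prob / positions)


def score_assembly(reads, contigs, error_rate=0.01, per_base_penalty=math.log(4)):
    """Toy score: log-prior penalising assembly length plus the read log-likelihood."""
    log_prior = -per_base_penalty * sum(len(c) for c in contigs)
    log_lik = sum(read_log_likelihood(r, contigs, error_rate) for r in reads)
    return log_prior + log_lik


def best_assembly(reads, candidates):
    """Return the name of the candidate assembly with the highest score."""
    return max(candidates, key=lambda name: score_assembly(reads, candidates[name]))


# Hypothetical toy data: three reads and four candidate assemblies.
reads = ["ACGT", "CGTA", "GTAC"]
candidates = {
    "good": ["ACGTAC"],
    "redundant": ["ACGTAC", "ACGTAC"],
    "fragmented": ["ACG", "TAC"],
    "wrong": ["TTTTTT"],
}
print(best_assembly(reads, candidates))
```

In this toy setting the length penalty is what separates the redundant candidate from the compact one, since both explain the reads equally well. Note that the score uses only the reads and the candidate contigs, with no genome or true transcript set.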
