Laura LeGault and Nathanael Fillmore: AISEM 2012 - Meet the Dewey Lab
The AISEM series provides a unique opportunity for both new and returning students to interact with groups in the UW Artificial Intelligence community (including machine learning, bioinformatics, computer vision, etc.) and hear about their research.
Each seminar will feature a different group. The seminar includes several short talks given by students in that group and a social time at which the PI will be present. Refreshments are provided.
In this first seminar, we will introduce Prof. Colin Dewey’s lab. The Dewey lab focuses on developing algorithms and statistical models for genomics, especially for analyzing RNA sequencing (RNA-Seq) data. Two members of the Dewey lab, Laura LeGault and Nathanael Fillmore, will present their work.
Title: Inference on Probabilistic Splice Graphs
Speaker: Laura LeGault
Abstract: Between the increase in high-throughput RNA sequencing and the increase in de novo transcriptome assembly, methods accounting for alternative mRNA splicing have become even more important. We discuss probabilistic models and associated inference algorithms for estimating relative frequencies of alternative transcription events and examine the practical applications of these models to axolotl research.
Title: RSEM-EVAL: A Probabilistic Transcriptome Assembly Evaluator
Speaker: Nathanael Fillmore
Abstract: To understand a wide variety of cellular processes, biologists find it useful to study a cell's transcriptome, i.e., the collection of RNA transcripts present in the cell. High-throughput RNA sequencing (RNA-Seq) is a fairly recently developed, powerful tool toward this end. However, the physical RNA-Seq protocol only produces short reads from a full RNA transcript, so an important early step in any RNA-Seq analysis is to assemble a collection of these reads into longer sequences, ideally full-length transcripts. If the organism's genome is known, this is relatively straightforward, but most organisms' genomes are not known, and in this case the assembly problem is rather difficult. Several computer programs exist to produce an assembly when the genome is not known, but the assemblies produced by two different programs, or even by a single program with different parameter settings, can differ substantially, and it is often not clear how to choose between the assemblies.
We have developed a transcriptome assembly scoring function that can be used to choose the best assembly from a collection of candidate assemblies. Our score does not require any knowledge of the organism's genome or of the true set of RNA transcripts. The score is based on a probability model of the process of RNA-Seq read generation and the process of ideal transcriptome assembly. In the talk, I will describe (i) our model, (ii) computational challenges associated with evaluating the score, (iii) validation of the model and meta-evaluation of the scoring function, and (iv) preliminary experimental results on real and simulated data. I will begin with background information so that the talk is accessible to a broad audience of computer scientists.
