Multi-task Learning for in silico Prediction of Chromatin Contact Counts

Tuesday, March 28, 2017, 4:00pm to 5:00pm
Room 1360 Biotechnology Center, 425 Henry Mall

Speaker Name: 

Deborah Chasman

Speaker Institution: 

CIBM Postdoctoral Fellow, Wisconsin Institute for Discovery, Laboratory of Sushmita Roy, UW-Madison

Cookies: 

No

Description: 

Regulatory sequence elements such as enhancers can regulate the expression level of a gene hundreds of kilobases away through chromosomal looping. Such long-range regulatory interactions are emerging as important determinants of tissue-specific expression, of the interpretation of regulatory variation, and of disease. However, identifying long-range interactions genome-wide is a significant challenge, especially in new biological contexts such as rare cell types and non-model species. Because three-dimensional chromatin contact data are available for only a limited number of contexts, accurate predictive models are needed to identify distal regulatory links. Our group and others have developed machine learning approaches that either perform binary classification of chromatin interactions (Roy et al. 2015; Whalen et al. 2016; He et al. 2014) or predict contact counts (Zhang et al., unpublished; Chen et al. 2016) from one-dimensional regulatory genomic features such as chromatin marks, architectural and transcription factor proteins, and chromatin accessibility. However, current approaches are less accurate when predicting interactions in a new cell line than when predicting interactions for held-out test pairs in the same cell line. We are developing novel computational methods based on multi-task learning that integrate information from multiple cell types and from multiple high-throughput chromosome conformation capture (3C)-based technologies to improve the prediction of chromatin interactions in new cell types. Our preliminary results, using simple ensembles and off-the-shelf multi-task learning (MTL) methods trained on other cell lines, suggest that integrating data from multiple cell lines is beneficial, and that newer MTL frameworks tailored specifically to our problem domain could significantly improve prediction accuracy.