Machine learning of biological networks from gene knockout data and selection of gene knockout experiments by active machine learning

Tuesday, February 9, 2016, 4:00pm to 5:00pm
Room 1360 Biotechnology Center, 425 Henry Mall

Speaker Name: Yuriy Sverchkov

Speaker Institution: Dept. of Biostatistics and Medical Informatics, UW-Madison

Recent advances in biology and medicine have made clear the importance of network models for capturing knowledge in systems biology and for applying this knowledge to immunology, virology, metabolomics, cancer genetics, and other areas. Biological networks include regulatory networks, metabolic networks, protein-protein interaction networks, and their combinations. Current knowledge about the exact structure of these networks is often incomplete, because the number of entities involved (genes, proteins, etc.) is often quite large, and it is difficult and expensive to determine interactions among the entities experimentally. This creates a challenge both for inferring these networks from data and for designing informative experiments that further refine our models. Experiments therefore need to be selected efficiently and cost-effectively to gain this knowledge about the models and further our understanding of biology.

A common approach to uncovering biological networks involves performing perturbations on elements of the network and observing how each perturbation affects some measurable product (reporter) of the process under study. In this talk I focus on knockout/knockdown experiments, where we observe the effect of removing or deactivating a gene. I discuss machine learning approaches to learning network structure from such data, and an active machine learning approach for systematically selecting informative future experiments.

In particular, I present a novel active learning approach for selecting batches of double-knockout experiments based on an existing network inference method. I also present a new method for network inference that learns network structures in which two genes may be in the same pathway when viewed upstream of one reporter, but in different pathways upstream of another. This reflects the biological phenomenon that different sets of molecules might be available in different pools in the cell. It departs from the traditional network learning paradigm, in which a single structure is meant to apply to all contexts.