One way to provide deeper insight into data is to reason about the underlying causal process that produced it. I'll present model-based approaches for discovering and managing language data that incorporate rich causal structure in novel ways. First, I'll describe a new approach to automatic text summarization that incorporates syntactic structure into a decision process that learns from human summaries. Second, I'll describe an approach to historical document recognition that uses a statistical model of the historical printing press to reason about images, and, as a result, is able to decipher historical documents in an unsupervised fashion. I'll hint at how similar approaches can be used for a range of other problems and types of data.
Taylor Berg-Kirkpatrick is a PhD candidate in computer science at the University of California, Berkeley. He works with Professor Dan Klein on using machine learning to understand structured human data, including language but also sources like music, document images, and other complex artifacts. Taylor completed his undergraduate degree in mathematics and computer science at Berkeley as well, where he won the departmental Dorothea Klumpke Roberts Prize in mathematics. As a graduate student, Taylor has received both the Qualcomm Innovation Fellowship and the National Science Foundation Graduate Research Fellowship.