Studies have shown that over 85% of the information people and organizations deal with is unstructured, and that the vast majority of it is text. Many techniques are needed to enable intelligent access to this information and to transform it into forms that allow sensible use. The fundamental issue all these techniques have to address is that of semantics: there is a need to move toward understanding the text at an appropriate level of abstraction, beyond the word level, in order to support access, knowledge extraction and synthesis. I will discuss two key issues: (1) From Data to Meaning: There are several dimensions of text understanding that can facilitate access to information and the extraction of knowledge from unstructured text, and (2) Trustworthiness of Information: While we can locate and extract information quite reliably, we lack ready means to determine whether we should actually believe it.
In particular, I will describe some of our research on developing machine learning and inference methods in pursuit of understanding natural language text, and point to some of the key challenges, including those of learning models with indirect supervision, knowledge acquisition, and reasoning.
Dan Roth is the Founder Professor of Engineering at the University of Illinois at Urbana-Champaign. He is a Professor in the Computer Science Department and the Beckman Institute, and also holds positions in Statistics, Linguistics, ECE, and the iSchool at Illinois. He received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
Roth is a Fellow of the American Association for the Advancement of Science (AAAS), the Association for Computing Machinery (ACM), the Association for the Advancement of Artificial Intelligence (AAAI), and the Association for Computational Linguistics (ACL), for his contributions to Machine Learning and to Natural Language Processing. Roth is the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). He was the program chair of AAAI'11, ACL'03 and CoNLL'02.
He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine-learning-based tools for natural language applications that are widely used both by the research community and commercially.