As AI models are released into the wild, Sharon Li wants to ensure they’re safe.
CS professor Sharon Li has been named MIT Technology Review’s Innovator of the Year! The MIT Technology Review story, which is a great read, describes Li’s research and why she was chosen Innovator of the Year:
Her approach embraces uncertainty by using machine learning to detect unknown data out in the world and design AI models to adjust to it on the fly. Out-of-distribution detection could help prevent accidents when autonomous cars run into unfamiliar objects on the road, or make medical AI systems more useful in finding a new disease. “In all those situations, what we really need [is a] safety-aware machine learning model that’s able to identify what it doesn’t know,” says Li.
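To make the idea of "identifying what it doesn't know" concrete, here is a minimal sketch in Python (PyTorch) of post-hoc out-of-distribution scoring on a classifier's logits. It shows two widely used scores, maximum softmax probability and an energy score; the random logits, the `ood_scores` helper, and the threshold are illustrative placeholders, not Li's specific method or code.

```python
# A minimal sketch of post-hoc OOD scoring from classifier logits.
# The model outputs, helper name, and threshold below are illustrative only.
import torch
import torch.nn.functional as F


def ood_scores(logits: torch.Tensor) -> dict:
    """Compute two common post-hoc OOD scores from classifier logits.

    Higher scores suggest the input looks more like in-distribution data.
    """
    # Maximum softmax probability: confidence of the predicted class.
    msp = F.softmax(logits, dim=-1).max(dim=-1).values
    # Energy-style score: log-sum-exp over the logits.
    energy = torch.logsumexp(logits, dim=-1)
    return {"msp": msp, "energy": energy}


if __name__ == "__main__":
    # Stand-in for model(x) on a batch of 4 inputs over 10 classes.
    logits = torch.randn(4, 10)
    scores = ood_scores(logits)
    # A deployed system would calibrate this threshold on held-out
    # in-distribution data; 0.5 here is a hypothetical value.
    threshold = 0.5
    flagged_as_ood = scores["msp"] < threshold
    print(scores["msp"], flagged_as_ood)
```

In practice, inputs that score below a calibrated threshold would be flagged for fallback handling or human review rather than acted on blindly, which is the behavior the excerpt above describes.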
We asked Li to expand on her inspiration, her team at UW-Madison, how she uses her research in teaching, and what she’s working on next. Below, read Li’s words about the exhilaration of research and the persistence that’s required, her journey “into the realm of out-of-distribution detection,” and the next steps she’s taking in her research.
Q&A
What inspired you to take this on? Were there any specific occurrences or events? Or did your previous research lead you to this topic? Or both?
My journey into the realm of out-of-distribution (OOD) detection began in early 2017, when the majority of the AI research community was primarily focused on achieving higher in-distribution accuracy. While improving accuracy within known data distributions is undoubtedly important, I noticed a significant blind spot in the field — the lack of attention given to the critical issue of detecting and handling data that falls outside of those familiar distributions. This realization was the catalyst for my interest in exploring this research direction, and it led to one of the first papers in the field to tackle the OOD detection problem for deep neural networks. I believed that understanding and addressing this challenge was crucial for the real-world applicability and deployment of machine learning models.
My perspective on this issue was further reinforced during my time working in industry. Working with real-world machine learning models, I witnessed firsthand the challenges and potential risks associated with models that could not effectively detect out-of-distribution data. The consequences of these blind spots became increasingly evident, highlighting the urgency of the problem.
Looking back, it has taken almost six years of continuous effort for this research direction to gain the attention it deserves. When I started, very few people in the research community were aware of this problem; now, many researchers recognize its importance and have started working on it actively. This journey has taught me the value of persistence and the transformative power of innovation in addressing pressing challenges.
Do you have a team of UW-Madison researchers and graduate students working with you?
Absolutely. I am fortunate to have a dedicated team of brilliant UW-Madison researchers and graduate students working alongside me on this exciting project. Their passion, dedication, and innovative thinking have been instrumental in driving our research forward and have truly inspired many of the breakthroughs we’ve achieved. And this special honor should really be shared with all of my students. I am deeply grateful that they joined the journey and pursued the dream with me.
Will you use this research in your courses?
Yes, I have already integrated the findings and insights from this research into my graduate course, Advanced Deep Learning. AI safety and reliability have been major themes of this course since its inception in the fall of 2020, and every year I update the course material with the latest research advancements. It is essential to bridge the gap between cutting-edge research and classroom teaching, and this research provides an excellent opportunity to do so.
By incorporating the latest developments and breakthroughs from our research into the course curriculum, I aim to provide a unique learning experience for students at UW-Madison. They not only gain exposure to state-of-the-art techniques but also have the opportunity to produce exciting research on their own. Notably, multiple students in my course have contributed to this research area and have had their work published at top AI conferences. These student-authored papers demonstrate the caliber of talent and dedication within our program and further enrich the course material. I look forward to witnessing the growth and success of my students as they engage with this exciting research in the context of their coursework.
Did you have a breakthrough? Or was it a slow and methodical process that led you here? Or both?
My journey in this research field has been a combination of both breakthrough moments and a slow, methodical process. There have been pivotal moments when we made significant discoveries or developed innovative solutions that marked breakthroughs in our research. These moments were exhilarating and played a crucial role in shaping our direction and fueling our subsequent explorations. However, it’s equally important to acknowledge that these moments often build upon a strong foundation of persistent and methodical effort over the years. Research, especially in the complex and evolving field of AI, often involves patiently exploring different avenues, refining hypotheses, and continuously learning from both successes and setbacks.
Can you share what your next steps in this research are?
Our research has reached an exciting juncture, and our next steps will focus on deepening the theoretical foundations of OOD detection and addressing the AI safety challenges arising from the emergence of powerful models, such as large language models. These models are trained on massive amounts of data and are being deployed in a range of new online applications and products, including general-purpose assistants. They can amplify safety concerns at scale, with potentially profound adverse impacts on our society. Thus, it's imperative that we not only push the boundaries of performance but also prioritize safety considerations. My goal is to understand the safety problems related to out-of-distribution data, hallucination, alignment with human preferences, and uncertainty quantification in the context of these massive AI models.