The notes below begin with the June 2024 Good News. For notes previous to June 2024, visit the Good News list.
August 2025
[1] https://www.vldb.org/awards_early_career.html
This award recognizes a researcher who has demonstrated research impact through a specific technical contribution of high significance since completing the Ph.D. The award is given for a specific technical contribution and not for the researcher’s cumulative contributions across diverse topics. The technical contribution must have been made after completing the Ph.D. work. In other words, research prior to obtaining the Ph.D. cannot be used towards this award. The candidate should have been awarded the Ph.D. no earlier than June 15, 8 years ago.
[2] https://www.ams.org/meetings/sectional/2322_program_ss25.html#title
This session welcomes talks in any area related to the work of Eric Bach. This primarily includes number theory and algorithms and related areas, but anything connected to Professor Bach’s work is welcome.
[3] “CAREER: Game-theoretic Foundations for Incentive-aware Data Sharing and Collaborative Machine Learning”
Kirthi Kandasamy
https://www.nsf.gov/funding/opportunities/career-faculty-early-career-development-program
Abstract: With recent advances in artificial intelligence, organizations increasingly recognize data as a vital resource for advancing scientific, economic, and societal progress. Many modern machine learning systems rely on large and complex datasets to learn patterns, generate insights, create new content, and support intelligent decision-making. However, the data needed to develop these models is often scattered across different organizations, with each holding only a portion of the necessary information. By sharing data with each other, instead of relying only on their own data, organizations can unlock new opportunities for discovery and innovation. Despite this potential, organizations often hesitate to share data unless there are clear incentives to do so. This raises important questions: Is it always beneficial for an organization to share data? Do participants receive benefits that reflect their contributions? What mechanisms can encourage data sharing while preventing participants from benefiting without contributing? When several parties collaborate to train a shared model, how should the benefits be allocated? This project addresses these questions by investigating the incentives that influence participation in data sharing and collaborative machine learning. The project aims to create environments where data sharing supports collaboration and innovation, ultimately enabling breakthroughs in areas such as health care, public policy, and education.
This project will develop new theoretical foundations for the design of protocols that govern data sharing and collaborative machine learning, focusing on two central incentive-related challenges: the balanced allocation of responsibilities and benefits, and reducing the potential for strategic behavior. The research will design protocols that assign data collection responsibilities in a balanced way, allocate shared data in line with contributions, and ensure that exchanges among competing participants are mutually beneficial. These protocols will also be designed to prevent strategic behaviors that exploit the system, such as avoiding data collection, withholding contributions, or providing incorrect data. The project will further explore multi-round collaborations, where participants exchange data over time, adapt their strategies, and operate under uncertainty about the value of others’ data. The proposed methods will be evaluated through simulations and partnerships with Alzheimer’s disease research consortia. The project is expected to contribute broadly to the fields of machine learning and game theory. Education efforts will focus on the development of new courses, workshops, and research opportunities for undergraduate and high-school students.
[4] Systems, Methods, and Media for Estimating Depths in a Scene using Single-Photon Detectors and Binary Structured Light Patterns
Varun Sundar and Mohit Gupta
Abstract: In accordance with some embodiments, systems, methods, and media for estimating depths in a scene using single-photon detectors and binary structured light patterns are provided. In some embodiments, a system comprises: a light source; an image sensor comprising an array including single-photon detectors; a processor programmed to: cause the light source to emit a sequence of n binary light patterns toward the scene, each comprising at least L columns, each of the patterns having a minimum stripe width of 8 columns, where n > log2(L); cause, for each of the n binary light patterns, the image sensor to capture a binary image of the scene, the binary image comprising a binary value for each of the single-photon detectors such that each of the single-photon detectors generates a set of binary values; and estimate, for each single-photon detector, a depth value based on the set of binary values.
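To make the decoding idea concrete, here is a minimal, hedged Python sketch (not the patented method): project roughly log2(L) binary column patterns, treat each single-photon detector's sequence of binary values as a code word for a projector column, and triangulate depth from the projector-camera disparity. The plain binary code, the camera/projector parameters, and all names below are illustrative assumptions.

```python
import numpy as np

L = 1024                          # projector columns
n = int(np.ceil(np.log2(L)))      # n binary patterns suffice to index L columns

def patterns(L, n):
    """Pattern k is bright wherever bit k of the column index is set."""
    cols = np.arange(L)
    return [((cols >> k) & 1).astype(np.uint8) for k in range(n)]

def decode_column(bits):
    """Recover the projector column index from one detector's n binary values."""
    return sum(int(b) << k for k, b in enumerate(bits))

def depth_from_disparity(col_proj, col_cam, baseline=0.1, focal_px=1400.0):
    """Simple triangulation: depth is inversely proportional to disparity."""
    disparity = abs(col_proj - col_cam) + 1e-6
    return baseline * focal_px / disparity

# Example: a detector viewing projector column 412 from camera column 300.
pats = patterns(L, n)
bits = [int(p[412]) for p in pats]
print(depth_from_disparity(decode_column(bits), col_cam=300))
```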
Systems, methods, and media for high dynamic range imaging using single-photon and conventional image sensor data
Felipe Gutierrez Barragan, YuHao Liu, Atul Ingle, Mohit Gupta, Andreas Velten
Abstract: In accordance with some embodiments, systems, methods, and media for high dynamic range imaging using single-photon and conventional image sensor data are provided. In some embodiments, the system comprises: first detectors configured to detect a level of photons proportional to incident photon flux; second detectors configured to detect arrival of individual photons; a processor programmed to: receive, from the first detectors, first values indicative of photon flux from a scene with a first resolution; receive, from the second detectors, second values indicative of photon flux from the scene with a lower resolution; provide, to a first encoder of a trained machine learning model, first flux values based on the first values; provide, to a second encoder of the model, second flux values based on the second values; receive, as output, third values indicative of photon flux from the scene; and generate a high dynamic range image based on the third values.
[5] https://operations.access-ci.org/step
This program is intended for undergraduates who are interested in pursuing careers in cyberinfrastructure! Our goal is to provide students with exposure to these careers in an effort to grow the cyberinfrastructure workforce.
https://nationalsciencedatafabric.org/nsdf-ahm-2025-01
Abstract: The PATh project offers a variety of distributed data handling services. Researchers who place their workloads at PATh Access Points leverage these services to bring input data to their applications and to ship results back to archival sites. They also employ them to copy the applications, in most cases in the form of container images, to execution sites provided by the Open Science Pool (OSPool). Owners of Object Stores who federate datasets they host via the Open Science Data Federation (OSDF) use these services to make locally stored objects available to remote consumers for frequent reuse.
In the past 12 months PATh services have performed more than 3B Object transfers across more than 100 institutions. The distributed nature of the infrastructure requires objects to be copied across institution boundaries and to be buffered in memory or disk at multiple locations. These operations clearly introduce concerns about data integrity and access control. Do the researchers and Object providers who use these services believe that the PATh services are trustworthy?
The talk will review our ongoing effort to develop data handling services that we believe to be trustworthy. A key element of our current development work is transitioning our software tools and infrastructure into capability-based authorization. We hope that facilitating trust relationships through capabilities will help researchers and data providers to entrust their objects with the PATh services. Namely, to believe that the OSDF and the OSPool data handling services are trustworthy.
[6] GRC Annual Review, Celebration & Transition to SMART USA
https://www.src.org/program/grc/esh/events/
“Evaluating Accelerator Performance Under Advanced Cooling”
Authors: Akanksha Chaudhari & Matthew D. Sinclair
Abstract: Escalating compute demands are driving power-dense heterogeneous architectures, making advanced cooling essential. Moreover, modern computing systems are becoming increasingly heterogeneous, yet simulation and modeling tools struggle to effectively model these systems. In this work, we have added support to the state-of-the-art gem5 simulator to model systems with CPUs, GPUs, and accelerators. This enables their early-stage exploration through high-fidelity simulation. Then, we demonstrate the value of this support by studying the amenability of these accelerators to advanced cooling techniques.
“Simulating and Modeling Hardware for Machine Learning Workloads at Scale”
Authors: Akanksha Chaudhari, Vishnu Ramadas, & Matthew D. Sinclair
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-stage system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (e.g., Tensor Cores), remains an important research challenge. In particular, modeling energy in a way that is flexible and can be extended both to different types of programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
[7] Prof Abedia led a measurement campaign at the VLA telescope in New Mexico, which conducted the first-ever active transmission experiment at the site. This was a joint effort between NRAO, NTIA, and SpectrumX, with me serving as the lead PI for the experiment on behalf of SpectrumX.
Summary: The 7.125–8.4 GHz band has recently emerged as a promising candidate for future 6G mobile networks due to its favorable propagation characteristics and global harmonization potential. Although not formally protected for radio astronomy, this band is currently one of the cleanest available and plays a critical role in high-sensitivity observations of short-duration and non-thermal cosmic phenomena. Given the extreme sensitivity of radio telescopes, even low-power mobile devices operating near observatories can introduce interference, potentially compromising scientific data. To investigate this, we conducted the first-ever active transmission experiment at the Very Large Array (VLA), using a custom-built platform to emulate 6G-like signals and study their impact on real astronomical observations. To explore potential mitigation strategies, we also introduced a narrow-band beacon signal to assist with interference detection, particularly when interference is near or below the telescope’s noise floor. This work aims to inform future spectrum-sharing efforts by providing experimental insights into both the risks and opportunities associated with the coexistence of 6G networks and radio astronomy.
[8] “Take The Long Way Home – Distant Peering to the Cloud”
Esteban Carisimo, Mia Weaver, Fabian Bustamante and Paul Barford
To Appear in IEEE/ACM Transactions on Networking.
Abstract: The emergence of large cloud providers in the last decade has transformed the Internet, resulting in a seemingly ever-growing set of datacenters, points of presence, and network peers. Despite the availability of closer peering locations, some networks continue to peer with cloud providers at distant locations, traveling thousands of kilometers. In this paper, we employ a novel cloud-based traceroute campaign to characterize the distances networks travel to peer with the cloud. This unique approach allows us to gain unprecedented insights into the peering patterns of networks. Our findings reveal that 50% of the networks peer within 300 kilometers of the nearest datacenter. However, our analysis also reveals that over 20% of networks travel at least 6,700 kilometers beyond the proximity of the nearest computing facility, and some as much as 18,791 kilometers! While these networks connect with the cloud worldwide, from South America to Europe and Asia, many come to peer with cloud providers in North America, even from Oceania and Asia. We explore possible motivations for the persistence of distant peering, discussing factors such as cost-effective routes, enhanced peering opportunities, and access to exclusive content.
[9] “LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics”
Xiangpeng Hao, Andrew Lamb, Yibo Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
VLDB
We present LiquidCache, a novel pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes. Our key observation is that data decoding, not filter evaluation, is the primary CPU bottleneck in existing systems. To address this challenge, we propose caching logical data rather than its physical representation by transcoding upstream files on-the-fly into an optimized “Liquid” format. This format is co-designed with filter evaluation semantics to enable selective decoding, late materialization, and encoding-aware filter evaluation, delivering efficient filter evaluation while preserving compression benefits. The “Liquid” format exists exclusively in the cache, allowing easy adoption without changing existing storage formats, and enabling LiquidCache itself to evolve and incorporate future encodings while maintaining ecosystem compatibility. Through integration with Apache DataFusion and evaluation with ClickBench, we demonstrate that LiquidCache achieves up to 10× lower cache server CPU usage compared to state-of-the-art systems without increasing memory footprint.
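As an illustration of the pushdown idea (a hedged sketch, not the LiquidCache implementation; the class and method names are hypothetical): the cache keeps data in a decoded, filter-friendly form and evaluates predicates locally, so only matching rows cross the network to compute nodes.

```python
import pyarrow as pa
import pyarrow.compute as pc

class PushdownCache:
    def __init__(self):
        self._tables = {}  # file id -> decoded Arrow table standing in for "logical" data

    def load(self, file_id, table: pa.Table):
        # LiquidCache would transcode the upstream file on-the-fly into its
        # "Liquid" format; here we simply keep a decoded Arrow table.
        self._tables[file_id] = table

    def scan(self, file_id, column, op, value):
        # Filter evaluated on the cache server: decoding work stays here,
        # and only the selected rows are shipped to the compute node.
        table = self._tables[file_id]
        mask = {"<": pc.less, ">": pc.greater, "==": pc.equal}[op](table[column], value)
        return table.filter(mask)

cache = PushdownCache()
cache.load("hits.parquet", pa.table({"user": [1, 2, 3, 4], "latency_ms": [12, 250, 8, 90]}))
print(cache.scan("hits.parquet", "latency_ms", "<", 100).num_rows)  # 3 rows shipped
```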
“PANGOLIN: a Comprehensive Testing Framework for Configuration-Rich Key-Value Stores”
Shaohua Duan, Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
ACM SYSTOR
In this paper, we present PANGOLIN, a comprehensive testing framework for configuration-rich key-value stores. To better understand bugs in modern key-value stores and to distill domain knowledge for efficiently identifying new ones, we first comprehensively study historical bugs in five mature key-value stores over the last eight years. We then design and implement PANGOLIN, motivated by insights from our bug study indicating that most bugs can be identified by systematically testing a small sequence of operations and configurations. Specifically, PANGOLIN puts these insights into practice by introducing a bounded testing strategy into a spectrum of black-box and fuzzing test procedures. Finally, we use PANGOLIN to find 20 bugs and reproduce 443 historical bugs in five mature key-value stores (RocksDB, LevelDB, HyperLevelDB, BadgerDB, and Redis), making it an attractive supplement to handwritten test suites.
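The bounded-testing strategy can be pictured with a small, hedged sketch (illustrative only, not PANGOLIN itself): enumerate every short sequence of store operations under every combination of a few configuration knobs, and check a simple invariant after each run. The toy in-memory store and the specific knobs are assumptions.

```python
import itertools

OPS = ["put", "get", "delete", "flush"]
CONFIGS = {"compression": [True, False], "sync_writes": [True, False]}

def run_sequence(config, ops):
    """A toy key-value store standing in for the system under test."""
    store, log = {}, []
    for op in ops:
        if op == "put":
            store["k"] = ("value", config)
        elif op == "get":
            log.append(store.get("k"))
        elif op == "delete":
            store.pop("k", None)
        elif op == "flush":
            pass  # no-op in this toy model
    return store, log

def bounded_test(max_len=3):
    keys = list(CONFIGS)
    for values in itertools.product(*CONFIGS.values()):
        config = dict(zip(keys, values))
        for length in range(1, max_len + 1):
            for ops in itertools.product(OPS, repeat=length):
                store, _ = run_sequence(config, ops)
                # Invariant: a key deleted by the final operation must be gone.
                if ops[-1] == "delete":
                    assert "k" not in store, (config, ops)

bounded_test()
print("all bounded operation/configuration sequences passed")
```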
[10] https://cdis.wisc.edu/preparing-students-for-the-ai-era-cdis-and-openai-launch-sail/
As AI continues to reshape the global job market, universities are developing new programs to help students navigate the changing landscape. At the University of Wisconsin–Madison, the School of Computer, Data & Information Sciences (CDIS) launched a forward-looking initiative, supported by OpenAI, designed to prepare students to lead in the AI era.
Read the rest at the link above.
[11] “Switched or Switchless: An Empirical Study of SAN Architecture for Disaggregated Storage”
Chendong Wang, Joontaek Oh, and Ming Liu
NSDI ‘26
Disaggregated storage is a pivotal component of today’s cluster infrastructures. With the advent of high-bandwidth server interconnects and new NVMe form factors, commodity storage appliances are becoming denser, delivering tens of millions of IOPS. This calls for today’s storage area network (SAN) fabric to expand the bandwidth capacity drastically. Industry practices tackle this issue via either (i) a scale-up approach, upgrading the per-port bandwidth in a switched SAN; or (ii) a scale-out strategy, integrating more paths in a switchless SAN. However, it is unclear which network architecture is more suitable for scaling storage disaggregation.
This paper presents an empirical study that compares a switched and a switchless SAN from several angles. We first design an experimental methodology – consisting of small-scale real-system prototypes and large-scale simulations – that provides adequate flexibility when exploring SAN architecture. Next, we characterize NVMe-oF I/O flows and co-design the SAN traffic control with these characteristics to optimize I/O transmission performance in both architectures. Our evaluations draw several findings and show that the switchless SAN offers several advantages. For example, it delivers throughput on par with the switched SAN, despite traversing more routing hops, and with lower latency, because its multiple load-aware I/O paths mitigate I/O interference. Further, the switchless SAN saves capital costs by replacing expensive high-radix switches, scales with heterogeneous I/O flows, and eliminates the single point of failure at the ToR switch.
[12] Papers to appear in Science Robotics.
These are in collaboration with people at U Chicago and U Illinois Chicago.
Wright, L., Vegesna, P., Michaelis, J., Mutlu, B., and Sebo, S. Robotic Reading Companions Can Mitigate Oral Reading Anxiety in Children. Science Robotics, In press.
Michaelis, J., & Mutlu, B., How can educational robots enhance family life? Through careful integration. Science Robotics, In press.
[13] “Choreographic Quick Changes: First-Class Location (Set) Polymorphism”
Ashley Samuelson, Andrew K. Hirsch, Ethan Cecchetti
OOPSLA
Choreographic programming is a promising new paradigm for programming concurrent systems where a developer writes a single centralized program that compiles to individual programs for each node. Existing choreographic languages, however, lack critical features integral to modern systems, like the ability of one node to dynamically compute who should perform a computation and send that decision to others. This work addresses this gap with λQC, the first typed choreographic language with first-class process names and polymorphism over both types and (sets of) locations. λQC also improves expressive power over previous work by supporting algebraic and recursive data types as well as multiply-located values. We formalize and mechanically verify our results in Rocq, including the standard choreographic guarantee of deadlock freedom.
Paper details on arxiv: https://arxiv.org/abs/2506.10913
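For readers unfamiliar with the paradigm, the sketch below is a hedged, self-contained illustration of the core idea (one centralized program with explicit locations); it is plain Python, not λQC syntax, and a real choreography compiler would project per-node programs rather than run everything globally as this toy interpreter does. The bookstore protocol and all names are assumptions.

```python
def choreography(world):
    # One program, written from the global viewpoint, with explicit locations.
    title = world.at("buyer", lambda: "Homotopy Type Theory")
    title = world.send("buyer", "seller", title)
    price = world.at("seller", lambda: 120 if "Theory" in title else 40)
    price = world.send("seller", "buyer", price)
    # The buyer dynamically computes a decision and communicates it to the seller.
    decision = world.at("buyer", lambda: price <= 150)
    return world.send("buyer", "seller", decision)

class GlobalRun:
    """Executes the choreography at all locations at once, for illustration only."""
    def at(self, role, thunk):
        value = thunk()
        print(f"{role} computes {value!r}")
        return value

    def send(self, src, dst, value):
        print(f"{src} -> {dst}: {value!r}")
        return value

print("sale completed:", choreography(GlobalRun()))
```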
[14] Collaborative Research: CIRC: ENS: Enabling Detailed, Open-Source Accelerator Modeling
Prof Matt Sinclair and colleagues
Abstract: The rapid end of Moore’s Law and Dennard scaling has driven computing systems — from smartphones to supercomputers — to embrace heterogeneous architectures for continued efficiency gains. As specialized, compute-intensive workloads such as machine learning become increasingly prominent, there is a critical need for accurate, open-source simulation tools to model and evaluate the next generation of hardware accelerators. However, the pace of innovation in accelerator architectures, particularly graphics processing units (GPUs), has outstripped the capabilities of existing public simulation frameworks, limiting the research community’s ability to explore new ideas and validate results. This project addresses these challenges by enhancing the widely used Accel-Sim simulation infrastructure, enabling detailed, validated modeling of modern and future accelerators. The proposed enhancements will empower a broad community of researchers — including those from underrepresented groups — to advance innovations in computer architecture, improve system efficiency, and support the development of emerging applications that rely on high-performance accelerators.
This award will significantly extend Accel-Sim’s capabilities through three major technical thrusts. First, the project will modernize and expand Accel-Sim’s performance and energy models to support the latest GPU architectures (including NVIDIA’s Ampere, Hopper, and Blackwell), incorporating features such as transformer engines, sparse tensor cores, and support for asynchronous execution. Second, the project will broaden the diversity of accelerators and workloads modeled by Accel-Sim, adding support for GPUs from additional vendors (such as AMD and Intel) and integrating with broader system simulation frameworks like gem5. Third, the project will develop advanced workload sampling and telescopic level-of-detail modeling to enable scalable, accurate simulation of long-running, compute-heavy workloads. These enhancements will be delivered as robust, open-source tools, accompanied by extensive documentation, community outreach, and training resources — including tutorials and summer boot camps — to ensure broad accessibility and long-term sustainability. Collectively, these efforts will provide the research community with essential infrastructure to drive the next decade of accelerator innovation and foster a more inclusive and collaborative ecosystem for computer systems research.
[15] In July, a team of students and Prof Josiah Hanna participated in the international RoboCup competition and placed 3rd among the 12 teams invited to participate in the standard platform soccer competition in Salvador, Brazil. Two Wisconsin graduate students, Adam Labiosa and Abhinav Harish, led the effort at the competition, supported by many contributions from several other undergraduate and graduate students over the preceding months.
[16] Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
Alex Clinton, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy
ICML 2025
https://arxiv.org/abs/2407.15881
Abstract: We study a collaborative learning problem where m agents aim to estimate a vector µ = (µ₁, …, µ_d) ∈ ℝ^d by sampling from associated univariate normal distributions {N(µ_k, σ²)}_{k∈[d]}. Agent i incurs a cost c_{i,k} to sample from N(µ_k, σ²). Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error. We design a mechanism to facilitate such collaboration, while addressing two key challenges: ensuring individually rational (IR) and fair outcomes so all agents benefit, and preventing strategic behavior (e.g., non-collection, data fabrication) to avoid socially undesirable outcomes. We design a mechanism and an associated Nash equilibrium (NE) which minimizes the social penalty (the sum of agents’ estimation errors and collection costs) while being IR for all agents. We achieve an O(√m)-approximation to the minimum social penalty in the worst case and an O(1)-approximation under favorable conditions. Additionally, we establish three hardness results: no nontrivial mechanism (i) guarantees a dominant strategy equilibrium where agents report truthfully, (ii) is IR for every strategy profile of other agents, or (iii) avoids a worst-case Ω(√m) price of stability in any NE. Finally, by integrating concepts from axiomatic bargaining, we demonstrate that our mechanism supports fairer outcomes than one which minimizes social penalty.
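A hedged rendering of the objective described in the abstract, with notation assumed for illustration: agent i draws n_{i,k} samples of N(µ_k, σ²) at per-sample cost c_{i,k} and forms an estimate of µ, and the mechanism seeks to minimize the social penalty, the sum of estimation errors and collection costs:

```latex
\[
  \text{SocialPenalty}\bigl(\{n_{i,k}\}\bigr)
  \;=\; \sum_{i=1}^{m} \mathbb{E}\!\left[\bigl\lVert \hat{\mu}^{(i)} - \mu \bigr\rVert_2^2\right]
  \;+\; \sum_{i=1}^{m} \sum_{k=1}^{d} c_{i,k}\, n_{i,k},
\]
where $\hat{\mu}^{(i)}$ is agent $i$'s estimate after the data exchange and $n_{i,k}$ is the number of samples agent $i$ collects from $N(\mu_k, \sigma^2)$.
```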
[17] “Pretrained Hybrids with MAD Skills”
Nicholas Roberts, Samuel Guo, Zhiqi Gao, Srinath Namburi, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala
Conference on Language Modeling (COLM), 2025
Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, a framework that addresses these challenges. Manticore automates the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then finetune hybrids that combine pretrained models from different architecture families—such as the GPT series and Mamba—end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to program pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.
https://arxiv.org/abs/2406.00894
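To make the projector idea concrete, here is a hedged PyTorch-style sketch (illustrative, not the Manticore code): small linear layers translate features between blocks drawn from different architectures, and learnable mixture weights, in the spirit of differentiable NAS, decide how much each block contributes. The dimensions and stand-in blocks are assumptions.

```python
import torch
import torch.nn as nn

class HybridLayer(nn.Module):
    def __init__(self, block_a: nn.Module, block_b: nn.Module, dim_a: int, dim_b: int):
        super().__init__()
        self.block_a, self.block_b = block_a, block_b  # e.g. a Transformer block and an SSM block
        self.proj_in = nn.Linear(dim_a, dim_b)         # translate features A -> B
        self.proj_out = nn.Linear(dim_b, dim_a)        # and back B -> A
        self.alpha = nn.Parameter(torch.zeros(2))      # differentiable mixture weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        out_a = self.block_a(x)
        out_b = self.proj_out(self.block_b(self.proj_in(x)))
        return w[0] * out_a + w[1] * out_b

# Toy usage with stand-in blocks (real pretrained blocks would be plugged in instead).
layer = HybridLayer(nn.Linear(64, 64), nn.Linear(32, 32), dim_a=64, dim_b=32)
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```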
“Theoretical Physics Benchmark (TPBench) – a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics”
Daniel J.H. Chung, Zhiqi Gao, Yurii Kvasiuk, Tianyi Li, Moritz Münchmeyer, Maja Rudolph, Frederic Sala, Sai Chaitanya Tadepalli
Machine Learning: Science and Technology 2025
Abstract: We introduce a benchmark to evaluate the capability of AI to solve problems in theoretical physics, focusing on high-energy theory and cosmology. The first iteration of our benchmark consists of 57 problems of varying difficulty, from undergraduate to research level. These problems are novel in the sense that they do not come from public problem collections. We evaluate our data set on various open and closed language models, including o3-mini, o1, DeepSeek-R1, GPT-4o, and versions of Llama and Qwen. While we find impressive progress in model performance with the most recent models, our research-level difficulty problems are mostly unsolved. We address challenges of auto-verifiability and grading, and discuss common failure modes. While current state-of-the-art models are still of limited use for researchers, our results show that AI-assisted theoretical physics research may become possible in the near future. We discuss the main obstacles towards this goal and possible strategies to overcome them. The public problems and solutions, results for various models, and updates to the data set and score distribution, are available on the website of the dataset.
https://arxiv.org/abs/2502.15815
[18] “Deploying RL with Confidence via Active and Offline Policy Evaluation”
Josiah Hanna
IJCAI Early Career Spotlight Invited Talk
Abstract: Recent years have seen a surge of interest in reinforcement learning (RL) as a powerful method for enabling AI agents to learn how to act so as to achieve the goals set by their designers. In practice, a crucial question in RL applications is how to decide when a learned policy is performant enough for deployment and, just as importantly, when a learned policy should not be deployed. In this talk, I will describe recent work from my group on methods that aim to enable RL practitioners to answer this question and thus to enable the use of RL in domains where extensive testing of learned policies is difficult or impossible. I will first talk about a line of work in my group on offline policy evaluation (OPE), or predicting the performance of an untested policy using data from previously used policies. The key novelty in these works is to leverage state abstraction and representation learning to scale OPE methods to more complex domains such as robot control. Then I will discuss a line of work on active data collection for data-efficient evaluation of RL policies. In this line of work, we have shown how to adaptively collect data in order to effectively evaluate an RL policy with as few real-world interactions as possible. Taken together, these lines of work are an important step toward instilling confidence in decision-making trained with RL.
“Thinking is Another Form of Control”
Josiah Hanna
“Most Thought-Provoking” paper award at the Finding the Frame Workshop at the Reinforcement Learning Conference (RLC)
Abstract: Dual process theory divides cognitive processing into a fast, intuitive System 1 and a slow, deliberative System 2. In reinforcement learning (RL), model-free learning, in which the agent takes actions with a reactive policy, is reminiscent of System 1, whereas model-based decision-time planning is reminiscent of System 2. This paper presents the view that deliberative, System 2 behaviors (“thinking”) can be considered a form of mental action that an agent performs before taking an action that influences its external environment. Under this view, we hypothesize that model-free RL alone would be sufficient to produce deliberation if these mental actions ultimately led to higher value actions being selected. We formalize the notion of a controllable “thought” state, then prove conditions under which “thinking” emerges as a strategy for reward maximization, and discuss how large language models serve as a proof-of-concept for thinking as mental action. Finally, we conclude by discussing new opportunities for research on model-free RL agents that learn both to think and act.
[19] “The Internet Visualization Exhibition”
Paul Barford and Loqman Salamatian
To appear at ACM SIGCOMM ’25.
Overview: As the Internet grows more complex, opaque, and deeply embedded in society, visualization plays an increasingly vital role in understanding its structure, behavior, and evolution. From classic time series to immersive interactive maps, visual tools help expose anomalies, reveal hidden geometries, and make abstract patterns tangible. The Internet Visualization Exhibition (IVE) celebrates this creative dimension of Internet measurement and systems research. It invites researchers, practitioners, and artists to share novel visualizations, images, animations, and interactive artifacts that distill insight from data and make Internet infrastructure more legible. IVE aims to foster dialogue between disciplines, promote visual thinking as a research skill, and inspire new ways of representing complexity at Internet scale. For more information, or to submit visuals to be featured in the session, please visit: https://burdantes.github.io/ive
[20] An undergraduate student of Prof Kirthi Kandasamy, Shangyuan (Peter) Yang, received an honorable mention for a CRA Outstanding Undergraduate Researcher Award.
[21] Morgridge Hall opening:
Milwaukee Journal Sentinel: https://www.jsonline.com/story/news/education/2025/09/05/university-of-wisconsin-madison-debuts-new-computer-sciences-building/83793528007/
Milwaukee Journal Sentinel (photo essay): https://www.jsonline.com/picture-gallery/news/2025/09/04/uw-madison-unveils-new-morgridge-hall-on-the-first-day-of-classes/85974982007/
Wisconsin State Journal (PDF attached if you run into the paywall): https://madison.com/news/local/education/university/article_d1b94aab-b39f-433b-86e8-b36303acf0a7.html
(also appeared as the front page story in print)
And on TV! WKOW (ABC affiliate): https://www.wkow.com/news/education/doors-open-for-uw-madisons-new-school-of-computer-data-information-sciences/article_e7ab8248-491d-4f94-bbdb-85d3828a081d.html
[22] Welcome to new professors!
Gavin Brown (dbrown5@wisc.edu) – PhD at Boston U – “I work on machine learning and data privacy. I am interested in understanding when and why machine learning models memorize large amounts of training examples. I also study this topic through the lens of differential privacy.”
Mike Hagenow (mhagenow@wisc.edu) – PhD at UW-Madison – “I work on developing methods for effective human-robot teaming with a focus on contact-rich tasks that are physically demanding on people. My main research interests lie in shared autonomy and robot learning.”
Junjie Hu (jhu@cs.wisc.edu) – PhD at CMU – “I have a broad interest in natural language processing and machine learning. In particular, my research focuses on algorithmic design and fundamental understanding of machine learning models in NLP that enable safe deployment in the wild.”
Misha Khodak (khodak@wisc.edu) – PhD at CMU – “My body of work includes fundamental theory for modern meta-learning (scalable methods that “learn-to-learn” using multiple learning tasks as data) and end-to-end guarantees for learning-augmented algorithms (algorithms that incorporate learned predictions about their instances to improve performance).”
Charles Yuan (charlesyuan@cs.wisc.edu) – PhD at MIT – “I study how to program a quantum computer to practically realize quantum algorithms. My research thus builds a new software stack of languages, libraries, and compilers to manipulate and reason about quantum information.”
Jiawei Zhang (jzhang2924@wisc.edu) – PhD at the Chinese University of Hong Kong – “Research interests include: nonlinear and convex optimization (theory and algorithms); optimization, generalization, and robustness of machine learning, reinforcement learning, and generative models (including diffusion models, large models, foundation models); and data-driven decision-making under uncertainty.”
June/July 2025
[1] https://blog.google/products/google-cloud/ml-systems-junior-faculty-awards/
From the blog post: “As part of our ongoing commitment to the academic research community, we’re announcing a new program, the Google ML and Systems Junior Faculty Awards, which we are presenting today to more than 50 assistant professors in 27 U.S. universities. These professors are leading the analysis, design and implementation of efficient, scalable, secure and trustworthy computing systems. Their work crosses the technology stack, from algorithms to software and hardware, enabling machine learning and cloud computing at increasingly massive scale. The recipients, who were selected by a distinguished group of Google engineers and researchers, will receive grants of $100,000 in unrestricted funding.”
[2] https://www.cs.wisc.edu/2025/07/23/jin-yi-cai-awarded-warf-named-professorship
“Computer Sciences Professor Jin-Yi Cai has been awarded a WARF Named Professorship. The professorship honors “faculty who have made contributions to the advancement of knowledge” through their research, teaching, and service activities. Nine other UW-Madison faculty members received the award, a distinguishing feature of which is that recipients choose the name for their professorship.”
[3] https://morgridge.org/story/htcondor-celebrates-40-years-powered-by-community/
“The high-throughput computing community — including scientists, researchers, developers, administrators and facilitators — gathered at UW–Madison on June 3 to celebrate 40 years of ‘The Condor Project,’ now known as HTCondor.
The brainchild of Miron Livny, lead investigator of research computing at the Morgridge Institute and director of the Center for High-Throughput Computing, HTCondor uses a vast network of computing resources to power a wide range of research projects across the globe.”
Read more at the link above.
[4] ProD SAFe: Programmatic Distillation for Safe and Assured Foundation Models & Robots
Foundation models (FMs) offer great promise for use in robotics in numerous scenarios. However, verification and safety for foundation and large language models is an embryonic field with limited results. We propose performing what we term programmatic distillation for foundation model-powered robots. The idea is to replace distilling into a smaller model with distilling into code—thus enabling safety and assurance guarantees that are otherwise unavailable, all while maintaining the advantages of foundation models. The objective of this proposal is to develop safe and assured robots via programmatic distillation from foundation models.
[5] The paper that appeared in the IFIP Traffic Measurement and Analysis Conference (see below) received the Community Contribution Award (https://tma.ifip.org/2025/awards/).
Calvin Kranig, Eric Pauley, Wei-Shiang Wung, Paul Barford, Mark Crovella, Joel Sommers. “Toward a Representative DNS Data Corpus: A Longitudinal Comparison of Collection Methods”
Abstract: Domain Name System (DNS) records are frequently used to investigate a wide variety of Internet phenomena including topology, malicious activity, resource allocations, and user behavior. The scope and utility of the results of these studies depend intrinsically on the representativeness of the DNS data. In this paper, we compare and contrast five different DNS datasets from four different providers collected over a period of 3 months that in total comprise over 10.1B Fully Qualified Domain Names (FQDNs), of which 3.7B are unique. We process and organize that data into a consistent format that enables it to be efficiently analyzed in Google BigQuery. We begin by reporting the details of the measurement methods and the datasets used in our analysis. We then analyze the relative coverage of each dataset by structural, administrative, and client-behavioral features. We find that while there are significant overlaps in the records provided by each dataset, each also provides unique records not found in the other datasets that are important in different use cases. Our results highlight the opportunities in using these datasets in research and operations, and how combinations of datasets can provide broader and more diverse perspectives.
More conference details: https://tma.ifip.org/2025/accepted-papers/
[6] “Defying Moore: Envisioning the Economics of a Semiconductor Revolution through 12nm Specialization” is in the July print (and now online) issue of Communications of the ACM.
From the author: “It proposes a revolutionary shift in semiconductor design. The research demonstrates that specialized 12nm chips, through architectural innovation like the tiled decoupled control and compute (TDCC) architecture, can achieve performance competitive with, or even surpass, state-of-the-art 7nm and 5nm technology for deep learning. This approach offers significant gains in cost and sustainability, challenging the conventional reliance on advanced technology nodes and opening new pathways for economic productivity in the semiconductor industry.”
Paper: https://dl.acm.org/doi/10.1145/3711920
Abstract: “The semiconductor industry is experiencing a significant transformation, raising questions about the advantages traditionally associated with Moore’s Law and Dennard scaling. This shift highlights four key trends that intersect with technology, economics, and society. The first trend is the perceived end of Moore’s Law. Analysis indicates that the benefits of advances in semiconductor technology—specifically in terms of cost, energy efficiency, and density—are diminishing (see Table 1). This suggests a departure from the era where technological advances consistently delivered substantial economic and performance improvements. The second trend is the reevaluation of government subsidies. In response to these technological challenges, governments worldwide have allocated significant subsidies to support the development of advanced chip-fabrication facilities. The economic justification for these investments, however, is being increasingly scrutinized, especially when considering the complex dynamics of workforce and ecosystem support that can affect the operational success of these facilities, as well as their time to becoming operational.2,36 The third trend is access to semiconductor technology. Recent policy measures have aimed to regulate access to certain semiconductor technologies based on geopolitical considerations.10,23,25 In short, China is being prevented from accessing 7nm and lower technology.a These measures are designed to address concerns over AI technological supremacy, but also must be understood within the context of the broader industry trend toward diminishing returns from further miniaturization. The fourth and final trend is the role of architectural efficiency. Economic and engineering considerations have prompted a renewed focus on architectural innovation. This trend illustrates how, despite the high costs of the latest technology nodes, thoughtful design at more accessible nodes, such as 12nm, can achieve competitive or even superior performance compared with designs at more advanced nodes, such as 5nm.1,33 This approach is also potentially more sustainable, offering reduced carbon footprints without compromising performance.8 Together, these trends underscore a complex landscape where the relationship between technological scaling, economic viability, and environmental sustainability is being redefined, necessitating a nuanced understanding of the semiconductor industry’s future direction.”
[7] The paper “Faster Algorithms for Agnostically Learning Disjunctions and their Implications”
https://arxiv.org/abs/2504.15244
has been selected for the Mark Fulk Award for best student paper at COLT 2025.
From the author: “The paper gives the first improvement, in terms of sample and computational complexity, in (exactly) 20 years for the problem of “agnostically learning disjunctions”. This means that we are trying to learn an OR function of Boolean variables, but the labeled data that we observe are arbitrary—not necessarily consistent with any such function. The goal is to match the accuracy of the “best-fit” function in the class.”
[8] “Probabilistic Point Clouds from Single-Photon LiDARs for Robust 3D Inference”
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin, Andreas Velten, Yin Li, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/1077
Abstract: LiDAR-based 3D sensors provide point clouds, a canonical 3D representation used in various 3D scene understanding tasks. Modern LiDARs face key challenges in various real-world scenarios such as long-distance or low-albedo objects, producing sparse or erroneous point clouds. These errors, which are rooted in the noisy raw LiDAR measurements, get propagated to downstream perception models, resulting in potentially severe loss of accuracy. This is because conventional 3D processing pipelines used to construct point clouds from raw LiDAR measurements do not retain the uncertainty information available in the raw sensor data. We propose a novel 3D scene representation called Probabilistic Point Clouds (PPC) where each point is augmented with a probability attribute that encapsulates the measurement uncertainty (confidence) in raw data. We further introduce inference approaches that leverage PPC for robust 3D object detection; these methods are versatile and can be used as computationally lightweight drop-in modules in 3D inference pipelines. We demonstrate, via both simulations and real captures, that PPC-based 3D inference methods outperform several baselines with LiDAR as well as camera-LiDAR fusion models, across challenging indoor and outdoor scenarios involving small, distant, and low-albedo objects, as well as strong ambient light.
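A minimal, hedged illustration of the representation (the array layout and threshold are assumptions, not the paper's format): each 3D point simply carries an extra confidence attribute derived from the raw sensor data, which downstream models can weight or gate on.

```python
import numpy as np

# N x 4 array: x, y, z in meters, plus a per-point detection confidence in [0, 1].
ppc = np.array([
    [1.20,  0.05, 3.4, 0.97],
    [1.22,  0.04, 3.5, 0.12],   # low-confidence return (e.g., a distant or low-albedo surface)
    [0.80, -0.30, 2.1, 0.88],
])

# Instead of treating every return as equally reliable, a detector can
# down-weight or drop uncertain points.
confidence = ppc[:, 3]
reliable = ppc[confidence > 0.5]
print(reliable.shape)  # (2, 4)
```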
“Quanta Vision: From Photons to Perception”
Varun Sundar, Tianyi Zhang, Sacha Jungerman, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/2542
Abstract: Quanta image sensors record individual photons, enabling capabilities like imaging in near-complete darkness and ultra-high-speed videography. Yet, most research on quanta sensors is limited to recovering image intensities. Can we go beyond just imaging, and develop algorithms that can extract high-level scene information from quanta sensors? This could unlock new possibilities in vision systems, offering reliable operation in extreme conditions. The challenge: raw photon streams captured by quanta sensors have fundamentally different characteristics than conventional images, making them incompatible with vision models. One approach is to first transform raw photon streams to conventional-like images, but this is prohibitively expensive in terms of compute, memory, and latency, making it impractical for most vision and robotics systems. We propose quanta neural networks (QNNs) that directly produce downstream task objectives from raw photon streams. Our core proposal is a trainable QNN layer that can seamlessly integrate with existing image- and video-based neural networks, producing quanta counterparts. By avoiding image reconstruction and allocating computational resources on a scene-adaptive basis, QNNs achieve orders-of-magnitude improvements across all efficiency metrics (compute, latency, readout bandwidth) as compared to reconstruction-based quanta vision, while maintaining high task accuracy across a wide gamut of challenging scenarios including low light and rapid motion.
“Predicting Important Photons for Energy-Efficient Single-Photon Videography”
Shantanu Gupta, Varun Sundar, Lucas J. Koerner, Claudio Bruschini, Edoardo Charbon, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computational Photography (ICCP)
(A select few ICCP papers each year are recommended to be published in a special issue of the IEEE PAMI journal; this paper was selected to be a part of the special issue.)
https://wisionlab.com/project/predicting-important-photons/
Abstract: Single-photon avalanche diodes (SPADs) detect individual photons with fine temporal resolution, enabling capabilities like imaging in near-total darkness, extreme dynamic range, and rapid motion. Due to these capabilities, coupled with the recent emergence of high-resolution (megapixel) arrays, SPADs have the potential to become workhorses for computer vision systems of the future that need to operate in a wide range of challenging conditions. However, SPADs’ sensitivity comes at a high energy cost due to the underlying avalanche process, which consumes substantial energy per detected photon, limiting the scalability and practicality of high-resolution SPAD arrays. To address this, we propose approaches to predict and sample only the most salient photons for a given vision task. To this end, we design computationally lightweight photon-sampling strategies that allocate energy resources for detecting photons only in areas with significant motion and spatial variation, while continually adapting to changing signals. We demonstrate the effectiveness of the proposed methods in recovering video comparable to a fully-sampled SPAD capture using only a small fraction of the photons, across diverse real-world scenes with motion, high dynamic range, and varying light conditions.
[9] “Recovering Parametric Scenes from Very Few Time-of-Flight Pixels”
Carter Sifferman, Yiquan Li, Yiming Li, Fangzhou Mu, Michael Gleicher, Mohit Gupta, Yin Li
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/410
Abstract: We aim to recover the geometry of 3D parametric scenes using very few depth measurements from low-cost, commercially available time-of-flight sensors. These sensors offer very low spatial resolution (i.e., a single pixel), but image a wide field-of-view per pixel and capture detailed time-of-flight data in the form of time-resolved photon counts. This time-of-flight data encodes rich scene information and thus enables recovery of simple scenes from sparse measurements. We investigate the feasibility of using a distributed set of few measurements (e.g., as few as 15 pixels) to recover the geometry of simple parametric scenes with a strong prior, such as estimating the 6D pose of a known object. To achieve this, we design a method that utilizes both feed-forward prediction to infer scene parameters, and differentiable rendering within an analysis-by-synthesis framework to refine the scene parameter estimate. We develop hardware prototypes and demonstrate that our method effectively recovers object pose given an untextured 3D model in both simulations and controlled real-world captures, and show promising initial results for other parametric scenes. We additionally conduct experiments to explore the limits and capabilities of our imaging solution.
[10] “Post-Quantum Homomorphic Commitments, Zero-Knowledge Proofs and Applications”
Stellar Award, 2025
https://research.stellar.org/research-grants/grantees
Abstract. Quantum-powered adversaries are a significant and rapidly emerging threat to the security of digital systems and authentication platforms. Despite significant advances in post-quantum cryptography over the last decade, we have very limited designs for quantum-safe authentication platforms whose performance is comparable to their classical (quantum-insecure) counterparts. One very popular authentication model is the blind signature model (where a receiver can obtain an authentication token in a private manner). In this project, we will design and implement practical blind digital signatures that provide post-quantum security and provide better performance than available state-of-the-art quantum-safe solutions. At a high level, our core technical approach will be to carefully compose post-quantum homomorphic commitments and zero-knowledge proofs in a white-box manner to design more efficient blind signatures.
[11] “Towards Foundational Models for mmWave Radar”
Tianshu Huang, Akarsh Prabhakara, Chuhan Chen, Jay Karhade, Deva Ramanan, Matthew O’Toole, Anthony Rowe
ICCV 2025
https://iccv.thecvf.com/virtual/2025/poster/1270
Abstract: mmWave radars are compact, inexpensive, and durable sensors that are robust to occlusions and work regardless of environmental conditions, such as weather and darkness. However, this comes at the cost of poor angular resolution, especially for inexpensive single-chip radars, which are typically used in automotive and indoor sensing applications. Although many have proposed learning-based methods to mitigate this weakness, no standardized foundational models or large datasets for the mmWave radar have emerged, and practitioners have largely trained task-specific models from scratch using relatively small datasets.
In this paper, we collect (to our knowledge) the largest available raw radar dataset with 1M samples (29 hours) and train a foundational model for 4D single-chip radar, which can predict 3D occupancy and semantic segmentation with quality that is typically only possible with much higher resolution sensors. We demonstrate that our Generalizable Radar Transformer (GRT) generalizes across diverse settings, can be fine-tuned for different tasks, and shows logarithmic data scaling of 20% per 10x data. We also run extensive ablations on common design decisions, and find that using raw radar data significantly outperforms widely-used lossy representations, equivalent to a 10x increase in training data. Finally, we roughly estimate that ~100M samples (3000 hours) of data are required to fully exploit the potential of GRT.
[12] Rethinking Confidence Scores and Thresholds in Pseudolabeling-based SSL
Harit Vishwakarma, Yi Chen, Srinath Namburi, Sui Jiet Tay, Ramya Vinayak, Frederic Sala
https://pages.cs.wisc.edu/~fredsala/ssl_icml25.pdf
Modern semi-supervised learning (SSL) methods rely on pseudolabeling and consistency regularization. Pseudolabeling is typically performed by comparing the model’s confidence scores and a predefined threshold. While several heuristics have been proposed to improve threshold selection, the underlying issues of overconfidence and miscalibration in confidence scores remain largely unaddressed, leading to inaccurate pseudolabels, degraded test accuracy, and prolonged training. We take a first-principles approach to learn confidence scores and thresholds with an explicit knob for error. This flexible framework addresses the fundamental question of optimal scores and threshold selection in pseudolabeling. Moreover, it gives practitioners a principled way to control the quality and quantity of pseudolabels. Such control is vital in SSL, where balancing pseudolabel quality and quantity directly affects model performance and training efficiency. Our experiments show that, by integrating this framework with modern SSL methods, we achieve significant improvements in accuracy and training efficiency. In addition, we provide novel insights on the tradeoffs between the choices of the error parameter and the end model’s performance.
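For context, the sketch below shows the fixed-threshold pseudolabeling baseline that the paper improves on (a hedged illustration; the stand-in model, data, and the 0.95 threshold are assumptions):

```python
import torch
import torch.nn.functional as F

def pseudolabel(model, unlabeled_batch, threshold=0.95):
    """Return (inputs, pseudo-labels) for samples whose confidence clears the threshold."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=-1)
    confidence, labels = probs.max(dim=-1)
    # Overconfident or miscalibrated scores make this fixed cutoff unreliable,
    # which is exactly the issue the paper's learned scores/thresholds address.
    keep = confidence >= threshold
    return unlabeled_batch[keep], labels[keep]

model = torch.nn.Linear(10, 3)          # stand-in classifier
x_unlabeled = torch.randn(128, 10)
x_kept, y_pseudo = pseudolabel(model, x_unlabeled)
print(x_kept.shape, y_pseudo.shape)
```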
Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights
Tzu-Heng Huang, Manjot Bilkhu, Frederic Sala
https://arxiv.org/abs/2501.06708
Multimodal models are trained on large-scale web-crawled datasets, which often contain noise, bias, and irrelevant information. This motivates the use of data selection techniques, which can be divided into model-free variants, relying on heuristic rules and downstream datasets, and model-based approaches, such as those using influence functions. The former can be expensive to design and risks introducing unwanted dataset dependencies, while the latter are often computationally prohibitive. In this work, we propose an efficient, model-based approach using the Mimic Score, a new data-quality metric that leverages the weights of a reference model to assess the usefulness of individual samples for training a new model. Our method relies on measuring alignments between training gradients and a target direction induced by this reference model. Building on the derived mimic scores, we develop Grad-Mimic: a framework that prioritizes samples to learn, estimates overall sample utility, and creates effective filters. Empirically, using mimic scores to guide training improves data efficiency, accelerates convergence, yields consistent performance gains across six image datasets, and enhances CLIP models with 20.7% fewer training steps. Moreover, mimic score-based filters complement existing filtering methods, e.g., training improved CLIP models with 4.7 million fewer samples while offering accurate estimation of dataset quality.
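One way to picture the mimic score, as a hedged sketch rather than the paper's exact definition: score each sample by how well its training gradient aligns with the direction from the current weights toward the reference model's weights. All names and the cosine-alignment choice below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mimic_scores(model, ref_model, xs, ys):
    """Higher score = taking a step on this sample moves the model toward the reference."""
    target = torch.cat([(p_ref - p).flatten()
                        for p, p_ref in zip(model.parameters(), ref_model.parameters())])
    scores = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grad = torch.cat([p.grad.flatten() for p in model.parameters()])
        # Gradient descent moves along -grad, so compare -grad with the reference direction.
        scores.append(F.cosine_similarity(-grad, target, dim=0).item())
    return scores

model, ref = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
xs, ys = torch.randn(16, 8), torch.randint(0, 4, (16,))
print(sorted(mimic_scores(model, ref, xs, ys))[:3])  # lowest-utility samples, candidates to filter
```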
[13] Newly promoted teaching faculty:
Marc Renault, Full Teaching Professor
Andrew Kuemmel, Associate Teaching Professor
Tyler Caraza-Harter, Associate Teaching Professor
Hobbes Legault, Associate Teaching Professor
[14] Keynote at the National Science Data Fabric annual AHM at UCSD by Prof Miron Livny https://nationalsciencedatafabric.org/nsdf-ahm-2025-01
The PATh project offers a variety of distributed data handling services. Researchers who place their workloads at PATh Access Points leverage these services to bring input data to their applications and to ship results back to archival sites. They also employ them to copy the applications, in most cases in the form of container images, to execution sites provided by the Open Science Pool (OSPool). Owners of Object Stores who federate datasets they host via the Open Science Data Federation (OSDF) use these services to make locally stored objects available to remote consumers for frequent reuse.
In the past 12 months PATh services have performed more than 3B Object transfers across more than 100 institutions. The distributed nature of the infrastructure requires objects to be copied across institution boundaries and to be buffered in memory or disk at multiple locations. These operations clearly introduce concerns about data integrity and access control. Do the researchers and Object providers who use these services believe that the PATh services are trustworthy?
The talk will review our ongoing effort to develop data handling services that we believe to be trustworthy. A key element of our current development work is transitioning our software tools and infrastructure into capability-based authorization. We hope that facilitating trust relationships through capabilities will help researchers and data providers to entrust their objects with the PATh services. Namely, to believe that the OSDF and the OSPool data handling services are trustworthy.
[15] The IO500 aims to rank high-performance distributed file systems. The results were released last week. We are ranked 26th in the 10-node category and 33rd overall worldwide.
https://io500.org/list/isc25/ten
https://io500.org/list/isc25/io500
From the author: We built GluonFS (our DFS) from scratch and have heavily optimized its I/O processing, which turns out to be quite challenging with 400 GbE networks and 8+ NVMe drives. We also compared it briefly with other big players. As we are resource-limited, we co-located the clients (which issue I/O requests) with our DFS data and metadata servers, unlike others. With just a few more servers, we could drive performance much higher.
[16] https://www.research.gov/grfp/
Prof Sharon Li’s student: Shawn Im
Prof Shivaram Venkataraman’s student: Seth Ockerman
https://ockermansethgvsu.github.io
[17] “An international team of astronomers has trained a neural network with millions of synthetic simulations and artificial intelligence (AI) to tease out new cosmic curiosities about black holes, revealing the one at the center of our Milky Way is spinning at nearly top speed.
These large ensembles of simulations were generated by throughput computing capabilities provided by the Center for High Throughput Computing (CHTC), a joint entity of the Morgridge Institute for Research and the University of Wisconsin-Madison. The astronomers published their results and methodology today in three papers in the journal Astronomy & Astrophysics.”
Read more at the link above.
[18] From the gem5 Users Workshop
https://www.gem5.org/events/isca-2025
Title: Implementing Support for Extensible Power Modeling in gem5
Authors: Alex Smith & Matthew D. Sinclair
Abstract: Power consumption has increasingly become a first-class design constraint to satisfy requirements for scientific workloads and other widely used workloads, such as machine learning. To meet performance and power requirements, system designers often use architectural simulators, such as gem5, to model component and system-level behavior. However, performance and power modeling tools are often isolated and difficult to integrate with one another for rapid performance and power system co-design. Although studies have previously explored power modeling with gem5 and validation on real hardware, there are several flaws with this approach. First, power models are sometimes not open source, making it difficult to apply them to different simulated systems. Second, the current interface for implementing power models in gem5 relies on hard-coded strings provided by the user to model dynamic and static power, which makes defining power models for components cumbersome and restrictive, as gem5’s MathExpr string formula parser supports only limited mathematical operations. Third, previous works only implement one form of power model for one component. This unnecessarily limits users from combining other power models, which may model certain system components with higher accuracy. Instead, we posit that decoupling how power models are integrated with simulators from the design of power models themselves will enable better power modeling in simulators. Accordingly, we extend our prior work on designing and implementing an extensible, generalizable power modeling interface by integrating support for McPAT into it and validating that it emits correct power values.
Title: Narrowing the GAP: Enhancing gem5’s GPU Memory Bandwidth Accuracy
Authors: Yu Xia, Vishnu Ramadas, Matthew Poremba, and Matthew D. Sinclair
Abstract: Computer systems research heavily relies on simulation tools like gem5 to effectively prototype and validate new ideas. However, publicly available simulators struggle to accurately model systems as architectures evolve rapidly. This is a major issue because incorrect simulator models may lead researchers to draw misleading or even incorrect conclusions about their research prototypes from these simulators. Although this challenge pertains to many open source simulators, we focus on the widely used, open source gem5 simulator. In GAP we showed that gem5’s GPGPU models have significant correlation issues versus real hardware. GAP also improved the fidelity of gem5’s AMDGPU model, particularly for cache access latencies and bandwidths. However, one critical issue remains: our microbenchmarks reveal 88% error in memory bandwidth between gem5’s current model and corresponding real AMD GPUs. To narrow this gap, we examined recent patents and gem5’s memory system bottlenecks, then made several improvements including: utilizing a redesigned HBM memory controller, enhancing TLB request coalescing, adding support for multiple page sizes, adding a page walk cache, and improving network bandwidth modeling. Collectively, these optimizations significantly improve gem5’s GPU memory bandwidth by 3.8x: from 153 GB/s to 583 GB/s. Moreover, our address translation enhancements can be ported to other ISAs where similar support is also needed, improving gem5’s MMU support.
Title: Toward Full-System Heterogeneous Simulation: Merging gem5-SALAM with Mainline gem5
Authors: Akanksha Chaudhari and Matthew D. Sinclair
Abstract: The slowdown of process technology-driven improvements has accelerated the shift toward heterogeneous computing systems, where conventional general-purpose cores are increasingly combined with GPUs and specialized accelerators to continue scaling performance and energy efficiency gains. However, as these systems grow more diverse, architectural design and system-level optimization become significantly complex. Fully leveraging the benefits of such architectures demands rigorous early-stage exploration using validated, cycle-level, full-system simulation frameworks that capture both component behavior and cross-layer interactions. The gem5 simulator has long provided this functionality for both CPUs and GPUs. However, accelerator modeling within gem5 remains fragmented. Frameworks such as gem5-SALAM provide high-fidelity accelerator simulation atop gem5 v21.1, but operate outside the mainline development and remain limited to modeling accelerators in isolation. Consequently, they cannot leverage improvements introduced in recent versions of gem5, nor can mainline users test and validate their changes do not break accelerator support. To bridge this gap, we integrated and merged SALAM’s accelerator modeling infrastructure into the latest gem5 mainline, enabling tightly coupled simulation of accelerators alongside gem5’s existing CPU and GPU components within a unified, full-system framework. This integration is critical for advancing early-stage architectural exploration, enabling researchers to systematically evaluate design trade-offs, study cross-layer interactions, and co-optimize compute, memory, and communication components for emerging heterogeneous computing platforms.
[19] Title: Simulating and Modeling Hardware for Machine Learning Workloads at Scale
Prof Matt Sinclair
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (e.g., Tensor Cores), remains an important research challenge. In particular, modeling energy in a way that is flexible and can be extended both to different types of programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
[20] The PhD List from last year:
– Saurabh Agarwal, advised by Shivaram Venkataraman and Dimitris Papailiopoulos, produced the dissertation entitled “Reducing synchronization overheads in Machine Learning systems”.
– Edward Barton, advised by Yu Hen Hu, produced the dissertation entitled “Practical Single Shot High Dynamic Range Imaging”.
– Bengisu Cagiltay, advised by Bilge Mutlu, produced the dissertation entitled “Designing Social Robots for Everyday Family Life”.
– Mu Cai, advised by Yong Jae Lee, produced the dissertation entitled “Toward Versatile and Efficient Multimodal Models”.
– Xufeng Cai, advised by Jelena Diakonikolas, produced the dissertation entitled “Towards Efficient Optimization for Decomposable Problems in Large-Scale Machine Learning”.
– Joseph Catudal, advised by Paul Barford, produced the dissertation entitled “Practical Environmental Sensing On Optical Communication Infrastructure”.
– Xuefeng Du, advised by Sharon Li, produced the dissertation entitled “Foundations of Unknown-aware Machine Learning”.
– Ying Fan, advised by Kangwook Lee, produced the dissertation entitled “Sequential Learning for Foundation Models: Reinforcement Learning for Diffusion Models and Looped Transformers”.
– Kevin Gaffney, advised by Jignesh Patel, produced the dissertation entitled “Bridging the Processor-Memory Performance Gap in Database Systems”.
– Bhavya Goyal, advised by Mohit Gupta, produced the dissertation entitled “Robust Scene Inference under Adverse Conditions”.
– Ashish Hooda, advised by Somesh Jha and Kassem Fawaz, produced the dissertation entitled “Security of Real World Machine Learning Systems”.
– Mazharul Islam, advised by Rahul Chatterjee, produced the dissertation entitled “Safeguarding online authentication systems from attacks”.
– Sacha Jungerman, advised by Mohit Gupta, produced the dissertation entitled “Spatial Computing from Photons”.
– Rishabh Khandelwal, advised by Kassem Fawaz, produced the dissertation entitled “Assessing and Enhancing Users’ Online Data Privacy”.
– Ziqian Lin, advised by Kangwook Lee, produced the dissertation entitled “Understanding In-Context Learning via Synthetic and Controllable Datasets”.
– Eric Lin, advised by Jelena Diakonikolas, produced the dissertation entitled “Faster Coordinate Algorithms for Structured Machine Learning Problems”.
– Jing Liu, advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, produced the dissertation entitled “Scale, Performance, and Fault-Tolerance in a Filesystem Semi-Microkernel”.
– Anna Meyer, advised by Aws Albarghouthi and Loris D’Antoni, produced the dissertation entitled “Robustness to Multiplicity in the Machine Learning Pipeline”.
– Elena Milkai, advised by Xiangyao Yu and Jignesh Patel, produced the dissertation entitled “Real-Time Analytics On Cloud Database Systems”.
– Nicollas Mocelin Sdroievski, advised by Dieter Van Melkebeek, produced the dissertation entitled “Derandomization vs. Lower Bounds for Arthur-Merlin Protocols”.
– Shyam Murthy, advised by Gurindar Sohi, produced the dissertation entitled “Optimizing Memory Hierarchies for Commercial Applications”.
– Eric Pauley, advised by Patrick McDaniel, produced the dissertation entitled “Methods and Applications for Cloud-based Internet Measurement”.
– Martin Prammer, advised by Jignesh Patel, produced the dissertation entitled “Towards Architecture-Aware Data Analytics”.
– Anthony Lawrence George Rebello, advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, produced the dissertation entitled “Correctness and Performance with Declarative System Calls”.
– Zhenmei Shi, advised by Yingyu Liang, produced the dissertation entitled “Understanding Training and Adaptation in Feature Learning: From Two-layer Networks to Foundation Models”.
– Laura Stegner, advised by Bilge Mutlu, produced the dissertation entitled “Integrating Care Robots into Assisted Living Facilities”.
– Yuxin Sun, advised by Ilias Diakonikolas, produced the dissertation entitled “Algorithms and Information-Computation Trade-offs in High-Dimensional Statistical Estimation”.
– Harit Vishwakarma, advised by Frederic Sala and Ramya Vinayak, produced the dissertation entitled “Towards Reliable Machine Learning: From Data to Deployment”.
– Roger Waleffe, advised by Steve Wright, produced the dissertation entitled “Algorithms and Systems for Scalable Machine Learning over Graphs”.
– Qisi Wang, advised by Eftychios Sifakis, produced the dissertation entitled “Extending Projective Dynamics for Practical and Efficient Simulation”.
– Nathan White, advised by Bilge Mutlu, produced the dissertation entitled “Improving Human-Robot Collaboration through Abstraction and Scaffolding in Planning and Programming Interfaces”.
– Yunjia Zhang, advised by Jignesh Patel and Theodoros Rekatsinas, produced the dissertation entitled “Machine Learning for Data Analytics: Profiling, Querying, and Beyond”.
– Hangdong Zhao, advised by Paris Koutris, produced the dissertation entitled “Next-Generation Join Algorithms”.
May 2025
[1] https://awards.acm.org/hopper
About ACM Grace Murray Hopper Award
Awarded to the outstanding young computer professional of the year, selected on the basis of a single recent major technical or service contribution. This award is accompanied by a prize of $35,000. The candidate must have been 35 years of age or less at the time the qualifying contribution was made, though allowance may be made for interrupted or second careers. Financial support of the Grace Murray Hopper Award is provided by Microsoft.
Article: https://www.cs.wisc.edu/2025/05/21/ilias-diakonikolas-wins-2024-acm-grace-hopper-award/
[2] H.I. Romnes Fellowship
This program, funded by WARF in recognition of the leadership of the late WARF Trustee President H. I. Romnes, is designed to bridge the gap between the Research Committee’s initial research support for new faculty and the Mid-Career Award for Faculty Research. This award is intended to recognize and support faculty up to SIX years past their first promotion to a tenured position. This award cannot be awarded more than one time to any faculty member.
https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/
[3, 4] https://cdis.wisc.edu/cdis-endowed-chairs-professorships/
[5] https://www.cs.wisc.edu/2025/04/22/the-spirit-of-madhacks-podcast/
What is Madhacks? How did it start? What goes into creating a hackathon? Meet the organizers of MadHacks 2024 – Grace Steinmetz, Chris Gottwaldt, Kevin Wang, Kayley Seow, Nico Salm, and Michael Noguera with Computer Sciences professor Manolis Vlatakis – and hear about their process and the impact of MadHacks. “Participants come to MadHacks from a variety of backgrounds, bringing their own knowledge of problems affecting their communities,” say the creators. “We are enablers — we provide a positive environment and concentrate resources—but hackers themselves are the agents using what they know and what they learn to develop solutions to their own real-world problems.”
[6] “Solving Zero-Sum Convex Markov Games”
(with Foivos Kalogiannis, Ian Gemp, and Georgios Piliouras)
https://icml.cc/virtual/2025/poster/44636
We provide the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) using independent policy gradient methods. Convex Markov games, recently described by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with convex preferences over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity even under direct policy parametrization, the absence of Bellman equations, and the complexities of infinite horizons. Our results follow a two-step approach. First, leveraging the properties of hidden-convex–hidden-concave functions, we show that a carefully crafted regularization transforms the underlying min-max optimization problem into a nonconvex–proximal Polyak-Lojasiewicz (NC-pPL) structure. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
“A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games”
(with Francisca Vasconcelos, Panayotis Mertikopoulos, Georgios Piliouras, and Michael I. Jordan)
Recent advances in quantum information have revived interest in quantum zero-sum games, where computing Nash equilibria plays a central role. In 2008, Jain and Watrous introduced the first classical algorithm (MMWU), achieving an \mathcal{O}(d/\epsilon^2) convergence rate to \epsilon-Nash equilibria over the 4^d-dimensional spectraplex. In our work:
• We introduce a hierarchy of quantum optimization algorithms that generalize MMWU using an extra-gradient mechanism.
• Notably, our Optimistic Matrix Multiplicative Weights Update (OMMWU) achieves \mathcal{O}(d/\epsilon) iteration complexity—providing a quadratic speedup over previous methods.
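For context, a minimal statement of the matrix multiplicative weights update that MMWU and OMMWU build on, assuming its standard form over the spectraplex (a sketch; the precise optimistic update used in the paper may differ):

```latex
% Matrix Multiplicative Weights Update (MMWU) over the spectraplex
% \{\rho \succeq 0 : \operatorname{Tr}(\rho) = 1\}: after observing payoff
% (gradient) matrices G_1, \dots, G_t, the next iterate is the matrix
% analogue of the exponential-weights update,
\[
  \rho_{t+1} \;=\;
  \frac{\exp\!\left(\eta \sum_{s=1}^{t} G_s\right)}
       {\operatorname{Tr}\exp\!\left(\eta \sum_{s=1}^{t} G_s\right)} .
\]
% The optimistic (extra-gradient) variant additionally reuses the most
% recent payoff matrix as a prediction of the next one, e.g. replacing
% \sum_{s \le t} G_s with \sum_{s \le t} G_s + G_t; this is the mechanism
% behind the quadratic speedup described above.
```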
[7] Three papers accepted by Prof Barford and colleagues: Two in the IFIP Networking Conference [a, b] and one in the Network Traffic Measurement and Analysis Conference [c].
[a] Mia Weaver, Paul Barford, Fabian Bustamante, Esteban Carisimo, Weili Wu, Lynne Stokes. “Unsteady Underwater – On the Constancy of Submarine Path Properties”
Abstract—The world-wide submarine cable network (SCN) is one of the most costly, complex, and critical components in the Internet. In this paper, we describe a measurement study of packet dynamics on SCN paths. To conduct this study we built an active probe-based measurement system that harnesses looking glass nodes in strategic locations in service provider networks to gather data from 134 SCN paths. During the 18 month period of our study, we also tracked outage reports from online sources on a weekly basis. Our main finding is that key dynamic characteristics of the SCN including latency, loss, path stability, and outages vary widely. To provide perspective, we assess the operational constancy (i.e., remaining within bounds considered operationally equivalent) of SCN path properties and find a wide range of behavior. To further clarify these characteristics, we report the results of a cluster analysis and find that paths separate into distinct groups where behavior is either relatively stable or highly variable. Our findings have implications for operational practices, applications that utilize these paths, and for future measurement and monitoring of the SCN.
[b] Wei-Shiang Wung, Calvin Kranig, Eric Pauley, Paul Barford, Mark Crovella, Joel Sommers. “Squatspotting: Towards the Systematic Measurement of Typosquatting Techniques”
Abstract— Typosquatting—the practice of registering a domain name similar to another, usually well-known, domain name—is typically intended to drive traffic to a website for malicious or profit-driven purposes. In this paper we assess the current state of typosquatting, both broadly (across a wide variety of techniques) and deeply (using an extensive and novel dataset). Our breadth derives from the application of eight different candidate-generation techniques to a selection of the most popular domain names. Our depth derives from probing the resulting name set via a unique corpus comprising over 3.3B Domain Name System (DNS) records. We find that over 2.3M potential typosquatting names have been registered that resolve to an IP address. We then assess those names using a framework focused on identifying the intent of the domain from the perspectives of DNS and webpage clustering. Using the DNS information, HTTP responses, and Google SafeBrowsing, we classify the candidate typosquatting names as resolved to private IP, malicious, defensive, parked, legitimate, or unknown intents. Our findings provide the largest-scale and most-comprehensive perspective to date on typosquatting, exposing potential risks to users. Further, our methodology provides a blueprint for tracking and classifying typosquatting on an ongoing basis.
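To make the candidate-generation step concrete, here is a small illustrative sketch of a few common typo-generation techniques; the paper’s eight techniques and its 3.3B-record DNS corpus are not reproduced here, and the function below is hypothetical.

```python
def typo_candidates(domain):
    """Generate simple typosquatting candidates for a registered domain.

    Illustrates a few common candidate-generation techniques (character
    omission, adjacent-character swap, character repetition, alternate TLD);
    this is a sketch, not the paper's exact generation procedure.
    """
    name, _, tld = domain.rpartition(".")
    candidates = set()
    # 1. Character omission: examle.com
    for i in range(len(name)):
        candidates.add(name[:i] + name[i+1:] + "." + tld)
    # 2. Adjacent-character swap: examlpe.com
    for i in range(len(name) - 1):
        swapped = name[:i] + name[i+1] + name[i] + name[i+2:]
        candidates.add(swapped + "." + tld)
    # 3. Character repetition: exammple.com
    for i in range(len(name)):
        candidates.add(name[:i+1] + name[i] + name[i+1:] + "." + tld)
    # 4. Alternate TLDs: example.net, example.org
    for alt in ("net", "org", "co"):
        if alt != tld:
            candidates.add(name + "." + alt)
    candidates.discard(domain)
    return sorted(candidates)

print(typo_candidates("example.com")[:10])
```

In the study, candidates like these are then probed against the DNS corpus to see which have actually been registered and resolve to an IP address.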
[c] Calvin Kranig, Eric Pauley, Wei-Shiang Wung, Paul Barford, Mark Crovella, Joel Sommers. “Toward a Representative DNS Data Corpus: A Longitudinal Comparison of Collection Methods”
Abstract—Domain Name System (DNS) records are frequently used to investigate a wide variety of Internet phenomena including topology, malicious activity, resource allocations and user behavior. The scope and utility of the results of these studies depends intrinsically on the representativeness of the DNS data. In this paper, we compare and contrast five different DNS datasets from four different providers collected over a period of 3 months that in total comprise over 10.1B total Fully Qualified Domain Names (FQDNs) of which 3.7B are unique. We process and organize that data into a consistent format that enables it to be efficiently analyzed in Google BigQuery. We begin by reporting the details of the measurement methods and the datasets used in our analysis. We then analyze the relative coverage of each dataset by structural, administrative, and client-behavioral features. We find that while there are significant overlaps in the records provided by each dataset, each also provides unique records not found in the other datasets that are important in different use cases. Our results highlight the opportunities in using these datasets in research and operations, and how combinations of datasets can provide broader and more diverse perspectives.
[8] https://academicaffairs.rutgers.edu/2024-2025-faculty-year-end-excellence-award-recipients
The Board of Trustees Research Fellowship for Scholarly Excellence
The fellowship honors faculty members who have made truly outstanding contributions to research during their early years at Rutgers.
Prof Sudarsun Kannan’s web page: here.
[9] Prof Livny and Ramakrishnan’s BIRCH Algorithm paper discovered in the wild:
A seminar visit by a computational chemist, Ramon Miranda Quintana, from the University of Florida, took place last week here at UW-Madison in the Chemistry department. His group is using your BIRCH algorithm to cluster millions of drug compounds.
[10] The paper “Decoding Speculative Decoding” got the SAC Award for Generation at NAACL 2025.
Award details:
https://2025.naacl.org/blog/best-papers/
Paper available here:
https://arxiv.org/abs/2402.01528
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. When performing inference, speculative decoding uses a smaller draft model to generate speculative tokens and then uses the target LLM to verify those draft tokens. The speedup provided by speculative decoding heavily depends on the choice of the draft model. In this work, we perform a detailed study comprising over 350 experiments with LLaMA-65B and OPT-66B using speculative decoding and delineate the factors that affect the performance gain provided by speculative decoding. Our experiments indicate that the performance of speculative decoding depends heavily on the latency of the draft model, and the draft model’s capability in language modeling does not correlate strongly with its performance in speculative decoding. Based on these insights we explore a new design space for draft models and design hardware-efficient draft models for speculative decoding. Our newly designed draft model can provide 111% higher throughput than existing draft models and our approach generalizes further to all LLaMA models (1/2/3.1) and supervised fine-tuned models.
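For readers new to the technique, here is a minimal, greedy sketch of the draft-then-verify loop that speculative decoding performs; real systems verify with rejection sampling so that the target distribution is preserved, and the toy "models" below are placeholders, not from the paper.

```python
def speculative_decode(target_model, draft_model, prompt, k=4, max_new=32):
    """Minimal greedy sketch of speculative decoding (illustrative only).

    The cheap draft model proposes k tokens; the target model checks the
    proposed block position by position and keeps the longest prefix that
    matches its own greedy choices, then contributes one token of its own,
    so every round makes progress even if all drafts are rejected.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # Draft model proposes k tokens autoregressively (cheap calls).
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # Target model verifies the block greedily.
        accepted = []
        for tok in draft:
            if target_model(tokens + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # Always gain at least one token from the target model itself.
        accepted.append(target_model(tokens + accepted))
        tokens.extend(accepted)
    return tokens

# Toy "models": the target repeats a fixed string; the draft agrees 3 of 4 times.
TEXT = "abcabcabcabcabcabcabc"
target_model = lambda ctx: TEXT[len(ctx) % len(TEXT)]
draft_model = lambda ctx: TEXT[len(ctx) % len(TEXT)] if len(ctx) % 4 else "x"
print("".join(speculative_decode(target_model, draft_model, "", k=4, max_new=12)))
```

The speedup comes from the fact that one expensive target-model verification pass can accept several cheap draft tokens at once, which is why draft-model latency matters so much in the study above.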
[11] Two papers accepted to ICML 2025 by Prof Shivaram Venkataraman and colleagues.
“LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models” with student Tzu-Tao Chang.
https://arxiv.org/abs/2502.02406
Cross-attention is commonly adopted in multimodal large language models (MLLMs) for integrating visual information into the language backbone. However, in applications with large visual inputs, such as video understanding, processing a large number of visual tokens in cross-attention layers leads to high memory demands and often necessitates distributed computation across multiple GPUs. Existing distributed attention mechanisms face significant communication overheads, making cross-attention layers a critical bottleneck for efficient training and inference of MLLMs. To address this, we propose LV-XAttn, a distributed, exact cross-attention mechanism with minimal communication overhead. We observe that in applications involving large visual inputs, the size of the query block is typically much smaller than that of the key-value blocks. Thus, in LV-XAttn we keep the large key-value blocks locally on each GPU and exchange smaller query blocks across GPUs. We also introduce an efficient activation recomputation technique to support longer visual context. We theoretically analyze the communication benefits of LV-XAttn and show that it can achieve speedups for a wide range of models. Our evaluations with Llama 3-V, mPLUG-Owl3 and OpenFlamingo models find that LV-XAttn achieves up to 10.62× end-to-end speedup compared to existing approaches.
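Below is a single-process sketch of the core idea, assuming the standard numerically safe softmax merge: key-value blocks stay put (array shards standing in for GPUs), only the small query block "moves", and per-shard partial results are combined exactly. This is illustrative only, not the authors' implementation.

```python
import numpy as np

def partial_attention(q, k, v):
    """Attention statistics of query block q against one local KV shard."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    m = s.max(axis=-1, keepdims=True)          # row-wise max for stability
    p = np.exp(s - m)
    return m, p.sum(axis=-1, keepdims=True), p @ v

def merge(parts):
    """Combine per-shard partial results into the exact global attention output."""
    m = np.max([m_i for m_i, _, _ in parts], axis=0)
    l = sum(np.exp(m_i - m) * l_i for m_i, l_i, _ in parts)
    o = sum(np.exp(m_i - m) * o_i for m_i, _, o_i in parts)
    return o / l

# Toy setup: a small query block vs. a long visual KV sequence split across
# 4 "GPUs" (array shards). Each shard keeps its KV local; only q is exchanged.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))                    # small query block
k = rng.normal(size=(4096, 64))                 # large key block
v = rng.normal(size=(4096, 64))
parts = [partial_attention(q, ks, vs)
         for ks, vs in zip(np.split(k, 4), np.split(v, 4))]
out = merge(parts)

# Reference: standard (unsharded) attention gives the same answer.
s = q @ k.T / np.sqrt(64)
p = np.exp(s - s.max(-1, keepdims=True))
ref = (p / p.sum(-1, keepdims=True)) @ v
print(np.allclose(out, ref))                    # True
```

Because the query block is much smaller than the key-value blocks, exchanging queries instead of keys and values is what keeps the communication volume low.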
“Scaling Inference-Efficient Language Models” with students Song Bian and Minghao Yan. https://arxiv.org/abs/2501.18107
Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of accounting for inference costs. In this work, we first show that model architecture affects inference latency, where models of the same size can have up to 3.5x difference in latency. To tackle this challenge, we modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. Because models of similar training loss exhibit gaps in downstream evaluation, we also propose a novel method to train inference-efficient models based on the revised scaling laws. We perform extensive empirical studies to fit and evaluate our inference-aware scaling laws. We vary model parameters from 80M to 1B, training tokens from 1.6B to 30B, and model shapes, training a total of 63 models. Guided by our inference-efficient scaling law and model selection method, we release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models, pushing the Pareto frontier of accuracy-latency tradeoff.
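For reference, the unmodified Chinchilla parametric form that the abstract says is being extended can be fit as sketched below; the data, parameter values, and fitting choices are synthetic placeholders, not the paper's.

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, B, alpha, beta):
    """Standard Chinchilla parametric form L(N, D) = E + A/N^alpha + B/D^beta.

    The paper modifies this form to also account for model shape and
    inference latency; only the unmodified baseline is shown here.
    """
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Synthetic observations: (params N, tokens D) -> measured loss.
rng = np.random.default_rng(1)
N = rng.uniform(8e7, 1e9, size=60)       # 80M ... 1B parameters
D = rng.uniform(1.6e9, 3e10, size=60)    # 1.6B ... 30B tokens
true = (1.7, 400.0, 1100.0, 0.34, 0.28)
L = chinchilla_loss((N, D), *true) + rng.normal(0, 0.01, size=60)

popt, _ = curve_fit(chinchilla_loss, (N, D), L,
                    p0=(2.0, 100.0, 100.0, 0.3, 0.3), maxfev=20000)
print(dict(zip(["E", "A", "B", "alpha", "beta"], popt.round(3))))
```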
[12] Wenxuan Tan, advised by Prof Shivaram Venkataraman, received the Hilldale Fellowship and the Dewitt Scholarship.
[13] SIGCOMM ’25 paper acceptance:
Understanding and profiling CXL.mem using PathFinder
Xiao Li, Zerui Guo, Yuebin Bai, Mahesh Ketkar, Hugh Willkinson, and Ming Liu
CXL.mem and the resulting memory pool are promising and attracting great attention. Unlike local memory, CXL DIMMs stay at the I/O subsystem (i.e., FlexBus), whose inferior performance can easily impact the processor pipeline and memory subsystem, yielding performance interference, hardware contention, obscure behaviors, and underutilized communication and computing resources. However, our community lacks a tool to understand and profile the CXL.mem protocol execution end-to-end between CPU and remote DIMM.
This paper fills the gap by designing and implementing PathFinder, a systematic, informative, and lightweight CXL.mem profiler. PathFinder leverages the capabilities of existing hardware performance monitors (PMUs) and dissects the CXL.mem protocol at adequate granularities. Our key idea is to view the server processor and its chipset as a multi-stage Clos network, equip each architectural module with a PMU-based telemetry engine, track different CXL.mem paths, and apply conventional traffic analysis techniques. PathFinder performs snapshot-based path-driven profiling and introduces four techniques, i.e., path construction, stall cycle breakdown, interference analyzer, and cross-snapshot analysis. We build PathFinder atop Linux Perf and apply it to six case studies. PathFinder will be open-sourced.
[14] Invited talk at AMD Research & Advanced Development
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Matt Sinclair
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
[15] https://captimes.com/news/education/uw-madison-computer-science-prepares-to-relocate-meet-ai-moment/
UW-Madison computer science prepares to relocate, meet ‘AI moment’
[16] Prof Jin-Yi Cai (or his students, or both) had the following papers accepted:
A joint paper with Jin Soo Ihm
Holant* Dichotomy on Domain Size 3: A Geometric Perspective, to appear in ICALP 2025 (The 52nd EATCS International Colloquium on Automata, Languages, and Programming)
A singly authored paper by his student Benjamin Young
The Converse of the Real Orthogonal Holant Theorem, to appear in ICALP 2025 (The 52nd EATCS International Colloquium on Automata, Languages, and Programming)
A joint paper with Benjamin Young
Quantum Algorithms for Discrete Log Require Precise Rotations, to appear in ACM Transactions on Quantum Computing (TQC)
[17] CRYPTO ’25 papers
“A Note on Adaptive Security in Hierarchical Identity-Based Encryption”
Authors: Rishab Goyal (UW-Madison); Venkata Koppula (IIT Delhi); Mahesh Sreekumar Rajasree (CISPA)
“Succinct Arguments for BatchQMA and Friends under 8 rounds”
Authors: Rishab Goyal (University of Wisconsin, Madison); Aditya Jain (University of Wisconsin, Madison); Shashwatha Mitra G B (University of Wisconsin, Madison)
ICALP’ 25 paper:
Boosting SNARKs and Rate-1 Barrier in Arguments of Knowledge
Authors: Jiaqi Cheng and Rishab Goyal (UW-Madison)
[18] Benjamin Young
The Converse of the Real Orthogonal Holant Theorem
Winner: Best Student Paper at ICALP ‘25.
[19] Condor Pool Usage update:
For the first time, we crossed 600 users over the past 12 months. Supporting the education mission definitely contributes to these increases.
Also for the first time, we crossed 220 users in a single month.
We have seen a significant increase in the number of jobs served over the past two months (30M and 34M, respectively). For the first time we crossed 240M jobs in 12 months; a year ago, the trailing 12-month total was 187M jobs.
With 243M jobs and 2.8B file transfers, the OSPool is “humming” at 7.7 Hz and 88.8 Hz, respectively. Amazing!
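A quick back-of-the-envelope check of the quoted rates (assuming a 365-day window):

```python
# Back-of-the-envelope check of the quoted OSPool rates.
SECONDS_PER_YEAR = 365 * 24 * 3600            # ~31.5M seconds

jobs = 243e6
transfers = 2.8e9

print(f"jobs:      {jobs / SECONDS_PER_YEAR:.1f} Hz")        # ~7.7 Hz
print(f"transfers: {transfers / SECONDS_PER_YEAR:.1f} Hz")   # ~88.8 Hz
```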
[20] Master’s Projects and Theses advised this past semester (and summer):
David Parra, advised by Mohit Gupta
Practical 3D Time-of-flight imaging with single-photon cameras.
Alexander Peseckis, advised by Michael Gleicher
Occlusion-Conscious Viewpoint Optimization
Hanxiu Zhu, advised by Yuhang Zhao
Design Technology to Make Video Watching Experiences More Inclusive for People with ADHD
Casey Ford, advised by Michael Gleicher
Stereo SPAD sensing
Siddharth Suresh, advised by Robert Nowak
Understanding Computation Humor: From Crowd-Sourced Preferences to Expert-Level AI Alignment
Sriram Ashokkumar, advised by Dan Negrut
Multi-Agent Exploration with Vision-Language Models
Tara Bobinac, advised by Junjie Hu
Leveraging AI as a Deliberative Conversation Partner to Foster Intercultural Empathy
Rose Ceccio, advised by Rahul Chatterjee
The technology used in intimate partner surveillance
Aditya Das Sarma, advised by Rahul Chatterjee
Multi-bit flip-flop clustering for power optimization
Prakriti Garg, advised by Sushmita Roy
Graph representation learning framework for principled analysis of orthologous regulatory regions
Shashwatha Mitra Ghante Banuprakash, advised by Rishab Goyal
Succinct Classical Verification For BatchQMA in under 8 Rounds
Suenggwan Jo, advised by Fred Sala
Foundation Model usage for forecasting
Keith Johnson, advised by Aws Albarghouthi
Tools and Techniques for Semantics-Guided Synthesis
Jaden Park, advised by Yong Jae Lee
On the Analysis and Detection of Data Contamination in Vision Language Models
Kassie Povinelli, advised by Yuhang Zhao
Using Voice-Changer Technologies for Gender-Affirming Voice Exploration
Lakshika Rathi, advised by Swamit Tannu
Physics Aware Neural Networks for Estimating Application Fidelity
Muhammad Danial Saleem, advised by Xiangyao Yu
Evaluating and Improving Bioinformatics Tools Using DBMS Techniques
Anshuman Senapati, advised by Josiah Hanna
On state abstractions for reinforcement learning
Shengyuan Yang, advised by Thomas Reps
Precise Pointer Analysis for Java Streams
Binwei Yao, advised by Junjie Hu
No Preference Left Behind: Group Distributional Preference Optimization
Karl Vachuska, advised by Hanbaek Lyu
Network Motifs
Hadi Khader, advised by Mohit Gupta
Latent Diffusion for Efficient SPAD Image Reconstruction in Low-Light and High-Motion Settings
Longbang Liu, advised by Sharon Li
Enhancing AlphaFold 3 for Molecular Glue Prediction via Local Chemical Environment Integration
April 2025
[1] Prof Ilias Diakonikolas’s former PhD student, Ankit Pensia, will start as a tenure-track assistant professor at CMU, in the department of Statistics and Data Science. Ankit obtained a PhD from our department in 2023. After graduation, Ankit spent a year in Boston as a Goldstine postdoctoral fellow at IBM. Currently, he is a research fellow at the Simons Institute in Berkeley. He will be starting his faculty position in the coming Fall. Ankit is the first PhD graduate under Diakonikolas’s supervision at UW Madison.
[2] Prof Steve Wright has been invited as one of the 21 plenary speakers in the International Congress of Mathematicians (ICM) in July 2026. ICM is the largest conference in mathematics, held every four years, and organized by the International Mathematical Union. The Fields Medals are awarded there.
https://en.wikipedia.org/wiki/International_Congress_of_Mathematicians
[3] Comparing Vibrotactile and Skin-Stretch Haptic Feedback for Conveying Spatial Information of Virtual Objects to Blind VR Users
Jiangsheng Li, Zining Zhang, Zeyu Yan, Yuhang Zhao, and Huaishu Peng.
IEEE VR 2025.
https://jsli.phd/assets/pdf/IEEE_hand_haptic.pdf
Perceiving spatial information of a virtual object (e.g., direction, distance) is critical yet challenging for blind users seeking an immersive virtual reality (VR) experience. To facilitate VR accessibility for blind users, in this paper, we investigate the effectiveness of two types of haptic cues—vibrotactile and skin-stretch cues—in conveying the spatial information of a virtual object when applied to the dorsal side of a blind user’s hand. We conducted a user study with 10 blind users to investigate how they perceive static and moving objects in VR with a custom-made haptic apparatus. Our results reveal that blind users can more accurately understand an object’s location and movement when receiving skin-stretch cues, as opposed to vibrotactile cues. We discuss the pros and cons of both types of haptic cues and conclude with design recommendations for future haptic solutions for VR accessibility.
[4] Prof Yuhang Zhao has recently received the Grant Accelerator Program (GAP) Award from the McPherson Eye Research Institute to work on “Intent-aware Systems to Enhance Low Vision People’s Visual Perception”
https://vision.wisc.edu/funding_opportunities/grant-accelerator-program/recipients/
[5] Yeping Wang and Michael Gleicher. 2025. Anytime Planning for End-Effector Trajectory Tracking. IEEE Robot. Autom. Lett. 10, 4 (April 2025), 3246–3253. https://doi.org/10.1109/LRA.2025.3540633
https://arxiv.org/abs/2502.03676
End-effector trajectory tracking algorithms find joint motions that drive robot manipulators to track reference trajectories. In practical scenarios, anytime algorithms are preferred for their ability to quickly generate initial motions and continuously refine them over time. In this paper, we present an algorithmic framework that adapts common graph-based trajectory tracking algorithms to be anytime and enhances their efficiency and effectiveness. Our key insight is to identify guide paths that approximately track the reference trajectory and strategically bias sampling toward the guide paths. We demonstrate the effectiveness of the proposed framework by restructuring two existing graph-based trajectory tracking algorithms and evaluating the updated algorithms in three experiments.
[6] “Quake: Adaptive Indexing for Vector Search”
Authors: Jason Mohoney (University of Wisconsin, Madison);
Devesh Sarda (University of Wisconsin, Madison);
Mengze Tang (University of Wisconsin, Madison);
Shihabur Rahman Chowdhury (Apple);
Anil Pacaci (Apple);
Ihab F. Ilyas (University of Waterloo);
Theodoros Rekatsinas (Apple);
Shivaram Venkataraman (University of Wisconsin, Madison)
https://github.com/marius-team/quake
[7] https://edbticdt2025.upc.edu/?contents=invited_keynotes.html
Abstract: Joins are the cornerstone of relational databases. Surprisingly, even after several decades of research in the systems and theory database community, we still lack an understanding of how to design the fastest possible join algorithm. In this talk, we will present the exciting progress the database theory community has achieved in join algorithms over the last two decades. The talk will revolve around five key ideas fundamentally shaping this research area: tree decompositions, data partitioning, leveraging statistical information, enumeration, and algebraic techniques.
Paris Koutris is an associate professor in Computer Sciences at the University of Wisconsin-Madison. His research lies in the intersection of data management theory and practice, focusing on data processing for massively parallel systems, data markets, efficient query processing, and managing uncertain data. He has won the 2016 SIGMOD Jim Gray Dissertation Award for his work on the foundations of parallel data processing, as well as the 2023 PODS Alberto O. Mendelzon Test-of-Time award.
[8] https://arxiv.org/abs/2406.07847
Output-sensitive Conjunctive Query Evaluation
Shaleen Deep, Hangdong Zhao, Austen Z. Fan, Paraschos Koutris
Join evaluation is one of the most fundamental operations performed by database systems and arguably the most well-studied problem in the Database community. A staggering number of join algorithms have been developed, and commercial database engines use finely tuned join heuristics that take into account many factors including the selectivity of predicates, memory, IO, etc. However, most of the results have catered to either full join queries or non-full join queries but with degree constraints (such as PK-FK relationships) that make joins easier to evaluate. Further, most of the algorithms are also not output-sensitive.
In this paper, we present a novel, output-sensitive algorithm for the evaluation of acyclic Conjunctive Queries (CQs) that contain arbitrary free variables. Our result is based on a novel generalization of the Yannakakis algorithm and shows that it is possible to improve the running time guarantee of the Yannakakis algorithm by a polynomial factor. Importantly, our algorithmic improvement does not depend on the use of fast matrix multiplication, as a recently proposed algorithm does. The upper bound is complemented with matching lower bounds conditioned on two variants of the k-clique conjecture. The application of our algorithm recovers known prior results and improves on known state-of-the-art results for common queries such as paths and stars.
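For background, here is a tiny sketch of the classical Yannakakis approach (semijoin reduction followed by joining) on a two-relation acyclic join, which the paper's output-sensitive algorithm generalizes; this is not the paper's algorithm, and the relations below are toy data.

```python
def semijoin(r, s, key_r, key_s):
    """Keep only the tuples of r whose join key appears in s (a semijoin)."""
    keys = {t[key_s] for t in s}
    return [t for t in r if t[key_r] in keys]

def yannakakis_two_rel(r, s, key_r, key_s):
    """Classical Yannakakis on a 2-relation acyclic join R ⋈ S.

    Pass 1: semijoin-reduce both relations so no dangling tuples remain.
    Pass 2: join the reduced relations; the work is then bounded by the
    input size plus the output size, the kind of output-sensitive guarantee
    that the paper sharpens for acyclic CQs with arbitrary free variables.
    """
    r = semijoin(r, s, key_r, key_s)
    s = semijoin(s, r, key_s, key_r)
    index = {}
    for t in s:
        index.setdefault(t[key_s], []).append(t)
    return [t_r + t_s for t_r in r for t_s in index.get(t_r[key_r], [])]

# Toy relations R(a, b) and S(b, c) joining on b.
R = [(1, "x"), (2, "y"), (3, "z")]
S = [("x", 10), ("x", 11), ("w", 12)]
print(yannakakis_two_rel(R, S, key_r=1, key_s=0))
# [(1, 'x', 'x', 10), (1, 'x', 'x', 11)] -- tuples with b = 'y', 'z', 'w' were pruned
```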
[9] https://teachingacademy.wisc.edu/
[10] (a) Title: Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
(b) Title: Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems
Abstract: The session “The Role of Non-GPU Accelerators and Chiplets in Future HPC Systems” at SOS27 will explore the evolving landscape of high-performance computing (HPC) architectures, focusing on the integration of unique accelerators and chiplet technology. As the demand for computational power continues to surge, traditional GPU-centric approaches are being complemented by innovative solutions that leverage specialized accelerators, FPGAs, ASICs, and Dataflow solutions, all alongside modular chiplet designs. This session will delve into the advantages of these technologies, including enhanced performance, energy efficiency, and scalability, while addressing the challenges of programming, cost, and system integration. Attendees will engage in discussions on real-world applications, case studies, and future trends, equipping them with insights to navigate a potential Cambrian explosion of new HPC architectures more effectively.
Link: https://sos27.cscs.ch/
(c) Title: Leveraging Open-source Simulators to Enable HW/SW Co-Design of Next-generation HPC Systems
Abstract: Multi-fidelity simulation accelerates hardware–software (HW/SW) co-design through design space exploration for next-generation HPC systems. By leveraging open-source simulators like gem5 and SST, early-stage prototyping enables the exploration of system architectures capable of achieving 100 exaFLOPs while consuming only 20 MW of power—a critical target for DOE’s future leadership-class computing. Traditional modeling lacks the scalability and fidelity needed for effective co-design at exascale. A Region of Interest (ROI)-driven methodology optimizes simulation efficiency by dynamically adjusting fidelity based on application characteristics. The integration of post-CMOS accelerators, power models, and fine-grained thermal modeling addresses key performance and power trade-offs. By pinpointing representative regions of interest, significant simulation speedups are achieved, enabling practical large-scale exploration. This work aligns with DOE’s moonshot goals, emphasizing disruptive, cross-layer innovations spanning architecture, runtime, OS, compilers, networking, and batch scheduling. By leveraging insights from prior system co-design efforts and lessons learned, this approach enables agile, adaptable designs that meet the evolving demands of HPC+AI, quantum computing, and scientific computing.
Link: https://orau.gov/2025CSPI
[11] Zhang, S., Li, J., Cagiltay, B., Kirkorian, H., Mutlu, B., & Fawaz, K. (2025). “A qualitative exploration of parents and their children’s uses and gratifications of ChatGPT”. Family Relations.
https://onlinelibrary.wiley.com/doi/full/10.1111/fare.13171
Objective
This Emerging Ideas report explores families’ (parents’ and their children’s) uses and gratifications of ChatGPT.
Background
Generative artificial intelligence–based conversational agents, such as ChatGPT, can be used to accomplish a variety of tasks, yet little is known about how and why parents and their children may use these technologies.
Methods
We conducted semistructured qualitative and exploratory interviews with 12 U.S.-based families that had experience sharing a ChatGPT account. Families were recruited using social media advertisements, and at least one child and one parent joined the interview. We asked families about what they used ChatGPT for and why they used the platform.
Results
Families reported four main motivators for using ChatGPT: (a) information seeking, (b) enhancing productivity, (c) entertainment, and (d) social bonding. Potential barriers to use included concerns about (a) ChatGPT’s credibility and capabilities, (b) being less familiar with using ChatGPT, (c) the platform’s ethical implications, and (d) possible privacy risks.
Conclusion
Families use ChatGPT for various purposes, but their uses and gratifications sometimes may differ depending on their perceptions of and experiences with the platform.
Implications
Our findings suggest that with some improvements, ChatGPT has the potential to be a useful tool for both individual and shared use in families.
[12] Prof Jha and Prof Chajed recently received a DARPA grant through the TRACTOR program. The grant involves UW (lead), Berkeley, UIUC, and Edinburgh. The total grant amount is $5M.
https://www.darpa.mil/research/programs/translating-all-c-to-rust
[13] https://arxiv.org/abs/2411.18479
SoK: Watermarking for AI-Generated Content
Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
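As one concrete illustration of the kind of scheme the survey covers, here is a heavily simplified sketch in the style of hash-seeded "green list" token watermarking with a z-score detector; it is not taken from the paper, and the seeding scheme and parameters below are placeholders.

```python
import hashlib
import math

def green_fraction(tokens, vocab_size=50_000, gamma=0.25):
    """Fraction of tokens falling in the context-seeded 'green list'.

    A generator that softly boosts green-list tokens leaves a statistical
    signal; a detector recomputes the same lists and tests whether the
    green fraction is implausibly high for human-written text.
    """
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        # Seed the vocabulary partition with the previous token (simplified context).
        seed = hashlib.sha256(str(prev).encode()).digest()
        # A token is "green" if its seeded hash lands in the first gamma fraction.
        h = int.from_bytes(hashlib.sha256(seed + str(cur).encode()).digest()[:8], "big")
        hits += (h % vocab_size) < gamma * vocab_size
    return hits / max(len(tokens) - 1, 1)

def z_score(frac, n, gamma=0.25):
    """How many standard deviations the green rate sits above its expectation."""
    return (frac - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))

tokens = list(range(200))                     # stand-in for a token sequence
frac = green_fraction(tokens)
print(frac, z_score(frac, len(tokens) - 1))   # near gamma, near 0 for unwatermarked text
```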
[14] https://dl.acm.org/doi/10.1145/235968.233324
BIRCH: an efficient data clustering method for very large databases
Authors: Tian Zhang, Raghu Ramakrishnan, Miron Livny
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle “noise” (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH’s time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.
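BIRCH lives on in standard libraries: scikit-learn ships it as sklearn.cluster.Birch. A short usage sketch on synthetic data (parameter values are illustrative):

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data standing in for a "very large" multi-dimensional dataset.
X, _ = make_blobs(n_samples=10_000, centers=5, cluster_std=0.8, random_state=0)

# threshold controls the radius of the CF-tree leaf subclusters;
# branching_factor bounds the fan-out of CF-tree nodes;
# n_clusters applies a final global clustering over the leaf subclusters.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=5)
labels = birch.fit_predict(X)

print(len(birch.subcluster_centers_), "leaf subclusters ->", 5, "clusters")

# partial_fit allows clustering data in chunks, mirroring BIRCH's single-scan,
# memory-bounded design:
# for chunk in chunks:
#     birch.partial_fit(chunk)
```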
[15] TitletownTech (a VC firm created through a partnership of the Green Bay Packers and Microsoft) announced their first-round picks in their inaugural “startup draft” — happening at the same time as the NFL draft in Green Bay. One of the picks is a Madison (and Cambridge, MA) based company called Ubicept, which includes Prof. Mohit Gupta as a co-founding advisor and a few others from UW-Madison. They received $1 million in investment + $350K in Microsoft Azure credits (plus a lot of visibility).
March 2025
[1] https://www.cs.wisc.edu/2025/03/06/sharon-li-2025-sloan-fellow/
One of just 126 recipients across the US and Canada, the Sloan Foundation counts Sharon Li among the next generation of scientific leaders.
Assistant Professor Sharon Li, a researcher in algorithmic and theoretical foundations of safe and reliable AI, is one of two UW–Madison researchers to be named a 2025 Sloan Fellow. Conferred by the Alfred P. Sloan Foundation, the awards recognize promising early-career scientists “whose creativity, innovation, and research accomplishments make them stand out as the next generation of leaders.”
[2] https://www.computer.org/press-room/computer-pioneer-award-recipients-2025
The IEEE Computer Society (CS) today announced the recipients of the 2025 Computer Pioneer Award, which recognizes and honors those whose long-standing efforts have resulted in the advancement and continued vitality of the computer industry. This year, the “IEEE Computer Society Computer Pioneer Award in honor of the Women of ENIAC” recognized two individuals: Gurindar (Guri) Sohi, Vilas Research Professor, John P. Morgridge Professor, and E. David Cronon Professor of Computer Sciences, Computer Science Department, University of Wisconsin-Madison, Madison, Wis., U.S.A.; and Moshe Y. Vardi, University Professor and the George Distinguished Service Professor in Computational Engineering at Rice University, Houston, Texas, U.S.A.
Also see our CDIS write up:
[3] https://research.wisc.edu/professorships-and-faculty-fellowships/vilas-associates/
The Vilas Associates Competition recognizes new and ongoing research of the highest quality and significance. Recipients are chosen competitively by the divisional Research Committees on the basis of a detailed proposal. Winners receive up to two-ninths of research salary support (including the associated fringe costs) for both summers 2025 and 2026, as well as a $12,500 flexible research fund in each of the two fiscal years. Faculty paid on an annual basis are not eligible for the summer salary support but are eligible for the flexible fund portion of this award. This award cannot be awarded more than one time to any faculty member.
[4] https://conf.researchr.org/news/icse-2025
On behalf of the ICSE 2025 Organizing Committee and ICSE 2015 PC Chair, we are delighted to inform you that your paper, “IccTA: Detecting Inter-Component Privacy Leaks in Android Apps”, has been selected for the ICSE Most Influential Paper N-10 Award.
https://ieeexplore.ieee.org/document/7194581
“IccTA: Detecting Inter-Component Privacy Leaks in Android Apps”
Li Li, Alexandre Bartel, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, Patrick McDaniel
Abstract: Shake Them All is a popular “Wallpaper” application exceeding millions of downloads on Google Play. At installation, this application is given permission to (1) access the Internet (for updating wallpapers) and (2) use the device microphone (to change background following noise changes). With these permissions, the application could silently record user conversations and upload them remotely. To give more confidence about how Shake Them All actually processes what it records, it is necessary to build a precise analysis tool that tracks the flow of any sensitive data from its source point to any sink, especially if those are in different components. Since Android applications may leak private data carelessly or maliciously, we propose IccTA, a static taint analyzer to detect privacy leaks among components in Android applications. IccTA goes beyond state-of-the-art approaches by supporting inter-component detection. By propagating context information among components, IccTA improves the precision of the analysis. IccTA outperforms existing tools on two benchmarks for ICC-leak detectors: DroidBench and ICC-Bench. Moreover, our approach detects 534 ICC leaks in 108 apps from MalGenome and 2,395 ICC leaks in 337 apps in a set of 15,000 Google Play apps.
[5] https://www.usenix.org/conference/fast25/presentation/liu-jing
“Fast, Transparent Filesystem Microkernel Recovery with Ananke”
Jing Liu, Microsoft Research; Yifan Dai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison
Awarded Best Paper!
Abstract: We introduce Ananke, a high-performance filesystem microkernel service that provides transparent recovery from unexpected filesystem failures. Ananke does so by leveraging the unique opportunity of the microkernels, running a small amount of recovery code coordinated by the host OS at the moment of a process crash. Ananke can record key pieces of information not usually available during full-system crash recovery, enabling fast and transparent recovery for applications. Through over 30,000 fault-injection experiments, we demonstrate that Ananke achieves lossless recovery; we also show that Ananke recovers quickly, usually in a few hundred milliseconds. Through real application workloads, we show that Ananke delivers high performance in the common case; the extra work needed to detect faults and prepare for recovery incurs minimal overheads.
[6] http://tab.computer.org/tcde/tcdeawards.html#tcde_ea
On behalf of the TCDE Awards Committee, I am delighted to inform you that you are the recipients of the 2025 IEEE TCDE Ramez Elmasri Outstanding Database Education Award. The citation is as follows:
For authoring a widely used textbook accompanied by an open-source software package that has significantly influenced global education in database systems.
[7] Prof Sharon Li’s student Sean Xuefeng Du has been awarded the 2024-2025 Ivanisevic Award. Xuefeng’s research focuses on developing algorithms and providing theoretical understandings that enable safe and reliable machine learning in the open world. He is advised by Professor Sharon Li.
Hi! I am a fifth-year CS Ph.D. student at UW-Madison advised by Prof. Sharon Yixuan Li. Previously, I was an undergraduate student at Xi’an Jiaotong University (XJTU) majoring in electronic engineering.
My current research focus is to develop theory and algorithms for reliable machine learning, foundation model reliability, and related applications. Specifically, I work on:
- Out-of-distribution (OOD) research: designing adaptive and interpretable learning algorithms that help ML models detect and generalize on OOD samples, such as semantic/covariate shifts, adversarial and noisy samples.
- Reliability of foundation models: understanding the blindspots of LLMs and VLMs for improved safeguarding efforts, such as model hallucination and harmful prompt detection.
- Applications: Graph embedding, object detection and segmentation.
- Interdisciplinary research: Biometrics, AI-aided cryo-microscopy/protein structural analysis.
[8] https://nick11roberts.science
I am a Ph.D. student in CS at University of Wisconsin–Madison where I am advised by Fred Sala along with my many talented colleagues in the Sprocket Lab.
This past Fall, I interned with the Llama Generative AI team at Meta in London, where I worked with Dieuwke Hupkes on language model pretraining and evaluation. This past Summer, I interned at Together AI in San Francisco with Tri Dao, where I worked on hybrid language models. In Summer 2023, I interned with the Physics of AGI research group at Microsoft Research led by Sébastien Bubeck, where I also worked on language models.
Before starting my Ph.D., I had the pleasure of working with Ameet Talwalkar and Zack Lipton during my M.S. in the Machine Learning Department at Carnegie Mellon University. As an undergraduate, I was extremely fortunate to work with both Sanjoy Dasgupta and Gary Cottrell at the University of California, San Diego. Prior to all of this, I was a community college student at Fresno City College, where I was lucky enough to learn calculus, linear algebra, and C++ from Greg Jamison.
In 2025 I received an honorable mention for the Jane Street Graduate Research Fellowship and in 2023, I was named an MLCommons Rising Star. I have also been awarded the Prove AI and UnifyID AI Fellowships in 2021 and 2019, respectively.
[9] Product Manifold Representations for Learning on Biological Pathways
Daniel McNeela, Frederic Sala, Anthony Gitter
Abstract: Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces and compare against traditional Euclidean graph representation learning models. We then train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs. We find large reductions in distortion and boosts in in-distribution edge prediction performance as a result of using mixed-curvature embeddings and their corresponding graph neural network models.
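For readers unfamiliar with mixed-curvature embeddings, the sketch below shows the standard way a product-manifold distance combines per-factor distances (Euclidean, spherical, hyperbolic) via an l2 combination. It is a generic illustration under common modeling conventions, not the paper’s implementation, and the example points are made up.

```python
import numpy as np

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)

def spherical_dist(x, y):
    # Points assumed to lie on the unit sphere.
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def hyperbolic_dist(x, y):
    # Poincare ball model with curvature -1.
    diff = np.linalg.norm(x - y) ** 2
    denom = (1 - np.linalg.norm(x) ** 2) * (1 - np.linalg.norm(y) ** 2)
    return np.arccosh(1 + 2 * diff / denom)

def product_manifold_dist(x_parts, y_parts, dists):
    # Distance on a product manifold: l2-combine the per-factor distances.
    return np.sqrt(sum(d(x, y) ** 2 for d, x, y in zip(dists, x_parts, y_parts)))

# Example: an embedding split into Euclidean, spherical, and hyperbolic factors.
x = [np.array([0.1, 0.2]), np.array([1.0, 0.0]), np.array([0.1, 0.1])]
y = [np.array([0.3, 0.0]), np.array([0.0, 1.0]), np.array([0.2, -0.1])]
print(product_manifold_dist(x, y, [euclidean_dist, spherical_dist, hyperbolic_dist]))
```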
[10] https://pages.cs.wisc.edu/~pb/
Paul Barford is a professor of Computer Sciences at the University of Wisconsin – Madison where he is a member of the networking group and the founder and director of the Wisconsin Advanced Internet Laboratory. He received his B.S. in Electrical Engineering from the University of Illinois and his Ph.D. in Computer Science from Boston University. He was a visiting professor at the Computer Laboratory at the University of Cambridge, a visiting fellow at Wolfson College and an EPSRC visiting fellow. He was also a visiting scholar at Wolfson College at University of Oxford.
Prof. Barford worked in industry for 8 years prior to his academic career. He has been a consultant to companies and frequently gives talks and seminars on computer networking and security issues. He was the founder and CEO of Nemean Networks LLC, which developed network intrusion detection technology. Nemean was acquired by Qualys Inc., where he served as Chief Scientist. Prof. Barford was also co-founder and Chief Scientist of Mdotlabs LLC, which developed fraud detection services for online advertising. Mdot was acquired by comScore Inc., where he served as Chief Scientist.
Prof. Barford’s research interests are in computer networking and communications with a focus on measurement and analysis of Internet data, protocols and topological structure. He also investigates Internet security including detection and identification of fraud, traffic anomalies, malicious attacks and intrusions. His research activities have been supported by the NSF, DHS, ARO, Cisco, and Intel among others.
Prof. Barford has served on numerous government panels. He has also served on executive, organizing and technical program committees of many different networking and IT security workshops and conferences including ACM SIGCOMM (’13 PC chair), ACM SIGMETRICS (’10 PC chair) and ACM IMC (’06 PC chair). He served on the board of directors of the National Lambdarail, as an associate editor for IEEE/ACM Transactions on Networking and as co-chair of the GENI Instrumentation and Measurement working group.
[11] MAP: Multi-user Personalization with Collaborative LLM-powered Agents
Christine Lee, Jihye Choi, Bilge Mutlu
CHI 2025.
LearningMate: An LLM-Powered Adaptive Teaching System with Personalized Learning Plans and Tutoring
Jessica Wang, Christine Lee, Bilge Mutlu
CHI 2025.
[12] VisiMark: Characterizing and Augmenting Landmarks for People with Low Vision in Augmented Reality to Support Indoor Navigation
Ruijia Chen, Junru Jiang, Pragati Maheshwary, Brianna R. Cochran, and Yuhang Zhao.
CHI 2025.
Inclusive Avatar Guidelines for People with Disabilities: Supporting Disability Representation in Social Virtual Reality
Kexin Zhang, Edward Glenn Scott Spencer, Andric Li, Ang Li, Yaxing Yao, and Yuhang Zhao. CHI 2025.
Beyond the “Industry Standard”: Focusing Gender-Affirming Voice Training Technologies on Individualized Goal Exploration
Kassie Povinelli, Hanxiu “Hazel” Zhu, and Yuhang Zhao.
CHI 2025.
[13] Understanding Mixed Reality Drift Tolerance
Daniel Killough, Yuhang Zhao, Bilge Mutlu
CHI 2025.
[14] “Demonstration of VRSight: AI-Driven Real-Time Descriptions to Enhance VR Accessibility for Blind People”
Daniel Killough, Justin Feng, Rithvik Dyava, Zheng Xue Ching, Daniel Wang, Yapeng Tian, Yuhang Zhao.
CHI 2025 Interactivity.
[15] Checking observational correctness of database systems
OOPSLA 25 · Pick, Xu, Desai, Seshia, Albarghouthi
[16] Dependency-aware compilation for surface code quantum architectures
OOPSLA 25 · Molavi, Xu, Tannu, Albarghouthi
[17] Carter Sifferman, William Sun, Mohit Gupta, and Michael Gleicher. 2024. Using a Distance Sensor to Detect Deviations in a Planar Surface. IEEE Robotics and Automation Letters 9, 10 (2024), 8515–8522. https://doi.org/10.1109/LRA.2024.3445665
https://arxiv.org/abs/2408.03838
We investigate methods for determining if a planar surface contains geometric deviations (e.g., protrusions, objects, divots, or cliffs) using only an instantaneous measurement from a miniature optical time-of-flight sensor. The key to our method is to utilize the entirety of information encoded in raw time-of-flight data captured by off-the-shelf distance sensors. We provide an analysis of the problem in which we identify the key ambiguity between geometry and surface photometrics. To overcome this challenging ambiguity, we fit a Gaussian mixture model to a small dataset of planar surface measurements. This model implicitly captures the expected geometry and distribution of photometrics of the planar surface and is used to identify measurements that are likely to contain deviations. We characterize our method on a variety of surfaces and planar deviations across a range of scenarios. We find that our method utilizing raw time-of-flight data outperforms baselines which use only derived distance estimates. We build an example application in which our method enables mobile robot obstacle and cliff avoidance over a wide field-of-view.
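A rough sketch of the kind of pipeline this abstract describes: fit a Gaussian mixture to measurements of known-planar surfaces, then flag new measurements whose likelihood under that model is unusually low. The feature representation, sample counts, and threshold below are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: raw time-of-flight returns captured over flat,
# deviation-free surfaces (n_samples x n_bins), flattened into feature vectors.
rng = np.random.default_rng(0)
planar_measurements = rng.normal(loc=1.0, scale=0.05, size=(500, 16))

# Fit a Gaussian mixture to model the expected distribution of planar returns.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(planar_measurements)

# Score new measurements; low log-likelihood suggests a geometric deviation
# (protrusion, divot, cliff) or unexpected photometrics.
threshold = np.percentile(gmm.score_samples(planar_measurements), 1)

def contains_deviation(measurement):
    return gmm.score_samples(measurement.reshape(1, -1))[0] < threshold

test = rng.normal(loc=1.4, scale=0.05, size=16)  # synthetic "deviation" sample
print(contains_deviation(test))
```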
[18] Yeping Wang, Alexander Peseckis, Zelong Jiang, and Michael Gleicher. 2024. Motion Comparator: Visual Comparison of Robot Motions. IEEE Robot. Autom. Lett. 9, 9 (September 2024), 7699–7706. https://doi.org/10.1109/LRA.2024.3430649
https://arxiv.org/abs/2407.02746
Roboticists compare robot motions for tasks such as parameter tuning, troubleshooting, and deciding between possible motions. However, most existing visualization tools are designed for individual motions and lack the features necessary to facilitate robot motion comparison. In this paper, we utilize a rigorous design framework to develop Motion Comparator, a web-based tool that facilitates the comprehension, comparison, and communication of robot motions. Our design process identified roboticists’ needs, articulated design challenges, and provided corresponding strategies. Motion Comparator includes several key features such as multi-view coordination, quaternion visualization, time warping, and comparative designs. To demonstrate the applications of Motion Comparator, we discuss four case studies in which our tool is used for motion selection, troubleshooting, parameter tuning, and motion review.
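Since the tool’s feature list includes time warping, here is a minimal dynamic time warping sketch for aligning two joint-space trajectories sampled at different rates. This is textbook DTW, not Motion Comparator’s code, and the trajectories are synthetic.

```python
import numpy as np

def dtw(traj_a, traj_b):
    """Dynamic time warping cost between two joint-space trajectories
    (each an array of shape [timesteps, num_joints])."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two hypothetical 6-DoF trajectories sampled at different rates.
t1 = np.linspace(0, 1, 50)[:, None] * np.ones((1, 6))
t2 = np.linspace(0, 1, 80)[:, None] * np.ones((1, 6))
print(dtw(t1, t2))
```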
[19] Yeping Wang and Michael Gleicher. Hierarchically Accelerated Coverage Path Planning for Redundant Manipulators. ICRA 2025.
https://arxiv.org/abs/2502.19591
Many robotic applications, such as sanding, polishing, wiping and sensor scanning, require a manipulator to dexterously cover a surface using its end-effector. In this paper, we provide an efficient and effective coverage path planning approach that leverages a manipulator’s redundancy and task tolerances to minimize costs in joint space. We formulate the problem as a Generalized Traveling Salesman Problem and hierarchically streamline the graph size. Our strategy is to identify guide paths that roughly cover the surface and accelerate the computation by solving a sequence of smaller problems. We demonstrate the effectiveness of our method through a simulation experiment and an illustrative demonstration using a physical robot.
[20] Georgia Tech CRNCH Summit https://crnch.gatech.edu/crnch-summit-2025/
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Matt Sinclair
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
“ModSim Will Play a Crucial Role in Enabling the Next 15 Years of Computer Architecture Research … But Needs a Lot of TLC”
Matt Sinclair
My Talk’s Abstract: As modern computing systems become increasingly complex and heterogeneous, modeling and simulation (ModSim) infrastructure is straining to keep up. Nevertheless, ModSim plays a crucial role in enabling rapid, early-stage co-design for these systems. In this talk, I discuss how ModSim will be crucial to enabling the next 15 years of computer architecture research, as well as (some of) the challenges for us to get there.
[21] Here is a CS article about students from the Wisconsin AI Safety Initiative (WAISI) who traveled to DC to present to Congressional staffers and journalists on risks associated with the proliferation of AI:
https://www.cs.wisc.edu/2025/03/13/waisi-presents-caip-advanced-ai-expo/
[22] https://it.wisc.edu/cio-blog/computing-infrastructure-research-rise-initiative/
February 2025
[1] Vilas Distinguished Achievement Professor awarded to Prof Jerry Zhu.
https://provost.wisc.edu/vilas-distinguished-achievement-professorship/
Vilas Distinguished Achievement Professorships (VDAP) recognize UW–Madison faculty members whose distinguished scholarship has advanced the confines of knowledge, and whose excellence also includes teaching or service. Faculty members receiving this award keep the Vilas Distinguished Achievement Professorship title for the duration of their careers.
[2] Prof Jelena Diakonikolas wins NSF CAREER award.
https://www.cs.wisc.edu/2025/01/29/jelena-diakonikolas-receives-nsf-career-award/
Assistant Professor Jelena Diakonikolas, a researcher in the field of large-scale optimization, recently received a National Science Foundation (NSF) CAREER Award for her proposal “Optimization and Learning with Changing Distributions.” The award comes with $700,000 in funding to support research and education initiatives through 2029.
[3] Prof Paul Barford wins Chancellor’s Teaching Innovation Award.
https://news.wisc.edu/2025-distinguished-teaching-award-recipients-announced/
Twelve faculty members have been chosen to receive this year’s Distinguished Teaching Awards, an honor given out since 1953 to recognize some of the university’s finest educators.
[4] Prof Patrick McDaniel gives Distinguished Lecture at Iowa State
Adversarial Machine Learning: A 10-year Perspective, Distinguished Lecture,
Iowa State University, Ames, IA, January, 2025.
Older video of said talk from UIUC: https://mediaspace.illinois.edu/media/t/1_19ei0f5b/180208991
[5] Prof Miron Livny gives talk on OSG and PATh
OSG (Open Science Grid) and PATh (Partnership to Advance Throughput Computing). PATh is a partnership launched by the NSF in 2020 between the UW-Madison Center for High Throughput Computing (CHTC) and the OSG Consortium to advance Throughput Computing. Established in 2005, the OSG Consortium operates a fabric of distributed High Throughput Computing (dHTC) services in support of the National Science & Engineering community. The research collaborations, campuses, national laboratories, and software providers that form the consortium are unified in their commitment to advance open science via these services.
[6] https://www.acm.org/media-center/2025/january/fellows-2024
Dr. Satish Chandra (Google), who did his Ph.D. under former Prof Jim Larus, is one of this year’s ACM Fellows. Congratulations Satish! Satish received our department’s Outstanding Graduate Student Research Award in 1997 (see here). To our knowledge (thanks Prof Tom Reps!), Satish is the third UW CS Ph.D. from the PL group who has been named an ACM Fellow, along with Tom Ball (Microsoft) and G. Ramalingam (Microsoft).
[7] The ACM India Council is delighted to announce Dr. Arkaprava (Arka) Basu of IISc as the recipient of the ACM India Early Career Researcher Award for 2024.
https://www.acm.org/articles/acm-india-bulletins/2025/ecr-award-2024
The ACM India Early Career Researcher (ECR) Award recognizes individuals in early stages of their careers, who have made fundamental, innovative, and impactful contributions to the computing field while primarily working in India. The ECR award carries a prize of ₹15 lakhs. Financial support for this award is provided by Persistent Foundation.
Basu is being recognized “for creative and impactful contributions towards efficient memory management for GPUs and CPUs through innovations spanning the software and hardware stacks.” Arkaprava is a leading systems researcher in India with a track record of consistently publishing impactful research in flagship conferences and exhibiting leadership roles at the top conferences of his field. Arkaprava’s research focuses on the software and the hardware of Graphics Processing Units (GPUs), which are the workhorse for most AI tasks in today’s data-driven world. His work takes a holistic view of system design by cutting across the software and hardware stack from the compiler, runtimes, and operating system to the hardware architecture. Besides top academic publications, he also has deep collaborations with and long-term consultancy for tech giants in the field.
Basu is an associate professor at the Indian Institute of Science (IISc), Bangalore. Before joining IISc in 2018, he worked as a full-time member of the research staff at AMD Research in Austin, USA, for four years. Earlier, Arkaprava obtained his PhD in computer science from the University of Wisconsin-Madison, USA in December 2013.
Dissertation: https://research.cs.wisc.edu/multifacet/theses/arka_basu_phd.pdf
The ECR award recipient is chosen by a jury comprising eminent personalities in the computing community. The jury follows a rigorous process wherein each nomination is individually reviewed by all the committee members, then collated and deliberated upon in multiple evaluation rounds, and finally decided in the presence of a Steering Committee-appointed observer.
[8] SET-PAiREd: Designing for Parental Involvement in Learning with an AI-Assisted Educational Robot
Hui-Ru Ho, Nitigya Kargeti, Ziqi Liu, Bilge Mutlu
https://arxiv.org/pdf/2305.02525
AI-assisted learning companion robots are increasingly used in early education. Many parents express concerns about content appropriateness, while they also value how AI and robots could supplement their limited skill, time, and energy to support their children’s learning. We designed a card-based kit, SET, to systematically capture scenarios that have different extents of parental involvement. We developed a prototype interface, PAiREd, with a learning companion robot to deliver LLM-generated educational content that can be reviewed and revised by parents. Parents can flexibly adjust their involvement in the activity by determining what they want the robot to help with. We conducted an in-home field study involving 20 families with children aged 3–5. Our work contributes to an empirical understanding of the level of support parents with different expectations may need from AI and robots and a prototype that demonstrates an innovative interaction paradigm for flexibly including parents in supporting their children.
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Christine Lee, David Porfirio, Jessica Wang, Kevin Chenkai Zhao, Bilge Mutlu
Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features—a rule translator, flexibility sliders, and a model checker—that engage users in the verification process. Through a user study (n = 12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
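As a toy illustration of checking a generated plan against user rules (real model checking, as in VeriPlan, is far more general), the sketch below validates an invented plan format against two invented constraints.

```python
from datetime import time

# A hypothetical plan: a list of timed tasks, perhaps produced by an LLM planner.
plan = [
    {"task": "write report", "start": time(9, 0), "end": time(11, 0)},
    {"task": "team meeting", "start": time(10, 30), "end": time(11, 30)},
    {"task": "gym", "start": time(18, 30), "end": time(19, 30)},
]

# User rules expressed as named predicates over the whole plan.
rules = [
    ("no overlapping tasks",
     lambda p: all(a["end"] <= b["start"] for a, b in zip(p, p[1:]))),
    ("nothing scheduled after 6pm",
     lambda p: all(item["end"] <= time(18, 0) for item in p)),
]

# Report every violated rule so the planner (or the user) can revise the plan.
violations = [name for name, check in rules if not check(plan)]
print(violations)
```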
Bridging Generations using AI-Supported Co-Creative Activities
Callie Kim, Arissa Sato, Nathan White, Hui-Ru Ho, Christine Lee, Yuna Hwang, Bilge Mutlu
Intergenerational collaboration between grandparents and grandchildren can be challenging due to differences in technological familiarity. AI has emerged as a promising tool to support collaborative activities, offering flexibility and creative assistance, but its role in facilitating intergenerational collaboration remains underexplored. In this study, we conducted a user study with 29 grandparent-grandchild groups engaged in AI-supported story creation to examine how AI-assisted co-creation can facilitate collaboration. Our findings show that grandchildren managed the technical aspects, while grandparents contributed creative ideas and guided the storytelling. AI played a key role in structuring the activity, facilitating brainstorming, enhancing storytelling, and balancing the contributions of both generations. The process fostered mutual appreciation, with each generation recognizing the strengths of the other, leading to an engaging and cohesive co-creation process. We offer design implications for integrating AI into intergenerational co-creative activities, emphasizing how AI can enhance collaboration across skill levels and technological familiarity.
[9] “GOLDYLOC: Global Optimizations & Lightweight Dynamic Logic for Concurrency”
Authors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair
Venue: ACM Transactions on Architecture & Code Optimization (ACM TACO)
Abstract: Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device’s compute utilization. However, effectively harnessing this concurrency on GPUs for important primitives such as general matrix multiplications (GEMMs) remains challenging. Although modern GPUs have significant hardware and software support for GEMMs, their kernel implementations and optimizations typically assume each kernel executes in isolation and can utilize all GPU resources. This approach is highly efficient when kernels execute in isolation, but causes significant resource contention and slowdowns when kernels execute concurrently. Moreover, current approaches often only statically expose and control parallelism within an application, without considering runtime information such as varying input size and concurrent applications — often exacerbating contention. These issues limit performance benefits from concurrently executing independent operations. Accordingly, we propose GOLDYLOC, which considers the global resources across all concurrent operations to identify performant GEMM kernels, which we call globally optimized (GO)-Kernels. Moreover, GOLDYLOC introduces a lightweight dynamic logic which considers the dynamic execution environment for available parallelism and input sizes to execute performant combinations of concurrent GEMMs on the GPU. Overall, GOLDYLOC improves performance of concurrent GEMMs on a real GPU by up to 2× (18% geomean per workload) and provides up to 2.5× (43% geomean per workload) speedups over sequential execution.
[10] Prof Matt Sinclair participated in a panel entitled “AI’s Energy Future: Balancing Growth, the Grid, and the Environment” and gave a talk “Designing Hardware to Meet the Demands of Machine Learning & Artificial Intelligence” at a WEI event on AI and energy.
Abstract: As use of artificial intelligence grows, so does the demand for electricity and water to power and cool hyper-scale data centers. Many utilities have proposed building additional fossil fuel power plants or keeping older ones running to meet this new demand. Meanwhile, most major computing companies have made commitments to power their operations with clean energy. In this Forum, we’ll explore the potential demands that the growth of AI and data centers may place on the grid and what can be done to mitigate its environmental impact (e.g. improving energy efficiency of computing, or locating and operating data centers in ways that maximize use of renewable energy). How much will electricity demand from data centers truly grow? How might that growth affect both the environment and other electricity customers? And what can we do to ensure that we address that growth in the most sustainable way?
[11] PhD Theses recently completed include:
Rishabh Khandelwal
Assessing and Enhancing Users’ Online Data Privacy
Advisor: Fawaz, Kassem
Committee: Ramanathan, Parameswaran (Parmesh); Fawaz, Kassem; Chatterjee, Rahul; Zhao, Yuhang
https://search.library.wisc.edu/digital/AGUNKPQPZ23VHP8Y
Anthony Lawrence George Rebello
Correctness and Performance with Declarative System Calls
Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi
Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Venkataraman, Shivaram; Sinclair, Matthew
https://search.library.wisc.edu/digital/AWAYYAKZRTR7KC8U
Roger Waleffe
Algorithms and Systems for Scalable Machine Learning over Graphs
Advisor: Wright, Steve
Committee: Wright, Steve; Rekatsinas, Theodoros; Venkataraman, Shivaram; Papailiopoulos, Dimitris
https://asset.library.wisc.edu/1711.dl/ZVOS5EFYR2LHA8L/R/file-000e8.pdf
Edward Barton
Practical Single Shot High Dynamic Range Imaging
Advisor: Hu, Yu Hen
Committee: Hu, Yu Hen; Lee, Yong Jae; Li, Yin; Chen, Yudong
https://www.linkedin.com/in/edwardabarton/
Zhenmei Shi
Understanding Training and Adaptation in Feature Learning: From Two-layer Networks to Foundation Models
Advisor: Liang, Yingyu
Committee: Liang, Yingyu; Zhu, Xiaojin; Sala, Frederic; Li, Yin
https://search.library.wisc.edu/digital/ASBHFA7AP6WTKM8K
Jing Liu
Scale, Performance, and Fault-Tolerance in a Filesystem Semi-Microkernel
Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi
Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Yu, Xiangyao; Fawaz, Kassem; Zhang, Irene (MSR)
https://asset.library.wisc.edu/1711.dl/4GNA6MNYXHEDM8Y/R/file-2b9e2.pdf
[12] Masters Theses (or Projects)
Tommy Chang
Shivaram Venkataraman
EVA: Cost-Efficient Cloud-Based Cluster Scheduling
Neeraj Surawar
Matt Sinclair
(Project) NUMA-Aware Queue Scheduler for Multi-Chiplet GPUs
Jean-Charles Noirot Ferrand
Patrick McDaniel
Extracting the Harmfulness Classifier of Aligned LLMs
Kunyang Li
Patrick McDaniel
On Adversarial and Common Robustness of Parameter-Efficient Fine-Tuning Strategies
Jingxin Du
Prof. Kevin Ponto
Towards the Recreation of More Realistic 3D Virtual Museum Tours
Muhammad Danial Saleem
Thesis (6 total credits)
Xiangyao Yu
Evaluating and Improving Bioinformatics Tools Using DBMS Techniques
Sriram Ashokkumar
Thesis (6 total credits)
Dan Negrut
Project Topic: Multimodal Models and LLMs with robotics using Simulation for Training
Shashwatha Mitra Ghante Banuprakash
Thesis (6 total credits)
Rishab Goyal
Lattice based Cryptography
Binwei Yao
Thesis (6 total credits)
Junjie Hu
Multi-Community Preference Alignment of LLMs
Aditya Das Sarma
Thesis (6 total credits)
Rahul Chatterjee
Multi-bit flip-flop clustering for power optimization
Jaden Park
Thesis (6 total credits)
Yong Jae Lee
Data Contamination in Vision Language Models
Rose Ceccio
Thesis (6 total credits)
Rahul Chatterjee
The technology used in intimate partner surveillance
Shengyuan Yang
Thesis (6 total credits)
Thomas Reps
Precise Pointer Analysis for Java Streams
Tara Bobinac
Thesis (6 total credits)
Junjie Hu
Leveraging LLMs to Gauge Perspectives on Social Issues in Latinx and American Communities
Prakriti Garg
Thesis (6 total credits)
Sushmita Roy
Graph representation learning framework for principled analysis of orthologous regulatory regions
David Parra
Project (3 total credits)
Mohit Gupta
Practical 3D Time-of-flight imaging with single-photon cameras.
Keith Johnson
Thesis (6 total credits)
Aws Albarghouthi
Tools and Techniques for Semantics-Guided Synthesis
Alexander Peseckis
Project (3 total credits)
Michael Gleicher
Occlusion-Conscious Viewpoint Optimization
Binwei Yao
Thesis (6 total credits)
Junjie Hu
Pluralistic Alignment of Large Language Models
Suenggwan Jo
Thesis (6 total credits)
Fred Sala
Foundation Model usage for forecasting
[13] ICRA 2025 paper
“Shape-programming Robotic Reflectors for Wireless Networks”
Yawen Liu, Akarsh Prabhakara, Jiangyifei Zhu, Shenyi Qiao, Swarun Kumar
Shape-programming Robotic Reflectors for Wireless Networks
(link not working, yet?)
[14] We had one paper accepted at ICLR and one at NAACL. The latter was led by Yijing Zhang, who was an undergraduate in our program.
Changho Shin, John Cooper, Frederic Sala, “Weak-to-Strong Generalization Through the Data-Centric Lens,” ICLR 2025
https://arxiv.org/abs/2412.03881
Weak-to-Strong Generalization Through the Data-Centric Lens: The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala, “Personalize Your LLM: Fake It Then Align It,” NAACL 2025 Findings
https://yijingz02.github.io/files/CHAMELEON_NAACL_.pdf
Personalize Your LLM: Fake It Then Align It: Personalizing large language models (LLMs) is essential for delivering tailored interactions that improve user experience. Many existing personalization methods require fine-tuning LLMs for each user, rendering them prohibitively expensive for widespread adoption. Although retrieval-based approaches offer a more compute-efficient alternative, they still depend on large, high-quality datasets that are not consistently available for all users. To address this challenge, we propose CHAMELEON, a scalable and efficient personalization approach that uses (1) self-generated personal preference data and (2) representation editing to enable quick and cost-effective personalization. Our experiments on various tasks, including those from the LaMP personalization benchmark, show that CHAMELEON efficiently adapts models to personal preferences, improving instruction-tuned models and outperforms two personalization baselines by an average of 40% across two model architectures.
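A generic sketch of the representation-editing idea: derive a steering direction from hidden states of preferred versus non-preferred completions and add it to activations at inference time. The random vectors and the simple difference-of-means direction below are stand-ins; CHAMELEON’s actual procedure is more involved.

```python
import numpy as np

# Hypothetical hidden states from self-generated preference pairs; in practice
# these would come from a language model's internal representations.
rng = np.random.default_rng(0)
hidden_dim = 64
preferred = rng.normal(0.2, 1.0, size=(100, hidden_dim))       # h(x, y_preferred)
non_preferred = rng.normal(-0.2, 1.0, size=(100, hidden_dim))   # h(x, y_other)

# Difference-of-means steering direction toward the user's preferences.
direction = preferred.mean(axis=0) - non_preferred.mean(axis=0)
direction /= np.linalg.norm(direction)

def edit(hidden_state, alpha=2.0):
    # Shift a hidden state toward the preference direction at inference time.
    return hidden_state + alpha * direction

print(edit(rng.normal(size=hidden_dim))[:5])
```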
December 2024
[1] “From Foe to Friend: The Surprising Turn of Mega Constellations in Radio Astronomy”
HotNets ’24
Cheap spaceflight has ushered in an explosive growth era for Low Earth Orbit (LEO) satellites. While this has brought us LEO satellite megaconstellations for ubiquitous high-speed data, it has also enabled a proliferation of nanosatellites (e.g., CubeSats) launched by diverse organizations. An unfortunate side-effect is harmful interference to sensitive receivers like those of radio astronomy — no place on Earth is safe. How can we enjoy the fruits of the satellite revolution without blinding ourselves to the secrets of the universe?
Networking is the key. This paper proposes InOrbitNet, which aggregates and backhauls traffic from low-capability nanosatellites using highly-capable LEO megaconstellations. By simulating LEO and nanosatellite orbit transitions, we show that orders-of-magnitude reductions in latency and significant increases in capacity are possible as compared to the current non-networked direct-to-ground approach. But more importantly, because LEO megaconstellations are highly capable and tightly managed, this consolidation of RF footprints also allows radio astronomy to be protected from interference.
https://conferences.sigcomm.org/hotnets/2024/papers/hotnets24-62.pdf
and
“WiFi Physical Layer Stays Awake and Responds When it Should Not”
WF-IoT
WiFi communication should be possible only between devices inside the same network. However, we find that all existing WiFi devices send back acknowledgments (ACKs) to even fake packets received from unauthorized WiFi devices outside of their network. Moreover, we find that an unauthorized device can manipulate the power-saving mechanism of WiFi radios and keep them continuously awake by sending specific fake beacon frames to them. Our evaluation of over 5000 devices from 186 vendors confirms that these are widespread issues. We believe these loopholes cannot be prevented, and hence they create privacy and security concerns. Finally, to show the importance of these issues and their consequences, we implement and demonstrate two attacks where an adversary performs battery drain and WiFi sensing attacks just using a tiny WiFi module which costs less than ten dollars.
https://arxiv.org/abs/2301.00269
[2] https://chtc.cs.wisc.edu/unravelling-antibiotic-resistance.html
“I have stuck with the OSPool because the access is unprecedented—the amount of compute time you can get, the number of jobs you can run at a time”
Erik Wright, Associate Professor of Biomedical Informatics at the University of Pittsburgh
As bacterial infections evolve to develop resistance to drugs, it is critical to investigate methodologies to slow resistance and, potentially, even reverse it. In his quest to discover new antibiotics to counter resistance, Erik Wright’s project, BioMedInfo, aims to compare bacteria with resistant genomes by grouping them into a network of genes with related functions.
Read more in the link above!
[3] https://new.nsf.gov/events/
[4] From our lab, led by UW CS students:
Exploring the Use of Robots for Diary Studies
Michael F Xu, Bilge Mutlu
Designing Telepresence Robots to Support Place Attachment
Yaxin Hu, Anjun Zhu, Catalina Toma, Bilge Mutlu
With collaborators:
Making Sense of Public Space for Robot Design
Hannah Pelikan, Bilge Mutlu, Stuart Reeves
(w/ European collaborators)
The Connection-Coordination Rapport (CCR) Scale
Ting-Han Lin, Hannah Dinner, Tsz Long Leung, Bilge Mutlu, Greg Trafton, Sarah Sebo
(w/ U Chicago collaborators)
ImageInThat: Manipulating Images to Convey User Instructions to Robots
Karthik Mahadevan, Blaine Lewis, Jiannan Li, Bilge Mutlu, Anthony Tang, Tovi Grossman
(w/ U Toronto collaborators)
[5] “SYSTEMS, METHODS, AND MEDIA FOR EULERIAN SINGLE-PHOTON COMPUTER VISION”
Authors: Shantanu Gupta, Mohit Gupta
https://www.freepatentsonline.com/y2024/0370974.html
In accordance with some embodiments, systems, methods, and media for Eulerian single-photon computer vision are provided. In some embodiments, the system comprises: an image sensor comprising detectors configured to detect arrival of individual photons, and arranged in an array; a processor programmed to: cause the image sensor to generate a sequence of images representing a scene, each of the images comprising a plurality of pixels; perform, for each of a plurality of three dimensional filters, a convolution between the three dimensional filter and a plurality of frames, wherein each of the plurality of frames is based on one or more of the images of the sequence of images; generate, for each of the plurality of frames, a plurality of filter bank responses each corresponding to a three dimensional filter of the plurality of three dimensional filters; and perform a computer vision process based on the plurality of filter responses.
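The claim language translates roughly to the sketch below: convolve a bank of 3D (time x space) filters with a stack of single-photon frames and hand the responses to downstream vision processing. The binary frames and the two filters are illustrative stand-ins, not the filters described in the patent.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical stack of single-photon frames: each pixel is a binary photon
# detection, organized as (frames x height x width).
rng = np.random.default_rng(0)
frames = rng.binomial(1, 0.05, size=(32, 64, 64)).astype(float)

# A small bank of 3D (time x space) filters; a temporal-difference kernel and a
# spatio-temporal smoothing kernel, chosen purely for illustration.
filter_bank = [
    np.concatenate([np.ones((2, 3, 3)), -np.ones((2, 3, 3))]) / 18.0,  # temporal edge
    np.ones((4, 3, 3)) / 36.0,                                         # smoothing
]

# Convolve each 3D filter with the frame stack to produce filter-bank responses,
# which a downstream computer vision process would consume.
responses = [convolve(frames, f, mode="nearest") for f in filter_bank]
print([r.shape for r in responses])
```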
[6] Matthew Zurek is the winner of the INFORMS Applied Probability Society Best Student Paper Prize.
https://connect.informs.org/aps/apsawards/awards
The relevant paper is https://arxiv.org/abs/2403.11477
[7] “Non-interactive Blind Signatures: Post-quantum and Stronger Security”
with my students and external collaborator (Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav)
Asiacrypt 2024
https://eprint.iacr.org/2024/614
“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”
with my co-advised postdoc and external collaborators (Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree)
https://www.iacr.org/cryptodb/data/paper.php?pubkey=34710
“Multi-Authority Functional Encryption with Bounded Collusions from Standard Assumptions”
with my student (Rishab Goyal, Saikumar Yadugiri)”
TCC 2024
https://link.springer.com/chapter/10.1007/978-3-031-78020-2_1
“Incompressible Functional Encryption”
with my co-advised postdoc and external collaborators (Rishab Goyal; Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma)
ITCS 2025
https://eprint.iacr.org/2024/798
[8] “Simulating Machine Learning Models at Scale”
Authors: Vishnu Ramadas, Matthew D. Sinclair
Abstract: Traditionally, architectural simulators are used to perform early exploration. However, detailed simulation of modern systems can take extremely long times in existing tools. Furthermore, prototyping optimizations at scale can also be challenging, especially for newly proposed accelerators. In this talk we discuss our efforts to enhance gem5’s support to make simulating AI/ML workloads practical to run while retaining accuracy. This will enable users to use the same early exploration platform, gem5, for both homogeneous and heterogeneous systems. Specifically, we are extending gem5’s existing simulation infrastructure to create a scalable simulation framework that models diverse system components with varying levels of detail. We will also develop novel mechanisms that identify the most important parts of the applications to simulate at high fidelity while reducing simulation time and add support for multi-chiplet and multi-node (e.g., multi-GPU and multi-accelerator) simulations to drive a flexible, adaptable simulation infrastructure that enables rapid prototyping of different optimized AI and ML algorithms and architectures while ensuring that the changes are representative of real, modern hardware and software.
[9] “RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator”
Jie Zhang, Hongjing Huang, Xuzheng Xu, Xiang Li, Ming Liu, Zeke Wang
Abstract: The emerging microservice/serverless-based cloud programming paradigm and the rising networking speeds leave the RPC stack as the predominant data center tax. Domain-specific hardware acceleration holds the potential to disentangle the overhead and save host CPU cycles. However, state-of-the-art RPC accelerators integrate RPC logic into the CPU or use specialized low-latency interconnects, hardly adopted in commodity servers. To this end, we design and implement RPCAcc, a software-hardware co-designed RPC on-NIC accelerator that enables reconfigurable RPC kernel offloading. RPCAcc connects to the server through the most widely used PCIe interconnect. To grapple with the ramifications of PCIe-induced challenges, RPCAcc introduces three techniques: (a) a target-aware deserializer that effectively batches cross-PCIe writes on the accelerator’s on-chip memory using compacted hardware data structures; (b) a memory-affinity CPU-accelerator collaborative serializer, which trades additional host memory copies for slow cross-PCIe transfers; (c) an automatic field update technique that transparently codifies the schema based on dynamically reconfigured RPC kernels to minimize superfluous PCIe traversals. We prototype RPCAcc using the Xilinx U280 FPGA card. On HyperProtoBench, RPCAcc achieves 3.2× lower serialization time than a comparable RPC accelerator baseline and demonstrates up to 2.6× throughput improvement in the end-to-end cloud workload.
https://arxiv.org/abs/2411.07632
[10] “Optimizing Quantum Circuits, Fast and Slow”
Amanda Xu, Abtin Molavi, Swamit Tannu, Aws Albarghouthi
ASPLOS 2025
https://pages.cs.wisc.edu/~aws/papers/asplos25.pdf
Optimizing quantum circuits is critical: the number of quantum operations needs to be minimized for a successful evaluation of a circuit on a quantum processor. In this paper we unify two disparate ideas for optimizing quantum circuits, rewrite rules, which are fast standard optimizer passes, and unitary synthesis, which is slow, requiring a search through the space of circuits. We present a clean, unifying framework for thinking of rewriting and resynthesis as abstract circuit transformations. We then present a radically simple algorithm, GUOQ, for optimizing quantum circuits that exploits the synergies of rewriting and resynthesis. Our extensive evaluation demonstrates the ability of GUOQ to strongly outperform existing optimizers on a wide range of benchmarks.
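As a flavor of the fast “rewrite rules” half of this combination, here is a toy pass over a gate list that cancels adjacent self-inverse gates and fuses adjacent Z-rotations. The GUOQ algorithm itself and the resynthesis step are not reproduced here, and the gate representation is invented for illustration.

```python
# Gates are (name, qubit(s), angle) tuples; two local rewrite rules are applied:
# cancel adjacent self-inverse gates, and fuse adjacent Z-rotations on one qubit.
SELF_INVERSE = {"h", "x", "cx"}

def rewrite_pass(circuit):
    out = []
    for gate in circuit:
        if out and gate[0] in SELF_INVERSE and out[-1] == gate:
            out.pop()                      # rule 1: G G = I for self-inverse gates
        elif out and gate[0] == "rz" and out[-1][0] == "rz" and out[-1][1] == gate[1]:
            prev = out.pop()               # rule 2: rz(a) rz(b) = rz(a + b)
            merged_angle = prev[2] + gate[2]
            if abs(merged_angle) > 1e-12:
                out.append(("rz", gate[1], merged_angle))
        else:
            out.append(gate)
    return out

circuit = [("h", 0, None), ("h", 0, None), ("rz", 1, 0.3), ("rz", 1, 0.2), ("cx", (0, 1), None)]
print(rewrite_pass(circuit))  # -> [("rz", 1, 0.5), ("cx", (0, 1), None)]
```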
[11] “Making Serverless Pay-For-Use a Reality with Leopard”
Tingjia Cao, A. Arpaci-Dusseau, R. Arpaci-Dusseau, T. Caraza-Harter
NSDI ‘25
Abstract: Serverless computing has gained traction due to its event-driven architecture and “pay for use” (PFU) billing model. However, our analysis reveals that current billing practices do not align with the true resource consumption. This paper challenges the prevailing SLIM (static, linear, interactive-only model) assumptions that underpin existing billing models, demonstrating that current billing does not realize PFU for realistic workloads. We introduce the Nearly Pay-for-Use (NPFU) billing model, which accommodates varying CPU and memory demands, spot cores, and preemptible memory. We also introduce Leopard, an NPFU-based serverless platform that integrates billing awareness into several major subsystems: CPU scheduler, OOM killer, admission controller, and cluster scheduler. Experimental results indicate that Leopard benefits both providers and users, increasing throughput by more than 2× and enabling cost reductions.
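A toy arithmetic illustration of the gap the paper targets: static billing charges for the full allocation over the whole invocation, while a usage-based computation integrates measured CPU and memory over time. The rates, sampling scheme, and example invocation below are made up and are not Leopard’s billing logic.

```python
# Illustrative per-unit rates (invented, not real cloud pricing).
CPU_RATE = 0.000017    # $ per vCPU-second
MEM_RATE = 0.0000017   # $ per GB-second

def static_bill(alloc_vcpus, alloc_gb, duration_s):
    # "SLIM"-style billing: pay for the full allocation for the whole duration.
    return duration_s * (alloc_vcpus * CPU_RATE + alloc_gb * MEM_RATE)

def usage_bill(samples):
    # samples: list of (interval_s, vcpus_used, gb_used) measurements.
    return sum(dt * (cpu * CPU_RATE + mem * MEM_RATE) for dt, cpu, mem in samples)

# An invocation that allocates 2 vCPUs / 4 GB but is mostly idle waiting on I/O.
samples = [(0.5, 2.0, 4.0)] + [(0.5, 0.1, 1.0)] * 19
print(static_bill(2, 4, 10.0), usage_bill(samples))
```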
It begins: “If you spend enough time around computer science majors, one of the things you’re likely to hear is a long series of complaints about course enrollment. That’s been my experience living with two CS majors, and after two years of hearing about the shortcomings of the course enrollment process in the CS department, I decided to take a closer look at the problem myself.” Read the rest at the link above!
November 2024
[1] ACSAC 2024 Test of Time Award
Year: ACSAC 2009
Title: Semantically Rich Application-Centric Security in Android
Authors: Ongtang, Machigar; McLaughlin, Stephen; Enck, William; McDaniel, Patrick
“On the occasion of ACSAC’s 35th Anniversary, the Steering Committee instituted the inaugural Test of Time Awards. These awards provide an opportunity to honor papers that have been published at ACSAC that have had enduring significance and impact to the security community. The committee considered papers published more than 15 years ago, and discussed their impact on academia, industry, and government.”
From: https://www.acsac.org/2023/program/test_of_time/
[2] Today was the regional round of this year’s International Collegiate Programming Contest (ICPC). UW-Madison participated with 6 teams at a site on the Epic campus. Our teams placed 1st, 4th, 7th, 8th, 10th, and 20th among the 96 teams in our region. Our top team was the only one to solve 9 of the problems. See the scoreboard for more details.
This is the 6th year in a row that UW-Madison places first in the regional competition, and the 16th time since we started participating in 2001. Our top team advances to the North America Championship, where about 15 teams qualify for the world finals. Given our silver medal at the world finals last year, we’re going for gold this year! 😊
Congratulations to all our teams and thanks to the people at Epic for running a regional site on their campus!
The UW-Madison teams that participated in the regional competition:
– Gryffindor (1st place): Anton Cheng, Harry Huang, Tuan Tai Nguyen
– .sort()[-1] (4th place): Tan Bui, HuaiYuan Jing, Merlin Morton
– Mini Soda (7th place): Guanlin Chen, Xinchen Shen, Enze Zhang
– Apples are great (8th place): Brandon Han, Ishan Joshi, Xuan Nghia Nguyen
– JQKA (10th place): Anda He, Kalyan Sriram, Jierui Xu
– chr(sum(range(ord(min(str(not())))))) (20th place): Ansh Aggarwal, Anna Sun, Austin Tsai
[3] Four papers accepted at NeurIPS:
The AlCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators (Spotlight)
Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala
Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500×.
https://arxiv.org/abs/2407.11004
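The core idea translates to something like the sketch below: ask the model once for a labeling program rather than per-example labels, then run that program locally and reuse it. The prompt, the query_llm stand-in, and the canned keyword labeler are all hypothetical, not Alchemist’s actual outputs.

```python
PROMPT = """Write a Python function label(text) -> int that returns 1 if a
movie review is positive and 0 otherwise, using simple keyword heuristics."""

def query_llm(prompt):
    # Stand-in for a real API call; returns a canned labeling program here.
    return (
        "def label(text):\n"
        "    pos = sum(w in text.lower() for w in ('great', 'excellent', 'loved'))\n"
        "    neg = sum(w in text.lower() for w in ('terrible', 'boring', 'awful'))\n"
        "    return 1 if pos > neg else 0\n"
    )

namespace = {}
exec(query_llm(PROMPT), namespace)   # the program is stored locally and reusable for free
dataset = ["Loved it, excellent pacing.", "Terrible and boring."]
print([namespace["label"](x) for x in dataset])   # -> [1, 0]
```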
OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala
Abstract: Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have mismatching requirements, such as needing access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching—often by significant margins—in 17 out of 21 datasets.
https://arxiv.org/abs/2404.08461
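A bare-bones sketch of adjusting zero-shot predictions with optimal transport, in the spirit of the abstract: transport each example’s unit mass to classes so the column marginals match an estimated label distribution, with negative log-probabilities as costs. The Sinkhorn routine and numbers below are generic illustrations, not the paper’s code.

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, reg=0.05, iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations."""
    K = np.exp(-cost / reg)
    u = np.ones(len(row_marginal))
    for _ in range(iters):
        v = col_marginal / (K.T @ u)
        u = row_marginal / (K @ v)
    return u[:, None] * K * v[None, :]

# Zero-shot class probabilities for 5 examples over 3 classes (made-up numbers),
# plus an estimate of the true downstream label distribution.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.3, 0.3, 0.4]])
target_label_dist = np.array([0.2, 0.4, 0.4])

# Transport each example's unit mass so column sums match the target distribution.
plan = sinkhorn(-np.log(probs), np.full(5, 1 / 5), target_label_dist)
adjusted_preds = plan.argmax(axis=1)
print(adjusted_preds)
```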
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks
Tianyi Zhang, Linrong Cai, Jeffrey Li, Nicholas Roberts, Neel Guha, Frederic Sala
Abstract: Weak supervision (WS) is a popular approach for label-efficient learning, leveraging diverse sources of noisy but inexpensive weak labels to automatically annotate training data. Despite heavy usage, the value of WS is challenging to benchmark due to its complexity: the knobs involved include data sources, labeling functions (LFs), aggregation techniques, called label models (LMs), and end model pipelines. Existing evaluation suites tend to be limited, focusing on particular components or specialized use cases, or relying on simplistic benchmark datasets with poor LFs, producing insights that may not generalize to real-world settings. We address these by introducing a new benchmark, BoxWRENCH, designed to more accurately reflect real-world usage of WS. This benchmark features (1) higher class cardinality and imbalance, (2) substantial domain expertise requirements, and (3) linguistic variations found in parallel corpora. We improve upon existing benchmark LFs using a rigorous procedure aimed at mimicking real-world settings. In contrast to existing WS benchmarks, we show that in many practical settings supervised learning requires substantial amounts of labeled data to match WS performance.
https://openreview.net/forum?id=c7SApXZz4b#discussion
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Harit Vishwakarma, Yi Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak
Abstract: Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual annotation. A prominent variant, threshold-based auto-labeling (TBAL), works by finding thresholds on a model’s confidence scores above which it can accurately automatically label unlabeled data. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, we show that such methods fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvement on coverage over the baselines while maintaining error level below 5% and using the same amount of labeled data.
https://arxiv.org/abs/2404.16188
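For context, threshold-based auto-labeling can be sketched as: pick the smallest confidence threshold whose covered validation points stay within an error budget, then auto-label unlabeled points above it. The confidence scores below are simulated; Colander’s contribution is learning a better confidence function to feed into this kind of procedure.

```python
import numpy as np

def find_threshold(confidences, correct, max_error=0.05):
    """Smallest confidence threshold whose covered validation points
    stay within the error budget (returns None if no threshold works)."""
    for t in np.unique(confidences):          # ascending, so first feasible = most coverage
        covered = confidences >= t
        if covered.sum() == 0:
            continue
        if 1.0 - correct[covered].mean() <= max_error:
            return t
    return None

# Simulated validation scores from some confidence function g(x); accuracy
# loosely tracks confidence in this toy setup.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = rng.uniform(0, 1, size=1000) < conf

t = find_threshold(conf, correct)
print(t)  # auto-label any unlabeled point whose confidence exceeds t
```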
[4] On Friday, Nov. 1, CS Assistant Professor Sharon Li gave a “Friday Lunch” talk with the UW-Madison Center for the Humanities, titled: “Steering Large Language Models by Human Preferences”.
These informal, dynamic (and popular!) Friday Lunch discussions are hosted over a catered lunch at Memorial Union to foster academic discussion and provide a space for interdisciplinary dialogue and community building across campus. Seats are limited – registration is required and available on a first-come basis.
Abstract: Large language models (LLMs) trained on massive datasets exhibit remarkable abilities, but at the same time, these models can inadvertently generate misinformation and harmful outputs. This concern underscores the urgent challenge of language model alignment: ensuring these models’ behaviors agree with human preferences and safety considerations. In recent years, a spectrum of alignment strategies have emerged, with prominent methods showcasing the effectiveness of reinforcement learning with human feedback (RLHF). RLHF has gained widespread adoption among state-of-the-art models, including OpenAI’s GPT-4, Anthropic’s Claude, Google’s Bard, and Meta’s Llama 2-Chat. A pivotal component within RLHF is proximal policy optimization (PPO), which employs an external reward model that mirrors human preferences for its optimization process. Despite the promise, RLHF suffers from unstable and resource-intensive training. Furthermore, the need to repeat PPO training when altering the reward model hinders rapid customization to evolving datasets and emerging needs. In this Friday Lunch talk, Assistant Professor Sharon Li will discuss alternative paradigms on how we can achieve alignment without RL training.
Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open worlds. She is the recipient of the AFOSR Young Investigator Program (YIP) award, the NSF CAREER award, the MIT Technology Review TR-35 Award, Forbes “30 Under 30” in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her works received a NeurIPS Outstanding Paper Award, and an ICLR Outstanding Paper Award Honorable Mention in 2022.
[5] “Three Chairs and a Research Booth: How One PhD Advisor Shaped the SC Committee”
by Melyssa Franklin
October, 2024
“An affable, earnest guy, Dr. Barton Miller, the Vilas Distinguished Achievement Professor in the Computer Sciences Department at the University of Wisconsin-Madison (UW), has made an oversized impact on the SC Conference. Since 2012, three of Miller’s graduate students have served as Conference Chair: Dr. Jeffrey Hollingsworth (SC12), Dr. Dorian Arnold (SC23), and now Dr. Phillip Roth (SC24). In addition, many of his advisees have served on the committee in other roles, and their PhD students are now coming along as SC volunteers as well.
Is there something in the water in Madison? What magic words of wisdom has Dr. Miller imparted to his students to spur them to participate in the SC committee and aspire to leadership roles?
Speaking to Dr. Miller, it’s clear his graduate students are some of his favorite people. They become good friends – almost family. And to him, SC is like a family reunion. ‘SC is the center of the HPC universe,’ he enthused. ‘Everything that’s happening in the field is there. It creates community.’”
Read the rest here: https://sc24.supercomputing.org/2024/10/three-chairs-and-a-research-booth-how-one-phd-advisor-shaped-the-sc-committee/
[6] “Apple Day” @ Wisc
November 13th, 2024
“I wanted to share that we are hosting an Apple Day on campus next week, Wednesday, November 13th. They will provide an information session and then invite students for a 1:1 if they email their resumes to the email provided on the flyer.
Please encourage all students to attend to learn more. Although a software development background is their focus for interns, there are other opportunities also.
This is the first engagement since we secured a partnership with their CoreOS team and have lots of opportunity to grow this collaboration. Thanks for sharing and let me know if you have any questions.” -Justin Hines
[7] ‘Trial by fire’: UW-Madison hackathon generates tech ideas in 24 hours
Becky Jacobs, Nov 12 2024
Cap Times
“After 24 hours of programming and coding — fueled by plenty of snacks and energy drinks — the organizers of an annual hackathon event at the University of Wisconsin-Madison announced which of the 55 projects would move on as finalists.”
Read the rest:
MadHacks:
https://www.madhacks.io/
[8] “Contracting with a Learning Agent”
Guru Guruganesh · Yoav Kolumbus · Jon Schneider · Inbal Talgam-Cohen · Emmanouil-Vasileios Vlatakis-Gkaragkounis · Joshua Wang · S. Weinberg
Abstract: Real-life contractual relations typically involve repeated interactions between the principal and agent, where, despite theoretical appeal, players rarely use complex dynamic strategies and instead manage uncertainty through learning algorithms. In this paper, we initiate the study of repeated contracts with learning agents, focusing on those achieving no-regret outcomes. For the canonical setting where the agent’s actions result in success or failure, we present a simple, optimal solution for the principal: Initially provide a linear contract, then switch to a zero-scalar contract. This shift causes the agent to “free-fall” through their action space, yielding non-zero rewards for the principal at zero cost. Interestingly, despite the apparent exploitation, there are instances where our dynamic contract can make both players better off compared to the best static contract. We then broaden the scope of our results to general linearly-scaled contracts, and, finally, to the best of our knowledge, we provide the first analysis of optimization against learning agents with uncertainty about the time horizon.
Paper: https://arxiv.org/abs/2401.16198
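The “free-fall” dynamic described in the abstract can be given a concrete flavor with a small simulation. The sketch below is only a toy, not the paper’s construction or analysis: the action costs, success probabilities, contract share, and horizon are all made-up numbers, the agent runs the Hedge (multiplicative-weights) no-regret algorithm with full-information feedback, and the principal offers a linear contract for the first half of the horizon and a zero contract afterward.

# Toy illustration only: not the contract scheme analyzed in the paper.
# All numbers below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
T = 2000                                   # number of rounds
reward = 1.0                               # principal's reward on success
costs = np.array([0.0, 0.1, 0.25, 0.4])    # agent's effort costs (made up)
succ = np.array([0.05, 0.4, 0.7, 0.9])     # success probability per action (made up)
alpha = 0.6                                # share paid to the agent under the linear contract
eta = np.sqrt(np.log(len(costs)) / T)      # Hedge learning rate

weights = np.ones(len(costs))
principal_total = 0.0
for t in range(T):
    share = alpha if t < T // 2 else 0.0   # switch to the zero contract halfway through
    probs = weights / weights.sum()
    a = rng.choice(len(costs), p=probs)    # agent samples an action from Hedge
    success = rng.random() < succ[a]
    # Full-information Hedge update on the agent's expected utilities under the current contract.
    agent_util = succ * share * reward - costs
    weights *= np.exp(eta * agent_util)
    principal_total += (1.0 - share) * reward * success

print(f"principal's average per-round utility: {principal_total / T:.3f}")

After the switch, the agent’s probability mass drifts toward its zero-cost action, but during that drift the principal still collects successes at zero payment, which is the intuition behind the free-fall phase.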
[9] PRIME grant
PI: Ming Liu
Abstract: EFabric provides on-demand supercomputing solutions to empower AI innovation and deployment. We offer agile, flexible, and low-cost access to customized supercomputing clusters for startups, research labs, grad students, and enterprises. EFabric is built atop emerging load-store interconnects, instead of traditional Ethernet/Infiniband. It organizes computation resources into different pools and independently scales them based on users’ needs. You can just focus on building AI applications, tuning AI models, and exploring AI deployments without worrying about the underlying computing infrastructure.
https://d2p.wisc.edu/funding/prime/
[10] “Adversarial Machine Learning: A 10-year Perspective”
Patrick McDaniel
Distinguished Lecture, UIUC
Abstract: Machine learning is revolutionizing technology and society. However, like any technology, it can be misused by adversaries, and our understanding of the potential threats remains limited. If machine learning is to be part of any critical decision-making process, we must be able to detect and mitigate threats at both the training and inference phases. Over the past decade, I have studied a myriad of threats to machine learning at all stages of its lifecycle and proposed a variety of methods that bring us closer to trustworthy machine learning systems. In this talk, I will discuss the arc of adversarial machine learning, how threat models have evolved, and how the community has gained new insights into the root causes of the vulnerabilities present in machine learning systems today.
Bio: Patrick McDaniel is the Tsun-Ming Shih Professor of Computer Sciences in the School of Computer, Data & Information Sciences at the University of Wisconsin-Madison. Professor McDaniel is a Fellow of IEEE, ACM and AAAS, a recipient of the SIGOPS Hall of Fame Award and SIGSAC Outstanding Innovation Award, and the director of the NSF Frontier Center for Trustworthy Machine Learning. He also served as the program manager and lead scientist for the Army Research Laboratory’s Cyber-Security Collaborative Research Alliance from 2013 to 2018. Patrick’s research focuses on a wide range of topics in computer and network security and technical public policy. Prior to joining Wisconsin in 2022, he was the William L. Weiss Professor of Information and Communications Technology and Director of the Institute for Networking and Security Research at Pennsylvania State University.
https://calendars.illinois.edu/detail/1570?eventId=33502561
[11] Reducing the GAP: Improving the Fidelity and Scalability of gem5’s GPU Models
Matt Sinclair
Invited Talk at CAMS ‘24
The breakdown in Moore’s Law and Dennard Scaling is leading to drastic changes in the makeup and constitution of computing systems. For example, a single chip integrates 10-100s of cores and has a heterogeneous mix of general-purpose compute engines and highly specialized accelerators. Traditionally, computer architects have relied on tools like architectural simulators to accurately perform early-stage prototyping and optimization of proposed designs. However, as systems become increasingly complex and heterogeneous, architectural tools are straining to keep up. In particular, publicly available architectural simulators are often not very representative of the industry parts they intend to represent. This leads to a mismatch in expectations: researchers may draw the wrong conclusions about the efficacy of proposed optimizations if the tool’s models do not provide high fidelity. Moreover, modeling and simulation tools are also struggling to keep pace with increasingly large, complex workloads from domains such as machine learning (ML).
In this talk, I will discuss our work on improving the open source, publicly available GPU models in the widely used gem5 simulator. gem5 can run entire systems, including CPUs, GPUs, and accelerators, as well as the operating system, runtime, network, and other related components. Thus, gem5 has the potential to allow users to study the behavior of entire heterogeneous systems. Unfortunately, some of gem5’s publicly available models do not always provide high accuracy relative to their “real” counterparts, especially for the memory subsystem. I will discuss my group’s efforts to overcome these challenges and improve the fidelity of gem5’s GPU models, as well as our ongoing efforts to scalably run modern ML and HPC workloads in frameworks such as PyTorch and TensorFlow in gem5. Collectively, this work significantly enhances the state of the art and enables more widespread adoption of gem5 as an accurate platform for heterogeneous architecture research.
[12] Two papers at Asiacrypt ‘24
https://asiacrypt.iacr.org/2024/
“Non-interactive Blind Signatures: Post-quantum and Stronger Security”
Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav
“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”
Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree
October 2024
[1] The PODS ‘24 program: https://2024.sigmod.org/program_pods.shtml
Award papers include:
- Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization (Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris)
- Tight Bounds of Circuits for Sum-Product Queries (Austen Z. Fan, Paraschos Koutris, Hangdong Zhao) – invited to JACM.
- Topology-aware Parallel Joins (Xiao Hu, Paraschos Koutris)
[2] The Belonging & Inclusion Best Paper Award at UIST 2024
https://uist.acm.org/2024/
Title: CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision
Authors: Jaewook Lee, Andrew D Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, and Yuhang Zhao
Abstract: Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants for suitable augmentation design. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations.
[3] September was a busy month of talks/presentations in India and the Netherlands for Prof Miron Livny, including:
- “OSG and Future of Distributed Computing in the Age of Cloud” at the 8th Asia Tier Centre Forum (ACTF8) in Mumbai, India
- “Job Lists in HTCondor {Y[i]=F(X[i]),i=1,n}” at the 2nd Asia HTCondor Workshop in Mumbai, India
- “CHTC – Enabling Throughput Computing at All Scales” IIT Bombay, India
- “CHTC – Advancing Throughput Computing for the Physical Sciences” Tata Institute of Fundamental Research (TIFR) Mumbai, India
- “Introduction to HTCondor and OSG” tutorial on IGWN-DHTC-OSG at the Inter University Centre for Astronomy and Astrophysics (IUCAA) in Pune, India
- “Placing a List of Jobs (Work in Progress)” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
- “Philosophy and Architecture” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
- “Placing a List of Jobs (Work in Progress)” Nikhef HTCondor Tutorial, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
- “Philosophy and Architecture” Nikhef HTCondor Tutorial, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
- “What the manual does not tell you – Philosophy and Architecture” 10th European HTCondor workshop, Amsterdam, the Netherlands
- “What the manual does not tell you – History” 10th European HTCondor workshop, Amsterdam, the Netherlands
- “CHTC Vision: Compute and Data Together” 10th European HTCondor workshop, Amsterdam, the Netherlands
CHTC did not organize these events; all were organized by the locals with CHTC support. In Europe there were more than 60 attendees.
[4] Prof Cai’s papers include
- “Dichotomy for Holant* Problems with One Ternary Function on Domain Size 3,” Information & Computation (with Pinyan Lu and Mingji Xia)
- “Shor’s Algorithm Does Not Factor Large Integers in the Presence of Noise,” SCIENCE CHINA
- a paper in ACM Transactions on Computation Theory (with Ben Young)
[5] https://www.gem5.org/events/isca-2024
Prof Matt Sinclair (and a student) hosted a tutorial on using gem5 at ISCA 2024 this past week.
Notably, this tutorial (and two talks given at a concurrent workshop) marked the first time his group publicly discussed how they have gotten PyTorch and TensorFlow to run at scale in gem5.
[6] Prof Fredrikson (Prof Jha’s former student) receives tenure at CMU
Info: https://x.com/jhasomesh/status/1838708177476493639
Prof Fredrikson’s home page: https://mattfredrikson.com/
His research aims to enable systems that make secure, fair, and reliable use of machine learning. His group focuses on finding ways to understand the unique risks and vulnerabilities that arise from learned components, and on developing methods to mitigate them, often with provable guarantees.
[7] We are excited to announce this year’s Wisconsin Database Affiliates Workshop! For detailed information and the official schedule, please visit the workshop’s website. You may also RSVP here.
The workshop takes place on October 10 (Thursday), 8:30am–4:00pm, in Union South, Room Northwoods (3rd floor). This year’s event includes:
– Research talks from faculty and graduate students
– Industry talks from affiliate companies
– A poster and demo session
– Free breakfast and lunch
[8] Jin-Yi Cai chaired the 2024 FOCS Test of Time Awards committee. See https://focs.computer.org/2024/awards/ for details.
The members of the committee are
Julia Chuzhoy (Toyota Technological Institute at Chicago),
Costis Daskalakis (MIT),
Faith Ellen (University of Toronto),
Shafi Goldwasser (UC Berkeley), and
Leonard Schulman (Caltech).
[9] As organized and related to us by Justin Hines: On Oct 1st, the night before the career fair, we hosted our inaugural Student Org Reception to more casually introduce our student leaders to Employers. We had six employers and over 50 students attend. We invited every active student org across CDIS. We will continue to innovate for the students as they continue to participate in our offerings.
[10] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419
“CloudLab Phase-IV: Strengthening a Core Resource for Research and Education”
Prof Ming Liu and Prof Shivaram Venkataraman
$12m
Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.
CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.
CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.
CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
[11] “Learning to Price Homogeneous Data”
Keran Chen, Joon Suk Huh, Kirthevasan Kandasamy
Accepted at NeurIPS
Abstract: We study a data pricing problem, where a seller has access to N homogeneous data points (e.g. drawn i.i.d. from some distribution). There are m types of buyers in the market, where buyers of the same type i have the same valuation curve v_i : [N] → [0, 1], where v_i(n) is the value for having n data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for T rounds so as to learn the revenue-optimal pricing curve p : [N] → [0, 1]. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve p_t. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve Õ(m√T) regret in the stochastic setting and Õ(m^{3/2}√T) regret in the adversarial setting.
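As a rough illustration of the online-pricing setup in the abstract, the sketch below treats each candidate pricing curve in a small hand-made discretization as an arm of a UCB bandit. The valuation curves, the candidate curves, and the buyer distribution are all hypothetical; this is not the discretization scheme or algorithm from the paper.

# Toy illustration of UCB over a small set of candidate pricing curves.
# Valuations, curves, and buyer distribution are all made up.
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 5000
ns = np.arange(1, N + 1)
# Hypothetical buyer types: concave valuation curves v_i(n), each in [0, 1].
valuations = [0.9 * np.sqrt(ns / N), 0.6 * (ns / N) ** 0.3, 0.4 * np.log1p(ns) / np.log1p(N)]
type_probs = [0.5, 0.3, 0.2]
# Candidate pricing curves p(n) = min(c * n / N, 1) for a few unit prices c.
curves = [np.minimum(c * ns / N, 1.0) for c in (0.2, 0.4, 0.6, 0.8)]

def revenue(curve, v):
    surplus = v - curve                  # buyer's surplus for each quantity n
    best = surplus.argmax()
    return curve[best] if surplus[best] > 0 else 0.0   # buyer may buy nothing

counts = np.zeros(len(curves))
sums = np.zeros(len(curves))
for t in range(1, T + 1):
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)   # unplayed arms first
    arm = int(ucb.argmax())
    buyer = rng.choice(len(valuations), p=type_probs)
    counts[arm] += 1
    sums[arm] += revenue(curves[arm], valuations[buyer])

best_arm = int((sums / np.maximum(counts, 1)).argmax())
print("empirically best candidate curve:", best_arm, "avg revenue:", sums[best_arm] / counts[best_arm])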
[12] “Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs”
Matthew Zurek, Yudong Chen
NeurIPS ‘24.
We study the sample complexity of learning an ε-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound Õ(SAH/ε²), where H is the span of the bias function of the optimal policy and SA is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters S, A, H, and ε, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. We also initiate the study of sample complexity in general (multichain) average-reward MDPs. We argue that a new transient time parameter B is necessary, establish an Õ(SA(B+H)/ε²) complexity bound, and prove a matching (up to log factors) minimax lower bound. Both results are based on reducing the average-reward MDP to a discounted MDP, which requires new ideas in the general setting. To optimally analyze this reduction, we develop improved bounds for γ-discounted MDPs, showing that Õ(SAH/((1−γ)²ε²)) and Õ(SA(B+H)/((1−γ)²ε²)) samples suffice to learn ε-optimal policies in weakly communicating and in general MDPs, respectively. Both these results circumvent the well-known minimax lower bound of Ω̃(SA/((1−γ)³ε²)) for γ-discounted MDPs, and establish a quadratic rather than cubic horizon dependence for a fixed MDP instance.
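For readers unfamiliar with the span parameter in the abstract above, it is the standard span seminorm of the optimal bias (relative value) function h*; a common way to write it (stated here for reference, not quoted from the paper) is

H \;=\; \mathrm{sp}(h^{*}) \;=\; \max_{s \in \mathcal{S}} h^{*}(s) \;-\; \min_{s \in \mathcal{S}} h^{*}(s).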
[13] Received an NIH (Smart and Connected Health R01) award to fund our research on designing and developing AI-powered AR systems to support low vision people in complex activities of daily living.
Title: SCH: AI-Assisted Vision: Scene-Aware Augmented Reality Systems to Support Low Vision People in Activities of Daily Living
PIs: Yuhang Zhao (PI), Yapeng Tian (co-PI) from UT Dallas, Jon Froehlich (co-PI) from the University of Washington, Sanbrita Mondal (co-I) from UW-Madison
Abstract: Uncorrectable vision loss, a.k.a. low vision, can significantly impact activities of daily living (ADLs), affecting independence, health, and social well-being. Magnifiers are the most commonly used low vision aids; however, they greatly reduce the visual field. Augmented reality (AR) glasses present a unique opportunity for low vision technology by intelligently recognizing the environment and enhancing users’ vision based on their contexts. However, existing AR solutions mainly rely on image processing techniques, generating pure pixel-level augmentations to the whole scene, such as pixel remapping for visual field loss, and recoloring based on distance. These solutions arbitrarily alter the user’s full visual field with no semantic understanding of the scene, distorting users’ natural vision and diminishing important visual details. As a result, they cannot effectively support complex ADLs that involve constant motion and object interactions, such as cooking in the kitchen, or safely navigating a crowded street. The overall objective of this proposal is to fundamentally advance low vision technology by leveraging state-of-the-art AI techniques and presenting unobtrusive scene-aware AR augmentations that are tailored to users’ environments and visual abilities. Contextualized in two key ADLs (meal preparation, outdoor mobility), the research will investigate how to recognize and augment visual elements in AR from three fundamental dimensions via three specific aims: (A1) Enhancing perception of visual salience, (A2) Enhancing object affordance for safe and efficient interactions, (A3) Enhancing awareness of environmental dynamics. In each aim, the research will contribute to both HCI and AI by designing scene-aware AR augmentations with low vision users and rehabilitation professionals, and developing egocentric video datasets and AI models to enable fast and robust scene recognition in AR. In the final year, a holistic AR system will be developed and evaluated for real-world impact and limitations via field studies (A4). With the AI-powered AR systems for low vision, the project will contribute to the mission of NEI by fundamentally transforming conventional low vision aids and improving independence and quality of life for low vision people in various challenging ADLs. The involvement of low vision participants and rehabilitation professionals throughout the research will also increase their exposure to state-of-the-art technology, potentially increasing technology acceptance and adoption in low vision rehabilitation.
[14] Presented at ECCV ‘24
“Photon Inhibition for Energy-Efficient Single-Photon Imaging” (link to details)
Lucas J. Koerner, Shantanu Gupta, Atul Ingle, Mohit Gupta
Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost — each photon detection consumes more energy than that of a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called photon inhibition to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real-time, to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection, and demonstrate, both via simulations and real SPC captured data, considerable reduction in photon detections (over 90% of photons) while maintaining task performance metrics. Our work raises the question of “which photons should be detected?”, and paves the way for future energy-efficient single-photon imaging.
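To give a concrete flavor of what an on-sensor inhibition policy can look like, here is a toy sketch (emphatically not one of the paper’s policies; the flux levels, frame length, and per-pixel photon budget are invented): each pixel is disabled for the remainder of a frame once it has already recorded a fixed number of photons, which is enough for a crude brightness estimate, and we measure how many avalanche detections that avoids.

# Toy inhibition policy: stop detecting on a pixel after a fixed photon budget.
# All parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
h, w, n_slots = 64, 64, 1000                      # pixel grid and time slots per frame
flux = rng.uniform(0.001, 0.05, size=(h, w))      # per-slot detection probability
budget = 20                                       # photons considered "enough" per pixel

counts = np.zeros((h, w))
active_slots = np.zeros((h, w))
avoided = 0
for _ in range(n_slots):
    arrivals = rng.random((h, w)) < flux          # photons arriving this slot
    active = counts < budget                      # inhibited pixels ignore arrivals
    active_slots += active
    counts += arrivals & active
    avoided += int((arrivals & ~active).sum())

flux_hat = counts / np.maximum(active_slots, 1)   # brightness estimate from detected photons
frac_avoided = avoided / (avoided + counts.sum())
print(f"detections avoided: {frac_avoided:.1%}, mean abs flux error: {np.abs(flux_hat - flux).mean():.4f}")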
“Light-in-Flight for a World-in-Motion” (link to details)
Jongho Lee, Ryan J. Suess, Mohit Gupta
Although time-of-flight (ToF) cameras are becoming the sensor-of-choice for numerous 3D imaging applications in robotics, augmented reality (AR) and human-computer interfaces (HCI), they do not explicitly consider scene or camera motion. Consequently, current ToF cameras do not provide 3D motion information, and the estimated depth and intensity often suffer from significant motion artifacts in dynamic scenes. In this paper, we propose a novel ToF imaging method for dynamic scenes, with the goal of simultaneously estimating 3D geometry, intensity, and 3D motion using a single indirect ToF (I-ToF) camera. Our key observation is that we can estimate 3D motion, as well as motion artifact-free depth and intensity by designing optical-flow-like algorithms that operate on coded correlation images captured by an I-ToF camera. Through the integration of a multi-frequency I-ToF approach with burst imaging, we demonstrate high-quality all-in-one (3D geometry, intensity, 3D motion) imaging even in challenging low signal-to-noise ratio scenarios. We show the effectiveness of our approach through thorough simulations and real experiments conducted across a wide range of motion and imaging scenarios, including indoor and outdoor dynamic scenes.
[15] Prof Matt Sinclair gave an invited talk at the DOE Energy-Efficient Computing for Science Workshop (https://web.cvent.com/event/a3dd901a-699e-408c-8a84-81445e6ea64f/summary).
Title: Creating Flexible, High Fidelity Energy Modeling for Future HPC Systems
Authors: Matthew D. Sinclair, Bobby Bruce, William Godoy, Oscar Hernandez, Jason Lowe-Power, and Shivaram Venkataraman
Abstract: Energy efficient, large scale, co-designed systems will be necessary in the future. As we increasingly adopt AI, workloads are increasingly bounded by new scaling laws, where the loss scales as a power-law with model size, dataset size, and the amount of compute used for training. Thus, to improve accuracy, scaling requires exponentially increasing resources. This shift means that AI growth requires more energy at a faster pace in a manner distinct from prior patterns, making energy efficiency even more critical. Accordingly, future HPC system designs need to optimize increasingly heterogeneous hardware and applications for energy consumption, without significantly compromising performance. To drive these efforts, there is a need for credible, open-source infrastructure that can study the energy effects of new systems early in the design process, before silicon is produced. Such infrastructure will help study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on simulation and modeling techniques to model and prototype energy consumption. Unfortunately, the tools needed to co-design systems for performance and energy are lacking for both existing and future systems. In this talk, we discuss our vision to design new energy models which can improve accuracy for post-tape out hardware, first principle methods for estimating energy use in future systems by leveraging open-source hardware, and finally integration of energy models into tools such as simulators and development environments.
Link: https://drive.google.com/file/d/1aVnUe9hOqNw8DmWRP9aCbf6P9CTb_9B4/view
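The scaling-law claim in the abstract (loss falling as a power law in model and data size) is often written in a parametric form like the one below; this is a commonly used form from the neural scaling-law literature, included here for orientation rather than taken from the talk:

\mathcal{L}(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is the number of model parameters, D is the number of training tokens, E is an irreducible loss term, and A, B, α, β are fitted constants. Each further fixed reduction in the reducible loss therefore requires a multiplicative increase in N and D, and hence in compute and energy.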
[16] CDIS transition announced: https://cdis.wisc.edu/erickson-steps-down-arpaci-dusseau-to-lead-school-of-computer-data-information-sciences/
Tom Erickson, founding director of the School of Computer, Data & Information Sciences (CDIS) and executive associate dean for strategy and innovation at the University of Wisconsin–Madison’s College of Letters & Science, has announced he will step down from his leadership roles effective January 2025. Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, will assume leadership of CDIS.
September 2024
[1] On behalf of the USENIX Security Test of Time Award Committee, I’m delighted to inform you that your paper, “Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing,” originally presented at USENIX Security ’14, has won the 2024 USENIX Security Test of Time Award. The award recognizes outstanding work in computing systems security and privacy research that has had a lasting impact on the community. Congratulations!
https://www.usenix.org/conferences/test-of-time-awards
[2] “Uncertainty and my scientific journey”
Aws Albarghouthi
Programming Languages Mentoring Workshop @ PLDI
https://www.youtube.com/watch?v=kYZiPZ3EwRc
Abstract: I will walk you through my academic journey from graduate school to a tenured faculty position. Reflecting upon the past 15 years, I will highlight the recurring theme of uncertainty, and how it arose in unexpected ways along the way. I will talk about how uncertainty can be a significant source of mental stress and how I’ve tried to understand it and navigate it in different ways.
[3] “UW–Madison Awarded National Center of Academic Excellence in Cyber Research (CAE-R) Designation”
The University of Wisconsin–Madison has been designated as a National Center of Academic Excellence in Cyber Research (CAE-R) by the National Security Agency (NSA). This prestigious designation underscores UW–Madison’s commitment to advancing cybersecurity education and research and recognizes its position as a leader in the field.
“This designation recognizes that our faculty, staff, and students are on the forefront of Cyber Research, and it is amongst the most important research that we do,” says Eric Wilcots, Dean of UW–Madison’s College of Letters & Science. “As a Center of Academic Excellence, we will have the opportunity to address hard questions and deepen our impact.”
[4] The CS department plans to make the following awards in the inaugural year:
- Rakesh Agrawal — research leadership and excellence – https://en.wikipedia.org/wiki/Rakesh_Agrawal_(computer_scientist)
- Eric Harslem — tech leadership + philanthropy in computing (Austin area) – https://uteach.utexas.edu/profiles/eric-harslem
- Brian Pinkerton — entrepreneurship – https://www.cs.wisc.edu/staff/pinkerton-brian/
- Brent Seales — applied research with community impact – https://www.engr.uky.edu/directory/seales-brent
[5] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2404180
Prof Yong Jae Lee
RI: Small: Understanding and Advancing the Generalization Capabilities of Fake Image Detectors
Abstract: This project develops an integrated research, education, and outreach program to provide a foundation for understanding and advancing the generalization capabilities of fake (AI-generated) image detectors. With the rise and maturity of artificial intelligence (AI) generative models, fake images are proliferating in the digital world at an unprecedented rate. Fake images can cause harm in a variety of ways by deceiving, manipulating, and misinforming individuals and society at large. For example, they can be used for blackmail, disinformation, or financial fraud. To make matters worse, there is no longer a single source of fake images. Synthesized images could take the form of realistic human faces generated using one type of fake image generator, or they could take the form of complex scenes generated using another type of fake image generator. One can be almost certain that more methods for generating fake images will be developed in the future. To combat the potential harm caused by fake images, this project aims to advance technology in image forensics and build a deep understanding of how AI-generated images differ from real images. In addition to scientific impact, this project performs complementary educational and outreach activities that engage students in research and STEM.
This research provides a foundation for gaining a deeper understanding of, and for advancing fundamental research in, detecting AI-generated images. In particular, it will focus on the generalization properties of fake image detectors. Specifically, it will investigate two main thrusts: (Thrust I) understanding what makes fake AI-generated images fake, including the role of generative models vs. data, a novel benchmark and toolbox for universal fake image detection, and understanding the features that a fake detector focuses on. In Thrust II, the project will advance universal fake AI-generated image detection, including a novel frequency masking strategy, a few-shot adaptation approach for learning with only a few training examples, and extensions to video to account for spatio-temporal aspects of realism. Both thrusts will be studied in the context of creating generalizable (universal) fake image detection algorithms. The investigators’ wealth of experience in generative models and in understanding model robustness, as well as their initial work in this space, makes them well-positioned to formulate and solve the relevant challenges and lays the groundwork for the project.
[6] Invited Talk at ModSim (Workshop on Modeling & Simulation of Systems and Applications)
Title: Designing Better Tools for Power- and Sustainability-Aware Co-Design
Prof Matt Sinclair
Abstract: With the waning of Moore’s Law and the end of Dennard’s Scaling, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly turning towards heterogeneity to continue scaling performance and energy efficiency. Thus, modeling increasingly large, complex, diverse accelerators is critical to continued computer architecture and systems innovation. The rise in specialized, compute-heavy workloads like machine learning has given enhanced prominence to hardware accelerators. Thus, the next decades of hardware and system software research must balance efficiency through specialization and flexibility through generalization. To drive those innovations, the community requires credible, open-source infrastructure to study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on architecture simulators to model and prototype these systems. Unfortunately, modern computer architecture simulation and modeling techniques for power and carbon are either too inaccurate, too inflexible (e.g., difficult to extend), or too high level to enable co-design for these increasingly complex, increasingly heterogeneous systems. For sustainable computing to thrive, we believe that simulation and modeling tools can — and must — play a major role, and I describe our efforts in this direction.
Invited tutorial on gem5 at ORNL
Title: Learning How to Use gem5 to Study Large-Scale Workloads
Prof Matt Sinclair
Abstract: The community-developed gem5 infrastructure is a cycle-level computer architecture simulator. It models a wide and diverse set of components, including processors of various ISAs, caches, main memory, and networks, with a high level of microarchitectural fidelity. During DOE’s Exascale FastForward2 program, AMD released support for modeling GPUs executing HPC applications built for the GCN3 ISA. This work laid the foundation for development and prototyping of full-stack GPU optimizations for existing exascale-class HPC systems such as ORNL’s Frontier supercomputer. This tutorial workshop will introduce the gem5 simulator, how to use it, and how to use a variety of its important features for prototyping full-stack optimizations for large-scale workloads, including the latest AMD MI200 and MI300 ISAs.
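For readers new to gem5, a typical entry point (as covered in introductory tutorials like the one above) is a small Python configuration script built on gem5’s standard library. The sketch below follows the documented “hello world” pattern; the module paths and the resource name come from the public gem5 documentation and may differ across gem5 versions, and the script is run with a gem5 binary such as build/X86/gem5.opt rather than a normal Python interpreter.

# Minimal gem5 standard-library configuration sketch (syscall-emulation mode).
# Run with a gem5 binary, e.g.: build/X86/gem5.opt this_script.py
# Module paths and the resource name follow the public gem5 docs and may vary by version.
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator

board = SimpleBoard(
    clk_freq="3GHz",
    processor=SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.X86, num_cores=1),
    memory=SingleChannelDDR3_1600(size="32MB"),
    cache_hierarchy=NoCache(),
)
# Download (or reuse a cached copy of) a prebuilt "hello world" binary from gem5-resources.
board.set_se_binary_workload(obtain_resource("x86-hello64-static"))
Simulator(board=board).run()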
[7] Prof Bilge Mutlu is part of a new NSF Engineering Research Center around advancing the state of the art in “dexterous robotics,” led by Northwestern University. The center will receive approximately $52M in funding over 10 years, and UW’s work will focus on programming interfaces for dexterous robots.
News article: https://www.cs.wisc.edu/2024/08/21/human-augmentation-via-dexterity-erc-receives-nsf-funding
[9] https://news.wisc.edu/gilman-walker-and-arpaci-dusseau-to-assume-new-leadership-roles/
From the article: …Isbell has appointed Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, to a new role as special advisor to the provost for computing. “This role will advance our institutional planning to ensure that we remain on the forefront in the rapidly advancing research and education areas associated with computing writ large, including artificial intelligence, data and information sciences in the coming decades,” says Isbell.
[10] New faculty in the CS department this fall:
- Ali Abedi https://people.eecs.berkeley.edu/~abedi/
- Sandeep Silwal https://sandeepsilwal.com/
- Manolis Vlatakis https://www.cs.columbia.edu/~emvlatakis/
- Tengyang Xie https://tengyangxie.github.io/
August 2024
[1] https://www.siam.org/publications/siam-news/articles/2024-august-prize-spotlight/#Wright
Dr. Stephen Wright, University of Wisconsin – Madison, is the 2024 recipient of the George B. Dantzig Prize. He received the prize for his fundamental contributions to nonlinear optimization and algorithms for control, compressed sensing, machine learning, and data science. He pioneered infeasible interior point methods which culminated in his 1997 SIAM monograph on the subject. Moreover, Dr. Wright contributed highly cited, outstanding, and very influential work in a broad range of fields in mathematical optimization, including algorithms for control, nonsmooth optimization with applications to compressed sensing, machine learning, and data science. His comprehensive contributions range from theory, algorithm design and analysis, to applications and the development of high-impact software.
Read more at the URL above.
[2] “Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture”,
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore
https://ieeexplore.ieee.org/document/1207019
Original Publication: ISCA 2003.
Award Details: The ACM SIGARCH/IEEE-CS TCCA Influential ISCA Paper Award is presented annually at the International Symposium on Computer Architecture. It honors the paper from the ISCA Proceedings published 20 years prior that has had the greatest impact on the field, whether through research, development, products, or ideas. The award includes a $1,000 honorarium for the authors and a certificate.
https://www.sigarch.org/benefit/awards/acm-sigarchieee-cs-tcca-influential-isca-paper-award/
Abstract: We describe the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.
[3] “Active Adaptation for Decentralized Foundation Models”
Prof Fred Sala
DARPA Young Faculty Award
Abstract: Foundation models have revolutionized the way that machine learning and data science tasks are performed. While exciting, these models require large amounts of data, both for creating them and for specializing them to satisfy user needs. This makes these models unsuitable for data-constrained scenarios. Model training and adaptation also impose heavy computational burdens, blocking the use of foundation models in challenging decentralized environments. This award studies how to create new forms of foundation models that overcome these obstacles. It aims to build these models with less data by focusing on data quality rather than quantity, to train them more efficiently, and to enable their use to power decentralized agents that face communication and other constraints.
More about the DARPA Young Faculty Award here:
https://www.darpa.mil/work-with-us/for-universities/young-faculty-award
The objective of the DARPA Young Faculty Award (YFA) program is to identify and engage researchers in junior faculty positions at U.S. academic and non-profit research institutions and expose them to Department of Defense (DoD) needs and DARPA’s program development process.
[4] This week NSF awarded supplements to two awards – “Pelican: Advancing the Open Science Data Federation Platform” (Bockelman PI and Livny co-PI) and “Partnership to Advance Throughput Computing (PATh)” (Livny PI, Bockelman and Tannenbaum co-PIs) – for a total of ~$1.2M. One of these supplements involves Tony Gitter and is in the context of the National AI Research Resource (NAIRR).
[5] “Anvil: Verifying Liveness of Cluster Management Controllers”
Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, and Zicheng Ma, University of Illinois Urbana-Champaign; Tej Chajed, University of Wisconsin-Madison; Jon Howell, Andrea Lattuada, and Oded Padon, VMware Research; Lalith Suresh, Feldera; Adriana Szekeres, VMware Research; Tianyin Xu, University of Illinois Urbana-Champaign
https://www.usenix.org/conference/osdi24/presentation/sun-xudong
Awarded Best Paper!
Abstract: Modern clouds depend crucially on an extensible ecosystem of thousands of controllers, each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously reconciles the current state of the system to a desired state according to a declarative description. However, controllers have bugs that make them never achieve the desired state, due to concurrency, asynchrony, and failures; there are cases where after an inopportune failure, a controller can make no further progress. Formal verification is promising for avoiding bugs in distributed systems, but most work so far focused on safety, whereas reconciliation is fundamentally not a safety property.
This paper develops the first tool to apply formal verification to the problem of controller correctness, with a general specification we call eventually stable reconciliation, written as a concise temporal logic liveness property. We present Anvil, a framework for developing controller implementations in Rust and verifying that the controllers correctly implement eventually stable reconciliation. We use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.
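To give a feel for the kind of liveness property involved, eventually stable reconciliation can be sketched in temporal logic roughly as follows (this is a paraphrase of the idea, not the paper’s exact formalization): for every desired-state description d, if the desired state eventually stabilizes at d, then the system state eventually matches d and keeps matching it.

\forall d.\;\; \Diamond\,\Box\,(\mathit{desire} = d) \;\Longrightarrow\; \Diamond\,\Box\,\mathit{match}(\mathit{state},\, d)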
[6] https://caecommunity.org/about-us/what-cae-cybersecurity
PhD students are invited to this event:
[7] This past week some UW-Madison students and Prof Hanna participated in the international RoboCup competition in Eindhoven, the Netherlands, as part of a new joint team between UW-Madison and UT Austin. They participated in the standard platform league, in which all teams use the same robot platform to emphasize AI research. As a new team, they chose to compete in the “Challenge Shield” (lower) division of the competition — and they won!
The UW-Madison students who traveled were PhD students Adam Labiosa and Will Cong and junior student Chen Li. Several other UW-Madison undergraduates and another PhD student contributed beforehand, in addition to some graduate students from UT Austin.
The final match is here with the UW team in red and white: https://www.youtube.com/watch?v=JDZ6wihyELQ#t=2h14m14s
(a convincing 7-0 victory)
The most exciting game is linked here (robots wore UT colors for this one):
https://www.youtube.com/live/rUyYoW4jcfc#t=6h13m10s
(a come-from-behind victory in the 2nd half, with a little bit of luck involved)
This is actually the second time Prof Hanna has participated in RoboCup with Wisconsin students. Last year, the team finished third in the same division, with the aim of largely basing its decision-making on reinforcement learning. UT’s team was interested in adopting the same approach, which led to the joint effort this year.
[8] Title: CAREER: Rethinking Rack-Scale System Design over Memory Fabrics
Prof Ming Liu
Abstract: Memory fabrics, an emerging and powerful cluster interconnect technology, are revolutionizing the construction of the modern data center. Although promising, this infrastructure shift fundamentally changes decades-long system design assumptions and challenges conventional wisdom on how to build efficient rack-scale systems. This project aims to develop a new computing paradigm that views the memory fabric as a first-class citizen on which to instantiate, orchestrate, and reclaim computations. The project’s novelty is a memory fabric-aware intermediate system stack that allows applications to effectively harness the capabilities of memory fabrics while hiding their limitations. The project’s broader significance and importance lie in laying out the system foundation for building next-generation sustainable and cost-efficient computing infrastructures for enterprise on-premise and cloud-scale data center systems using memory fabrics. This project introduces the technology to a broader audience via the LegoCluster program and BigComputer workshop education activities.
This project rethinks the rack-scale system design from memory, computing, and communication perspectives. The first thrust classifies remote memory nodes into four categories and develops a benchmarking framework to characterize their performance. It builds an active remote memory system that transparently and dynamically moves a memory object to a suitable memory node at runtime based on the object’s access profile and the underlying node’s capabilities. The second thrust applies idempotent tasks as the fault-tolerant computation abstraction and develops a language system to generate them. It designs hardware cooperative engines and an execution framework on which to schedule idempotent tasks. The third thrust introduces new communication primitives, translates them into the memory fabric’s commands, and builds a transport layer for end-to-end traffic control. Together, these efforts produce a new open-source software system over memory fabrics encompassing APIs, libraries, programming systems, and software runtimes.
Link: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339755&HistoricalAwards=false
[9] https://chtc.cs.wisc.edu/htc-24-event.html
CHTC and the OSG Consortium hosted its second annual Throughput Computing Week in Madison, Wisconsin joined in person and remotely by 388 participants. This year’s themes included dedicated sessions on campus cyberinfrastructure, talks on AI and machine learning enabled by high throughput computing, and tutorials and presentations on the new Pelican Platform project.
[10]
Navigating the labyrinthine corridors of academic publishing, authors often encounter three formidable guardians, each with their unique, sometimes perplexing, approach to gatekeeping. These are the Confused Reviewer, the Rational Reviewer, and the Lazy Reviewer. Their presence not only adds a layer of complexity to the publication process but also infuses it with a certain tragicomic flair.
Read more at the link above.
[11] https://awards.advising.wisc.edu/all-scholarships/hilldale-undergraduatefaculty-research-fellowship/
The Hilldale Undergraduate/Faculty Research Fellowship provides research training and support to undergraduates. Students have the opportunity to undertake their own research project in collaboration with UW–Madison faculty or research/instructional academic staff. Please Note: Graduate students are ineligible to serve as the project advisor. Approximately 97 – 100 Hilldale awards are available each year.
From CS:
- Professor Remzi Arpaci-Dusseau (students Yuanzhuo Yang, Hongtao Zhang)
- Professor Jelena Diakonikolas (student Yi Wei)
- Professor Shivaram Venkataraman (student Mengze Tang)
[12] “Simulating and Modeling Hardware for Machine Learning Workloads at Scale”
Prof Matt Sinclair
SRC AIHW PI meeting
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (i.e., Tensor Cores), remains an important research challenge. In particular, an energy model that is flexible and can be extended both to programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
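One standard way to “find representative program slices,” as mentioned in the abstract above, is SimPoint-style interval sampling: split execution into intervals, summarize each with a basic-block vector, cluster the vectors, and simulate only one representative interval per cluster. The sketch below illustrates that general recipe on randomly generated stand-in data; it is not necessarily the method the group uses.

# SimPoint-style selection of representative simulation intervals (illustrative only).
# The basic-block vectors here are random stand-ins for profiled workload data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_intervals, n_bbs, k = 500, 64, 8            # intervals, basic blocks, clusters
bbv = rng.random((n_intervals, n_bbs))
bbv /= bbv.sum(axis=1, keepdims=True)         # normalize to per-interval execution frequencies

km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(bbv)
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    # Simulate in detail only the interval closest to the cluster centroid,
    # then weight its statistics by the fraction of intervals it represents.
    dists = np.linalg.norm(bbv[members] - km.cluster_centers_[c], axis=1)
    rep, weight = int(members[dists.argmin()]), len(members) / n_intervals
    print(f"cluster {c}: simulate interval {rep:3d} in detail, weight {weight:.2f}")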
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Prof Matt Sinclair
University of Connecticut CS Seminar
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
[13] “DPU-Centric Disaggregated Storage”
Prof Ming Liu
Grant from Western Digital
Abstract: Storage disaggregation has gained significant traction and is being widely deployed. Lately, with the advent of high-bandwidth server interconnects and new NVMe form factors, commodity storage appliances are becoming denser and able to deliver tens of millions of IOPS. To fully unleash this I/O capability, the server processor has to spend a tremendous number of CPU cycles on I/O processing, leaving little computing headroom available to other co-located applications. This project aims to build a DPU-centric storage stack that manages, schedules, and processes I/O-induced tasks with minimal CPU involvement.
[14] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419&HistoricalAwards=false
Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.
CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.
CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.
CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
[15] https://s2024.siggraph.org/program/emerging-technologies/
BEST IN SHOW
Congratulations to “A Live Demo of Single-photon Imaging and Applications” (Sacha Jungerman, Varun Sundar, Mohit Gupta), recipient of the SIGGRAPH 2024 Emerging Technologies Best in Show Award!
About A Live Demo of Single-photon Imaging and Applications
Single-photon sensors are a rapidly evolving imaging technology featuring the unique ability to sense light at its finest possible granularity, viz., at the photon level. Our demo will exhibit several exciting capabilities of single-photon sensors, which are made possible by an interplay of novel algorithms and recently developed, large-format sensor arrays.
More info: https://wisionlab.com/project/single-photon-imaging-at-siggraph/
[16] Reactions from colleagues on the Internet regarding the announcement of the retirement of our esteemed colleague Tom Reps. Thank you Prof Loris D’Antoni for collecting these.
Satnam Singh
I sadly don’t know Prof. Reps well, but I did have the honour of presenting the ACM SIGPLAN 2017 Programming Languages Achievement Award to him in person. The process of selecting him for this highly distinguished award made me read much about his research contributions and industrial impact. One thing I wrote in the citation for the award was “A common thread in all of Tom’s research is that it provides elegant solutions to deep foundational problems”, which is amazing, and not easily achieved. Congratulations, Prof. Reps.
Manu Sridharan
A truly astounding career. Tom’s work has influenced mine profoundly. I still learn something new from his papers every time I read them.
Rajeev Alur
Congratulations Tom for all your contributions. I would recommend “program analysis via graph reachability” as an exemplar of great PL research!
Swarat Chaudhuri
One of the most inspiring scholars I have met in my life. I am honored to work with Tom.
Jeff Dean
Congrats to Tom on his retirement! I have fond memories of reading this gem of a paper when I was in grad school on “Interprocedural slicing using dependence graphs” that Susan Horowitz, he and David Binkley wrote.
Mooly Sagiv
I still remember skiing with Susan and you, Tom. Susan’s 40th birthday, our collaboration in Copenhagen. Thanksgiving with you guys.
Isil Dillig
Kudos to Tom and best wishes for a fulfilling retirement! And what a wonderful way to commemorate Susan and all her accomplishments
Wei Yang
One of the first research papers I have read in my undergraduate years (15 years ago) is the IFDS framework by Prof. Reps. Later, my ICSE’ 15 and ICSE’ 18 papers are all developed based on it (I suppose most of interprocedural static analysis papers are based on IFDS).
Thomas Gilray
Always loved Rep’s comments on his “POPL drought”: https://cs.cmu.edu/~popl-interviews/reps.html
Those striving and struggling to publish at the top PL venues should really check it out for a dose of perspective.
Shuvendu Lahiri
Very fortunate to have collaborated with Tom, and also to revive in small parts Tom and (late) Susan’s seminal work on program integration/merge conflict resolution through formal methods and ML.
[a] https://dl.acm.org/doi/10.1145/3672202.3673715
Poster: Beyond Proximity: Exploring Remote Cloud Peering
Authors: Esteban Carisimo, Mia Weaver, Fabián Bustamante, Paul Barford
We investigate network peering location choices, focusing on whether networks opt for distant peering sites even when nearby options are available. We conduct a network-wide cloud-based traceroute campaign using virtual machine instances from four major cloud providers to identify peering locations and calculate the “peering stretch”: the extra distance networks travel beyond the nearest data center to their actual peering points. Our results reveal a median peering stretch of 300 kilometers, with some networks traveling as much as 6,700 kilometers. We explore the characteristics of networks that prefer distant peering points and the potential motivations behind these choices, providing insights into digital sovereignty and cybersecurity implications.
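The “peering stretch” metric in the abstract can be read as the extra great-circle distance, beyond the nearest data center, that traffic travels to reach the observed peering point. The sketch below computes that quantity under this reading with made-up coordinates; it illustrates the metric, not the authors’ measurement pipeline.

# Illustrative peering-stretch computation with haversine distances (made-up coordinates).
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    # Great-circle distance between two (lat, lon) points in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

network_location = (43.07, -89.40)                                    # hypothetical network vantage
data_centers = [(41.88, -87.63), (39.10, -94.58), (45.60, -121.18)]   # hypothetical cloud regions
observed_peering_point = (40.71, -74.01)                              # hypothetical peering location

nearest_dc_km = min(haversine_km(network_location, dc) for dc in data_centers)
peering_km = haversine_km(network_location, observed_peering_point)
stretch_km = peering_km - nearest_dc_km
print(f"peering stretch: {stretch_km:.0f} km (peering {peering_km:.0f} km vs nearest DC {nearest_dc_km:.0f} km)")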
[b] Efficient Implementation of an Abstract Domain of Quantified First-Order Formulas
Eden Frenkel, Tej Chajed, Oded Padon, Sharon Shoham
This paper lays a practical foundation for using abstract interpretation with an abstract domain that consists of sets of quantified first-order logic formulas. This abstract domain seems infeasible at first sight due to the complexity of the formulas involved and the enormous size of sets of formulas (abstract elements). We introduce an efficient representation of abstract elements, which eliminates redundancies based on a novel syntactic subsumption relation that under-approximates semantic entailment. We develop algorithms and data structures to efficiently compute the join of an abstract element with the abstraction of a concrete state, operating on the representation of abstract elements. To demonstrate feasibility of the domain, we use our data structures and algorithms to implement a symbolic abstraction algorithm that computes the least fixpoint of the best abstract transformer of a transition system, which corresponds to the strongest inductive invariant. We succeed at finding, for example, the least fixpoint for Paxos (which in our representation has 1,438 formulas with ∀∗∃∗∀∗ quantification) in time comparable to state-of-the-art property-directed approaches.
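To make the flavor of this domain concrete, here is a minimal, hypothetical Python sketch (not the paper’s data structures or algorithms): an abstract element is a set of candidate formulas interpreted conjunctively, and joining it with the abstraction of a concrete state simply discards the formulas that state falsifies. The paper’s contribution lies in doing this efficiently at scale, using syntactic subsumption to eliminate redundancy, which this sketch omits entirely.

# Minimal, illustrative sketch; the Formula class and its evaluation hook are hypothetical.
class Formula:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate      # function: state -> bool

    def holds_in(self, state):
        return self.predicate(state)

def join_with_state(abstract_element, state):
    """Weaken the abstract element so that it also covers `state`."""
    return {f for f in abstract_element if f.holds_in(state)}

# Toy usage: states are dicts, formulas are simple checks over them.
candidates = {
    Formula("all_nonneg", lambda s: all(v >= 0 for v in s["xs"])),
    Formula("sorted",     lambda s: s["xs"] == sorted(s["xs"])),
}
elem = candidates
elem = join_with_state(elem, {"xs": [0, 2, 5]})   # keeps both formulas
elem = join_with_state(elem, {"xs": [3, 1]})      # drops "sorted"
print([f.name for f in elem])                     # ['all_nonneg']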
[c] See above [5]
[d] https://www.usenix.org/conference/osdi24/presentation/zha
“Inductive Invariants That Spark Joy: Using Invariant Taxonomies to Streamline Distributed Protocol Proofs”
Authors: Tony Nuda Zhang, University of Michigan; Travis Hance, Carnegie Mellon University; Manos Kapritsos, University of Michigan; Tej Chajed, University of Wisconsin–Madison; Bryan Parno, Carnegie Mellon University
Abstract: Proving the correctness of a distributed protocol is a challenging endeavor. Central to this task is finding an inductive invariant for the protocol. Currently, automated invariant inference algorithms require developers to describe protocols using a restricted logic. If the developer wants to prove a protocol expressed without these restrictions, they must devise an inductive invariant manually.
We propose an approach that simplifies and partially automates finding the inductive invariant of a distributed protocol, as well as proving that it really is an invariant. The key insight is to identify an invariant taxonomy that divides invariants into Regular Invariants, which have one of a few simple low-level structures, and Protocol Invariants, which capture the higher-level host relationships that make the protocol work.
Building on the insight of this taxonomy, we describe the Kondo methodology for proving the correctness of a distributed protocol modeled as a state machine. The developer first manually devises the Protocol Invariants by proving a synchronous version of the protocol correct. In this simpler version, sends and receives are replaced with atomic variable assignments. The Kondo tool then automatically generates the asynchronous protocol description, Regular Invariants, and proofs that the Regular Invariants are inductive on their own. Finally, Kondo combines these with the synchronous proof into a draft proof of the asynchronous protocol, which may then require a small amount of user effort to complete. Our evaluation shows that Kondo reduces developer effort for a wide variety of distributed protocols.
[e] https://arxiv.org/abs/2309.04664
“Compact: Approximating Complex Activation Functions for Secure Computation”
Mazharul Islam, Sunpreet S. Arora, Rahul Chatterjee, Peter Rindal, Maliheh Shirvanian
Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions (AFs) such as ReLU. However, these techniques are ineffective and/or inefficient for the complex and highly non-linear AFs used in cutting-edge DNN models.
We present Compact, which produces piece-wise polynomial approximations of complex AFs to enable their efficient use with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and results in near-identical model accuracy. To achieve this, we design Compact with input density awareness and use an application-specific simulated annealing type optimization to generate computationally more efficient approximations of complex AFs. We extensively evaluate Compact on four different machine-learning tasks with DNN architectures that use popular complex AFs silu, gelu, and mish. Our experimental results show that Compact incurs negligible accuracy loss while being 2x-5x computationally more efficient than state-of-the-art approaches for DNN models with large number of hidden layers. Our work accelerates easy adoption of MPC techniques to provide user data privacy even when the queried DNN models consist of a number of hidden layers and trained over complex AFs.
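As a rough illustration of the piece-wise polynomial idea (this is not Compact itself: the toy below uses uniform break points and plain least squares, whereas Compact is input-density-aware and uses a simulated-annealing-style search), one can approximate gelu on an interval as follows.

# Illustrative sketch only; break points, degree, and interval are arbitrary choices.
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def piecewise_poly_fit(f, lo=-6.0, hi=6.0, pieces=8, degree=3, samples=2000):
    """Fit `pieces` least-squares polynomials of `degree` on equal sub-intervals of [lo, hi]."""
    edges = np.linspace(lo, hi, pieces + 1)
    coeffs = []
    for a, b in zip(edges[:-1], edges[1:]):
        xs = np.linspace(a, b, samples)
        coeffs.append(np.polyfit(xs, f(xs), degree))
    return edges, coeffs

def piecewise_eval(x, edges, coeffs):
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(coeffs) - 1)
    return np.array([np.polyval(coeffs[i], v) for i, v in zip(idx, x)])

edges, coeffs = piecewise_poly_fit(gelu)
xs = np.linspace(-6, 6, 10000)
print("max abs error:", np.max(np.abs(piecewise_eval(xs, edges, coeffs) - gelu(xs))))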
[f] https://petsymposium.org/popets/2024/popets-2024-0135.php
Scalable Metadata-Hiding for Privacy-Preserving IoT Systems
Authors: Yunang Chen (University of Wisconsin-Madison), David Heath (University of Illinois Urbana-Champaign), Rahul Chatterjee (University of Wisconsin-Madison), Earlence Fernandes (University of California San Diego)
Abstract: Modern cloud-based IoT services comprise an integrator service and several device vendor services. The vendor services enable users to remotely control their devices, while the integrator serves as a central intermediary, offering a unified interface for managing devices from different vendors. Although such a model is quite beneficial for IoT services to evolve quickly, it also creates a serious privacy concern: the vendor and integrator services observe all interactions between users and devices. Toward this, we propose Mohito, a privacy-preserving IoT system that hides such interactions from both the integrator and the vendors. In Mohito, we protect both the interaction data and the metadata, so that no one learns which user is communicating with which device. By utilizing oblivious key-value storage as a primitive and leveraging the unique communication graph of IoT services, we build a scalable protocol specialized in handling large concurrent traffic, a common demand in IoT systems. Our evaluation shows that Mohito can achieve up to 600x more throughput than the state-of-the-art general-purpose systems that provide similar security guarantees.
[g] https://dl.acm.org/doi/10.1145/3650110
Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces
Authors: Asmita Pal, Keerthana Desai, Rahul Chatterjee, Joshua San Miguel
ACM Transactions on Architecture and Code Optimization
Volume 21, Issue 2
Trace-based simulation is a widely used methodology for system design exploration. It relies on realistic traces that represent a range of behaviors necessary to be evaluated, containing a lot of information about the application, its inputs and the underlying system on which it was generated. Consequently, generating traces from real-world executions risks leakage of sensitive information. To prevent this, traces can be obfuscated before release. However, this can undermine their ideal utility, i.e., how realistically a program behavior was captured. To address this, we propose Camouflage, a novel obfuscation framework, designed with awareness of the necessary architectural properties required to preserve trace utility, while ensuring secrecy of the inputs used to generate the trace. Focusing on memory access traces, our extensive evaluation on various benchmarks shows that camouflaged traces preserve the performance measurements of the original execution, with an average τ correlation of 0.66. We model input secrecy as an input indistinguishability problem and show that the average security loss is 7.8%, which is better than traces generated from the state-of-the-art.
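For readers unfamiliar with the metric, Kendall’s τ is a rank correlation; the sketch below shows how one might compute it between hypothetical per-configuration measurements from an original and an obfuscated trace. The numbers are invented, and the paper’s actual measurement setup may differ.

# Illustrative only: checks whether the obfuscated trace preserves the *ranking* of measurements.
from scipy.stats import kendalltau

original_ipc   = [1.92, 1.40, 2.10, 0.95, 1.63]   # hypothetical per-config metric
obfuscated_ipc = [1.88, 1.45, 2.02, 1.01, 1.60]   # same configs, simulated from obfuscated trace

tau, p_value = kendalltau(original_ipc, obfuscated_ipc)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")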
[h] https://www.usenix.org/conference/usenixsecurity24/presentation/gupta
“I really just leaned on my community for support”: Barriers, Challenges, and Coping Mechanisms Used by Survivors of Technology-Facilitated Abuse to Seek Social Support
Naman Gupta, University of Wisconsin–Madison, Madison Tech Clinic; Kate Walsh, University of Wisconsin–Madison; Sanchari Das, University of Denver; Rahul Chatterjee, University of Wisconsin–Madison, Madison Tech Clinic
Abstract: Technology-facilitated abuse (TFA) from an intimate partner is a growing concern for survivors’ safety and security. Prior research introduced tailored interventions to support the survivors of TFA. However, most survivors do not have access to these interventions or are unaware of the appropriate support networks and resources in their community. We conducted nine semi-structured interviews to examine survivors’ support-seeking dynamics from their social networks and the effectiveness of social networks in addressing survivors’ needs. Through survivors’ lived experiences, we systematize socio-technical barriers that impede the participant from seeking support and challenges in seeking support. Further, we identify coping mechanisms used by the survivors despite those barriers and challenges. Through a participatory lens, we echo survivors’ call for action to improve support networks and propose recommendations for technology design to promote safer support-seeking practices and resources, consciousness-raising awareness campaigns, and collaborations with the community. Finally, we call for a restorative-justice-oriented framework that recognizes TFA.
[i] https://arxiv.org/abs/2406.01792
The SemGuS Toolkit
Keith J.C. Johnson, Andrew Reynolds, Thomas Reps, Loris D’Antoni
Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers.
[j] https://learningtheory.org/colt2024/schedule/poster_206.html
Testable Learning of General Halfspaces with Adversarial Label Noise
Diakonikolas, Ilias; Kane, Daniel M; Liu, Sihan; Zarifis, Nikos
Abstract: We study the task of testable learning of general — not necessarily homogeneous — halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our main result is the first polynomial time tester-learner for general halfspaces that achieves dimension-independent misclassification error. At the heart of our approach is a new methodology to reduce testable learning of general halfspaces to testable learning of nearly homogeneous halfspaces that may be of broader interest.
[k] https://arxiv.org/abs/2403.02300
Statistical Query Lower Bounds for Learning Truncated Gaussians
Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis
We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family of sets. Specifically, for a fixed but unknown truncation set S ⊆ ℝ^d, we are given access to samples from the distribution N(μ, I) truncated to the set S. The goal is to estimate μ within accuracy ϵ>0 in ℓ2-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is d^{poly(1/ϵ)}, even when the class of truncation sets is simple, so that poly(d/ϵ) samples information-theoretically suffice. Concretely, our SQ lower bound applies when S is a union of a bounded number of rectangles whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.
[l] https://arxiv.org/abs/2307.12840
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
Ilias Diakonikolas, Daniel M. Kane
We study the problem of PAC learning a linear combination of k ReLU activations under the standard Gaussian distribution on ℝ^d with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity (dk/ϵ)^{O(k)}, where ϵ>0 is the target accuracy. Prior work had given an algorithm for this problem with complexity (dk/ϵ)^{h(k)}, where the function h(k) scales super-polynomially in k. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the O(k)-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
[m] https://arxiv.org/abs/2402.03502
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li
Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at this https URL.
[n] https://openreview.net/forum?id=0i6Z9N5MLY
Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions
Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas
Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in settings in which the $n$ component operators in the finite sum are “on average” either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}( n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when the operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
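For context, the classical Halpern iteration (without variance reduction) for a nonexpansive operator $T$ anchors every step to the initial point $x_0$; a standard choice of anchoring weights is shown below. The paper’s variants replace the exact operator evaluation with variance-reduced stochastic estimates built from the $n$ components, which this textbook form does not capture.

$x_{k+1} = \lambda_{k+1}\, x_0 + (1 - \lambda_{k+1})\, T(x_k), \qquad \lambda_{k+1} = \frac{1}{k+2}.$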
[o] https://eprint.iacr.org/2024/798
“Incompressible Functional Encryption”
Rishab Goyal, Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma
Incompressible encryption (Dziembowski, Crypto’06; Guan, Wichs, Zhandry, Eurocrypt’22) protects against attackers that learn the entire decryption key but cannot store the full ciphertext. In incompressible encryption, the attacker must try to compress a ciphertext within a pre-specified memory bound before receiving the secret key.
In this work, we generalize the notion of incompressibility to functional encryption. …
[p] https://eprint.iacr.org/2024/931
“Leveled Fully-Homomorphic Signatures from Batch Arguments”
Abtin Afshar, Jiaqi Cheng, Rishab Goyal
Fully homomorphic signatures are a significant strengthening of digital signatures, enabling computations on \emph{secretly} signed data. Today, we have multiple approaches to design fully homomorphic signatures such as from lattices, or succinct functional commitments, or indistinguishability obfuscation, or mutable batch arguments. Unfortunately, all existing constructions for homomorphic signatures suffer from one or more limitations. We do not have homomorphic signatures with features such as multi-hop evaluation, context hiding, and fast amortized verification, while relying on standard falsifiable assumptions.
In this work, we design homomorphic signatures satisfying all above properties. We construct homomorphic signatures for polynomial-sized circuits from a variety of standard assumptions such as sub-exponential DDH, standard pairing-based assumptions, or learning with errors. We also discuss how our constructions can be easily extended to the multi-key setting.
[q] https://dl.acm.org/doi/10.1145/3641517.3664397
“A Live Demo of Single-Photon Imaging and Applications”
Authors: Sacha Jungerman, Varun Sundar, Mohit Gupta
SIGGRAPH ’24: ACM SIGGRAPH 2024 Emerging Technologies
Single-photon sensors are a novel class of imaging sensors that are sensitive to the individual arrival of photons. In recent years, single-photon sensors have witnessed rapid growth in key sensor characteristics such as array resolution, which, for the first time, has made computer vision and imaging with these emerging sensors possible. This rapid growth has also spurred the development of algorithms that can harness the rich spatio-temporal scene information captured by single-photon sensors, which has led to several new imaging capabilities. These capabilities include imaging in challenging SNR conditions, high-dynamic range imaging, compensating extreme camera motion extents, and even providing the functionality of numerous imaging systems via post-capture processing. Our demo will showcase these exciting capabilities to a wide computer vision and graphics audience, and in doing so, make a case for the mainstream adoption of single-photon technology.
[r] https://openreview.net/forum?id=sVEu295o70
“Understanding when Dynamics-Invariant Data Augmentations Benefit Model-free Reinforcement Learning Updates”
Nicholas Corrado, Josiah P. Hanna
ICLR 2024
Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
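A hypothetical sketch of how augmented transitions might be mixed into a replay buffer is shown below. It is illustrative only: `augment` stands for any dynamics-invariant transformation (e.g., a task symmetry), and the knob shown only loosely corresponds to the paper’s augmented replay ratio, which counts augmented transitions generated per update.

# Illustrative sketch; transition format and the mirror augmentation are made up.
import random
from collections import deque

def make_buffer(capacity=100_000):
    return deque(maxlen=capacity)

def store(buffer, transition, augment, augmented_per_real=1):
    """Store a real transition plus some number of augmented copies."""
    buffer.append(transition)
    for _ in range(augmented_per_real):
        buffer.append(augment(transition))

def sample_batch(buffer, batch_size=32):
    return random.sample(list(buffer), min(batch_size, len(buffer)))

# Toy usage: transitions are (s, a, r, s') tuples; the augmentation mirrors states.
mirror = lambda t: (-t[0], t[1], t[2], -t[3])
buf = make_buffer()
store(buf, (0.3, 1, 0.0, 0.5), mirror, augmented_per_real=1)
print(len(buf))   # 2: one real + one augmented transition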
[s] https://dl.acm.org/doi/10.1145/3579986
“Introduction to the Special Issue on Automotive CPS Safety & Security: Part 1”
Authors: Samarjit Chakraborty, Somesh Jha, Soheil Samii, Philipp Mundhenk
[t] https://arxiv.org/abs/2304.03870
“ASPEST: Bridging the Gap Between Active Learning and Selective Prediction”
Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots, with self-training on their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.
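A loose sketch of the checkpoint-ensemble-plus-pseudo-labels idea the abstract describes (not the authors’ implementation; the threshold and data below are made up): average the softmax outputs of several model snapshots and keep only confident predictions as pseudo-labels for self-training.

# Illustrative sketch; snapshot_probs would come from evaluating saved checkpoints.
import numpy as np

def ensemble_pseudo_labels(snapshot_probs, confidence=0.8):
    """snapshot_probs: list of (n_samples, n_classes) softmax arrays, one per snapshot."""
    mean_probs = np.mean(snapshot_probs, axis=0)       # aggregate the snapshots
    preds = mean_probs.argmax(axis=1)
    confident = mean_probs.max(axis=1) >= confidence   # keep only confident examples
    return preds[confident], confident                 # pseudo-labels and their mask

# Toy usage with two snapshots over three unlabeled examples and two classes.
p1 = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
p2 = np.array([[0.92, 0.08], [0.48, 0.52], [0.15, 0.85]])
labels, mask = ensemble_pseudo_labels([p1, p2])
print(labels, mask)   # examples 0 and 2 get pseudo-labels 0 and 1; example 1 is skipped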
[u] https://openreview.net/forum?id=dwzLn78jq7
On the Scalability and Memory Efficiency of Semidefinite Programs for Lipschitz Constant Estimation of Neural Networks
Zi Wang, Bin Hu, Aaron J Havens, Alexandre Araujo, Yang Zheng, Yudong Chen, Somesh Jha
ICLR 2024
TL;DR: We scale the SDP for Lipschitz constant estimation to practical CIFAR10/ImageNet networks.
Lipschitz constant estimation plays an important role in understanding generalization, robustness, and fairness in deep learning. Unlike naive bounds based on the network weight norm product, semidefinite programs (SDPs) have shown great promise in providing less conservative Lipschitz bounds with polynomial-time complexity guarantees. However, due to the memory consumption and running speed, standard SDP algorithms cannot scale to modern neural network architectures. In this paper, we transform the SDPs for Lipschitz constant estimation into an eigenvalue optimization problem, which aligns with the modern large-scale optimization paradigms based on first-order methods. This is amenable to autodiff frameworks such as PyTorch and TensorFlow, requiring significantly less memory than standard SDP algorithms. The transformation also allows us to leverage various existing numerical techniques for eigenvalue optimization, opening the way for further memory improvement and computational speedup. The essential technique of our eigenvalue-problem transformation is to introduce redundant quadratic constraints and then utilize both Lagrangian and Shor’s SDP relaxations under a certain trace constraint. Notably, our numerical study successfully scales the SDP-based Lipschitz constant estimation to address large neural networks on ImageNet. Our numerical examples on CIFAR10 and ImageNet demonstrate that our technique is more scalable than existing approaches. Our code is available at https://github.com/z1w/LipDiff.
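For contrast with the SDP approach, the naive bound mentioned in the abstract, the product of the layers’ spectral norms, can be computed in a few lines; it is cheap but typically far looser than SDP-based estimates. A minimal sketch, assuming an MLP whose activations are 1-Lipschitz:

# Illustrative sketch; the random weights are placeholders for a trained network.
import numpy as np

def naive_lipschitz_bound(weight_matrices):
    """Upper bound on the Lipschitz constant via the product of spectral norms."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weight_matrices]))

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 10))]
print("naive bound:", naive_lipschitz_bound(layers))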
[v] https://arxiv.org/abs/2310.05385
Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization
Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris
In this paper, we study the complexity of evaluating Conjunctive Queries with negation (CQ¬). First, we present an algorithm with linear preprocessing time and constant delay enumeration for a class of CQs with negation called free-connex signed-acyclic queries. We show that no other queries admit such an algorithm subject to lower bound conjectures. Second, we extend our algorithm to Conjunctive Queries with negation and aggregation over a general semiring, which we call Functional Aggregate Queries with negation (FAQ¬). Such an algorithm achieves constant delay enumeration for the same class of queries, but with a slightly increased preprocessing time which includes an inverse Ackermann function. We show that this surprising appearance of the Ackermann function is probably unavoidable for general semirings, but can be removed when the semiring has specific structure. Finally, we show an application of our results to computing the difference of CQs.
[w] https://arxiv.org/abs/2310.19642
Consistent Query Answering for Primary Keys on Rooted Tree Queries
Paraschos Koutris, Xiating Ouyang, Jef Wijsen
We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal subset of the database satisfying the primary key constraints. For a Boolean query q, the problem CERTAINTY(q) takes a database as input, and asks whether or not each repair satisfies q. The computational complexity of CERTAINTY(q) has been established whenever q is a self-join-free Boolean conjunctive query, or a (not necessarily self-join-free) Boolean path query. In this paper, we take one more step towards a general classification for all Boolean conjunctive queries by considering the class of rooted tree queries. In particular, we show that for every rooted tree query q, CERTAINTY(q) is in FO, NL-hard ∩ LFP, or coNP-complete, and it is decidable (in polynomial time), given q, which of the three cases applies. We also extend our classification to larger classes of queries with simple primary keys. Our classification criteria rely on query homomorphisms and our polynomial-time fixpoint algorithm is based on a novel use of context-free grammar (CFG).
[x] https://dl.acm.org/doi/10.1145/3651588
Tight Bounds of Circuits for Sum-Product Queries
Authors: Austen Z. Fan, Paraschos Koutris, Hangdong Zhao
Proceedings of the ACM on Management of Data, Volume 2, Issue 2
In this paper, we ask the following question: given a Boolean Conjunctive Query (CQ), what is the smallest circuit that computes the provenance polynomial of the query over a given semiring? We answer this question by giving upper and lower bounds. Notably, it is shown that any circuit F that computes a CQ over the tropical semiring must have size satisfying log |F| ≥ (1-ε) · da-entw for any ε > 0, where da-entw is the degree-aware entropic width of the query. We show a circuit construction that matches this bound when the semiring is idempotent. The techniques we use combine several central notions in database theory: provenance polynomials, tree decompositions, and disjunctive Datalog programs. We extend our results to lower and upper bounds for formulas (i.e., circuits where each gate has outdegree one), and to bounds for non-Boolean CQs.
[y] https://arxiv.org/abs/2403.12436
Evaluating Datalog over Semirings: A Grounding-based Approach
Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen
Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring σ, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over σ: first ground the program into an equivalent system of polynomial equations (i.e. grounding) and then find the least fixpoint of the grounding over σ. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.
[z] https://dl.acm.org/doi/abs/10.1145/3651598
Topology-aware Parallel Joins
Authors: Xiao Hu, Paraschos Koutris
Proceedings of the ACM on Management of Data, Volume 2, Issue 2
We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.
[aa] https://arxiv.org/abs/2311.07377
Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach
Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.
[ab] https://arxiv.org/abs/2402.07785
HYPO: Hyperspherical Out-of-Distribution Generalization
Yifei Ming, Haoyue Bai, Julian Katz-Samuels, Yixuan Li
Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles — ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at this https URL.
[ac] https://arxiv.org/abs/2402.17888
ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang
Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimate scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient p against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25% and 28.19% (FPR95) on CIFAR-100 and ImageNet-1K, respectively.
[ad] https://arxiv.org/abs/2402.15017
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang
Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples. In this paper, we study the theoretical justification of this multitask finetuning approach. Our theoretical analysis reveals that with a diverse set of related tasks, this multitask finetuning leads to reduced error in the target task, in comparison to directly adapting the same pretrained model. We quantify the relationship between finetuning tasks and target tasks by diversity and consistency metrics, and further propose a practical task selection algorithm. We substantiate our theoretical claims with extensive empirical evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, providing advantages to the model performance on target tasks. We believe our study sheds new light on the effective adaptation of foundation models to new tasks that lack abundant labels. Our code is available at this https URL.
[ae] https://dl.acm.org/doi/full/10.1145/3653716
eZNS: An Elastic Zoned Namespace for Commodity ZNS SSDs
Jaehong Min, Chenxingyu Zhao, Ming Liu, and Arvind Krishnamurthy
Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them in-adaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and interfering with co-located zones. Applications either under-provision the zone resources yielding unsatisfied throughput, create over-provisioned zones and incur costs, or experience unexpected I/O latencies. We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, and a hierarchical I/O scheduler with read congestion control and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by up to 17.7% and 80.3% in throughput and tail latency, respectively.
[af] https://www.usenix.org/conference/nsdi24/presentation/hou
Understanding Routable PCIe Performance for Composable Infrastructures
Wentao Hou, Jie Zhang, Zeke Wang, Ming Liu
Routable PCIe has become the predominant cluster interconnect to build emerging composable infrastructures. Empowered by PCIe non-transparent bridge devices, PCIe transactions can traverse multiple switching domains, enabling a server to elastically integrate a number of remote PCIe devices as local ones. However, it is unclear how to move data or perform communication efficiently over the routable PCIe fabric without understanding its capabilities and limitations.
This paper presents the design and implementation of rPCIeBench, a software-hardware co-designed benchmarking framework to systematically characterize the routable PCIe fabric. rPCIeBench provides flexible data communication primitives, exposes end-to-end PCIe transaction observability, and enables reconfigurable experiment deployment. Using rPCIeBench, we first analyze the communication characteristics of a routable PCIe path, quantify its performance tax, and compare it with the local PCIe link. We then use it to dissect in-fabric traffic orchestration behaviors and draw three interesting findings: approximate max-min bandwidth partition, fast end-to-end bandwidth synchronization, and interference-free among orthogonal data paths. Finally, we encode gathered characterization insights as traffic orchestration rules and develop an edge constraints relaxing algorithm to estimate PCIe flow transmission performance over a shared fabric. We validate its accuracy and demonstrate its potential to provide an optimization guide to design efficient flow schedulers.
[ag] https://arxiv.org/abs/2403.09543
Explorations in Texture Learning
Blaine Hoak, Patrick McDaniel
In this work, we investigate \textit{texture learning}: the identification of textures learned by object classification models, and the extent to which they rely on these textures. We build texture-object associations that uncover new insights about the relationships between texture and object classes in CNNs and find three classes of results: associations that are strong and expected, strong and not expected, and expected but not present. Our analysis demonstrates that investigations in texture learning enable new methods for interpretability and have the potential to uncover unexpected biases.
[ah] https://arxiv.org/abs/2403.19577
A Public and Reproducible Assessment of the Topics API on Real Data
Yohan Beugin, Patrick McDaniel
The Topics API for the web is Google’s privacy-enhancing alternative to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the capability of Topics to trade off both utility and privacy. The central point of contention is largely around the realism of the datasets used in these analyses and their reproducibility; researchers using data collected on a small sample of users or generating synthetic datasets, while Google’s results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest and publicly available dataset of real browsing histories. First, we measure how unique and stable real users’ interests are over time. Then, we evaluate if Topics can be used to fingerprint the users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that 2%, 3%, and 4% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that the information leakage worsens over time, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
[ai] https://dl.acm.org/doi/10.1145/3656156.3658395
RoboCare Design Workshop: Understanding, Translating, Operationalizing, and Scaling Up Design Knowledge Regarding Robotic Systems for Care Assistance
Laura Stegner, Richard Paluch, Long-Jing Hsu, Sawyer Collins, Yaxin Hu, Marius Greuèl, Naonori Kodate, Claudia Müller, Bilge Mutlu, Selma Šabanović
Robots and other autonomous agents are well-positioned in the research discourse to support the care of people with challenges such as physical and/or cognitive disabilities. However, designing these robots can be complex as it involves considering a wide range of factors (e.g., individual needs, physical environment, technology capabilities, digital literacy), stakeholders (e.g., care recipients, formal and informal caregivers, technology developers), and contexts (e.g., hospitals, nursing homes, outpatient care facilities, private homes). The challenges are in gaining design insights for this unique use case and translating this knowledge into actionable, generalizable guidelines for other designers. This one-day workshop seeks to bring together researchers with diverse expertise and experience across academia, healthcare, and industry, spanning perspectives from multiple disciplines, including design, robotics, and human-computer interaction, with the primary goal being a consensus on best practices for generating and operationalizing design knowledge for robotic systems for care settings.
[aj] https://arxiv.org/abs/2307.09064
Newtonian Program Analysis of Probabilistic Programs
Di Wang, Thomas Reps
Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton’s method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic).
Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode program-execution paths in control-flow hyper-graphs; and presents a linearization method that makes Newton’s method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) NPA-PMA provides great generality for designing program analyses.
[ak] https://arxiv.org/abs/2211.06818
CFLOBDDs: Context-Free-Language Ordered Binary Decision Diagrams
Meghana Sistla, Swarat Chaudhuri, Thomas Reps
This paper presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence useful for representing certain classes of functions, matrices, graphs, relations, etc. in a highly compressed fashion. CFLOBDDs share many of the good properties of BDDs, but, in the best case, the CFLOBDD for a Boolean function can be exponentially smaller than any BDD for that function. Compared with the size of the decision tree for a function, a CFLOBDD, again in the best case, can give a double-exponential reduction in size. They have the potential to permit applications to (i) execute much faster, and (ii) handle much larger problem instances than has been possible heretofore.
CFLOBDDs are a new kind of decision diagram that goes beyond BDDs (and their many relatives). The key insight is a new way to reuse sub-decision-diagrams: components of CFLOBDDs are structured hierarchically, so that sub-decision-diagrams can be treated as standalone “procedures” and reused.
We applied CFLOBDDs to the problem of simulating quantum circuits, and found that for several standard problems the improvement in scalability, compared to simulation using BDDs, is quite dramatic. In particular, the number of qubits that could be handled using CFLOBDDs was larger, compared to BDDs, by a factor of 128x for GHZ; 1,024x for BV; 8,192x for DJ; and 128x for Grover’s algorithm. (With a 15-minute timeout, the number of qubits that CFLOBDDs can handle is 65,536 for GHZ; 524,288 for BV; 4,194,304 for DJ; and 4,096 for Grover’s Algorithm.)
[al] https://arxiv.org/abs/2309.04344
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings — without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% on worst group accuracy, with trivial decrease in overall accuracy over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models and propose a way to further boost performance with a zero-shot adaptation variant.
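A highly simplified sketch of the “remove harmful components” step described above: project an embedding onto the orthogonal complement of an embedded harmful-concept direction. The names and vectors here are hypothetical, and RoboShot additionally boosts useful components and derives the directions from language-model insights, which this sketch omits.

# Illustrative sketch; the embedding and the "spurious" direction are made-up vectors.
import numpy as np

def remove_component(embedding, harmful_direction):
    """Subtract the projection of the embedding onto the harmful direction."""
    v = harmful_direction / np.linalg.norm(harmful_direction)
    return embedding - np.dot(embedding, v) * v

z = np.array([0.7, 0.1, 0.4])    # hypothetical image embedding
v = np.array([1.0, 0.0, 0.0])    # hypothetical embedded "harmful concept" direction
print(remove_component(z, v))     # approximately [0.0, 0.1, 0.4]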
[am] https://www.usenix.org/conference/atc24/presentation/tabatabai
FBMM: Making Memory Management Extensible With Filesystems
Bijan Tabatabai, James Sorenson, and Michael M. Swift, University of Wisconsin—Madison
New memory technologies like CXL promise diverse memory configurations such as tiered memory, far memory, and processing in memory. Operating systems must be modified to support these new hardware configurations for applications to make use of them. While many parts of operating systems are extensible, memory management remains monolithic in most systems, making it cumbersome to add support for a diverse set of new memory policies and mechanisms.
Rather than creating a whole new extensible interface for memory managers, we propose to instead use the memory management callbacks provided by the Linux virtual file system (VFS) to write memory managers, called memory management filesystems (MFSs). Memory is allocated by creating and mapping a file in an MFS’s mount directory and freed by deleting the file. Use of an MFS is transparent to applications. We call this system File Based Memory Management (FBMM).
Using FBMM, we created a diverse set of standalone memory managers for tiered memory, contiguous allocations, and memory bandwidth allocation, each comprising 500-1500 lines of code. Unlike current approaches that require custom kernels, with FBMM, an MFS can be compiled separately from the kernel and loaded dynamically when needed. We measure the overhead of using filesystems for memory management and found the overhead to be less than 8% when allocating a single page, and less than 0.1% when allocating as little as 128 pages. MFSs perform competitively with kernel implementations, and sometimes better due to simpler implementations.
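A user-space illustration of the allocation interface described above, under the assumption of a hypothetical MFS mounted at /mnt/mfs (any kernel-side policy is invisible at this level): memory is allocated by creating and mapping a file in the MFS mount directory, and freed by unmapping and deleting it.

# Illustrative sketch only; "/mnt/mfs" is a hypothetical MFS mount point.
import mmap, os

PAGE = 4096
path = "/mnt/mfs/region0"              # file inside the (hypothetical) MFS mount

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 16 * PAGE)            # size of the region to allocate
mem = mmap.mmap(fd, 16 * PAGE)         # the mapping is the allocated memory
mem[:5] = b"hello"                     # use it like ordinary memory

mem.close()                            # free: unmap ...
os.close(fd)
os.unlink(path)                        # ... and delete the backing file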
[an] https://arxiv.org/abs/2309.01753
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter σ>0. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be O(σ)-close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as O(σ)-approximation of the original BO, we propose first-order algorithms that find an ϵ-stationary solution by optimizing the penalty formulation with σ=O(ϵ). When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an ϵ-stationary point of the penalty function, using in total O(ϵ−3) and O(ϵ−7) accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and oracles are noisy, respectively. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with O(1) samples per iteration, and achieves the improved oracle-complexity of O(ϵ−3) and O(ϵ−5), respectively.
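For readers unfamiliar with the penalty viewpoint, a minimal sketch (the paper’s exact weighting and normalization may differ): with upper-level objective $f$, lower-level objective $g$, and penalty parameter $\sigma > 0$, the bilevel problem

$\min_{x \in X} f(x, y^*(x)) \quad \text{s.t.} \quad y^*(x) \in \operatorname{argmin}_{y \in Y} g(x, y)$

is replaced by a single-level penalized problem of the form

$\min_{x \in X,\, y \in Y} \; f(x, y) + \frac{1}{\sigma}\Bigl( g(x, y) - \min_{y' \in Y} g(x, y') \Bigr),$

where taking $\sigma \to 0$ enforces lower-level optimality ever more strictly, consistent with the abstract’s choice $\sigma = O(\epsilon)$.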
July 2024
[1] “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps”
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel
PLDI ‘14
Most influential paper award: https://www.sigplan.org/Awards/PLDI/
Abstract: Today’s smartphones are a ubiquitous source of private and confidential data. At the same time, smartphone users are plagued by carelessly programmed apps that leak important data by accident, and by malicious apps that exploit their given privileges to copy such data intentionally. While existing static taint-analysis approaches have the potential of detecting such data leaks ahead of time, all approaches for Android use a number of coarse-grain approximations that can yield high numbers of missed leaks and false alarms.
In this work we thus present FlowDroid, a novel and highly precise static taint analysis for Android applications. A precise model of Android’s lifecycle allows the analysis to properly handle callbacks invoked by the Android framework, while context, flow, field and object-sensitivity allows the analysis to reduce the number of false alarms. Novel on-demand algorithms help FlowDroid maintain high efficiency and precision at the same time.
We also propose DroidBench, an open test suite for evaluating the effectiveness and accuracy of taint-analysis tools specifically for Android apps. As we show through a set of experiments using SecuriBench Micro, DroidBench, and a set of well-known Android test applications, FlowDroid finds a very high fraction of data leaks while keeping the rate of false positives low. On DroidBench, FlowDroid achieves 93% recall and 86% precision, greatly outperforming the commercial tools IBM AppScan Source and Fortify SCA. FlowDroid successfully finds leaks in a subset of 500 apps from Google Play and about 1,000 malware apps from the VirusShare project.
[2] ACM SIGGRAPH Seminal Graphics Papers
When we began planning how to celebrate 50 years of SIGGRAPH Conferences, there was unanimous agreement that one of the projects should be publishing a second volume of Seminal Graphics Papers. The first volume was published in 1998 as part of the celebration of the 25th SIGGRAPH conference. Seminal Graphics Papers Volume 2, perhaps more than any other activity undertaken in this milestone year, celebrates ACM SIGGRAPH’s origins and continued success as a Technical and Professional Society. This collection of papers typifies the ground-breaking research that has been the conference’s hallmark since 1974. A quick scan of the chapter and the paper titles shows just how far SIGGRAPH research has pushed the boundaries of our discipline and contributed to its evolution.
The ACM Digital Library team has been supportive of this Seminal Graphics Papers project from the beginning. I am pleased to let you know that both Volumes 1 and 2 of Seminal Graphics Papers are freely available from the ACM Digital Library at these URLs:
Volume 1: https://dl.acm.org/doi/book/10.1145/280811
Volume 2: https://dl.acm.org/doi/book/10.1145/3596711
“Content-Preserving Warps for 3D Video Stabilization”
Feng Liu, Michael Gleicher, Hailin Jin, and Aseem Agarwala
https://dl.acm.org/doi/10.1145/1531326.1531350
ACM Transactions on Graphics (TOG), Volume 28, Issue 3, July 2009
We describe a technique that transforms a video from a hand-held video camera so that it appears as if it were taken with a directed camera motion. Our method adjusts the video to appear as if it were taken from nearby viewpoints, allowing 3D camera movements to be simulated. By aiming only for perceptual plausibility, rather than accurate reconstruction, we are able to develop algorithms that can effectively recreate dynamic scenes from a single source video. Our technique first recovers the original 3D camera motion and a sparse set of 3D, static scene points using an off-the-shelf structure-from-motion system. Then, a desired camera path is computed either automatically (e.g., by fitting a linear or quadratic path) or interactively. Finally, our technique performs a least-squares optimization that computes a spatially-varying warp from each input video frame into an output frame. The warp is computed to both follow the sparse displacements suggested by the recovered 3D structure, and avoid deforming the content in the video frame. Our experiments on stabilizing challenging videos of dynamic scenes demonstrate the effectiveness of our technique.
“Motion Graphs”
Lucas Kovar, Michael Gleicher, and Frédéric Pighin
https://dl.acm.org/doi/10.1145/566654.566605
ACM Transactions on Graphics (TOG), Volume 21, Issue 3, July 2002
In this paper we present a novel method for creating realistic, controllable motion. Given a corpus of motion capture data, we automatically construct a directed graph called a motion graph that encapsulates connections among the database. The motion graph consists both of pieces of original motion and automatically generated transitions. Motion can be generated simply by building walks on the graph. We present a general framework for extracting particular graph walks that meet a user’s specifications. We then show how this framework can be applied to the specific problem of generating different styles of locomotion along arbitrary paths.
[3] https://research.wisc.edu/research-forward-round-4/
The Office of the Vice Chancellor for Research (OVCR) hosts the Research Forward initiative to stimulate and support highly innovative and groundbreaking research at the University of Wisconsin–Madison. The initiative is supported by the Wisconsin Alumni Research Foundation (WARF) and will provide funding for 1–2 years, depending on the needs and scope of the project.
Research Forward seeks to support collaborative, multidisciplinary, multi-investigator research projects that are high-risk, high-impact, and transformative. It seeks to fund research projects that have the potential to fundamentally transform a field of study as well as projects that require significant development prior to the submission of applications for external funding. Collaborative research proposals are welcome from within any of the four divisions (Arts & Humanities, Biological Sciences, Physical Sciences, Social Sciences), as are cross-divisional collaborations.
CS Projects are:
“Quanta sensing for next generation quantum computing”
PRINCIPAL INVESTIGATOR
Mohit Gupta, associate professor of computer sciences
CO-PRINCIPAL INVESTIGATORS
Mark Saffman, professor of physics
Swamit Tannu, assistant professor of computer sciences
Andreas Velten, associate professor of biostatistics and medical informatics, electrical and computer engineering
Abstract: Future quantum computers could open new scientific and engineering frontiers, impacting existential challenges like climate change. However, quantum information is delicate; it leaks with time and is prone to significant errors. These errors are exacerbated by imperfect reading and writing of quantum bits (qubits). These challenges fundamentally limit our ability to run quantum programs, and could hold back this powerful technology. Fast and accurate qubit readout, therefore, is essential for unlocking the quantum advantage. Current quantum computers use conventional cameras for reading qubits, which are inherently slow and noisy.
This research project will use quanta (single-photon) sensors for fast and accurate qubit readout. Quanta sensors detect individual photons scattered from qubits, enabling qubits to be sensed 2-3 orders of magnitude faster (a few microseconds instead of ~10 milliseconds), thereby transforming the capabilities (speed, accuracy) of future quantum computers and, for the first time, paving the way for scalable and practical quantum computing.
“A novel deep learning platform to design molecular glues that stabilize protein-protein interactions”
PRINCIPAL INVESTIGATOR
Xuhui Huang, professor of chemistry
CO-PRINCIPAL INVESTIGATORS
Sharon Li, assistant professor of computer science
Weiping Tang, professor of pharmacy
Molecular glues can stabilize protein-protein interactions (PPIs) and have emerged as a promising class of therapeutics for many “undruggable” proteins. Deep-learning approaches show enormous potential for revolutionizing structure-based drug design. However, all deep learning methods require a large training dataset containing information on the binding pocket. Such information is often unavailable for molecular glues.
This research project will create a deep-learning platform, Chemical Feature Transformer (CFT), that learns from patterns in chemical environments (using known molecular glue-PPI structures) instead of complementary geometric shapes (like the Key-and-Lock model). To experimentally validate the predictions of the CFT, researchers will apply CFT to identify molecular glues for the degradation of cancer-associated proteins. It is expected that the deep-learning platform can greatly help other UW researchers identify novel molecular glues to tackle their disease-relevant targets. This project will provide the foundation for potentially establishing a center of excellence in AI-driven molecular design.
“An AI Terrarium for understanding intermediated and personal communication”
PRINCIPAL INVESTIGATOR
Timothy Rogers, professor of psychology
CO-PRINCIPAL INVESTIGATORS
Robert Hawkins, assistant professor of psychology
Dhavan Shah, professor of journalism
Sijia Yang, assistant professor of journalism
Jerry Zhu, professor of computer science
The immediacy, reach and virality of contemporary communication technologies present an enormous societal opportunity and an escalating threat. Research efforts to understand these phenomena face significant challenges: field studies are complex to design, expensive to conduct, and raise concerns about user manipulation.
This research project leverages AI to power more realistic simulated personas that can interact via natural language and are tuned to update their beliefs in ways that mimic real human participants: an AI Terrarium. The AI agents will be engineered and fine-tuned to capture patterns of real communication and belief change observed in representative human populations collected using panel surveys and behavioral experiments. The behaviors of interacting “digital twins” (AI models matched to human participants in the survey and experimental collections) will then be used to simulate and understand patterns of communication and persuasion in the real world.
Once developed, the AI Terrarium will enable low-cost testing of different message campaigns, news feeds, or interaction rules. Promising strategies will be validated in carefully controlled experiments using lab-based networked messaging platforms. The ability to simulate social media interactions with human fidelity would permit rapid-cycle testing of intervention strategies on a variety of health and political issues, ranging from vaccine skepticism to electoral misinformation, and allow piloting prior to costly real-world trials.
[4] https://research.wisc.edu/professorships-and-faculty-fellowships/kellett-mid-career-award/
This award is comparable in competitiveness to the Romnes Faculty Fellowships and the WARF Professorships but is intended to recognize and support mid-career faculty, seven to twenty years past their first promotion to a tenured position. The Mid-Career Award was created to provide needed support and encouragement to faculty at a critical stage of their careers. This award cannot be awarded more than once to any faculty member.
[5] https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/
This program, funded by WARF in recognition of the leadership of the late WARF Trustee President H. I. Romnes, is designed to bridge the gap between the Research Committee’s initial research support for new faculty and the Mid-Career Award for Faculty Research. This award is intended to recognize and support faculty up to SIX years past their first promotion to a tenured position. This award cannot be awarded more than once to any faculty member.
Winners:
https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/past-winners-h-i-romnes-faculty-fellowship/
[6] https://mlcommons.org/2024/02/call-for-ml-and-systems-rising-stars-2024/
The Rising Stars program provides a platform for talented young researchers working at the intersection of machine learning (ML) and systems to build connections with a vibrant research community, engage with both industry and academic experts, and develop their skills.
Saurabh Agarwal
“I am a fifth-year grad student at UW-Madison, advised by Dimitris Papailiopoulos and Shivaram Venkataraman. My research interests are primarily in Systems for Machine Learning, especially around distributed training and inference of ML workloads. During my PhD I have been very fortunate to intern with Bilge Acun at FAIR, Amar Phanishayee at Microsoft Research and Yucheng Low at Apple.
When I am not being a grad student, I can be found racing keelboats on Lake Mendota or alpine skiing in the winters. I also double up as a sailing instructor at UW-Madison’s Hoofers Sailing Club.”
[7] “Mechanism Design for Collaborative Normal Mean Estimation”
Prof Kirthi Kandasamy
Talk at RAIN Seminar
Abstract: With the rise in popularity of machine learning, data is becoming an increasingly valuable resource. While data collection is often costly, once collected, data can be freely replicated and used by many others. Hence, instead of simply collecting and learning from their own data, by sharing data with each other, agents can reduce their own data collection costs and improve the utility they derive from data. Yet naive mechanisms for sharing data can lead to free-riding, where agents may attempt to benefit from the data that the others have collected without contributing any themselves. Free-riding can manifest in anti-social behavior, such as under-collecting data, or submitting fabricated datasets.
In this talk, I will present some of our recent work where we study these challenges in one of the most foundational statistical problems, normal mean estimation. Here, a set of strategic agents collect i.i.d. samples from a normal distribution at a cost, and wish to estimate the mean of this distribution. To facilitate collaboration, we design mechanisms that incentivize agents to collect a sufficient amount of data and share it truthfully, so that they are all better off than working alone. Our approach prevents under-collection and data fabrication via two key techniques: first, when sharing the others’ data with an agent, the mechanism corrupts this dataset in proportion to how much the data reported by the agent differs from the others; second, we design minimax optimal estimators for the corrupted dataset. Our mechanism, which is Nash incentive compatible and individually rational, achieves a social penalty (sum of all agents’ estimation errors and data collection costs) that is within a factor of 2 of the global minimum. In two special cases where we restrict the strategy space of the agents, we design mechanisms that essentially achieve the global minimum. When applied to high-dimensional (non-Gaussian) distributions with bounded variance, our mechanisms retain these properties, but with slightly weaker results.
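A toy numerical illustration of the corruption idea described above, not the paper's actual mechanism or minimax estimator; the `corrupted_share` function, the noise model, and all parameters are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupted_share(own_reported, others_data, noise_scale=1.0, rng=rng):
    """Return the others' pooled data, corrupted in proportion to how far
    the agent's reported sample mean is from the others' sample mean."""
    discrepancy = abs(np.mean(own_reported) - np.mean(others_data))
    noise = rng.normal(0.0, noise_scale * discrepancy, size=others_data.shape)
    return others_data + noise

# Two illustrative agents drawing from N(mean=2, std=1).
truthful   = rng.normal(2.0, 1.0, size=50)
others     = rng.normal(2.0, 1.0, size=200)
fabricated = truthful + 5.0                 # an agent misreporting its data

# A truthful agent receives nearly clean data; a fabricating agent receives
# heavily corrupted data, so misreporting hurts its own final estimate.
print(np.std(corrupted_share(truthful, others) - others))
print(np.std(corrupted_share(fabricated, others) - others))
```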
“Data without Borders: Game-theoretic challenges in democratizing data”
Prof Kirthi Kandasamy
Talk at Midwest Machine Learning Symposium
Abstract: Due to the popularity of machine learning, many organizations view data as an invaluable resource, likening it to the “new oil/gold”. However, unlike many types of resources, data is nonrivalrous: it can be freely replicated and used by many. Hence, data produced by one organization can, in principle, generate limitless value for many others. This will accelerate economic, social, and scientific breakthroughs and benefit society at large. However, considerations of free-riding and competition may prevent such open sharing of data between organizations. An organization may be wary that others may not be contributing a sufficient amount of data, or contributing fabricated/poisoned datasets. Organizations may also wish to monetize the data they have for profit. In some recent work, we leverage ideas from game theory, market design, and robust statistics to design protocols for data sharing. Our methods incentivize organizations to collect and truthfully contribute large amounts of data, so that socially optimal outcomes can be achieved.
[8] “Data-Efficient Adaptation for Pretrained Decision-Making Models”
Prof Frederic Sala
Talk at Midwest Machine Learning Symposium
Abstract: The use of pretrained models forms the major paradigm change in machine learning workflows this decade, including for decision making. These powerful and typically massive models have the promise to be used as a base for diverse applications. Unfortunately, it turns out that adapting these models for downstream tasks tends to be difficult and expensive, often requiring collecting and labeling additional data to further train or fine-tune the model. In this talk, I will describe my group’s work on addressing this challenge via efficient adaptation. First, when adapting vision-language models to make robust predictions, we show how to self-guide the adaptation process, without any additional data. Second, we show how to integrate relational structures like knowledge graphs into model prediction pipelines, enabling models to adapt to new domains unseen during training, without additional annotated examples. Lastly, in the most challenging scenarios, when the model must be fine-tuned on labeled data, we show how to obtain this data efficiently through techniques called weak supervision.
[9] “Characterizing and Harnessing Performance Variability in Accelerator-rich HPC systems ORNL”
Prof Matt Sinclair
Talk at NVIDIA
Abstract: Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling exascale levels of compute. Recent work demonstrated that power management (PM) can impact application performance in CPU-based HPC systems, even when machines have the same architecture and often the same SKU (stock keeping unit). This variation occurs due to manufacturing variability and the chip’s PM. However, while modern HPC systems widely employ accelerators such as GPUs, it is unclear how much this variability affects applications. In this talk, I will first discuss our work on characterizing the extent of variation due to GPU PM in modern HPC and supercomputing systems across five large-scale clusters: Oak Ridge’s Summit, Sandia’s Vortex, TACC’s Frontera and Longhorn, and Livermore’s Corona. Regardless of the application, cluster, GPU vendor, and cooling method, our results show significant variation, highlighting the difficulty in efficiently using existing GPU clusters for modern HPC and ML workloads, and the need to embrace variability in future accelerator-based systems. Second, I will discuss our efforts to embrace this variability in current systems. Specifically, we design a novel cluster scheduler, PAL, which uses performance variability measurements and application-specific profiles to improve job performance and resource utilization. PAL also balances performance variability with locality to ensure jobs are spread across as few nodes as possible. Time permitting, I will also discuss our plans to embrace performance variability in future systems.
“GEMMs: (One of) the Building Blocks of Fast Machine Learning”
Prof Matt Sinclair
Abstract: Machine learning (ML) has transformed society with significant accuracy improvements across many tasks. This tremendous, transformative effect has been enabled by a virtuous synergy of (1) better hardware systems, (2) larger datasets, and (3) improved ML structures and algorithms that further benefit from more efficient hardware and larger datasets. While ML relies on a number of important primitives, in this talk I will focus on one of them: General Matrix Multiplies (GEMMs). Specifically, I will discuss why GEMMs are well suited to running some important ML layers, and introduce interactive examples that iteratively improve performance and increase GEMM code complexity for both CPU and GPU programs, before finally discussing how CPU and GPU BLAS libraries provide high-performance GEMM implementations without requiring users to write complex code.
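A small sketch of the contrast the talk draws: a textbook triple-loop GEMM versus a call into a tuned BLAS library (here via NumPy). Sizes and timings are illustrative only.

```python
import time
import numpy as np

def gemm_naive(A, B):
    """Textbook triple-loop GEMM: C[i, j] = sum_k A[i, k] * B[k, j]."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i, k] * B[k, j]
            C[i, j] = acc
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)

t0 = time.perf_counter()
C1 = gemm_naive(A, B)            # simple, slow reference implementation
t1 = time.perf_counter()
C2 = A @ B                       # BLAS-backed GEMM (blocked, vectorized, threaded)
t2 = time.perf_counter()

print(np.allclose(C1, C2), f"naive: {t1 - t0:.3f}s, BLAS: {t2 - t1:.5f}s")
```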
[a] “Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution”
Jing Liu, Xiangpeng Hao, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tej Chajed
HotStorage ‘24
We present Robust Alternative Execution (RAE), an approach to transparently mask runtime errors in performance-oriented filesystems via temporarily executing an alternative shadow filesystem. A shadow filesystem has the primary goal of robustness, achieved through a simple implementation without performance optimizations and concurrency while adhering to the same API and on-disk formats as the base filesystem it enhances. While the base performance-oriented filesystem may contain bugs, the shadow implementation is formally verified, leveraging advancements in the verification of low-level systems code. In the common case, the base filesystem executes and delivers high performance to applications; however, when a bug is triggered, the slow-but-correct shadow takes over, updates state correctly, and then resumes the base, thus providing high availability.
https://dl.acm.org/doi/10.1145/3655038.3665942
[b] “Homebrew: Optical Polarization Change Detection for Ground Motion Sensing”
Joseph Catudal; Zhenhao Zhou; Weijun Pan; Paul Barford; Dante Fratta; Herb Wang
2024 Optical Fiber Communications Conference and Exhibition (OFC)
We examine laser light polarization measurements using our own novel polarimeter design for ground motion sensing and show that efficacy is highly dependent on the coupling of fiber routes to vibration sources.
https://ieeexplore.ieee.org/document/10527030
[c] “DBSP: Incremental Computation on Streams and Its Applications to Databases”
Authors: Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen
ACM SIGMOD Record
We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are “changing”. The goal of an efficient implementation is to “reuse” previously computed results. Ideally, when presented with a new change to the input, an incremental computation should only perform work proportional to the size of the changes of the input, rather than to the size of the entire dataset.
https://dl.acm.org/doi/10.1145/3665252.3665271
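As a rough illustration of the incremental-computation contract (not DBSP's actual API), the sketch below maintains a per-key count from insert/delete deltas and emits an output delta, doing work proportional to the change rather than to the whole dataset. All names are invented for the example.

```python
class IncrementalCount:
    """Toy incremental view: COUNT of rows per key, maintained from deltas.
    Each delta is a list of (key, +1/-1) insertions/deletions; work done per
    update is proportional to the delta, not to the full dataset."""
    def __init__(self):
        self.counts = {}

    def apply_delta(self, delta):
        changed = {}
        for key, weight in delta:              # weight: +1 insert, -1 delete
            new = self.counts.get(key, 0) + weight
            if new:
                self.counts[key] = new
            else:
                self.counts.pop(key, None)
            changed[key] = new
        return changed                         # output delta for downstream views

view = IncrementalCount()
view.apply_delta([("a", +1), ("a", +1), ("b", +1)])
print(view.apply_delta([("a", -1), ("c", +1)]))   # {'a': 1, 'c': 1}
print(view.counts)                                # {'a': 1, 'b': 1, 'c': 1}
```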
[d] “Learned Load Balancing”
Brian Chang, Aditya Akella, Loris D’Antoni, and Kausik Subramanian
Theoretical Computer Science, July ‘24
Effectively balancing traffic in datacenter networks is a crucial operational goal. Most existing load balancing approaches are handcrafted to the structure of the network and/or network workloads. Thus, new load balancing strategies are required if the underlying network conditions change, e.g., due to hard or grey failures, network topology evolution, or workload shifts. While we can theoretically derive the optimal load balancing strategy by solving an optimization problem given certain traffic and topology conditions, these problems take too much time to solve, making the derived solutions stale by the time they can be deployed. In this paper, we describe a load balancing scheme, Learned Load Balancing (LLB), which is a general approach to finding an optimal load balancing strategy for a given network topology and workload, and is fast enough in practice to deploy the inferred strategies. LLB uses deep supervised learning techniques to learn how to handle different traffic patterns and topology changes, and adapts to any failures in the underlying network. LLB leverages emerging trends in network telemetry, programmable switching, and “smart” NICs. Our experiments show that LLB performs well under failures and can be expanded to more complex, multi-layered network topologies. We also prototype neural network inference on smartNICs to demonstrate the workability of LLB.
[e] “Super Non-singular Decompositions of Polynomials and their Application to Robustly Learning Low-degree PTFs”
Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Sihan Liu, Nikos Zarifis
STOC ‘24
We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee Od,c(opt^(1−c)), for any desired constant c>0, where opt is the fraction of corruptions. In the strong contamination model, an omniscient adversary can arbitrarily corrupt an opt-fraction of the data points and their labels. This model generalizes the malicious noise model and the adversarial label noise model. Prior to our work, known polynomial-time algorithms in this corruption model (or even in the weaker adversarial label noise model) achieved error Õd(opt^(1/(d+1))), which deteriorates significantly as a function of the degree d.
Our algorithm employs an iterative approach inspired by localization techniques previously used in the context of learning linear threshold functions. Specifically, we use a robust perceptron algorithm to compute a good partial classifier and then iterate on the unclassified points. In order to achieve this, we need to take a set defined by a number of polynomial inequalities and partition it into several well-behaved subsets. To this end, we develop new polynomial decomposition techniques that may be of independent interest.
“Testing Closeness of Multivariate Distributions via Ramsey Theory”
Ilias Diakonikolas, Daniel M. Kane, Sihan Liu
STOC ‘24
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions p, q on ℝ^d, we want to distinguish between the case that p = q versus ‖p − q‖_{A_k} > ϵ, where ‖p − q‖_{A_k} denotes the generalized A_k distance between p and q — measuring the maximum discrepancy between the distributions over any collection of k disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound.
In more detail, we provide a computationally efficient closeness tester with sample complexity O((k^(6/7)/poly_d(ϵ)) log^d(k)). On the lower bound side, we establish a qualitatively matching sample complexity lower bound of Ω(k^(6/7)/poly(ϵ)), even for d = 2. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is Θ(k^(4/5)/poly(ϵ)). This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not.
As a corollary of our general A_k tester, we obtain d_TV-closeness testers for pairs of k-histograms on ℝ^d over a common unknown partition, and pairs of uniform distributions supported on the union of k unknown disjoint axis-aligned rectangles.
Both our algorithm and our lower bound make essential use of tools from Ramsey theory.
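For readers unfamiliar with the notation, the generalized A_k distance referred to in the abstract above is commonly defined as follows (a standard formulation; the paper's notation may differ in minor details):

```latex
\[
  \|p - q\|_{A_k} \;=\; \max_{R_1, \dots, R_k} \; \sum_{i=1}^{k} \bigl|\, p(R_i) - q(R_i) \,\bigr|,
\]
```

where the maximum is taken over all collections of k disjoint, axis-aligned rectangles R_1, ..., R_k in ℝ^d, and p(R_i) denotes the probability mass that p assigns to R_i.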
[f] “Technical Perspective: Unicorn: A Unified Multi-Tasking Matching Model”
AnHai Doan
SIGMOD Record
Data integration has been a long-standing challenge for data management. It has recently received significant attention due to at least three main reasons. First, many data science projects require integrating data from disparate sources before analysis can be carried out to extract insights. Second, many organizations want to build knowledge graphs, such as Customer 360s, Product 360s, and Supplier 360s, which capture all available information about the customers, products, and suppliers of an organization. Building such knowledge graphs often requires integrating data from multiple sources. Finally, there is also an increasing need to integrate a massive amount of data to create training data for AI models, such as large language models.
https://dl.acm.org/doi/10.1145/3665252.3665262
[g]
“Bandit Profit Maximization for Targeted Marketing”
Joon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy.
ACM Conference on Economics and Computation (EC) 2024
We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm’s objective is to maximize its gross profit, the total revenue minus marketing costs. Our results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. We prove a regret upper bound of O(nT^(3/4)) and a lower bound of Ω((nT)^(3/4)) for monotonic demand curves, and a regret bound of Θ(nT^(2/3)) for demand curves that are monotonic in price and concave in the ancillary variable.
https://arxiv.org/html/2403.01361v1
“Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning”
Joon Suk Huh, Kirthevasan Kandasamy.
International Conference on Machine Learning (ICML) 2024
We study a multi-round mechanism design problem, where we interact with a set of agents over a sequence of rounds. We wish to design an incentive-compatible (IC) online learning scheme to maximize an application-specific objective within a given class of mechanisms, without prior knowledge of the agents’ type distributions. Even if each mechanism in this class is IC in a single round, if an algorithm naively chooses from this class on each round, the entire learning process may not be IC against non-myopic buyers who appear over multiple rounds. On each round, our method randomly chooses between the recommendation of a weakly differentially private online learning algorithm (e.g., Hedge), and a commitment mechanism which penalizes non-truthful behavior. Our method is IC and achieves O(T^{1+h/2}) regret for the application-specific objective in an adversarial setting, where h quantifies the long-sightedness of the agents. When compared to prior work, our approach is conceptually simpler, it applies to general mechanism design problems (beyond auctions), and its regret scales gracefully with the size of the mechanism class.
https://openreview.net/pdf?id=QQkK6YH0Th
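For context, a minimal sketch of the plain Hedge (exponential weights) learner mentioned in the abstract; the weak differential privacy and the commitment mechanism that make the overall scheme incentive compatible are not shown, and the parameters are arbitrary.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5, rng=np.random.default_rng(0)):
    """Hedge / exponential weights: sample an action from a distribution that
    down-weights high cumulative loss; loss_matrix[t, a] is action a's loss
    on round t (full-information feedback)."""
    T, num_actions = loss_matrix.shape
    weights = np.ones(num_actions)
    total_loss = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        action = rng.choice(num_actions, p=probs)
        total_loss += loss_matrix[t, action]
        weights *= np.exp(-eta * loss_matrix[t])   # multiplicative update
    return total_loss, loss_matrix.sum(axis=0).min()

losses = np.random.default_rng(1).uniform(size=(500, 4))
losses[:, 2] *= 0.5                                # action 2 is best on average
print(hedge(losses))   # (algorithm's total loss, best fixed action's total loss)
```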
[h] “Enabling Tabular Data Exploration for Blind and Low-Vision Users”
Yanan Wang, Arjun Srinivasan, and Yea-Seul Kim
DIS ‘24
In a data-driven society, being able to examine data on one’s own terms is crucial for various aspects of well-being. However, current data exploration paradigms, such as Exploratory Data Analysis (EDA), heavily rely on visualizations to unveil patterns and insights. This visual-centric approach poses significant challenges for blind and low-vision (BLV) individuals. To address this gap, we built a prototype that supports non-visual data exploration and conducted an observational user study involving 18 BLV participants. Participants were asked to conduct various analytical tasks, as well as free exploration of provided datasets. The study findings provide insights into the factors influencing inefficient data exploration and characterizations of BLV participants’ analytical behaviors. We conclude by highlighting future avenues of research for the design of data exploration tools for BLV users.
https://dl.acm.org/doi/10.1145/3643834.3661609
https://pages.cs.wisc.edu/~yeaseulkim/assets/papers/2024_table.pdf
[i] “How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?”
Soumya Suvra Ghosal, Yiyou Sun, Yixuan Li
AAAI ‘24
Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
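A simplified sketch of subspace nearest-neighbor scoring in the spirit of the abstract; the dimension-selection rule here (largest-magnitude coordinates) and all sizes are placeholders, and the training-time regularization the paper describes is omitted.

```python
import numpy as np

def subspace_knn_score(train_feats, test_feat, num_dims=64, k=10):
    """Score OOD-ness by a k-NN distance computed in a small, salient subset
    of feature dimensions (chosen here by activation magnitude, as an
    illustration of restricting distances to a subspace)."""
    dims = np.argsort(-np.abs(test_feat))[:num_dims]     # pick the subspace
    diffs = train_feats[:, dims] - test_feat[dims]
    dists = np.linalg.norm(diffs, axis=1)
    return np.sort(dists)[k - 1]     # distance to the k-th nearest ID feature

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, size=(1000, 512))   # in-distribution features
id_test  = rng.normal(0.0, 1.0, size=512)
ood_test = rng.normal(3.0, 1.0, size=512)           # shifted, "unknown class"

print(subspace_knn_score(id_feats, id_test))    # smaller score -> likely ID
print(subspace_knn_score(id_feats, ood_test))   # larger score  -> likely OOD
```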
[j] “Tangible Scenography as a Holistic Design Method for Human-Robot Interaction”
DIS ‘24
Amy Koike, Bengisu Cagiltay, Bilge Mutlu
Traditional approaches to human-robot interaction design typically examine robot behaviors in controlled environments and narrow tasks. These methods are impractical for designing robots that interact with diverse user groups in complex human environments. Drawing from the field of theater, we present the construct of scenes — individual environments consisting of specific people, objects, spatial arrangements, and social norms — and tangible scenography, as a holistic design approach for human-robot interactions. We created a design tool, Tangible Scenography Kit (TaSK), with physical props to aid in design brainstorming. We conducted design sessions with eight professional designers to generate exploratory designs. Designers used tangible scenography and TaSK components to create multiple scenes with specific interaction goals, characterize each scene’s social environment, and design scene-specific robot behaviors. From these sessions, we found that this method can encourage designers to think beyond a robot’s narrow capabilities and consider how they can facilitate complex social interactions.
https://arxiv.org/abs/2405.19449
“The AI-DEC: A Card-based Design Method for User-centered AI Explanations”
DIS ‘24
Christine P Lee, Min Kyung Lee, Bilge Mutlu
Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems — communication content, modality, frequency, and direction — and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system’s workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems.
https://arxiv.org/abs/2405.16711
“REX: Designing User-centered Repair and Explanations to Address Robot Failures”
DIS ‘24
Christine P Lee, Pragathi Praveena, and Bilge Mutlu
Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study (n = 162), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors—safety, privacy, and complexity—that require adaptive repair strategies. The second, in-person study (n = 24) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments.
https://dl.acm.org/doi/10.1145/3643834.3661559
“Understanding On-the-Fly End-User Robot Programming”
DIS ‘24
Laura Stegner, Yuna Hwang, David Porfirio, Bilge Mutlu
Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with on-the-fly robot EUP. We trained 21 end users to use an existing on-the-fly EUP tool, asked them to create robot interactions for four scenarios, and assessed their overall experience. Our findings provide insight into how these systems should be designed to better support end-user experience with on-the-fly EUP, focusing on user interaction with an automatic program synthesizer that resolves imprecise user input, the use of multimodal inputs to express user intent, and the general process of programming a robot.
https://arxiv.org/abs/2406.00841
[k] “Does Compressing Activations Help Model Parallel Training?”
Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman
MLSys ‘24
Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression in a model-parallel setting is an understudied area. We have discovered that model parallelism has fundamentally different characteristics than data parallelism. In this work, we present the first empirical study on the effectiveness of compression methods for model parallelism. We implement and evaluate three common classes of compression algorithms – pruning-based, learning-based, and quantization-based – using a popular Transformer training framework. We evaluate these methods across more than 160 settings and 8 popular datasets, taking into account different hyperparameters, hardware, and both fine-tuning and pre-training stages. We also provide analysis when the model is scaled up. Finally, we provide insights for future development of model parallelism compression algorithms.
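To illustrate one of the three compression classes studied (quantization), here is a minimal int8 round-trip for an activation tensor before it would be communicated across model-parallel workers; this is a generic sketch, not the paper's implementation, and the sizes are arbitrary.

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric int8 quantization of an activation tensor."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

acts = np.random.randn(4, 1024).astype(np.float32)   # activations to send
q, scale = quantize_int8(acts)                        # ~4x smaller payload than fp32
recovered = dequantize(q, scale)
print(q.nbytes, acts.nbytes, float(np.max(np.abs(recovered - acts))))
```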
[l] “‘This really lets us see the entire world:’ Designing a conversational telepresence robot for homebound older adults”
Yaxin Hu, Laura Stegner, Yasmine Kotturi, Caroline Zhang, Yi-Hao Peng, Faria Huq, Yuhang Zhao, Jeffrey P. Bigham, Bilge Mutlu
DIS ‘24
In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults’ experiential needs for robot-mediated remote experiences such as exploration, reminiscence and social participation. We then designed a prototype system to support these goals and conducted a technology probe study (N=11) to garner a deeper understanding of user preferences for remote experiences. The study revealed users’ interaction patterns in each desired experience, highlighting the need for robot guidance and for social engagement with both the robot and remote bystanders. Our work identifies a novel design space where conversational telepresence robots can be used to foster meaningful interactions in the remote physical environment. We offer design insights into the robot’s proactive role in providing guidance and using dialogue to create personalized, contextualized and meaningful experiences.
https://arxiv.org/abs/2405.15037
June 2024
[1] “Distinguished Seminar in Optimization and Data” seminars
https://math.washington.edu/events/series/distinguished-seminar-optimization-data
The seminar is held on Mondays at 3:30 PM during the term. Previous speakers include E. Tardos, P. Rigollet, L. Levina, and B. Sturmfels. This seminar series is an interdepartmental activity at UW organized by faculty in the departments of Applied Mathematics, Computer Science & Engineering, Electrical & Computer Engineering, Mathematics, and Statistics.
“Nonsmooth Optimization on a Finer Scale”
Abstract: Nonsmooth optimization problems are ubiquitous in industrial and machine learning applications, arising from the need to address tasks such as resource allocation, threat detection, and model training. Within the realm of mathematical optimization, nonsmooth problems stand as some of the most challenging; yet they offer the possibility of developing efficient algorithms with provable guarantees. The complexity of these problems, encompassing both lower and upper bounds, has historically been examined under generic assumptions, bounding the growth of the objective functions by assuming Lipschitz continuity of either the objective itself or its gradient. In this talk, I will argue that these traditional assumptions defining classes of nonsmooth optimization problems inadequately capture local properties of problems that may make them amenable to more efficient algorithms. I will introduce a notion of local bounded variation of the (sub)gradient and discuss how, under this notion, we can obtain a more fine-grained characterization of the complexity of nonsmooth problems. One consequence of these results is that, contrary to prior belief, the complexity of nonsmooth optimization problems, such as those with piecewise linear objectives with polynomially many pieces, can be improved using parallelization even in high dimensions, either if the problem is unconstrained or if the function is allowed to be queried outside the feasible set.
Based on joint work with Cristóbal Guzmán; preprint available at https://arxiv.org/abs/2403.16317
[2] https://bwyogatama.github.io/
[3] https://www.laurastegner.com/
[5] https://pages.cs.wisc.edu/~sujayyadalam/
[7] https://nikoszarifis.github.io/
[a] https://dl.acm.org/doi/abs/10.1002/aaai.12147
The National Science Foundation (NSF) Artificial Intelligence (AI) Institute for Edge Computing Leveraging Next Generation Networks (Athena) seeks to foment a transformation in modern edge computing by advancing AI foundations, computing paradigms, networked computing systems, and edge services and applications from a completely new computing perspective. Led by Duke University, Athena leverages revolutionary developments in computer systems, machine learning, networked computing systems, cyber‐physical systems, and sensing. Members of Athena form a multidisciplinary team from eight universities. Athena organizes its research activities under four interrelated thrusts supporting edge computing: Foundational AI, Computer Systems, Networked Computing Systems, and Services and Applications, which constitute an ambitious and comprehensive research agenda. The research tasks of Athena will focus on developing AI‐driven next‐generation technologies for edge computing and new algorithmic and practical foundations of AI and evaluating the research outcomes through a combination of analytical, experimental, and empirical instruments, especially with target use‐inspired research. The researchers of Athena demonstrate a cohesive effort by synergistically integrating the research outcomes from the four thrusts into three pillars: Edge Computing AI Systems, Collaborative Extended Reality (XR), and Situational Awareness and Autonomy. Athena is committed to a robust and comprehensive suite of educational and workforce development endeavors alongside its domestic and international collaboration and knowledge transfer efforts with external stakeholders that include both industry and community partnerships.
[b] https://arxiv.org/html/2403.04660v1
First responders (FRs) navigate hazardous, unfamiliar environments in the field (e.g., mass-casualty incidents), making life-changing decisions in a split second. AR head-mounted displays (HMDs) have shown promise in supporting them due to their capability of recognizing and augmenting challenging environments in a hands-free manner. However, the design space has not been thoroughly explored by involving various FRs who serve different roles (e.g., firefighters, law enforcement) but collaborate closely in the field. We interviewed 26 first responders in the field who experienced a state-of-the-art optical-see-through AR HMD, as well as its interaction techniques and four types of AR cues (i.e., overview cues, directional cues, highlighting cues, and labeling cues), soliciting their first-hand experiences, design ideas, and concerns. Our study revealed both generic and role-specific preferences and needs for AR hardware, interactions, and feedback, as well as identifying desired AR designs tailored to urgent, risky scenarios (e.g., affordance augmentation to facilitate fast and safe action). While acknowledging the value of AR HMDs, concerns were also raised around trust, privacy, and proper integration with other equipment. Finally, we derived comprehensive and actionable design guidelines to inform future AR systems for in-field FRs.
[c] https://www.microsoft.com/en-us/research/uploads/prodnew/2024/03/nsdi24fall-final668-CloudLoRA.pdf
The Cloud Radio Access Network (CRAN) architecture has been proposed as a way of addressing the network throughput and scalability challenges of large-scale LoRa networks. CRANs can improve network throughput by coherently combining signals, and scale to multiple channels by implementing the receivers in the cloud. However, in remote LoRa deployments, a CRAN’s demand for high-backhaul bandwidths can be challenging to meet. Therefore, bandwidth-aware compression of LoRa samples is needed to reap the benefits of CRANs. We introduce Cloud-LoRa, the first practical CRAN for LoRa, that can detect sub-noise LoRa signals and perform bandwidth-adaptive compression. To the best of our knowledge, this is the first demonstration of CRAN for LoRa operating in real-time. We deploy Cloud-LoRa in an agricultural field over multiple days with USRP as the gateway. A cellular backhaul hotspot is then used to stream the compressed samples to a Microsoft Azure server. We demonstrate SNR gains of over 6 dB using joint multi-gateway decoding and over 2x throughput improvement using state-of-the-art receivers, enabled by CRAN in real-world deployments.
[d] https://ieeexplore.ieee.org/document/10330695
We present a novel framework, GreyLambda, to improve the scalability of traffic engineering (TE) systems. TE systems continuously monitor traffic and allocate network resources based on observed demands. The temporal requirement for TE is to have a time-to-solution in five minutes or less. Additionally, traffic allocations have a spatial requirement, which is to enable all traffic to traverse the network without encountering an over-subscribed link. However, the multi-commodity flow-based TE formulation cannot scale with increasing network sizes. Recent approaches have relaxed multi-commodity flow constraints to meet the temporal requirement but fail to satisfy the spatial requirement due to changing traffic demands, resulting in oversubscribed links or infeasible solutions. To satisfy both these requirements, we utilize optical topology programming (OTP) to rapidly reconfigure optical wavelengths in critical network paths and provide localized bandwidth scaling and new paths for traffic forwarding. GreyLambda integrates OTP into TE systems by introducing a heuristic algorithm that capitalizes on latent hardware resources at high-degree nodes to offer bandwidth scaling, and a method to reduce optical path reconfiguration latencies. Our experiments show that GreyLambda enhances the performance of two state-of-the-art TE systems, SMORE and NCFlow in real-world topologies with challenging traffic and link failure scenarios.
[e] https://dl.acm.org/doi/10.1145/3626778
We present a longitudinal study of intercontinental long-haul links (LHLs) – links with latencies significantly higher than that of all other links in a traceroute path. Our study is motivated by the recognition of these LHLs as a network-layer manifestation of critical transoceanic undersea cables. We present a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7 year period. We identify 85,620 layer-3 links (out of 2.7M links in a large traceroute dataset) that satisfy our definition for intercontinental long haul with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant component with a relative size that remains stable despite a significant growth of the long-haul infrastructure.
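An illustrative sketch of flagging a long-haul link from per-hop latencies in a single traceroute path; the thresholds, the `long_haul_links` helper, and the example path are invented, and the paper's methodology is considerably more careful.

```python
def long_haul_links(hops, factor=3.0, min_ms=40.0):
    """Flag links whose added latency is far above every other link in the path.
    `hops` is a list of (router, cumulative_rtt_ms) in path order."""
    link_lat = [(hops[i][0], hops[i + 1][0],
                 max(hops[i + 1][1] - hops[i][1], 0.0))
                for i in range(len(hops) - 1)]
    flagged = []
    for i, (a, b, lat) in enumerate(link_lat):
        others = [l for j, (_, _, l) in enumerate(link_lat) if j != i]
        if lat >= min_ms and others and lat >= factor * max(others):
            flagged.append((a, b, lat))
    return flagged

# Hypothetical path: one hop adds ~87 ms, far above every other hop.
path = [("r1", 1.0), ("r2", 3.5), ("r3", 5.0), ("r4", 92.0), ("r5", 95.5)]
print(long_haul_links(path))   # the r3 -> r4 hop looks like a transoceanic link
```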
[f] https://dl.acm.org/doi/10.1145/3613905.3652036
The ubiquitous use of technology by college students makes them vulnerable to harassment, harm, and intimidation via technological means. We evaluate the prevalence of such technology-facilitated abuse (TFA) among college students in the USA using a critical, feminist, and trauma-informed lens, which is essential to inform policymakers and advocates who support students. We surveyed 776 college students in a large R1 university located in the Midwest region of the USA to examine the prevalence of TFA faced by students marginalized by socio-economic factors, the support sought by student survivors, and the efficacy of support structures. Our findings indicate that 70% students experience TFA, but more than half of them do not seek support. Among the survivors who seek support, 93% students solely rely on informal resources like friends and family, and 6% solely seek support from formal networks such as survivor services or law enforcement. Therefore, we call on policymakers to direct attention to TFA, create tailored interventions to support marginalized students and propose campus-wide campaigns to spread awareness among college students.
[g] https://epubs.siam.org/doi/10.1137/1.9781611977912.115
We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where n sensors are measuring some common, ongoing phenomenon. At each time step t = 1, 2, ..., T, the ith sensor reports its readings for that time step. The algorithm must then commit to its estimate μ_t for the true mean value of the process at time t. We assume that most of the sensors observe independent samples from some common distribution X, but an ε-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation μ to the true mean μ* := E[X]. We note that if the algorithm is allowed to wait until time T to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation.
We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of (ε, δ)-stability, we give an efficient online algorithm that outputs estimates μ_t, t ∈ [T], such that with high probability it holds that ‖μ − μ*‖_2 = O(δ log(T)), where μ = (μ_t)_{t∈[T]}. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve ℓ_2-error of O(δ). Our second main result shows that with additional assumptions on the input (most notably that X is a product distribution) there are inefficient algorithms whose error does not depend on T at all.
[h] https://arxiv.org/abs/2403.05757
Teleoperation systems map operator commands from an input device into some coordinate frame in the remote environment. This frame, which we call a control coordinate system, should be carefully chosen as it determines how operators should move to get desired robot motions. While specific choices made by individual systems have been described in prior work, a design space, i.e., an abstraction that encapsulates the range of possible options, has not been codified. In this paper, we articulate a design space of control coordinate systems, which can be defined by choosing a direction in the remote environment for each axis of the input device. Our key insight is that there is a small set of meaningful directions in the remote environment. Control coordinate systems in prior works can be organized by the alignments of their axes with these directions and new control coordinate systems can be designed by choosing from these directions. We also provide three design criteria to reason about the suitability of control coordinate systems for various scenarios. To demonstrate the utility of our design space, we use it to organize prior systems and design control coordinate systems for three scenarios that we assess through human-subject experiments. Our results highlight the promise of our design space as a conceptual tool to assist system designers to design control coordinate systems that are effective and intuitive for operators.
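A toy sketch of the construct: a control coordinate system assembled by assigning one remote-scene direction to each input-device axis. The particular directions chosen here are arbitrary examples, not a recommendation from the paper.

```python
import numpy as np

# Each column assigns one input-device axis (x, y, z) to a direction in the
# remote scene; here: camera-right, camera-up, and the robot's forward axis.
camera_right  = np.array([1.0, 0.0, 0.0])
camera_up     = np.array([0.0, 0.0, 1.0])
robot_forward = np.array([0.0, 1.0, 0.0])
control_frame = np.column_stack([camera_right, camera_up, robot_forward])

def map_command(device_delta):
    """Map a displacement of the input device into an end-effector motion."""
    return control_frame @ device_delta

# Pushing the device 'forward' (its z axis) moves the robot along robot_forward.
print(map_command(np.array([0.0, 0.0, 0.1])))
```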
[i] https://link.springer.com/chapter/10.1007/978-3-031-57728-4_3
Functional Encryption (FE) is a powerful notion of encryption which enables computations and partial message recovery of encrypted data. In FE, each decryption key is associated with a function f such that decryption recovers the function evaluation f(m) from an encryption of m. Informally, security states that a user with access to decryption keys for functions f_1, ..., f_q can only learn the evaluations f_1(m), ..., f_q(m), but nothing more about the message. The system is said to be q-bounded collusion resistant if security holds as long as an adversary gets access to at most q decryption keys.
However, until very recently, all these works studied bounded collusion resistance in a static model, where the collusion bound q was a global system parameter. While the static model has led to many great applications, it has major drawbacks. Recently, Agrawal et al. (Crypto 2021) and Garg et al. (Eurocrypt 2022) introduced the dynamic model for bounded collusion resistance, where the collusion bound q was a fluid parameter, not globally set, but chosen by each encryptor. The dynamic model enabled harnessing many virtues of the static model, while avoiding its various drawbacks. In this work, we provide a generic compiler to upgrade any FE scheme from the static model to the dynamic model.
We also extend our techniques to multi-authority attribute-based encryption (MA-ABE). We show that bounded collusion MA-ABE supporting predicates that can be represented as an efficient computational secret sharing (CSS) scheme can be built from minimal assumptions. Efficient CSS schemes are known for access structures whose characteristic function can be computed by a polynomial-size monotone circuit under the existence of one-way functions [Yao89, unpublished]. Thus, our MA-ABE construction is the first MA-ABE scheme from standard assumptions for predicates beyond polynomial-size monotone Boolean formulas. Our construction also satisfies full adaptive security in the Random Oracle Model.
[j] https://arxiv.org/abs/2301.12357
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first to focus on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the value of the target policy. We then use this formulation to derive the optimal allocation of samples per action during data collection. We then introduce a novel algorithm, SPEED (Structured Policy Evaluation Experimental Design), that tracks the optimal design, and derive its regret with respect to the optimal design. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
[k] https://arxiv.org/abs/2202.05687
Detecting diffusion-generated deepfake images remains an open problem. Current detection methods fail against an adversary who adds imperceptible adversarial perturbations to the deepfake to evade detection. In this work, we propose Disjoint Diffusion Deepfake Detection (D4), a deepfake detector designed to improve black-box adversarial robustness beyond de facto solutions such as adversarial training. D4 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve adversarial robustness. Our key insight is to leverage a redundancy in the frequency domain and apply a saliency partitioning technique to disjointly distribute frequency components across multiple models. We formally prove that these disjoint ensembles lead to a reduction in the dimensionality of the input subspace where adversarial deepfakes lie, thereby making adversarial deepfakes harder to find for black-box attacks. We then empirically validate the D4 method against several black-box attacks and find that D4 significantly outperforms existing state-of-the-art defenses applied to diffusion-generated deepfake detection. We also demonstrate that D4 provides robustness against adversarial deepfakes from unseen data distributions as well as unseen generative techniques.
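A rough sketch of the frequency-partitioning idea: the image's spectrum is split into disjoint bands, each of which would feed its own detector. The radial-band split below is a naive stand-in for the paper's saliency-based partitioning, and no detectors are trained here.

```python
import numpy as np

def band_views(image, num_models=2):
    """Split an image's 2D FFT into disjoint radial frequency bands, one per
    ensemble member, and return each member's band-limited reconstruction."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, radius.max() + 1e-6, num_models + 1)
    views = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)        # disjoint frequency bands
        views.append(np.real(np.fft.ifft2(np.fft.ifftshift(f * mask))))
    return views

image = np.random.rand(64, 64)
low_band, high_band = band_views(image)
# Each ensemble member would score its own band-limited view; their decisions
# are then combined (e.g., majority vote) for the final verdict.
print(np.allclose(low_band + high_band, image, atol=1e-8))
```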
[l] https://ieeexplore.ieee.org/abstract/document/10136161
Designing deep networks robust to adversarial examples remains an open problem. Recently, it was shown that adversaries relying on only top-1 feedback (i.e., the hard-label) from an image classification model can arbitrarily shift an image towards an intended target prediction. Likewise, these hard-label adversaries enjoy performance comparable to first-order adversaries relying on the full model gradient. It was also shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors. In this paper, we argue that query efficiency in the hard-label setting is also connected to an adversary’s traversal through the data manifold. To explain this behavior, we propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary’s distribution of gradient estimates. Through numerical experiments of manifold-gradient mutual information, we show this behavior acts as a function of the effective problem dimensionality. On high-dimensional real-world datasets and multiple hard-label attacks using dimension reduction, we observe the same behavior to produce samples closer to the data manifold. This can result in up to 10x decrease in the manifold distance measure, regardless of the model robustness. Our results suggest that our variant of hard-label attack can find a higher concentration of generalization errors than previous techniques, leading to improved worst-case analysis for model designers.
[m] https://dl.acm.org/doi/10.1145/3613904.3642188
In recent years, there has been a growing interest in enhancing the accessibility of visualizations for people with visual impairments. While much of the research has focused on improving accessibility for screen reader users, the specific needs of people with remaining vision (i.e., low-vision individuals) have been largely unaddressed. To bridge this gap, we conducted a qualitative study that provides insights into how low-vision individuals experience visualizations. We found that participants utilized various strategies to examine visualizations using the screen magnifiers and also observed that the default zoom level participants use for general purposes may not be optimal for reading visualizations. We identified that participants relied on their prior knowledge and memory to minimize the traversing cost when examining visualization. Based on the findings, we motivate a personalized tool to accommodate varying visual conditions of low-vision individuals and derive the design goals and features of the tool.
[o] https://arxiv.org/abs/2307.15255
This paper presents predicate transfer, a novel method that optimizes join performance by pre-filtering tables to reduce the join input sizes. Predicate transfer generalizes Bloom join, which conducts pre-filtering within a single join operation, to multi-table joins so that the filtering benefits can be significantly increased. Predicate transfer is inspired by the seminal theoretical results of Yannakakis, which use semi-joins to pre-filter acyclic queries. Predicate transfer generalizes these results to arbitrary join graphs and replaces semi-joins with Bloom filters, leading to significant speedups. Evaluation shows that predicate transfer outperforms Bloom join by 3.1x on average on the TPC-H benchmark.
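A toy sketch of the underlying pre-filtering step follows: a Bloom filter built on one table's join keys is transferred to prune another table before the join, and the same transfer can be chained forward and backward across a join graph. The `BloomFilter` and `transfer_filter` names and the tiny example tables are illustrative, not the paper's implementation.

```python
# Toy sketch with illustrative tables represented as lists of dicts; a real
# engine would operate on columnar batches and hashed bitsets.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, bytearray(n_bits)

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def maybe_contains(self, key):
        return all(self.bits[p] for p in self._positions(key))

def transfer_filter(source_rows, source_key, target_rows, target_key):
    """Build a Bloom filter on the source join keys and pre-filter the target."""
    bf = BloomFilter()
    for row in source_rows:
        bf.add(row[source_key])
    return [row for row in target_rows if bf.maybe_contains(row[target_key])]

# R(a) joins S(a, b) joins T(b): a forward pass transfers filters R -> S -> T,
# and a backward pass (T -> S -> R) would shrink the inputs further before joining.
R = [{"a": 1}, {"a": 2}]
S = [{"a": 1, "b": 10}, {"a": 3, "b": 20}]
T = [{"b": 10}, {"b": 30}]
S = transfer_filter(R, "a", S, "a")   # drops the a=3 row (up to false positives)
T = transfer_filter(S, "b", T, "b")   # drops the b=30 row
```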
In precision livestock farming, the individual identification of cattle is crucial for informing decisions made to enhance animal welfare, health, and productivity. In the literature, models exist that can read ear tags; however, they are not easily portable to real-world cattle production environments and make predictions mainly on still images. We propose a video-based cattle ear tag reading system, called ReadMyCow, which takes advantage of the temporal characteristics of videos to accurately detect, track, and read cattle ear tags at 25 FPS on edge devices. For each frame in a video, ReadMyCow operates in two steps. 1) Tag detection: a YOLOv5s Object Detection model and the NVIDIA Deepstream Tracking Layer detect and track the tags present. 2) Tag reading: the novel WhenToRead module decides whether to read each tag, using a TRBA Scene Text Recognition model, or to reuse the reading from a previous frame. The system is implemented on an edge device, namely the NVIDIA Jetson AGX Orin or Xavier, making it portable to cattle production environments without external computational resources. To attain real-time speeds, ReadMyCow re-reads a detected tag only when a decision metric improves significantly in the current frame, indicating that a better reading is likely. Ideally, this means the best reading of a tag is found and stored throughout the tag’s presence in the video, even when the tag becomes occluded or blurry. While testing the system at a real Midwestern dairy farm housing 9,000 cows, 96.1% of printed ear tags were accurately read by the ReadMyCow system, demonstrating its real-world commercial potential. ReadMyCow opens opportunities for informed, data-driven decision-making on commercial cattle farms.
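The sketch below illustrates one plausible form of a "read only when worthwhile" policy per tracked tag, in the spirit of the WhenToRead module. The `ocr_read` callable, the margin threshold, and the choice of decision metric are assumptions for illustration rather than the system's actual logic.

```python
# Hedged sketch: keep one reading per tracked tag and re-run recognition only
# when the decision metric improves. `ocr_read(crop)` is a hypothetical text
# recognizer returning (text, confidence); `metric` could be a detection score
# or a sharpness estimate for the current frame.
best_readings = {}   # track_id -> (text, metric at the time of the reading)

def maybe_read(track_id, crop, metric, ocr_read, margin=0.1):
    prev = best_readings.get(track_id)
    if prev is None or metric > prev[1] + margin:   # significantly better frame
        text, _confidence = ocr_read(crop)
        best_readings[track_id] = (text, metric)
    return best_readings[track_id][0]               # otherwise reuse the stored reading
```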
[q] https://arxiv.org/abs/2201.08984
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL — representation learning and label disambiguation — in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
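A minimal numpy sketch of prototype-based label disambiguation restricted to a candidate set is shown below; it omits the contrastive encoder and the EM analysis, and the function names and momentum values are illustrative rather than PiCO's actual implementation.

```python
# Minimal sketch with illustrative names; embeddings and prototypes are assumed
# to be L2-normalized numpy vectors, and pseudo_label is a soft label over classes.
import numpy as np

def disambiguate(embedding, candidate_mask, prototypes, pseudo_label, momentum=0.9):
    """Pick the candidate class whose prototype is most similar to the embedding
    and smoothly move the soft pseudo-label toward it."""
    sims = prototypes @ embedding                    # (num_classes,) cosine similarities
    sims = np.where(candidate_mask, sims, -np.inf)   # exclude non-candidate classes
    one_hot = np.zeros_like(pseudo_label)
    one_hot[np.argmax(sims)] = 1.0
    return momentum * pseudo_label + (1.0 - momentum) * one_hot

def update_prototype(prototypes, class_idx, embedding, momentum=0.99):
    """Moving-average prototype update followed by re-normalization."""
    p = momentum * prototypes[class_idx] + (1.0 - momentum) * embedding
    prototypes[class_idx] = p / (np.linalg.norm(p) + 1e-12)
    return prototypes
```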
[s] https://arxiv.org/abs/2209.10365
Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground-truth label. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degraded performance of prior methods when facing the combined challenge of long-tailed distributions and partial labeling. In this work, we first identify the major reasons why prior methods fail. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to previous state-of-the-art PLL methods.
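To illustrate the marginal-matching idea, the sketch below uses Sinkhorn-style alternating normalization to refine predictions so that the column marginals follow an estimated (long-tailed) class prior while non-candidate classes stay zeroed out. The function name, iteration count, and uniform row marginal are assumptions, not SoLar's exact formulation.

```python
# Sketch with illustrative names; probs are (N, C) softmax outputs,
# candidate_mask is an (N, C) boolean matrix, and class_prior is a (C,)
# estimate of the long-tailed marginal that sums to 1.
import numpy as np

def refine_labels(probs, candidate_mask, class_prior, n_iters=50):
    q = probs * candidate_mask                          # zero out non-candidate classes
    row_target = np.full(probs.shape[0], 1.0 / probs.shape[0])
    for _ in range(n_iters):                            # Sinkhorn-style alternating scaling
        q = q * (row_target[:, None] / q.sum(axis=1, keepdims=True).clip(1e-12))
        q = q * (class_prior[None, :] / q.sum(axis=0, keepdims=True).clip(1e-12))
    return q / q.sum(axis=1, keepdims=True).clip(1e-12) # per-example refined soft labels
```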
[t] https://arxiv.org/abs/2403.14041
Learning companion robots for young children are increasingly adopted in informal learning environments. Although parents play a pivotal role in their children’s learning, very little is known about how parents prefer to incorporate robots into their children’s learning activities. We developed prototype capabilities for a learning companion robot to deliver educational prompts and responses to parent-child pairs during reading sessions and conducted in-home user studies involving 10 families with children aged 3-5. Our data indicates that parents want to work with robots as collaborators to augment parental activities to foster children’s learning, introducing the notion of parent-robot collaboration. Our findings offer an empirical understanding of the needs and challenges of parent-child interaction in informal learning scenarios and design opportunities for integrating a companion robot into these interactions. We offer insights into how robots might be designed to facilitate parent-robot collaboration, including parenting policies, collaboration patterns, and interaction paradigms.
[u] https://dl.acm.org/doi/10.1145/3613904.3642950
Assistive technologies for adults with Down syndrome (DS) need designs tailored to their specific technology requirements. While prior research has explored technology design for individuals with intellectual disabilities, little is understood about the needs and expectations of adults with DS. Assistive technologies should leverage the abilities and interests of the population, while incorporating age- and context-considerate content. In this work, we interviewed six adults with DS, seven parents of adults with DS, and three experts in speech-language pathology, special education, and occupational therapy to determine how technology could support adults with DS. In our thematic analysis, four main themes emerged, including (1) community vs. home social involvement; (2) misalignment of skill expectations between adults with DS and parents; (3) family limitations in technology support; and (4) considerations for technology development. Our findings extend prior literature by including the voices of adults with DS in how and when they use technology.
[v] https://dl.acm.org/doi/10.1145/3620665.3640367
We are in the age of AI, with rapidly changing algorithms and a somewhat synergistic change in hardware. MLPerf is a recent benchmark suite that serves as a way to compare and evaluate hardware. However, it has several drawbacks: it is dominated by CNNs, does a poor job of capturing the diversity of AI use cases, and represents only a sliver of production AI workloads. This paper performs a longitudinal study of state-of-the-art AI applications spanning vision, physical simulation, vision synthesis, language and speech processing, and tabular data processing, across three generations of hardware, to understand how the AI revolution has panned out. We call this collection of applications and execution scaffolding the CaSiO suite. The paper reports on data gathered at the framework level, the device API level, and the hardware and microarchitecture level. The paper provides insights on the hardware-software revolution with pointers to future trends.
[w] https://ieeexplore.ieee.org/document/10476447
Graphics processing units (GPUs) are an important class of parallel processors that offer high compute throughput and memory bandwidth. GPUs are used in a variety of important computing domains, such as machine learning, high performance computing, sparse linear algebra, autonomous vehicles, and robotics. However, some applications from these domains can underperform due to sensitivity to memory latency and bandwidth. Some of this sensitivity can be reduced by better overlapping memory access with compute. Current GPUs often leverage pipeline parallelism in the form of warp specialization to enable better overlap. However, current warp specialization support on GPUs is limited in three ways. First, warp specialization is a complex and manual program transformation that is out of reach for many applications and developers. Second, it is limited to coarse-grained transfers between global memory and the shared memory scratchpad (SMEM); fine-grained memory access patterns are not well supported. Finally, the GPU hardware is unaware of the pipeline parallelism expressed by the programmer, and is unable to take advantage of this information to make better decisions at runtime. In this paper we introduce WASP, hardware and compiler support for warp specialization that addresses these limitations. WASP enables fine-grained streaming and gather memory access patterns through the use of warp-level register file queues and hardware-accelerated address generation. Explicit warp to pipeline stage naming enables the GPU to be aware of pipeline parallelism, which WASP capitalizes on by designing pipeline-aware warp mapping, register allocation, and scheduling. Finally, we design and implement a compiler that can automatically generate warp specialized kernels, reducing programmer burden. Overall, we find that runtime performance can be improved on a variety of important applications by an average of 47% over a modern GPU baseline.
[x] https://arxiv.org/abs/2401.16677
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices, which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thus hide, this communication with independent computations, techniques such as Tensor Parallelism (TP) inherently serialize communication with model execution. One approach to hiding this serialized communication is to interleave it with the producer operation (of the communicated data) in a fine-grained manner. However, this fine-grained interleaving of communication and computation in software can be difficult. Furthermore, as with any concurrent execution, it requires compute and memory resources to be shared between computation and communication, causing resource contention that reduces overlapping efficacy.
To overcome these challenges, we propose T3, which applies hardware-software co-design to transparently overlap serialized communication while minimizing resource contention with compute. T3 transparently fuses producer operations with the subsequent communication via a simple configuration of the producer’s output address space and requires minor software changes. At the hardware level, T3 adds a lightweight track-and-trigger mechanism to orchestrate the producer’s compute and communication. It further uses compute-enhanced memories for the communication’s attendant compute. As a result, T3 reduces resource contention and efficiently overlaps serialized communication with computation. For important Transformer models like T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3’s benefits persist as models scale: 29% geomean for sublayers in ∼500-billion parameter models, PALM and MT-NLG.
[y] https://dl.acm.org/doi/10.1145/3617232.3624854
Modern storage devices, such as Optane NVMe SSDs, offer ultra-low latency of a few microseconds and high bandwidth of multiple gigabytes per second. At these speeds, the kernel software I/O stack is a substantial source of overhead. Userspace approaches avoid kernel software overheads but face challenges in supporting shared storage without major changes to file systems, the OS or the hardware.
We propose a new I/O architecture, BypassD, for fast, userspace access to shared storage devices. BypassD takes inspiration from virtual memory: it uses virtual addresses to access a device and relies on hardware for translation and protection. Like memory-mapping a file, the OS kernel constructs a mapping for file contents in the page table. Userspace I/O requests then use virtual addresses from these mappings to specify which file and file offset to access. BypassD extends the IOMMU hardware to translate file offsets into device Logical Block Addresses. Existing applications require no modifications to use BypassD. Our evaluation shows that BypassD reduces latency for 4KB accesses by 42% compared to standard Linux kernel and performs close to userspace techniques like SPDK that do not support device sharing. By eliminating software overheads, BypassD improves performance of real workloads, such as the WiredTiger storage engine, by ~20%.
[z] https://dl.acm.org/doi/10.1145/3650203.3663336
Recent research has shown the importance of tuning DBMS configuration knobs to achieve high performance. As a result, a large number of search-based and learning-based auto-tuning methods have been proposed. However, despite the promising results, we observe that (1) developing new auto-tuning methods and (2) comparing them to existing methods are cumbersome and require extensive engineering effort. This effort is compounded by the fact that auto-tuning methods typically need to be evaluated on multiple systems (PostgreSQL, MySQL, etc.) and benchmarks (TPC-C, TPC-H, etc.). In this paper we describe Nautilus, a platform that automates benchmarking for DBMS configuration tuning. Our insight in building Nautilus is that the process of developing auto-tuners can be separated into a tuning algorithm layer and a benchmarking layer. We focus on automating the benchmarking layer and thus make it easier for researchers to develop new tuning methods and compare them with existing methods. We describe the design and implementation of Nautilus, highlighting its main features and advantages to the end-user. Furthermore, we are open-sourcing this project so that the research community can freely use and benefit from it.
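A hypothetical interface sketch of the layering described above is given below: the benchmarking layer owns DBMS deployment, knob application, and workload execution, while the tuner only proposes knob settings and observes scores. The class and method names are illustrative and are not Nautilus's actual API.

```python
# Hypothetical interface sketch (not Nautilus's real API) separating the
# tuning-algorithm layer from the benchmarking layer.
from abc import ABC, abstractmethod

class BenchmarkingLayer(ABC):
    """Owns DBMS deployment, knob application, and workload execution."""
    @abstractmethod
    def evaluate(self, knobs: dict) -> float:
        """Apply `knobs` (e.g. shared_buffers, work_mem), run the configured
        benchmark (e.g. TPC-C), and return a performance score such as throughput."""

class Tuner(ABC):
    """Owns only the search strategy; never touches the DBMS directly."""
    @abstractmethod
    def suggest(self) -> dict: ...
    @abstractmethod
    def observe(self, knobs: dict, score: float) -> None: ...

def tuning_loop(tuner: Tuner, bench: BenchmarkingLayer, budget: int) -> dict:
    """Drive any Tuner against any BenchmarkingLayer for `budget` evaluations."""
    best_knobs, best_score = {}, float("-inf")
    for _ in range(budget):
        knobs = tuner.suggest()
        score = bench.evaluate(knobs)
        tuner.observe(knobs, score)
        if score > best_score:
            best_knobs, best_score = knobs, score
    return best_knobs
```

Under this split, a new tuning method only needs to implement `suggest`/`observe`, while support for another DBMS or benchmark lives entirely behind `evaluate`.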
[aa] https://arxiv.org/abs/2312.12621
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters and several new cluster schedulers have been proposed in recent years to support these workloads. With rapidly evolving DL workloads, it is challenging to quickly prototype and compare scheduling policies across workloads. Further, as prior systems target different aspects of scheduling (resource allocation, placement, elasticity etc.), it is also challenging to combine these techniques and understand the overall benefits. To address these challenges we propose Blox, a modular toolkit which allows developers to compose individual components and realize diverse scheduling frameworks. We identify a set of core abstractions for DL scheduling, implement several existing schedulers using these abstractions, and verify the fidelity of these implementations by reproducing results from prior research. We also highlight how we can evaluate and compare existing schedulers in new settings: different workload traces, higher cluster load, change in DNN workloads and deployment characteristics. Finally, we showcase Blox’s extensibility by composing policies from different schedulers, and implementing novel policies with minimal code changes.
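The sketch below illustrates the kind of composition such a toolkit enables: a scheduler assembled from independent admission, ordering, and placement components, any of which can be swapped without touching the others. The dataclasses and policy signatures are illustrative abstractions, not Blox's actual interfaces.

```python
# Hedged sketch of composing a scheduler from independent policy components.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Job:
    job_id: int
    demand_gpus: int
    priority: float = 0.0

@dataclass
class Scheduler:
    admission: Callable[[List[Job]], List[Job]]                 # who is eligible this round
    ordering: Callable[[List[Job]], List[Job]]                  # e.g. FIFO, SRTF, priority
    placement: Callable[[Job, Dict[int, int]], Optional[int]]   # node id or None

    def schedule_round(self, queue: List[Job], free_gpus: Dict[int, int]) -> List[Tuple[int, int]]:
        plan = []
        for job in self.ordering(self.admission(queue)):
            node = self.placement(job, free_gpus)
            if node is not None and free_gpus[node] >= job.demand_gpus:
                free_gpus[node] -= job.demand_gpus
                plan.append((job.job_id, node))                 # (job, node) assignments
        return plan

# Example composition: FIFO ordering with first-fit placement; swapping in a
# priority-based ordering policy changes exactly one component.
fifo_first_fit = Scheduler(
    admission=lambda q: q,
    ordering=lambda q: q,
    placement=lambda j, free: next((n for n, g in free.items() if g >= j.demand_gpus), None),
)
```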
[ab] https://dl.acm.org/doi/10.1145/3639286
The scaling of per-GB DRAM cost has slowed down in recent years. Recent research has suggested that adding remote memory to a system can further reduce the overall memory cost while maintaining good performance. Remote memory (i.e., tiered memory), connected to host servers via high-speed interconnect protocols such as RDMA and CXL, is expected to deliver roughly 100x lower latency than SSDs (under 1µs) and to be more cost-effective than local DRAM through pooling or the adoption of cheaper memory technologies.
Tiered memory opens up a large number of potential use cases within database systems, but previous work has explored only limited ways of using it. We provide a systematic study of how a DBMS can build tiered-memory buffer management across a wide range of hardware performance characteristics. Specifically, we study five different indexing designs that leverage remote memory in different ways and evaluate them through a wide range of metrics, including performance, tiered-memory latency sensitivity, and cost-effectiveness. In addition, we propose a new memory provisioning strategy that allocates an optimal amount of local and remote memory for a given workload. Our evaluations show that while some designs achieve higher performance than others, no design wins in all measured dimensions.
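A hedged sketch of a provisioning search in this spirit is shown below: it picks the cheapest local/remote split that still meets a throughput target under a workload performance model. The `perf` callable, per-GB prices, and step size are placeholder assumptions rather than the paper's strategy.

```python
# Hedged sketch with placeholder prices and a hypothetical workload model
# perf(local_gb, remote_gb) -> throughput; it searches for the cheapest split
# that still meets a throughput target.
def provision(total_gb, perf, target_tput,
              local_cost_per_gb=4.0, remote_cost_per_gb=2.0, step_gb=4):
    best = None
    for local_gb in range(0, total_gb + 1, step_gb):
        remote_gb = total_gb - local_gb
        if perf(local_gb, remote_gb) < target_tput:
            continue                                  # performance target not met
        cost = local_gb * local_cost_per_gb + remote_gb * remote_cost_per_gb
        if best is None or cost < best[0]:
            best = (cost, local_gb, remote_gb)
    return best   # (cost, local_gb, remote_gb), or None if the target is unreachable
```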
[ac] https://dl.acm.org/doi/10.1145/3613904.3642829
Autism Spectrum Disorder (ASD) presents challenges in social interaction skill development, particularly in turn-taking. Digital interventions offer potential solutions for improving autistic children’s social skills but often lack addressing specific collaboration techniques. Therefore, we designed a prototype of a turn-taking collaborative tablet game, StarRescue, which encourages children’s distinct collaborative roles and interdependence while progressively enhancing sharing and mutual planning skills. We further conducted a controlled study with 32 autistic children to evaluate StarRescue’s usability and potential effectiveness in improving their social skills. Findings indicated that StarRescue has great potential to foster turn-taking skills and social communication skills (e.g., prompting, negotiation, task allocation) within the game and also extend beyond the game. Additionally, we discussed implications for future work, such as including parents as game spectators and understanding autistic children’s territory awareness in collaboration. Our study contributes a promising digital intervention for autistic children’s turn-taking social skill development via a scaffolding approach and valuable design implications for future research.
[ad] https://arxiv.org/html/2402.12772v1
Reading is a challenging task for low vision people. While conventional low vision aids (e.g., magnification) offer certain support, they cannot fully address the difficulties faced by low vision users, such as locating the next line and distinguishing similar words. To fill this gap, we present GazePrompt, a gaze-aware reading aid that provides timely and targeted visual and audio augmentations based on users’ gaze behaviors. GazePrompt includes two key features: (1) a Line-Switching support that highlights the line a reader intends to read; and (2) a Difficult-Word support that magnifies or reads aloud a word that the reader hesitates with. Through a study with 13 low vision participants who performed well-controlled reading-aloud tasks with and without GazePrompt, we found that GazePrompt significantly reduced participants’ line switching time, reduced word recognition errors, and improved their subjective reading experiences. A follow-up silent-reading study showed that GazePrompt can enhance users’ concentration and perceived comprehension of the reading contents. We further derive design considerations for future gaze-based low vision aids.
[ae] https://arxiv.org/html/2402.07300v2
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce Spica, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, Spica offers interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, Spica augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of Spica and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
[af] https://arxiv.org/abs/2311.05511
We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time- and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible policy with optimal value so long as the maximum supported cost is bounded by a polynomial in the cMDP or the absolute budget. Given our hardness results, our approximation guarantees are the best possible under worst-case analysis.
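The cost-augmentation idea behind the reduction can be sketched as follows for a tabular MDP with deterministic per-step costs: the augmented state carries the cumulative cost so far, and actions that would push it past the budget are disallowed, so any policy on the augmented MDP is anytime-feasible. The function names and the explicit enumeration of cost levels are illustrative.

```python
# Illustrative sketch only: costs are assumed deterministic and drawn from a
# small set of values, so the reachable cumulative-cost levels can be enumerated.
from itertools import product

def augment(states, actions, cost, budget, cost_levels):
    """states, actions: iterables; cost[s][a]: per-step cost; cost_levels: the
    finitely many cumulative-cost values reachable under the costs' precision.
    Returns the augmented state space and the feasible-action function."""
    aug_states = list(product(states, cost_levels))      # (state, cost so far)

    def allowed(state, cost_so_far):
        # Disallow any action that would exceed the budget at this step,
        # so every policy over the augmented MDP is anytime-feasible.
        return [a for a in actions if cost_so_far + cost[state][a] <= budget]

    return aug_states, allowed

# Standard planning or learning can then run on the unconstrained augmented MDP;
# when costs have logarithmic precision, the number of cost levels stays small,
# which mirrors the fixed-parameter tractability condition described above.
```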