The notes below begin with the June 2024 Good News. For notes previous to June 2024, visit the Good News list.
August 2025
[1] https://www.vldb.org/awards_early_career.html
This award recognizes a researcher who has demonstrated research impact through a specific technical contribution of high significance since completing the Ph.D. The award is given for a specific technical contribution and not for the researcher’s cumulative contributions across diverse topics. The technical contribution must have been made after completing the Ph.D. work. In other words, research prior to obtaining the Ph.D. cannot be used towards this award. The candidate should have been awarded the Ph.D. no earlier than June 15, 8 years ago.
[2] https://www.ams.org/meetings/sectional/2322_program_ss25.html#title
This session welcomes talks in any area related to the work of Eric Bach. This primarily includes number theory and algorithms and related areas, but anything connected to Professor Bach’s work is welcome.
[3] “CAREER: Game-theoretic Foundations for Incentive-aware Data Sharing and Collaborative Machine Learning”
Kirthi Kandasamy
https://www.nsf.gov/funding/opportunities/career-faculty-early-career-development-program
Abstract: With recent advances in artificial intelligence, organizations increasingly recognize data as a vital resource for advancing scientific, economic, and societal progress. Many modern machine learning systems rely on large and complex datasets to learn patterns, generate insights, create new content, and support intelligent decision-making. However, the data needed to develop these models is often scattered across different organizations, with each holding only a portion of the necessary information. By sharing data with each other, instead of relying only on their own data, organizations can unlock new opportunities for discovery and innovation. Despite this potential, organizations often hesitate to share data unless there are clear incentives to do so. This raises important questions: Is it always beneficial for an organization to share data? Do participants receive benefits that reflect their contributions? What mechanisms can encourage data sharing while preventing participants from benefiting without contributing? When several parties collaborate to train a shared model, how should the benefits be allocated? This project addresses these questions by investigating the incentives that influence participation in data sharing and collaborative machine learning. The project aims to create environments where data sharing supports collaboration and innovation, ultimately enabling breakthroughs in areas such as health care, public policy, and education.
This project will develop new theoretical foundations for the design of protocols that govern data sharing and collaborative machine learning, focusing on two central incentive-related challenges: the balanced allocation of responsibilities and benefits, and reducing the potential for strategic behavior. The research will design protocols that assign data collection responsibilities in a balanced way, allocate shared data in line with contributions, and ensure that exchanges among competing participants are mutually beneficial. These protocols will also be designed to prevent strategic behaviors that exploit the system, such as avoiding data collection, withholding contributions, or providing incorrect data. The project will further explore multi-round collaborations, where participants exchange data over time, adapt their strategies, and operate under uncertainty about the value of others’ data. The proposed methods will be evaluated through simulations and partnerships with Alzheimer’s disease research consortia. The project is expected to contribute broadly to the fields of machine learning and game theory. Education efforts will focus on the development of new courses, workshops, and research opportunities for undergraduate and high-school students.
[4] Systems, Methods, and Media for Estimating Depths in a Scene using Single-Photon Detectors and Binary Structured Light Patterns
Varun Sundar and Mohit Gupta
Abstract: In accordance with some embodiments, systems, methods, and media for estimating depths in a scene using single-photon detectors and binary structured light patterns are provided. In some embodiments, a system comprises: a light source; an image sensor comprising an array including single-photon detectors; a processor programmed to: cause the light source to emit a sequence of n binary light patterns toward the scene, each comprising at least L columns, each of the patterns having a minimum stripe width of 8 columns, where n > log2(L); cause, for each of the n binary light patterns, the image sensor to capture a binary image of the scene, the binary image comprising a binary value for each of the single-photon detectors such that each of the single-photon detectors generates a set of binary values; and estimate, for each single-photon detector, a depth value based on the set of binary values.
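To make the decoding idea concrete, here is a minimal, hedged Python sketch (not the patented method): project roughly log2(L) binary column patterns, treat each single-photon detector's sequence of binary values as a code word for a projector column, and triangulate depth from the projector-camera disparity. The plain binary code, the camera/projector parameters, and all names below are illustrative assumptions.

```python
import numpy as np

L = 1024                          # projector columns
n = int(np.ceil(np.log2(L)))      # n binary patterns suffice to index L columns

def patterns(L, n):
    """Pattern k is bright wherever bit k of the column index is set."""
    cols = np.arange(L)
    return [((cols >> k) & 1).astype(np.uint8) for k in range(n)]

def decode_column(bits):
    """Recover the projector column index from one detector's n binary values."""
    return sum(int(b) << k for k, b in enumerate(bits))

def depth_from_disparity(col_proj, col_cam, baseline=0.1, focal_px=1400.0):
    """Simple triangulation: depth is inversely proportional to disparity."""
    disparity = abs(col_proj - col_cam) + 1e-6
    return baseline * focal_px / disparity

# Example: a detector viewing projector column 412 from camera column 300.
pats = patterns(L, n)
bits = [int(p[412]) for p in pats]
print(depth_from_disparity(decode_column(bits), col_cam=300))
```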
Systems, methods, and media for high dynamic range imaging using single-photon and conventional image sensor data
Felipe Gutierrez Barragan, YuHao Liu, Atul Ingle, Mohit Gupta, Andreas Velten
Abstract: In accordance with some embodiments, systems, methods, and media for high dynamic range imaging using single-photon and conventional image sensor data are provided. In some embodiments, the system comprises: first detectors configured to detect a level of photons proportional to incident photon flux; second detectors configured to detect arrival of individual photons; a processor programmed to: receive, from the first detectors, first values indicative of photon flux from a scene with a first resolution; receive, from the second detectors, second values indicative of photon flux from the scene with a lower resolution; provide, to a first encoder of a trained machine learning model, first flux values based on the first values; provide, to a second encoder of the model, second flux values based on the second values; receive, as output, third values indicative of photon flux from the scene; and generate a high dynamic range image based on the third values.
[5] https://operations.access-ci.org/step
This program is intended for undergraduates who are interested in pursuing careers in cyberinfrastructure! Our goal is to provide students with exposure to these careers in an effort to grow the cyberinfrastructure workforce.
https://nationalsciencedatafabric.org/nsdf-ahm-2025-01
Abstract: The PATh project offers a variety of distributed data handling services. Researchers who place their workloads at PATh Access Points leverage these services to bring input data to their applications and to ship results back to archival sites. They also employ them to copy the applications, in most cases in the form of container images, to execution sites provided by the Open Science Pool (OSPool). Owners of Object Stores who federate datasets they host via the Open Science Data Federation (OSDF) use these services to make locally stored objects available to remote consumers for frequent reuse.
In the past 12 months PATh services have performed more than 3B Object transfers across more than 100 institutions. The distributed nature of the infrastructure requires objects to be copied across institution boundaries and to be buffered in memory or disk at multiple locations. These operations clearly introduce concerns about data integrity and access control. Do the researchers and Object providers who use these services believe that the PATh services are trustworthy?
The talk will review our ongoing effort to develop data handling services that we believe to be trustworthy. A key element of our current development work is transitioning our software tools and infrastructure into capability-based authorization. We hope that facilitating trust relationships through capabilities will help researchers and data providers to entrust their objects with the PATh services. Namely, to believe that the OSDF and the OSPool data handling services are trustworthy.
[6] GRC Annual Review, Celebration & Transition to SMART USA
https://www.src.org/program/grc/esh/events/
“Evaluating Accelerator Performance Under Advanced Cooling”
Authors: Akanksha Chaudhari & Matthew D. Sinclair
Abstract: Escalating compute demands are driving power-dense heterogeneous architectures, making advanced cooling essential. Moreover, modern computing systems are becoming increasingly heterogeneous, yet simulation and modeling tools struggle to effectively model these systems. In this work, we have added support to the state-of-the-art gem5 simulator to model systems with CPUs, GPUs, and accelerators. This enables their early-stage exploration through high-fidelity simulation. Then, we demonstrate the value of this support by studying the amenability of these accelerators to advanced cooling techniques.
“Simulating and Modeling Hardware for Machine Learning Workloads at Scale”
Authors: Akanksha Chaudhari, Vishnu Ramadas, & Matthew D. Sinclair
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-stage system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (e.g., Tensor Cores), remains an important research challenge. In particular, modeling energy in a way that is flexible and can be extended both to different types of programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
[7] Prof Abedia led a measurement campaign at the VLA telescope in New Mexico, which conducted the first-ever active transmission experiment at the site. This was a joint effort between NRAO, NTIA, and SpectrumX, with me serving as the lead PI for the experiment on behalf of SpectrumX.
Summary: The 7.125–8.4 GHz band has recently emerged as a promising candidate for future 6G mobile networks due to its favorable propagation characteristics and global harmonization potential. Although not formally protected for radio astronomy, this band is currently one of the cleanest available and plays a critical role in high-sensitivity observations of short-duration and non-thermal cosmic phenomena. Given the extreme sensitivity of radio telescopes, even low-power mobile devices operating near observatories can introduce interference, potentially compromising scientific data. To investigate this, we conducted the first-ever active transmission experiment at the Very Large Array (VLA), using a custom-built platform to emulate 6G-like signals and study their impact on real astronomical observations. To explore potential mitigation strategies, we also introduced a narrow-band beacon signal to assist with interference detection, particularly when interference is near or below the telescope’s noise floor. This work aims to inform future spectrum-sharing efforts by providing experimental insights into both the risks and opportunities associated with the coexistence of 6G networks and radio astronomy.
[8] “Take The Long Way Home – Distant Peering to the Cloud”
Esteban Carisimo, Mia Weaver, Fabian Bustamante and Paul Barford
To Appear in IEEE/ACM Transactions on Networking.
Abstract: The emergence of large cloud providers in the last decade has transformed the Internet, resulting in a seemingly ever-growing set of datacenters, points of presence, and network peers. Despite the availability of closer peering locations, some networks continue to peer with cloud providers at distant locations, traveling thousands of kilometers. In this paper, we employ a novel cloud-based traceroute campaign to characterize the distances networks travel to peer with the cloud. This unique approach allows us to gain unprecedented insights into the peering patterns of networks. Our findings reveal that 50% of the networks peer within 300 kilometers of the nearest datacenter. However, our analysis also reveals that over 20% of networks travel at least 6,700 kilometers beyond the proximity of the nearest computing facility, and some as much as 18,791 kilometers! While these networks connect with the cloud worldwide, from South America to Europe and Asia, many come to peer with cloud providers in North America, even from Oceania and Asia. We explore possible motivations for the persistence of distant peering, discussing factors such as cost-effective routes, enhanced peering opportunities, and access to exclusive content.
[9] “LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics”
Xiangpeng Hao, Andrew Lamb, Yibo Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
VLDB
We present LiquidCache, a novel pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes. Our key observation is that data decoding, not filter evaluation, is the primary CPU bottleneck in existing systems. To address this challenge, we propose caching logical data rather than its physical representation by transcoding upstream files on-the-fly into an optimized “Liquid” format. This format is co-designed with filter evaluation semantics to enable selective decoding, late materialization, and encoding-aware filter evaluation, delivering efficient filter evaluation while preserving compression benefits. The “Liquid” format exists exclusively in the cache, allowing easy adoption without changing existing storage formats, and enabling LiquidCache itself to evolve and incorporate future encodings while maintaining ecosystem compatibility. Through integration with Apache DataFusion and evaluation with ClickBench, we demonstrate that LiquidCache achieves up to 10× lower cache server CPU usage compared to state-of-the-art systems without increasing memory footprint.
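As an illustration of the pushdown idea (a hedged sketch, not the LiquidCache implementation; the class and method names are hypothetical): the cache keeps data in a decoded, filter-friendly form and evaluates predicates locally, so only matching rows cross the network to compute nodes.

```python
import pyarrow as pa
import pyarrow.compute as pc

class PushdownCache:
    def __init__(self):
        self._tables = {}  # file id -> decoded Arrow table standing in for "logical" data

    def load(self, file_id, table: pa.Table):
        # LiquidCache would transcode the upstream file on-the-fly into its
        # "Liquid" format; here we simply keep a decoded Arrow table.
        self._tables[file_id] = table

    def scan(self, file_id, column, op, value):
        # Filter evaluated on the cache server: decoding work stays here,
        # and only the selected rows are shipped to the compute node.
        table = self._tables[file_id]
        mask = {"<": pc.less, ">": pc.greater, "==": pc.equal}[op](table[column], value)
        return table.filter(mask)

cache = PushdownCache()
cache.load("hits.parquet", pa.table({"user": [1, 2, 3, 4], "latency_ms": [12, 250, 8, 90]}))
print(cache.scan("hits.parquet", "latency_ms", "<", 100).num_rows)  # 3 rows shipped
```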
“PANGOLIN: a Comprehensive Testing Framework for Configuration-Rich Key-Value Stores”
Shaohua Duan, Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
ACM SYSTOR
In this paper, we present PANGOLIN, a comprehensive testing framework for configuration-rich key-value stores. To better understand bugs in modern key-value stores and to distill domain knowledge for efficiently identifying new ones, we first comprehensively study historical bugs in five mature key-value stores over the last eight years. We then design and implement PANGOLIN, motivated by insights from our bug study indicating that most bugs can be identified by systematically testing a small sequence of operations and configurations. Specifically, PANGOLIN puts these insights into practice by introducing a bounded testing strategy into a spectrum of black-box and fuzzing test procedures. Finally, we use PANGOLIN to find 20 bugs and reproduce 443 historical bugs in five mature key-value stores (RocksDB, LevelDB, HyperLevelDB, BadgerDB, and Redis), making it an attractive supplement to handwritten test suites.
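The bounded-testing strategy can be pictured with a small, hedged sketch (illustrative only, not PANGOLIN itself): enumerate every short sequence of store operations under every combination of a few configuration knobs, and check a simple invariant after each run. The toy in-memory store and the specific knobs are assumptions.

```python
import itertools

OPS = ["put", "get", "delete", "flush"]
CONFIGS = {"compression": [True, False], "sync_writes": [True, False]}

def run_sequence(config, ops):
    """A toy key-value store standing in for the system under test."""
    store, log = {}, []
    for op in ops:
        if op == "put":
            store["k"] = ("value", config)
        elif op == "get":
            log.append(store.get("k"))
        elif op == "delete":
            store.pop("k", None)
        elif op == "flush":
            pass  # no-op in this toy model
    return store, log

def bounded_test(max_len=3):
    keys = list(CONFIGS)
    for values in itertools.product(*CONFIGS.values()):
        config = dict(zip(keys, values))
        for length in range(1, max_len + 1):
            for ops in itertools.product(OPS, repeat=length):
                store, _ = run_sequence(config, ops)
                # Invariant: a key deleted by the final operation must be gone.
                if ops[-1] == "delete":
                    assert "k" not in store, (config, ops)

bounded_test()
print("all bounded operation/configuration sequences passed")
```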
[10] https://cdis.wisc.edu/preparing-students-for-the-ai-era-cdis-and-openai-launch-sail/
As AI continues to reshape the global job market, universities are developing new programs to help students navigate the changing landscape. At the University of Wisconsin–Madison, the School of Computer, Data & Information Sciences (CDIS) launched a forward-looking initiative, supported by OpenAI, designed to prepare students to lead in the AI era.
Read the rest at the link above.
[11] “Switched or Switchless: An Empirical Study of SAN Architecture for Disaggregated Storage”
Chendong Wang, Joontaek Oh, and Ming Liu
NSDI ‘26
Disaggregated storage is a pivotal component of today’s cluster infrastructures. With the advent of high-bandwidth server interconnects and new NVMe form factors, commodity storage appliances are becoming denser, delivering tens of millions of IOPS. This calls for today’s storage area network (SAN) fabric to expand the bandwidth capacity drastically. Industry practices tackle this issue via either (i) a scale-up approach, upgrading the per-port bandwidth in a switched SAN; or (ii) a scale-out strategy, integrating more paths in a switchless SAN. However, it is unclear which network architecture is more suitable for scaling storage disaggregation.
This paper presents an empirical study that compares a switched and a switchless SAN from several angles. We first design an experimental methodology – consisting of small-scale real-system prototypes and large-scale simulations – that provides adequate flexibility when exploring SAN architecture. Next, we characterize NVMe-oF I/O flows and co-design the SAN traffic control with these characteristics to optimize I/O transmission performance in both architectures. Our evaluations draw several findings and show that the switchless SAN offers several advantages. For example, it delivers throughput on par with the switched SAN, despite traversing more routing hops, and with lower latency, because its multiple load-aware I/O paths mitigate I/O interference. Further, the switchless SAN saves capital costs by replacing expensive high-radix switches, scales with heterogeneous I/O flows, and eliminates the single point of failure at the ToR switch.
[12] Papers to appear in Science Robotics.
These are in collaboration with people at U Chicago and U Illinois Chicago.
Wright, L., Vegesna, P., Michaelis, J., Mutlu, B., and Sebo, S. Robotic Reading Companions Can Mitigate Oral Reading Anxiety in Children. Science Robotics, In press.
Michaelis, J., & Mutlu, B., How can educational robots enhance family life? Through careful integration. Science Robotics, In press.
[13] “Choreographic Quick Changes: First-Class Location (Set) Polymorphism”
Ashley Samuelson, Andrew K. Hirsch, Ethan Cecchetti
OOPSLA
Choreographic programming is a promising new paradigm for programming concurrent systems where a developer writes a single centralized program that compiles to individual programs for each node. Existing choreographic languages, however, lack critical features integral to modern systems, like the ability of one node to dynamically compute who should perform a computation and send that decision to others. This work addresses this gap with λQC, the first typed choreographic language with first-class process names and polymorphism over both types and (sets of) locations. λQC also improves expressive power over previous work by supporting algebraic and recursive data types as well as multiply-located values. We formalize and mechanically verify our results in Rocq, including the standard choreographic guarantee of deadlock freedom.
Paper details on arxiv: https://arxiv.org/abs/2506.10913
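For readers unfamiliar with the paradigm, the sketch below is a hedged, self-contained illustration of the core idea (one centralized program with explicit locations); it is plain Python, not λQC syntax, and a real choreography compiler would project per-node programs rather than run everything globally as this toy interpreter does. The bookstore protocol and all names are assumptions.

```python
def choreography(world):
    # One program, written from the global viewpoint, with explicit locations.
    title = world.at("buyer", lambda: "Homotopy Type Theory")
    title = world.send("buyer", "seller", title)
    price = world.at("seller", lambda: 120 if "Theory" in title else 40)
    price = world.send("seller", "buyer", price)
    # The buyer dynamically computes a decision and communicates it to the seller.
    decision = world.at("buyer", lambda: price <= 150)
    return world.send("buyer", "seller", decision)

class GlobalRun:
    """Executes the choreography at all locations at once, for illustration only."""
    def at(self, role, thunk):
        value = thunk()
        print(f"{role} computes {value!r}")
        return value

    def send(self, src, dst, value):
        print(f"{src} -> {dst}: {value!r}")
        return value

print("sale completed:", choreography(GlobalRun()))
```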
[14] Collaborative Research: CIRC: ENS: Enabling Detailed, Open-Source Accelerator Modeling
Prof Matt Sinclair and colleagues
Abstract: The rapid end of Moore’s Law and Dennard scaling has driven computing systems — from smartphones to supercomputers — to embrace heterogeneous architectures for continued efficiency gains. As specialized, compute-intensive workloads such as machine learning become increasingly prominent, there is a critical need for accurate, open-source simulation tools to model and evaluate the next generation of hardware accelerators. However, the pace of innovation in accelerator architectures, particularly graphics processing units (GPUs), has outstripped the capabilities of existing public simulation frameworks, limiting the research community’s ability to explore new ideas and validate results. This project addresses these challenges by enhancing the widely used Accel-Sim simulation infrastructure, enabling detailed, validated modeling of modern and future accelerators. The proposed enhancements will empower a broad community of researchers — including those from underrepresented groups — to advance innovations in computer architecture, improve system efficiency, and support the development of emerging applications that rely on high-performance accelerators.
This award will significantly extend Accel-Sim’s capabilities through three major technical thrusts. First, the project will modernize and expand Accel-Sim’s performance and energy models to support the latest GPU architectures (including NVIDIA’s Ampere, Hopper, and Blackwell), incorporating features such as transformer engines, sparse tensor cores, and support for asynchronous execution. Second, the project will broaden the diversity of accelerators and workloads modeled by Accel-Sim, adding support for GPUs from additional vendors (such as AMD and Intel) and integrating with broader system simulation frameworks like gem5. Third, the project will develop advanced workload sampling and telescopic level-of-detail modeling to enable scalable, accurate simulation of long-running, compute-heavy workloads. These enhancements will be delivered as robust, open-source tools, accompanied by extensive documentation, community outreach, and training resources — including tutorials and summer boot camps — to ensure broad accessibility and long-term sustainability. Collectively, these efforts will provide the research community with essential infrastructure to drive the next decade of accelerator innovation and foster a more inclusive and collaborative ecosystem for computer systems research.
[15] In July, a team of students and Prof Josiah Hanna participated in the international RoboCup competition and placed 3rd among the 12 teams invited to participate in the standard platform soccer competition in Salvador, Brazil. Two Wisconsin graduate students, Adam Labiosa and Abhinav Harish, led the effort at the competition, supported by many contributions from several other undergraduate and graduate students over the preceding months.
[16] Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
Alex Clinton, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy
ICML 2025
https://arxiv.org/abs/2407.15881
Abstract: We study a collaborative learning problem where m agents aim to estimate a vector µ = (µ₁, …, µ_d) ∈ ℝ^d by sampling from associated univariate normal distributions {N(µ_k, σ²)}_{k∈[d]}. Agent i incurs a cost c_{i,k} to sample from N(µ_k, σ²). Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error. We design a mechanism to facilitate such collaboration, while addressing two key challenges: ensuring individually rational (IR) and fair outcomes so all agents benefit, and preventing strategic behavior (e.g., non-collection, data fabrication) to avoid socially undesirable outcomes. We design a mechanism and an associated Nash equilibrium (NE) which minimizes the social penalty (the sum of agents’ estimation errors and collection costs) while being IR for all agents. We achieve an O(√m)-approximation to the minimum social penalty in the worst case and an O(1)-approximation under favorable conditions. Additionally, we establish three hardness results: no nontrivial mechanism (i) guarantees a dominant strategy equilibrium where agents report truthfully, (ii) is IR for every strategy profile of other agents, or (iii) avoids a worst-case Ω(√m) price of stability in any NE. Finally, by integrating concepts from axiomatic bargaining, we demonstrate that our mechanism supports fairer outcomes than one which minimizes social penalty.
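A hedged rendering of the objective described in the abstract, with notation assumed for illustration: agent i draws n_{i,k} samples of N(µ_k, σ²) at per-sample cost c_{i,k} and forms an estimate of µ, and the mechanism seeks to minimize the social penalty, the sum of estimation errors and collection costs:

```latex
\[
  \text{SocialPenalty}\bigl(\{n_{i,k}\}\bigr)
  \;=\; \sum_{i=1}^{m} \mathbb{E}\!\left[\bigl\lVert \hat{\mu}^{(i)} - \mu \bigr\rVert_2^2\right]
  \;+\; \sum_{i=1}^{m} \sum_{k=1}^{d} c_{i,k}\, n_{i,k},
\]
where $\hat{\mu}^{(i)}$ is agent $i$'s estimate after the data exchange and $n_{i,k}$ is the number of samples agent $i$ collects from $N(\mu_k, \sigma^2)$.
```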
[17] “Pretrained Hybrids with MAD Skills”
Nicholas Roberts, Samuel Guo, Zhiqi Gao, Srinath Namburi, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala
Conference on Language Modeling (COLM), 2025
Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, a framework that addresses these challenges. Manticore automates the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then finetune hybrids that combine pretrained models from different architecture families—such as the GPT series and Mamba—end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to program pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.
https://arxiv.org/abs/2406.00894
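To make the projector idea concrete, here is a hedged PyTorch-style sketch (illustrative, not the Manticore code): small linear layers translate features between blocks drawn from different architectures, and learnable mixture weights, in the spirit of differentiable NAS, decide how much each block contributes. The dimensions and stand-in blocks are assumptions.

```python
import torch
import torch.nn as nn

class HybridLayer(nn.Module):
    def __init__(self, block_a: nn.Module, block_b: nn.Module, dim_a: int, dim_b: int):
        super().__init__()
        self.block_a, self.block_b = block_a, block_b  # e.g. a Transformer block and an SSM block
        self.proj_in = nn.Linear(dim_a, dim_b)         # translate features A -> B
        self.proj_out = nn.Linear(dim_b, dim_a)        # and back B -> A
        self.alpha = nn.Parameter(torch.zeros(2))      # differentiable mixture weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        out_a = self.block_a(x)
        out_b = self.proj_out(self.block_b(self.proj_in(x)))
        return w[0] * out_a + w[1] * out_b

# Toy usage with stand-in blocks (real pretrained blocks would be plugged in instead).
layer = HybridLayer(nn.Linear(64, 64), nn.Linear(32, 32), dim_a=64, dim_b=32)
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```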
“Theoretical Physics Benchmark (TPBench) – a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics”
Daniel J.H. Chung, Zhiqi Gao, Yurii Kvasiuk, Tianyi Li, Moritz Münchmeyer, Maja Rudolph, Frederic Sala, Sai Chaitanya Tadepalli
Machine Learning: Science and Technology 2025
Abstract: We introduce a benchmark to evaluate the capability of AI to solve problems in theoretical physics, focusing on high-energy theory and cosmology. The first iteration of our benchmark consists of 57 problems of varying difficulty, from undergraduate to research level. These problems are novel in the sense that they do not come from public problem collections. We evaluate our data set on various open and closed language models, including o3-mini, o1, DeepSeek-R1, GPT-4o, and versions of Llama and Qwen. While we find impressive progress in model performance with the most recent models, our research-level difficulty problems are mostly unsolved. We address challenges of auto-verifiability and grading, and discuss common failure modes. While current state-of-the-art models are still of limited use for researchers, our results show that AI-assisted theoretical physics research may become possible in the near future. We discuss the main obstacles towards this goal and possible strategies to overcome them. The public problems and solutions, results for various models, and updates to the data set and score distribution, are available on the website of the dataset.
https://arxiv.org/abs/2502.15815
[18] “Deploying RL with Confidence via Active and Offline Policy Evaluation”
Josiah Hanna
IJCAI Early Career Spotlight Invited Talk
Abstract: Recent years have seen a surge of interest in reinforcement learning (RL) as a powerful method for enabling AI agents to learn how to act so as to achieve the goals set by their designers. In practice, a crucial question in RL applications is how to decide when a learned policy is performant enough for deployment and, just as importantly, when a learned policy should not be deployed. In this talk, I will describe recent work from my group on methods that aim to enable RL practitioners to answer this question and thus to enable the use of RL in domains where extensive testing of learned policies is difficult or impossible. I will first talk about a line of work in my group on offline policy evaluation (OPE), or predicting the performance of an untested policy using data from previously used policies. The key novelty in these works is to leverage state abstraction and representation learning to scale OPE methods to more complex domains such as robot control. Then I will discuss a line of work on active data collection for data-efficient evaluation of RL policies. In this line of work, we have shown how to adaptively collect data in order to effectively evaluate an RL policy with as few real-world interactions as possible. Taken together, these lines of work are an important step toward instilling confidence in decision-making trained with RL.
“Thinking is Another Form of Control”
Josiah Hanna
“Most Thought-Provoking” paper award at the Finding the Frame Workshop at the Reinforcement Learning Conference (RLC)
Abstract: Dual process theory divides cognitive processing into a fast, intuitive System 1 and a slow, deliberative System 2. In reinforcement learning (RL), model-free learning, in which the agent takes actions with a reactive policy, is reminiscent of System 1, whereas model-based decision-time planning is reminiscent of System 2. This paper presents the view that deliberative, System 2 behaviors (“thinking”) can be considered a form of mental action that an agent performs before taking an action that influences its external environment. Under this view, we hypothesize that model-free RL alone would be sufficient to produce deliberation if these mental actions ultimately led to higher value actions being selected. We formalize the notion of a controllable “thought” state, then prove conditions under which “thinking” emerges as a strategy for reward maximization, and discuss how large language models serve as a proof-of-concept for thinking as mental action. Finally, we conclude by discussing new opportunities for research on model-free RL agents that learn both to think and act.
[19] “The Internet Visualization Exhibition”
Paul Barford and Loqman Salamatian
To appear at ACM SIGCOMM ’25.
Overview: As the Internet grows more complex, opaque, and deeply embedded in society, visualization plays an increasingly vital role in understanding its structure, behavior, and evolution. From classic time series to immersive interactive maps, visual tools help expose anomalies, reveal hidden geometries, and make abstract patterns tangible. The Internet Visualization Exhibition (IVE) celebrates this creative dimension of Internet measurement and systems research. It invites researchers, practitioners, and artists to share novel visualizations, images, animations, and interactive artifacts that distill insight from data and make Internet infrastructure more legible. IVE aims to foster dialogue between disciplines, promote visual thinking as a research skill, and inspire new ways of representing complexity at Internet scale. For more information, or to submit visuals to be featured in the session, please visit: https://burdantes.github.io/ive
[20] An undergraduate student of Prof Kirthi Kandasamy, Shangyuan (Peter) Yang, received an honorable mention for a CRA Outstanding Undergraduate Researcher Award.
[21] Morgridge Hall opening:
Milwaukee Journal Sentinel: https://www.jsonline.com/story/news/education/2025/09/05/university-of-wisconsin-madison-debuts-new-computer-sciences-building/83793528007/
Milwaukee Journal Sentinel (photo essay): https://www.jsonline.com/picture-gallery/news/2025/09/04/uw-madison-unveils-new-morgridge-hall-on-the-first-day-of-classes/85974982007/
Wisconsin State Journal (PDF attached if you run into the paywall): https://madison.com/news/local/education/university/article_d1b94aab-b39f-433b-86e8-b36303acf0a7.html
(also appeared as the front page story in print)
And on TV! WKOW (ABC affiliate): https://www.wkow.com/news/education/doors-open-for-uw-madisons-new-school-of-computer-data-information-sciences/article_e7ab8248-491d-4f94-bbdb-85d3828a081d.html
[22] Welcome to new professors!
Gavin Brown (dbrown5@wisc.edu) – PhD at Boston U – “I work on machine learning and data privacy. I am interested in understanding when and why machine learning models memorize large amounts of training examples. I also study this topic through the lens of differential privacy.”
Mike Hagenow (mhagenow@wisc.edu) – PhD at UW-Madison – “I work on developing methods for effective human-robot teaming with a focus on contact-rich tasks that are physically demanding on people. My main research interests lie in shared autonomy and robot learning.”
Junjie Hu (jhu@cs.wisc.edu) – PhD at CMU – “I have a broad interest in natural language processing and machine learning. In particular, my research focuses on algorithmic design and fundamental understanding of machine learning models in NLP that enable safe deployment in the wild.”
Misha Khodak (khodak@wisc.edu) – PhD at CMU – “My body of work includes fundamental theory for modern meta-learning (scalable methods that “learn-to-learn” using multiple learning tasks as data) and end-to-end guarantees for learning-augmented algorithms (algorithms that incorporate learned predictions about their instances to improve performance).”
Charles Yuan (charlesyuan@cs.wisc.edu) – PhD at MIT – “I study how to program a quantum computer to practically realize quantum algorithms. My research thus builds a new software stack of languages, libraries, and compilers to manipulate and reason about quantum information.”
Jiawei Zhang (jzhang2924@wisc.edu) – PhD at the Chinese University of Hong Kong – “Research interests include: nonlinear and convex optimization (theory and algorithms); optimization, generalization, and robustness of machine learning, reinforcement learning, and generative models (including diffusion models, large models, foundation models); and data-driven decision-making under uncertainty.”
June/July 2025
[1] https://blog.google/products/google-cloud/ml-systems-junior-faculty-awards/
From the blog post: “As part of our ongoing commitment to the academic research community, we’re announcing a new program, the Google ML and Systems Junior Faculty Awards, which we are presenting today to more than 50 assistant professors in 27 U.S. universities. These professors are leading the analysis, design and implementation of efficient, scalable, secure and trustworthy computing systems. Their work crosses the technology stack, from algorithms to software and hardware, enabling machine learning and cloud computing at increasingly massive scale. The recipients, who were selected by a distinguished group of Google engineers and researchers, will receive grants of $100,000 in unrestricted funding.”
[2] https://www.cs.wisc.edu/2025/07/23/jin-yi-cai-awarded-warf-named-professorship
“Computer Sciences Professor Jin-Yi Cai has been awarded a WARF Named Professorship. The professorship honors “faculty who have made contributions to the advancement of knowledge” through their research, teaching, and service activities. Nine other UW-Madison faculty members received the award, a distinguishing feature of which is that recipients choose the name for their professorship.”
[3] https://morgridge.org/story/htcondor-celebrates-40-years-powered-by-community/
“The high-throughput computing community — including scientists, researchers, developers, administrators and facilitators — gathered at UW–Madison on June 3 to celebrate 40 years of ‘The Condor Project,’ now known as HTCondor.
The brainchild of Miron Livny, lead investigator of research computing at the Morgridge Institute and director of the Center for High-Throughput Computing, HTCondor uses a vast network of computing resources to power a wide range of research projects across the globe.”
Read more at the link above.
[4] ProD SAFe: Programmatic Distillation for Safe and Assured Foundation Models & Robots
Foundation models (FMs) offer great promise for use in robotics in numerous scenarios. However, verification and safety for foundation and large language models is an embryonic field with limited results. We propose performing what we term programmatic distillation for foundation model-powered robots. The idea is to replace distilling into a smaller model with distilling into code—thus enabling safety and assurance guarantees that are otherwise unavailable, all while maintaining the advantages of foundation models. The objective of this proposal is to develop safe and assured robots via programmatic distillation from foundation models.
[5] The paper that appeared in the IFIP Traffic Measurement and Analysis Conference (see below) received the Community Contribution Award (https://tma.ifip.org/2025/awards/).
Calvin Kranig, Eric Pauley, Wei-Shiang Wung, Paul Barford, Mark Crovella, Joel Sommers. “Toward a Representative DNS Data Corpus: A Longitudinal Comparison of Collection Methods”
Abstract: Domain Name System (DNS) records are frequently used to investigate a wide variety of Internet phenomena including topology, malicious activity, resource allocations, and user behavior. The scope and utility of the results of these studies depend intrinsically on the representativeness of the DNS data. In this paper, we compare and contrast five different DNS datasets from four different providers collected over a period of 3 months that in total comprise over 10.1B Fully Qualified Domain Names (FQDNs), of which 3.7B are unique. We process and organize that data into a consistent format that enables it to be efficiently analyzed in Google BigQuery. We begin by reporting the details of the measurement methods and the datasets used in our analysis. We then analyze the relative coverage of each dataset by structural, administrative, and client-behavioral features. We find that while there are significant overlaps in the records provided by each dataset, each also provides unique records not found in the other datasets that are important in different use cases. Our results highlight the opportunities in using these datasets in research and operations, and how combinations of datasets can provide broader and more diverse perspectives.
More conference details: https://tma.ifip.org/2025/accepted-papers/
[6] “Defying Moore: Envisioning the Economics of a Semiconductor Revolution through 12nm Specialization” is in the July print (and now online) issue of Communications of the ACM.
From the author: “It proposes a revolutionary shift in semiconductor design. The research demonstrates that specialized 12nm chips, through architectural innovation like the tiled decoupled control and compute (TDCC) architecture, can achieve performance competitive with, or even surpass, state-of-the-art 7nm and 5nm technology for deep learning. This approach offers significant gains in cost and sustainability, challenging the conventional reliance on advanced technology nodes and opening new pathways for economic productivity in the semiconductor industry.”
Paper: https://dl.acm.org/doi/10.1145/3711920
Abstract: “The semiconductor industry is experiencing a significant transformation, raising questions about the advantages traditionally associated with Moore’s Law and Dennard scaling. This shift highlights four key trends that intersect with technology, economics, and society. The first trend is the perceived end of Moore’s Law. Analysis indicates that the benefits of advances in semiconductor technology—specifically in terms of cost, energy efficiency, and density—are diminishing (see Table 1). This suggests a departure from the era where technological advances consistently delivered substantial economic and performance improvements. The second trend is the reevaluation of government subsidies. In response to these technological challenges, governments worldwide have allocated significant subsidies to support the development of advanced chip-fabrication facilities. The economic justification for these investments, however, is being increasingly scrutinized, especially when considering the complex dynamics of workforce and ecosystem support that can affect the operational success of these facilities, as well as their time to becoming operational.2,36 The third trend is access to semiconductor technology. Recent policy measures have aimed to regulate access to certain semiconductor technologies based on geopolitical considerations.10,23,25 In short, China is being prevented from accessing 7nm and lower technology.a These measures are designed to address concerns over AI technological supremacy, but also must be understood within the context of the broader industry trend toward diminishing returns from further miniaturization. The fourth and final trend is the role of architectural efficiency. Economic and engineering considerations have prompted a renewed focus on architectural innovation. This trend illustrates how, despite the high costs of the latest technology nodes, thoughtful design at more accessible nodes, such as 12nm, can achieve competitive or even superior performance compared with designs at more advanced nodes, such as 5nm.1,33 This approach is also potentially more sustainable, offering reduced carbon footprints without compromising performance.8 Together, these trends underscore a complex landscape where the relationship between technological scaling, economic viability, and environmental sustainability is being redefined, necessitating a nuanced understanding of the semiconductor industry’s future direction.”
[7] The paper “Faster Algorithms for Agnostically Learning Disjunctions and their Implications”
https://arxiv.org/abs/2504.15244
has been selected for the Mark Fulk Award for best student paper at COLT 2025.
From the author: “The paper gives the first improvement, in terms of sample and computational complexity, in (exactly) 20 years for the problem of “agnostically learning disjunctions”. This means that we are trying to learn an OR function of Boolean variables, but the labeled data that we observe are arbitrary—not necessarily consistent with any such function. The goal is to match the accuracy of the “best-fit” function in the class.”
[8] “Probabilistic Point Clouds from Single-Photon LiDARs for Robust 3D Inference”
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin, Andreas Velten, Yin Li, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/1077
Abstract: LiDAR-based 3D sensors provide point clouds, a canonical 3D representation used in various 3D scene understanding tasks. Modern LiDARs face key challenges in various real-world scenarios such as long-distance or low-albedo objects, producing sparse or erroneous point clouds. These errors, which are rooted in the noisy raw LiDAR measurements, get propagated to downstream perception models, resulting in potentially severe loss of accuracy. This is because conventional 3D processing pipelines used to construct point clouds from raw LiDAR measurements do not retain the uncertainty information available in the raw sensor data. We propose a novel 3D scene representation called Probabilistic Point Clouds (PPC) where each point is augmented with a probability attribute that encapsulates the measurement uncertainty (confidence) in raw data. We further introduce inference approaches that leverage PPC for robust 3D object detection; these methods are versatile and can be used as computationally lightweight drop-in modules in 3D inference pipelines. We demonstrate, via both simulations and real captures, that PPC-based 3D inference methods outperform several baselines with LiDAR as well as camera-LiDAR fusion models, across challenging indoor and outdoor scenarios involving small, distant, and low-albedo objects, as well as strong ambient light.
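A minimal, hedged illustration of the representation (the array layout and threshold are assumptions, not the paper's format): each 3D point simply carries an extra confidence attribute derived from the raw sensor data, which downstream models can weight or gate on.

```python
import numpy as np

# N x 4 array: x, y, z in meters, plus a per-point detection confidence in [0, 1].
ppc = np.array([
    [1.20,  0.05, 3.4, 0.97],
    [1.22,  0.04, 3.5, 0.12],   # low-confidence return (e.g., a distant or low-albedo surface)
    [0.80, -0.30, 2.1, 0.88],
])

# Instead of treating every return as equally reliable, a detector can
# down-weight or drop uncertain points.
confidence = ppc[:, 3]
reliable = ppc[confidence > 0.5]
print(reliable.shape)  # (2, 4)
```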
“Quanta Vision: From Photons to Perception”
Varun Sundar, Tianyi Zhang, Sacha Jungerman, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/2542
Abstract: Quanta image sensors record individual photons, enabling capabilities like imaging in near-complete darkness and ultra-high-speed videography. Yet, most research on quanta sensors is limited to recovering image intensities. Can we go beyond just imaging, and develop algorithms that can extract high-level scene information from quanta sensors? This could unlock new possibilities in vision systems, offering reliable operation in extreme conditions. The challenge: raw photon streams captured by quanta sensors have fundamentally different characteristics than conventional images, making them incompatible with vision models. One approach is to first transform raw photon streams to conventional-like images, but this is prohibitively expensive in terms of compute, memory, and latency, making it impractical for most vision and robotics systems. We propose quanta neural networks (QNNs) that directly produce downstream task objectives from raw photon streams. Our core proposal is a trainable QNN layer that can seamlessly integrate with existing image- and video-based neural networks, producing quanta counterparts. By avoiding image reconstruction and allocating computational resources on a scene-adaptive basis, QNNs achieve orders-of-magnitude improvements across all efficiency metrics (compute, latency, readout bandwidth) as compared to reconstruction-based quanta vision, while maintaining high task accuracy across a wide gamut of challenging scenarios including low light and rapid motion.
“Predicting Important Photons for Energy-Efficient Single-Photon Videography”
Shantanu Gupta, Varun Sundar, Lucas J. Koerner, Claudio Bruschini, Edoardo Charbon, Mohit Gupta
To Appear in Proc. IEEE International Conference on Computational Photography (ICCP)
(A select few ICCP papers each year are recommended to be published in a special issue of the IEEE PAMI journal; this paper was selected to be a part of the special issue.)
https://wisionlab.com/project/predicting-important-photons/
Abstract: Single-photon avalanche diodes (SPADs) detect individual photons with fine temporal resolution, enabling capabilities like imaging in near-total darkness, extreme dynamic range, and rapid motion. Due to these capabilities, coupled with the recent emergence of high-resolution (megapixel) arrays, SPADs have the potential to become workhorses for computer vision systems of the future that need to operate in a wide range of challenging conditions. However, SPADs’ sensitivity comes at a high energy cost due to the underlying avalanche process, which consumes substantial energy per detected photon, limiting the scalability and practicality of high-resolution SPAD arrays. To address this, we propose approaches to predict and sample only the most salient photons for a given vision task. To this end, we design computationally lightweight photon-sampling strategies that allocate energy resources for detecting photons only in areas with significant motion and spatial variation, while continually adapting to changing signals. We demonstrate the effectiveness of the proposed methods in recovering video comparable to a fully-sampled SPAD capture using only a small fraction of the photons, across diverse real-world scenes with motion, high dynamic range, and varying light conditions.
[9] “Recovering Parametric Scenes from Very Few Time-of-Flight Pixels”
Carter Sifferman, Yiquan Li, Yiming Li, Fangzhou Mu, Michael Gleicher, Mohit Gupta, Yin Li
To Appear in Proc. IEEE International Conference on Computer Vision (ICCV)
https://iccv.thecvf.com/virtual/2025/poster/410
Abstract: We aim to recover the geometry of 3D parametric scenes using very few depth measurements from low-cost, commercially available time-of-flight sensors. These sensors offer very low spatial resolution (i.e., a single pixel), but image a wide field-of-view per pixel and capture detailed time-of-flight data in the form of time-resolved photon counts. This time-of-flight data encodes rich scene information and thus enables recovery of simple scenes from sparse measurements. We investigate the feasibility of using a distributed set of few measurements (e.g., as few as 15 pixels) to recover the geometry of simple parametric scenes with a strong prior, such as estimating the 6D pose of a known object. To achieve this, we design a method that utilizes both feed-forward prediction to infer scene parameters, and differentiable rendering within an analysis-by-synthesis framework to refine the scene parameter estimate. We develop hardware prototypes and demonstrate that our method effectively recovers object pose given an untextured 3D model in both simulations and controlled real-world captures, and show promising initial results for other parametric scenes. We additionally conduct experiments to explore the limits and capabilities of our imaging solution.
[10] “Post-Quantum Homomorphic Commitments, Zero-Knowledge Proofs and Applications”
Stellar Award, 2025
https://research.stellar.org/research-grants/grantees
Abstract. Quantum-powered adversaries are a significant and rapidly emerging threat to the security of digital systems and authentication platforms. Despite significant advances in post-quantum cryptography over the last decade, we have very limited designs for quantum-safe authentication platforms whose performance is comparable to their classical (quantum-insecure) counterparts. One very popular authentication model is the blind signature model (where a receiver can obtain an authentication token in a private manner). In this project, we will design and implement practical blind digital signatures that provide post-quantum security and provide better performance than available state-of-the-art quantum-safe solutions. At a high level, our core technical approach will be to carefully compose post-quantum homomorphic commitments and zero-knowledge proofs in a white-box manner to design more efficient blind signatures.
[11] “Towards Foundational Models for mmWave Radar”
Tianshu Huang, Akarsh Prabhakara, Chuhan Chen, Jay Karhade, Deva Ramanan, Matthew O’Toole, Anthony Rowe
ICCV 2025
https://iccv.thecvf.com/virtual/2025/poster/1270
Abstract: mmWave radars are compact, inexpensive, and durable sensors that are robust to occlusions and work regardless of environmental conditions, such as weather and darkness. However, this comes at the cost of poor angular resolution, especially for inexpensive single-chip radars, which are typically used in automotive and indoor sensing applications. Although many have proposed learning-based methods to mitigate this weakness, no standardized foundational models or large datasets for the mmWave radar have emerged, and practitioners have largely trained task-specific models from scratch using relatively small datasets.
In this paper, we collect (to our knowledge) the largest available raw radar dataset with 1M samples (29 hours) and train a foundational model for 4D single-chip radar, which can predict 3D occupancy and semantic segmentation with quality that is typically only possible with much higher resolution sensors. We demonstrate that our Generalizable Radar Transformer (GRT) generalizes across diverse settings, can be fine-tuned for different tasks, and shows logarithmic data scaling of 20% per 10x data. We also run extensive ablations on common design decisions, and find that using raw radar data significantly outperforms widely-used lossy representations, equivalent to a 10x increase in training data. Finally, we roughly estimate that ~100M samples (3000 hours) of data are required to fully exploit the potential of GRT.
[12] Rethinking Confidence Scores and Thresholds in Pseudolabeling-based SSL
Harit Vishwakarma, Yi Chen, Srinath Namburi, Sui Jiet Tay, Ramya Vinayak, Frederic Sala
https://pages.cs.wisc.edu/~fredsala/ssl_icml25.pdf
Modern semi-supervised learning (SSL) methods rely on pseudolabeling and consistency regularization. Pseudolabeling is typically performed by comparing the model’s confidence scores and a predefined threshold. While several heuristics have been proposed to improve threshold selection, the underlying issues of overconfidence and miscalibration in confidence scores remain largely unaddressed, leading to inaccurate pseudolabels, degraded test accuracy, and prolonged training. We take a first-principles approach to learn confidence scores and thresholds with an explicit knob for error. This flexible framework addresses the fundamental question of optimal scores and threshold selection in pseudolabeling. Moreover, it gives practitioners a principled way to control the quality and quantity of pseudolabels. Such control is vital in SSL, where balancing pseudolabel quality and quantity directly affects model performance and training efficiency. Our experiments show that, by integrating this framework with modern SSL methods, we achieve significant improvements in accuracy and training efficiency. In addition, we provide novel insights on the tradeoffs between the choices of the error parameter and the end model’s performance.
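For context, the sketch below shows the fixed-threshold pseudolabeling baseline that the paper improves on (a hedged illustration; the stand-in model, data, and the 0.95 threshold are assumptions):

```python
import torch
import torch.nn.functional as F

def pseudolabel(model, unlabeled_batch, threshold=0.95):
    """Return (inputs, pseudo-labels) for samples whose confidence clears the threshold."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=-1)
    confidence, labels = probs.max(dim=-1)
    # Overconfident or miscalibrated scores make this fixed cutoff unreliable,
    # which is exactly the issue the paper's learned scores/thresholds address.
    keep = confidence >= threshold
    return unlabeled_batch[keep], labels[keep]

model = torch.nn.Linear(10, 3)          # stand-in classifier
x_unlabeled = torch.randn(128, 10)
x_kept, y_pseudo = pseudolabel(model, x_unlabeled)
print(x_kept.shape, y_pseudo.shape)
```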
Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights
Tzu-Heng Huang, Manjot Bilkhu, Frederic Sala
https://arxiv.org/abs/2501.06708
Multimodal models are trained on large-scale web-crawled datasets, which often contain noise, bias, and irrelevant information. This motivates the use of data selection techniques, which can be divided into model-free variants, relying on heuristic rules and downstream datasets, and model-based approaches, such as those using influence functions. The former can be expensive to design and risks introducing unwanted dataset dependencies, while the latter are often computationally prohibitive. In this work, we propose an efficient, model-based approach using the Mimic Score, a new data-quality metric that leverages the weights of a reference model to assess the usefulness of individual samples for training a new model. Our method relies on measuring alignments between training gradients and a target direction induced by this reference model. Building on the derived mimic scores, we develop Grad-Mimic: a framework that prioritizes samples to learn, estimates overall sample utility, and creates effective filters. Empirically, using mimic scores to guide training improves data efficiency, accelerates convergence, yields consistent performance gains across six image datasets, and enhances CLIP models with 20.7% fewer training steps. Moreover, mimic score-based filters complement existing filtering methods, e.g., training improved CLIP models with 4.7 million fewer samples while offering accurate estimation of dataset quality.
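One way to picture the mimic score, as a hedged sketch rather than the paper's exact definition: score each sample by how well its training gradient aligns with the direction from the current weights toward the reference model's weights. All names and the cosine-alignment choice below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mimic_scores(model, ref_model, xs, ys):
    """Higher score = taking a step on this sample moves the model toward the reference."""
    target = torch.cat([(p_ref - p).flatten()
                        for p, p_ref in zip(model.parameters(), ref_model.parameters())])
    scores = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grad = torch.cat([p.grad.flatten() for p in model.parameters()])
        # Gradient descent moves along -grad, so compare -grad with the reference direction.
        scores.append(F.cosine_similarity(-grad, target, dim=0).item())
    return scores

model, ref = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
xs, ys = torch.randn(16, 8), torch.randint(0, 4, (16,))
print(sorted(mimic_scores(model, ref, xs, ys))[:3])  # lowest-utility samples, candidates to filter
```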
[13] Newly promoted teaching faculty:
Marc Renault, Full Teaching Professor
Andrew Kuemmel, Associate Teaching Professor
Tyler Caraza-Harter, Associate Teaching Professor
Hobbes Legault, Associate Teaching Professor
[14] Keynote at the National Science Data Fabric annual AHM at UCSD by Prof Miron Livny https://nationalsciencedatafabric.org/nsdf-ahm-2025-01
The PATh project offers a variety of distributed data handling services. Researchers who place their workloads at PATh Access Points leverage these services to bring input data to their applications and to ship results back to archival sites. They also employ them to copy the applications, in most cases in the form of container images, to execution sites provided by the Open Science Pool (OSPool). Owners of Object Stores who federate datasets they host via the Open Science Data Federation (OSDF) use these services to make locally stored objects available to remote consumers for frequent reuse.
In the past 12 months PATh services have performed more than 3B Object transfers across more than 100 institutions. The distributed nature of the infrastructure requires objects to be copied across institution boundaries and to be buffered in memory or disk at multiple locations. These operations clearly introduce concerns about data integrity and access control. Do the researchers and Object providers who use these services believe that the PATh services are trustworthy?
The talk will review our ongoing effort to develop data handling services that we believe to be trustworthy. A key element of our current development work is transitioning our software tools and infrastructure into capability-based authorization. We hope that facilitating trust relationships through capabilities will help researchers and data providers to entrust their objects with the PATh services. Namely, to believe that the OSDF and the OSPool data handling services are trustworthy.
[15] The IO500 aims to rank high-performance distributed file systems. The results were released last week. We are ranked 26th in the 10-node category and 33rd overall worldwide.
https://io500.org/list/isc25/ten
https://io500.org/list/isc25/io500
From the author: We built GluonFS (our DFS) from scratch and have heavily optimized its I/O processing, which turns out to be quite challenging with 400 GbE networks and 8+ NVMe drives. We also compared it briefly with other big players. As we are resource-limited, we co-located the clients (which issue I/O requests) with our DFS data and metadata servers, unlike others. With just a few more servers, we could drive performance much higher.
[16] https://www.research.gov/grfp/
Prof Sharon Li’s student: Shawn Im
Prof Shivaram Venkataraman’s student: Seth Ockerman
https://ockermansethgvsu.github.io
[17] “An international team of astronomers has trained a neural network with millions of synthetic simulations and artificial intelligence (AI) to tease out new cosmic curiosities about black holes, revealing the one at the center of our Milky Way is spinning at nearly top speed.
These large ensembles of simulations were generated by throughput computing capabilities provided by the Center for High Throughput Computing (CHTC), a joint entity of the Morgridge Institute for Research and the University of Wisconsin-Madison. The astronomers published their results and methodology today in three papers in the journal Astronomy & Astrophysics.”
Read more at the link above.
[18] From the gem5 Users Workshop
https://www.gem5.org/events/isca-2025
Title: Implementing Support for Extensible Power Modeling in gem5
Authors: Alex Smith & Matthew D. Sinclair
Abstract: Power consumption has increasingly become a first-class design constraint to satisfy requirements for scientific workloads and other widely used workloads, such as machine learning. To meet performance and power requirements, system designers often use architectural simulators, such as gem5, to model component and system-level behavior. However, performance and power modeling tools are often isolated and difficult to integrate with one another for rapid performance and power system co-design. Although studies have previously explored power modeling with gem5 and validation on real hardware, there are several flaws with this approach. First, power models are sometimes not open source, making it difficult to apply them to different simulated systems. Second, the current interface for implementing power models in gem5 relies on hard-coded strings provided by the user to model dynamic and static power, which makes defining power models for components cumbersome and restrictive, as gem5’s MathExpr string formula parser supports only limited mathematical operations. Third, previous works only implement one form of power model for one component. This unnecessarily limits users from combining other power models, which may model certain system components with higher accuracy. Instead, we posit that decoupling how power models are integrated with simulators from the design of power models themselves will enable better power modeling in simulators. Accordingly, we extend our prior work on designing and implementing an extensible, generalizable power modeling interface by integrating support for McPAT into it and validating that it emits correct power values.
Title: Narrowing the GAP: Enhancing gem5’s GPU Memory Bandwidth Accuracy
Authors: Yu Xia, Vishnu Ramadas, Matthew Poremba, and Matthew D. Sinclair
Abstract: Computer systems research heavily relies on simulation tools like gem5 to effectively prototype and validate new ideas. However, publicly available simulators struggle to accurately model systems as architectures evolve rapidly. This is a major issue because incorrect simulator models may lead researchers to draw misleading or even incorrect conclusions about their research prototypes from these simulators. Although this challenge pertains to many open source simulators, we focus on the widely used, open source gem5 simulator. In GAP we showed that gem5’s GPGPU models have significant correlation issues versus real hardware. GAP also improved the fidelity of gem5’s AMDGPU model, particularly for cache access latencies and bandwidths. However, one critical issue remains: our microbenchmarks reveal 88% error in memory bandwidth between gem5’s current model and corresponding real AMD GPUs. To narrow this gap, we examined recent patents and gem5’s memory system bottlenecks, then made several improvements including: utilizing a redesigned HBM memory controller, enhancing TLB request coalescing, adding support for multiple page sizes, adding a page walk cache, and improving network bandwidth modeling. Collectively, these optimizations significantly improve gem5’s GPU memory bandwidth by 3.8x: from 153 GB/s to 583 GB/s. Moreover, our address translation enhancements can be ported to other ISAs where similar support is also needed, improving gem5’s MMU support.
Title: Toward Full-System Heterogeneous Simulation: Merging gem5-SALAM with Mainline gem5
Authors: Akanksha Chaudhari and Matthew D. Sinclair
Abstract: The slowdown of process technology-driven improvements has accelerated the shift toward heterogeneous computing systems, where conventional general-purpose cores are increasingly combined with GPUs and specialized accelerators to continue scaling performance and energy efficiency gains. However, as these systems grow more diverse, architectural design and system-level optimization become significantly complex. Fully leveraging the benefits of such architectures demands rigorous early-stage exploration using validated, cycle-level, full-system simulation frameworks that capture both component behavior and cross-layer interactions. The gem5 simulator has long provided this functionality for both CPUs and GPUs. However, accelerator modeling within gem5 remains fragmented. Frameworks such as gem5-SALAM provide high-fidelity accelerator simulation atop gem5 v21.1, but operate outside the mainline development and remain limited to modeling accelerators in isolation. Consequently, they cannot leverage improvements introduced in recent versions of gem5, nor can mainline users test and validate their changes do not break accelerator support. To bridge this gap, we integrated and merged SALAM’s accelerator modeling infrastructure into the latest gem5 mainline, enabling tightly coupled simulation of accelerators alongside gem5’s existing CPU and GPU components within a unified, full-system framework. This integration is critical for advancing early-stage architectural exploration, enabling researchers to systematically evaluate design trade-offs, study cross-layer interactions, and co-optimize compute, memory, and communication components for emerging heterogeneous computing platforms.
[19] Title: Simulating and Modeling Hardware for Machine Learning Workloads at Scale
Prof Matt Sinclair
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (e.g., Tensor Cores), remains an important research challenge. In particular, modeling energy in a way that is flexible and can be extended both to different types of programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
[20] The PhD List from last year:
– Saurabh Agarwal, advised by Shivaram Venkataraman and Dimitris Papailiopoulos, produced the dissertation entitled “Reducing synchronization overheads in Machine Learning systems”.
– Edward Barton, advised by Yu Hen Hu, produced the dissertation entitled “Practical Single Shot High Dynamic Range Imaging”.
– Bengisu Cagiltay, advised by Bilge Mutlu, produced the dissertation entitled “Designing Social Robots for Everyday Family Life”.
– Mu Cai, advised by Yong Jae Lee, produced the dissertation entitled “Toward Versatile and Efficient Multimodal Models”.
– Xufeng Cai, advised by Jelena Diakonikolas, produced the dissertation entitled “Towards Efficient Optimization for Decomposable Problems in Large-Scale Machine Learning”.
– Joseph Catudal, advised by Paul Barford, produced the dissertation entitled “Practical Environmental Sensing On Optical Communication Infrastructure”.
– Xuefeng Du, advised by Sharon Li, produced the dissertation entitled “Foundations of Unknown-aware Machine Learning”.
– Ying Fan, advised by Kangwook Lee, produced the dissertation entitled “Sequential Learning for Foundation Models: Reinforcement Learning for Diffusion Models and Looped Transformers”.
– Kevin Gaffney, advised by Jignesh Patel, produced the dissertation entitled “Bridging the Processor-Memory Performance Gap in Database Systems”.
– Bhavya Goyal, advised by Mohit Gupta, produced the dissertation entitled “Robust Scene Inference under Adverse Conditions”.
– Ashish Hooda, advised by Somesh Jha and Kassem Fawaz, produced the dissertation entitled “Security of Real World Machine Learning Systems”.
– Mazharul Islam, advised by Rahul Chatterjee, produced the dissertation entitled “Safeguarding online authentication systems from attacks”.
– Sacha Jungerman, advised by Mohit Gupta, produced the dissertation entitled “Spatial Computing from Photons”.
– Rishabh Khandelwal, advised by Kassem Fawaz, produced the dissertation entitled “Assessing and Enhancing Users’ Online Data Privacy”.
– Ziqian Lin, advised by Kangwook Lee, produced the dissertation entitled “Understanding In-Context Learning via Synthetic and Controllable Datasets”.
– Eric Lin, advised by Jelena Diakonikolas, produced the dissertation entitled “Faster Coordinate Algorithms for Structured Machine Learning Problems”.
– Jing Liu, advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, produced the dissertation entitled “Scale, Performance, and Fault-Tolerance in a Filesystem Semi-Microkernel”.
– Anna Meyer, advised by Aws Albarghouthi and Loris D’Antoni, produced the dissertation entitled “Robustness to Multiplicity in the Machine Learning Pipeline”.
– Elena Milkai, advised by Xiangyao Yu and Jignesh Patel, produced the dissertation entitled “Real-Time Analytics On Cloud Database Systems”.
– Nicollas Mocelin Sdroievski, advised by Dieter Van Melkebeek, produced the dissertation entitled “Derandomization vs. Lower Bounds for Arthur-Merlin Protocols”.
– Shyam Murthy, advised by Gurindar Sohi, produced the dissertation entitled “Optimizing Memory Hierarchies for Commercial Applications”.
– Eric Pauley, advised by Patrick McDaniel, produced the dissertation entitled “Methods and Applications for Cloud-based Internet Measurement”.
– Martin Prammer, advised by Jignesh Patel, produced the dissertation entitled “Towards Architecture-Aware Data Analytics”.
– Anthony Lawrence George Rebello, advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau, produced the dissertation entitled “Correctness and Performance with Declarative System Calls”.
– Zhenmei Shi, advised by Yingyu Liang, produced the dissertation entitled “Understanding Training and Adaptation in Feature Learning: From Two-layer Networks to Foundation Models”.
– Laura Stegner, advised by Bilge Mutlu, produced the dissertation entitled “Integrating Care Robots into Assisted Living Facilities”.
– Yuxin Sun, advised by Ilias Diakonikolas, produced the dissertation entitled “Algorithms and Information-Computation Trade-offs in High-Dimensional Statistical Estimation”.
– Harit Vishwakarma, advised by Frederic Sala and Ramya Vinayak, produced the dissertation entitled “Towards Reliable Machine Learning: From Data to Deployment”.
– Roger Waleffe, advised by Steve Wright, produced the dissertation entitled “Algorithms and Systems for Scalable Machine Learning over Graphs”.
– Qisi Wang, advised by Eftychios Sifakis, produced the dissertation entitled “Extending Projective Dynamics for Practical and Efficient Simulation”.
– Nathan White, advised by Bilge Mutlu, produced the dissertation entitled “Improving Human-Robot Collaboration through Abstraction and Scaffolding in Planning and Programming Interfaces”.
– Yunjia Zhang, advised by Jignesh Patel and Theodoros Rekatsinas, produced the dissertation entitled “Machine Learning for Data Analytics: Profiling, Querying, and Beyond”.
– Hangdong Zhao, advised by Paris Koutris, produced the dissertation entitled “Next-Generation Join Algorithms”.
May 2025
[1] https://awards.acm.org/hopper
About ACM Grace Murray Hopper Award
Awarded to the outstanding young computer professional of the year, selected on the basis of a single recent major technical or service contribution. This award is accompanied by a prize of $35,000. The candidate must have been 35 years of age or less at the time the qualifying contribution was made, though allowance may be made for interrupted or second careers. Financial support of the Grace Murray Hopper Award is provided by Microsoft.
Article: https://www.cs.wisc.edu/2025/05/21/ilias-diakonikolas-wins-2024-acm-grace-hopper-award/
[2] H.I. Romnes Fellowship
This program, funded by WARF in recognition of the leadership of the late WARF Trustee President H. I. Romnes, is designed to bridge the gap between the Research Committee’s initial research support for new faculty and the Mid-Career Award for Faculty Research. This award is intended to recognize and support faculty up to SIX years past their first promotion to a tenured position. This award cannot be awarded more than one time to any faculty member.
https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/
[3, 4] https://cdis.wisc.edu/cdis-endowed-chairs-professorships/
[5] https://www.cs.wisc.edu/2025/04/22/the-spirit-of-madhacks-podcast/
What is Madhacks? How did it start? What goes into creating a hackathon? Meet the organizers of MadHacks 2024 – Grace Steinmetz, Chris Gottwaldt, Kevin Wang, Kayley Seow, Nico Salm, and Michael Noguera with Computer Sciences professor Manolis Vlatakis – and hear about their process and the impact of MadHacks. “Participants come to MadHacks from a variety of backgrounds, bringing their own knowledge of problems affecting their communities,” say the creators. “We are enablers — we provide a positive environment and concentrate resources—but hackers themselves are the agents using what they know and what they learn to develop solutions to their own real-world problems.”
[6] “Solving Zero-Sum Convex Markov Games”
(with Foivos Kalogiannis, Ian Gemp, and Georgios Piliouras)
https://icml.cc/virtual/2025/poster/44636
We provide the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) using independent policy gradient methods. Convex Markov games, recently described by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with convex preferences over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity even under direct policy parametrization, the absence of Bellman equations, and the complexities of infinite horizons. Our results follow a two-step approach. First, leveraging the properties of hidden-convex–hidden-concave functions, we show that a carefully crafted regularization transforms the underlying min-max optimization problem into a nonconvex–proximal Polyak-Lojasiewicz (NC-pPL) structure. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
“A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games”
(with Francisca Vasconcelos, Panayotis Mertikopoulos, Georgios Piliouras, and Michael I. Jordan)
Recent advances in quantum information have revived interest in quantum zero-sum games, where computing Nash equilibria plays a central role. In 2008, Jain and Watrous introduced the first classical algorithm (MMWU), achieving an \mathcal{O}(d/\epsilon^2) convergence rate to \epsilon-Nash equilibria over the 4^d-dimensional spectraplex. In our work:
• We introduce a hierarchy of quantum optimization algorithms that generalize MMWU using an extra-gradient mechanism.
• Notably, our Optimistic Matrix Multiplicative Weights Update (OMMWU) achieves \mathcal{O}(d/\epsilon) iteration complexity—providing a quadratic speedup over previous methods.
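For context, a minimal statement of the matrix multiplicative weights update that MMWU and OMMWU build on, assuming its standard form over the spectraplex (a sketch; the precise optimistic update used in the paper may differ):

```latex
% Matrix Multiplicative Weights Update (MMWU) over the spectraplex
% \{\rho \succeq 0 : \operatorname{Tr}(\rho) = 1\}: after observing payoff
% (gradient) matrices G_1, \dots, G_t, the next iterate is the matrix
% analogue of the exponential-weights update,
\[
  \rho_{t+1} \;=\;
  \frac{\exp\!\left(\eta \sum_{s=1}^{t} G_s\right)}
       {\operatorname{Tr}\exp\!\left(\eta \sum_{s=1}^{t} G_s\right)} .
\]
% The optimistic (extra-gradient) variant additionally reuses the most
% recent payoff matrix as a prediction of the next one, e.g. replacing
% \sum_{s \le t} G_s with \sum_{s \le t} G_s + G_t; this is the mechanism
% behind the quadratic speedup described above.
```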
[7] Three papers accepted by Prof Barford and colleagues: Two in the IFIP Networking Conference [a, b] and one in the Network Traffic Measurement and Analysis Conference [c].
[a] Mia Weaver, Paul Barford, Fabian Bustamante, Esteban Carisimo, Weili Wu, Lynne Stokes. “Unsteady Underwater – On the Constancy of Submarine Path Properties”
Abstract—The world-wide submarine cable network (SCN) is one of the most costly, complex, and critical components in the Internet. In this paper, we describe a measurement study of packet dynamics on SCN paths. To conduct this study we built an active probe-based measurement system that harnesses looking glass nodes in strategic locations in service provider networks to gather data from 134 SCN paths. During the 18 month period of our study, we also tracked outage reports from online sources on a weekly basis. Our main finding is that key dynamic characteristics of the SCN including latency, loss, path stability, and outages vary widely. To provide perspective, we assess the operational constancy (i.e., remaining within bounds considered operationally equivalent) of SCN path properties and find a wide range of behavior. To further clarify these characteristics, we report the results of a cluster analysis and find that paths separate into distinct groups where behavior is either relatively stable or highly variable. Our findings have implications for operational practices, applications that utilize these paths, and for future measurement and monitoring of the SCN.
[b] Wei-Shiang Wung, Calvin Kranig, Eric Pauley, Paul Barford, Mark Crovella, Joel Sommers. “Squatspotting: Towards the Systematic Measurement of Typosquatting Techniques”
Abstract— Typosquatting—the practice of registering a domain name similar to another, usually well-known, domain name—is typically intended to drive traffic to a website for malicious or profit-driven purposes. In this paper we assess the current state of typosquatting, both broadly (across a wide variety of techniques) and deeply (using an extensive and novel dataset). Our breadth derives from the application of eight different candidate-generation techniques to a selection of the most popular domain names. Our depth derives from probing the resulting name set via a unique corpus comprising over 3.3B Domain Name System (DNS) records. We find that over 2.3M potential typosquatting names have been registered that resolve to an IP address. We then assess those names using a framework focused on identifying the intent of the domain from the perspectives of DNS and webpage clustering. Using the DNS information, HTTP responses, and Google SafeBrowsing, we classify the candidate typosquatting names as resolved to private IP, malicious, defensive, parked, legitimate, or unknown intents. Our findings provide the largest-scale and most-comprehensive perspective to date on typosquatting, exposing potential risks to users. Further, our methodology provides a blueprint for tracking and classifying typosquatting on an ongoing basis.
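To make the candidate-generation step concrete, here is a small illustrative sketch of a few common typo-generation techniques; the paper’s eight techniques and its 3.3B-record DNS corpus are not reproduced here, and the function below is hypothetical.

```python
def typo_candidates(domain):
    """Generate simple typosquatting candidates for a registered domain.

    Illustrates a few common candidate-generation techniques (character
    omission, adjacent-character swap, character repetition, alternate TLD);
    this is a sketch, not the paper's exact generation procedure.
    """
    name, _, tld = domain.rpartition(".")
    candidates = set()
    # 1. Character omission: examle.com
    for i in range(len(name)):
        candidates.add(name[:i] + name[i+1:] + "." + tld)
    # 2. Adjacent-character swap: examlpe.com
    for i in range(len(name) - 1):
        swapped = name[:i] + name[i+1] + name[i] + name[i+2:]
        candidates.add(swapped + "." + tld)
    # 3. Character repetition: exammple.com
    for i in range(len(name)):
        candidates.add(name[:i+1] + name[i] + name[i+1:] + "." + tld)
    # 4. Alternate TLDs: example.net, example.org
    for alt in ("net", "org", "co"):
        if alt != tld:
            candidates.add(name + "." + alt)
    candidates.discard(domain)
    return sorted(candidates)

print(typo_candidates("example.com")[:10])
```

In the study, candidates like these are then probed against the DNS corpus to see which have actually been registered and resolve to an IP address.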
[c] Calvin Kranig, Eric Pauley, Wei-Shiang Wung, Paul Barford, Mark Crovella, Joel Sommers. “Toward a Representative DNS Data Corpus: A Longitudinal Comparison of Collection Methods”
Abstract—Domain Name System (DNS) records are frequently used to investigate a wide variety of Internet phenomena including topology, malicious activity, resource allocations and user behavior. The scope and utility of the results of these studies depends intrinsically on the representativeness of the DNS data. In this paper, we compare and contrast five different DNS datasets from four different providers collected over a period of 3 months that in total comprise over 10.1B total Fully Qualified Domain Names (FQDNs) of which 3.7B are unique. We process and organize that data into a consistent format that enables it to be efficiently analyzed in Google BigQuery. We begin by reporting the details of the measurement methods and the datasets used in our analysis. We then analyze the relative coverage of each dataset by structural, administrative, and client-behavioral features. We find that while there are significant overlaps in the records provided by each dataset, each also provides unique records not found in the other datasets that are important in different use cases. Our results highlight the opportunities in using these datasets in research and operations, and how combinations of datasets can provide broader and more diverse perspectives.
[8] https://academicaffairs.rutgers.edu/2024-2025-faculty-year-end-excellence-award-recipients
The Board of Trustees Research Fellowship for Scholarly Excellence
The fellowship honors faculty members who have made truly outstanding contributions to research during their early years at Rutgers.
Prof Sudarsun Kannan’s web page: here.
[9] Prof Livny and Ramakrishnan’s BIRCH Algorithm paper discovered in the wild:
A seminar visit by a computational chemist, Ramon Miranda Quintana, from the University of Florida, took place last week here at UW-Madison in the Chemistry department. His group is using your BIRCH algorithm to cluster millions of drug compounds.
[10] The paper “Decoding Speculative Decoding” got the SAC Award for Generation at NAACL 2025.
Award details:
https://2025.naacl.org/blog/best-papers/
Paper available here:
https://arxiv.org/abs/2402.01528
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. When performing inference, speculative decoding uses a smaller draft model to generate speculative tokens and then uses the target LLM to verify those draft tokens. The speedup provided by speculative decoding heavily depends on the choice of the draft model. In this work, we perform a detailed study comprising over 350 experiments with LLaMA-65B and OPT-66B using speculative decoding and delineate the factors that affect the performance gain provided by speculative decoding. Our experiments indicate that the performance of speculative decoding depends heavily on the latency of the draft model, and the draft model’s capability in language modeling does not correlate strongly with its performance in speculative decoding. Based on these insights we explore a new design space for draft models and design hardware-efficient draft models for speculative decoding. Our newly designed draft model can provide 111% higher throughput than existing draft models and our approach generalizes further to all LLaMA models (1/2/3.1) and supervised fine-tuned models.
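For readers new to the technique, here is a minimal, greedy sketch of the draft-then-verify loop that speculative decoding performs; real systems verify with rejection sampling so that the target distribution is preserved, and the toy "models" below are placeholders, not from the paper.

```python
def speculative_decode(target_model, draft_model, prompt, k=4, max_new=32):
    """Minimal greedy sketch of speculative decoding (illustrative only).

    The cheap draft model proposes k tokens; the target model checks the
    proposed block position by position and keeps the longest prefix that
    matches its own greedy choices, then contributes one token of its own,
    so every round makes progress even if all drafts are rejected.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # Draft model proposes k tokens autoregressively (cheap calls).
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # Target model verifies the block greedily.
        accepted = []
        for tok in draft:
            if target_model(tokens + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # Always gain at least one token from the target model itself.
        accepted.append(target_model(tokens + accepted))
        tokens.extend(accepted)
    return tokens

# Toy "models": the target repeats a fixed string; the draft agrees 3 of 4 times.
TEXT = "abcabcabcabcabcabcabc"
target_model = lambda ctx: TEXT[len(ctx) % len(TEXT)]
draft_model = lambda ctx: TEXT[len(ctx) % len(TEXT)] if len(ctx) % 4 else "x"
print("".join(speculative_decode(target_model, draft_model, "", k=4, max_new=12)))
```

The speedup comes from the fact that one expensive target-model verification pass can accept several cheap draft tokens at once, which is why draft-model latency matters so much in the study above.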
[11] Two papers accepted to ICML 2025 by Prof Shivaram Venkataraman and colleagues.
“LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models” with student Tzu-Tao Chang.
https://arxiv.org/abs/2502.02406
Cross-attention is commonly adopted in multimodal large language models (MLLMs) for integrating visual information into the language backbone. However, in applications with large visual inputs, such as video understanding, processing a large number of visual tokens in cross-attention layers leads to high memory demands and often necessitates distributed computation across multiple GPUs. Existing distributed attention mechanisms face significant communication overheads, making cross-attention layers a critical bottleneck for efficient training and inference of MLLMs. To address this, we propose LV-XAttn, a distributed, exact cross-attention mechanism with minimal communication overhead. We observe that in applications involving large visual inputs, the size of the query block is typically much smaller than that of the key-value blocks. Thus, in LV-XAttn we keep the large key-value blocks locally on each GPU and exchange smaller query blocks across GPUs. We also introduce an efficient activation recomputation technique to support longer visual context. We theoretically analyze the communication benefits of LV-XAttn and show that it can achieve speedups for a wide range of models. Our evaluations with Llama 3-V, mPLUG-Owl3 and OpenFlamingo models find that LV-XAttn achieves up to 10.62× end-to-end speedup compared to existing approaches.
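Below is a single-process sketch of the core idea, assuming the standard numerically safe softmax merge: key-value blocks stay put (array shards standing in for GPUs), only the small query block "moves", and per-shard partial results are combined exactly. This is illustrative only, not the authors' implementation.

```python
import numpy as np

def partial_attention(q, k, v):
    """Attention statistics of query block q against one local KV shard."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    m = s.max(axis=-1, keepdims=True)          # row-wise max for stability
    p = np.exp(s - m)
    return m, p.sum(axis=-1, keepdims=True), p @ v

def merge(parts):
    """Combine per-shard partial results into the exact global attention output."""
    m = np.max([m_i for m_i, _, _ in parts], axis=0)
    l = sum(np.exp(m_i - m) * l_i for m_i, l_i, _ in parts)
    o = sum(np.exp(m_i - m) * o_i for m_i, _, o_i in parts)
    return o / l

# Toy setup: a small query block vs. a long visual KV sequence split across
# 4 "GPUs" (array shards). Each shard keeps its KV local; only q is exchanged.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))                    # small query block
k = rng.normal(size=(4096, 64))                 # large key block
v = rng.normal(size=(4096, 64))
parts = [partial_attention(q, ks, vs)
         for ks, vs in zip(np.split(k, 4), np.split(v, 4))]
out = merge(parts)

# Reference: standard (unsharded) attention gives the same answer.
s = q @ k.T / np.sqrt(64)
p = np.exp(s - s.max(-1, keepdims=True))
ref = (p / p.sum(-1, keepdims=True)) @ v
print(np.allclose(out, ref))                    # True
```

Because the query block is much smaller than the key-value blocks, exchanging queries instead of keys and values is what keeps the communication volume low.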
“Scaling Inference-Efficient Language Models” with students Song Bian and Minghao Yan. https://arxiv.org/abs/2501.18107
Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of accounting for inference costs. In this work, we first show that model architecture affects inference latency, where models of the same size can have up to 3.5x difference in latency. To tackle this challenge, we modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. Because models of similar training loss exhibit gaps in downstream evaluation, we also propose a novel method to train inference-efficient models based on the revised scaling laws. We perform extensive empirical studies to fit and evaluate our inference-aware scaling laws. We vary model parameters from 80M to 1B, training tokens from 1.6B to 30B, and model shapes, training a total of 63 models. Guided by our inference-efficient scaling law and model selection method, we release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models, pushing the Pareto frontier of accuracy-latency tradeoff.
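For reference, the unmodified Chinchilla parametric form that the abstract says is being extended can be fit as sketched below; the data, parameter values, and fitting choices are synthetic placeholders, not the paper's.

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, B, alpha, beta):
    """Standard Chinchilla parametric form L(N, D) = E + A/N^alpha + B/D^beta.

    The paper modifies this form to also account for model shape and
    inference latency; only the unmodified baseline is shown here.
    """
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Synthetic observations: (params N, tokens D) -> measured loss.
rng = np.random.default_rng(1)
N = rng.uniform(8e7, 1e9, size=60)       # 80M ... 1B parameters
D = rng.uniform(1.6e9, 3e10, size=60)    # 1.6B ... 30B tokens
true = (1.7, 400.0, 1100.0, 0.34, 0.28)
L = chinchilla_loss((N, D), *true) + rng.normal(0, 0.01, size=60)

popt, _ = curve_fit(chinchilla_loss, (N, D), L,
                    p0=(2.0, 100.0, 100.0, 0.3, 0.3), maxfev=20000)
print(dict(zip(["E", "A", "B", "alpha", "beta"], popt.round(3))))
```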
[12] Wenxuan Tan, advised by Prof Shivaram Venkataraman, received the Hilldale Fellowship and the Dewitt Scholarship.
[13] SIGCOMM ’25 paper acceptance:
Understanding and profiling CXL.mem using PathFinder
Xiao Li, Zerui Guo, Yuebin Bai, Mahesh Ketkar, Hugh Willkinson, and Ming Liu
CXL.mem and the resulting memory pool are promising and attracting great attention. Unlike local memory, CXL DIMMs stay at the I/O subsystem (i.e., FlexBus), whose inferior performance can easily impact the processor pipeline and memory subsystem, yielding performance interference, hardware contention, obscure behaviors, and underutilized communication and computing resources. However, our community lacks a tool to understand and profile the CXL.mem protocol execution end-to-end between CPU and remote DIMM.
This paper fills the gap by designing and implementing PathFinder, a systematic, informative, and lightweight CXL.mem profiler. PathFinder leverages the capabilities of existing hardware performance monitors (PMUs) and dissects the CXL.mem protocol at adequate granularities. Our key idea is to view the server processor and its chipset as a multi-stage Clos network, equip each architectural module with a PMU-based telemetry engine, track different CXL.mem paths, and apply conventional traffic analysis techniques. PathFinder performs snapshot-based path-driven profiling and introduces four techniques, i.e., path construction, stall cycle breakdown, interference analyzer, and cross-snapshot analysis. We build PathFinder atop Linux Perf and apply it to six case studies. PathFinder will be open-sourced.
[14] Invited talk at AMD Research & Advanced Development
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Matt Sinclair
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
[15] https://captimes.com/news/education/uw-madison-computer-science-prepares-to-relocate-meet-ai-moment/
UW-Madison computer science prepares to relocate, meet ‘AI moment’
[16] Prof Jin-Yi Cai (or his students, or both) had the following papers accepted:
A joint paper with Jin Soo Ihm
Holant* Dichotomy on Domain Size 3: A Geometric Perspective, to appear in ICALP 2025 (The 52nd EATCS International Colloquium on Automata, Languages, and Programming)
A singly authored paper by his student Benjamin Young
The Converse of the Real Orthogonal Holant Theorem, to appear in ICALP 2025 (The 52nd EATCS International Colloquium on Automata, Languages, and Programming)
A joint paper with Benjamin Young
Quantum Algorithms for Discrete Log Require Precise Rotations, to appear in ACM Transactions on Quantum Computing (TQC)
[17] CRYPTO ’25 papers
“A Note on Adaptive Security in Hierarchical Identity-Based Encryption”
Authors: Rishab Goyal (UW-Madison); Venkata Koppula (IIT Delhi); Mahesh Sreekumar Rajasree (CISPA)
“Succinct Arguments for BatchQMA and Friends under 8 rounds”
Authors: Rishab Goyal (University of Wisconsin, Madison); Aditya Jain (University of Wisconsin, Madison); Shashwatha Mitra G B (University of Wisconsin, Madison)
ICALP’ 25 paper:
Boosting SNARKs and Rate-1 Barrier in Arguments of Knowledge
Authors: Jiaqi Cheng and Rishab Goyal (UW-Madison)
[18] Benjamin Young
The Converse of the Real Orthogonal Holant Theorem
Winner: Best Student Paper at ICALP ‘25.
[19] Condor Pool Usage update:
For the first time, we crossed 600 users over the past 12 months. Supporting the education mission definitely contributes to these increases.
Also for the first time, we crossed 220 users in a single month.
We have seen a significant increase in the number of jobs served over the past two months (30M and 34M, respectively). For the first time we crossed 240M jobs in 12 months; a year ago, the trailing 12-month total was 187M jobs.
With 243M jobs and 2.8B file transfers, the OSPool is “humming” at 7.7 Hz and 88.8 Hz, respectively. Amazing!
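A quick back-of-the-envelope check of the quoted rates (assuming a 365-day window):

```python
# Back-of-the-envelope check of the quoted OSPool rates.
SECONDS_PER_YEAR = 365 * 24 * 3600            # ~31.5M seconds

jobs = 243e6
transfers = 2.8e9

print(f"jobs:      {jobs / SECONDS_PER_YEAR:.1f} Hz")        # ~7.7 Hz
print(f"transfers: {transfers / SECONDS_PER_YEAR:.1f} Hz")   # ~88.8 Hz
```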
[20] Master’s Projects and Theses advised this past semester (and summer):
David Parra, advised by Mohit Gupta
Practical 3D Time-of-flight imaging with single-photon cameras.
Alexander Peseckis, advised by Michael Gleicher
Occlusion-Conscious Viewpoint Optimization
Hanxiu Zhu, advised by Yuhang Zhao
Design Technology to Make Video Watching Experiences More Inclusive for People with ADHD
Casey Ford, advised by Michael Gleicher
Stereo SPAD sensing
Siddharth Suresh, advised by Robert Nowak
Understanding Computation Humor: From Crowd-Sourced Preferences to Expert-Level AI Alignment
Sriram Ashokkumar, advised by Dan Negrut
Multi-Agent Exploration with Vision-Language Models
Tara Bobinac, advised by Junjie Hu
Leveraging AI as a Deliberative Conversation Partner to Foster Intercultural Empathy
Rose Ceccio, advised by Rahul Chatterjee
The technology used in intimate partner surveillance
Aditya Das Sarma, advised by Rahul Chatterjee
Multi-bit flip-flop clustering for power optimization
Prakriti Garg, advised by Sushmita Roy
Graph representation learning framework for principled analysis of orthologous regulatory regions
Shashwatha Mitra Ghante Banuprakash, advised by Rishab Goyal
Succinct Classical Verification For BatchQMA in under 8 Rounds
Suenggwan Jo, advised by Fred Sala
Foundation Model usage for forecasting
Keith Johnson, advised by Aws Albarghouthi
Tools and Techniques for Semantics-Guided Synthesis
Jaden Park, advised by Yong Jae Lee
On the Analysis and Detection of Data Contamination in Vision Language Models
Kassie Povinelli, advised by Yuhang Zhao
Using Voice-Changer Technologies for Gender-Affirming Voice Exploration
Lakshika Rathi, advised by Swamit Tannu
Physics Aware Neural Networks for Estimating Application Fidelity
Muhammad Danial Saleem, advised by Xiangyao Yu
Evaluating and Improving Bioinformatics Tools Using DBMS Techniques
Anshuman Senapati, advised by Josiah Hanna
On state abstractions for reinforcement learning
Shengyuan Yang, advised by Thomas Reps
Precise Pointer Analysis for Java Streams
Binwei Yao, advised by Junjie Hu
No Preference Left Behind: Group Distributional Preference Optimization
Karl Vachuska, advised by Hanbaek Lyu
Network Motifs
Hadi Khader, advised by Mohit Gupta
Latent Diffusion for Efficient SPAD Image Reconstruction in Low-Light and High-Motion Settings
Longbang Liu, advised by Sharon Li
Enhancing AlphaFold 3 for Molecular Glue Prediction via Local Chemical Environment Integration
April 2025
[1] Prof Ilias Diakonikolas’s former PhD student, Ankit Pensia, will start as a tenure-track assistant professor at CMU, in the department of Statistics and Data Science. Ankit obtained a PhD from our department in 2023. After graduation, Ankit spent a year in Boston as a Goldstine postdoctoral fellow at IBM. Currently, he is a research fellow at the Simons Institute in Berkeley. He will be starting his faculty position in the coming Fall. Ankit is the first PhD graduate under Diakonikolas’s supervision at UW Madison.
[2] Prof Steve Wright has been invited as one of the 21 plenary speakers in the International Congress of Mathematicians (ICM) in July 2026. ICM is the largest conference in mathematics, held every four years, and organized by the International Mathematical Union. The Fields Medals are awarded there.
https://en.wikipedia.org/wiki/International_Congress_of_Mathematicians
[3] Comparing Vibrotactile and Skin-Stretch Haptic Feedback for Conveying Spatial Information of Virtual Objects to Blind VR Users
Jiangsheng Li, Zining Zhang, Zeyu Yan, Yuhang Zhao, and Huaishu Peng.
IEEE VR 2025.
https://jsli.phd/assets/pdf/IEEE_hand_haptic.pdf
Perceiving spatial information of a virtual object (e.g., direction, distance) is critical yet challenging for blind users seeking an immersive virtual reality (VR) experience. To facilitate VR accessibility for blind users, in this paper, we investigate the effectiveness of two types of haptic cues—vibrotactile and skin-stretch cues—in conveying the spatial information of a virtual object when applied to the dorsal side of a blind user’s hand. We conducted a user study with 10 blind users to investigate how they perceive static and moving objects in VR with a custom-made haptic apparatus. Our results reveal that blind users can more accurately understand an object’s location and movement when receiving skin-stretch cues, as opposed to vibrotactile cues. We discuss the pros and cons of both types of haptic cues and conclude with design recommendations for future haptic solutions for VR accessibility.
[4] Prof Yuhang Zhao has recently received the Grant Accelerator Program (GAP) Award from the McPherson Eye Research Institute to work on “Intent-aware Systems to Enhance Low Vision People’s Visual Perception”
https://vision.wisc.edu/funding_opportunities/grant-accelerator-program/recipients/
[5] Yeping Wang and Michael Gleicher. 2025. Anytime Planning for End-Effector Trajectory Tracking. IEEE Robot. Autom. Lett. 10, 4 (April 2025), 3246–3253. https://doi.org/10.1109/LRA.2025.3540633
https://arxiv.org/abs/2502.03676
End-effector trajectory tracking algorithms find joint motions that drive robot manipulators to track reference trajectories. In practical scenarios, anytime algorithms are preferred for their ability to quickly generate initial motions and continuously refine them over time. In this paper, we present an algorithmic framework that adapts common graph-based trajectory tracking algorithms to be anytime and enhances their efficiency and effectiveness. Our key insight is to identify guide paths that approximately track the reference trajectory and strategically bias sampling toward the guide paths. We demonstrate the effectiveness of the proposed framework by restructuring two existing graph-based trajectory tracking algorithms and evaluating the updated algorithms in three experiments.
[6] “Quake: Adaptive Indexing for Vector Search”
Authors: Jason Mohoney (University of Wisconsin, Madison);
Devesh Sarda (University of Wisconsin, Madison);
Mengze Tang (University of Wisconsin, Madison);
Shihabur Rahman Chowdhury (Apple);
Anil Pacaci (Apple);
Ihab F. Ilyas (University of Waterloo);
Theodoros Rekatsinas (Apple);
Shivaram Venkataraman (University of Wisconsin, Madison)
https://github.com/marius-team/quake
[7] https://edbticdt2025.upc.edu/?contents=invited_keynotes.html
Abstract: Joins are the cornerstone of relational databases. Surprisingly, even after several decades of research in the systems and theory database community, we still lack an understanding of how to design the fastest possible join algorithm. In this talk, we will present the exciting progress the database theory community has achieved in join algorithms over the last two decades. The talk will revolve around five key ideas fundamentally shaping this research area: tree decompositions, data partitioning, leveraging statistical information, enumeration, and algebraic techniques.
Paris Koutris is an associate professor in Computer Sciences at the University of Wisconsin-Madison. His research lies in the intersection of data management theory and practice, focusing on data processing for massively parallel systems, data markets, efficient query processing, and managing uncertain data. He has won the 2016 SIGMOD Jim Gray Dissertation Award for his work on the foundations of parallel data processing, as well as the 2023 PODS Alberto O. Mendelzon Test-of-Time award.
[8] https://arxiv.org/abs/2406.07847
Output-sensitive Conjunctive Query Evaluation
Shaleen Deep, Hangdong Zhao, Austen Z. Fan, Paraschos Koutris
Join evaluation is one of the most fundamental operations performed by database systems and arguably the most well-studied problem in the Database community. A staggering number of join algorithms have been developed, and commercial database engines use finely tuned join heuristics that take into account many factors including the selectivity of predicates, memory, IO, etc. However, most of the results have catered to either full join queries or non-full join queries but with degree constraints (such as PK-FK relationships) that make joins easier to evaluate. Further, most of the algorithms are also not output-sensitive.
In this paper, we present a novel, output-sensitive algorithm for the evaluation of acyclic Conjunctive Queries (CQs) that contain arbitrary free variables. Our result is based on a novel generalization of the Yannakakis algorithm and shows that it is possible to improve the running time guarantee of the Yannakakis algorithm by a polynomial factor. Importantly, our algorithmic improvement does not depend on the use of fast matrix multiplication, as a recently proposed algorithm does. The upper bound is complemented with matching lower bounds conditioned on two variants of the k-clique conjecture. The application of our algorithm recovers known prior results and improves on known state-of-the-art results for common queries such as paths and stars.
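For background, here is a tiny sketch of the classical Yannakakis approach (semijoin reduction followed by joining) on a two-relation acyclic join, which the paper's output-sensitive algorithm generalizes; this is not the paper's algorithm, and the relations below are toy data.

```python
def semijoin(r, s, key_r, key_s):
    """Keep only the tuples of r whose join key appears in s (a semijoin)."""
    keys = {t[key_s] for t in s}
    return [t for t in r if t[key_r] in keys]

def yannakakis_two_rel(r, s, key_r, key_s):
    """Classical Yannakakis on a 2-relation acyclic join R ⋈ S.

    Pass 1: semijoin-reduce both relations so no dangling tuples remain.
    Pass 2: join the reduced relations; the work is then bounded by the
    input size plus the output size, the kind of output-sensitive guarantee
    that the paper sharpens for acyclic CQs with arbitrary free variables.
    """
    r = semijoin(r, s, key_r, key_s)
    s = semijoin(s, r, key_s, key_r)
    index = {}
    for t in s:
        index.setdefault(t[key_s], []).append(t)
    return [t_r + t_s for t_r in r for t_s in index.get(t_r[key_r], [])]

# Toy relations R(a, b) and S(b, c) joining on b.
R = [(1, "x"), (2, "y"), (3, "z")]
S = [("x", 10), ("x", 11), ("w", 12)]
print(yannakakis_two_rel(R, S, key_r=1, key_s=0))
# [(1, 'x', 'x', 10), (1, 'x', 'x', 11)] -- tuples with b = 'y', 'z', 'w' were pruned
```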
[9] https://teachingacademy.wisc.edu/
[10] (a) Title: Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
(b) Title: Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems
Abstract: The session “The Role of Non-GPU Accelerators and Chiplets in Future HPC Systems” at SOS27 will explore the evolving landscape of high-performance computing (HPC) architectures, focusing on the integration of unique accelerators and chiplet technology. As the demand for computational power continues to surge, traditional GPU-centric approaches are being complemented by innovative solutions that leverage specialized accelerators, FPGAs, ASICs, and Dataflow solutions, all alongside modular chiplet designs. This session will delve into the advantages of these technologies, including enhanced performance, energy efficiency, and scalability, while addressing the challenges of programming, cost, and system integration. Attendees will engage in discussions on real-world applications, case studies, and future trends, equipping them with insights to navigate a potential Cambrian explosion of new HPC architectures more effectively.
Link: https://sos27.cscs.ch/
(c) Title: Leveraging Open-source Simulators to Enable HW/SW Co-Design of Next-generation HPC Systems
Abstract: Multi-fidelity simulation accelerates hardware–software (HW/SW) co-design through design space exploration for next-generation HPC systems. By leveraging open-source simulators like gem5 and SST, early-stage prototyping enables the exploration of system architectures capable of achieving 100 exaFLOPs while consuming only 20 MW of power—a critical target for DOE’s future leadership-class computing. Traditional modeling lacks the scalability and fidelity needed for effective co-design at exascale. A Region of Interest (ROI)-driven methodology optimizes simulation efficiency by dynamically adjusting fidelity based on application characteristics. The integration of post-CMOS accelerators, power models, and fine-grained thermal modeling addresses key performance and power trade-offs. By pinpointing representative regions of interest, significant simulation speedups are achieved, enabling practical large-scale exploration. This work aligns with DOE’s moonshot goals, emphasizing disruptive, cross-layer innovations spanning architecture, runtime, OS, compilers, networking, and batch scheduling. By leveraging insights from prior system co-design efforts and lessons learned, this approach enables agile, adaptable designs that meet the evolving demands of HPC+AI, quantum computing, and scientific computing.
Link: https://orau.gov/2025CSPI
[11] Zhang, S., Li, J., Cagiltay, B., Kirkorian, H., Mutlu, B., & Fawaz, K. (2025). “A qualitative exploration of parents and their children’s uses and gratifications of ChatGPT”. Family Relations.
https://onlinelibrary.wiley.com/doi/full/10.1111/fare.13171
Objective
This Emerging Ideas report explores families’ (parents’ and their children’s) uses and gratifications of ChatGPT.
Background
Generative artificial intelligence–based conversational agents, such as ChatGPT, can be used to accomplish a variety of tasks, yet little is known about how and why parents and their children may use these technologies.
Methods
We conducted semistructured qualitative and exploratory interviews with 12 U.S.-based families that had experience sharing a ChatGPT account. Families were recruited using social media advertisements, and at least one child and one parent joined the interview. We asked families about what they used ChatGPT for and why they used the platform.
Results
Families reported four main motivators for using ChatGPT: (a) information seeking, (b) enhancing productivity, (c) entertainment, and (d) social bonding. Potential barriers to use included concerns about (a) ChatGPT’s credibility and capabilities, (b) being less familiar with using ChatGPT, (c) the platform’s ethical implications, and (d) possible privacy risks.
Conclusion
Families use ChatGPT for various purposes, but their uses and gratifications sometimes may differ depending on their perceptions of and experiences with the platform.
Implications
Our findings suggest that with some improvements, ChatGPT has the potential to be a useful tool for both individual and shared use in families.
[12] Prof Jha and Prof Chajed recently received a DARPA grant through the TRACTOR program. The grant involves UW (lead), Berkeley, UIUC, and Edinburgh. The total grant amount is $5M.
https://www.darpa.mil/research/programs/translating-all-c-to-rust
[13] https://arxiv.org/abs/2411.18479
SoK: Watermarking for AI-Generated Content
Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, Dawn Song
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.
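As one concrete illustration of the kind of scheme the survey covers, here is a heavily simplified sketch in the style of hash-seeded "green list" token watermarking with a z-score detector; it is not taken from the paper, and the seeding scheme and parameters below are placeholders.

```python
import hashlib
import math

def green_fraction(tokens, vocab_size=50_000, gamma=0.25):
    """Fraction of tokens falling in the context-seeded 'green list'.

    A generator that softly boosts green-list tokens leaves a statistical
    signal; a detector recomputes the same lists and tests whether the
    green fraction is implausibly high for human-written text.
    """
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        # Seed the vocabulary partition with the previous token (simplified context).
        seed = hashlib.sha256(str(prev).encode()).digest()
        # A token is "green" if its seeded hash lands in the first gamma fraction.
        h = int.from_bytes(hashlib.sha256(seed + str(cur).encode()).digest()[:8], "big")
        hits += (h % vocab_size) < gamma * vocab_size
    return hits / max(len(tokens) - 1, 1)

def z_score(frac, n, gamma=0.25):
    """How many standard deviations the green rate sits above its expectation."""
    return (frac - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))

tokens = list(range(200))                     # stand-in for a token sequence
frac = green_fraction(tokens)
print(frac, z_score(frac, len(tokens) - 1))   # near gamma, near 0 for unwatermarked text
```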
[14] https://dl.acm.org/doi/10.1145/235968.233324
BIRCH: an efficient data clustering method for very large databases
Authors: Tian Zhang, Raghu Ramakrishnan, Miron Livny
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle “noise” (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH’s time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.
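BIRCH lives on in standard libraries: scikit-learn ships it as sklearn.cluster.Birch. A short usage sketch on synthetic data (parameter values are illustrative):

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic data standing in for a "very large" multi-dimensional dataset.
X, _ = make_blobs(n_samples=10_000, centers=5, cluster_std=0.8, random_state=0)

# threshold controls the radius of the CF-tree leaf subclusters;
# branching_factor bounds the fan-out of CF-tree nodes;
# n_clusters applies a final global clustering over the leaf subclusters.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=5)
labels = birch.fit_predict(X)

print(len(birch.subcluster_centers_), "leaf subclusters ->", 5, "clusters")

# partial_fit allows clustering data in chunks, mirroring BIRCH's single-scan,
# memory-bounded design:
# for chunk in chunks:
#     birch.partial_fit(chunk)
```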
[15] TitletownTech (a VC firm created through a partnership of the Green Bay Packers and Microsoft) announced their first-round picks in their inaugural “startup draft” — happening at the same time as the NFL draft in Green Bay. One of the picks is a Madison (and Cambridge, MA) based company called Ubicept, which includes Prof. Mohit Gupta as a co-founding advisor and a few others from UW-Madison. They received $1 million in investment + $350K in Microsoft Azure credits (plus a lot of visibility).
March 2025
[1] https://www.cs.wisc.edu/2025/03/06/sharon-li-2025-sloan-fellow/
One of just 126 recipients across the US and Canada, the Sloan Foundation counts Sharon Li among the next generation of scientific leaders.
Assistant Professor Sharon Li, a researcher in algorithmic and theoretical foundations of safe and reliable AI, is one of two UW–Madison researchers to be named a 2025 Sloan Fellow. Conferred by the Alfred P. Sloan Foundation, the awards recognize promising early-career scientists “whose creativity, innovation, and research accomplishments make them stand out as the next generation of leaders.”
[2] https://www.computer.org/press-room/computer-pioneer-award-recipients-2025
The IEEE Computer Society (CS) today announced the recipients of the 2025 Computer Pioneer Award, which recognizes and honors those whose long-standing efforts have resulted in the advancement and continued vitality of the computer industry. This year, the “IEEE Computer Society Computer Pioneer Award in honor of the Women of ENIAC” recognized two individuals: Gurindar (Guri) Sohi, Vilas Research Professor, John P. Morgridge Professor, and E. David Cronon Professor of Computer Sciences, Computer Science Department, University of Wisconsin-Madison, Madison, Wis., U.S.A.; and Moshe Y. Vardi, University Professor and the George Distinguished Service Professor in Computational Engineering at Rice University, Houston, Texas, U.S.A.
Also see our CDIS write up:
[3] https://research.wisc.edu/professorships-and-faculty-fellowships/vilas-associates/
The Vilas Associates Competition recognizes new and ongoing research of the highest quality and significance. Recipients are chosen competitively by the divisional Research Committees on the basis of a detailed proposal. Winners receive up to two-ninths of research salary support (including the associated fringe costs) for both summers 2025 and 2026, as well as a $12,500 flexible research fund in each of the two fiscal years. Faculty paid on an annual basis are not eligible for the summer salary support but are eligible for the flexible fund portion of this award. This award cannot be awarded more than one time to any faculty member.
[4] https://conf.researchr.org/news/icse-2025
On behalf of the ICSE 2025 Organizing Committee and ICSE 2015 PC Chair, we are delighted to inform you that your paper, “IccTA: Detecting Inter-Component Privacy Leaks in Android Apps”, has been selected for the ICSE Most Influential Paper N-10 Award.
https://ieeexplore.ieee.org/document/7194581
“IccTA: Detecting Inter-Component Privacy Leaks in Android Apps”
Li Li, Alexandre Bartel, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, Patrick McDaniel
Abstract: Shake Them All is a popular “Wallpaper” application exceeding millions of downloads on Google Play. At installation, this application is given permission to (1) access the Internet (for updating wallpapers) and (2) use the device microphone (to change background following noise changes). With these permissions, the application could silently record user conversations and upload them remotely. To give more confidence about how Shake Them All actually processes what it records, it is necessary to build a precise analysis tool that tracks the flow of any sensitive data from its source point to any sink, especially if those are in different components. Since Android applications may leak private data carelessly or maliciously, we propose IccTA, a static taint analyzer to detect privacy leaks among components in Android applications. IccTA goes beyond state-of-the-art approaches by supporting inter-component detection. By propagating context information among components, IccTA improves the precision of the analysis. IccTA outperforms existing tools on two benchmarks for ICC-leak detectors: DroidBench and ICC-Bench. Moreover, our approach detects 534 ICC leaks in 108 apps from MalGenome and 2,395 ICC leaks in 337 apps in a set of 15,000 Google Play apps.
[5] https://www.usenix.org/conference/fast25/presentation/liu-jing
“Fast, Transparent Filesystem Microkernel Recovery with Ananke”
Jing Liu, Microsoft Research; Yifan Dai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison
Awarded Best Paper!
Abstract: We introduce Ananke, a high-performance filesystem microkernel service that provides transparent recovery from unexpected filesystem failures. Ananke does so by leveraging the unique opportunity of the microkernels, running a small amount of recovery code coordinated by the host OS at the moment of a process crash. Ananke can record key pieces of information not usually available during full-system crash recovery, enabling fast and transparent recovery for applications. Through over 30,000 fault-injection experiments, we demonstrate that Ananke achieves lossless recovery; we also show that Ananke recovers quickly, usually in a few hundred milliseconds. Through real application workloads, we show that Ananke delivers high performance in the common case; the extra work needed to detect faults and prepare for recovery incurs minimal overheads.
[6] http://tab.computer.org/tcde/tcdeawards.html#tcde_ea
On behalf of the TCDE Awards Committee, I am delighted to inform you that you are the recipients of the 2025 IEEE TCDE Ramez Elmasri Outstanding Database Education Award. The citation is as follows:
For authoring a widely used textbook accompanied by an open-source software package that has significantly influenced global education in database systems.
[7] Prof Sharon Li’s student Sean Xuefeng Du has been awarded the 2024-2025 Ivanisevic Award. Xuefeng’s research focuses on developing algorithms and providing theoretical understandings that enable safe and reliable machine learning in the open world. He is advised by Professor Sharon Li.
Hi! I am a fifth-year CS Ph.D. student at UW-Madison advised by Prof. Sharon Yixuan Li. Previously, I was an undergraduate student at Xi’an Jiaotong University (XJTU) majoring in electronic engineering.
My current research focus is to develop theory and algorithms for reliable machine learning, foundation model reliability, and related applications. Specifically, I work on:
- Out-of-distribution (OOD) research: designing adaptive and interpretable learning algorithms that help ML models detect and generalize on OOD samples, such as semantic/covariate shifts, adversarial and noisy samples.
- Reliability of foundation models: understanding the blindspots of LLMs and VLMs for improved safeguarding efforts, such as model hallucination and harmful prompt detection.
- Applications: Graph embedding, object detection and segmentation.
- Interdisciplinary research: Biometrics, AI-aided cryo-microscopy/protein structural analysis.
[8] https://nick11roberts.science
I am a Ph.D. student in CS at University of Wisconsin–Madison where I am advised by Fred Sala along with my many talented colleagues in the Sprocket Lab.
This past Fall, I interned with the Llama Generative AI team at Meta in London, where I worked with Dieuwke Hupkes on language model pretraining and evaluation. This past Summer, I interned at Together AI in San Francisco with Tri Dao, where I worked on hybrid language models. In Summer 2023, I interned with the Physics of AGI research group at Microsoft Research led by Sébastien Bubeck, where I also worked on language models.
Before starting my Ph.D., I had the pleasure of working with Ameet Talwalkar and Zack Lipton during my M.S. in the Machine Learning Department at Carnegie Mellon University. As an undergraduate, I was extremely fortunate to work with both Sanjoy Dasgupta and Gary Cottrell at the University of California, San Diego. Prior to all of this, I was a community college student at Fresno City College, where I was lucky enough to learn calculus, linear algebra, and C++ from Greg Jamison.
In 2025 I received an honorable mention for the Jane Street Graduate Research Fellowship and in 2023, I was named an MLCommons Rising Star. I have also been awarded the Prove AI and UnifyID AI Fellowships in 2021 and 2019, respectively.
[9] Product Manifold Representations for Learning on Biological Pathways
Daniel McNeela, Frederic Sala, Anthony Gitter
Abstract: Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces and compare against traditional Euclidean graph representation learning models. We then train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs. We find large reductions in distortion and boosts in in-distribution edge prediction performance as a result of using mixed-curvature embeddings and their corresponding graph neural network models.
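For readers unfamiliar with mixed-curvature embeddings, the sketch below shows the standard way a product-manifold distance combines per-factor distances (Euclidean, spherical, hyperbolic) via an l2 combination. It is a generic illustration under common modeling conventions, not the paper’s implementation, and the example points are made up.

```python
import numpy as np

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)

def spherical_dist(x, y):
    # Points assumed to lie on the unit sphere.
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def hyperbolic_dist(x, y):
    # Poincare ball model with curvature -1.
    diff = np.linalg.norm(x - y) ** 2
    denom = (1 - np.linalg.norm(x) ** 2) * (1 - np.linalg.norm(y) ** 2)
    return np.arccosh(1 + 2 * diff / denom)

def product_manifold_dist(x_parts, y_parts, dists):
    # Distance on a product manifold: l2-combine the per-factor distances.
    return np.sqrt(sum(d(x, y) ** 2 for d, x, y in zip(dists, x_parts, y_parts)))

# Example: an embedding split into Euclidean, spherical, and hyperbolic factors.
x = [np.array([0.1, 0.2]), np.array([1.0, 0.0]), np.array([0.1, 0.1])]
y = [np.array([0.3, 0.0]), np.array([0.0, 1.0]), np.array([0.2, -0.1])]
print(product_manifold_dist(x, y, [euclidean_dist, spherical_dist, hyperbolic_dist]))
```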
[10] https://pages.cs.wisc.edu/~pb/
Paul Barford is a professor of Computer Sciences at the University of Wisconsin – Madison where he is a member of the networking group and the founder and director of the Wisconsin Advanced Internet Laboratory. He received his B.S. in Electrical Engineering from the University of Illinois and his Ph.D. in Computer Science from Boston University. He was a visiting professor at the Computer Laboratory at the University of Cambridge, a visiting fellow at Wolfson College and an EPSRC visiting fellow. He was also a visiting scholar at Wolfson College at University of Oxford.
Prof. Barford worked in industry for 8 years prior to his academic career. He has been a consultant to companies and frequently gives talks and seminars on computer networking and security issues. He was the founder and CEO of Nemean Networks LLC, which developed network intrusion detection technology. Nemean was acquired by Qualys Inc., where he served as Chief Scientist. Prof. Barford was also co-founder and Chief Scientist of Mdotlabs LLC, which developed fraud detection services for online advertising. Mdot was acquired by comScore Inc., where he served as Chief Scientist.
Prof. Barford’s research interests are in computer networking and communications with a focus on measurement and analysis of Internet data, protocols and topological structure. He also investigates Internet security including detection and identification of fraud, traffic anomalies, malicious attacks and intrusions. His research activities have been supported by the NSF, DHS, ARO, Cisco, and Intel among others.
Prof. Barford has served on numerous government panels. He has also served on executive, organizing and technical program committees of many different networking and IT security workshops and conferences including ACM SIGCOMM (’13 PC chair), ACM SIGMETRICS (’10 PC chair) and ACM IMC (’06 PC chair). He served on the board of directors of the National Lambdarail, as an associate editor for IEEE/ACM Transactions on Networking and as co-chair of the GENI Instrumentation and Measurement working group.
[11] MAP: Multi-user Personalization with Collaborative LLM-powered Agents
Christine Lee, Jihye Choi, Bilge Mutlu
CHI 2025.
LearningMate: An LLM-Powered Adaptive Teaching System with Personalized Learning Plans and Tutoring
Jessica Wang, Christine Lee, Bilge Mutlu
CHI 2025.
[12] VisiMark: Characterizing and Augmenting Landmarks for People with Low Vision in Augmented Reality to Support Indoor Navigation
Ruijia Chen, Junru Jiang, Pragati Maheshwary, Brianna R. Cochran, and Yuhang Zhao.
CHI 2025.
Inclusive Avatar Guidelines for People with Disabilities: Supporting Disability Representation in Social Virtual Reality
Kexin Zhang, Edward Glenn Scott Spencer, Andric Li, Ang Li, Yaxing Yao, and Yuhang Zhao. CHI 2025.
Beyond the “Industry Standard”: Focusing Gender-Affirming Voice Training Technologies on Individualized Goal Exploration
Kassie Povinelli, Hanxiu “Hazel” Zhu, and Yuhang Zhao.
CHI 2025.
[13] Understanding Mixed Reality Drift Tolerance
Daniel Killough, Yuhang Zhao, Bilge Mutlu
CHI 2025.
[14] “Demonstration of VRSight: AI-Driven Real-Time Descriptions to Enhance VR Accessibility for Blind People”
Daniel Killough, Justin Feng, Rithvik Dyava, Zheng Xue Ching, Daniel Wang, Yapeng Tian, Yuhang Zhao.
CHI 2025 Interactivity.
[15] Checking observational correctness of database systems
OOPSLA 25 · Pick, Xu, Desai, Seshia, Albarghouthi
[16] Dependency-aware compilation for surface code quantum architectures
OOPSLA 25 · Molavi, Xu, Tannu, Albarghouthi
[17] Carter Sifferman, William Sun, Mohit Gupta, and Michael Gleicher. 2024. Using a Distance Sensor to Detect Deviations in a Planar Surface. IEEE Robotics and Automation Letters 9, 10 (2024), 8515–8522. https://doi.org/10.1109/LRA.2024.3445665
https://arxiv.org/abs/2408.03838
We investigate methods for determining if a planar surface contains geometric deviations (e.g., protrusions, objects, divots, or cliffs) using only an instantaneous measurement from a miniature optical time-of-flight sensor. The key to our method is to utilize the entirety of information encoded in raw time-of-flight data captured by off-the-shelf distance sensors. We provide an analysis of the problem in which we identify the key ambiguity between geometry and surface photometrics. To overcome this challenging ambiguity, we fit a Gaussian mixture model to a small dataset of planar surface measurements. This model implicitly captures the expected geometry and distribution of photometrics of the planar surface and is used to identify measurements that are likely to contain deviations. We characterize our method on a variety of surfaces and planar deviations across a range of scenarios. We find that our method utilizing raw time-of-flight data outperforms baselines which use only derived distance estimates. We build an example application in which our method enables mobile robot obstacle and cliff avoidance over a wide field-of-view.
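A rough sketch of the kind of pipeline this abstract describes: fit a Gaussian mixture to measurements of known-planar surfaces, then flag new measurements whose likelihood under that model is unusually low. The feature representation, sample counts, and threshold below are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: raw time-of-flight returns captured over flat,
# deviation-free surfaces (n_samples x n_bins), flattened into feature vectors.
rng = np.random.default_rng(0)
planar_measurements = rng.normal(loc=1.0, scale=0.05, size=(500, 16))

# Fit a Gaussian mixture to model the expected distribution of planar returns.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(planar_measurements)

# Score new measurements; low log-likelihood suggests a geometric deviation
# (protrusion, divot, cliff) or unexpected photometrics.
threshold = np.percentile(gmm.score_samples(planar_measurements), 1)

def contains_deviation(measurement):
    return gmm.score_samples(measurement.reshape(1, -1))[0] < threshold

test = rng.normal(loc=1.4, scale=0.05, size=16)  # synthetic "deviation" sample
print(contains_deviation(test))
```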
[18] Yeping Wang, Alexander Peseckis, Zelong Jiang, and Michael Gleicher. 2024. Motion Comparator: Visual Comparison of Robot Motions. IEEE Robot. Autom. Lett. 9, 9 (September 2024), 7699–7706. https://doi.org/10.1109/LRA.2024.3430649
https://arxiv.org/abs/2407.02746
Roboticists compare robot motions for tasks such as parameter tuning, troubleshooting, and deciding between possible motions. However, most existing visualization tools are designed for individual motions and lack the features necessary to facilitate robot motion comparison. In this paper, we utilize a rigorous design framework to develop Motion Comparator, a web-based tool that facilitates the comprehension, comparison, and communication of robot motions. Our design process identified roboticists’ needs, articulated design challenges, and provided corresponding strategies. Motion Comparator includes several key features such as multi-view coordination, quaternion visualization, time warping, and comparative designs. To demonstrate the applications of Motion Comparator, we discuss four case studies in which our tool is used for motion selection, troubleshooting, parameter tuning, and motion review.
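Since the tool’s feature list includes time warping, here is a minimal dynamic time warping sketch for aligning two joint-space trajectories sampled at different rates. This is textbook DTW, not Motion Comparator’s code, and the trajectories are synthetic.

```python
import numpy as np

def dtw(traj_a, traj_b):
    """Dynamic time warping cost between two joint-space trajectories
    (each an array of shape [timesteps, num_joints])."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two hypothetical 6-DoF trajectories sampled at different rates.
t1 = np.linspace(0, 1, 50)[:, None] * np.ones((1, 6))
t2 = np.linspace(0, 1, 80)[:, None] * np.ones((1, 6))
print(dtw(t1, t2))
```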
[19] Yeping Wang and Michael Gleicher. Hierarchically Accelerated Coverage Path Planning for Redundant Manipulators. ICRA 2025.
https://arxiv.org/abs/2502.19591
Many robotic applications, such as sanding, polishing, wiping and sensor scanning, require a manipulator to dexterously cover a surface using its end-effector. In this paper, we provide an efficient and effective coverage path planning approach that leverages a manipulator’s redundancy and task tolerances to minimize costs in joint space. We formulate the problem as a Generalized Traveling Salesman Problem and hierarchically streamline the graph size. Our strategy is to identify guide paths that roughly cover the surface and accelerate the computation by solving a sequence of smaller problems. We demonstrate the effectiveness of our method through a simulation experiment and an illustrative demonstration using a physical robot.
[20] Georgia Tech CRNCH Summit https://crnch.gatech.edu/crnch-summit-2025/
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Matt Sinclair
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
“ModSim Will Play a Crucial Role in Enabling the Next 15 Years of Computer Architecture Research … But Needs a Lot of TLC”
Matt Sinclair
My Talk’s Abstract: As modern computing systems become increasingly complex and heterogeneous, modeling and simulation (ModSim) infrastructure is straining to keep up. Nevertheless, ModSim plays a crucial role in enabling rapid, early-stage co-design for these systems. In this talk, I discuss how ModSim will be crucial to enabling the next 15 years of computer architecture research, as well as (some of) the challenges for us to get there.
[21] Here is a CS article about students from the Wisconsin AI Safety Initiative (WAISI) who traveled to DC to present to Congressional staffers and journalists on risks associated with the proliferation of AI:
https://www.cs.wisc.edu/2025/03/13/waisi-presents-caip-advanced-ai-expo/
[22] https://it.wisc.edu/cio-blog/computing-infrastructure-research-rise-initiative/
February 2025
[1] Vilas Distinguished Achievement Professor awarded to Prof Jerry Zhu.
https://provost.wisc.edu/vilas-distinguished-achievement-professorship/
Vilas Distinguished Achievement Professorships (VDAP) recognize UW–Madison faculty members whose distinguished scholarship has advanced the confines of knowledge, and whose excellence also includes teaching or service. Faculty members receiving this award keep the Vilas Distinguished Achievement Professorship title for the duration of their careers.
[2] Prof Jelena Diakonikolas wins NSF CAREER award.
https://www.cs.wisc.edu/2025/01/29/jelena-diakonikolas-receives-nsf-career-award/
Assistant Professor Jelena Diakonikolas, a researcher in the field of large-scale optimization, recently received a National Science Foundation (NSF) CAREER Award for her proposal “Optimization and Learning with Changing Distributions.” The award comes with $700,000 in funding to support research and education initiatives through 2029.
[3] Prof Paul Barford wins Chancellor’s Teaching Innovation Award.
https://news.wisc.edu/2025-distinguished-teaching-award-recipients-announced/
Twelve faculty members have been chosen to receive this year’s Distinguished Teaching Awards, an honor given out since 1953 to recognize some of the university’s finest educators.
[4] Prof Patrick McDaniel gives Distinguished Lecture at Iowa State
Adversarial Machine Learning: A 10-year Perspective, Distinguished Lecture,
Iowa State University, Ames, IA, January, 2025.
Older video of said talk from UIUC: https://mediaspace.illinois.edu/media/t/1_19ei0f5b/180208991
[5] Prof Miron Livny gives talk on OSG and PATh
OSG (Open Science Grid) and PATh (Partnership to Advance Throughput Computing). PATh is a partnership launched by the NSF in 2020 between the UW-Madison Center for High Throughput Computing (CHTC) and the OSG Consortium to advance Throughput Computing. Established in 2005, the OSG Consortium operates a fabric of distributed High Throughput Computing (dHTC) services in support of the National Science & Engineering community. The research collaborations, campuses, national laboratories, and software providers that form the consortium are unified in their commitment to advance open science via these services.
[6] https://www.acm.org/media-center/2025/january/fellows-2024
Dr. Satish Chandra (Google), who did his Ph.D. under former Prof Jim Larus, is one of this year’s ACM Fellows. Congratulations Satish! Satish received our department’s Outstanding Graduate Student Research Award in 1997 (see here). To our knowledge (thanks Prof Tom Reps!), Satish is the third UW CS Ph.D. from the PL group who has been named an ACM Fellow, along with Tom Ball (Microsoft) and G. Ramalingam (Microsoft).
[7] The ACM India Council is delighted to announce Dr. Arkaprava (Arka) Basu of IISc as the recipient of the ACM India Early Career Researcher Award for 2024.
https://www.acm.org/articles/acm-india-bulletins/2025/ecr-award-2024
The ACM India Early Career Researcher (ECR) Award recognizes individuals in early stages of their careers, who have made fundamental, innovative, and impactful contributions to the computing field while primarily working in India. The ECR award carries a prize of ₹15 lakhs. Financial support for this award is provided by Persistent Foundation.
Basu is being recognized “for creative and impactful contributions towards efficient memory management for GPUs and CPUs through innovations spanning the software and hardware stacks.” Arkaprava is a leading systems researcher in India with a track record of consistently publishing impactful research in flagship conferences and exhibiting leadership roles at the top conferences of his field. Arkaprava’s research focuses on the software and the hardware of Graphics Processing Units (GPUs), which are the workhorse for most AI tasks in today’s data-driven world. His work takes a holistic view of system design by cutting across the software and hardware stack from the compiler, runtimes, and operating system to the hardware architecture. Besides top academic publications, he also has deep collaborations with and long-term consultancy for tech giants in the field.
Basu is an associate professor at the Indian Institute of Science (IISc), Bangalore. Before joining IISc in 2018, he worked as a full-time member of the research staff at AMD Research in Austin, USA, for four years. Earlier, Arkaprava obtained his PhD in computer science from the University of Wisconsin-Madison, USA in December 2013.
Dissertation: https://research.cs.wisc.edu/multifacet/theses/arka_basu_phd.pdf
The ECR award recipient is chosen by a jury comprising eminent personalities in the computing community. The jury follows a rigorous process wherein each nomination is individually reviewed by all the committee members, then collated and deliberated upon in multiple evaluation rounds, and finally decided in the presence of a Steering Committee-appointed observer.
[8] SET-PAiREd: Designing for Parental Involvement in Learning with an AI-Assisted Educational Robot
Hui-Ru Ho, Nitigya Kargeti, Ziqi Liu, Bilge Mutlu
https://arxiv.org/pdf/2305.02525
AI-assisted learning companion robots are increasingly used in early education. Many parents express concerns about content appropriateness, while they also value how AI and robots could supplement their limited skill, time, and energy to support their children’s learning. We designed a card-based kit, SET, to systematically capture scenarios that have different extents of parental involvement. We developed a prototype interface, PAiREd, with a learning companion robot to deliver LLM-generated educational content that can be reviewed and revised by parents. Parents can flexibly adjust their involvement in the activity by determining what they want the robot to help with. We conducted an in-home field study involving 20 families with children aged 3–5. Our work contributes to an empirical understanding of the level of support parents with different expectations may need from AI and robots and a prototype that demonstrates an innovative interaction paradigm for flexibly including parents in supporting their children.
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Christine Lee, David Porfirio, Jessica Wang, Kevin Chenkai Zhao, Bilge Mutlu
Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features—a rule translator, flexibility sliders, and a model checker—that engage users in the verification process. Through a user study (n = 12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
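As a toy illustration of checking a generated plan against user rules (real model checking, as in VeriPlan, is far more general), the sketch below validates an invented plan format against two invented constraints.

```python
from datetime import time

# A hypothetical plan: a list of timed tasks, perhaps produced by an LLM planner.
plan = [
    {"task": "write report", "start": time(9, 0), "end": time(11, 0)},
    {"task": "team meeting", "start": time(10, 30), "end": time(11, 30)},
    {"task": "gym", "start": time(18, 30), "end": time(19, 30)},
]

# User rules expressed as named predicates over the whole plan.
rules = [
    ("no overlapping tasks",
     lambda p: all(a["end"] <= b["start"] for a, b in zip(p, p[1:]))),
    ("nothing scheduled after 6pm",
     lambda p: all(item["end"] <= time(18, 0) for item in p)),
]

# Report every violated rule so the planner (or the user) can revise the plan.
violations = [name for name, check in rules if not check(plan)]
print(violations)
```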
Bridging Generations using AI-Supported Co-Creative Activities
Callie Kim, Arissa Sato, Nathan White, Hui-Ru Ho, Christine Lee, Yuna Hwang, Bilge Mutlu
Intergenerational collaboration between grandparents and grandchildren can be challenging due to differences in technological familiarity. AI has emerged as a promising tool to support collaborative activities, offering flexibility and creative assistance, but its role in facilitating intergenerational collaboration remains underexplored. In this study, we conducted a user study with 29 grandparent-grandchild groups engaged in AI-supported story creation to examine how AI-assisted co-creation can facilitate collaboration. Our findings show that grandchildren managed the technical aspects, while grandparents contributed creative ideas and guided the storytelling. AI played a key role in structuring the activity, facilitating brainstorming, enhancing storytelling, and balancing the contributions of both generations. The process fostered mutual appreciation, with each generation recognizing the strengths of the other, leading to an engaging and cohesive co-creation process. We offer design implications for integrating AI into intergenerational co-creative activities, emphasizing how AI can enhance collaboration across skill levels and technological familiarity.
[9] “GOLDYLOC: Global Optimizations & Lightweight Dynamic Logic for Concurrency”
Authors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair
Venue: ACM Transactions on Architecture & Code Optimization (ACM TACO)
Abstract: Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device’s compute utilization. However, effectively harnessing this concurrency on GPUs for important primitives such as general matrix multiplications (GEMMs) remains challenging. Although modern GPUs have significant hardware and software support for GEMMs, their kernel implementations and optimizations typically assume each kernel executes in isolation and can utilize all GPU resources. This approach is highly efficient when kernels execute in isolation, but causes significant resource contention and slowdowns when kernels execute concurrently. Moreover, current approaches often only statically expose and control parallelism within an application, without considering runtime information such as varying input size and concurrent applications — often exacerbating contention. These issues limit performance benefits from concurrently executing independent operations. Accordingly, we propose GOLDYLOC, which considers the global resources across all concurrent operations to identify performant GEMM kernels, which we call globally optimized (GO)-Kernels. Moreover, GOLDYLOC introduces a lightweight dynamic logic which considers the dynamic execution environment for available parallelism and input sizes to execute performant combinations of concurrent GEMMs on the GPU. Overall, GOLDYLOC improves performance of concurrent GEMMs on a real GPU by up to 2× (18% geomean per workload) and provides up to 2.5× (43% geomean per workload) speedups over sequential execution.
[10] Prof Matt Sinclair participated in a panel entitled “AI’s Energy Future: Balancing Growth, the Grid, and the Environment” and gave a talk “Designing Hardware to Meet the Demands of Machine Learning & Artificial Intelligence” at a WEI event on AI and energy.
Abstract: As use of artificial intelligence grows, so does the demand for electricity and water to power and cool hyper-scale data centers. Many utilities have proposed building additional fossil fuel power plants or keeping older ones running to meet this new demand. Meanwhile, most major computing companies have made commitments to power their operations with clean energy. In this Forum, we’ll explore the potential demands that the growth of AI and data centers may place on the grid and what can be done to mitigate its environmental impact (e.g. improving energy efficiency of computing, or locating and operating data centers in ways that maximize use of renewable energy). How much will electricity demand from data centers truly grow? How might that growth affect both the environment and other electricity customers? And what can we do to ensure that we address that growth in the most sustainable way?
[11] PhD Theses recently completed include:
Rishabh Khandelwal
Assessing and Enhancing Users’ Online Data Privacy
Advisor: Fawaz, Kassem
Committee: Ramanathan, Parameswaran (Parmesh); Fawaz, Kassem; Chatterjee, Rahul; Zhao, Yuhang
https://search.library.wisc.edu/digital/AGUNKPQPZ23VHP8Y
Anthony Lawrence George Rebello
Correctness and Performance with Declarative System Calls
Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi
Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Venkataraman, Shivaram; Sinclair, Matthew
https://search.library.wisc.edu/digital/AWAYYAKZRTR7KC8U
Roger Waleffe
Algorithms and Systems for Scalable Machine Learning over Graphs
Advisor: Wright, Steve
Committee: Wright, Steve; Rekatsinas, Theodoros; Venkataraman, Shivaram; Papailiopoulos, Dimitris
https://asset.library.wisc.edu/1711.dl/ZVOS5EFYR2LHA8L/R/file-000e8.pdf
Edward Barton
Practical Single Shot High Dynamic Range Imaging
Advisor: Hu, Yu Hen
Committee: Hu, Yu Hen; Lee, Yong Jae; Li, Yin; Chen, Yudong
https://www.linkedin.com/in/edwardabarton/
Zhenmei Shi
Understanding Training and Adaptation in Feature Learning: From Two-layer Networks to Foundation Models
Advisor: Liang, Yingyu
Committee: Liang, Yingyu; Zhu, Xiaojin; Sala, Frederic; Li, Yin
https://search.library.wisc.edu/digital/ASBHFA7AP6WTKM8K
Jing Liu
Scale, Performance, and Fault-Tolerance in a Filesystem Semi-Microkernel
Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi
Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Yu, Xiangyao; Fawaz, Kassem; Zhang, Irene (MSR)
https://asset.library.wisc.edu/1711.dl/4GNA6MNYXHEDM8Y/R/file-2b9e2.pdf
[12] Masters Theses (or Projects)
Tommy Chang
Shivaram Venkataraman
EVA: Cost-Efficient Cloud-Based Cluster Scheduling
Neeraj Surawar
Matt Sinclair
(Project) NUMA-Aware Queue Scheduler for Multi-Chiplet GPUs
Jean-Charles Noirot Ferrand
Patrick McDaniel
Extracting the Harmfulness Classifier of Aligned LLMs
Kunyang Li
Patrick McDaniel
On Adversarial and Common Robustness of Parameter-Efficient Fine-Tuning Strategies
Jingxin Du
Prof. Kevin Ponto
Towards the Recreation of More Realistic 3D Virtual Museum Tours
Muhammad Danial Saleem
Thesis (6 total credits)
Xiangyao Yu
Evaluating and Improving Bioinformatics Tools Using DBMS Techniques
Sriram Ashokkumar
Thesis (6 total credits)
Dan Negrut
Project Topic: Multimodal Models and LLMs with robotics using Simulation for Training
Shashwatha Mitra Ghante Banuprakash
Thesis (6 total credits)
Rishab Goyal
Lattice based Cryptography
Binwei Yao
Thesis (6 total credits)
Junjie Hu
Multi-Community Preference Alignment of LLMs
Aditya Das Sarma
Thesis (6 total credits)
Rahul Chatterjee
Multi-bit flip-flop clustering for power optimization
Jaden Park
Thesis (6 total credits)
Yong Jae Lee
Data Contamination in Vision Language Models
Rose Ceccio
Thesis (6 total credits)
Rahul Chatterjee
The technology used in intimate partner surveillance
Shengyuan Yang
Thesis (6 total credits)
Thomas Reps
Precise Pointer Analysis for Java Streams
Tara Bobinac
Thesis (6 total credits)
Junjie Hu
Leveraging LLMs to Gauge Perspectives on Social Issues in Latinx and American Communities
Prakriti Garg
Thesis (6 total credits)
Sushmita Roy
Graph representation learning framework for principled analysis of orthologous regulatory regions
David Parra
Project (3 total credits)
Mohit Gupta
Practical 3D Time-of-flight imaging with single-photon cameras.
Keith Johnson
Thesis (6 total credits)
Aws Albarghouthi
Tools and Techniques for Semantics-Guided Synthesis
Alexander Peseckis
Project (3 total credits)
Michael Gleicher
Occlusion-Conscious Viewpoint Optimization
Binwei Yao
Thesis (6 total credits)
Junjie Hu
Pluralistic Alignment of Large Language Models
Suenggwan Jo
Thesis (6 total credits)
Fred Sala
Foundation Model usage for forecasting
[13] ICRA 2025 paper
“Shape-programming Robotic Reflectors for Wireless Networks”
Yawen Liu, Akarsh Prabhakara, Jiangyifei Zhu, Shenyi Qiao, Swarun Kumar
Shape-programming Robotic Reflectors for Wireless Networks
(link not working, yet?)
[14] We had one paper accepted at ICLR and one at NAACL. The latter was led by Yijing Zhang, who was an undergraduate in our program.
Changho Shin, John Cooper, Frederic Sala, “Weak-to-Strong Generalization Through the Data-Centric Lens,” ICLR 2025
https://arxiv.org/abs/2412.03881
Weak-to-Strong Generalization Through the Data-Centric Lens: The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala, “Personalize Your LLM: Fake It Then Align It,” NAACL 2025 Findings
https://yijingz02.github.io/files/CHAMELEON_NAACL_.pdf
Personalize Your LLM: Fake It Then Align It: Personalizing large language models (LLMs) is essential for delivering tailored interactions that improve user experience. Many existing personalization methods require fine-tuning LLMs for each user, rendering them prohibitively expensive for widespread adoption. Although retrieval-based approaches offer a more compute-efficient alternative, they still depend on large, high-quality datasets that are not consistently available for all users. To address this challenge, we propose CHAMELEON, a scalable and efficient personalization approach that uses (1) self-generated personal preference data and (2) representation editing to enable quick and cost-effective personalization. Our experiments on various tasks, including those from the LaMP personalization benchmark, show that CHAMELEON efficiently adapts models to personal preferences, improving instruction-tuned models and outperforms two personalization baselines by an average of 40% across two model architectures.
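A generic sketch of the representation-editing idea: derive a steering direction from hidden states of preferred versus non-preferred completions and add it to activations at inference time. The random vectors and the simple difference-of-means direction below are stand-ins; CHAMELEON’s actual procedure is more involved.

```python
import numpy as np

# Hypothetical hidden states from self-generated preference pairs; in practice
# these would come from a language model's internal representations.
rng = np.random.default_rng(0)
hidden_dim = 64
preferred = rng.normal(0.2, 1.0, size=(100, hidden_dim))       # h(x, y_preferred)
non_preferred = rng.normal(-0.2, 1.0, size=(100, hidden_dim))   # h(x, y_other)

# Difference-of-means steering direction toward the user's preferences.
direction = preferred.mean(axis=0) - non_preferred.mean(axis=0)
direction /= np.linalg.norm(direction)

def edit(hidden_state, alpha=2.0):
    # Shift a hidden state toward the preference direction at inference time.
    return hidden_state + alpha * direction

print(edit(rng.normal(size=hidden_dim))[:5])
```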
December 2024
[1] “From Foe to Friend: The Surprising Turn of Mega Constellations in Radio Astronomy”
HotNets ’24
Cheap spaceflight has ushered in an explosive growth era for Low Earth Orbit (LEO) satellites. While this has brought us LEO satellite megaconstellations for ubiquitous high-speed data, it has also enabled a proliferation of nanosatellites (e.g., CubeSats) launched by diverse organizations. An unfortunate side-effect is harmful interference to sensitive receivers like those of radio astronomy — no place on Earth is safe. How can we enjoy the fruits of the satellite revolution without blinding ourselves to the secrets of the universe?
Networking is the key. This paper proposes InOrbitNet, which aggregates and backhauls traffic from low-capability nanosatellites using highly-capable LEO megaconstellations. By simulating LEO and nanosatellite orbit transitions, we show that orders-of-magnitude reductions in latency and significant increases in capacity are possible as compared to the current non-networked direct-to-ground approach. But more importantly, because LEO megaconstellations are highly capable and tightly managed, this consolidation of RF footprints also allows radio astronomy to be protected from interference.
https://conferences.sigcomm.org/hotnets/2024/papers/hotnets24-62.pdf
and
“WiFi Physical Layer Stays Awake and Responds When it Should Not”
WF-IoT
WiFi communication should be possible only between devices inside the same network. However, we find that all existing WiFi devices send back acknowledgments (ACKs) to even fake packets received from unauthorized WiFi devices outside of their network. Moreover, we find that an unauthorized device can manipulate the power-saving mechanism of WiFi radios and keep them continuously awake by sending specific fake beacon frames to them. Our evaluation of over 5000 devices from 186 vendors confirms that these are widespread issues. We believe these loopholes cannot be prevented, and hence they create privacy and security concerns. Finally, to show the importance of these issues and their consequences, we implement and demonstrate two attacks where an adversary performs battery drain and WiFi sensing attacks just using a tiny WiFi module which costs less than ten dollars.
https://arxiv.org/abs/2301.00269
[2] https://chtc.cs.wisc.edu/unravelling-antibiotic-resistance.html
“I have stuck with the OSPool because the access is unprecedented—the amount of compute time you can get, the number of jobs you can run at a time”
Erik Wright, Associate Professor of Biomedical Informatics at the University of Pittsburgh
As bacterial infections evolve to develop resistance to drugs, it is critical to investigate methodologies to slow resistance and, potentially, even reverse it. In his quest to discover new antibiotics to counter resistance, Erik Wright’s project, BioMedInfo, aims to compare bacteria with resistant genomes by grouping them into a network of genes with related functions.
Read more in the link above!
[3] https://new.nsf.gov/events/
[4] From our lab, led by UW CS students:
Exploring the Use of Robots for Diary Studies
Michael F Xu, Bilge Mutlu
Designing Telepresence Robots to Support Place Attachment
Yaxin Hu, Anjun Zhu, Catalina Toma, Bilge Mutlu
With collaborators:
Making Sense of Public Space for Robot Design
Hannah Pelikan, Bilge Mutlu, Stuart Reeves
(w/ European collaborators)
The Connection-Coordination Rapport (CCR) Scale
Ting-Han Lin, Hannah Dinner, Tsz Long Leung, Bilge Mutlu, Greg Trafton, Sarah Sebo
(w/ U Chicago collaborators)
ImageInThat: Manipulating Images to Convey User Instructions to Robots
Karthik Mahadevan, Blaine Lewis, Jiannan Li, Bilge Mutlu, Anthony Tang, Tovi Grossman
(w/ U Toronto collaborators)
[5] “SYSTEMS, METHODS, AND MEDIA FOR EULERIAN SINGLE-PHOTON COMPUTER VISION”
Authors: Shantanu Gupta, Mohit Gupta
https://www.freepatentsonline.com/y2024/0370974.html
In accordance with some embodiments, systems, methods, and media for Eulerian single-photon computer vision are provided. In some embodiments, the system comprises: an image sensor comprising detectors configured to detect arrival of individual photons, and arranged in an array; a processor programmed to: cause the image sensor to generate a sequence of images representing a scene, each of the images comprising a plurality of pixels; perform, for each of a plurality of three dimensional filters, a convolution between the three dimensional filter and a plurality of frames, wherein each of the plurality of frames is based on one or more of the images of the sequence of images; generate, for each of the plurality of frames, a plurality of filter bank responses each corresponding to a three dimensional filter of the plurality of three dimensional filters; and perform a computer vision process based on the plurality of filter responses.
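The claim language translates roughly to the sketch below: convolve a bank of 3D (time x space) filters with a stack of single-photon frames and hand the responses to downstream vision processing. The binary frames and the two filters are illustrative stand-ins, not the filters described in the patent.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical stack of single-photon frames: each pixel is a binary photon
# detection, organized as (frames x height x width).
rng = np.random.default_rng(0)
frames = rng.binomial(1, 0.05, size=(32, 64, 64)).astype(float)

# A small bank of 3D (time x space) filters; a temporal-difference kernel and a
# spatio-temporal smoothing kernel, chosen purely for illustration.
filter_bank = [
    np.concatenate([np.ones((2, 3, 3)), -np.ones((2, 3, 3))]) / 18.0,  # temporal edge
    np.ones((4, 3, 3)) / 36.0,                                         # smoothing
]

# Convolve each 3D filter with the frame stack to produce filter-bank responses,
# which a downstream computer vision process would consume.
responses = [convolve(frames, f, mode="nearest") for f in filter_bank]
print([r.shape for r in responses])
```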
[6] Matthew Zurek is the winner of the INFORMS Applied Probability Society Best Student Paper Prize.
https://connect.informs.org/aps/apsawards/awards
The relevant paper is https://arxiv.org/abs/2403.11477
[7] “Non-interactive Blind Signatures: Post-quantum and Stronger Security”
with my students and external collaborator (Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav)
Asiacrypt 2024
https://eprint.iacr.org/2024/614
“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”
with my co-advised postdoc and external collaborators (Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree)
https://www.iacr.org/cryptodb/data/paper.php?pubkey=34710
“Multi-Authority Functional Encryption with Bounded Collusions from Standard Assumptions”
with my student (Rishab Goyal, Saikumar Yadugiri)”
TCC 2024
https://link.springer.com/chapter/10.1007/978-3-031-78020-2_1
“Incompressible Functional Encryption”
with my co-advised postdoc and external collaborators (Rishab Goyal; Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma)
ITCS 2025
https://eprint.iacr.org/2024/798
[8] “Simulating Machine Learning Models at Scale”
Authors: Vishnu Ramadas, Matthew D. Sinclair
Abstract: Traditionally, architectural simulators are used to perform early exploration. However, detailed simulation of modern systems can take extremely long times in existing tools. Furthermore, prototyping optimizations at scale can also be challenging, especially for newly proposed accelerators. In this talk we discuss our efforts to enhance gem5’s support to make simulating AI/ML workloads practical to run while retaining accuracy. This will enable users to use the same early exploration platform, gem5, for both homogeneous and heterogeneous systems. Specifically, we are extending gem5’s existing simulation infrastructure to create a scalable simulation framework that models diverse system components with varying levels of detail. We will also develop novel mechanisms that identify the most important parts of the applications to simulate at high fidelity while reducing simulation time and add support for multi-chiplet and multi-node (e.g., multi-GPU and multi-accelerator) simulations to drive a flexible, adaptable simulation infrastructure that enables rapid prototyping of different optimized AI and ML algorithms and architectures while ensuring that the changes are representative of real, modern hardware and software.
[9] “RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator”
Jie Zhang, Hongjing Huang, Xuzheng Xu, Xiang Li, Ming Liu, Zeke Wang
Abstract: The emerging microservice/serverless-based cloud programming paradigm and the rising networking speeds leave the RPC stack as the predominant data center tax. Domain-specific hardware acceleration holds the potential to disentangle the overhead and save host CPU cycles. However, state-of-the-art RPC accelerators integrate RPC logic into the CPU or use specialized low-latency interconnects, hardly adopted in commodity servers. To this end, we design and implement RPCAcc, a software-hardware co-designed RPC on-NIC accelerator that enables reconfigurable RPC kernel offloading. RPCAcc connects to the server through the most widely used PCIe interconnect. To grapple with the ramifications of PCIe-induced challenges, RPCAcc introduces three techniques: (a) a target-aware deserializer that effectively batches cross-PCIe writes on the accelerator’s on-chip memory using compacted hardware data structures; (b) a memory-affinity CPU-accelerator collaborative serializer, which trades additional host memory copies for slow cross-PCIe transfers; (c) an automatic field update technique that transparently codifies the schema based on dynamically reconfigured RPC kernels to minimize superfluous PCIe traversals. We prototype RPCAcc using the Xilinx U280 FPGA card. On HyperProtoBench, RPCAcc achieves 3.2× lower serialization time than a comparable RPC accelerator baseline and demonstrates up to 2.6× throughput improvement in the end-to-end cloud workload.
https://arxiv.org/abs/2411.07632
[10] “Optimizing Quantum Circuits, Fast and Slow”
Amanda Xu, Abtin Molavi, Swamit Tannu, Aws Albarghouthi
ASPLOS 2025
https://pages.cs.wisc.edu/~aws/papers/asplos25.pdf
Optimizing quantum circuits is critical: the number of quantum operations needs to be minimized for a successful evaluation of a circuit on a quantum processor. In this paper we unify two disparate ideas for optimizing quantum circuits, rewrite rules, which are fast standard optimizer passes, and unitary synthesis, which is slow, requiring a search through the space of circuits. We present a clean, unifying framework for thinking of rewriting and resynthesis as abstract circuit transformations. We then present a radically simple algorithm, GUOQ, for optimizing quantum circuits that exploits the synergies of rewriting and resynthesis. Our extensive evaluation demonstrates the ability of GUOQ to strongly outperform existing optimizers on a wide range of benchmarks.
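As a flavor of the fast “rewrite rules” half of this combination, here is a toy pass over a gate list that cancels adjacent self-inverse gates and fuses adjacent Z-rotations. The GUOQ algorithm itself and the resynthesis step are not reproduced here, and the gate representation is invented for illustration.

```python
# Gates are (name, qubit(s), angle) tuples; two local rewrite rules are applied:
# cancel adjacent self-inverse gates, and fuse adjacent Z-rotations on one qubit.
SELF_INVERSE = {"h", "x", "cx"}

def rewrite_pass(circuit):
    out = []
    for gate in circuit:
        if out and gate[0] in SELF_INVERSE and out[-1] == gate:
            out.pop()                      # rule 1: G G = I for self-inverse gates
        elif out and gate[0] == "rz" and out[-1][0] == "rz" and out[-1][1] == gate[1]:
            prev = out.pop()               # rule 2: rz(a) rz(b) = rz(a + b)
            merged_angle = prev[2] + gate[2]
            if abs(merged_angle) > 1e-12:
                out.append(("rz", gate[1], merged_angle))
        else:
            out.append(gate)
    return out

circuit = [("h", 0, None), ("h", 0, None), ("rz", 1, 0.3), ("rz", 1, 0.2), ("cx", (0, 1), None)]
print(rewrite_pass(circuit))  # -> [("rz", 1, 0.5), ("cx", (0, 1), None)]
```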
[11] “Making Serverless Pay-For-Use a Reality with Leopard”
Tingjia Cao, A. Arpaci-Dusseau, R. Arpaci-Dusseau, T. Caraza-Harter
NSDI ‘25
Abstract: Serverless computing has gained traction due to its event-driven architecture and “pay for use” (PFU) billing model. However, our analysis reveals that current billing practices do not align with the true resource consumption. This paper challenges the prevailing SLIM (static, linear, interactive-only model) assumptions that underpin existing billing models, demonstrating that current billing does not realize PFU for realistic workloads. We introduce the Nearly Pay-for-Use (NPFU) billing model, which accommodates varying CPU and memory demands, spot cores, and preemptible memory. We also introduce Leopard, an NPFU-based serverless platform that integrates billing awareness into several major subsystems: CPU scheduler, OOM killer, admission controller, and cluster scheduler. Experimental results indicate that Leopard benefits both providers and users, increasing throughput by more than 2× and enabling cost reductions.
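A toy arithmetic illustration of the gap the paper targets: static billing charges for the full allocation over the whole invocation, while a usage-based computation integrates measured CPU and memory over time. The rates, sampling scheme, and example invocation below are made up and are not Leopard’s billing logic.

```python
# Illustrative per-unit rates (invented, not real cloud pricing).
CPU_RATE = 0.000017    # $ per vCPU-second
MEM_RATE = 0.0000017   # $ per GB-second

def static_bill(alloc_vcpus, alloc_gb, duration_s):
    # "SLIM"-style billing: pay for the full allocation for the whole duration.
    return duration_s * (alloc_vcpus * CPU_RATE + alloc_gb * MEM_RATE)

def usage_bill(samples):
    # samples: list of (interval_s, vcpus_used, gb_used) measurements.
    return sum(dt * (cpu * CPU_RATE + mem * MEM_RATE) for dt, cpu, mem in samples)

# An invocation that allocates 2 vCPUs / 4 GB but is mostly idle waiting on I/O.
samples = [(0.5, 2.0, 4.0)] + [(0.5, 0.1, 1.0)] * 19
print(static_bill(2, 4, 10.0), usage_bill(samples))
```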
It begins: “If you spend enough time around computer science majors, one of the things you’re likely to hear is a long series of complaints about course enrollment. That’s been my experience living with two CS majors, and after two years of hearing about the shortcomings of the course enrollment process in the CS department, I decided to take a closer look at the problem myself.” Read the rest at the link above!
November 2024
[1] ACSAC 2024 Test of Time Award
Year: ACSAC 2009
Title: Semantically Rich Application-Centric Security in Android
Authors: Ongtang, Machigar; McLaughlin, Stephen; Enck, William; McDaniel, Patrick
“On the occasion of ACSAC’s 35th Anniversary, the Steering Committee instituted the inaugural Test of Time Awards. These awards provide an opportunity to honor papers that have been published at ACSAC that have had enduring significance and impact to the security community. The committee considered papers published more than 15 years ago, and discussed their impact on academia, industry, and government.”
From: https://www.acsac.org/2023/program/test_of_time/
[2] Today was the regional round of this year’s International Collegiate Programming Contest (ICPC). UW-Madison participated with 6 teams at a site on the Epic campus. Our teams placed 1st, 4th, 7th, 8th, 10th, and 20th among the 96 teams in our region. Our top team was the only one to solve 9 of the problems. See the scoreboard for more details.
This is the 6th year in a row that UW-Madison places first in the regional competition, and the 16th time since we started participating in 2001. Our top team advances to the North America Championship, where about 15 teams qualify for the world finals. Given our silver medal at the world finals last year, we’re going for gold this year! 😊
Congratulations to all our teams and thanks to the people at Epic for running a regional site on their campus!
The UW-Madison teams that participated in the regional competition:
– Gryffindor (1st place): Anton Cheng, Harry Huang, Tuan Tai Nguyen
– .sort()[-1] (4th place): Tan Bui, HuaiYuan Jing, Merlin Morton
– Mini Soda (7th place): Guanlin Chen, Xinchen Shen, Enze Zhang
– Apples are great (8th place): Brandon Han, Ishan Joshi, Xuan Nghia Nguyen
– JQKA (10th place): Anda He, Kalyan Sriram, Jierui Xu
– chr(sum(range(ord(min(str(not())))))) (20th place): Ansh Aggarwal, Anna Sun, Austin Tsai
[3] Four papers accepted at NeurIPS:
The AlCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators (Spotlight)
Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala
Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500×.
https://arxiv.org/abs/2407.11004
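The core idea translates to something like the sketch below: ask the model once for a labeling program rather than per-example labels, then run that program locally and reuse it. The prompt, the query_llm stand-in, and the canned keyword labeler are all hypothetical, not Alchemist’s actual outputs.

```python
PROMPT = """Write a Python function label(text) -> int that returns 1 if a
movie review is positive and 0 otherwise, using simple keyword heuristics."""

def query_llm(prompt):
    # Stand-in for a real API call; returns a canned labeling program here.
    return (
        "def label(text):\n"
        "    pos = sum(w in text.lower() for w in ('great', 'excellent', 'loved'))\n"
        "    neg = sum(w in text.lower() for w in ('terrible', 'boring', 'awful'))\n"
        "    return 1 if pos > neg else 0\n"
    )

namespace = {}
exec(query_llm(PROMPT), namespace)   # the program is stored locally and reusable for free
dataset = ["Loved it, excellent pacing.", "Terrible and boring."]
print([namespace["label"](x) for x in dataset])   # -> [1, 0]
```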
OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala
Abstract: Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have mismatching requirements, such as needing access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching—often by significant margins—in 17 out of 21 datasets.
https://arxiv.org/abs/2404.08461
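A bare-bones sketch of adjusting zero-shot predictions with optimal transport, in the spirit of the abstract: transport each example’s unit mass to classes so the column marginals match an estimated label distribution, with negative log-probabilities as costs. The Sinkhorn routine and numbers below are generic illustrations, not the paper’s code.

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, reg=0.05, iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations."""
    K = np.exp(-cost / reg)
    u = np.ones(len(row_marginal))
    for _ in range(iters):
        v = col_marginal / (K.T @ u)
        u = row_marginal / (K @ v)
    return u[:, None] * K * v[None, :]

# Zero-shot class probabilities for 5 examples over 3 classes (made-up numbers),
# plus an estimate of the true downstream label distribution.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.3, 0.3, 0.4]])
target_label_dist = np.array([0.2, 0.4, 0.4])

# Transport each example's unit mass so column sums match the target distribution.
plan = sinkhorn(-np.log(probs), np.full(5, 1 / 5), target_label_dist)
adjusted_preds = plan.argmax(axis=1)
print(adjusted_preds)
```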
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks
Tianyi Zhang, Linrong Cai, Jeffrey Li, Nicholas Roberts, Neel Guha, Frederic Sala
Abstract: Weak supervision (WS) is a popular approach for label-efficient learning, leveraging diverse sources of noisy but inexpensive weak labels to automatically annotate training data. Despite heavy usage, the value of WS is challenging to benchmark due to its complexity: the knobs involved include data sources, labeling functions (LFs), aggregation techniques, called label models (LMs), and end model pipelines. Existing evaluation suites tend to be limited, focusing on particular components or specialized use cases, or relying on simplistic benchmark datasets with poor LFs, producing insights that may not generalize to real-world settings. We address these by introducing a new benchmark, BoxWRENCH, designed to more accurately reflect real-world usage of WS. This benchmark features (1) higher class cardinality and imbalance, (2) substantial domain expertise requirements, and (3) linguistic variations found in parallel corpora. We improve upon existing benchmark LFs using a rigorous procedure aimed at mimicking real-world settings. In contrast to existing WS benchmarks, we show that in many practical settings supervised learning requires substantial amounts of labeled data to match WS performance.
https://openreview.net/forum?id=c7SApXZz4b#discussion
Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Harit Vishwakarma, Yi Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak
Abstract: Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual annotation. A prominent variant, threshold-based auto-labeling (TBAL), works by finding thresholds on a model’s confidence scores above which it can accurately automatically label unlabeled data. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, we show that such methods fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvement on coverage over the baselines while maintaining error level below 5% and using the same amount of labeled data.
https://arxiv.org/abs/2404.16188
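For context, threshold-based auto-labeling can be sketched as: pick the smallest confidence threshold whose covered validation points stay within an error budget, then auto-label unlabeled points above it. The confidence scores below are simulated; Colander’s contribution is learning a better confidence function to feed into this kind of procedure.

```python
import numpy as np

def find_threshold(confidences, correct, max_error=0.05):
    """Smallest confidence threshold whose covered validation points
    stay within the error budget (returns None if no threshold works)."""
    for t in np.unique(confidences):          # ascending, so first feasible = most coverage
        covered = confidences >= t
        if covered.sum() == 0:
            continue
        if 1.0 - correct[covered].mean() <= max_error:
            return t
    return None

# Simulated validation scores from some confidence function g(x); accuracy
# loosely tracks confidence in this toy setup.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = rng.uniform(0, 1, size=1000) < conf

t = find_threshold(conf, correct)
print(t)  # auto-label any unlabeled point whose confidence exceeds t
```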
[4] On Friday, Nov. 1, CS Assistant Professor Sharon Li gave a “Friday Lunch” talk with the UW-Madison Center for the Humanities, titled: “Steering Large Language Models by Human Preferences”.
These informal, dynamic (and popular!) Friday Lunch discussions are hosted over a catered lunch at Memorial Union to foster academic discussion and provide a space for interdisciplinary dialogue and community building across campus. Seats are limited – registration is required and available on a first-come basis.
Abstract: Large language models (LLMs) trained on massive datasets exhibit remarkable abilities, but at the same time, these models can inadvertently generate misinformation and harmful outputs. This concern underscores the urgent challenge of language model alignment: ensuring these models’ behaviors agree with human preferences and safety considerations. In recent years, a spectrum of alignment strategies have emerged, with prominent methods showcasing the effectiveness of reinforcement learning with human feedback (RLHF). RLHF has gained widespread adoption among state-of-the-art models, including OpenAI’s GPT-4, Anthropic’s Claude, Google’s Bard, and Meta’s Llama 2-Chat. A pivotal component within RLHF is proximal policy optimization (PPO), which employs an external reward model that mirrors human preferences for its optimization process. Despite the promise, RLHF suffers from unstable and resource-intensive training. Furthermore, the need to repeat PPO training when altering the reward model hinders rapid customization to evolving datasets and emerging needs. In this Friday Lunch talk, Assistant Professor Sharon Li will discuss alternative paradigms on how we can achieve alignment without RL training.
Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open worlds. She is the recipient of the AFOSR Young Investigator Program (YIP) award, the NSF CAREER award, the MIT Technology Review TR-35 Award, Forbes “30 Under 30” in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her works received a NeurIPS Outstanding Paper Award, and an ICLR Outstanding Paper Award Honorable Mention in 2022.
[5] “Three Chairs and a Research Booth: How One PhD Advisor Shaped the SC Committee”
by Melyssa Franklin
October, 2024
“An affable, earnest guy, Dr. Barton Miller, the Vilas Distinguished Achievement Professor in the Computer Sciences Department at the University of Wisconsin-Madison (UW), has made an oversized impact on the SC Conference. Since 2012, three of Miller’s graduate students have served as Conference Chair: Dr. Jeffrey Hollingsworth (SC12), Dr. Dorian Arnold (SC23), and now Dr. Phillip Roth (SC24). In addition, many of his advisees have served on the committee in other roles, and their PhD students are now coming along as SC volunteers as well.
Is there something in the water in Madison? What magic words of wisdom has Dr. Miller imparted to his students to spur them to participate in the SC committee and aspire to leadership roles?
Speaking to Dr. Miller, it’s clear his graduate students are some of his favorite people. They become good friends – almost family. And to him, SC is like a family reunion. ‘SC is the center of the HPC universe,’ he enthused. ‘Everything that’s happening in the field is there. It creates community.’”
Read the rest here: https://sc24.supercomputing.org/2024/10/three-chairs-and-a-research-booth-how-one-phd-advisor-shaped-the-sc-committee/
[6] “Apple Day” @ Wisc
November 13th, 2024
“I wanted to share that we are hosting an Apple Day on campus next week, Wednesday, November 13th. They will provide an information session and then invite students for a 1:1 if they email their resumes to the email provided on the flyer.
Please encourage all students to attend to learn more. Although a software development background is their focus for interns, there are other opportunities also.
This is the first engagement since we secured a partnership with their CoreOS team and have lots of opportunity to grow this collaboration. Thanks for sharing and let me know if you have any questions.” -Justin Hines
[7] ‘Trial by fire’: UW-Madison hackathon generates tech ideas in 24 hours
Becky Jacobs, Nov 12 2024
Cap Times
“After 24 hours of programming and coding — fueled by plenty of snacks and energy drinks — the organizers of an annual hackathon event at the University of Wisconsin-Madison announced which of the 55 projects would move on as finalists.”
Read the rest:
MadHacks:
https://www.madhacks.io/
[8] “Contracting with a Learning Agent”
Guru Guruganesh · Yoav Kolumbus · Jon Schneider · Inbal Talgam-Cohen · Emmanouil-Vasileios Vlatakis-Gkaragkounis · Joshua Wang · S. Weinberg
Abstract: Real-life contractual relations typically involve repeated interactions between the principal and agent, where, despite theoretical appeal, players rarely use complex dynamic strategies and instead manage uncertainty through learning algorithms. In this paper, we initiate the study of repeated contracts with learning agents, focusing on those achieving no-regret outcomes. For the canonical setting where the agent’s actions result in success or failure, we present a simple, optimal solution for the principal: Initially provide a linear contract, then switch to a zero-scalar contract. This shift causes the agent to “free-fall” through their action space, yielding non-zero rewards for the principal at zero cost. Interestingly, despite the apparent exploitation, there are instances where our dynamic contract can make both players better off compared to the best static contract. We then broaden the scope of our results to general linearly-scaled contracts, and, finally, to the best of our knowledge, we provide the first analysis of optimization against learning agents with uncertainty about the time horizon.
Paper: https://arxiv.org/abs/2401.16198
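The “free-fall” dynamic described in the abstract can be given a concrete flavor with a small simulation. The sketch below is only a toy, not the paper’s construction or analysis: the action costs, success probabilities, contract share, and horizon are all made-up numbers, the agent runs the Hedge (multiplicative-weights) no-regret algorithm with full-information feedback, and the principal offers a linear contract for the first half of the horizon and a zero contract afterward.

# Toy illustration only: not the contract scheme analyzed in the paper.
# All numbers below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
T = 2000                                   # number of rounds
reward = 1.0                               # principal's reward on success
costs = np.array([0.0, 0.1, 0.25, 0.4])    # agent's effort costs (made up)
succ = np.array([0.05, 0.4, 0.7, 0.9])     # success probability per action (made up)
alpha = 0.6                                # share paid to the agent under the linear contract
eta = np.sqrt(np.log(len(costs)) / T)      # Hedge learning rate

weights = np.ones(len(costs))
principal_total = 0.0
for t in range(T):
    share = alpha if t < T // 2 else 0.0   # switch to the zero contract halfway through
    probs = weights / weights.sum()
    a = rng.choice(len(costs), p=probs)    # agent samples an action from Hedge
    success = rng.random() < succ[a]
    # Full-information Hedge update on the agent's expected utilities under the current contract.
    agent_util = succ * share * reward - costs
    weights *= np.exp(eta * agent_util)
    principal_total += (1.0 - share) * reward * success

print(f"principal's average per-round utility: {principal_total / T:.3f}")

After the switch, the agent’s probability mass drifts toward its zero-cost action, but during that drift the principal still collects successes at zero payment, which is the intuition behind the free-fall phase.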
[9] PRIME grant
PI: Ming Liu
Abstract: EFabric provides on-demand supercomputing solutions to empower AI innovation and deployment. We offer agile, flexible, and low-cost access to customized supercomputing clusters for startups, research labs, grad students, and enterprises. EFabric is built atop emerging load-store interconnects, instead of traditional Ethernet/Infiniband. It organizes computation resources into different pools and independently scales them based on users’ needs. You can just focus on building AI applications, tuning AI models, and exploring AI deployments without worrying about the underlying computing infrastructure.
https://d2p.wisc.edu/funding/prime/
[10] “Adversarial Machine Learning: A 10-year Perspective”
Patrick McDaniel
Distinguished Lecture, UIUC
Abstract: Machine learning is revolutionizing technology and society. However, like any technology, it can be misused by adversaries, and our understanding of the potential threats remains limited. If machine learning is to be part of any critical decision-making process, we must be able to detect and mitigate threats at both the training and inference phases. Over the past decade, I have studied a myriad of threats to machine learning at all stages of its lifecycle and proposed a variety of methods that bring us closer to trustworthy machine learning systems. In this talk, I will discuss the arc of adversarial machine learning, how threat models have evolved, and how the community has gained new insights into the root causes of the vulnerabilities present in machine learning systems today.
Bio: Patrick McDaniel is the Tsun-Ming Shih Professor of Computer Sciences in the School of Computer, Data & Information Sciences at the University of Wisconsin-Madison. Professor McDaniel is a Fellow of IEEE, ACM and AAAS, a recipient of the SIGOPS Hall of Fame Award and SIGSAC Outstanding Innovation Award, and the director of the NSF Frontier Center for Trustworthy Machine Learning. He also served as the program manager and lead scientist for the Army Research Laboratory’s Cyber-Security Collaborative Research Alliance from 2013 to 2018. Patrick’s research focuses on a wide range of topics in computer and network security and technical public policy. Prior to joining Wisconsin in 2022, he was the William L. Weiss Professor of Information and Communications Technology and Director of the Institute for Networking and Security Research at Pennsylvania State University.
https://calendars.illinois.edu/detail/1570?eventId=33502561
[11] Reducing the GAP: Improving the Fidelity and Scalability of gem5’s GPU Models
Matt Sinclair
Invited Talk at CAMS ‘24
The breakdown in Moore’s Law and Dennard Scaling is leading to drastic changes in the makeup and constitution of computing systems. For example, a single chip integrates 10-100s of cores and has a heterogeneous mix of general-purpose compute engines and highly specialized accelerators. Traditionally, computer architects have relied on tools like architectural simulators to accurately perform early-stage prototyping and optimization of proposed designs. However, as systems become increasingly complex and heterogeneous, architectural tools are straining to keep up. In particular, publicly available architectural simulators are often not very representative of the industry parts they intend to represent. This leads to a mismatch in expectations: researchers may draw the wrong conclusions about the efficacy of proposed optimizations if the tool’s models do not provide high fidelity. Moreover, modeling and simulation tools are also struggling to keep pace with increasingly large, complex workloads from domains such as machine learning (ML).
In this talk, I will discuss our work on improving the open source, publicly available GPU models in the widely used gem5 simulator. gem5 can run entire systems, including CPUs, GPUs, and accelerators, as well as the operating system, runtime, network, and other related components. Thus, gem5 has the potential to allow users to study the behavior of entire heterogeneous systems. Unfortunately, some of gem5’s publicly available models do not always provide high accuracy relative to their “real” counterparts, especially for the memory subsystem. I will discuss my group’s efforts to overcome these challenges and improve the fidelity of gem5’s GPU models, as well as our ongoing efforts to scalably run modern ML and HPC workloads in frameworks such as PyTorch and TensorFlow in gem5. Collectively, this work significantly enhances the state of the art and enables more widespread adoption of gem5 as an accurate platform for heterogeneous architecture research.
[12] Two papers at Asiacrypt ‘24
https://asiacrypt.iacr.org/2024/
“Non-interactive Blind Signatures: Post-quantum and Stronger Security”
Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav
“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”
Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree
October 2024
[1] The PODS ‘24 program: https://2024.sigmod.org/program_pods.shtml
Award papers include:
- Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization (Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris)
- Tight Bounds of Circuits for Sum-Product Queries (Austen Z. Fan, Paraschos Koutris, Hangdong Zhao) – invited to JACM.
- Topology-aware Parallel Joins (Xiao Hu, Paraschos Koutris)
[2] The Belonging & Inclusion Best Paper Award at UIST 2024
https://uist.acm.org/2024/
Title: CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision
Authors: Jaewook Lee, Andrew D Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, and Yuhang Zhao
Abstract: Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants for suitable augmentation design. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations.
[3] September was a busy month of talks/presentations in India and the Netherlands for Prof Miron Livny, including:
- “OSG and Future of Distributed Computing in the Age of Cloud” at the 8th Asia Tier Centre Forum (ACTF8) in Mumbai, India
- “Job Lists in HTCondor {Y[i]=F(X[i]),i=1,n}” at the 2nd Asia HTCondor Workshop in Mumbai, India
- “CHTC – Enabling Throughput Computing at All Scales” IIT Bombay, India
- “CHTC – Advancing Throughput Computing for the Physical Sciences” Tata Institute of Fundamental Research (TIFR) Mumbai, India
- “Introduction to HTCondor and OSG” tutorial on IGWN-DHTC-OSG at the Inter University Centre for Astronomy and Astrophysics (IUCAA) in Pune, India
- “Placing a List of Jobs (Work in Progress)” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
- “Philosophy and Architecture” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
- “Placing a List of Jobs (Work in Progress)” Nikhef HTCondor Tutorial, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
- “Philosophy and Architecture” Nikhef HTCondor Tutorial, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
- “What the manual does not tell you – Philosophy and Architecture” 10th European HTCondor workshop, Amsterdam, the Netherlands
- “What the manual does not tell you – History” 10th European HTCondor workshop, Amsterdam, the Netherlands
- “CHTC Vision: Compute and Data Together” 10th European HTCondor workshop, Amsterdam, the Netherlands
CHTC did not organize these events; all were organized by the locals with CHTC support. In Europe there were more than 60 attendees.
[4] Prof Cai’s papers include
- “Dichotomy for Holant* Problems with One Ternary Function on Domain Size 3,” Information & Computation (with Pinyan Lu and Mingji Xia)
- “Shor’s Algorithm Does Not Factor Large Integers in the Presence of Noise,” SCIENCE CHINA
- a paper in ACM Transactions on Computation Theory (with Ben Young)
[5] https://www.gem5.org/events/isca-2024
Prof Matt Sinclair (and a student) hosted a tutorial on using gem5 at ISCA 2024 this past week.
Notably, this tutorial (and two talks given at a concurrent workshop) marked the first time his group publicly discussed how they have gotten PyTorch and TensorFlow to run at scale in gem5.
[6] Prof Fredrikson (Prof Jha’s former student) receives tenure at CMU
Info: https://x.com/jhasomesh/status/1838708177476493639
Prof Fredrikson’s home page: https://mattfredrikson.com/
His research aims to enable systems that make secure, fair, and reliable use of machine learning. His group focuses on finding ways to understand the unique risks and vulnerabilities that arise from learned components, and on developing methods to mitigate them, often with provable guarantees.
[7] We are excited to announce this year’s Wisconsin Database Affiliates Workshop! For detailed information and the official schedule, please visit the workshop’s website. You may also RSVP here.
The workshop takes place on October 10 (Thursday), 8:30am–4:00pm, in Union South, Room Northwoods (3rd floor). This year’s event includes:
– Research talks from faculty and graduate students
– Industry talks from affiliate companies
– A poster and demo session
– Free breakfast and lunch
[8] Jin-Yi Cai chaired the 2024 FOCS Test of Time Awards committee. See https://focs.computer.org/2024/awards/ for details.
The members of the committee are
Julia Chuzhoy (Toyota Technological Institute at Chicago),
Costis Daskalakis (MIT),
Faith Ellen (University of Toronto),
Shafi Goldwasser (UC Berkeley), and
Leonard Schulman (Caltech).
[9] As organized and related to us by Justin Hines: On Oct 1st, the night before the career fair, we hosted our inaugural Student Org Reception to more casually introduce our student leaders to Employers. We had six employers and over 50 students attend. We invited every active student org across CDIS. We will continue to innovate for the students as they continue to participate in our offerings.
[10] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419
“CloudLab Phase-IV: Strengthening a Core Resource for Research and Education”
Prof Ming Liu and Prof Shivaram Venkataraman
$12m
Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.
CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.
CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.
CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
[11] “Learning to Price Homogeneous Data”
Keran Chen, Joon Suk Huh, Kirthevasan Kandasamy
Accepted at NeurIPS
Abstract: We study a data pricing problem, where a seller has access to N homogeneous data points (e.g. drawn i.i.d. from some distribution). There are m types of buyers in the market, where buyers of the same type i have the same valuation curve v_i : [N] → [0, 1], where v_i(n) is the value for having n data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for T rounds so as to learn the revenue-optimal pricing curve p : [N] → [0, 1]. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve p_t. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve Õ(m√T) regret in the stochastic setting and Õ(m^{3/2}√T) regret in the adversarial setting.
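As a rough illustration of the online-pricing setup in the abstract, the sketch below treats each candidate pricing curve in a small hand-made discretization as an arm of a UCB bandit. The valuation curves, the candidate curves, and the buyer distribution are all hypothetical; this is not the discretization scheme or algorithm from the paper.

# Toy illustration of UCB over a small set of candidate pricing curves.
# Valuations, curves, and buyer distribution are all made up.
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 5000
ns = np.arange(1, N + 1)
# Hypothetical buyer types: concave valuation curves v_i(n), each in [0, 1].
valuations = [0.9 * np.sqrt(ns / N), 0.6 * (ns / N) ** 0.3, 0.4 * np.log1p(ns) / np.log1p(N)]
type_probs = [0.5, 0.3, 0.2]
# Candidate pricing curves p(n) = min(c * n / N, 1) for a few unit prices c.
curves = [np.minimum(c * ns / N, 1.0) for c in (0.2, 0.4, 0.6, 0.8)]

def revenue(curve, v):
    surplus = v - curve                  # buyer's surplus for each quantity n
    best = surplus.argmax()
    return curve[best] if surplus[best] > 0 else 0.0   # buyer may buy nothing

counts = np.zeros(len(curves))
sums = np.zeros(len(curves))
for t in range(1, T + 1):
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)   # unplayed arms first
    arm = int(ucb.argmax())
    buyer = rng.choice(len(valuations), p=type_probs)
    counts[arm] += 1
    sums[arm] += revenue(curves[arm], valuations[buyer])

best_arm = int((sums / np.maximum(counts, 1)).argmax())
print("empirically best candidate curve:", best_arm, "avg revenue:", sums[best_arm] / counts[best_arm])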
[12] “Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs”
Matthew Zurek, Yudong Chen
NeurIPS ‘24.
We study the sample complexity of learning an ε-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound Õ(SAH/ε²), where H is the span of the bias function of the optimal policy and SA is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters S, A, H, and ε, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. We also initiate the study of sample complexity in general (multichain) average-reward MDPs. We argue that a new transient time parameter B is necessary, establish an Õ(SA(B+H)/ε²) complexity bound, and prove a matching (up to log factors) minimax lower bound. Both results are based on reducing the average-reward MDP to a discounted MDP, which requires new ideas in the general setting. To optimally analyze this reduction, we develop improved bounds for γ-discounted MDPs, showing that Õ(SAH/((1−γ)²ε²)) and Õ(SA(B+H)/((1−γ)²ε²)) samples suffice to learn ε-optimal policies in weakly communicating and in general MDPs, respectively. Both these results circumvent the well-known minimax lower bound of Ω̃(SA/((1−γ)³ε²)) for γ-discounted MDPs, and establish a quadratic rather than cubic horizon dependence for a fixed MDP instance.
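For readers unfamiliar with the span parameter in the abstract above, it is the standard span seminorm of the optimal bias (relative value) function h*; a common way to write it (stated here for reference, not quoted from the paper) is

H \;=\; \mathrm{sp}(h^{*}) \;=\; \max_{s \in \mathcal{S}} h^{*}(s) \;-\; \min_{s \in \mathcal{S}} h^{*}(s).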
[13] Received an NIH (Smart and Connected Health R01) award to fund our research on designing and developing AI-powered AR systems to support low vision people in complex activities of daily living.
Title: SCH: AI-Assisted Vision: Scene-Aware Augmented Reality Systems to Support Low Vision People in Activities of Daily Living
PIs: Yuhang Zhao (PI), Yapeng Tian (co-PI) from UT Dallas, Jon Froehlich (co-PI) from the University of Washington, Sanbrita Mondal (co-I) from UW-Madison
Abstract: Uncorrectable vision loss, a.k.a. low vision, can significantly impact activities of daily living (ADLs), affecting independence, health, and social well-being. Magnifiers are the most commonly used low vision aids; however, they greatly reduce the visual field. Augmented reality (AR) glasses present a unique opportunity for low vision technology by intelligently recognizing the environment and enhancing users’ vision based on their contexts. However, existing AR solutions mainly rely on image processing techniques, generating pure pixel-level augmentations to the whole scene, such as pixel remapping for visual field loss, and recoloring based on distance. These solutions arbitrarily alter the user’s full visual field with no semantic understanding of the scene, distorting users’ natural vision and diminishing important visual details. As a result, they cannot effectively support complex ADLs that involve constant motion and object interactions, such as cooking in the kitchen, or safely navigating a crowded street. The overall objective of this proposal is to fundamentally advance low vision technology by leveraging state-of-the-art AI techniques and presenting unobtrusive scene-aware AR augmentations that are tailored to users’ environments and visual abilities. Contextualized in two key ADLs (meal preparation, outdoor mobility), the research will investigate how to recognize and augment visual elements in AR from three fundamental dimensions via three specific aims: (A1) Enhancing perception of visual salience, (A2) Enhancing object affordance for safe and efficient interactions, (A3) Enhancing awareness of environmental dynamics. In each aim, the research will contribute to both HCI and AI by designing scene-aware AR augmentations with low vision users and rehabilitation professionals, and developing egocentric video datasets and AI models to enable fast and robust scene recognition in AR. In the final year, a holistic AR system will be developed and evaluated for real-world impact and limitations via field studies (A4). With the AI-powered AR systems for low vision, the project will contribute to the mission of NEI by fundamentally transforming conventional low vision aids and improving independence and quality of life for low vision people in various challenging ADLs. The involvement of low vision participants and rehabilitation professionals throughout the research will also increase their exposure to state-of-the-art technology, potentially increasing technology acceptance and adoption in low vision rehabilitation.
[14] Presented at ECCV ‘24
“Photon Inhibition for Energy-Efficient Single-Photon Imaging” (link to details)
Lucas J. Koerner, Shantanu Gupta, Atul Ingle, Mohit Gupta
Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost — each photon detection consumes more energy than that of a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called photon inhibition to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real-time, to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection, and demonstrate, both via simulations and real SPC captured data, considerable reduction in photon detections (over 90% of photons) while maintaining task performance metrics. Our work raises the question of “which photons should be detected?”, and paves the way for future energy-efficient single-photon imaging.
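To give a concrete flavor of what an on-sensor inhibition policy can look like, here is a toy sketch (emphatically not one of the paper’s policies; the flux levels, frame length, and per-pixel photon budget are invented): each pixel is disabled for the remainder of a frame once it has already recorded a fixed number of photons, which is enough for a crude brightness estimate, and we measure how many avalanche detections that avoids.

# Toy inhibition policy: stop detecting on a pixel after a fixed photon budget.
# All parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
h, w, n_slots = 64, 64, 1000                      # pixel grid and time slots per frame
flux = rng.uniform(0.001, 0.05, size=(h, w))      # per-slot detection probability
budget = 20                                       # photons considered "enough" per pixel

counts = np.zeros((h, w))
active_slots = np.zeros((h, w))
avoided = 0
for _ in range(n_slots):
    arrivals = rng.random((h, w)) < flux          # photons arriving this slot
    active = counts < budget                      # inhibited pixels ignore arrivals
    active_slots += active
    counts += arrivals & active
    avoided += int((arrivals & ~active).sum())

flux_hat = counts / np.maximum(active_slots, 1)   # brightness estimate from detected photons
frac_avoided = avoided / (avoided + counts.sum())
print(f"detections avoided: {frac_avoided:.1%}, mean abs flux error: {np.abs(flux_hat - flux).mean():.4f}")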
“Light-in-Flight for a World-in-Motion” (link to details)
Jongho Lee, Ryan J. Suess, Mohit Gupta
Although time-of-flight (ToF) cameras are becoming the sensor-of-choice for numerous 3D imaging applications in robotics, augmented reality (AR) and human-computer interfaces (HCI), they do not explicitly consider scene or camera motion. Consequently, current ToF cameras do not provide 3D motion information, and the estimated depth and intensity often suffer from significant motion artifacts in dynamic scenes. In this paper, we propose a novel ToF imaging method for dynamic scenes, with the goal of simultaneously estimating 3D geometry, intensity, and 3D motion using a single indirect ToF (I-ToF) camera. Our key observation is that we can estimate 3D motion, as well as motion artifact-free depth and intensity by designing optical-flow-like algorithms that operate on coded correlation images captured by an I-ToF camera. Through the integration of a multi-frequency I-ToF approach with burst imaging, we demonstrate high-quality all-in-one (3D geometry, intensity, 3D motion) imaging even in challenging low signal-to-noise ratio scenarios. We show the effectiveness of our approach through thorough simulations and real experiments conducted across a wide range of motion and imaging scenarios, including indoor and outdoor dynamic scenes.
[15] Prof Matt Sinclair gave an invited talk at the DOE Energy-Efficient Computing for Science Workshop (https://web.cvent.com/event/a3dd901a-699e-408c-8a84-81445e6ea64f/summary).
Title: Creating Flexible, High Fidelity Energy Modeling for Future HPC Systems
Authors: Matthew D. Sinclair, Bobby Bruce, William Godoy, Oscar Hernandez, Jason Lowe-Power, and Shivaram Venkataraman
Abstract: Energy efficient, large scale, co-designed systems will be necessary in the future. As we increasingly adopt AI, workloads are increasingly bounded by new scaling laws, where the loss scales as a power-law with model size, dataset size, and the amount of compute used for training. Thus, to improve accuracy, scaling requires exponentially increasing resources. This shift means that AI growth requires more energy at a faster pace in a manner distinct from prior patterns, making energy efficiency even more critical. Accordingly, future HPC system designs need to optimize increasingly heterogeneous hardware and applications for energy consumption, without significantly compromising performance. To drive these efforts, there is a need for credible, open-source infrastructure that can study the energy effects of new systems early in the design process, before silicon is produced. Such infrastructure will help study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on simulation and modeling techniques to model and prototype energy consumption. Unfortunately, the tools needed to co-design systems for performance and energy are lacking for both existing and future systems. In this talk, we discuss our vision to design new energy models which can improve accuracy for post-tape out hardware, first principle methods for estimating energy use in future systems by leveraging open-source hardware, and finally integration of energy models into tools such as simulators and development environments.
Link: https://drive.google.com/file/d/1aVnUe9hOqNw8DmWRP9aCbf6P9CTb_9B4/view
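The scaling-law claim in the abstract (loss falling as a power law in model and data size) is often written in a parametric form like the one below; this is a commonly used form from the neural scaling-law literature, included here for orientation rather than taken from the talk:

\mathcal{L}(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is the number of model parameters, D is the number of training tokens, E is an irreducible loss term, and A, B, α, β are fitted constants. Each further fixed reduction in the reducible loss therefore requires a multiplicative increase in N and D, and hence in compute and energy.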
[16] CDIS transition announced: https://cdis.wisc.edu/erickson-steps-down-arpaci-dusseau-to-lead-school-of-computer-data-information-sciences/
Tom Erickson, founding director of the School of Computer, Data & Information Sciences (CDIS) and executive associate dean for strategy and innovation at the University of Wisconsin–Madison’s College of Letters & Science, has announced he will step down from his leadership roles effective January 2025. Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, will assume leadership of CDIS.
September 2024
[1] On behalf of the USENIX Security Test of Time Award Committee, I’m delighted to inform you that your paper, “Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing,” originally presented at USENIX Security ’14, has won the 2024 USENIX Security Test of Time Award. The award recognizes outstanding work in computing systems security and privacy research that has had a lasting impact on the community. Congratulations!
https://www.usenix.org/conferences/test-of-time-awards
[2] “Uncertainty and my scientific journey”
Aws Albarghouthi
Programming Languages Mentoring Workshop @ PLDI
https://www.youtube.com/watch?v=kYZiPZ3EwRc
Abstract: I will walk you through my academic journey from graduate school to a tenured faculty position. Reflecting upon the past 15 years, I will highlight the recurring theme of uncertainty, and how it arose in unexpected ways along the way. I will talk about how uncertainty can be a significant source of mental stress and how I’ve tried to understand it and navigate it in different ways.
[3] “UW–Madison Awarded National Center of Academic Excellence in Cyber Research (CAE-R) Designation”
The University of Wisconsin–Madison has been designated as a National Center of Academic Excellence in Cyber Research (CAE-R) by the National Security Agency (NSA). This prestigious designation underscores UW–Madison’s commitment to advancing cybersecurity education and research and recognizes its position as a leader in the field.
“This designation recognizes that our faculty, staff, and students are on the forefront of Cyber Research, and it is amongst the most important research that we do,” says Eric Wilcots, Dean of UW–Madison’s College of Letters & Science. “As a Center of Academic Excellence, we will have the opportunity to address hard questions and deepen our impact.”
[4] The CS department plans to make the following awards in the inaugural year:
- Rakesh Agrawal — research leadership and excellence – https://en.wikipedia.org/wiki/Rakesh_Agrawal_(computer_scientist)
- Eric Harslem — tech leadership + philanthropy in computing (Austin area) – https://uteach.utexas.edu/profiles/eric-harslem
- Brian Pinkerton — entrepreneurship – https://www.cs.wisc.edu/staff/pinkerton-brian/
- Brent Seales — applied research with community impact – https://www.engr.uky.edu/directory/seales-brent
[5] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2404180
Prof Yong Jae Lee
RI: Small: Understanding and Advancing the Generalization Capabilities of Fake Image Detectors
Abstract: This project develops an integrated research, education, and outreach program to provide a foundation for understanding and advancing the generalization capabilities of fake (AI-generated) image detectors. With the rise and maturity of artificial intelligence (AI) generative models, fake images are proliferating in the digital world at an unprecedented rate. Fake images can cause harm in a variety of ways by deceiving, manipulating, and misinforming individuals and society at large. For example, they can be used for blackmail, disinformation, or financial fraud. To make matters worse, there is no longer a single source of fake images. Synthesized images could take the form of realistic human faces generated using one type of fake image generator, or they could take the form of complex scenes generated using another type of fake image generator. One can be almost certain that more methods for generating fake images will be developed in the future. To combat the potential harm caused by fake images, this project aims to advance technology in image forensics and build a deep understanding of how AI-generated images differ from real images. In addition to scientific impact, this project performs complementary educational and outreach activities that engage students in research and STEM.
This research provides a foundation for gaining a deeper understanding of, and for advancing fundamental research in, detecting AI-generated images. In particular, it will focus on the generalization properties of fake image detectors. Specifically, it will investigate two main thrusts: (Thrust I) understanding what makes fake AI-generated images fake, including the role of generative models vs. data, a novel benchmark and toolbox for universal fake image detection, and understanding the features that a fake detector focuses on. In Thrust II, the project will advance universal fake AI-generated image detection, including a novel frequency masking strategy, a few-shot adaptation approach for learning with only a few training examples, and extensions to video to account for spatio-temporal aspects of realism. Both thrusts will be studied in the context of creating generalizable (universal) fake image detection algorithms. The investigators’ wealth of experience in generative models and in understanding model robustness, as well as their initial work in this space, makes them well-positioned to formulate and solve the relevant challenges and lays the groundwork for the project.
[6] Invited Talk at ModSim (Workshop on Modeling & Simulation of Systems and Applications)
Title: Designing Better Tools for Power- and Sustainability-Aware Co-Design
Prof Matt Sinclair
Abstract: With the waning of Moore’s Law and the end of Dennard’s Scaling, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly turning towards heterogeneity to continue scaling performance and energy efficiency. Thus, modeling increasingly large, complex, diverse accelerators is critical to continued computer architecture and systems innovation. The rise in specialized, compute-heavy workloads like machine learning has given enhanced prominence to hardware accelerators. Thus, the next decades of hardware and system software research must balance efficiency through specialization and flexibility through generalization. To drive those innovations, the community requires credible, open-source infrastructure to study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on architecture simulators to model and prototype these systems. Unfortunately, modern computer architecture simulation and modeling techniques for power and carbon are either too inaccurate, too inflexible (e.g., difficult to extend), or too high level to enable co-design for these increasingly complex, increasingly heterogeneous systems. For sustainable computing to thrive, we believe that simulation and modeling tools can — and must — play a major role, and I describe our efforts in this direction.
Invited tutorial on gem5 at ORNL
Title: Learning How to Use gem5 to Study Large-Scale Workloads
Prof Matt Sinclair
Abstract: The community-developed gem5 infrastructure is a cycle-level computer architecture simulator. It models a wide and diverse set of components, including processors of various ISAs, caches, main memory, and networks, with a high level of microarchitectural fidelity. During DOE’s Exascale FastForward2 program, AMD released support for modeling GPUs executing HPC applications built for the GCN3 ISA. This work laid the foundation for development and prototyping of full-stack GPU optimizations for existing exascale-class HPC systems such as ORNL’s Frontier supercomputer. This tutorial workshop will introduce the gem5 simulator, how to use it, and how to use a variety of its important features for prototyping full-stack optimizations for large-scale workloads, including the latest AMD MI200 and MI300 ISAs.
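For readers new to gem5, a typical entry point (as covered in introductory tutorials like the one above) is a small Python configuration script built on gem5’s standard library. The sketch below follows the documented “hello world” pattern; the module paths and the resource name come from the public gem5 documentation and may differ across gem5 versions, and the script is run with a gem5 binary such as build/X86/gem5.opt rather than a normal Python interpreter.

# Minimal gem5 standard-library configuration sketch (syscall-emulation mode).
# Run with a gem5 binary, e.g.: build/X86/gem5.opt this_script.py
# Module paths and the resource name follow the public gem5 docs and may vary by version.
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.isas import ISA
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator

board = SimpleBoard(
    clk_freq="3GHz",
    processor=SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.X86, num_cores=1),
    memory=SingleChannelDDR3_1600(size="32MB"),
    cache_hierarchy=NoCache(),
)
# Download (or reuse a cached copy of) a prebuilt "hello world" binary from gem5-resources.
board.set_se_binary_workload(obtain_resource("x86-hello64-static"))
Simulator(board=board).run()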
[7] Prof Bilge Mutlu is part of a new NSF Engineering Research Center around advancing the state of the art in “dexterous robotics,” led by Northwestern University. The center will receive approximately $52M in funding over 10 years, and UW’s work will focus on programming interfaces for dexterous robots.
News article: https://www.cs.wisc.edu/2024/08/21/human-augmentation-via-dexterity-erc-receives-nsf-funding
[9] https://news.wisc.edu/gilman-walker-and-arpaci-dusseau-to-assume-new-leadership-roles/
From the article: …Isbell has appointed Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, to a new role as special advisor to the provost for computing. “This role will advance our institutional planning to ensure that we remain on the forefront in the rapidly advancing research and education areas associated with computing writ large, including artificial intelligence, data and information sciences in the coming decades,” says Isbell.
[10] New faculty in the CS department this fall:
- Ali Abedi https://people.eecs.berkeley.edu/~abedi/
- Sandeep Silwal https://sandeepsilwal.com/
- Manolis Vlatakis https://www.cs.columbia.edu/~emvlatakis/
- Tengyang Xie https://tengyangxie.github.io/
August 2024
[1] https://www.siam.org/publications/siam-news/articles/2024-august-prize-spotlight/#Wright
Dr. Stephen Wright, University of Wisconsin – Madison, is the 2024 recipient of the George B. Dantzig Prize. He received the prize for his fundamental contributions to nonlinear optimization and algorithms for control, compressed sensing, machine learning, and data science. He pioneered infeasible interior point methods which culminated in his 1997 SIAM monograph on the subject. Moreover, Dr. Wright contributed highly cited, outstanding, and very influential work in a broad range of fields in mathematical optimization, including algorithms for control, nonsmooth optimization with applications to compressed sensing, machine learning, and data science. His comprehensive contributions range from theory, algorithm design and analysis, to applications and the development of high-impact software.
Read more at the URL above.
[2] “Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture”,
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore
https://ieeexplore.ieee.org/document/1207019
Original Publication: ISCA 2003.
Award Details: The ACM SIGARCH/IEEE-CS TCCA Influential ISCA Paper Award is presented annually at the International Symposium on Computer Architecture. It honors the paper from the ISCA Proceedings published 20 years prior that has had the greatest impact on the field, whether through research, development, products, or ideas. The award includes a $1,000 honorarium for the authors and a certificate.
https://www.sigarch.org/benefit/awards/acm-sigarchieee-cs-tcca-influential-isca-paper-award/
Abstract: We describe the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.
[3] “Active Adaptation for Decentralized Foundation Models”
Prof Fred Sala
DARPA Young Faculty Award
Abstract: Foundation models have revolutionized the way that machine learning and data science tasks are performed. While exciting, these models require large amounts of data, both for creating them and for specializing them to satisfy user needs. This makes these models unsuitable for data-constrained scenarios. Model training and adaptation also impose heavy computational burdens, blocking the use of foundation models in challenging decentralized environments. This award studies how to create new forms of foundation models that overcome these obstacles. It aims to build these models with less data by focusing on data quality rather than quantity, to train them more efficiently, and to enable their use to power decentralized agents that face communication and other constraints.
More about the DARPA Young Faculty Award here:
https://www.darpa.mil/work-with-us/for-universities/young-faculty-award
The objective of the DARPA Young Faculty Award (YFA) program is to identify and engage researchers in junior faculty positions at U.S. academic and non-profit research institutions and expose them to Department of Defense (DoD) needs and DARPA’s program development process.
[4] This week NSF awarded supplements to two awards – “Pelican: Advancing the Open Science Data Federation Platform” (Bockelman PI and Livny co-PI) and “Partnership to Advance Throughput Computing (PATh)” (Livny PI, Bockelman and Tannenbaum co-PIs) – for a total of ~$1.2M. One of these supplements involves Tony Gitter and is in the context of the National AI Research Resource (NAIRR).
[5] “Anvil: Verifying Liveness of Cluster Management Controllers”
Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, and Zicheng Ma, University of Illinois Urbana-Champaign; Tej Chajed, University of Wisconsin-Madison; Jon Howell, Andrea Lattuada, and Oded Padon, VMware Research; Lalith Suresh, Feldera; Adriana Szekeres, VMware Research; Tianyin Xu, University of Illinois Urbana-Champaign
https://www.usenix.org/conference/osdi24/presentation/sun-xudong
Awarded Best Paper!
Abstract: Modern clouds depend crucially on an extensible ecosystem of thousands of controllers, each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously reconciles the current state of the system to a desired state according to a declarative description. However, controllers have bugs that make them never achieve the desired state, due to concurrency, asynchrony, and failures; there are cases where after an inopportune failure, a controller can make no further progress. Formal verification is promising for avoiding bugs in distributed systems, but most work so far focused on safety, whereas reconciliation is fundamentally not a safety property.
This paper develops the first tool to apply formal verification to the problem of controller correctness, with a general specification we call eventually stable reconciliation, written as a concise temporal logic liveness property. We present Anvil, a framework for developing controller implementations in Rust and verifying that the controllers correctly implement eventually stable reconciliation. We use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.
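To give a feel for the kind of liveness property involved, eventually stable reconciliation can be sketched in temporal logic roughly as follows (this is a paraphrase of the idea, not the paper’s exact formalization): for every desired-state description d, if the desired state eventually stabilizes at d, then the system state eventually matches d and keeps matching it.

\forall d.\;\; \Diamond\,\Box\,(\mathit{desire} = d) \;\Longrightarrow\; \Diamond\,\Box\,\mathit{match}(\mathit{state},\, d)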
[6] https://caecommunity.org/about-us/what-cae-cybersecurity
PhD students are invited to this event:
[7] This past week some UW-Madison students and Prof Hanna participated in the international RoboCup competition in Eindhoven, the Netherlands, as part of a new joint team between UW-Madison and UT Austin. They participated in the standard platform league, in which all teams use the same robot platform to emphasize AI research. As a new team, they chose to compete in the “Challenge Shield” (lower) division of the competition — and they won!
The UW-Madison students who traveled were PhD students Adam Labiosa and Will Cong and junior student Chen Li. Several other UW-Madison undergraduates and another PhD student contributed beforehand, in addition to some graduate students from UT Austin.
The final match is here with the UW team in red and white: https://www.youtube.com/watch?v=JDZ6wihyELQ#t=2h14m14s
(a convincing 7-0 victory)
The most exciting game is linked here (robots wore UT colors for this one):
https://www.youtube.com/live/rUyYoW4jcfc#t=6h13m10s
(a come-from-behind victory in the 2nd half, with a little bit of luck involved)
This is actually the second time Prof Hanna has participated in RoboCup with Wisconsin students. Last year, the team finished third in the same division, with the aim of largely basing its decision-making on reinforcement learning. UT’s team was interested in adopting the same approach, which led to the joint effort this year.
[8] Title: CAREER: Rethinking Rack-Scale System Design over Memory Fabrics
Prof Ming Liu
Abstract: Memory fabrics, an emerging and powerful cluster interconnect technology, are revolutionizing the construction of the modern data center. Although promising, this infrastructure shift fundamentally changes decades-long system design assumptions and challenges conventional wisdom on how to build efficient rack-scale systems. This project aims to develop a new computing paradigm that views the memory fabric as a first-class citizen on which to instantiate, orchestrate, and reclaim computations. The project’s novelty is a memory fabric-aware intermediate system stack that allows applications to effectively harness the capabilities of memory fabrics while hiding their limitations. The project’s broader significance and importance lie in laying out the system foundation for building next-generation sustainable and cost-efficient computing infrastructures for enterprise on-premise and cloud-scale data center systems using memory fabrics. This project introduces the technology to a broader audience via the LegoCluster program and BigComputer workshop education activities.
This project rethinks the rack-scale system design from memory, computing, and communication perspectives. The first thrust classifies remote memory nodes into four categories and develops a benchmarking framework to characterize their performance. It builds an active remote memory system that transparently and dynamically moves a memory object to a suitable memory node at runtime based on the object’s access profile and the underlying node’s capabilities. The second thrust applies idempotent tasks as the fault-tolerant computation abstraction and develops a language system to generate them. It designs hardware cooperative engines and an execution framework on which to schedule idempotent tasks. The third thrust introduces new communication primitives, translates them into the memory fabric’s commands, and builds a transport layer for end-to-end traffic control. Together, these efforts produce a new open-source software system over memory fabrics encompassing APIs, libraries, programming systems, and software runtimes.
Link: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339755&HistoricalAwards=false
[9] https://chtc.cs.wisc.edu/htc-24-event.html
CHTC and the OSG Consortium hosted its second annual Throughput Computing Week in Madison, Wisconsin joined in person and remotely by 388 participants. This year’s themes included dedicated sessions on campus cyberinfrastructure, talks on AI and machine learning enabled by high throughput computing, and tutorials and presentations on the new Pelican Platform project.
[10]
Navigating the labyrinthine corridors of academic publishing, authors often encounter three formidable guardians, each with their unique, sometimes perplexing, approach to gatekeeping. These are the Confused Reviewer, the Rational Reviewer, and the Lazy Reviewer. Their presence not only adds a layer of complexity to the publication process but also infuses it with a certain tragicomic flair.
Read more at the link above.
[11] https://awards.advising.wisc.edu/all-scholarships/hilldale-undergraduatefaculty-research-fellowship/
The Hilldale Undergraduate/Faculty Research Fellowship provides research training and support to undergraduates. Students have the opportunity to undertake their own research project in collaboration with UW–Madison faculty or research/instructional academic staff. Please Note: Graduate students are ineligible to serve as the project advisor. Approximately 97 – 100 Hilldale awards are available each year.
From CS:
- Professor Remzi Arpaci-Dusseau (students Yuanzhuo Yang, Hongtao Zhang)
- Professor Jelena Diakonikolas (student Yi Wei)
- Professor Shivaram Venkataraman (student Mengze Tang)
[12] “Simulating and Modeling Hardware for Machine Learning Workloads at Scale”
Prof Matt Sinclair
SRC AIHW PI meeting
Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (i.e., Tensor Cores), remains an important research challenge. In particular, an energy model that is flexible and can be extended both to programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our ongoing efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.
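One standard way to “find representative program slices,” as mentioned in the abstract above, is SimPoint-style interval sampling: split execution into intervals, summarize each with a basic-block vector, cluster the vectors, and simulate only one representative interval per cluster. The sketch below illustrates that general recipe on randomly generated stand-in data; it is not necessarily the method the group uses.

# SimPoint-style selection of representative simulation intervals (illustrative only).
# The basic-block vectors here are random stand-ins for profiled workload data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_intervals, n_bbs, k = 500, 64, 8            # intervals, basic blocks, clusters
bbv = rng.random((n_intervals, n_bbs))
bbv /= bbv.sum(axis=1, keepdims=True)         # normalize to per-interval execution frequencies

km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(bbv)
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    # Simulate in detail only the interval closest to the cluster centroid,
    # then weight its statistics by the fraction of intervals it represents.
    dists = np.linalg.norm(bbv[members] - km.cluster_centers_[c], axis=1)
    rep, weight = int(members[dists.argmin()]), len(members) / n_intervals
    print(f"cluster {c}: simulate interval {rep:3d} in detail, weight {weight:.2f}")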
“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”
Prof Matt Sinclair
University of Connecticut CS Seminar
Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions. In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency. Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity. However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization. Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints. In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability. Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.
[13] “DPU-Centric Disaggregated Storage”
Prof Ming Liu
Grant from Western Digital
Abstract: Storage disaggregation has gained significant traction and is being widely deployed. Lately, with the advent of high-bandwidth server interconnects and new NVMe form factors, commodity storage appliances are becoming denser and able to deliver tens of millions of IOPS. To fully unleash this I/O capability, the server processor has to spend a tremendous number of CPU cycles on I/O processing, leaving little computing headroom available to other co-located applications. This project aims to build a DPU-centric storage stack that manages, schedules, and processes I/O-induced tasks with minimal CPU involvement.
[14] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419&HistoricalAwards=false
Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.
CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.
CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.
CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
[15] https://s2024.siggraph.org/program/emerging-technologies/
BEST IN SHOW
Congratulations to “A Live Demo of Single-photon Imaging and Applications” (Sacha Jungerman, Varun Sundar, Mohit Gupta), recipient of the SIGGRAPH 2024 Emerging Technologies Best in Show Award!
About A Live Demo of Single-photon Imaging and Applications
Single-photon sensors are a rapidly evolving imaging technology featuring the unique ability to sense light at its finest possible granularity, viz., at the photon level. Our demo will exhibit several exciting capabilities of single-photon sensors, which are made possible by an interplay of novel algorithms and recently developed, large-format sensor arrays.
More info: https://wisionlab.com/project/single-photon-imaging-at-siggraph/
[16] Reactions from colleagues on the Internet regarding the announcement of the retirement of our esteemed colleague Tom Reps. Thank you Prof Loris D’Antoni for collecting these.
Satnam Singh
I sadly don’t know Prof. Reps well, but I did have the honour of presenting the ACM SIGPLAN 2017 Programming Languages Achievement Award to him in person. The process of selecting him for this highly distinguished award made me read much about his research contributions and industrial impact. One thing I wrote in the citation for the award was “A common thread in all of Tom’s research is that it provides elegant solutions to deep foundational problems”, which is amazing, and not easily achieved. Congratulations, Prof. Reps.
Manu Sridharan
A truly astounding career. Tom’s work has influenced mine profoundly. I still learn something new from his papers every time I read them.
Rajeev Alur
Congratulations Tom for all your contributions. I would recommend “program analysis via graph reachability” as an exemplar of great PL research!
Swarat Chaudhuri
One of the most inspiring scholars I have met in my life. I am honored to work with Tom.
Jeff Dean
Congrats to Tom on his retirement! I have fond memories of reading this gem of a paper when I was in grad school on “Interprocedural slicing using dependence graphs” that Susan Horowitz, he and David Binkley wrote.
Mooly Sagiv
I still remember skiing with Susan and you, Tom. Susan’s 40th birthday, our collaboration in Copenhagen. Thanksgiving with you guys.
Isil Dillig
Kudos to Tom and best wishes for a fulfilling retirement! And what a wonderful way to commemorate Susan and all her accomplishments
Wei Yang
One of the first research papers I have read in my undergraduate years (15 years ago) is the IFDS framework by Prof. Reps. Later, my ICSE’ 15 and ICSE’ 18 papers are all developed based on it (I suppose most of interprocedural static analysis papers are based on IFDS).
Thomas Gilray
Always loved Rep’s comments on his “POPL drought”: https://cs.cmu.edu/~popl-interviews/reps.html
Those striving and struggling to publish at the top PL venues should really check it out for a dose of perspective.
Shuvendu Lahiri
Very fortunate to have collaborated with Tom, and also to revive in small parts Tom and (late) Susan’s seminal work on program integration/merge conflict resolution through formal methods and ML.
[a] https://dl.acm.org/doi/10.1145/3672202.3673715
Poster: Beyond Proximity: Exploring Remote Cloud Peering
Authors: Esteban Carisimo, Mia Weaver, Fabián Bustamante, Paul Barford
We investigate network peering location choices, focusing on whether networks opt for distant peering sites even when nearby options are available. We conduct a network-wide cloud-based traceroute campaign using virtual machine instances from four major cloud providers to identify peering locations and calculate the “peering stretch”: the extra distance networks travel beyond the nearest data center to their actual peering points. Our results reveal a median peering stretch of 300 kilometers, with some networks traveling as much as 6,700 kilometers. We explore the characteristics of networks that prefer distant peering points and the potential motivations behind these choices, providing insights into digital sovereignty and cybersecurity implications.
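The “peering stretch” metric in the abstract can be read as the extra great-circle distance, beyond the nearest data center, that traffic travels to reach the observed peering point. The sketch below computes that quantity under this reading with made-up coordinates; it illustrates the metric, not the authors’ measurement pipeline.

# Illustrative peering-stretch computation with haversine distances (made-up coordinates).
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    # Great-circle distance between two (lat, lon) points in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

network_location = (43.07, -89.40)                                    # hypothetical network vantage
data_centers = [(41.88, -87.63), (39.10, -94.58), (45.60, -121.18)]   # hypothetical cloud regions
observed_peering_point = (40.71, -74.01)                              # hypothetical peering location

nearest_dc_km = min(haversine_km(network_location, dc) for dc in data_centers)
peering_km = haversine_km(network_location, observed_peering_point)
stretch_km = peering_km - nearest_dc_km
print(f"peering stretch: {stretch_km:.0f} km (peering {peering_km:.0f} km vs nearest DC {nearest_dc_km:.0f} km)")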
[b] Efficient Implementation of an Abstract Domain of Quantified First-Order Formulas
Eden Frenkel, Tej Chajed, Oded Padon, Sharon Shoham
This paper lays a practical foundation for using abstract interpretation with an abstract domain that consists of sets of quantified first-order logic formulas. This abstract domain seems infeasible at first sight due to the complexity of the formulas involved and the enormous size of sets of formulas (abstract elements). We introduce an efficient representation of abstract elements, which eliminates redundancies based on a novel syntactic subsumption relation that under-approximates semantic entailment. We develop algorithms and data structures to efficiently compute the join of an abstract element with the abstraction of a concrete state, operating on the representation of abstract elements. To demonstrate feasibility of the domain, we use our data structures and algorithms to implement a symbolic abstraction algorithm that computes the least fixpoint of the best abstract transformer of a transition system, which corresponds to the strongest inductive invariant. We succeed at finding, for example, the least fixpoint for Paxos (which in our representation has 1,438 formulas with ∀∗∃∗∀∗ quantification) in time comparable to state-of-the-art property-directed approaches.
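To make the flavor of this domain concrete, here is a minimal, hypothetical Python sketch (not the paper’s data structures or algorithms): an abstract element is a set of candidate formulas interpreted conjunctively, and joining it with the abstraction of a concrete state simply discards the formulas that state falsifies. The paper’s contribution lies in doing this efficiently at scale, using syntactic subsumption to eliminate redundancy, which this sketch omits entirely.

# Minimal, illustrative sketch; the Formula class and its evaluation hook are hypothetical.
class Formula:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate      # function: state -> bool

    def holds_in(self, state):
        return self.predicate(state)

def join_with_state(abstract_element, state):
    """Weaken the abstract element so that it also covers `state`."""
    return {f for f in abstract_element if f.holds_in(state)}

# Toy usage: states are dicts, formulas are simple checks over them.
candidates = {
    Formula("all_nonneg", lambda s: all(v >= 0 for v in s["xs"])),
    Formula("sorted",     lambda s: s["xs"] == sorted(s["xs"])),
}
elem = candidates
elem = join_with_state(elem, {"xs": [0, 2, 5]})   # keeps both formulas
elem = join_with_state(elem, {"xs": [3, 1]})      # drops "sorted"
print([f.name for f in elem])                     # ['all_nonneg']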
[c] See above [5]
[d] https://www.usenix.org/conference/osdi24/presentation/zha
“Inductive Invariants That Spark Joy: Using Invariant Taxonomies to Streamline Distributed Protocol Proofs”
Authors: Tony Nuda Zhang, University of Michigan; Travis Hance, Carnegie Mellon University; Manos Kapritsos, University of Michigan; Tej Chajed, University of Wisconsin–Madison; Bryan Parno, Carnegie Mellon University
Abstract: Proving the correctness of a distributed protocol is a challenging endeavor. Central to this task is finding an inductive invariant for the protocol. Currently, automated invariant inference algorithms require developers to describe protocols using a restricted logic. If the developer wants to prove a protocol expressed without these restrictions, they must devise an inductive invariant manually.
We propose an approach that simplifies and partially automates finding the inductive invariant of a distributed protocol, as well as proving that it really is an invariant. The key insight is to identify an invariant taxonomy that divides invariants into Regular Invariants, which have one of a few simple low-level structures, and Protocol Invariants, which capture the higher-level host relationships that make the protocol work.
Building on the insight of this taxonomy, we describe the Kondo methodology for proving the correctness of a distributed protocol modeled as a state machine. The developer first manually devises the Protocol Invariants by proving a synchronous version of the protocol correct. In this simpler version, sends and receives are replaced with atomic variable assignments. The Kondo tool then automatically generates the asynchronous protocol description, Regular Invariants, and proofs that the Regular Invariants are inductive on their own. Finally, Kondo combines these with the synchronous proof into a draft proof of the asynchronous protocol, which may then require a small amount of user effort to complete. Our evaluation shows that Kondo reduces developer effort for a wide variety of distributed protocols.
[e] https://arxiv.org/abs/2309.04664
“Compact: Approximating Complex Activation Functions for Secure Computation”
Mazharul Islam, Sunpreet S. Arora, Rahul Chatterjee, Peter Rindal, Maliheh Shirvanian
Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions (AFs) such as ReLU. However, these techniques are ineffective and/or inefficient for the complex and highly non-linear AFs used in cutting-edge DNN models.
We present Compact, which produces piece-wise polynomial approximations of complex AFs to enable their efficient use with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and results in near-identical model accuracy. To achieve this, we design Compact with input density awareness and use an application-specific simulated annealing type optimization to generate computationally more efficient approximations of complex AFs. We extensively evaluate Compact on four different machine-learning tasks with DNN architectures that use popular complex AFs silu, gelu, and mish. Our experimental results show that Compact incurs negligible accuracy loss while being 2x-5x computationally more efficient than state-of-the-art approaches for DNN models with large number of hidden layers. Our work accelerates easy adoption of MPC techniques to provide user data privacy even when the queried DNN models consist of a number of hidden layers and trained over complex AFs.
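As a rough illustration of the piece-wise polynomial idea (this is not Compact itself: the toy below uses uniform break points and plain least squares, whereas Compact is input-density-aware and uses a simulated-annealing-style search), one can approximate gelu on an interval as follows.

# Illustrative sketch only; break points, degree, and interval are arbitrary choices.
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def piecewise_poly_fit(f, lo=-6.0, hi=6.0, pieces=8, degree=3, samples=2000):
    """Fit `pieces` least-squares polynomials of `degree` on equal sub-intervals of [lo, hi]."""
    edges = np.linspace(lo, hi, pieces + 1)
    coeffs = []
    for a, b in zip(edges[:-1], edges[1:]):
        xs = np.linspace(a, b, samples)
        coeffs.append(np.polyfit(xs, f(xs), degree))
    return edges, coeffs

def piecewise_eval(x, edges, coeffs):
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(coeffs) - 1)
    return np.array([np.polyval(coeffs[i], v) for i, v in zip(idx, x)])

edges, coeffs = piecewise_poly_fit(gelu)
xs = np.linspace(-6, 6, 10000)
print("max abs error:", np.max(np.abs(piecewise_eval(xs, edges, coeffs) - gelu(xs))))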
[f] https://petsymposium.org/popets/2024/popets-2024-0135.php
Scalable Metadata-Hiding for Privacy-Preserving IoT Systems
Authors: Yunang Chen (University of Wisconsin-Madison), David Heath (University of Illinois Urbana-Champaign), Rahul Chatterjee (University of Wisconsin-Madison), Earlence Fernandes (University of California San Diego)
Abstract: Modern cloud-based IoT services comprise an integrator service and several device vendor services. The vendor services enable users to remotely control their devices, while the integrator serves as a central intermediary, offering a unified interface for managing devices from different vendors. Although such a model is quite beneficial for IoT services to evolve quickly, it also creates a serious privacy concern: the vendor and integrator services observe all interactions between users and devices. Toward this, we propose Mohito, a privacy-preserving IoT system that hides such interactions from both the integrator and the vendors. In Mohito, we protect both the interaction data and the metadata, so that no one learns which user is communicating with which device. By utilizing oblivious key-value storage as a primitive and leveraging the unique communication graph of IoT services, we build a scalable protocol specialized in handling large concurrent traffic, a common demand in IoT systems. Our evaluation shows that Mohito can achieve up to 600x more throughput than the state-of-the-art general-purpose systems that provide similar security guarantees.
[g] https://dl.acm.org/doi/10.1145/3650110
Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces
Authors: Asmita Pal, Keerthana Desai, Rahul Chatterjee, Joshua San Miguel
ACM Transactions on Architecture and Code Optimization
Volume 21, Issue 2
Trace-based simulation is a widely used methodology for system design exploration. It relies on realistic traces that represent a range of behaviors necessary to be evaluated, containing a lot of information about the application, its inputs and the underlying system on which it was generated. Consequently, generating traces from real-world executions risks leakage of sensitive information. To prevent this, traces can be obfuscated before release. However, this can undermine their ideal utility, i.e., how realistically a program behavior was captured. To address this, we propose Camouflage, a novel obfuscation framework, designed with awareness of the necessary architectural properties required to preserve trace utility, while ensuring secrecy of the inputs used to generate the trace. Focusing on memory access traces, our extensive evaluation on various benchmarks shows that camouflaged traces preserve the performance measurements of the original execution, with an average τ correlation of 0.66. We model input secrecy as an input indistinguishability problem and show that the average security loss is 7.8%, which is better than traces generated from the state-of-the-art.
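For readers unfamiliar with the metric, Kendall’s τ is a rank correlation; the sketch below shows how one might compute it between hypothetical per-configuration measurements from an original and an obfuscated trace. The numbers are invented, and the paper’s actual measurement setup may differ.

# Illustrative only: checks whether the obfuscated trace preserves the *ranking* of measurements.
from scipy.stats import kendalltau

original_ipc   = [1.92, 1.40, 2.10, 0.95, 1.63]   # hypothetical per-config metric
obfuscated_ipc = [1.88, 1.45, 2.02, 1.01, 1.60]   # same configs, simulated from obfuscated trace

tau, p_value = kendalltau(original_ipc, obfuscated_ipc)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")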
[h] https://www.usenix.org/conference/usenixsecurity24/presentation/gupta
“I really just leaned on my community for support”: Barriers, Challenges, and Coping Mechanisms Used by Survivors of Technology-Facilitated Abuse to Seek Social Support
Naman Gupta, University of Wisconsin–Madison, Madison Tech Clinic; Kate Walsh, University of Wisconsin–Madison; Sanchari Das, University of Denver; Rahul Chatterjee, University of Wisconsin–Madison, Madison Tech Clinic
Abstract: Technology-facilitated abuse (TFA) from an intimate partner is a growing concern for survivors’ safety and security. Prior research introduced tailored interventions to support the survivors of TFA. However, most survivors do not have access to these interventions or are unaware of the appropriate support networks and resources in their community. We conducted nine semi-structured interviews to examine survivors’ support-seeking dynamics from their social networks and the effectiveness of social networks in addressing survivors’ needs. Through survivors’ lived experiences, we systematize socio-technical barriers that impede the participant from seeking support and challenges in seeking support. Further, we identify coping mechanisms used by the survivors despite those barriers and challenges. Through a participatory lens, we echo survivors’ call for action to improve support networks and propose recommendations for technology design to promote safer support-seeking practices and resources, consciousness-raising awareness campaigns, and collaborations with the community. Finally, we call for a restorative-justice-oriented framework that recognizes TFA.
[i] https://arxiv.org/abs/2406.01792
The SemGuS Toolkit
Keith J.C. Johnson, Andrew Reynolds, Thomas Reps, Loris D’Antoni
Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers.
[j] https://learningtheory.org/colt2024/schedule/poster_206.html
Testable Learning of General Halfspaces with Adversarial Label Noise
Diakonikolas, Ilias; Kane, Daniel M; Liu, Sihan; Zarifis, Nikos
Abstract: We study the task of testable learning of general — not necessarily homogeneous — halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our main result is the first polynomial time tester-learner for general halfspaces that achieves dimension-independent misclassification error. At the heart of our approach is a new methodology to reduce testable learning of general halfspaces to testable learning of nearly homogeneous halfspaces that may be of broader interest.
[k] https://arxiv.org/abs/2403.02300
Statistical Query Lower Bounds for Learning Truncated Gaussians
Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis
We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family of sets. Specifically, for a fixed but unknown truncation set S ⊆ ℝ^d, we are given access to samples from the distribution N(μ, I) truncated to the set S. The goal is to estimate μ within accuracy ϵ>0 in ℓ2-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is d^{poly(1/ϵ)}, even when the class of truncation sets is simple, so that poly(d/ϵ) samples information-theoretically suffice. Concretely, our SQ lower bound applies when S is a union of a bounded number of rectangles whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.
[l] https://arxiv.org/abs/2307.12840
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
Ilias Diakonikolas, Daniel M. Kane
We study the problem of PAC learning a linear combination of k ReLU activations under the standard Gaussian distribution on ℝ^d with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity (dk/ϵ)^{O(k)}, where ϵ>0 is the target accuracy. Prior work had given an algorithm for this problem with complexity (dk/ϵ)^{h(k)}, where the function h(k) scales super-polynomially in k. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the O(k)-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
[m] https://arxiv.org/abs/2402.03502
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li
Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at this https URL.
[n] https://openreview.net/forum?id=0i6Z9N5MLY
Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions
Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas
Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in settings in which the $n$ component operators in the finite sum are “on average” either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}( n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when the operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
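For context, the classical Halpern iteration (without variance reduction) for a nonexpansive operator $T$ anchors every step to the initial point $x_0$; a standard choice of anchoring weights is shown below. The paper’s variants replace the exact operator evaluation with variance-reduced stochastic estimates built from the $n$ components, which this textbook form does not capture.

$x_{k+1} = \lambda_{k+1}\, x_0 + (1 - \lambda_{k+1})\, T(x_k), \qquad \lambda_{k+1} = \frac{1}{k+2}.$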
[o] https://eprint.iacr.org/2024/798
“Incompressible Functional Encryption”
Rishab Goyal, Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma
Incompressible encryption (Dziembowski, Crypto’06; Guan, Wichs, Zhandry, Eurocrypt’22) protects against attackers that learn the entire decryption key but cannot store the full ciphertext. In incompressible encryption, the attacker must try to compress a ciphertext within a pre-specified memory bound before receiving the secret key.
In this work, we generalize the notion of incompressibility to functional encryption. …
[p] https://eprint.iacr.org/2024/931
“Leveled Fully-Homomorphic Signatures from Batch Arguments”
Abtin Afshar, Jiaqi Cheng, Rishab Goyal
Fully homomorphic signatures are a significant strengthening of digital signatures, enabling computations on \emph{secretly} signed data. Today, we have multiple approaches to design fully homomorphic signatures such as from lattices, or succinct functional commitments, or indistinguishability obfuscation, or mutable batch arguments. Unfortunately, all existing constructions for homomorphic signatures suffer from one or more limitations. We do not have homomorphic signatures with features such as multi-hop evaluation, context hiding, and fast amortized verification, while relying on standard falsifiable assumptions.
In this work, we design homomorphic signatures satisfying all above properties. We construct homomorphic signatures for polynomial-sized circuits from a variety of standard assumptions such as sub-exponential DDH, standard pairing-based assumptions, or learning with errors. We also discuss how our constructions can be easily extended to the multi-key setting.
[q] https://dl.acm.org/doi/10.1145/3641517.3664397
“A Live Demo of Single-Photon Imaging and Applications”
Authors: Sacha Jungerman, Varun Sundar, Mohit Gupta
SIGGRAPH ’24: ACM SIGGRAPH 2024 Emerging Technologies
Single-photon sensors are a novel class of imaging sensors that are sensitive to the individual arrival of photons. In recent years, single-photon sensors have witnessed rapid growth in key sensor characteristics such as array resolution, which, for the first time, has made computer vision and imaging with these emerging sensors possible. This rapid growth has also spurred the development of algorithms that can harness the rich spatio-temporal scene information captured by single-photon sensors, which has led to several new imaging capabilities. These capabilities include imaging in challenging SNR conditions, high-dynamic range imaging, compensating extreme camera motion extents, and even providing the functionality of numerous imaging systems via post-capture processing. Our demo will showcase these exciting capabilities to a wide computer vision and graphics audience, and in doing so, make a case for the mainstream adoption of single-photon technology.
[r] https://openreview.net/forum?id=sVEu295o70
“Understanding when Dynamics-Invariant Data Augmentations Benefit Model-free Reinforcement Learning Updates”
Nicholas Corrado, Josiah P. Hanna
ICLR 2024
Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
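A hypothetical sketch of how augmented transitions might be mixed into a replay buffer is shown below. It is illustrative only: `augment` stands for any dynamics-invariant transformation (e.g., a task symmetry), and the knob shown only loosely corresponds to the paper’s augmented replay ratio, which counts augmented transitions generated per update.

# Illustrative sketch; transition format and the mirror augmentation are made up.
import random
from collections import deque

def make_buffer(capacity=100_000):
    return deque(maxlen=capacity)

def store(buffer, transition, augment, augmented_per_real=1):
    """Store a real transition plus some number of augmented copies."""
    buffer.append(transition)
    for _ in range(augmented_per_real):
        buffer.append(augment(transition))

def sample_batch(buffer, batch_size=32):
    return random.sample(list(buffer), min(batch_size, len(buffer)))

# Toy usage: transitions are (s, a, r, s') tuples; the augmentation mirrors states.
mirror = lambda t: (-t[0], t[1], t[2], -t[3])
buf = make_buffer()
store(buf, (0.3, 1, 0.0, 0.5), mirror, augmented_per_real=1)
print(len(buf))   # 2: one real + one augmented transition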
[s] https://dl.acm.org/doi/10.1145/3579986
“Introduction to the Special Issue on Automotive CPS Safety & Security: Part 1”
Authors: Samarjit Chakraborty, Somesh Jha, Soheil Samii, Philipp Mundhenk
[t] https://arxiv.org/abs/2304.03870
“ASPEST: Bridging the Gap Between Active Learning and Selective Prediction”
Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots, with self-training on their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.
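A loose sketch of the checkpoint-ensemble-plus-pseudo-labels idea the abstract describes (not the authors’ implementation; the threshold and data below are made up): average the softmax outputs of several model snapshots and keep only confident predictions as pseudo-labels for self-training.

# Illustrative sketch; snapshot_probs would come from evaluating saved checkpoints.
import numpy as np

def ensemble_pseudo_labels(snapshot_probs, confidence=0.8):
    """snapshot_probs: list of (n_samples, n_classes) softmax arrays, one per snapshot."""
    mean_probs = np.mean(snapshot_probs, axis=0)       # aggregate the snapshots
    preds = mean_probs.argmax(axis=1)
    confident = mean_probs.max(axis=1) >= confidence   # keep only confident examples
    return preds[confident], confident                 # pseudo-labels and their mask

# Toy usage with two snapshots over three unlabeled examples and two classes.
p1 = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
p2 = np.array([[0.92, 0.08], [0.48, 0.52], [0.15, 0.85]])
labels, mask = ensemble_pseudo_labels([p1, p2])
print(labels, mask)   # examples 0 and 2 get pseudo-labels 0 and 1; example 1 is skipped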
[u] https://openreview.net/forum?id=dwzLn78jq7
On the Scalability and Memory Efficiency of Semidefinite Programs for Lipschitz Constant Estimation of Neural Networks
Zi Wang, Bin Hu, Aaron J Havens, Alexandre Araujo, Yang Zheng, Yudong Chen, Somesh Jha
ICLR 2024
TL;DR: We scale the SDP for Lipschitz constant estimation to practical CIFAR10/ImageNet networks.
Lipschitz constant estimation plays an important role in understanding generalization, robustness, and fairness in deep learning. Unlike naive bounds based on the network weight norm product, semidefinite programs (SDPs) have shown great promise in providing less conservative Lipschitz bounds with polynomial-time complexity guarantees. However, due to the memory consumption and running speed, standard SDP algorithms cannot scale to modern neural network architectures. In this paper, we transform the SDPs for Lipschitz constant estimation into an eigenvalue optimization problem, which aligns with the modern large-scale optimization paradigms based on first-order methods. This is amenable to autodiff frameworks such as PyTorch and TensorFlow, requiring significantly less memory than standard SDP algorithms. The transformation also allows us to leverage various existing numerical techniques for eigenvalue optimization, opening the way for further memory improvement and computational speedup. The essential technique of our eigenvalue-problem transformation is to introduce redundant quadratic constraints and then utilize both Lagrangian and Shor’s SDP relaxations under a certain trace constraint. Notably, our numerical study successfully scales the SDP-based Lipschitz constant estimation to address large neural networks on ImageNet. Our numerical examples on CIFAR10 and ImageNet demonstrate that our technique is more scalable than existing approaches. Our code is available at https://github.com/z1w/LipDiff.
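For contrast with the SDP approach, the naive bound mentioned in the abstract, the product of the layers’ spectral norms, can be computed in a few lines; it is cheap but typically far looser than SDP-based estimates. A minimal sketch, assuming an MLP whose activations are 1-Lipschitz:

# Illustrative sketch; the random weights are placeholders for a trained network.
import numpy as np

def naive_lipschitz_bound(weight_matrices):
    """Upper bound on the Lipschitz constant via the product of spectral norms."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weight_matrices]))

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 10))]
print("naive bound:", naive_lipschitz_bound(layers))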
[v] https://arxiv.org/abs/2310.05385
Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization
Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris
In this paper, we study the complexity of evaluating Conjunctive Queries with negation (CQ¬). First, we present an algorithm with linear preprocessing time and constant delay enumeration for a class of CQs with negation called free-connex signed-acyclic queries. We show that no other queries admit such an algorithm subject to lower bound conjectures. Second, we extend our algorithm to Conjunctive Queries with negation and aggregation over a general semiring, which we call Functional Aggregate Queries with negation (FAQ¬). Such an algorithm achieves constant delay enumeration for the same class of queries, but with a slightly increased preprocessing time which includes an inverse Ackermann function. We show that this surprising appearance of the Ackermann function is probably unavoidable for general semirings, but can be removed when the semiring has specific structure. Finally, we show an application of our results to computing the difference of CQs.
[w] https://arxiv.org/abs/2310.19642
Consistent Query Answering for Primary Keys on Rooted Tree Queries
Paraschos Koutris, Xiating Ouyang, Jef Wijsen
We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal subset of the database satisfying the primary key constraints. For a Boolean query q, the problem CERTAINTY(q) takes a database as input, and asks whether or not each repair satisfies q. The computational complexity of CERTAINTY(q) has been established whenever q is a self-join-free Boolean conjunctive query, or a (not necessarily self-join-free) Boolean path query. In this paper, we take one more step towards a general classification for all Boolean conjunctive queries by considering the class of rooted tree queries. In particular, we show that for every rooted tree query q, CERTAINTY(q) is in FO, NL-hard ∩ LFP, or coNP-complete, and it is decidable (in polynomial time), given q, which of the three cases applies. We also extend our classification to larger classes of queries with simple primary keys. Our classification criteria rely on query homomorphisms and our polynomial-time fixpoint algorithm is based on a novel use of context-free grammar (CFG).
[x] https://dl.acm.org/doi/10.1145/3651588
Tight Bounds of Circuits for Sum-Product Queries
Authors: Austen Z. Fan, Paraschos Koutris, Hangdong Zhao
Proceedings of the ACM on Management of Data, Volume 2, Issue 2
In this paper, we ask the following question: given a Boolean Conjunctive Query (CQ), what is the smallest circuit that computes the provenance polynomial of the query over a given semiring? We answer this question by giving upper and lower bounds. Notably, it is shown that any circuit F that computes a CQ over the tropical semiring must have size satisfying log |F| ≥ (1-ε) · da-entw for any ε > 0, where da-entw is the degree-aware entropic width of the query. We show a circuit construction that matches this bound when the semiring is idempotent. The techniques we use combine several central notions in database theory: provenance polynomials, tree decompositions, and disjunctive Datalog programs. We extend our results to lower and upper bounds for formulas (i.e., circuits where each gate has outdegree one), and to bounds for non-Boolean CQs.
[y] https://arxiv.org/abs/2403.12436
Evaluating Datalog over Semirings: A Grounding-based Approach
Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen
Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring σ, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over σ: first ground the program into an equivalent system of polynomial equations (i.e. grounding) and then find the least fixpoint of the grounding over σ. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.
[z] https://dl.acm.org/doi/abs/10.1145/3651598
Topology-aware Parallel Joins
Authors: Xiao Hu, Paraschos Koutris
Proceedings of the ACM on Management of Data, Volume 2, Issue 2
We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.
[aa] https://arxiv.org/abs/2311.07377
Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach
Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.
[ab] https://arxiv.org/abs/2402.07785
HYPO: Hyperspherical Out-of-Distribution Generalization
Yifei Ming, Haoyue Bai, Julian Katz-Samuels, Yixuan Li
Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles — ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at this https URL.
[ac] https://arxiv.org/abs/2402.17888
ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang
Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimate scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient p against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25% and 28.19% (FPR95) on CIFAR-100 and ImageNet-1K, respectively.
[ad] https://arxiv.org/abs/2402.15017
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang
Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples. In this paper, we study the theoretical justification of this multitask finetuning approach. Our theoretical analysis reveals that with a diverse set of related tasks, this multitask finetuning leads to reduced error in the target task, in comparison to directly adapting the same pretrained model. We quantify the relationship between finetuning tasks and target tasks by diversity and consistency metrics, and further propose a practical task selection algorithm. We substantiate our theoretical claims with extensive empirical evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, providing advantages to the model performance on target tasks. We believe our study sheds new light on the effective adaptation of foundation models to new tasks that lack abundant labels. Our code is available at this https URL.
[ae] https://dl.acm.org/doi/full/10.1145/3653716
eZNS: An Elastic Zoned Namespace for Commodity ZNS SSDs
Jaehong Min, Chenxingyu Zhao, Ming Liu, and Arvind Krishnamurthy
Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them in-adaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and interfering with co-located zones. Applications either under-provision the zone resources yielding unsatisfied throughput, create over-provisioned zones and incur costs, or experience unexpected I/O latencies. We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, and a hierarchical I/O scheduler with read congestion control and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by up to 17.7% and 80.3% in throughput and tail latency, respectively.
[af] https://www.usenix.org/conference/nsdi24/presentation/hou
Understanding Routable PCIe Performance for Composable Infrastructures
Wentao Hou, Jie Zhang, Zeke Wang, Ming Liu
Routable PCIe has become the predominant cluster interconnect to build emerging composable infrastructures. Empowered by PCIe non-transparent bridge devices, PCIe transactions can traverse multiple switching domains, enabling a server to elastically integrate a number of remote PCIe devices as local ones. However, it is unclear how to move data or perform communication efficiently over the routable PCIe fabric without understanding its capabilities and limitations.
This paper presents the design and implementation of rPCIeBench, a software-hardware co-designed benchmarking framework to systematically characterize the routable PCIe fabric. rPCIeBench provides flexible data communication primitives, exposes end-to-end PCIe transaction observability, and enables reconfigurable experiment deployment. Using rPCIeBench, we first analyze the communication characteristics of a routable PCIe path, quantify its performance tax, and compare it with the local PCIe link. We then use it to dissect in-fabric traffic orchestration behaviors and draw three interesting findings: approximate max-min bandwidth partition, fast end-to-end bandwidth synchronization, and interference-free among orthogonal data paths. Finally, we encode gathered characterization insights as traffic orchestration rules and develop an edge constraints relaxing algorithm to estimate PCIe flow transmission performance over a shared fabric. We validate its accuracy and demonstrate its potential to provide an optimization guide to design efficient flow schedulers.
[ag] https://arxiv.org/abs/2403.09543
Explorations in Texture Learning
Blaine Hoak, Patrick McDaniel
In this work, we investigate \textit{texture learning}: the identification of textures learned by object classification models, and the extent to which they rely on these textures. We build texture-object associations that uncover new insights about the relationships between texture and object classes in CNNs and find three classes of results: associations that are strong and expected, strong and not expected, and expected but not present. Our analysis demonstrates that investigations in texture learning enable new methods for interpretability and have the potential to uncover unexpected biases.
[ah] https://arxiv.org/abs/2403.19577
A Public and Reproducible Assessment of the Topics API on Real Data
Yohan Beugin, Patrick McDaniel
The Topics API for the web is Google’s privacy-enhancing alternative to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the capability of Topics to trade off both utility and privacy. The central point of contention is largely around the realism of the datasets used in these analyses and their reproducibility; researchers using data collected on a small sample of users or generating synthetic datasets, while Google’s results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest and publicly available dataset of real browsing histories. First, we measure how unique and stable real users’ interests are over time. Then, we evaluate if Topics can be used to fingerprint the users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that 2%, 3%, and 4% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that the information leakage worsens over time, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
[ai] https://dl.acm.org/doi/10.1145/3656156.3658395
RoboCare Design Workshop: Understanding, Translating, Operationalizing, and Scaling Up Design Knowledge Regarding Robotic Systems for Care Assistance
Laura Stegner, Richard Paluch, Long-Jing Hsu, Sawyer Collins, Yaxin Hu, Marius Greuèl, Naonori Kodate, Claudia Müller, Bilge Mutlu, Selma Šabanović
Robots and other autonomous agents are well-positioned in the research discourse to support the care of people with challenges such as physical and/or cognitive disabilities. However, designing these robots can be complex as it involves considering a wide range of factors (e.g., individual needs, physical environment, technology capabilities, digital literacy), stakeholders (e.g., care recipients, formal and informal caregivers, technology developers), and contexts (e.g., hospitals, nursing homes, outpatient care facilities, private homes). The challenges are in gaining design insights for this unique use case and translating this knowledge into actionable, generalizable guidelines for other designers. This one-day workshop seeks to bring together researchers with diverse expertise and experience across academia, healthcare, and industry, spanning perspectives from multiple disciplines, including design, robotics, and human-computer interaction, with the primary goal being a consensus on best practices for generating and operationalizing design knowledge for robotic systems for care settings.
[aj] https://arxiv.org/abs/2307.09064
Newtonian Program Analysis of Probabilistic Programs
Di Wang, Thomas Reps
Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton’s method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic).
Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode program-execution paths in control-flow hyper-graphs; and presents a linearization method that makes Newton’s method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) NPA-PMA provides great generality for designing program analyses.
[ak] https://arxiv.org/abs/2211.06818
CFLOBDDs: Context-Free-Language Ordered Binary Decision Diagrams
Meghana Sistla, Swarat Chaudhuri, Thomas Reps
This paper presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence useful for representing certain classes of functions, matrices, graphs, relations, etc. in a highly compressed fashion. CFLOBDDs share many of the good properties of BDDs, but, in the best case, the CFLOBDD for a Boolean function can be exponentially smaller than any BDD for that function. Compared with the size of the decision tree for a function, a CFLOBDD, again in the best case, can give a double-exponential reduction in size. They have the potential to permit applications to (i) execute much faster, and (ii) handle much larger problem instances than has been possible heretofore.
CFLOBDDs are a new kind of decision diagram that goes beyond BDDs (and their many relatives). The key insight is a new way to reuse sub-decision-diagrams: components of CFLOBDDs are structured hierarchically, so that sub-decision-diagrams can be treated as standalone “procedures” and reused.
We applied CFLOBDDs to the problem of simulating quantum circuits, and found that for several standard problems the improvement in scalability, compared to simulation using BDDs, is quite dramatic. In particular, the number of qubits that could be handled using CFLOBDDs was larger, compared to BDDs, by a factor of 128x for GHZ; 1,024x for BV; 8,192x for DJ; and 128x for Grover’s algorithm. (With a 15-minute timeout, the number of qubits that CFLOBDDs can handle is 65,536 for GHZ; 524,288 for BV; 4,194,304 for DJ; and 4,096 for Grover’s Algorithm.)
[al] https://arxiv.org/abs/2309.04344
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings — without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% on worst group accuracy, with trivial decrease in overall accuracy over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models and propose a way to further boost performance with a zero-shot adaptation variant.
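A highly simplified sketch of the “remove harmful components” step described above: project an embedding onto the orthogonal complement of an embedded harmful-concept direction. The names and vectors here are hypothetical, and RoboShot additionally boosts useful components and derives the directions from language-model insights, which this sketch omits.

# Illustrative sketch; the embedding and the "spurious" direction are made-up vectors.
import numpy as np

def remove_component(embedding, harmful_direction):
    """Subtract the projection of the embedding onto the harmful direction."""
    v = harmful_direction / np.linalg.norm(harmful_direction)
    return embedding - np.dot(embedding, v) * v

z = np.array([0.7, 0.1, 0.4])    # hypothetical image embedding
v = np.array([1.0, 0.0, 0.0])    # hypothetical embedded "harmful concept" direction
print(remove_component(z, v))     # approximately [0.0, 0.1, 0.4]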
[am] https://www.usenix.org/conference/atc24/presentation/tabatabai
FBMM: Making Memory Management Extensible With Filesystems
Bijan Tabatabai, James Sorenson, and Michael M. Swift, University of Wisconsin—Madison
New memory technologies like CXL promise diverse memory configurations such as tiered memory, far memory, and processing in memory. Operating systems must be modified to support these new hardware configurations for applications to make use of them. While many parts of operating systems are extensible, memory management remains monolithic in most systems, making it cumbersome to add support for a diverse set of new memory policies and mechanisms.
Rather than creating a whole new extensible interface for memory managers, we propose to instead use the memory management callbacks provided by the Linux virtual file system (VFS) to write memory managers, called memory management filesystems (MFSs). Memory is allocated by creating and mapping a file in an MFS’s mount directory and freed by deleting the file. Use of an MFS is transparent to applications. We call this system File Based Memory Management (FBMM).
Using FBMM, we created a diverse set of standalone memory managers for tiered memory, contiguous allocations, and memory bandwidth allocation, each comprising 500-1500 lines of code. Unlike current approaches that require custom kernels, with FBMM, an MFS can be compiled separately from the kernel and loaded dynamically when needed. We measure the overhead of using filesystems for memory management and found the overhead to be less than 8% when allocating a single page, and less than 0.1% when allocating as little as 128 pages. MFSs perform competitively with kernel implementations, and sometimes better due to simpler implementations.
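A user-space illustration of the allocation interface described above, under the assumption of a hypothetical MFS mounted at /mnt/mfs (any kernel-side policy is invisible at this level): memory is allocated by creating and mapping a file in the MFS mount directory, and freed by unmapping and deleting it.

# Illustrative sketch only; "/mnt/mfs" is a hypothetical MFS mount point.
import mmap, os

PAGE = 4096
path = "/mnt/mfs/region0"              # file inside the (hypothetical) MFS mount

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 16 * PAGE)            # size of the region to allocate
mem = mmap.mmap(fd, 16 * PAGE)         # the mapping is the allocated memory
mem[:5] = b"hello"                     # use it like ordinary memory

mem.close()                            # free: unmap ...
os.close(fd)
os.unlink(path)                        # ... and delete the backing file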
[an] https://arxiv.org/abs/2309.01753
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter σ>0. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be O(σ)-close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as O(σ)-approximation of the original BO, we propose first-order algorithms that find an ϵ-stationary solution by optimizing the penalty formulation with σ=O(ϵ). When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an ϵ-stationary point of the penalty function, using in total O(ϵ−3) and O(ϵ−7) accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and oracles are noisy, respectively. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with O(1) samples per iteration, and achieves the improved oracle-complexity of O(ϵ−3) and O(ϵ−5), respectively.
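For readers unfamiliar with the penalty viewpoint, a minimal sketch (the paper’s exact weighting and normalization may differ): with upper-level objective $f$, lower-level objective $g$, and penalty parameter $\sigma > 0$, the bilevel problem

$\min_{x \in X} f(x, y^*(x)) \quad \text{s.t.} \quad y^*(x) \in \operatorname{argmin}_{y \in Y} g(x, y)$

is replaced by a single-level penalized problem of the form

$\min_{x \in X,\, y \in Y} \; f(x, y) + \frac{1}{\sigma}\Bigl( g(x, y) - \min_{y' \in Y} g(x, y') \Bigr),$

where taking $\sigma \to 0$ enforces lower-level optimality ever more strictly, consistent with the abstract’s choice $\sigma = O(\epsilon)$.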
July 2024
[1] “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps”
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel
PLDI ‘14
Most influential paper award: https://www.sigplan.org/Awards/PLDI/
Abstract: Today’s smartphones are a ubiquitous source of private and confidential data. At the same time, smartphone users are plagued by carelessly programmed apps that leak important data by accident, and by malicious apps that exploit their given privileges to copy such data intentionally. While existing static taint-analysis approaches have the potential of detecting such data leaks ahead of time, all approaches for Android use a number of coarse-grain approximations that can yield high numbers of missed leaks and false alarms.
In this work we thus present FlowDroid, a novel and highly precise static taint analysis for Android applications. A precise model of Android’s lifecycle allows the analysis to properly handle callbacks invoked by the Android framework, while context, flow, field and object-sensitivity allows the analysis to reduce the number of false alarms. Novel on-demand algorithms help FlowDroid maintain high efficiency and precision at the same time.
We also propose DroidBench, an open test suite for evaluating the effectiveness and accuracy of taint-analysis tools specifically for Android apps. As we show through a set of experiments using SecuriBench Micro, DroidBench, and a set of well-known Android test applications, FlowDroid finds a very high fraction of data leaks while keeping the rate of false positives low. On DroidBench, FlowDroid achieves 93% recall and 86% precision, greatly outperforming the commercial tools IBM AppScan Source and Fortify SCA. FlowDroid successfully finds leaks in a subset of 500 apps from Google Play and about 1,000 malware apps from the VirusShare project.
[2] ACM SIGGRAPH Seminal Graphics Papers
When we began planning how to celebrate 50 years of SIGGRAPH Conferences, there was unanimous agreement that one of the projects should be publishing a second volume of Seminal Graphics Papers. The first volume was published in 1998 as part of the celebration of the 25th SIGGRAPH conference. Seminal Graphics Papers Volume 2, perhaps more than any other activity undertaken in this milestone year, celebrates ACM SIGGRAPH’s origins and continued success as a Technical and Professional Society. This collection of papers typifies the ground-breaking research that has been the conference’s hallmark since 1974. A quick scan of the chapter and the paper titles shows just how far SIGGRAPH research has pushed the boundaries of our discipline and contributed to its evolution.
The ACM Digital Library team has been supportive of this Seminal Graphics Papers project from the beginning. I am pleased to let you know that both Volumes 1 and 2 of Seminal Graphics Papers are freely available from the ACM Digital Library at these URLs:
Volume 1: https://dl.acm.org/doi/book/10.1145/280811
Volume 2: https://dl.acm.org/doi/book/10.1145/3596711
“Content-Preserving Warps for 3D Video Stabilization”
Feng Liu, Michael Gleicher, Hailin Jin, and Aseem Agarwala
https://dl.acm.org/doi/10.1145/1531326.1531350
ACM Transactions on Graphics (TOG), Volume 28, Issue 3, July 2009
We describe a technique that transforms a video from a hand-held video camera so that it appears as if it were taken with a directed camera motion. Our method adjusts the video to appear as if it were taken from nearby viewpoints, allowing 3D camera movements to be simulated. By aiming only for perceptual plausibility, rather than accurate reconstruction, we are able to develop algorithms that can effectively recreate dynamic scenes from a single source video. Our technique first recovers the original 3D camera motion and a sparse set of 3D, static scene points using an off-the-shelf structure-from-motion system. Then, a desired camera path is computed either automatically (e.g., by fitting a linear or quadratic path) or interactively. Finally, our technique performs a least-squares optimization that computes a spatially-varying warp from each input video frame into an output frame. The warp is computed to both follow the sparse displacements suggested by the recovered 3D structure, and avoid deforming the content in the video frame. Our experiments on stabilizing challenging videos of dynamic scenes demonstrate the effectiveness of our technique.
“Motion Graphs”
Lucas Kovar, Michael Gleicher, and Frédéric Pighin
https://dl.acm.org/doi/10.1145/566654.566605
ACM Transactions on Graphics (TOG), Volume 21, Issue 3, July 2002
In this paper we present a novel method for creating realistic, controllable motion. Given a corpus of motion capture data, we automatically construct a directed graph called a motion graph that encapsulates connections among the database. The motion graph consists both of pieces of original motion and automatically generated transitions. Motion can be generated simply by building walks on the graph. We present a general framework for extracting particular graph walks that meet a user’s specifications. We then show how this framework can be applied to the specific problem of generating different styles of locomotion along arbitrary paths.
[3] https://research.wisc.edu/research-forward-round-4/
The Office of the Vice Chancellor for Research (OVCR) hosts the Research Forward initiative to stimulate and support highly innovative and groundbreaking research at the University of Wisconsin–Madison. The initiative is supported by the Wisconsin Alumni Research Foundation (WARF) and will provide funding for 1–2 years, depending on the needs and scope of the project.
Research Forward seeks to support collaborative, multidisciplinary, multi-investigator research projects that are high-risk, high-impact, and transformative. It seeks to fund research projects that have the potential to fundamentally transform a field of study as well as projects that require significant development prior to the submission of applications for external funding. Collaborative research proposals are welcome from within any of the four divisions (Arts & Humanities, Biological Sciences, Physical Sciences, Social Sciences), as are cross-divisional collaborations.
CS Projects are:
“Quanta sensing for next generation quantum computing”
PRINCIPAL INVESTIGATOR
Mohit Gupta, associate professor of computer sciences
CO-PRINCIPAL INVESTIGATORS
Mark Saffman, professor of physics
Swamit Tannu, assistant professor of computer sciences
Andreas Velten, associate professor of biostatistics and medical informatics, electrical and computer engineering
Abstract: Future quantum computers could open new scientific and engineering frontiers, impacting existential challenges like climate change. However, quantum information is delicate; it leaks with time and is prone to significant errors. These errors are exacerbated by imperfect reading and writing of quantum bits (qubits). These challenges fundamentally limit our ability to run quantum programs, and could hold back this powerful technology. Fast and accurate qubit readout, therefore, is essential for unlocking the quantum advantage. Current quantum computers use conventional cameras for reading qubits, which are inherently slow and noisy.
This research project will use quanta (single-photon) sensors for fast and accurate qubit readout. Quanta sensors detect individual photons scattered from qubits, enabling qubits to be sensed 2-3 orders of magnitude faster (a few microseconds instead of ~10 milliseconds), thereby transforming the capabilities (speed, accuracy) of future quantum computers and, for the first time, paving the way for scalable and practical quantum computing.
“A novel deep learning platform to design molecular glues that stabilize protein-protein interactions”
PRINCIPAL INVESTIGATOR
Xuhui Huang, professor of chemistry
CO-PRINCIPAL INVESTIGATORS
Sharon Li, assistant professor of computer science
Weiping Tang, professor of pharmacy
Molecular glues can stabilize protein-protein interactions (PPIs) and have emerged as a promising class of therapeutics for many “undruggable” proteins. Deep-learning approaches show enormous potential for revolutionizing structure-based drug design. However, all deep learning methods require a large training dataset containing information on the binding pocket. Such information is often unavailable for molecular glues.
This research project will create a deep-learning platform, Chemical Feature Transformer (CFT), that learns from patterns in chemical environments (using known molecular glue-PPI structures) instead of complementary geometric shapes (like the Key-and-Lock model). To experimentally validate the predictions of the CFT, researchers will apply CFT to identify molecular glues for the degradation of cancer-associated proteins. It is expected that the deep-learning platform can greatly help other UW researchers identify novel molecular glues to tackle their disease-relevant targets. This project will provide the foundation for potentially establishing a center of excellence in AI-driven molecular design.
“An AI Terrarium for understanding intermediated and personal communication”
PRINCIPAL INVESTIGATOR
Timothy Rogers, professor of psychology
CO-PRINCIPAL INVESTIGATORS
Robert Hawkins, assistant professor of psychology
Dhavan Shah, professor of journalism
Sijia Yang, assistant professor of journalism
Jerry Zhu, professor of computer science
The immediacy, reach and virality of contemporary communication technologies present an enormous societal opportunity and an escalating threat. Research efforts to understand these phenomena face significant challenges: field studies are complex to design, expensive to conduct, and raise concerns about user manipulation.
This research project leverages AI to power more realistic simulated personas that can interact via natural language and are tuned to update their beliefs in ways that mimic real human participants: an AI Terrarium. The AI agents will be engineered and fine-tuned to capture patterns of real communication and belief change observed in representative human populations collected using panel surveys and behavioral experiments. The behaviors of interacting “digital twins” (AI models matched to human participants in the survey and experimental collections) will then be used to simulate and understand patterns of communication and persuasion in the real world.
Once developed, the AI Terrarium will enable low-cost testing of different message campaigns, news feeds, or interaction rules. Promising strategies will be validated in carefully controlled experiments using lab-based networked messaging platforms. The ability to simulate social media interactions with human fidelity would permit rapid-cycle testing of intervention strategies on a variety of health and political issues, ranging from vaccine skepticism to electoral misinformation, and allow piloting prior to costly real-world trials.
[4] https://research.wisc.edu/professorships-and-faculty-fellowships/kellett-mid-career-award/
This award is comparable in competitiveness to the Romnes Faculty Fellowships and the WARF Professorships but is intended to recognize and support mid-career faculty, seven to twenty years past their first promotion to a tenured position. The Mid-Career Award was created to provide needed support and encouragement to faculty at a critical stage of their careers. This award cannot be awarded more than once to any faculty member.
[5] https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/
This program, funded by WARF in recognition of the leadership of the late WARF Trustee President H. I. Romnes, is designed to bridge the gap between the Research Committee’s initial research support for new faculty and the Mid-Career Award for Faculty Research. This award is intended to recognize and support faculty up to SIX years past their first promotion to a tenured position. This award cannot be awarded more than once to any faculty member.
Winners:
https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/past-winners-h-i-romnes-faculty-fellowship/
[6] https://mlcommons.org/2024/02/call-for-ml-and-systems-rising-stars-2024/
The Rising Stars program provides a platform for talented young researchers working at the intersection of machine learning (ML) and systems to build connections with a vibrant research community, engage with both industry and academic experts, and develop their skills.
Saurabh Agarwal
“I am a fifth-year grad student at UW-Madison, advised by Dimitris Papailiopoulos and Shivaram Venkataraman. My research interests are primarily in Systems for Machine Learning, especially around distributed training and inference of ML workloads. During my PhD I have been very fortunate to intern with Bilge Acun at FAIR, Amar Phanishayee at Microsoft Research and Yucheng Low at Apple.
When I am not being a grad student, I can be found racing keelboats on Lake Mendota or alpine skiing in the winters. I also double up as a sailing instructor at UW-Madison’s Hoofers Sailing Club.”
[7] “Mechanism Design for Collaborative Normal Mean Estimation”
Prof Kirthi Kandasamy
Talk at RAIN Seminar
Abstract: With the rise in popularity of machine learning, data is becoming an increasingly valuable resource. While data collection is often costly, once collected, data can be freely replicated and used by many others. Hence, instead of simply collecting and learning from their own data, by sharing data with each other, agents can reduce their own data collection costs and improve the utility they derive from data. Yet naive mechanisms for sharing data can lead to free-riding, where agents may attempt to benefit from the data that the others have collected without contributing any themselves. Free-riding can manifest in anti-social behavior, such as under-collecting data, or submitting fabricated datasets.
In this talk, I will present some of our recent work where we study these challenges in one of the most foundational statistical problems, normal mean estimation. Here, a set of strategic agents collect i.i.d. samples from a normal distribution at a cost, and wish to estimate the mean of this distribution. To facilitate collaboration, we design mechanisms that incentivize agents to collect a sufficient amount of data and share it truthfully, so that they are all better off than working alone. Our approach prevents under-collection and data fabrication via two key techniques: first, when sharing the others’ data with an agent, the mechanism corrupts this dataset in proportion to how much the data reported by the agent differs from the others; second, we design minimax optimal estimators for the corrupted dataset. Our mechanism, which is Nash incentive compatible and individually rational, achieves a social penalty (sum of all agents’ estimation errors and data collection costs) that is within a factor of 2 of the global minimum. In two special cases where we restrict the strategy space of the agents, we design mechanisms that essentially achieve the global minimum. When applied to high-dimensional (non-Gaussian) distributions with bounded variance, our mechanisms retain these properties, but with slightly weaker results.
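A toy numerical illustration of the corruption idea described above, not the paper's actual mechanism or minimax estimator; the `corrupted_share` function, the noise model, and all parameters are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupted_share(own_reported, others_data, noise_scale=1.0, rng=rng):
    """Return the others' pooled data, corrupted in proportion to how far
    the agent's reported sample mean is from the others' sample mean."""
    discrepancy = abs(np.mean(own_reported) - np.mean(others_data))
    noise = rng.normal(0.0, noise_scale * discrepancy, size=others_data.shape)
    return others_data + noise

# Two illustrative agents drawing from N(mean=2, std=1).
truthful   = rng.normal(2.0, 1.0, size=50)
others     = rng.normal(2.0, 1.0, size=200)
fabricated = truthful + 5.0                 # an agent misreporting its data

# A truthful agent receives nearly clean data; a fabricating agent receives
# heavily corrupted data, so misreporting hurts its own final estimate.
print(np.std(corrupted_share(truthful, others) - others))
print(np.std(corrupted_share(fabricated, others) - others))
```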
“Data without Borders: Game-theoretic challenges in democratizing data”
Prof Kirthi Kandasamy
Talk at Midwest Machine Learning Symposium
Abstract: Due to the popularity of machine learning, many organizations view data as an invaluable resource, likening it to the “new oil/gold”. However, unlike many types of resources, data is nonrivalrous: it can be freely replicated and used by many. Hence, data produced by one organization can, in principle, generate limitless value for many others. This will accelerate economic, social, and scientific breakthroughs and benefit society at large. However, considerations of free-riding and competition may prevent such open sharing of data between organizations. An organization may be wary that others may not be contributing a sufficient amount of data, or contributing fabricated/poisoned datasets. Organizations may also wish to monetize the data they have for profit. In some recent work, we leverage ideas from game theory, market design, and robust statistics to design protocols for data sharing. Our methods incentivize organizations to collect and truthfully contribute large amounts of data, so that socially optimal outcomes can be achieved.
[8] “Data-Efficient Adaptation for Pretrained Decision-Making Models”
Prof Frederic Sala
Talk at Midwest Machine Learning Symposium
Abstract: The use of pretrained models forms the major paradigm change in machine learning workflows this decade, including for decision making. These powerful and typically massive models have the promise to be used as a base for diverse applications. Unfortunately, it turns out that adapting these models for downstream tasks tends to be difficult and expensive, often requiring collecting and labeling additional data to further train or fine-tune the model. In this talk, I will describe my group’s work on addressing this challenge via efficient adaptation. First, when adapting vision-language models to make robust predictions, we show how to self-guide the adaptation process, without any additional data. Second, we show how to integrate relational structures like knowledge graphs into model prediction pipelines, enabling models to adapt to new domains unseen during training, without additional annotated examples. Lastly, in the most challenging scenarios, when the model must be fine-tuned on labeled data, we show how to obtain this data efficiently through techniques called weak supervision.
[9] “Characterizing and Harnessing Performance Variability in Accelerator-rich HPC systems ORNL”
Prof Matt Sinclair
Talk at NVIDIA
Abstract: Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling exascale levels of compute. Recent work demonstrated that power management (PM) can impact application performance in CPU-based HPC systems, even when machines have the same architecture and often the same SKU (stock keeping unit). This variation occurs due to manufacturing variability and the chip’s PM. However, while modern HPC systems widely employ accelerators such as GPUs, it is unclear how much this variability affects applications. In this talk, I will first discuss our work on characterizing the extent of variation due to GPU PM in modern HPC and supercomputing systems across five large-scale clusters: Oak Ridge’s Summit, Sandia’s Vortex, TACC’s Frontera and Longhorn, and Livermore’s Corona. Regardless of the application, cluster, GPU vendor, and cooling method, our results show significant variation, highlighting the difficulty in efficiently using existing GPU clusters for modern HPC and ML workloads, and the need to embrace variability in future accelerator-based systems. Second, I will discuss our efforts to embrace this variability in current systems. Specifically, we design a novel cluster scheduler, PAL, which uses performance variability measurements and application-specific profiles to improve job performance and resource utilization. PAL also balances performance variability with locality to ensure jobs are spread across as few nodes as possible. Time permitting, I will also discuss our plans to embrace performance variability in future systems.
“GEMMs: (One of) the Building Blocks of Fast Machine Learning”
Prof Matt Sinclair
Abstract: Machine learning (ML) has transformed society with significant accuracy improvements across many tasks. This tremendous, transformative effect has been enabled by a virtuous synergy of (1) better hardware systems, (2) larger datasets, and (3) improved ML structures and algorithms that further benefit from more efficient hardware and larger datasets. While ML relies on a number of important primitives, in this talk I will focus on one of them: General Matrix Multiplies (GEMMs). Specifically, I will discuss why GEMMs are well suited to running some important ML layers, and introduce interactive examples that iteratively improve performance and increase GEMM code complexity for both CPU and GPU programs, before finally discussing how CPU and GPU BLAS libraries provide high-performance GEMM implementations without requiring users to write complex code.
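A small sketch of the contrast the talk draws: a textbook triple-loop GEMM versus a call into a tuned BLAS library (here via NumPy). Sizes and timings are illustrative only.

```python
import time
import numpy as np

def gemm_naive(A, B):
    """Textbook triple-loop GEMM: C[i, j] = sum_k A[i, k] * B[k, j]."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i, k] * B[k, j]
            C[i, j] = acc
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)

t0 = time.perf_counter()
C1 = gemm_naive(A, B)            # simple, slow reference implementation
t1 = time.perf_counter()
C2 = A @ B                       # BLAS-backed GEMM (blocked, vectorized, threaded)
t2 = time.perf_counter()

print(np.allclose(C1, C2), f"naive: {t1 - t0:.3f}s, BLAS: {t2 - t1:.5f}s")
```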
[a] “Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution”
Jing Liu, Xiangpeng Hao, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tej Chajed
HotStorage ‘24
We present Robust Alternative Execution (RAE), an approach to transparently mask runtime errors in performance-oriented filesystems via temporarily executing an alternative shadow filesystem. A shadow filesystem has the primary goal of robustness, achieved through a simple implementation without performance optimizations and concurrency while adhering to the same API and on-disk formats as the base filesystem it enhances. While the base performance-oriented filesystem may contain bugs, the shadow implementation is formally verified, leveraging advancements in the verification of low-level systems code. In the common case, the base filesystem executes and delivers high performance to applications; however, when a bug is triggered, the slow-but-correct shadow takes over, updates state correctly, and then resumes the base, thus providing high availability.
https://dl.acm.org/doi/10.1145/3655038.3665942
[b] “Homebrew: Optical Polarization Change Detection for Ground Motion Sensing”
Joseph Catudal; Zhenhao Zhou; Weijun Pan; Paul Barford; Dante Fratta; Herb Wang
2024 Optical Fiber Communications Conference and Exhibition (OFC)
We examine laser light polarization measurements using our own novel polarimeter design for ground motion sensing and show that efficacy is highly dependent on the coupling of fiber routes to vibration sources.
https://ieeexplore.ieee.org/document/10527030
[c] “DBSP: Incremental Computation on Streams and Its Applications to Databases”
Authors: Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen
ACM SIGMOD Record
We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are “changing”. The goal of an efficient implementation is to “reuse” previously computed results. Ideally, when presented with a new change to the input, an incremental computation should only perform work proportional to the size of the changes of the input, rather than to the size of the entire dataset.
https://dl.acm.org/doi/10.1145/3665252.3665271
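As a rough illustration of the incremental-computation contract (not DBSP's actual API), the sketch below maintains a per-key count from insert/delete deltas and emits an output delta, doing work proportional to the change rather than to the whole dataset. All names are invented for the example.

```python
class IncrementalCount:
    """Toy incremental view: COUNT of rows per key, maintained from deltas.
    Each delta is a list of (key, +1/-1) insertions/deletions; work done per
    update is proportional to the delta, not to the full dataset."""
    def __init__(self):
        self.counts = {}

    def apply_delta(self, delta):
        changed = {}
        for key, weight in delta:              # weight: +1 insert, -1 delete
            new = self.counts.get(key, 0) + weight
            if new:
                self.counts[key] = new
            else:
                self.counts.pop(key, None)
            changed[key] = new
        return changed                         # output delta for downstream views

view = IncrementalCount()
view.apply_delta([("a", +1), ("a", +1), ("b", +1)])
print(view.apply_delta([("a", -1), ("c", +1)]))   # {'a': 1, 'c': 1}
print(view.counts)                                # {'a': 1, 'b': 1, 'c': 1}
```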
[d] “Learned Load Balancing”
Brian Chang, Aditya Akella, Loris D’Antoni, and Kausik Subramanian
Theoretical Computer Science, July ‘24
Effectively balancing traffic in datacenter networks is a crucial operational goal. Most existing load balancing approaches are handcrafted to the structure of the network and/or network workloads. Thus, new load balancing strategies are required if the underlying network conditions change, e.g., due to hard or grey failures, network topology evolution, or workload shifts. While we can theoretically derive the optimal load balancing strategy by solving an optimization problem given certain traffic and topology conditions, these problems take too much time to solve, making the derived solutions stale by the time they can be deployed. In this paper, we describe a load balancing scheme, Learned Load Balancing (LLB), which is a general approach to finding an optimal load balancing strategy for a given network topology and workload, and is fast enough in practice to deploy the inferred strategies. LLB uses deep supervised learning techniques to learn how to handle different traffic patterns and topology changes, and adapts to any failures in the underlying network. LLB leverages emerging trends in network telemetry, programmable switching, and “smart” NICs. Our experiments show that LLB performs well under failures and can be expanded to more complex, multi-layered network topologies. We also prototype neural network inference on smartNICs to demonstrate the workability of LLB.
[e] “Super Non-singular Decompositions of Polynomials and their Application to Robustly Learning Low-degree PTFs”
Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Sihan Liu, Nikos Zarifis
STOC ‘24
We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee Od,c(opt^(1−c)), for any desired constant c>0, where opt is the fraction of corruptions. In the strong contamination model, an omniscient adversary can arbitrarily corrupt an opt-fraction of the data points and their labels. This model generalizes the malicious noise model and the adversarial label noise model. Prior to our work, known polynomial-time algorithms in this corruption model (or even in the weaker adversarial label noise model) achieved error Õd(opt^(1/(d+1))), which deteriorates significantly as a function of the degree d.
Our algorithm employs an iterative approach inspired by localization techniques previously used in the context of learning linear threshold functions. Specifically, we use a robust perceptron algorithm to compute a good partial classifier and then iterate on the unclassified points. In order to achieve this, we need to take a set defined by a number of polynomial inequalities and partition it into several well-behaved subsets. To this end, we develop new polynomial decomposition techniques that may be of independent interest.
“Testing Closeness of Multivariate Distributions via Ramsey Theory”
Ilias Diakonikolas, Daniel M. Kane, Sihan Liu
STOC ‘24
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions p, q on ℝ^d, we want to distinguish between the case that p = q versus ‖p − q‖_{A_k} > ϵ, where ‖p − q‖_{A_k} denotes the generalized A_k distance between p and q — measuring the maximum discrepancy between the distributions over any collection of k disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound.
In more detail, we provide a computationally efficient closeness tester with sample complexity O((k^(6/7)/poly_d(ϵ)) log^d(k)). On the lower bound side, we establish a qualitatively matching sample complexity lower bound of Ω(k^(6/7)/poly(ϵ)), even for d = 2. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is Θ(k^(4/5)/poly(ϵ)). This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not.
As a corollary of our general A_k tester, we obtain d_TV-closeness testers for pairs of k-histograms on ℝ^d over a common unknown partition, and pairs of uniform distributions supported on the union of k unknown disjoint axis-aligned rectangles.
Both our algorithm and our lower bound make essential use of tools from Ramsey theory.
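For readers unfamiliar with the notation, the generalized A_k distance referred to in the abstract above is commonly defined as follows (a standard formulation; the paper's notation may differ in minor details):

```latex
\[
  \|p - q\|_{A_k} \;=\; \max_{R_1, \dots, R_k} \; \sum_{i=1}^{k} \bigl|\, p(R_i) - q(R_i) \,\bigr|,
\]
```

where the maximum is taken over all collections of k disjoint, axis-aligned rectangles R_1, ..., R_k in ℝ^d, and p(R_i) denotes the probability mass that p assigns to R_i.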
[f] “Technical Perspective: Unicorn: A Unified Multi-Tasking Matching Model”
AnHai Doan
SIGMOD Record
Data integration has been a long-standing challenge for data management. It has recently received significant attention due to at least three main reasons. First, many data science projects require integrating data from disparate sources before analysis can be carried out to extract insights. Second, many organizations want to build knowledge graphs, such as Customer 360s, Product 360s, and Supplier 360s, which capture all available information about the customers, products, and suppliers of an organization. Building such knowledge graphs often requires integrating data from multiple sources. Finally, there is also an increasing need to integrate a massive amount of data to create training data for AI models, such as large language models.
https://dl.acm.org/doi/10.1145/3665252.3665262
[g]
“Bandit Profit Maximization for Targeted Marketing”
Joon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy.
ACM Conference on Economics and Computation (EC) 2024
We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm’s objective is to maximize its gross profit, the total revenue minus marketing costs. Our results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. We prove a regret upper bound of O(nT^(3/4)) and a lower bound of Ω((nT)^(3/4)) for monotonic demand curves, and a regret bound of Θ(nT^(2/3)) for demand curves that are monotonic in price and concave in the ancillary variable.
https://arxiv.org/html/2403.01361v1
“Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning”
Joon Suk Huh, Kirthevasan Kandasamy.
International Conference on Machine Learning (ICML) 2024
We study a multi-round mechanism design problem, where we interact with a set of agents over a sequence of rounds. We wish to design an incentive-compatible (IC) online learning scheme to maximize an application-specific objective within a given class of mechanisms, without prior knowledge of the agents’ type distributions. Even if each mechanism in this class is IC in a single round, if an algorithm naively chooses from this class on each round, the entire learning process may not be IC against non-myopic buyers who appear over multiple rounds. On each round, our method randomly chooses between the recommendation of a weakly differentially private online learning algorithm (e.g., Hedge), and a commitment mechanism which penalizes non-truthful behavior. Our method is IC and achieves O(T^{1+h/2}) regret for the application-specific objective in an adversarial setting, where h quantifies the long-sightedness of the agents. When compared to prior work, our approach is conceptually simpler, it applies to general mechanism design problems (beyond auctions), and its regret scales gracefully with the size of the mechanism class.
https://openreview.net/pdf?id=QQkK6YH0Th
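For context, a minimal sketch of the plain Hedge (exponential weights) learner mentioned in the abstract; the weak differential privacy and the commitment mechanism that make the overall scheme incentive compatible are not shown, and the parameters are arbitrary.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5, rng=np.random.default_rng(0)):
    """Hedge / exponential weights: sample an action from a distribution that
    down-weights high cumulative loss; loss_matrix[t, a] is action a's loss
    on round t (full-information feedback)."""
    T, num_actions = loss_matrix.shape
    weights = np.ones(num_actions)
    total_loss = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        action = rng.choice(num_actions, p=probs)
        total_loss += loss_matrix[t, action]
        weights *= np.exp(-eta * loss_matrix[t])   # multiplicative update
    return total_loss, loss_matrix.sum(axis=0).min()

losses = np.random.default_rng(1).uniform(size=(500, 4))
losses[:, 2] *= 0.5                                # action 2 is best on average
print(hedge(losses))   # (algorithm's total loss, best fixed action's total loss)
```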
[h] “Enabling Tabular Data Exploration for Blind and Low-Vision Users”
Yanan Wang, Arjun Srinivasan, and Yea-Seul Kim
DIS ‘24
In a data-driven society, being able to examine data on one’s own terms is crucial for various aspects of well-being. However, current data exploration paradigms, such as Exploratory Data Analysis (EDA), heavily rely on visualizations to unveil patterns and insights. This visual-centric approach poses significant challenges for blind and low-vision (BLV) individuals. To address this gap, we built a prototype that supports non-visual data exploration and conducted an observational user study involving 18 BLV participants. Participants were asked to conduct various analytical tasks, as well as free exploration of provided datasets. The study findings provide insights into the factors influencing inefficient data exploration and characterizations of BLV participants’ analytical behaviors. We conclude by highlighting future avenues of research for the design of data exploration tools for BLV users.
https://dl.acm.org/doi/10.1145/3643834.3661609
https://pages.cs.wisc.edu/~yeaseulkim/assets/papers/2024_table.pdf
[i] “How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?”
Soumya Suvra Ghosal, Yiyou Sun, Yixuan Li
AAAI ‘24
Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
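A simplified sketch of subspace nearest-neighbor scoring in the spirit of the abstract; the dimension-selection rule here (largest-magnitude coordinates) and all sizes are placeholders, and the training-time regularization the paper describes is omitted.

```python
import numpy as np

def subspace_knn_score(train_feats, test_feat, num_dims=64, k=10):
    """Score OOD-ness by a k-NN distance computed in a small, salient subset
    of feature dimensions (chosen here by activation magnitude, as an
    illustration of restricting distances to a subspace)."""
    dims = np.argsort(-np.abs(test_feat))[:num_dims]     # pick the subspace
    diffs = train_feats[:, dims] - test_feat[dims]
    dists = np.linalg.norm(diffs, axis=1)
    return np.sort(dists)[k - 1]     # distance to the k-th nearest ID feature

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, size=(1000, 512))   # in-distribution features
id_test  = rng.normal(0.0, 1.0, size=512)
ood_test = rng.normal(3.0, 1.0, size=512)           # shifted, "unknown class"

print(subspace_knn_score(id_feats, id_test))    # smaller score -> likely ID
print(subspace_knn_score(id_feats, ood_test))   # larger score  -> likely OOD
```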
[j] “Tangible Scenography as a Holistic Design Method for Human-Robot Interaction”
DIS ‘24
Amy Koike, Bengisu Cagiltay, Bilge Mutlu
Traditional approaches to human-robot interaction design typically examine robot behaviors in controlled environments and narrow tasks. These methods are impractical for designing robots that interact with diverse user groups in complex human environments. Drawing from the field of theater, we present the construct of scenes — individual environments consisting of specific people, objects, spatial arrangements, and social norms — and tangible scenography, as a holistic design approach for human-robot interactions. We created a design tool, Tangible Scenography Kit (TaSK), with physical props to aid in design brainstorming. We conducted design sessions with eight professional designers to generate exploratory designs. Designers used tangible scenography and TaSK components to create multiple scenes with specific interaction goals, characterize each scene’s social environment, and design scene-specific robot behaviors. From these sessions, we found that this method can encourage designers to think beyond a robot’s narrow capabilities and consider how they can facilitate complex social interactions.
https://arxiv.org/abs/2405.19449
“The AI-DEC: A Card-based Design Method for User-centered AI Explanations”
DIS ‘24
Christine P Lee, Min Kyung Lee, Bilge Mutlu
Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems — communication content, modality, frequency, and direction — and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system’s workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems.
https://arxiv.org/abs/2405.16711
“REX: Designing User-centered Repair and Explanations to Address Robot Failures”
DIS ‘24
Christine P Lee, Pragathi Praveena, and Bilge Mutlu
Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study (n = 162), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors—safety, privacy, and complexity—that require adaptive repair strategies. The second, in-person study (n = 24) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments.
https://dl.acm.org/doi/10.1145/3643834.3661559
“Understanding On-the-Fly End-User Robot Programming”
DIS ‘24
Laura Stegner, Yuna Hwang, David Porfirio, Bilge Mutlu
Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with on-the-fly robot EUP. We trained 21 end users to use an existing on-the-fly EUP tool, asked them to create robot interactions for four scenarios, and assessed their overall experience. Our findings provide insight into how these systems should be designed to better support end-user experience with on-the-fly EUP, focusing on user interaction with an automatic program synthesizer that resolves imprecise user input, the use of multimodal inputs to express user intent, and the general process of programming a robot.
https://arxiv.org/abs/2406.00841
[k] “Does Compressing Activations Help Model Parallel Training?”
Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman
MLSys ‘24
Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression in a model-parallel setting is an understudied area. We have discovered that model parallelism has fundamentally different characteristics than data parallelism. In this work, we present the first empirical study on the effectiveness of compression methods for model parallelism. We implement and evaluate three common classes of compression algorithms – pruning-based, learning-based, and quantization-based – using a popular Transformer training framework. We evaluate these methods across more than 160 settings and 8 popular datasets, taking into account different hyperparameters, hardware, and both fine-tuning and pre-training stages. We also provide analysis when the model is scaled up. Finally, we provide insights for future development of model parallelism compression algorithms.
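To illustrate one of the three compression classes studied (quantization), here is a minimal int8 round-trip for an activation tensor before it would be communicated across model-parallel workers; this is a generic sketch, not the paper's implementation, and the sizes are arbitrary.

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric int8 quantization of an activation tensor."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

acts = np.random.randn(4, 1024).astype(np.float32)   # activations to send
q, scale = quantize_int8(acts)                        # ~4x smaller payload than fp32
recovered = dequantize(q, scale)
print(q.nbytes, acts.nbytes, float(np.max(np.abs(recovered - acts))))
```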
[l] “‘This really lets us see the entire world:’ Designing a conversational telepresence robot for homebound older adults”
Yaxin Hu, Laura Stegner, Yasmine Kotturi, Caroline Zhang, Yi-Hao Peng, Faria Huq, Yuhang Zhao, Jeffrey P. Bigham, Bilge Mutlu
DIS ‘24
In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults’ experiential needs for robot-mediated remote experiences such as exploration, reminiscence and social participation. We then designed a prototype system to support these goals and conducted a technology probe study (N=11) to garner a deeper understanding of user preferences for remote experiences. The study revealed users’ interaction patterns in each desired experience, highlighting the need for robot guidance and for social engagement with both the robot and remote bystanders. Our work identifies a novel design space where conversational telepresence robots can be used to foster meaningful interactions in the remote physical environment. We offer design insights into the robot’s proactive role in providing guidance and using dialogue to create personalized, contextualized and meaningful experiences.
https://arxiv.org/abs/2405.15037
June 2024
[1] “Distinguished Seminar in Optimization and Data” seminars
https://math.washington.edu/events/series/distinguished-seminar-optimization-data
The seminar is held on Mondays at 3:30 PM during the term. Previous speakers include E. Tardos, P. Rigollet, L. Levina, and B. Sturmfels. This seminar series is an interdepartmental activity at UW organized by faculty in the departments of Applied Mathematics, Computer Science & Engineering, Electrical & Computer Engineering, Mathematics, and Statistics.
“Nonsmooth Optimization on a Finer Scale”
Abstract: Nonsmooth optimization problems are ubiquitous in industrial and machine learning applications, arising from the need to address tasks such as resource allocation, threat detection, and model training. Within the realm of mathematical optimization, nonsmooth problems stand as some of the most challenging; yet they offer the possibility of developing efficient algorithms with provable guarantees. The complexity of these problems, encompassing both lower and upper bounds, has historically been examined under generic assumptions, bounding the growth of the objective functions by assuming Lipschitz continuity of either the objective itself or its gradient. In this talk, I will argue that these traditional assumptions defining classes of nonsmooth optimization problems inadequately capture local properties of problems that may make them amenable to more efficient algorithms. I will introduce a notion of local bounded variation of the (sub)gradient and discuss how, under this notion, we can obtain a more fine-grained characterization of the complexity of nonsmooth problems. One consequence of these results is that, contrary to prior belief, the complexity of nonsmooth optimization problems, such as those with piecewise linear objectives with polynomially many pieces, can be improved using parallelization even in high dimensions, either if the problem is unconstrained or if the function is allowed to be queried outside the feasible set.
Based on joint work with Cristóbal Guzmán; preprint available at https://arxiv.org/abs/2403.16317
[2] https://bwyogatama.github.io/
[3] https://www.laurastegner.com/
[5] https://pages.cs.wisc.edu/~sujayyadalam/
[7] https://nikoszarifis.github.io/
[a] https://dl.acm.org/doi/abs/10.1002/aaai.12147
The National Science Foundation (NSF) Artificial Intelligence (AI) Institute for Edge Computing Leveraging Next Generation Networks (Athena) seeks to foment a transformation in modern edge computing by advancing AI foundations, computing paradigms, networked computing systems, and edge services and applications from a completely new computing perspective. Led by Duke University, Athena leverages revolutionary developments in computer systems, machine learning, networked computing systems, cyber‐physical systems, and sensing. Members of Athena form a multidisciplinary team from eight universities. Athena organizes its research activities under four interrelated thrusts supporting edge computing: Foundational AI, Computer Systems, Networked Computing Systems, and Services and Applications, which constitute an ambitious and comprehensive research agenda. The research tasks of Athena will focus on developing AI‐driven next‐generation technologies for edge computing and new algorithmic and practical foundations of AI and evaluating the research outcomes through a combination of analytical, experimental, and empirical instruments, especially with target use‐inspired research. The researchers of Athena demonstrate a cohesive effort by synergistically integrating the research outcomes from the four thrusts into three pillars: Edge Computing AI Systems, Collaborative Extended Reality (XR), and Situational Awareness and Autonomy. Athena is committed to a robust and comprehensive suite of educational and workforce development endeavors alongside its domestic and international collaboration and knowledge transfer efforts with external stakeholders that include both industry and community partnerships.
[b] https://arxiv.org/html/2403.04660v1
First responders (FRs) navigate hazardous, unfamiliar environments in the field (e.g., mass-casualty incidents), making life-changing decisions in a split second. AR head-mounted displays (HMDs) have shown promise in supporting them due to their capability of recognizing and augmenting challenging environments in a hands-free manner. However, the design space has not been thoroughly explored by involving various FRs who serve different roles (e.g., firefighters, law enforcement) but collaborate closely in the field. We interviewed 26 first responders in the field who experienced a state-of-the-art optical-see-through AR HMD, as well as its interaction techniques and four types of AR cues (i.e., overview cues, directional cues, highlighting cues, and labeling cues), soliciting their first-hand experiences, design ideas, and concerns. Our study revealed both generic and role-specific preferences and needs for AR hardware, interactions, and feedback, as well as identifying desired AR designs tailored to urgent, risky scenarios (e.g., affordance augmentation to facilitate fast and safe action). While acknowledging the value of AR HMDs, concerns were also raised around trust, privacy, and proper integration with other equipment. Finally, we derived comprehensive and actionable design guidelines to inform future AR systems for in-field FRs.
[c] https://www.microsoft.com/en-us/research/uploads/prodnew/2024/03/nsdi24fall-final668-CloudLoRA.pdf
The Cloud Radio Access Network (CRAN) architecture has been proposed as a way of addressing the network throughput and scalability challenges of large-scale LoRa networks. CRANs can improve network throughput by coherently combining signals, and scale to multiple channels by implementing the receivers in the cloud. However, in remote LoRa deployments, a CRAN’s demand for high-backhaul bandwidths can be challenging to meet. Therefore, bandwidth-aware compression of LoRa samples is needed to reap the benefits of CRANs. We introduce Cloud-LoRa, the first practical CRAN for LoRa, that can detect sub-noise LoRa signals and perform bandwidth-adaptive compression. To the best of our knowledge, this is the first demonstration of CRAN for LoRa operating in real-time. We deploy Cloud-LoRa in an agricultural field over multiple days with USRP as the gateway. A cellular backhaul hotspot is then used to stream the compressed samples to a Microsoft Azure server. We demonstrate SNR gains of over 6 dB using joint multi-gateway decoding and over 2x throughput improvement using state-of-the-art receivers, enabled by CRAN in real-world deployments.
[d] https://ieeexplore.ieee.org/document/10330695
We present a novel framework, GreyLambda, to improve the scalability of traffic engineering (TE) systems. TE systems continuously monitor traffic and allocate network resources based on observed demands. The temporal requirement for TE is to have a time-to-solution in five minutes or less. Additionally, traffic allocations have a spatial requirement, which is to enable all traffic to traverse the network without encountering an over-subscribed link. However, the multi-commodity flow-based TE formulation cannot scale with increasing network sizes. Recent approaches have relaxed multi-commodity flow constraints to meet the temporal requirement but fail to satisfy the spatial requirement due to changing traffic demands, resulting in oversubscribed links or infeasible solutions. To satisfy both these requirements, we utilize optical topology programming (OTP) to rapidly reconfigure optical wavelengths in critical network paths and provide localized bandwidth scaling and new paths for traffic forwarding. GreyLambda integrates OTP into TE systems by introducing a heuristic algorithm that capitalizes on latent hardware resources at high-degree nodes to offer bandwidth scaling, and a method to reduce optical path reconfiguration latencies. Our experiments show that GreyLambda enhances the performance of two state-of-the-art TE systems, SMORE and NCFlow in real-world topologies with challenging traffic and link failure scenarios.
[e] https://dl.acm.org/doi/10.1145/3626778
We present a longitudinal study of intercontinental long-haul links (LHLs) – links with latencies significantly higher than that of all other links in a traceroute path. Our study is motivated by the recognition of these LHLs as a network-layer manifestation of critical transoceanic undersea cables. We present a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7 year period. We identify 85,620 layer-3 links (out of 2.7M links in a large traceroute dataset) that satisfy our definition for intercontinental long haul with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant component with a relative size that remains stable despite a significant growth of the long-haul infrastructure.
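An illustrative sketch of flagging a long-haul link from per-hop latencies in a single traceroute path; the thresholds, the `long_haul_links` helper, and the example path are invented, and the paper's methodology is considerably more careful.

```python
def long_haul_links(hops, factor=3.0, min_ms=40.0):
    """Flag links whose added latency is far above every other link in the path.
    `hops` is a list of (router, cumulative_rtt_ms) in path order."""
    link_lat = [(hops[i][0], hops[i + 1][0],
                 max(hops[i + 1][1] - hops[i][1], 0.0))
                for i in range(len(hops) - 1)]
    flagged = []
    for i, (a, b, lat) in enumerate(link_lat):
        others = [l for j, (_, _, l) in enumerate(link_lat) if j != i]
        if lat >= min_ms and others and lat >= factor * max(others):
            flagged.append((a, b, lat))
    return flagged

# Hypothetical path: one hop adds ~87 ms, far above every other hop.
path = [("r1", 1.0), ("r2", 3.5), ("r3", 5.0), ("r4", 92.0), ("r5", 95.5)]
print(long_haul_links(path))   # the r3 -> r4 hop looks like a transoceanic link
```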
[f] https://dl.acm.org/doi/10.1145/3613905.3652036
The ubiquitous use of technology by college students makes them vulnerable to harassment, harm, and intimidation via technological means. We evaluate the prevalence of such technology-facilitated abuse (TFA) among college students in the USA using a critical, feminist, and trauma-informed lens, which is essential to inform policymakers and advocates who support students. We surveyed 776 college students in a large R1 university located in the Midwest region of the USA to examine the prevalence of TFA faced by students marginalized by socio-economic factors, the support sought by student survivors, and the efficacy of support structures. Our findings indicate that 70% students experience TFA, but more than half of them do not seek support. Among the survivors who seek support, 93% students solely rely on informal resources like friends and family, and 6% solely seek support from formal networks such as survivor services or law enforcement. Therefore, we call on policymakers to direct attention to TFA, create tailored interventions to support marginalized students and propose campus-wide campaigns to spread awareness among college students.
[g] https://epubs.siam.org/doi/10.1137/1.9781611977912.115
We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where n sensors are measuring some common, ongoing phenomenon. At each time step t = 1, 2, ..., T, the ith sensor reports its readings for that time step. The algorithm must then commit to its estimate μ_t for the true mean value of the process at time t. We assume that most of the sensors observe independent samples from some common distribution X, but an ε-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation μ to the true mean μ* := E[X]. We note that if the algorithm is allowed to wait until time T to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation.
We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of (ε, δ)-stability, we give an efficient online algorithm that outputs estimates μ_t, t ∈ [T], such that with high probability it holds that ‖μ − μ*‖_2 = O(δ log(T)), where μ = (μ_t)_{t∈[T]}. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve ℓ_2-error of O(δ). Our second main result shows that with additional assumptions on the input (most notably that X is a product distribution) there are inefficient algorithms whose error does not depend on T at all.
[h] https://arxiv.org/abs/2403.05757
Teleoperation systems map operator commands from an input device into some coordinate frame in the remote environment. This frame, which we call a control coordinate system, should be carefully chosen as it determines how operators should move to get desired robot motions. While specific choices made by individual systems have been described in prior work, a design space, i.e., an abstraction that encapsulates the range of possible options, has not been codified. In this paper, we articulate a design space of control coordinate systems, which can be defined by choosing a direction in the remote environment for each axis of the input device. Our key insight is that there is a small set of meaningful directions in the remote environment. Control coordinate systems in prior works can be organized by the alignments of their axes with these directions and new control coordinate systems can be designed by choosing from these directions. We also provide three design criteria to reason about the suitability of control coordinate systems for various scenarios. To demonstrate the utility of our design space, we use it to organize prior systems and design control coordinate systems for three scenarios that we assess through human-subject experiments. Our results highlight the promise of our design space as a conceptual tool to assist system designers to design control coordinate systems that are effective and intuitive for operators.
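A toy sketch of the construct: a control coordinate system assembled by assigning one remote-scene direction to each input-device axis. The particular directions chosen here are arbitrary examples, not a recommendation from the paper.

```python
import numpy as np

# Each column assigns one input-device axis (x, y, z) to a direction in the
# remote scene; here: camera-right, camera-up, and the robot's forward axis.
camera_right  = np.array([1.0, 0.0, 0.0])
camera_up     = np.array([0.0, 0.0, 1.0])
robot_forward = np.array([0.0, 1.0, 0.0])
control_frame = np.column_stack([camera_right, camera_up, robot_forward])

def map_command(device_delta):
    """Map a displacement of the input device into an end-effector motion."""
    return control_frame @ device_delta

# Pushing the device 'forward' (its z axis) moves the robot along robot_forward.
print(map_command(np.array([0.0, 0.0, 0.1])))
```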
[i] https://link.springer.com/chapter/10.1007/978-3-031-57728-4_3
Functional Encryption (FE) is a powerful notion of encryption which enables computations and partial message recovery of encrypted data. In FE, each decryption key is associated with a function f such that decryption recovers the function evaluation f(m) from an encryption of m. Informally, security states that a user with access to decryption keys for functions f_1, ..., f_q can only learn the evaluations f_1(m), ..., f_q(m), but nothing more about the message. The system is said to be q-bounded collusion resistant if security holds as long as an adversary gets access to at most q decryption keys.
However, until very recently, all these works studied bounded collusion resistance in a static model, where the collusion bound q was a global system parameter. While the static model has led to many great applications, it has major drawbacks. Recently, Agrawal et al. (Crypto 2021) and Garg et al. (Eurocrypt 2022) introduced the dynamic model for bounded collusion resistance, where the collusion bound q was a fluid parameter, not globally set, but chosen by each encryptor. The dynamic model enabled harnessing many virtues of the static model, while avoiding its various drawbacks. In this work, we provide a generic compiler to upgrade any FE scheme from the static model to the dynamic model.
We also extend our techniques to multi-authority attribute-based encryption (MA-ABE). We show that bounded collusion MA-ABE supporting predicates that can be represented as an efficient computational secret sharing (CSS) scheme can be built from minimal assumptions. Efficient CSS schemes are known for access structures whose characteristic function can be computed by a polynomial-size monotone circuit under the existence of one-way functions [Yao89, unpublished]. Thus, our MA-ABE construction is the first MA-ABE scheme from standard assumptions for predicates beyond polynomial-size monotone Boolean formulas. Our construction also satisfies full adaptive security in the Random Oracle Model.
[j] https://arxiv.org/abs/2301.12357
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first to focus on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the value of the target policy. We then use this formulation to derive the optimal allocation of samples per action during data collection. We then introduce a novel algorithm, SPEED (Structured Policy Evaluation Experimental Design), that tracks the optimal design, and derive its regret with respect to the optimal design. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
[k] https://arxiv.org/abs/2202.05687
Detecting diffusion-generated deepfake images remains an open problem. Current detection methods fail against an adversary who adds imperceptible adversarial perturbations to the deepfake to evade detection. In this work, we propose Disjoint Diffusion Deepfake Detection (D4), a deepfake detector designed to improve black-box adversarial robustness beyond de facto solutions such as adversarial training. D4 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve adversarial robustness. Our key insight is to leverage a redundancy in the frequency domain and apply a saliency partitioning technique to disjointly distribute frequency components across multiple models. We formally prove that these disjoint ensembles lead to a reduction in the dimensionality of the input subspace where adversarial deepfakes lie, thereby making adversarial deepfakes harder to find for black-box attacks. We then empirically validate the D4 method against several black-box attacks and find that D4 significantly outperforms existing state-of-the-art defenses applied to diffusion-generated deepfake detection. We also demonstrate that D4 provides robustness against adversarial deepfakes from unseen data distributions as well as unseen generative techniques.
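A rough sketch of the frequency-partitioning idea: the image's spectrum is split into disjoint bands, each of which would feed its own detector. The radial-band split below is a naive stand-in for the paper's saliency-based partitioning, and no detectors are trained here.

```python
import numpy as np

def band_views(image, num_models=2):
    """Split an image's 2D FFT into disjoint radial frequency bands, one per
    ensemble member, and return each member's band-limited reconstruction."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, radius.max() + 1e-6, num_models + 1)
    views = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)        # disjoint frequency bands
        views.append(np.real(np.fft.ifft2(np.fft.ifftshift(f * mask))))
    return views

image = np.random.rand(64, 64)
low_band, high_band = band_views(image)
# Each ensemble member would score its own band-limited view; their decisions
# are then combined (e.g., majority vote) for the final verdict.
print(np.allclose(low_band + high_band, image, atol=1e-8))
```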
[l] https://ieeexplore.ieee.org/abstract/document/10136161
Designing deep networks robust to adversarial examples remains an open problem. Recently, it was shown that adversaries relying on only top-1 feedback (i.e., the hard-label) from an image classification model can arbitrarily shift an image towards an intended target prediction. Likewise, these hard-label adversaries enjoy performance comparable to first-order adversaries relying on the full model gradient. It was also shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors. In this paper, we argue that query efficiency in the hard-label setting is also connected to an adversary’s traversal through the data manifold. To explain this behavior, we propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary’s distribution of gradient estimates. Through numerical experiments of manifold-gradient mutual information, we show this behavior acts as a function of the effective problem dimensionality. On high-dimensional real-world datasets and multiple hard-label attacks using dimension reduction, we observe the same behavior to produce samples closer to the data manifold. This can result in up to 10x decrease in the manifold distance measure, regardless of the model robustness. Our results suggest that our variant of hard-label attack can find a higher concentration of generalization errors than previous techniques, leading to improved worst-case analysis for model designers.
[m] https://dl.acm.org/doi/10.1145/3613904.3642188
In recent years, there has been a growing interest in enhancing the accessibility of visualizations for people with visual impairments. While much of the research has focused on improving accessibility for screen reader users, the specific needs of people with remaining vision (i.e., low-vision individuals) have been largely unaddressed. To bridge this gap, we conducted a qualitative study that provides insights into how low-vision individuals experience visualizations. We found that participants utilized various strategies to examine visualizations using the screen magnifiers and also observed that the default zoom level participants use for general purposes may not be optimal for reading visualizations. We identified that participants relied on their prior knowledge and memory to minimize the traversing cost when examining visualization. Based on the findings, we motivate a personalized tool to accommodate varying visual conditions of low-vision individuals and derive the design goals and features of the tool.
[o] https://arxiv.org/abs/2307.15255
This paper presents predicate transfer, a novel method that optimizes join performance by pre-filtering tables to reduce the join input sizes. Predicate transfer generalizes Bloom join, which conducts pre-filtering within a single join operation, to multi-table joins so that the filtering benefits can be significantly increased. Predicate transfer is inspired by the seminal theoretical results of Yannakakis, which use semi-joins to pre-filter acyclic queries. Predicate transfer generalizes these results to arbitrary join graphs and replaces semi-joins with Bloom filters, leading to significant speedups. Evaluation shows that predicate transfer outperforms Bloom join by 3.1x on average on the TPC-H benchmark.
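A toy sketch of the underlying pre-filtering step follows: a Bloom filter built on one table's join keys is transferred to prune another table before the join, and the same transfer can be chained forward and backward across a join graph. The `BloomFilter` and `transfer_filter` names and the tiny example tables are illustrative, not the paper's implementation.

```python
# Toy sketch with illustrative tables represented as lists of dicts; a real
# engine would operate on columnar batches and hashed bitsets.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, bytearray(n_bits)

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def maybe_contains(self, key):
        return all(self.bits[p] for p in self._positions(key))

def transfer_filter(source_rows, source_key, target_rows, target_key):
    """Build a Bloom filter on the source join keys and pre-filter the target."""
    bf = BloomFilter()
    for row in source_rows:
        bf.add(row[source_key])
    return [row for row in target_rows if bf.maybe_contains(row[target_key])]

# R(a) joins S(a, b) joins T(b): a forward pass transfers filters R -> S -> T,
# and a backward pass (T -> S -> R) would shrink the inputs further before joining.
R = [{"a": 1}, {"a": 2}]
S = [{"a": 1, "b": 10}, {"a": 3, "b": 20}]
T = [{"b": 10}, {"b": 30}]
S = transfer_filter(R, "a", S, "a")   # drops the a=3 row (up to false positives)
T = transfer_filter(S, "b", T, "b")   # drops the b=30 row
```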
In precision livestock farming, the individual identification of cattle is crucial for informing decisions made to enhance animal welfare, health, and productivity. In the literature, models exist that can read ear tags; however, they are not easily portable to real-world cattle production environments and make predictions mainly on still images. We propose a video-based cattle ear tag reading system, called ReadMyCow, which takes advantage of the temporal characteristics of videos to accurately detect, track, and read cattle ear tags at 25 FPS on edge devices. For each frame in a video, ReadMyCow operates in two steps. 1) Tag detection: a YOLOv5s Object Detection model and the NVIDIA Deepstream Tracking Layer detect and track the tags present. 2) Tag reading: the novel WhenToRead module decides whether to read each tag, using a TRBA Scene Text Recognition model, or to reuse the reading from a previous frame. The system is implemented on an edge device, namely the NVIDIA Jetson AGX Orin or Xavier, making it portable to cattle production environments without external computational resources. To attain real-time speeds, ReadMyCow re-reads a detected tag only when a decision metric improves significantly in the current frame, indicating that a better reading is likely. Ideally, this means the best reading of a tag is found and stored throughout the tag’s presence in the video, even when the tag becomes occluded or blurry. While testing the system at a real Midwestern dairy farm housing 9,000 cows, 96.1% of printed ear tags were accurately read by the ReadMyCow system, demonstrating its real-world commercial potential. ReadMyCow opens opportunities for informed, data-driven decision-making on commercial cattle farms.
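The sketch below illustrates one plausible form of a "read only when worthwhile" policy per tracked tag, in the spirit of the WhenToRead module. The `ocr_read` callable, the margin threshold, and the choice of decision metric are assumptions for illustration rather than the system's actual logic.

```python
# Hedged sketch: keep one reading per tracked tag and re-run recognition only
# when the decision metric improves. `ocr_read(crop)` is a hypothetical text
# recognizer returning (text, confidence); `metric` could be a detection score
# or a sharpness estimate for the current frame.
best_readings = {}   # track_id -> (text, metric at the time of the reading)

def maybe_read(track_id, crop, metric, ocr_read, margin=0.1):
    prev = best_readings.get(track_id)
    if prev is None or metric > prev[1] + margin:   # significantly better frame
        text, _confidence = ocr_read(crop)
        best_readings[track_id] = (text, metric)
    return best_readings[track_id][0]               # otherwise reuse the stored reading
```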
[q] https://arxiv.org/abs/2201.08984
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL — representation learning and label disambiguation — in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
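A minimal numpy sketch of prototype-based label disambiguation restricted to a candidate set is shown below; it omits the contrastive encoder and the EM analysis, and the function names and momentum values are illustrative rather than PiCO's actual implementation.

```python
# Minimal sketch with illustrative names; embeddings and prototypes are assumed
# to be L2-normalized numpy vectors, and pseudo_label is a soft label over classes.
import numpy as np

def disambiguate(embedding, candidate_mask, prototypes, pseudo_label, momentum=0.9):
    """Pick the candidate class whose prototype is most similar to the embedding
    and smoothly move the soft pseudo-label toward it."""
    sims = prototypes @ embedding                    # (num_classes,) cosine similarities
    sims = np.where(candidate_mask, sims, -np.inf)   # exclude non-candidate classes
    one_hot = np.zeros_like(pseudo_label)
    one_hot[np.argmax(sims)] = 1.0
    return momentum * pseudo_label + (1.0 - momentum) * one_hot

def update_prototype(prototypes, class_idx, embedding, momentum=0.99):
    """Moving-average prototype update followed by re-normalization."""
    p = momentum * prototypes[class_idx] + (1.0 - momentum) * embedding
    prototypes[class_idx] = p / (np.linalg.norm(p) + 1e-12)
    return prototypes
```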
[s] https://arxiv.org/abs/2209.10365
Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground-truth label. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degraded performance of prior methods when facing the combined challenge of long-tailed distributions and partial labeling. In this work, we first identify the major reasons why prior methods fail. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to previous state-of-the-art PLL methods.
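To illustrate the marginal-matching idea, the sketch below uses Sinkhorn-style alternating normalization to refine predictions so that the column marginals follow an estimated (long-tailed) class prior while non-candidate classes stay zeroed out. The function name, iteration count, and uniform row marginal are assumptions, not SoLar's exact formulation.

```python
# Sketch with illustrative names; probs are (N, C) softmax outputs,
# candidate_mask is an (N, C) boolean matrix, and class_prior is a (C,)
# estimate of the long-tailed marginal that sums to 1.
import numpy as np

def refine_labels(probs, candidate_mask, class_prior, n_iters=50):
    q = probs * candidate_mask                          # zero out non-candidate classes
    row_target = np.full(probs.shape[0], 1.0 / probs.shape[0])
    for _ in range(n_iters):                            # Sinkhorn-style alternating scaling
        q = q * (row_target[:, None] / q.sum(axis=1, keepdims=True).clip(1e-12))
        q = q * (class_prior[None, :] / q.sum(axis=0, keepdims=True).clip(1e-12))
    return q / q.sum(axis=1, keepdims=True).clip(1e-12) # per-example refined soft labels
```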
[t] https://arxiv.org/abs/2403.14041
Learning companion robots for young children are increasingly adopted in informal learning environments. Although parents play a pivotal role in their children’s learning, very little is known about how parents prefer to incorporate robots into their children’s learning activities. We developed prototype capabilities for a learning companion robot to deliver educational prompts and responses to parent-child pairs during reading sessions and conducted in-home user studies involving 10 families with children aged 3-5. Our data indicates that parents want to work with robots as collaborators to augment parental activities to foster children’s learning, introducing the notion of parent-robot collaboration. Our findings offer an empirical understanding of the needs and challenges of parent-child interaction in informal learning scenarios and design opportunities for integrating a companion robot into these interactions. We offer insights into how robots might be designed to facilitate parent-robot collaboration, including parenting policies, collaboration patterns, and interaction paradigms.
[u] https://dl.acm.org/doi/10.1145/3613904.3642950
Assistive technologies for adults with Down syndrome (DS) need designs tailored to their specific technology requirements. While prior research has explored technology design for individuals with intellectual disabilities, little is understood about the needs and expectations of adults with DS. Assistive technologies should leverage the abilities and interests of the population, while incorporating age- and context-considerate content. In this work, we interviewed six adults with DS, seven parents of adults with DS, and three experts in speech-language pathology, special education, and occupational therapy to determine how technology could support adults with DS. In our thematic analysis, four main themes emerged, including (1) community vs. home social involvement; (2) misalignment of skill expectations between adults with DS and parents; (3) family limitations in technology support; and (4) considerations for technology development. Our findings extend prior literature by including the voices of adults with DS in how and when they use technology.
[v] https://dl.acm.org/doi/10.1145/3620665.3640367
We are in the age of AI, with rapidly changing algorithms and a somewhat synergistic change in hardware. MLPerf is a recent benchmark suite that serves as a way to compare and evaluate hardware. However, it has several drawbacks: it is dominated by CNNs, does a poor job of capturing the diversity of AI use cases, and represents only a sliver of production AI workloads. This paper performs a longitudinal study of state-of-the-art AI applications spanning vision, physical simulation, vision synthesis, language and speech processing, and tabular data processing, across three generations of hardware, to understand how the AI revolution has panned out. We call this collection of applications and execution scaffolding the CaSiO suite. The paper reports on data gathered at the framework level, the device API level, and the hardware and microarchitecture level. The paper provides insights on the hardware-software revolution with pointers to future trends.
[w] https://ieeexplore.ieee.org/document/10476447
Graphics processing units (GPUs) are an important class of parallel processors that offer high compute throughput and memory bandwidth. GPUs are used in a variety of important computing domains, such as machine learning, high performance computing, sparse linear algebra, autonomous vehicles, and robotics. However, some applications from these domains can underperform due to sensitivity to memory latency and bandwidth. Some of this sensitivity can be reduced by better overlapping memory access with compute. Current GPUs often leverage pipeline parallelism in the form of warp specialization to enable better overlap. However, current warp specialization support on GPUs is limited in three ways. First, warp specialization is a complex and manual program transformation that is out of reach for many applications and developers. Second, it is limited to coarse-grained transfers between global memory and the shared memory scratchpad (SMEM); fine-grained memory access patterns are not well supported. Finally, the GPU hardware is unaware of the pipeline parallelism expressed by the programmer, and is unable to take advantage of this information to make better decisions at runtime. In this paper we introduce WASP, hardware and compiler support for warp specialization that addresses these limitations. WASP enables fine-grained streaming and gather memory access patterns through the use of warp-level register file queues and hardware-accelerated address generation. Explicit warp to pipeline stage naming enables the GPU to be aware of pipeline parallelism, which WASP capitalizes on by designing pipeline-aware warp mapping, register allocation, and scheduling. Finally, we design and implement a compiler that can automatically generate warp specialized kernels, reducing programmer burden. Overall, we find that runtime performance can be improved on a variety of important applications by an average of 47% over a modern GPU baseline.
[x] https://arxiv.org/abs/2401.16677
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices, which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thus hide, this communication with independent computations, techniques such as Tensor Parallelism (TP) inherently serialize communication with model execution. One approach to hiding this serialized communication is to interleave it with the producer operation (of the communicated data) in a fine-grained manner. However, this fine-grained interleaving of communication and computation in software can be difficult. Furthermore, as with any concurrent execution, it requires compute and memory resources to be shared between computation and communication, causing resource contention that reduces overlapping efficacy.
To overcome these challenges, we propose T3, which applies hardware-software co-design to transparently overlap serialized communication while minimizing resource contention with compute. T3 transparently fuses producer operations with the subsequent communication via a simple configuration of the producer’s output address space and requires minor software changes. At the hardware level, T3 adds a lightweight track-and-trigger mechanism to orchestrate the producer’s compute and communication. It further uses compute-enhanced memories for the communication’s attendant compute. As a result, T3 reduces resource contention and efficiently overlaps serialized communication with computation. For important Transformer models like T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3’s benefits persist as models scale: 29% geomean for sublayers in ∼500-billion parameter models, PALM and MT-NLG.
[y] https://dl.acm.org/doi/10.1145/3617232.3624854
Modern storage devices, such as Optane NVMe SSDs, offer ultra-low latency of a few microseconds and high bandwidth of multiple gigabytes per second. At these speeds, the kernel software I/O stack is a substantial source of overhead. Userspace approaches avoid kernel software overheads but face challenges in supporting shared storage without major changes to file systems, the OS or the hardware.
We propose a new I/O architecture, BypassD, for fast, userspace access to shared storage devices. BypassD takes inspiration from virtual memory: it uses virtual addresses to access a device and relies on hardware for translation and protection. Like memory-mapping a file, the OS kernel constructs a mapping for file contents in the page table. Userspace I/O requests then use virtual addresses from these mappings to specify which file and file offset to access. BypassD extends the IOMMU hardware to translate file offsets into device Logical Block Addresses. Existing applications require no modifications to use BypassD. Our evaluation shows that BypassD reduces latency for 4KB accesses by 42% compared to standard Linux kernel and performs close to userspace techniques like SPDK that do not support device sharing. By eliminating software overheads, BypassD improves performance of real workloads, such as the WiredTiger storage engine, by ~20%.
[z] https://dl.acm.org/doi/10.1145/3650203.3663336
Recent research has shown the importance of tuning DBMS configuration knobs to achieve high performance. As a result, a large number of search-based and learning-based auto-tuning methods have been proposed. However, despite the promising results, we observe that (1) developing new auto-tuning methods and (2) comparing them to existing methods are cumbersome and require extensive engineering effort. This effort is compounded by the fact that auto-tuning methods typically need to be evaluated on multiple systems (PostgreSQL, MySQL, etc.) and benchmarks (TPC-C, TPC-H, etc.). In this paper we describe Nautilus, a platform that automates benchmarking for DBMS configuration tuning. Our insight in building Nautilus is that the process of developing auto-tuners can be separated into a tuning algorithm layer and a benchmarking layer. We focus on automating the benchmarking layer and thus make it easier for researchers to develop new tuning methods and compare them with existing methods. We describe the design and implementation of Nautilus, highlighting its main features and advantages to the end-user. Furthermore, we are open-sourcing this project so that the research community can freely use and benefit from it.
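A hypothetical interface sketch of the layering described above is given below: the benchmarking layer owns DBMS deployment, knob application, and workload execution, while the tuner only proposes knob settings and observes scores. The class and method names are illustrative and are not Nautilus's actual API.

```python
# Hypothetical interface sketch (not Nautilus's real API) separating the
# tuning-algorithm layer from the benchmarking layer.
from abc import ABC, abstractmethod

class BenchmarkingLayer(ABC):
    """Owns DBMS deployment, knob application, and workload execution."""
    @abstractmethod
    def evaluate(self, knobs: dict) -> float:
        """Apply `knobs` (e.g. shared_buffers, work_mem), run the configured
        benchmark (e.g. TPC-C), and return a performance score such as throughput."""

class Tuner(ABC):
    """Owns only the search strategy; never touches the DBMS directly."""
    @abstractmethod
    def suggest(self) -> dict: ...
    @abstractmethod
    def observe(self, knobs: dict, score: float) -> None: ...

def tuning_loop(tuner: Tuner, bench: BenchmarkingLayer, budget: int) -> dict:
    """Drive any Tuner against any BenchmarkingLayer for `budget` evaluations."""
    best_knobs, best_score = {}, float("-inf")
    for _ in range(budget):
        knobs = tuner.suggest()
        score = bench.evaluate(knobs)
        tuner.observe(knobs, score)
        if score > best_score:
            best_knobs, best_score = knobs, score
    return best_knobs
```

Under this split, a new tuning method only needs to implement `suggest`/`observe`, while support for another DBMS or benchmark lives entirely behind `evaluate`.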
[aa] https://arxiv.org/abs/2312.12621
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters and several new cluster schedulers have been proposed in recent years to support these workloads. With rapidly evolving DL workloads, it is challenging to quickly prototype and compare scheduling policies across workloads. Further, as prior systems target different aspects of scheduling (resource allocation, placement, elasticity etc.), it is also challenging to combine these techniques and understand the overall benefits. To address these challenges we propose Blox, a modular toolkit which allows developers to compose individual components and realize diverse scheduling frameworks. We identify a set of core abstractions for DL scheduling, implement several existing schedulers using these abstractions, and verify the fidelity of these implementations by reproducing results from prior research. We also highlight how we can evaluate and compare existing schedulers in new settings: different workload traces, higher cluster load, change in DNN workloads and deployment characteristics. Finally, we showcase Blox’s extensibility by composing policies from different schedulers, and implementing novel policies with minimal code changes.
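The sketch below illustrates the kind of composition such a toolkit enables: a scheduler assembled from independent admission, ordering, and placement components, any of which can be swapped without touching the others. The dataclasses and policy signatures are illustrative abstractions, not Blox's actual interfaces.

```python
# Hedged sketch of composing a scheduler from independent policy components.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Job:
    job_id: int
    demand_gpus: int
    priority: float = 0.0

@dataclass
class Scheduler:
    admission: Callable[[List[Job]], List[Job]]                 # who is eligible this round
    ordering: Callable[[List[Job]], List[Job]]                  # e.g. FIFO, SRTF, priority
    placement: Callable[[Job, Dict[int, int]], Optional[int]]   # node id or None

    def schedule_round(self, queue: List[Job], free_gpus: Dict[int, int]) -> List[Tuple[int, int]]:
        plan = []
        for job in self.ordering(self.admission(queue)):
            node = self.placement(job, free_gpus)
            if node is not None and free_gpus[node] >= job.demand_gpus:
                free_gpus[node] -= job.demand_gpus
                plan.append((job.job_id, node))                 # (job, node) assignments
        return plan

# Example composition: FIFO ordering with first-fit placement; swapping in a
# priority-based ordering policy changes exactly one component.
fifo_first_fit = Scheduler(
    admission=lambda q: q,
    ordering=lambda q: q,
    placement=lambda j, free: next((n for n, g in free.items() if g >= j.demand_gpus), None),
)
```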
[ab] https://dl.acm.org/doi/10.1145/3639286
The scaling of per-GB DRAM cost has slowed down in recent years. Recent research has suggested that adding remote memory to a system can further reduce the overall memory cost while maintaining good performance. Remote memory (i.e., tiered memory), connected to host servers via high-speed interconnect protocols such as RDMA and CXL, is expected to deliver roughly 100x lower latency than SSDs (under 1µs) and to be more cost-effective than local DRAM through pooling or the adoption of cheaper memory technologies.
Tiered memory opens up a large number of potential use cases within database systems, but previous work has explored only limited ways of using it. We provide a systematic study of how a DBMS can build tiered-memory buffer management across a wide range of hardware performance characteristics. Specifically, we study five different indexing designs that leverage remote memory in different ways and evaluate them through a wide range of metrics, including performance, tiered-memory latency sensitivity, and cost-effectiveness. In addition, we propose a new memory provisioning strategy that allocates an optimal amount of local and remote memory for a given workload. Our evaluations show that while some designs achieve higher performance than others, no design wins in all measured dimensions.
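A hedged sketch of a provisioning search in this spirit is shown below: it picks the cheapest local/remote split that still meets a throughput target under a workload performance model. The `perf` callable, per-GB prices, and step size are placeholder assumptions rather than the paper's strategy.

```python
# Hedged sketch with placeholder prices and a hypothetical workload model
# perf(local_gb, remote_gb) -> throughput; it searches for the cheapest split
# that still meets a throughput target.
def provision(total_gb, perf, target_tput,
              local_cost_per_gb=4.0, remote_cost_per_gb=2.0, step_gb=4):
    best = None
    for local_gb in range(0, total_gb + 1, step_gb):
        remote_gb = total_gb - local_gb
        if perf(local_gb, remote_gb) < target_tput:
            continue                                  # performance target not met
        cost = local_gb * local_cost_per_gb + remote_gb * remote_cost_per_gb
        if best is None or cost < best[0]:
            best = (cost, local_gb, remote_gb)
    return best   # (cost, local_gb, remote_gb), or None if the target is unreachable
```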
[ac] https://dl.acm.org/doi/10.1145/3613904.3642829
Autism Spectrum Disorder (ASD) presents challenges in social interaction skill development, particularly in turn-taking. Digital interventions offer potential solutions for improving autistic children’s social skills but often lack addressing specific collaboration techniques. Therefore, we designed a prototype of a turn-taking collaborative tablet game, StarRescue, which encourages children’s distinct collaborative roles and interdependence while progressively enhancing sharing and mutual planning skills. We further conducted a controlled study with 32 autistic children to evaluate StarRescue’s usability and potential effectiveness in improving their social skills. Findings indicated that StarRescue has great potential to foster turn-taking skills and social communication skills (e.g., prompting, negotiation, task allocation) within the game and also extend beyond the game. Additionally, we discussed implications for future work, such as including parents as game spectators and understanding autistic children’s territory awareness in collaboration. Our study contributes a promising digital intervention for autistic children’s turn-taking social skill development via a scaffolding approach and valuable design implications for future research.
[ad] https://arxiv.org/html/2402.12772v1
Reading is a challenging task for low vision people. While conventional low vision aids (e.g., magnification) offer certain support, they cannot fully address the difficulties faced by low vision users, such as locating the next line and distinguishing similar words. To fill this gap, we present GazePrompt, a gaze-aware reading aid that provides timely and targeted visual and audio augmentations based on users’ gaze behaviors. GazePrompt includes two key features: (1) a Line-Switching support that highlights the line a reader intends to read; and (2) a Difficult-Word support that magnifies or reads aloud a word that the reader hesitates with. Through a study with 13 low vision participants who performed well-controlled reading-aloud tasks with and without GazePrompt, we found that GazePrompt significantly reduced participants’ line switching time, reduced word recognition errors, and improved their subjective reading experiences. A follow-up silent-reading study showed that GazePrompt can enhance users’ concentration and perceived comprehension of the reading contents. We further derive design considerations for future gaze-based low vision aids.
[ae] https://arxiv.org/html/2402.07300v2
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce Spica, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, Spica offers interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, Spica augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of Spica and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
[af] https://arxiv.org/abs/2311.05511
We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time- and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible policy with optimal value so long as the maximum supported cost is bounded by a polynomial in the cMDP or the absolute budget. Given our hardness results, our approximation guarantees are the best possible under worst-case analysis.
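The cost-augmentation idea behind the reduction can be sketched as follows for a tabular MDP with deterministic per-step costs: the augmented state carries the cumulative cost so far, and actions that would push it past the budget are disallowed, so any policy on the augmented MDP is anytime-feasible. The function names and the explicit enumeration of cost levels are illustrative.

```python
# Illustrative sketch only: costs are assumed deterministic and drawn from a
# small set of values, so the reachable cumulative-cost levels can be enumerated.
from itertools import product

def augment(states, actions, cost, budget, cost_levels):
    """states, actions: iterables; cost[s][a]: per-step cost; cost_levels: the
    finitely many cumulative-cost values reachable under the costs' precision.
    Returns the augmented state space and the feasible-action function."""
    aug_states = list(product(states, cost_levels))      # (state, cost so far)

    def allowed(state, cost_so_far):
        # Disallow any action that would exceed the budget at this step,
        # so every policy over the augmented MDP is anytime-feasible.
        return [a for a in actions if cost_so_far + cost[state][a] <= budget]

    return aug_states, allowed

# Standard planning or learning can then run on the unconstrained augmented MDP;
# when costs have logarithmic precision, the number of cost levels stays small,
# which mirrors the fixed-parameter tractability condition described above.
```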