Notes for Weekly/Monthly Good News

For the Good News list, visit here. The notes below begin with the June 2024 Good News; for notes prior to June 2024, see the Good News list.

June 2024

[1] “Distinguished Seminar in Optimization and Data” seminars

https://math.washington.edu/events/series/distinguished-seminar-optimization-data

The seminar is held on Mondays at 3:30 PM during the term. Previous speakers include E. Tardos, P. Rigollet, L. Levina, and B. Sturmfels. This seminar series is an interdepartmental activity at UW organized by faculty in the departments of Applied Mathematics, Computer Science & Engineering, Electrical & Computer Engineering, Mathematics, and Statistics.

“Nonsmooth Optimization on a Finer Scale”

Abstract: Nonsmooth optimization problems are ubiquitous in industrial and machine learning applications, arising from the need to address tasks such as resource allocation, threat detection, and model training. Within the realm of mathematical optimization, nonsmooth problems stand as some of the most challenging; yet they offer the possibility of developing efficient algorithms with provable guarantees. The complexity of these problems, encompassing both lower and upper bounds, has historically been examined under generic assumptions, bounding the growth of the objective functions by assuming Lipschitz continuity of either the objective itself or its gradient. In this talk, I will argue that these traditional assumptions defining classes of nonsmooth optimization problems inadequately capture local properties of problems that may make them amenable to more efficient algorithms. I will introduce a notion of local bounded variation of the (sub)gradient and discuss how, under this notion, we can obtain a more fine-grained characterization of the complexity of nonsmooth problems. One consequence of these results is that, contrary to prior belief, the complexity of nonsmooth optimization problems, such as those with piecewise linear objectives with polynomially many pieces, can be improved using parallelization even in high dimensions, either if the problem is unconstrained or if the function is allowed to be queried outside the feasible set.
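
For reference, the two generic growth conditions mentioned in the abstract are the standard textbook assumptions (our notation, not the speaker's):

```latex
% Lipschitz objective (classical nonsmooth class) and Lipschitz gradient (smooth class):
\[
  |f(x) - f(y)| \le M \,\|x - y\|,
  \qquad
  \|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\|
  \quad \text{for all } x, y.
\]
```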

Based on joint work with Cristóbal Guzmán; preprint available at https://arxiv.org/abs/2403.16317

 

[2] https://bwyogatama.github.io/

 

[3] https://www.laurastegner.com/

 

[4] https://cpsiff.github.io/

 

[5] https://pages.cs.wisc.edu/~sujayyadalam/

 

[6] https://shivaram.org/

 

[7] https://nikoszarifis.github.io/

 

[8] https://pauley.me/

 

[a] https://dl.acm.org/doi/abs/10.1002/aaai.12147

The National Science Foundation (NSF) Artificial Intelligence (AI) Institute for Edge Computing Leveraging Next Generation Networks (Athena) seeks to foment a transformation in modern edge computing by advancing AI foundations, computing paradigms, networked computing systems, and edge services and applications from a completely new computing perspective. Led by Duke University, Athena leverages revolutionary developments in computer systems, machine learning, networked computing systems, cyber‐physical systems, and sensing. Members of Athena form a multidisciplinary team from eight universities. Athena organizes its research activities under four interrelated thrusts supporting edge computing: Foundational AI, Computer Systems, Networked Computing Systems, and Services and Applications, which constitute an ambitious and comprehensive research agenda. The research tasks of Athena will focus on developing AI‐driven next‐generation technologies for edge computing and new algorithmic and practical foundations of AI and evaluating the research outcomes through a combination of analytical, experimental, and empirical instruments, especially with target use‐inspired research. The researchers of Athena demonstrate a cohesive effort by synergistically integrating the research outcomes from the four thrusts into three pillars: Edge Computing AI Systems, Collaborative Extended Reality (XR), and Situational Awareness and Autonomy. Athena is committed to a robust and comprehensive suite of educational and workforce development endeavors alongside its domestic and international collaboration and knowledge transfer efforts with external stakeholders that include both industry and community partnerships.

 

[b] https://arxiv.org/html/2403.04660v1

First responders (FRs) navigate hazardous, unfamiliar environments in the field (e.g., mass-casualty incidents), making life-changing decisions in a split second. AR head-mounted displays (HMDs) have shown promise in supporting them due to their capability of recognizing and augmenting the challenging environments in a hands-free manner. However, the design space has not been thoroughly explored by involving various FRs who serve different roles (e.g., firefighters, law enforcement) but collaborate closely in the field. We interviewed 26 first responders in the field who experienced a state-of-the-art optical-see-through AR HMD, as well as its interaction techniques and four types of AR cues (i.e., overview cues, directional cues, highlighting cues, and labeling cues), soliciting their first-hand experiences, design ideas, and concerns. Our study revealed both generic and role-specific preferences and needs for AR hardware, interactions, and feedback, and identified desired AR designs tailored to urgent, risky scenarios (e.g., affordance augmentation to facilitate fast and safe action). While acknowledging the value of AR HMDs, concerns were also raised around trust, privacy, and proper integration with other equipment. Finally, we derived comprehensive and actionable design guidelines to inform future AR systems for in-field FRs.

 

[c] https://www.microsoft.com/en-us/research/uploads/prodnew/2024/03/nsdi24fall-final668-CloudLoRA.pdf

The Cloud Radio Access Network (CRAN) architecture has been proposed as a way of addressing the network throughput and scalability challenges of large-scale LoRa networks. CRANs can improve network throughput by coherently combining signals, and scale to multiple channels by implementing the receivers in the cloud. However, in remote LoRa deployments, a CRAN’s demand for high-backhaul bandwidths can be challenging to meet. Therefore, bandwidth-aware compression of LoRa samples is needed to reap the benefits of CRANs. We introduce Cloud-LoRa, the first practical CRAN for LoRa, that can detect sub-noise LoRa signals and perform bandwidth-adaptive compression. To the best of our knowledge, this is the first demonstration of CRAN for LoRa operating in real-time. We deploy Cloud-LoRa in an agricultural field over multiple days with USRP as the gateway. A cellular backhaul hotspot is then used to stream the compressed samples to a Microsoft Azure server. We demonstrate SNR gains of over 6 dB using joint multi-gateway decoding and over 2x throughput improvement using state-of-the-art receivers, enabled by CRAN in real-world deployments.
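As a rough illustration of why joint multi-gateway decoding yields SNR gains (a toy numpy sketch under simplifying assumptions, not Cloud-LoRa's actual receiver): coherently averaging the same signal received with independent noise at two gateways roughly halves the noise power, i.e., about a 3 dB gain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
signal = np.exp(1j * 2 * np.pi * 0.01 * np.arange(n))   # toy narrowband signal
noise_power = 10.0                                        # sub-noise regime: SNR < 0 dB

def snr_db(x, clean):
    err = x - clean
    return 10 * np.log10(np.vdot(clean, clean).real / np.vdot(err, err).real)

# Two gateways observe the same signal with independent receiver noise.
rx = [signal + np.sqrt(noise_power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
      for _ in range(2)]

single = rx[0]
combined = sum(rx) / len(rx)        # coherent (phase-aligned) average
print(f"single gateway : {snr_db(single, signal):6.2f} dB")
print(f"two gateways   : {snr_db(combined, signal):6.2f} dB")  # roughly 3 dB higher
```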

 

[d] https://ieeexplore.ieee.org/document/10330695

We present a novel framework, GreyLambda, to improve the scalability of traffic engineering (TE) systems. TE systems continuously monitor traffic and allocate network resources based on observed demands. The temporal requirement for TE is to have a time-to-solution in five minutes or less. Additionally, traffic allocations have a spatial requirement, which is to enable all traffic to traverse the network without encountering an over-subscribed link. However, the multi-commodity flow-based TE formulation cannot scale with increasing network sizes. Recent approaches have relaxed multi-commodity flow constraints to meet the temporal requirement but fail to satisfy the spatial requirement due to changing traffic demands, resulting in oversubscribed links or infeasible solutions. To satisfy both these requirements, we utilize optical topology programming (OTP) to rapidly reconfigure optical wavelengths in critical network paths and provide localized bandwidth scaling and new paths for traffic forwarding. GreyLambda integrates OTP into TE systems by introducing a heuristic algorithm that capitalizes on latent hardware resources at high-degree nodes to offer bandwidth scaling, and a method to reduce optical path reconfiguration latencies. Our experiments show that GreyLambda enhances the performance of two state-of-the-art TE systems, SMORE and NCFlow in real-world topologies with challenging traffic and link failure scenarios.

 

[e] https://dl.acm.org/doi/10.1145/3626778

We present a longitudinal study of intercontinental long-haul links (LHLs) – links with latencies significantly higher than that of all other links in a traceroute path. Our study is motivated by the recognition of these LHLs as a network-layer manifestation of critical transoceanic undersea cables. We present a methodology and associated processing system for identifying long-haul links in traceroute measurements. We apply this system to a large corpus of traceroute data and report on multiple aspects of long haul connectivity including country-level prevalence, routers as international gateways, preferred long-haul destinations, and the evolution of these characteristics over a 7 year period. We identify 85,620 layer-3 links (out of 2.7M links in a large traceroute dataset) that satisfy our definition for intercontinental long haul with many of them terminating in a relatively small number of nodes. An analysis of connected components shows a clearly dominant component with a relative size that remains stable despite a significant growth of the long-haul infrastructure.
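The core heuristic, in a deliberately simplified form (illustrative only; the paper's identification methodology is more careful than this), is to flag the hop whose latency jump dominates every other jump in the same traceroute path:

```python
# Toy long-haul-link flagging: a hop whose RTT jump dwarfs every other jump
# in the same traceroute path. Hypothetical thresholds; not the paper's pipeline.
def find_long_haul_hop(rtts_ms, factor=5.0, min_jump_ms=40.0):
    """Return index i if the latency jump into hop i is `factor` times larger
    than any other jump in the path (and at least `min_jump_ms`), else None."""
    jumps = [max(b - a, 0.0) for a, b in zip(rtts_ms, rtts_ms[1:])]
    if not jumps:
        return None
    i = max(range(len(jumps)), key=jumps.__getitem__)
    others = [j for k, j in enumerate(jumps) if k != i]
    if jumps[i] >= min_jump_ms and all(jumps[i] >= factor * j for j in others):
        return i + 1          # far end of the suspected long-haul link
    return None

path = [1.2, 2.0, 3.1, 95.4, 97.0, 98.2]   # per-hop RTTs of one traceroute (ms)
print(find_long_haul_hop(path))             # -> 3
```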

 

[f] https://dl.acm.org/doi/10.1145/3613905.3652036

The ubiquitous use of technology by college students makes them vulnerable to harassment, harm, and intimidation via technological means. We evaluate the prevalence of such technology-facilitated abuse (TFA) among college students in the USA using a critical, feminist, and trauma-informed lens, which is essential to inform policymakers and advocates who support students. We surveyed 776 college students in a large R1 university located in the Midwest region of the USA to examine the prevalence of TFA faced by students marginalized by socio-economic factors, the support sought by student survivors, and the efficacy of support structures. Our findings indicate that 70% of students experience TFA, but more than half of them do not seek support. Among the survivors who seek support, 93% rely solely on informal resources like friends and family, and 6% seek support solely from formal networks such as survivor services or law enforcement. Therefore, we call on policymakers to direct attention to TFA, create tailored interventions to support marginalized students and propose campus-wide campaigns to spread awareness among college students.

 

[g] https://epubs.siam.org/doi/10.1137/1.9781611977912.115

We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where n sensors are measuring some common, ongoing phenomenon. At each time step t = 1, 2, …, T, the ith sensor reports its readings for that time step. The algorithm must then commit to its estimate μt for the true mean value of the process at time t. We assume that most of the sensors observe independent samples from some common distribution X, but an ɛ-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation μ to the true mean μ* := E[X]. We note that if the algorithm is allowed to wait until time T to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation.

We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of (ɛ, δ)-stability, we give an efficient online algorithm that outputs estimates μt, t ∈ [T], such that with high probability it holds that ‖μ − μ*‖2 = O(δ log(T)), where μ = (μt)t∈[T]. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve ℓ2-error of O(δ). Our second main result shows that with additional assumptions on the input (most notably that X is a product distribution) there are inefficient algorithms whose error does not depend on T at all.

[h] https://arxiv.org/abs/2403.05757

Teleoperation systems map operator commands from an input device into some coordinate frame in the remote environment. This frame, which we call a control coordinate system, should be carefully chosen as it determines how operators should move to get desired robot motions. While specific choices made by individual systems have been described in prior work, a design space, i.e., an abstraction that encapsulates the range of possible options, has not been codified. In this paper, we articulate a design space of control coordinate systems, which can be defined by choosing a direction in the remote environment for each axis of the input device. Our key insight is that there is a small set of meaningful directions in the remote environment. Control coordinate systems in prior works can be organized by the alignments of their axes with these directions and new control coordinate systems can be designed by choosing from these directions. We also provide three design criteria to reason about the suitability of control coordinate systems for various scenarios. To demonstrate the utility of our design space, we use it to organize prior systems and design control coordinate systems for three scenarios that we assess through human-subject experiments. Our results highlight the promise of our design space as a conceptual tool to assist system designers to design control coordinate systems that are effective and intuitive for operators.
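A small illustrative sketch of this idea (our own code, with made-up direction names and values): a control coordinate system is fixed by assigning each input-device axis one of a few meaningful remote-environment directions, which yields a matrix that maps operator input into robot motion.

```python
import numpy as np

# A few meaningful directions in the remote environment (unit vectors in the
# robot/world frame). Names and values here are illustrative placeholders.
DIRECTIONS = {
    "world_forward": np.array([1.0, 0.0, 0.0]),
    "world_left":    np.array([0.0, 1.0, 0.0]),
    "world_up":      np.array([0.0, 0.0, 1.0]),
}

def control_frame(x_axis, y_axis, z_axis):
    """Build a control coordinate system by choosing a remote-environment
    direction for each axis of the input device (columns of the matrix)."""
    return np.column_stack([DIRECTIONS[x_axis], DIRECTIONS[y_axis], DIRECTIONS[z_axis]])

# Example: device +x maps to world forward, +y to world left, +z to world up.
C = control_frame("world_forward", "world_left", "world_up")
device_delta = np.array([0.02, 0.0, 0.01])      # operator moved the input device
robot_delta = C @ device_delta                   # commanded end-effector motion
print(robot_delta)                               # [0.02 0.   0.01]
```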

[i] https://link.springer.com/chapter/10.1007/978-3-031-57728-4_3

Functional Encryption (FE) is a powerful notion of encryption which enables computations and partial message recovery of encrypted data. In FE, each decryption key is associated with a function f such that decryption recovers the function evaluation f(m) from an encryption of m. Informally, security states that a user with access to function keys sk_f1, sk_f2, … (and so on) can only learn f1(m), f2(m), … (and so on) but nothing more about the message. The system is said to be q-bounded collusion resistant if the security holds as long as an adversary gets access to at most q decryption keys.
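In symbols, the standard FE correctness requirement referenced above is (textbook notation, not specific to this paper):

```latex
\[
  \mathsf{Dec}\big(\mathsf{sk}_{f},\, \mathsf{Enc}(\mathsf{mpk}, m)\big) \;=\; f(m),
\]
% with q-bounded collusion resistance asking that keys sk_{f_1},...,sk_{f_q}
% together reveal nothing about m beyond f_1(m),...,f_q(m).
```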

However, until very recently, all these works studied bounded collusion resistance in a static model, where the collusion bound q was a global system parameter. While the static model has led to many great applications, it has major drawbacks. Recently, Agrawal et al. (Crypto 2021) and Garg et al. (Eurocrypt 2022) introduced the dynamic model for bounded collusion resistance, where the collusion bound q was a fluid parameter, not globally set, but chosen by each encryptor. The dynamic model enabled harnessing many virtues of the static model, while avoiding its various drawbacks. In this work, we provide a generic compiler to upgrade any FE scheme from the static model to the dynamic model.

We also extend our techniques to multi-authority attribute-based encryption (MA-ABE). We show that bounded collusion MA-ABE supporting predicates that can be represented as an efficient computational secret sharing (CSS) scheme can be built from minimal assumptions. Efficient CSS schemes are known for access structures whose characteristic function can be computed by a polynomial-size monotone circuit under the existence of one-way functions [Yao89, unpublished]. Thus, our MA-ABE construction is the first MA-ABE scheme from standard assumptions for predicates beyond polynomial-size monotone boolean formulas. Our construction also satisfies full adaptive security in the Random Oracle Model.

[j] https://arxiv.org/abs/2301.12357

In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first to focus on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the value of the target policy. We then use this formulation to derive the optimal allocation of samples per action during data collection. Next, we introduce a novel algorithm, SPEED (Structured Policy Evaluation Experimental Design), that tracks the optimal design, and we derive its regret with respect to the optimal design. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
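For context, a weighted least-squares estimator of the kind referenced above has the standard form below (our notation, assuming action features x_a, observed rewards r_i, and known per-action noise variances):

```latex
\[
  \hat{\theta}
  = \Big(\sum_{i=1}^{n} \tfrac{1}{\sigma_{a_i}^{2}}\, x_{a_i} x_{a_i}^{\top}\Big)^{-1}
    \sum_{i=1}^{n} \tfrac{1}{\sigma_{a_i}^{2}}\, x_{a_i} r_i ,
  \qquad
  \hat{v}(\pi) = \sum_{a} \pi(a)\, x_a^{\top} \hat{\theta},
\]
% where a_i is the action taken at round i and \sigma_{a}^{2} is the reward-noise
% variance of action a.
```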

[k] https://arxiv.org/abs/2202.05687

Detecting diffusion-generated deepfake images remains an open problem. Current detection methods fail against an adversary who adds imperceptible adversarial perturbations to the deepfake to evade detection. In this work, we propose Disjoint Diffusion Deepfake Detection (D4), a deepfake detector designed to improve black-box adversarial robustness beyond de facto solutions such as adversarial training. D4 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve adversarial robustness. Our key insight is to leverage a redundancy in the frequency domain and apply a saliency partitioning technique to disjointly distribute frequency components across multiple models. We formally prove that these disjoint ensembles lead to a reduction in the dimensionality of the input subspace where adversarial deepfakes lie, thereby making adversarial deepfakes harder to find for black-box attacks. We then empirically validate the D4 method against several black-box attacks and find that D4 significantly outperforms existing state-of-the-art defenses applied to diffusion-generated deepfake detection. We also demonstrate that D4 provides robustness against adversarial deepfakes from unseen data distributions as well as unseen generative techniques.
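A toy sketch of the frequency-disjoint ensemble idea (our own illustration with placeholder detectors; not the authors' D4 code): partition the 2D spectrum into disjoint bands, give each model only its band-limited view, and average the scores.

```python
import numpy as np

def disjoint_frequency_views(image, num_models):
    """Split an image's 2D spectrum into `num_models` disjoint radial bands and
    return one band-limited reconstruction per model (toy partitioning)."""
    spec = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, radius.max() + 1e-9, num_models + 1)
    views = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)          # disjoint frequency subset
        views.append(np.fft.ifft2(np.fft.ifftshift(spec * mask)).real)
    return views

def ensemble_score(image, models):
    """Each model scores only its own frequency view; average the scores."""
    views = disjoint_frequency_views(image, len(models))
    return float(np.mean([m(v) for m, v in zip(models, views)]))

# Placeholder "detectors": any callable mapping an image view to a fake-probability.
models = [lambda v: float(np.clip(np.std(v), 0, 1)) for _ in range(3)]
print(ensemble_score(np.random.default_rng(0).random((64, 64)), models))
```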

[l] https://ieeexplore.ieee.org/abstract/document/10136161

Designing deep networks robust to adversarial examples remains an open problem. Recently, it was shown that adversaries relying on only top-1 feedback (i.e., the hard-label) from an image classification model can arbitrarily shift an image towards an intended target prediction. Likewise, these hard-label adversaries enjoy performance comparable to first-order adversaries relying on the full model gradient. It was also shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors. In this paper, we argue that query efficiency in the hard-label setting is also connected to an adversary’s traversal through the data manifold. To explain this behavior, we propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary’s distribution of gradient estimates. Through numerical experiments of manifold-gradient mutual information, we show this behavior acts as a function of the effective problem dimensionality. On high-dimensional real-world datasets and multiple hard-label attacks using dimension reduction, we observe the same behavior to produce samples closer to the data manifold. This can result in up to 10x decrease in the manifold distance measure, regardless of the model robustness. Our results suggest that our variant of hard-label attack can find a higher concentration of generalization errors than previous techniques, leading to improved worst-case analysis for model designers.

[m] https://dl.acm.org/doi/10.1145/3613904.3642188

In recent years, there has been a growing interest in enhancing the accessibility of visualizations for people with visual impairments. While much of the research has focused on improving accessibility for screen reader users, the specific needs of people with remaining vision (i.e., low-vision individuals) have been largely unaddressed. To bridge this gap, we conducted a qualitative study that provides insights into how low-vision individuals experience visualizations. We found that participants utilized various strategies to examine visualizations using the screen magnifiers and also observed that the default zoom level participants use for general purposes may not be optimal for reading visualizations. We identified that participants relied on their prior knowledge and memory to minimize the traversing cost when examining visualization. Based on the findings, we motivate a personalized tool to accommodate varying visual conditions of low-vision individuals and derive the design goals and features of the tool.

[o] https://arxiv.org/abs/2307.15255

This paper presents predicate transfer, a novel method that optimizes join performance by pre-filtering tables to reduce the join input sizes. Predicate transfer generalizes Bloom join, which conducts pre-filtering within a single join operation, to multi-table joins such that the filtering benefits can be significantly increased. Predicate transfer is inspired by the seminal theoretical results by Yannakakis, which use semi-joins to pre-filter acyclic queries. Predicate transfer generalizes the theoretical results to arbitrary join graphs and uses Bloom filters in place of semi-joins, leading to significant speedups. Evaluation shows predicate transfer can outperform Bloom join by 3.1x on average on the TPC-H benchmark.
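For intuition, a minimal Bloom-join-style pre-filter might look like the following (plain-Python sketch with a simple hash-based Bloom filter; illustrative, not the paper's implementation): build a filter over one side's join keys and drop non-joining rows from the other side before the join runs.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def bloom_prefilter(build_keys, probe_rows, key_of):
    """Pre-filter the probe side of a join: keep only rows whose key might
    appear on the build side (false positives possible, no false negatives)."""
    bf = BloomFilter()
    for k in build_keys:
        bf.add(k)
    return [row for row in probe_rows if bf.might_contain(key_of(row))]

orders = [(1, "2024-01-02"), (7, "2024-02-11"), (9, "2024-03-05")]
orders_with_match = bloom_prefilter({1, 9}, orders, key_of=lambda r: r[0])
print(orders_with_match)   # rows with keys 1 and 9 survive the pre-filter
```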

[p] https://openaccess.thecvf.com/content/WACV2024/html/Smink_Computer_Vision_on_the_Edge_Individual_Cattle_Identification_in_Real-Time_WACV_2024_paper.html

In precision livestock farming, the individual identification of cattle is crucial to inform the decisions made to enhance animal welfare, health, and productivity. In the literature, models exist that can read ear tags; however, they are not easily portable to real-world cattle production environments and make predictions mainly on still images. We propose a video-based cattle ear tag reading system, called ReadMyCow, which takes advantage of the temporal characteristics in videos to accurately detect, track, and read cattle ear tags at 25 FPS on edge devices. For each frame in a video, ReadMyCow functions in two steps. 1) Tag detection: a YOLOv5s Object Detection model and NVIDIA Deepstream Tracking Layer detect and track the tags present. 2) Tag reading: the novel WhenToRead module decides whether to read each tag, using a TRBA Scene Text Recognition model, or to use the reading from a previous frame. The system is implemented on an edge device, namely the NVIDIA Jetson AGX Orin or Xavier, making it portable to cattle production environments without external computational resources. To attain real-time speeds, ReadMyCow only reads a detected tag in the current frame when its decision metric indicates the frame is likely to yield a better reading than the one already stored. Ideally, this means the best reading of a tag is found and stored throughout a tag’s presence in the video, even when the tag becomes occluded or blurry. In testing at a real Midwestern dairy farm housing 9,000 cows, the ReadMyCow system accurately read 96.1% of printed ear tags, demonstrating its real-world commercial potential. ReadMyCow opens opportunities for informed data-driven decision-making processes on commercial cattle farms.
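A schematic of the two-step per-frame loop described above (placeholder callables and a made-up confidence-based decision rule standing in for WhenToRead; not the authors' code):

```python
def process_video(frames, detect_and_track, read_tag):
    """Two-step per-frame loop: (1) detect/track tags, (2) decide whether to re-read.
    `detect_and_track` and `read_tag` are placeholder callables, not real models."""
    best = {}                                   # track_id -> (best confidence, text)
    for frame in frames:
        for track_id, crop, det_conf in detect_and_track(frame):
            prev_conf = best.get(track_id, (0.0, ""))[0]
            # Run the (expensive) reader only when this frame looks clearly better
            # than the best stored reading for this tag -- a stand-in decision metric.
            if det_conf > 1.1 * prev_conf:
                text, read_conf = read_tag(crop)
                if read_conf > prev_conf:
                    best[track_id] = (read_conf, text)
    return {tid: text for tid, (_, text) in best.items()}

# Toy stand-ins so the sketch runs end to end.
frames = [0, 1, 2]
detections = {0: [("cow-17", "blurry", 0.4)], 1: [("cow-17", "sharp", 0.9)], 2: []}
readings = {"blurry": ("81?3", 0.3), "sharp": ("8143", 0.95)}
print(process_video(frames, lambda f: detections[f], lambda c: readings[c]))
# {'cow-17': '8143'}
```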

[q] https://arxiv.org/abs/2201.08984

Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL — representation learning and label disambiguation — in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Moreover, we study a challenging yet practical noisy partial label learning setup, where the ground-truth may not be included in the candidate set. To remedy this problem, we present an extension PiCO+ that performs distance-based clean sample selection and learns robust classifiers by a semi-supervised contrastive learning algorithm. Extensive experiments demonstrate that our proposed methods significantly outperform the current state-of-the-art approaches in standard and noisy PLL tasks and even achieve comparable results to fully supervised learning.
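A simplified numpy caricature of prototype-based label disambiguation in the partial-label setting (our own sketch, not the PiCO training code): keep one prototype per class, score each example only against its candidate labels, and nudge the winning prototype toward the example.

```python
import numpy as np

def disambiguate(features, candidate_sets, num_classes, num_epochs=5, momentum=0.9):
    """features: (n, d) example embeddings; candidate_sets: list of sets of class ids."""
    rng = np.random.default_rng(0)
    prototypes = rng.normal(size=(num_classes, features.shape[1]))
    labels = np.zeros(len(features), dtype=int)
    for _ in range(num_epochs):
        for i, (x, cands) in enumerate(zip(features, candidate_sets)):
            cands = sorted(cands)
            scores = prototypes[cands] @ x       # similarity to candidate prototypes only
            labels[i] = cands[int(np.argmax(scores))]
            # Moving-average update of the prototype the example was assigned to.
            prototypes[labels[i]] = momentum * prototypes[labels[i]] + (1 - momentum) * x
    return labels

# Toy data: two well-separated clusters, ambiguous candidate label sets.
feats = np.array([[5.0, 0.0], [4.8, 0.2], [0.1, 5.0], [0.0, 4.9]])
cands = [{0, 1}, {0, 1}, {0, 1}, {1}]
print(disambiguate(feats, cands, num_classes=2))
```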

[s] https://arxiv.org/abs/2209.10365

Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degraded performance of the prior methods when facing the combinatorial challenge from the long-tailed distribution and partial-labeling. In this work, we first identify the major reasons that the prior work failed. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels to match the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods.

[t] https://arxiv.org/abs/2403.14041

Learning companion robots for young children are increasingly adopted in informal learning environments. Although parents play a pivotal role in their children’s learning, very little is known about how parents prefer to incorporate robots into their children’s learning activities. We developed prototype capabilities for a learning companion robot to deliver educational prompts and responses to parent-child pairs during reading sessions and conducted in-home user studies involving 10 families with children aged 3-5. Our data indicates that parents want to work with robots as collaborators to augment parental activities to foster children’s learning, introducing the notion of parent-robot collaboration. Our findings offer an empirical understanding of the needs and challenges of parent-child interaction in informal learning scenarios and design opportunities for integrating a companion robot into these interactions. We offer insights into how robots might be designed to facilitate parent-robot collaboration, including parenting policies, collaboration patterns, and interaction paradigms.

[u] https://dl.acm.org/doi/10.1145/3613904.3642950

Assistive technologies for adults with Down syndrome (DS) need designs tailored to their specific technology requirements. While prior research has explored technology design for individuals with intellectual disabilities, little is understood about the needs and expectations of adults with DS. Assistive technologies should leverage the abilities and interests of the population, while incorporating age- and context-considerate content. In this work, we interviewed six adults with DS, seven parents of adults with DS, and three experts in speech-language pathology, special education, and occupational therapy to determine how technology could support adults with DS. In our thematic analysis, four main themes emerged, including (1) community vs. home social involvement; (2) misalignment of skill expectations between adults with DS and parents; (3) family limitations in technology support; and (4) considerations for technology development. Our findings extend prior literature by including the voices of adults with DS in how and when they use technology.

[v] https://dl.acm.org/doi/10.1145/3620665.3640367

We are in the age of AI, with rapidly changing algorithms and a somewhat synergistic change in hardware. MLPerf is a recent benchmark suite that serves as a way to compare and evaluate hardware. However, it has several drawbacks: it is dominated by CNNs, does a poor job of capturing the diversity of AI use cases, and represents only a sliver of production AI use cases. This paper performs a longitudinal study of state-of-the-art AI applications spanning vision, physical simulation, vision synthesis, language and speech processing, and tabular data processing, across three generations of hardware to understand how the AI revolution has panned out. We call this collection of applications and execution scaffolding the CaSiO suite. The paper reports on data gathered at the framework level, device API level, and hardware and microarchitecture level. The paper provides insights on the hardware-software revolution with pointers to future trends.

[w] https://ieeexplore.ieee.org/document/10476447

Graphics processing units (GPUs) are an important class of parallel processors that offer high compute throughput and memory bandwidth. GPUs are used in a variety of important computing domains, such as machine learning, high performance computing, sparse linear algebra, autonomous vehicles, and robotics. However, some applications from these domains can underperform due to sensitivity to memory latency and bandwidth. Some of this sensitivity can be reduced by better overlapping memory access with compute. Current GPUs often leverage pipeline parallelism in the form of warp specialization to enable better overlap. However, current warp specialization support on GPUs is limited in three ways. First, warp specialization is a complex and manual program transformation that is out of reach for many applications and developers. Second, it is limited to coarse-grained transfers between global memory and the shared memory scratchpad (SMEM); fine-grained memory access patterns are not well supported. Finally, the GPU hardware is unaware of the pipeline parallelism expressed by the programmer, and is unable to take advantage of this information to make better decisions at runtime. In this paper we introduce WASP, hardware and compiler support for warp specialization that addresses these limitations. WASP enables fine-grained streaming and gather memory access patterns through the use of warp-level register file queues and hardware-accelerated address generation. Explicit warp to pipeline stage naming enables the GPU to be aware of pipeline parallelism, which WASP capitalizes on by designing pipeline-aware warp mapping, register allocation, and scheduling. Finally, we design and implement a compiler that can automatically generate warp specialized kernels, reducing programmer burden. Overall, we find that runtime performance can be improved on a variety of important applications by an average of 47% over a modern GPU baseline.

[x] https://arxiv.org/abs/2401.16677

Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thus hide, this communication with independent computations, techniques such as Tensor Parallelism (TP) inherently serialize communication with model execution. One approach to hide this serialized communication is to interleave it with the producer operation (of the communicated data) in a fine-grained manner. However, this fine-grained interleaving of communication and computation in software can be difficult. Furthermore, as with any concurrent execution, it requires compute and memory resources to be shared between computation and communication, causing resource contention that reduces overlapping efficacy.

To overcome these challenges, we propose T3, which applies hardware-software co-design to transparently overlap serialized communication while minimizing resource contention with compute. T3 transparently fuses producer operations with the subsequent communication via a simple configuration of the producer’s output address space and requires minor software changes. At the hardware level, T3 adds a lightweight track and trigger mechanism to orchestrate the producer’s compute and communication. It further uses compute-enhanced memories for communication’s attendant compute. As a result, T3 reduces resource contention and efficiently overlaps serialized communication with computation. For important Transformer models like T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3’s benefits persist as models scale: geomean 29% for sublayers in ∼500-billion parameter models, PALM and MT-NLG.

[y] https://dl.acm.org/doi/10.1145/3617232.3624854

Modern storage devices, such as Optane NVMe SSDs, offer ultra-low latency of a few microseconds and high bandwidth of multiple gigabytes per second. At these speeds, the kernel software I/O stack is a substantial source of overhead. Userspace approaches avoid kernel software overheads but face challenges in supporting shared storage without major changes to file systems, the OS or the hardware.

We propose a new I/O architecture, BypassD, for fast, userspace access to shared storage devices. BypassD takes inspiration from virtual memory: it uses virtual addresses to access a device and relies on hardware for translation and protection. Like memory-mapping a file, the OS kernel constructs a mapping for file contents in the page table. Userspace I/O requests then use virtual addresses from these mappings to specify which file and file offset to access. BypassD extends the IOMMU hardware to translate file offsets into device Logical Block Addresses. Existing applications require no modifications to use BypassD. Our evaluation shows that BypassD reduces latency for 4KB accesses by 42% compared to standard Linux kernel and performs close to userspace techniques like SPDK that do not support device sharing. By eliminating software overheads, BypassD improves performance of real workloads, such as the WiredTiger storage engine, by ~20%.

[z] https://dl.acm.org/doi/10.1145/3650203.3663336

Recent research has shown the importance of tuning DBMS configuration knobs to achieve high performance. As a result, a large number of search-based and learning-based auto-tuning methods have been proposed. However, despite the promising results, we observe that (1) developing new auto-tuning methods, and (2) comparing them to existing methods is cumbersome and requires extensive engineering effort. This effort is compounded by the fact that auto-tuning methods typically need to be evaluated on multiple systems (PostgreSQL, MySQL, etc.) and benchmarks (TPC-C, TPC-H, etc.). In this paper we describe Nautilus, a platform that automates benchmarking for DBMS configuration tuning. Our insight in building Nautilus is that the process of developing auto-tuners can be separated into a tuning algorithm layer and a benchmarking layer. We focus on automating the benchmarking layer and thus make it easier for researchers to develop new tuning methods and compare with existing methods. We describe the design and implementation of Nautilus, highlighting its main features and advantages to the end-user. Furthermore, we are open-sourcing this project, so that the research community can freely use and benefit from it.

[aa] https://arxiv.org/abs/2312.12621

Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters and several new cluster schedulers have been proposed in recent years to support these workloads. With rapidly evolving DL workloads, it is challenging to quickly prototype and compare scheduling policies across workloads. Further, as prior systems target different aspects of scheduling (resource allocation, placement, elasticity etc.), it is also challenging to combine these techniques and understand the overall benefits. To address these challenges we propose Blox, a modular toolkit which allows developers to compose individual components and realize diverse scheduling frameworks. We identify a set of core abstractions for DL scheduling, implement several existing schedulers using these abstractions, and verify the fidelity of these implementations by reproducing results from prior research. We also highlight how we can evaluate and compare existing schedulers in new settings: different workload traces, higher cluster load, change in DNN workloads and deployment characteristics. Finally, we showcase Blox’s extensibility by composing policies from different schedulers, and implementing novel policies with minimal code changes.

[ab] https://dl.acm.org/doi/10.1145/3639286

The scaling of per-GB DRAM cost has slowed down in recent years. Recent research has suggested that adding remote memory to a system can further reduce the overall memory cost while maintaining good performance. Remote memory (i.e., tiered memory), connected to host servers via high-speed interconnect protocols such as RDMA and CXL, is expected to deliver 100x (less than 1µs) lower latency than SSD and be more cost-effective than local DRAM through pooling or adopting cheaper memory technologies.

Tiered memory opens up a large number of potential use cases within database systems. But previous work has only explored limited ways of using tiered memory. We provide a systematic study of how a DBMS can build tiered memory buffer management with respect to a wide range of hardware performance characteristics. Specifically, we study five different indexing designs that leverage remote memory in different ways and evaluate them through a wide range of metrics including performance, tiered-memory latency sensitivity, and cost-effectiveness. In addition, we propose a new memory provisioning strategy that allocates an optimal amount of local and remote memory for a given workload. Our evaluations show that while some designs achieve higher performance than others, no design can win in all measured dimensions.

[ac] https://dl.acm.org/doi/10.1145/3613904.3642829

Autism Spectrum Disorder (ASD) presents challenges in social interaction skill development, particularly in turn-taking. Digital interventions offer potential solutions for improving autistic children’s social skills but often lack addressing specific collaboration techniques. Therefore, we designed a prototype of a turn-taking collaborative tablet game, StarRescue, which encourages children’s distinct collaborative roles and interdependence while progressively enhancing sharing and mutual planning skills. We further conducted a controlled study with 32 autistic children to evaluate StarRescue’s usability and potential effectiveness in improving their social skills. Findings indicated that StarRescue has great potential to foster turn-taking skills and social communication skills (e.g., prompting, negotiation, task allocation) within the game and also extend beyond the game. Additionally, we discussed implications for future work, such as including parents as game spectators and understanding autistic children’s territory awareness in collaboration. Our study contributes a promising digital intervention for autistic children’s turn-taking social skill development via a scaffolding approach and valuable design implications for future research.

[ad] https://arxiv.org/html/2402.12772v1

Reading is a challenging task for low vision people. While conventional low vision aids (e.g., magnification) offer certain support, they cannot fully address the difficulties faced by low vision users, such as locating the next line and distinguishing similar words. To fill this gap, we present GazePrompt, a gaze-aware reading aid that provides timely and targeted visual and audio augmentations based on users’ gaze behaviors. GazePrompt includes two key features: (1) a Line-Switching support that highlights the line a reader intends to read; and (2) a Difficult-Word support that magnifies or reads aloud a word that the reader hesitates with. Through a study with 13 low vision participants who performed well-controlled reading-aloud tasks with and without GazePrompt, we found that GazePrompt significantly reduced participants’ line switching time, reduced word recognition errors, and improved their subjective reading experiences. A follow-up silent-reading study showed that GazePrompt can enhance users’ concentration and perceived comprehension of the reading contents. We further derive design considerations for future gaze-based low vision aids.

[ae] https://arxiv.org/html/2402.07300v2

Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce Spica, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, Spica offers interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, Spica augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of Spica and explored user behaviors, preferences, and mental models when interacting with augmented ADs.

[af] https://arxiv.org/abs/2311.05511

We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible policy with optimal value so long as the maximum supported cost is bounded by a polynomial in the cMDP or the absolute budget. Given our hardness results, our approximation guarantees are the best possible under worst-case analysis.
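In symbols, the anytime constraint described above requires the running cost to respect the budget at every step, almost surely (our notation):

```latex
\[
  \Pr\Big[\ \textstyle\sum_{k=1}^{t} c_k \,\le\, B \ \ \text{for all } t \ \Big] \;=\; 1,
\]
% as opposed to the more common expectation constraint E[ \sum_t c_t ] <= B.
```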

July 2024

[1] “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps”

Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel

PLDI ‘14

Most influential paper award: https://www.sigplan.org/Awards/PLDI/

Abstract: Today’s smartphones are a ubiquitous source of private and confidential data. At the same time, smartphone users are plagued by carelessly programmed apps that leak important data by accident, and by malicious apps that exploit their given privileges to copy such data intentionally. While existing static taint-analysis approaches have the potential of detecting such data leaks ahead of time, all approaches for Android use a number of coarse-grain approximations that can yield high numbers of missed leaks and false alarms.

In this work we thus present FlowDroid, a novel and highly precise static taint analysis for Android applications. A precise model of Android’s lifecycle allows the analysis to properly handle callbacks invoked by the Android framework, while context, flow, field and object-sensitivity allows the analysis to reduce the number of false alarms. Novel on-demand algorithms help FlowDroid maintain high efficiency and precision at the same time.

We also propose DroidBench, an open test suite for evaluating the effectiveness and accuracy of taint-analysis tools specifically for Android apps. As we show through a set of experiments using SecuriBench Micro, DroidBench, and a set of well-known Android test applications, FlowDroid finds a very high fraction of data leaks while keeping the rate of false positives low. On DroidBench, FlowDroid achieves 93% recall and 86% precision, greatly outperforming the commercial tools IBM AppScan Source and Fortify SCA. FlowDroid successfully finds leaks in a subset of 500 apps from Google Play and about 1,000 malware apps from the VirusShare project.
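To make the source-to-sink terminology concrete, here is a tiny, hypothetical (and deliberately non-Android) example of the kind of flow a static taint analysis is meant to flag: private data from a source reaches a network sink.

```python
# Hypothetical illustration of a taint-style data leak (all names are made up).
def get_device_id():                 # "source": returns private data
    return "IMEI-356938035643809"

def send_to_server(payload):         # "sink": data leaves the device
    print(f"POST https://example.com/collect body={payload}")

def on_button_click():
    device_id = get_device_id()      # taint introduced here
    message = f"user={device_id}"    # taint propagates through the string
    send_to_server(message)          # taint reaches the sink -> reported as a leak

on_button_click()
```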

 

[2] ACM SIGGRAPH Seminal Graphics Papers

When we began planning how to celebrate 50 years of SIGGRAPH Conferences, there was unanimous agreement that one of the projects should be publishing a second volume of Seminal Graphics Papers. The first volume was published in 1998 as part of the celebration of the 25th SIGGRAPH conference. Seminal Graphics Papers Volume 2, perhaps more than any other activity undertaken in this milestone year, celebrates ACM SIGGRAPH’s origins and continued success as a Technical and Professional Society. This collection of papers typifies the ground-breaking research that has been the conference’s hallmark since 1974. A quick scan of the chapter and the paper titles shows just how far SIGGRAPH research has pushed the boundaries of our discipline and contributed to its evolution.

The ACM Digital Library team has been supportive of this Seminal Graphics Papers project from the beginning. I am pleased to let you know that both Volumes 1 and 2 of Seminal Graphics Papers are freely available from the ACM Digital Library at these URLs:

Volume 1: https://dl.acm.org/doi/book/10.1145/280811

Volume 2: https://dl.acm.org/doi/book/10.1145/3596711

“Content-Preserving Warps for 3D Video Stabilization”

Feng Liu, Michael Gleicher, Hailin Jin, and Aseem Agarwala

https://dl.acm.org/doi/10.1145/1531326.1531350

ACM Transactions on Graphics (TOG), Volume 28, Issue 3, July 2009

We describe a technique that transforms a video from a hand-held video camera so that it appears as if it were taken with a directed camera motion. Our method adjusts the video to appear as if it were taken from nearby viewpoints, allowing 3D camera movements to be simulated. By aiming only for perceptual plausibility, rather than accurate reconstruction, we are able to develop algorithms that can effectively recreate dynamic scenes from a single source video. Our technique first recovers the original 3D camera motion and a sparse set of 3D, static scene points using an off-the-shelf structure-from-motion system. Then, a desired camera path is computed either automatically (e.g., by fitting a linear or quadratic path) or interactively. Finally, our technique performs a least-squares optimization that computes a spatially-varying warp from each input video frame into an output frame. The warp is computed to both follow the sparse displacements suggested by the recovered 3D structure, and avoid deforming the content in the video frame. Our experiments on stabilizing challenging videos of dynamic scenes demonstrate the effectiveness of our technique.
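Schematically, the least-squares problem described above balances a data term that moves sparse tracked points to where the desired camera path says they should project against a term that keeps each cell of the warp close to a shape-preserving transform (our paraphrase in generic notation, not the paper's exact formulation):

```latex
\[
  \min_{w}\ \sum_{i} \big\| w(p_i) - p'_i \big\|^2
  \;+\; \alpha \sum_{\text{grid cells } c} E_{\text{shape}}\!\big(w|_{c}\big)
\]
% p_i: tracked static scene points in the input frame; p'_i: their projections
% under the desired camera path; w: the spatially-varying warp; E_shape penalizes
% deviation of each cell's warp from a shape-preserving (similarity) transform.
```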

“Motion Graphs”

Lucas Kovar, Michael Gleicher, and Frédéric Pighin

https://dl.acm.org/doi/10.1145/566654.566605

ACM Transactions on Graphics (TOG), Volume 21, Issue 3, July 2002

In this paper we present a novel method for creating realistic, controllable motion. Given a corpus of motion capture data, we automatically construct a directed graph called a motion graph that encapsulates connections among the database. The motion graph consists both of pieces of original motion and automatically generated transitions. Motion can be generated simply by building walks on the graph. We present a general framework for extracting particular graph walks that meet a user’s specifications. We then show how this framework can be applied to the specific problem of generating different styles of locomotion along arbitrary paths.
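As a rough illustration of the construction (a toy numpy sketch with a placeholder pose distance, not the paper's algorithm): connect frames whose poses are similar enough to allow a transition, then generate new motion by walking the graph.

```python
import numpy as np

def build_motion_graph(poses, threshold):
    """poses: (n, d) array of per-frame pose features. Add consecutive-frame edges
    (original clips) plus transition edges between similar, non-adjacent frames."""
    n = len(poses)
    edges = {i: [i + 1] for i in range(n - 1)}
    edges[n - 1] = []
    dists = np.linalg.norm(poses[:, None, :] - poses[None, :, :], axis=-1)
    for i in range(n):
        for j in range(n):
            if abs(i - j) > 1 and dists[i, j] < threshold:
                edges[i].append(j)               # transition edge
    return edges

def random_walk(edges, start, length, seed=0):
    """Generate motion by walking the graph from `start` for `length` steps."""
    rng = np.random.default_rng(seed)
    node, walk = start, [start]
    for _ in range(length):
        nxt = edges[node]
        if not nxt:
            break
        node = int(rng.choice(nxt))
        walk.append(node)
    return walk

poses = np.array([[0.0], [0.1], [0.9], [1.0], [0.05]])   # toy 1-D "poses"
graph = build_motion_graph(poses, threshold=0.2)
print(random_walk(graph, start=0, length=6))
```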

 

[3] https://research.wisc.edu/research-forward-round-4/

The Office of the Vice Chancellor for Research (OVCR) hosts the Research Forward initiative to stimulate and support highly innovative and groundbreaking research at the University of Wisconsin–Madison. The initiative is supported by the Wisconsin Alumni Research Foundation (WARF) and will provide funding for 1–2 years, depending on the needs and scope of the project.

Research Forward seeks to support collaborative, multidisciplinary, multi-investigator research projects that are high-risk, high-impact, and transformative. It seeks to fund research projects that have the potential to fundamentally transform a field of study as well as projects that require significant development prior to the submission of applications for external funding. Collaborative research proposals are welcome from within any of the four divisions (Arts & Humanities, Biological Sciences, Physical Sciences, Social Sciences), as are cross-divisional collaborations.

The projects involving CS faculty are:

“Quanta sensing for next generation quantum computing”

PRINCIPAL INVESTIGATOR

Mohit Gupta, associate professor of computer sciences

CO-PRINCIPAL INVESTIGATORS

Mark Saffman, professor of physics

Swamit Tannu, assistant professor of computer sciences

Andreas Velten, associate professor of biostatistics and medical informatics, electrical and computer engineering

Abstract: Future quantum computers could open new scientific and engineering frontiers, impacting existential challenges like climate change. However, quantum information is delicate; it leaks with time and is prone to significant errors. These errors are exacerbated by imperfect reading and writing of quantum bits (qubits). These challenges fundamentally limit our ability to run quantum programs, and could hold back this powerful technology. Fast and accurate qubit readout, therefore, is essential for unlocking the quantum advantage. Current quantum computers use conventional cameras for reading qubits, which are inherently slow and noisy.

This research project will use quanta (single-photon) sensors for fast and accurate qubit readout. Quanta sensors detect individual photons scattered from qubits, thus enabling qubit sensing at 2-3 orders of magnitude higher speeds (a few microseconds, down from ~10 milliseconds), thereby transforming the capabilities (speed, accuracy) of future quantum computers, and for the first time, paving the way for scalable and practical quantum computing.

“A novel deep learning platform to design molecular glues that stabilize protein-protein interactions”

PRINCIPAL INVESTIGATOR

Xuhui Huang, professor of chemistry

CO-PRINCIPAL INVESTIGATORS

Sharon Li, assistant professor of computer science

Weiping Tang, professor of pharmacy

Molecular glues can stabilize protein-protein interactions (PPIs) and have emerged as a promising class of therapeutics for many “undruggable” proteins. Deep-learning approaches show enormous potential for revolutionizing structure-based drug design. However, all deep learning methods require a large training dataset containing information on the binding pocket. Such information is often unavailable for molecular glues.

This research project will create a deep-learning platform, Chemical Feature Transformer (CFT), that learns from patterns in chemical environments (using known molecular glue-PPI structures) instead of complementary geometric shapes (like the Key-and-Lock model). To experimentally validate the predictions of the CFT, researchers will apply CFT to identify molecular glues for the degradation of cancer-associated proteins. It is expected that the deep-learning platform can greatly help other UW researchers identify novel molecular glues to tackle their disease-relevant targets. This project will provide the foundation for potentially establishing  a center of excellence in AI-driven molecular design.

“An AI Terrarium for understanding intermediated and personal communication”

PRINCIPAL INVESTIGATOR

Timothy Rogers, professor of psychology

CO-PRINCIPAL INVESTIGATORS

Robert Hawkins, assistant professor of psychology

Dhavan Shah, professor of journalism

Sijia Yang, assistant professor of journalism

Jerry Zhu, professor of computer science

The immediacy, reach and virality of contemporary communication technologies present an enormous societal opportunity and an escalating threat. Research efforts to understand these phenomena face significant challenges: field studies are complex to design, expensive to conduct, and raise concerns about user manipulation.

This research project leverages AI to power more realistic simulated personas that can interact via natural language and are tuned to update their beliefs in ways that mimic real human participants – an AI Terrarium. The AI agents will be engineered and fine-tuned to capture patterns of real communication and belief change observed in representative human populations collected using panel surveys and behavioral experiments. The behaviors of interacting “digital twins” – AI models matched to human participants in the survey and experimental collections – will then be used to simulate and understand patterns of communication and persuasion in the real world.

Once developed, the AI Terrarium will enable low-cost testing of different message campaigns, news feeds, or interaction rules. Promising strategies will be validated in carefully controlled experiments using lab-based networked messaging platforms. The ability to simulate social media interactions with human fidelity would permit rapid-cycle testing of intervention strategies on a variety of health and political issues, ranging from vaccine skepticism to electoral misinformation, and allow piloting prior to costly real-world trials.

 

[4] https://research.wisc.edu/professorships-and-faculty-fellowships/kellett-mid-career-award/

This award is comparable in competitiveness to the Romnes Faculty Fellowships and the WARF Professorships but is intended to recognize and support mid-career faculty, seven to twenty years past their first promotion to a tenured position. The Mid-Career Award was created to provide needed support and encouragement to faculty at a critical stage of their careers. This award cannot be awarded more than one time to any faculty member.

Winners: https://research.wisc.edu/professorships-and-faculty-fellowships/kellett-mid-career-award/past-winners-kellett-mid-career/

 

[5] https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/

This program, funded by WARF in recognition of the leadership of the late WARF Trustee President H. I. Romnes, is designed to bridge the gap between the Research Committee’s initial research support for new faculty and the Mid-Career Award for Faculty Research. This award is intended to recognize and support faculty up to six years past their first promotion to a tenured position. This award cannot be awarded more than one time to any faculty member.

Winners: 

https://research.wisc.edu/professorships-and-faculty-fellowships/h-i-romnes-faculty-fellowships/past-winners-h-i-romnes-faculty-fellowship/


 

[6] https://mlcommons.org/2024/02/call-for-ml-and-systems-rising-stars-2024/

The Rising Stars program provides a platform for talented young researchers working at the intersection of machine learning (ML) and systems to build connections with a vibrant research community, engage with both industry and academic experts, and develop their skills.

https://saurabh.dev/

Saurabh Agarwal

“I am a fifth year grad student at UW-Madison. Advised by Dimitris Papailiopoulos and Shivaram Venkataraman. My research interests are primarily in Systems for Machine Learning, especially around distributed training and inference of ML workloads. During my PhD I have been very fortunate to intern with Bilge Acun at FAIR, Amar Phanishayee at Microsoft Research and Yucheng Low at Apple.

When I am not being a grad student, I can be found racing keelboats on Lake Mendota or alpine skiing in the winters. I also double up as a sailing instructor at the UW-Madison’s Hoofers Sailing club”

 

[7] “Mechanism Design for Collaborative Normal Mean Estimation”

Prof Kirthi Kandasamy

Talk at RAIN Seminar

https://rain.stanford.edu/

Abstract: With the rise in popularity of machine learning, data is becoming an increasingly valuable resource. While data collection is often costly, once collected, data can be freely replicated and used by many others. Hence, instead of simply collecting and learning from their own data, agents can share data with each other to reduce their own data collection costs and improve the utility they derive from data. Yet naive mechanisms for sharing data can lead to free-riding, where agents may attempt to benefit from the data that the others have collected without contributing any themselves. Free-riding can manifest in anti-social behavior, such as under-collecting data or submitting fabricated datasets.

In this talk, I will present some of our recent work where we study these challenges in one of the most foundational statistical problems, normal mean estimation. Here, a set of strategic agents collect i.i.d. samples from a normal distribution at a cost, and wish to estimate the mean of this distribution. To facilitate collaboration, we design mechanisms that incentivize agents to collect a sufficient amount of data and share it truthfully, so that they are all better off than working alone. Our approach prevents under-collection and data fabrication via two key techniques: first, when sharing the others’ data with an agent, the mechanism corrupts this dataset proportional to how much the data reported by the agent differs from the others’; second, we design minimax optimal estimators for the corrupted dataset. Our mechanism, which is Nash incentive compatible and individually rational, achieves a social penalty (the sum of all agents’ estimation errors and data collection costs) that is at most a factor of 2 of the global minimum. In two special cases where we restrict the strategy space of the agents, we design mechanisms that essentially achieve the global minimum. When applied to high dimensional (non-Gaussian) distributions with bounded variance, our mechanisms retain these properties, but with slightly weaker results.
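For a concrete picture of the corruption idea, here is a minimal, hypothetical Python sketch: a coordinator hands an agent the other agents' pooled samples, perturbed in proportion to how far the agent's own report deviates from the rest. The function names and the noise model are illustrative only; the paper's mechanism and its minimax estimators are more involved.

import numpy as np

def corrupted_view(agent_report, other_reports, scale=1.0, rng=None):
    """Return the other agents' data as shown to one agent, corrupted in
    proportion to how far the agent's reported mean is from the others'.
    Hypothetical sketch; the actual mechanism and estimators differ."""
    rng = np.random.default_rng() if rng is None else rng
    others = np.concatenate(other_reports)
    # Discrepancy between this agent's reported mean and the pooled mean.
    discrepancy = abs(np.mean(agent_report) - np.mean(others))
    # Add Gaussian noise whose magnitude grows with the discrepancy:
    # fabricating or under-collecting data degrades the data received back.
    noise = rng.normal(0.0, scale * discrepancy, size=others.shape)
    return others + noise

# Toy usage: three agents each collect 50 samples from N(mu, 1).
rng = np.random.default_rng(0)
mu = 2.0
reports = [rng.normal(mu, 1.0, 50) for _ in range(3)]
view_for_agent0 = corrupted_view(reports[0], reports[1:], rng=rng)
estimate = np.mean(np.concatenate([reports[0], view_for_agent0]))
print(round(estimate, 3))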

“Data without Borders: Game-theoretic challenges in democratizing data”

Prof Kirthi Kandasamy

Talk at Midwest Machine Learning Symposium

https://midwest-ml.org/2024/

Abstract: Due to the popularity of machine learning, many organizations view data as an invaluable resource, likening it to the “new oil/gold”. However, unlike many types of resources, data is nonrivalrous: it can be freely replicated and used by many. Hence, data produced by one organization can, in principle, generate limitless value for many others. This will accelerate economic, social, and scientific breakthroughs and benefit society at large. However, considerations of free-riding and competition may prevent such open sharing of data between organizations. An organization may be wary that others are not contributing a sufficient amount of data, or are contributing fabricated/poisoned datasets. Organizations may also wish to monetize the data they have for profit. In some recent work, we leverage ideas from game theory, market design, and robust statistics to design protocols for data sharing. Our methods incentivize organizations to collect and truthfully contribute large amounts of data, so that socially optimal outcomes can be achieved.

 

[8] “Data-Efficient Adaptation for Pretrained Decision-Making Models”

Prof Frederic Sala

Talk at Midwest Machine Learning Symposium

https://midwest-ml.org/2024/

Abstract: The use of pretrained models forms the major paradigm change in machine learning workflows this decade, including for decision making. These powerful and typically massive models have the promise to be used as a base for diverse applications. Unfortunately, it turns out that adapting these models for downstream tasks tends to be difficult and expensive, often requiring collecting and labeling additional data to further train or fine-tune the model. In this talk, I will describe my group’s work on addressing this challenge via efficient adaptation. First, when adapting vision-language models to make robust predictions, we show how to self-guide the adaptation process, without any additional data. Second, we show how to integrate relational structures like knowledge graphs into model prediction pipelines, enabling models to adapt to new domains unseen during training, without additional annotated examples. Lastly, in the most challenging scenarios, when the model must be fine-tuned on labeled data, we show how to obtain this data efficiently through techniques called weak supervision.

 

[9] “Characterizing and Harnessing Performance Variability in Accelerator-rich HPC systems ORNL”

Prof Matt Sinclair

Talk at NVIDIA

Abstract: Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters, hyperscalers, national computing centers, and supercomputers have procured hardware to support this evolving application paradigm. These systems contain hundreds to tens of thousands of accelerators, enabling exascale levels of compute. Recent work demonstrated that power management (PM) can impact application performance in CPU-based HPC systems, even when machines have the same architecture and often the same SKU (stock keeping unit). This variation occurs due to manufacturing variability and the chip’s PM. However, while modern HPC systems widely employ accelerators such as GPUs, it is unclear how much this variability affects applications. In this talk, I will first discuss our work on characterizing the extent of variation due to GPU PM in modern HPC and supercomputing systems across five large-scale clusters: Oak Ridge’s Summit, Sandia’s Vortex, TACC’s Frontera and Longhorn, and Livermore’s Corona. Regardless of the application, cluster, GPU vendor, and cooling method, our results show significant variation, highlighting the difficulty in efficiently using existing GPU clusters for modern HPC and ML workloads, and the need to embrace variability in future accelerator-based systems. Second, I will discuss our efforts to embrace this variability in current systems. Specifically, we design a novel cluster scheduler, PAL, which uses performance variability measurements and application-specific profiles to improve job performance and resource utilization. PAL also balances performance variability with locality to ensure jobs are spread across as few nodes as possible. Time permitting, I will also discuss our plans to embrace performance variability in future systems.

“GEMMs: (One of) the Building Blocks of Fast Machine Learning”
Prof Matt Sinclair

Abstract: Machine learning (ML) has transformed society with significant accuracy improvements across a wide range of tasks. This tremendous, transformative effect has been enabled by a virtuous synergy of (1) better hardware systems, (2) larger datasets, and (3) improved ML structures and algorithms that further benefit from more efficient hardware and larger datasets. While ML relies on a number of important primitives, in this talk I will focus on one of them: General Matrix Multiplies (GEMMs). Specifically, I will discuss why GEMMs are well suited to running some important ML layers, and introduce interactive examples that iteratively improve performance and increase GEMM code complexity for both CPU and GPU programs, before finally discussing how CPU and GPU BLAS libraries provide high performance GEMM implementations without requiring users to write complex code.
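As a rough companion to the progression the talk describes, the sketch below contrasts a textbook triple-loop GEMM with a call that dispatches to an optimized BLAS implementation (here via NumPy); the talk's own interactive examples may differ.

import numpy as np

def gemm_naive(A, B):
    """Textbook three-loop GEMM computing C = A @ B. Clear, but far from peak performance."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)

# Optimized path: NumPy dispatches to a tuned BLAS GEMM (e.g., OpenBLAS or MKL),
# which applies blocking, vectorization, and multithreading under the hood.
C_fast = A @ B
print(np.allclose(gemm_naive(A, B), C_fast))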

 

[a] “Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution”

Jing Liu, Xiangpeng Hao, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tej Chajed

HotStorage ‘24

We present Robust Alternative Execution (RAE), an approach to transparently mask runtime errors in performance-oriented filesystems via temporarily executing an alternative shadow filesystem. A shadow filesystem has the primary goal of robustness, achieved through a simple implementation without performance optimizations and concurrency while adhering to the same API and on-disk formats as the base filesystem it enhances. While the base performance-oriented filesystem may contain bugs, the shadow implementation is formally verified, leveraging advancements in the verification of low-level systems code. In the common case, the base filesystem executes and delivers high performance to applications; however, when a bug is triggered, the slow-but-correct shadow takes over, updates state correctly, and then resumes the base, thus providing high availability.

https://dl.acm.org/doi/10.1145/3655038.3665942
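A minimal, hypothetical sketch of the RAE control flow described above, in Python on in-memory objects rather than real on-disk state: the fast base implementation serves requests, and when it hits a runtime error, a simple shadow implementation over the same state takes over.

class BaseFS:
    """Fast, optimization-heavy implementation; may contain bugs."""
    def __init__(self, state):
        self.state = state
    def write(self, path, data):
        if len(data) > 4096:          # hypothetical latent bug
            raise RuntimeError("base filesystem bug triggered")
        self.state[path] = data

class ShadowFS:
    """Slow but simple implementation over the same state format."""
    def __init__(self, state):
        self.state = state
    def write(self, path, data):
        self.state[path] = data

def robust_write(base, shadow, path, data):
    # Common case: the base filesystem handles the request at full speed.
    try:
        base.write(path, data)
    except RuntimeError:
        # Bug triggered: the shadow takes over on the shared state,
        # completes the operation correctly, and the base then resumes.
        shadow.write(path, data)

state = {}
base, shadow = BaseFS(state), ShadowFS(state)
robust_write(base, shadow, "/a", b"x" * 10)      # served by the base
robust_write(base, shadow, "/b", b"x" * 10000)   # masked by the shadow
print(sorted(state))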

[b] “Homebrew: Optical Polarization Change Detection for Ground Motion Sensing”

Joseph Catudal; Zhenhao Zhou; Weijun Pan; Paul Barford; Dante Fratta; Herb Wang

2024 Optical Fiber Communications Conference and Exhibition (OFC)

We examine laser light polarization measurements using our own novel polarimeter design for ground motion sensing and show that efficacy is highly dependent on the coupling of fiber routes to vibration sources.

https://ieeexplore.ieee.org/document/10527030

[c] “DBSP: Incremental Computation on Streams and Its Applications to Databases”

Authors: Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen

ACM SIGMOD Record

We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are “changing”. The goal of an efficient implementation is to “reuse” previously computed results. Ideally, when presented with a new change to the input, an incremental computation should only perform work proportional to the size of the changes of the input, rather than to the size of the entire dataset.

https://dl.acm.org/doi/10.1145/3665252.3665271
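As a toy illustration of the incremental idea (not DBSP itself), the sketch below maintains a grouped count under a stream of insertions and deletions, doing work proportional to each change rather than rescanning the full dataset.

from collections import Counter

class IncrementalGroupCount:
    """Maintain SELECT key, COUNT(*) ... GROUP BY key under changes (deltas)."""
    def __init__(self):
        self.counts = Counter()

    def apply_delta(self, delta):
        # delta: iterable of (key, +1) insertions or (key, -1) deletions.
        # Work is proportional to |delta|, not to the size of the full table.
        for key, weight in delta:
            self.counts[key] += weight
            if self.counts[key] == 0:
                del self.counts[key]
        return dict(self.counts)

view = IncrementalGroupCount()
print(view.apply_delta([("red", +1), ("blue", +1), ("red", +1)]))  # {'red': 2, 'blue': 1}
print(view.apply_delta([("red", -1)]))                             # {'red': 1, 'blue': 1}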

[d] “Learned Load Balancing”

Brian Chang, Aditya Akella, Loris D’Antoni, and Kausik Subramanian

Theoretical Computer Science, July ‘24

Effectively balancing traffic in datacenter networks is a crucial operational goal. Most existing load balancing approaches are handcrafted to the structure of the network and/or network workloads. Thus, new load balancing strategies are required if the underlying network conditions change, e.g., due to hard or grey failures, network topology evolution, or workload shifts. While we can theoretically derive the optimal load balancing strategy by solving an optimization problem given certain traffic and topology conditions, these problems take too much time to solve, making the derived solutions stale to deploy. In this paper, we describe a load balancing scheme, Learned Load Balancing (LLB), which is a general approach to finding an optimal load balancing strategy for a given network topology and workload, and is fast enough in practice to deploy the inferred strategies. LLB uses deep supervised learning techniques to learn how to handle different traffic patterns and topology changes, and adapts to any failures in the underlying network. LLB leverages emerging trends in network telemetry, programmable switching, and “smart” NICs. Our experiments show that LLB performs well under failures and can be expanded to more complex, multi-layered network topologies. We also prototype neural network inference on smartNICs to demonstrate the feasibility of LLB.

[e] “Super Non-singular Decompositions of Polynomials and their Application to Robustly Learning Low-degree PTFs”

Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Sihan Liu, Nikos Zarifis

STOC ‘24

We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee O_{d,c}(opt^{1−c}), for any desired constant c > 0, where opt is the fraction of corruptions. In the strong contamination model, an omniscient adversary can arbitrarily corrupt an opt-fraction of the data points and their labels. This model generalizes the malicious noise model and the adversarial label noise model. Prior to our work, known polynomial-time algorithms in this corruption model (or even in the weaker adversarial label noise model) achieved error Õ_d(opt^{1/(d+1)}), which deteriorates significantly as a function of the degree d.

Our algorithm employs an iterative approach inspired by localization techniques previously used in the context of learning linear threshold functions. Specifically, we use a robust perceptron algorithm to compute a good partial classifier and then iterate on the unclassified points. In order to achieve this, we need to take a set defined by a number of polynomial inequalities and partition it into several well-behaved subsets. To this end, we develop new polynomial decomposition techniques that may be of independent interest.

“Testing Closeness of Multivariate Distributions via Ramsey Theory”

Ilias Diakonikolas, Daniel M. Kane, Sihan Liu

STOC ‘24

We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions p, q on ℝ^d, we want to distinguish between the case that p = q versus ‖p−q‖_{A_k} > ϵ, where ‖p−q‖_{A_k} denotes the generalized A_k distance between p and q — measuring the maximum discrepancy between the distributions over any collection of k disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound.

In more detail, we provide a computationally efficient closeness tester with sample complexity O((k^{6/7}/poly_d(ϵ)) log^d(k)). On the lower bound side, we establish a qualitatively matching sample complexity lower bound of Ω(k^{6/7}/poly(ϵ)), even for d = 2. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is Θ(k^{4/5}/poly(ϵ)). This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not.

As a corollary of our general A_k tester, we obtain d_TV-closeness testers for pairs of k-histograms on ℝ^d over a common unknown partition, and pairs of uniform distributions supported on the union of k unknown disjoint axis-aligned rectangles.

Both our algorithm and our lower bound make essential use of tools from Ramsey theory.

[f] “Technical Perspective: Unicorn: A Unified Multi-Tasking Matching Model”

AnHai Doan

SIGMOD Record

Data integration has been a long-standing challenge for data management. It has recently received significant attention due to at least three main reasons. First, many data science projects require integrating data from disparate sources before analysis can be carried out to extract insights. Second, many organizations want to build knowledge graphs, such as Customer 360s, Product 360s, and Supplier 360s, which capture all available information about the customers, products, and suppliers of an organization. Building such knowledge graphs often requires integrating data from multiple sources. Finally, there is also an increasing need to integrate a massive amount of data to create training data for AI models, such as large language models.

https://dl.acm.org/doi/10.1145/3665252.3665262

[g] “Bandit Profit Maximization for Targeted Marketing”

Joon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy.

ACM Conference on Economics and Computation (EC) 2024

We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm’s objective is to maximize its gross profit, the total revenue minus marketing costs. Our results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. We prove a regret upper bound of O(nT^{3/4}) and a lower bound of Ω((nT)^{3/4}) for monotonic demand curves, and a regret bound of Θ(nT^{2/3}) for demand curves that are monotonic in price and concave in the ancillary variable.

https://arxiv.org/html/2403.01361v1

“Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning”

Joon Suk Huh, Kirthevasan Kandasamy.

International Conference on Machine Learning (ICML) 2024

We study a multi-round mechanism design problem, where we interact with a set of agents over a sequence of rounds. We wish to design an incentive-compatible (IC) online learning scheme to maximize an application-specific objective within a given class of mechanisms, without prior knowledge of the agents’ type distributions. Even if each mechanism in this class is IC in a single round, if an algorithm naively chooses from this class on each round, the entire learning process may not be IC against non-myopic buyers who appear over multiple rounds. On each round, our method randomly chooses between the recommendation of a weakly differentially private online learning algorithm (e.g., Hedge), and a commitment mechanism which penalizes non-truthful behavior. Our method is IC and achieves O(T^{1+h/2}) regret for the application-specific objective in an adversarial setting, where h quantifies the long-sightedness of the agents. When compared to prior work, our approach is conceptually simpler, it applies to general mechanism design problems (beyond auctions), and its regret scales gracefully with the size of the mechanism class.

https://openreview.net/pdf?id=QQkK6YH0Th
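A minimal sketch of the round structure the abstract describes, with hypothetical names and probabilities: with some probability the designer runs a commitment round that audits reports, and otherwise plays the recommendation of a Hedge (multiplicative-weights) learner over the mechanism class. The paper's actual algorithm, privacy parameters, and penalty scheme differ.

import numpy as np

def hedge_weights(cum_rewards, eta):
    # Hedge / multiplicative-weights distribution over the mechanism class.
    w = np.exp(eta * (cum_rewards - cum_rewards.max()))
    return w / w.sum()

def run(T, n_mechanisms, reward_fn, audit_fn, p_commit=0.1, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    cum = np.zeros(n_mechanisms)
    for t in range(T):
        if rng.random() < p_commit:
            # Commitment round: penalize detectably non-truthful reports.
            audit_fn(t)
            continue
        probs = hedge_weights(cum, eta)
        m = rng.choice(n_mechanisms, p=probs)   # mechanism deployed this round
        rewards = reward_fn(t)                  # per-mechanism payoffs observed this round
        cum += rewards                          # full-information Hedge update (sketch only)
    return hedge_weights(cum, eta)

# Toy usage with synthetic, constant rewards and a no-op audit.
final = run(T=200, n_mechanisms=3,
            reward_fn=lambda t: np.array([0.2, 0.5, 0.3]),
            audit_fn=lambda t: None)
print(final.round(2))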

[h] “Enabling Tabular Data Exploration for Blind and Low-Vision Users”

Yanan Wang, Arjun Srinivasan, and Yea-Seul Kim

DIS ‘24

In a data-driven society, being able to examine data on one’s own terms is crucial for various aspects of well-being. However, current data exploration paradigms, such as Exploratory Data Analysis (EDA), heavily rely on visualizations to unveil patterns and insights. This visual-centric approach poses significant challenges for blind and low-vision (BLV) individuals. To address this gap, we built a prototype that supports non-visual data exploration and conducted an observational user study involving 18 BLV participants. Participants were asked to conduct various analytical tasks, as well as free exploration of provided datasets. The study findings provide insights into the factors influencing inefficient data exploration and characterizations of BLV participants’ analytical behaviors. We conclude by highlighting future avenues of research for the design of data exploration tools for BLV users.

https://dl.acm.org/doi/10.1145/3643834.3661609

https://pages.cs.wisc.edu/~yeaseulkim/assets/papers/2024_table.pdf

[i] “How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?”

Soumya Suvra Ghosal, Yiyou Sun, Yixuan Li

AAAI ‘24

Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
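For intuition, here is a rough sketch of a subspace-style nearest-neighbor score with hypothetical details: pick a subset of feature dimensions (here simply the highest-variance ones on in-distribution data) and use the k-th nearest-neighbor distance in that subspace as the OOD score. SNN itself learns the subspace jointly with the model during training, so this is only illustrative.

import numpy as np

def fit_subspace(id_features, num_dims):
    """Pick a relevant subset of dimensions; here, highest variance on ID data."""
    return np.argsort(id_features.var(axis=0))[-num_dims:]

def knn_ood_score(x, id_features, dims, k=5):
    """Distance to the k-th nearest ID neighbor in the subspace; larger = more OOD."""
    d = np.linalg.norm(id_features[:, dims] - x[dims], axis=1)
    return np.sort(d)[k - 1]

rng = np.random.default_rng(0)
id_feats = rng.normal(0, 1, size=(1000, 64))        # in-distribution features
dims = fit_subspace(id_feats, num_dims=16)

id_sample = rng.normal(0, 1, size=64)
ood_sample = rng.normal(4, 1, size=64)              # shifted distribution
print(knn_ood_score(id_sample, id_feats, dims) < knn_ood_score(ood_sample, id_feats, dims))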

[j]  “Tangible Scenography as a Holistic Design Method for Human-Robot Interaction”

DIS ‘24

Amy Koike, Bengisu Cagiltay, Bilge Mutlu

Traditional approaches to human-robot interaction design typically examine robot behaviors in controlled environments and narrow tasks. These methods are impractical for designing robots that interact with diverse user groups in complex human environments. Drawing from the field of theater, we present the construct of scenes — individual environments consisting of specific people, objects, spatial arrangements, and social norms — and tangible scenography, as a holistic design approach for human-robot interactions. We created a design tool, Tangible Scenography Kit (TaSK), with physical props to aid in design brainstorming. We conducted design sessions with eight professional designers to generate exploratory designs. Designers used tangible scenography and TaSK components to create multiple scenes with specific interaction goals, characterize each scene’s social environment, and design scene-specific robot behaviors. From these sessions, we found that this method can encourage designers to think beyond a robot’s narrow capabilities and consider how they can facilitate complex social interactions.

https://arxiv.org/abs/2405.19449

“The AI-DEC: A Card-based Design Method for User-centered AI Explanations”

DIS ‘24

Christine P Lee, Min Kyung Lee, Bilge Mutlu

Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems — communication content, modality, frequency, and direction — and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system’s workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems.

https://arxiv.org/abs/2405.16711

“REX: Designing User-centered Repair and Explanations to Address Robot Failures”

DIS ‘24

Christine P Lee, Pragathi Praveena, and Bilge Mutlu

Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study (n = 162), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors—safety, privacy, and complexity—that require adaptive repair strategies. The second, in-person study (n = 24) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments.

https://dl.acm.org/doi/10.1145/3643834.3661559

“Understanding On-the-Fly End-User Robot Programming”

DIS ‘24

Laura Stegner, Yuna Hwang, David Porfirio, Bilge Mutlu

Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with on-the-fly robot EUP. We trained 21 end users to use an existing on-the-fly EUP tool, asked them to create robot interactions for four scenarios, and assessed their overall experience. Our findings provide insight into how these systems should be designed to better support end-user experience with on-the-fly EUP, focusing on user interaction with an automatic program synthesizer that resolves imprecise user input, the use of multimodal inputs to express user intent, and the general process of programming a robot.

https://arxiv.org/abs/2406.00841

[k] “Does Compressing Activations Help Model Parallel Training?”

Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

MLSys ‘24

Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression in a model-parallel setting is an understudied area. We have discovered that model parallelism has fundamentally different characteristics than data parallelism. In this work, we present the first empirical study on the effectiveness of compression methods for model parallelism. We implement and evaluate three common classes of compression algorithms – pruning-based, learning-based, and quantization-based – using a popular Transformer training framework. We evaluate these methods across more than 160 settings and 8 popular datasets, taking into account different hyperparameters, hardware, and both fine-tuning and pre-training stages. We also provide analysis when the model is scaled up. Finally, we provide insights for future development of model parallelism compression algorithms.

https://proceedings.mlsys.org/paper_files/paper/2024/hash/71381211d0abef73ed1887b83c4547b1-Abstract-Conference.html
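For a concrete feel for one of the three compression classes the paper evaluates (quantization), the sketch below applies uniform int8 quantization to an activation tensor before it would be sent to the next model-parallel stage; the bit widths and calibration used in the paper may differ.

import numpy as np

def quantize_int8(x):
    """Uniform symmetric int8 quantization of an activation tensor."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

acts = np.random.randn(4, 1024).astype(np.float32)   # activations at a stage boundary
q, scale = quantize_int8(acts)                        # 4x smaller message (int8 vs. float32)
recovered = dequantize(q, scale)
print(q.nbytes, acts.nbytes, float(np.abs(acts - recovered).max()))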

[l] “‘This really lets us see the entire world:’  Designing a conversational telepresence robot for homebound older adults”

Yaxin Hu, Laura Stegner, Yasmine Kotturi, Caroline Zhang, Yi-Hao Peng, Faria Huq, Yuhang Zhao, Jeffrey P. Bigham, Bilge Mutlu

DIS ‘24

In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults’ experiential needs for robot-mediated remote experiences such as exploration, reminiscence, and social participation. We then designed a prototype system to support these goals and conducted a technology probe study (N=11) to garner a deeper understanding of user preferences for remote experiences. The study revealed users’ interaction patterns in each desired experience, highlighting the need for robot guidance and for social engagement with the robot and the remote bystanders. Our work identifies a novel design space where conversational telepresence robots can be used to foster meaningful interactions in the remote physical environment. We offer design insights into the robot’s proactive role in providing guidance and using dialogue to create personalized, contextualized, and meaningful experiences.

https://arxiv.org/abs/2405.15037

 

August 2024

[1] https://www.siam.org/publications/siam-news/articles/2024-august-prize-spotlight/#Wright

Dr. Stephen Wright, University of Wisconsin–Madison, is the 2024 recipient of the George B. Dantzig Prize. He received the prize for his fundamental contributions to nonlinear optimization and algorithms for control, compressed sensing, machine learning, and data science. He pioneered infeasible interior-point methods, which culminated in his 1997 SIAM monograph on the subject. Moreover, Dr. Wright contributed highly cited, outstanding, and very influential work in a broad range of fields in mathematical optimization, including algorithms for control, nonsmooth optimization with applications to compressed sensing, machine learning, and data science. His comprehensive contributions range from theory and algorithm design and analysis to applications and the development of high-impact software.

Read more at the URL above.

[2] “Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture”,

Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore

https://ieeexplore.ieee.org/document/1207019

Original Publication: ISCA 2003. 

Award Details: The ACM SIGARCH/IEEE-CS TCCA Influential ISCA Paper Award is presented annually at the International Symposium on Computer Architecture. It honors the paper from the ISCA Proceedings published 20 years prior that has had the greatest impact on the field, whether through research, development, products, or ideas. The award includes a $1,000 honorarium for the authors and a certificate.

https://www.sigarch.org/benefit/awards/acm-sigarchieee-cs-tcca-influential-isca-paper-award/

Abstract: We describe the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes – ILP, TLP, and DLP – demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.

[3] “Active Adaptation for Decentralized Foundation Models”

Prof Fred Sala

DARPA Young Faculty Award 

Abstract: Foundation models have revolutionized the way that machine learning and data science tasks are performed. While exciting, these models require large amounts of data, both for creating them and for specializing them to satisfy user needs. This makes these models unsuitable for data-constrained scenarios. Model training and adaptation also impose heavy computational burdens, blocking the use of foundation models in challenging decentralized environments. This award studies how to create new forms of foundation models that overcome these obstacles. It aims to build these models with less data by focusing on data quality rather than quantity, to train them more efficiently, and to enable their use to power decentralized agents that face communication and other constraints.

More about the DARPA Young Faculty Award here:

https://www.darpa.mil/work-with-us/for-universities/young-faculty-award

The objective of the DARPA Young Faculty Award (YFA) program is to identify and engage researchers in junior faculty positions at U.S. academic and non-profit research institutions and expose them to Department of Defense (DoD) needs and DARPA’s program development process.

[4] This week NSF awarded supplements to two awards – “Pelican: Advancing the Open Science Data Federation Platform” (Bockelman PI and Livny co-PI)  and “Partnership to Advance Throughput Computing (PATh)” (Livny PI, Bockelman and Tannenbaum co-PIs) – for a total of ~$1.2M.  One of these supplements involves Tony Gitter and is in the context of the National AI Research Resource (NAIRR). 

[5] “Anvil: Verifying Liveness of Cluster Management Controllers”

Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, and Zicheng Ma, University of Illinois Urbana-Champaign; Tej Chajed, University of Wisconsin-Madison; Jon Howell, Andrea Lattuada, and Oded Padon, VMware Research; Lalith Suresh, Feldera; Adriana Szekeres, VMware Research; Tianyin Xu, University of Illinois Urbana-Champaign

https://www.usenix.org/conference/osdi24/presentation/sun-xudong

Awarded Best Paper!

Abstract: Modern clouds depend crucially on an extensible ecosystem of thousands of controllers, each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously reconciles the current state of the system to a desired state according to a declarative description. However, controllers have bugs that make them never achieve the desired state, due to concurrency, asynchrony, and failures; there are cases where after an inopportune failure, a controller can make no further progress. Formal verification is promising for avoiding bugs in distributed systems, but most work so far focused on safety, whereas reconciliation is fundamentally not a safety property.

This paper develops the first tool to apply formal verification to the problem of controller correctness, with a general specification we call eventually stable reconciliation, written as a concise temporal logic liveness property. We present Anvil, a framework for developing controller implementations in Rust and verifying that the controllers correctly implement eventually stable reconciliation. We use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.

 

[6] https://caecommunity.org/about-us/what-cae-cybersecurity

PhD students are invited to this event:

https://www.caecommunity.org/news/research-poster-and-phd-student-nominations-2024-cae-r-cop-research-symposium

[7] https://2024.robocup.org/

This past week, some UW-Madison students and Prof Hanna participated in the international RoboCup competition in Eindhoven, Netherlands, as part of a new joint team between UW-Madison and UT Austin. They participated in the Standard Platform League, in which all teams use the same robot platform to emphasize AI research. As a new team, they chose to participate in the “Challenge Shield” (lower) division of the competition — and they won!

The UW-Madison students who traveled were PhD students Adam Labiosa and Will Cong and junior Chen Li. Several other UW-Madison undergraduates and another PhD student contributed beforehand, in addition to some graduate students from UT Austin.

The final match is here with the UW team in red and white: https://www.youtube.com/watch?v=JDZ6wihyELQ#t=2h14m14s
(a convincing 7-0 victory)

 

The most exciting game is linked here (robots wore UT colors for this one):
https://www.youtube.com/live/rUyYoW4jcfc#t=6h13m10s
(a come-from-behind victory in the 2nd half, with a little bit of luck involved)

This is actually the second time Prof Hanna has participated in RoboCup with Wisconsin students. Last year, the team finished third in the same division, with the aim of largely basing its decision-making on reinforcement learning. UT’s team was interested in adopting the same approach, which led to the joint effort this year.

 

[8] Title: CAREER: Rethinking Rack-Scale System Design over Memory Fabrics

Prof Ming Liu

Abstract: Memory fabrics, an emerging and powerful cluster interconnect technology, are revolutionizing the construction of the modern data center. Albeit promising, this infrastructure shift fundamentally changes decade-long system design assumptions and challenges conventional wisdom on how to build efficient rack-scale systems. This project aims to develop a new computing paradigm that views the memory fabric as a first-class citizen on which to instantiate, orchestrate, and reclaim computations. The project’s key novelty is a memory fabric-aware intermediate system stack that allows applications to effectively harness the capabilities of memory fabrics and hide their limitations. The project’s broader significance and importance lie in laying out the system foundation for building next-generation sustainable and cost-efficient computing infrastructures for enterprise on-premise and cloud-scale data center systems using memory fabrics. This project introduces the technology to a broader audience via the LegoCluster program and BigComputer workshop education activities.

This project rethinks the rack-scale system design from memory, computing, and communication perspectives. The first thrust classifies remote memory nodes into four categories and develops a benchmarking framework to characterize their performance. It builds an active remote memory system that transparently and dynamically moves a memory object to a suitable memory node at runtime based on the object’s access profile and the underlying node’s capabilities. The second thrust applies idempotent tasks as the fault-tolerant computation abstraction and develops a language system to generate them. It designs hardware cooperative engines and an execution framework on which to schedule idempotent tasks. The third thrust introduces new communication primitives, translates them into the memory fabric’s commands, and builds a transport layer for end-to-end traffic control. Together, these efforts produce a new open-source software system over memory fabrics encompassing APIs, libraries, programming systems, and software runtimes.

Link: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339755&HistoricalAwards=false

 

[9] https://chtc.cs.wisc.edu/htc-24-event.html

CHTC and the OSG Consortium hosted their second annual Throughput Computing Week in Madison, Wisconsin, joined in person and remotely by 388 participants. This year’s themes included dedicated sessions on campus cyberinfrastructure, talks on AI and machine learning enabled by high throughput computing, and tutorials and presentations on the new Pelican Platform project.

[10] 

https://cacm.acm.org/blogcacm/confuseds-strategists-and-snoozers-a-whimsical-odyssey-through-the-maze-of-academic-reviews/

Navigating the labyrinthine corridors of academic publishing, authors often encounter three formidable guardians, each with their unique, sometimes perplexing, approach to gatekeeping. These are the Confused Reviewer, the Rational Reviewer, and the Lazy Reviewer. Their presence not only adds a layer of complexity to the publication process but also infuses it with a certain tragicomic flair.

Read more at the link above.

[11] https://awards.advising.wisc.edu/all-scholarships/hilldale-undergraduatefaculty-research-fellowship/

The Hilldale Undergraduate/Faculty Research Fellowship provides research training and support to undergraduates. Students have the opportunity to undertake their own research project in collaboration with UW–Madison faculty or research/instructional academic staff. Please Note: Graduate students are ineligible to serve as the project advisor. Approximately 97 – 100 Hilldale awards are available each year.

From CS:

  • Professor Remzi Arpaci-Dusseau (students Yuanzhuo Yang, Hongtao Zhang)
  • Professor Jelena Diakonikolas (student Yi Wei)
  • Professor Shivaram Venkataraman (student Mengze Tang)

 

[12] “Simulating and Modeling Hardware for Machine Learning Workloads at Scale”

Prof Matt Sinclair

SRC AIHW PI meeting

Abstract: As both the size of Machine Learning (ML) and Artificial Intelligence (AI) hardware and the data they operate on scale up, there is a growing challenge in performing early-system exploration based on sound simulation methodology. Using existing tools and infrastructure, a detailed simulation of a contemporary deep neural network can take years. In addition, the difficult task of modeling energy at scale, without working RTL for a newly proposed accelerator or accelerator component (i.e., Tensor Cores), remains an important research challenge. In particular, modeling energy in a way that is flexible and can be extended both to different types of programmable accelerators (like GPUs) and to more bespoke discrete accelerators (like TPUs) is needed. The broad goals of this proposal are to (1) reduce detailed simulation time, particularly on ML training and inference workloads, by characterizing execution in silicon, validated with detailed simulation, to find representative program slices, (2) develop a flexible energy model for accelerators, and (3) develop infrastructure to model the multi-GPU and multi-accelerator systems of the future. To tackle these problems, we propose to extend existing simulation infrastructure, such as gem5, to create a scalable, parallel simulation framework that models diverse system components with varying levels of detail. In this talk, I discuss our on-going efforts to identify the most important parts of the applications to simulate at high fidelity while reducing simulation time.

“Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”

Prof Matt Sinclair

University of Connecticut CS Seminar

Abstract: In recent years, system designers have increasingly been turning to heterogeneous systems to improve performance and energy efficiency.  Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly using large numbers of accelerators (including GPUs) while providing better efficiency than CPU-based solutions.  In particular, GPUs are widely used in these systems due to their combination of programmability and efficiency.  Traditionally, GPUs are throughput-oriented, focused on data parallelism, and assume synchronization happens at a coarse granularity.  However, programmers have begun using these systems for a wider variety of applications which exhibit different characteristics, including latency-sensitivity, mixes of both task and data parallelism, and fine-grained synchronization.  Thus, future heterogeneous systems must evolve and make deadline-aware scheduling, more intelligent data movement, efficient fine-grained synchronization, and effective power management first-order design constraints.  In the first part of this talk, I will discuss our efforts to apply hardware-software co-design to help future heterogeneous systems overcome these challenges and improve performance, energy efficiency, and scalability.  Then, in the second part I will discuss how the on-going transition to chiplet-based heterogeneous systems exacerbates these challenges and how we address these challenges in chiplet-based heterogeneous systems by rethinking the control plane.

 

[13] “DPU-Centric Disaggregated Storage”

Prof Ming Liu

Grant from Western Digital

Abstract: Storage disaggregation has gained significant traction and is being widely deployed. Lately, with the advent of high-bandwidth server interconnects and new NVMe form factor, commodity storage appliances are becoming denser and able to deliver tens of millions of IOPS. To fully unleash the I/O capability, the server processor has to spend tremendous CPU cycles on I/O processing, leaving little computing headroom available to other co-located applications. This project aims to build a DPU-centric storage stack that manages, schedules, and processes I/O-induced tasks with minimal CPU involvement.

[14] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419&HistoricalAwards=false

Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.

CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.

CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.

CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

 

[15] https://s2024.siggraph.org/program/emerging-technologies/

BEST IN SHOW

Congratulations to “A Live Demo of Single-photon Imaging and Applications” (Sacha Jungerman, Varun Sundar, Mohit Gupta), recipient of the SIGGRAPH 2024 Emerging Technologies Best in Show Award!

About A Live Demo of Single-photon Imaging and Applications

Single-photon sensors are a rapidly evolving imaging technology featuring the unique ability of sensing light at its finest possible granularity, viz., at the photon level. Our demo will exhibit several exciting capabilities of single-photon sensors, which are made possible by an interplay of novel algorithms and recently developed, large-format sensor arrays.

More info: https://wisionlab.com/project/single-photon-imaging-at-siggraph/

 

[16] Reactions that colleagues on the Internet had regarding the announcement of the retirement of our esteemed colleague Tom Reps. Thank you to Prof Loris D’Antoni for collecting these.

Satnam Singh

I sadly don’t know Prof. Reps well, but I did have the honour of presenting the ACM SIGPLAN 2017 Programming Languages Achievement Award to him in person. The process of selecting him for this highly distinguished award made me read much about his research contributions and industrial impact. One thing I wrote in the citation for the award was “A common thread in all of Tom’s research is that it provides elegant solutions to deep foundational problems”, which is amazing, and not easily achieved. Congratulations, Prof. Reps.

Manu Sridharan

A truly astounding career.  Tom’s work has influenced mine profoundly.  I still learn something new from his papers every time I read them.

Rajeev Alur

Congratulations Tom for all your contributions. I would recommend “program analysis via graph reachability” as an exemplar of great PL research!

Swarat Chaudhuri

One of the most inspiring scholars I have met in my life. I am honored to work with Tom.

Jeff Dean

Congrats to Tom on his retirement!  I have fond memories of reading this gem of a paper when I was in grad school on “Interprocedural slicing using dependence graphs” that Susan Horowitz, he and David Binkley wrote.

Mooly Sagiv

I still remember skiing with Susan and you, Tom.  Susan’s 40th birthday, our collaboration in Copenhagen. Thanksgiving with you guys.

Isil Dillig

Kudos to Tom and best wishes for a fulfilling retirement! And what a wonderful way to commemorate Susan and all her accomplishments

Wei Yang

One of the first research papers I have read in my undergraduate years (15 years ago) is the IFDS framework by Prof. Reps. Later, my ICSE’ 15 and ICSE’ 18 papers are all developed based on it (I suppose most of interprocedural static analysis papers are based on IFDS).

Thomas Gilray

Always loved Rep’s comments on his “POPL drought”: https://cs.cmu.edu/~popl-interviews/reps.html

Those striving and struggling to publish at the top PL venues should really check it out for a dose of perspective.

Shuvendu Lahiri

Very fortunate to have collaborated with Tom, and also to revive in small parts Tom and (late) Susan’s seminal work on program integration/merge conflict resolution through formal methods and ML.

 

[a] https://dl.acm.org/doi/10.1145/3672202.3673715

Poster: Beyond Proximity: Exploring Remote Cloud Peering

Authors: Esteban Carisimo, Mia Weaver, Fabián Bustamante, Paul Barford

We investigate network peering location choices, focusing on whether networks opt for distant peering sites even when nearby options are available. We conduct a network-wide cloud-based traceroute campaign using virtual machine instances from four major cloud providers to identify peering locations and calculate the “peering stretch”: the extra distance networks travel beyond the nearest data center to their actual peering points. Our results reveal a median peering stretch of 300 kilometers, with some networks traveling as much as 6,700 kilometers. We explore the characteristics of networks that prefer distant peering points and the potential motivations behind these choices, providing insights into digital sovereignty and cybersecurity implications.
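As a concrete reading of the metric, this hypothetical sketch computes peering stretch as the great-circle distance from a vantage point to its observed peering location minus the distance to the nearest data center; the poster's traceroute-based methodology is more involved.

from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def peering_stretch_km(vantage, nearby_datacenters, observed_peering_point):
    # Extra distance traveled beyond the nearest data center to the actual peering point.
    nearest = min(haversine_km(vantage, dc) for dc in nearby_datacenters)
    return haversine_km(vantage, observed_peering_point) - nearest

# Toy usage with illustrative coordinates (Madison vantage, Chicago vs. Ashburn).
madison, chicago, ashburn = (43.07, -89.40), (41.88, -87.63), (39.04, -77.49)
print(round(peering_stretch_km(madison, [chicago], ashburn), 1))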

[b] Efficient Implementation of an Abstract Domain of Quantified First-Order Formulas

Eden Frenkel, Tej Chajed, Oded Padon, Sharon Shoham

This paper lays a practical foundation for using abstract interpretation with an abstract domain that consists of sets of quantified first-order logic formulas. This abstract domain seems infeasible at first sight due to the complexity of the formulas involved and the enormous size of sets of formulas (abstract elements). We introduce an efficient representation of abstract elements, which eliminates redundancies based on a novel syntactic subsumption relation that under-approximates semantic entailment. We develop algorithms and data structures to efficiently compute the join of an abstract element with the abstraction of a concrete state, operating on the representation of abstract elements. To demonstrate feasibility of the domain, we use our data structures and algorithms to implement a symbolic abstraction algorithm that computes the least fixpoint of the best abstract transformer of a transition system, which corresponds to the strongest inductive invariant. We succeed at finding, for example, the least fixpoint for Paxos (which in our representation has 1,438 formulas with ∀∗∃∗∀∗ quantification) in time comparable to state-of-the-art property-directed approaches.

[c] See above [5] 

[d] https://www.usenix.org/conference/osdi24/presentation/zha

“Inductive Invariants That Spark Joy: Using Invariant Taxonomies to Streamline Distributed Protocol Proofs”

Authors: Tony Nuda Zhang, University of Michigan; Travis Hance, Carnegie Mellon University; Manos Kapritsos, University of Michigan; Tej Chajed, University of Wisconsin–Madison; Bryan Parno, Carnegie Mellon University

Abstract:  Proving the correctness of a distributed protocol is a challenging endeavor. Central to this task is finding an inductive invariant for the protocol. Currently, automated invariant inference algorithms require developers to describe protocols using a restricted logic. If the developer wants to prove a protocol expressed without these restrictions, they must devise an inductive invariant manually.

We propose an approach that simplifies and partially automates finding the inductive invariant of a distributed protocol, as well as proving that it really is an invariant. The key insight is to identify an invariant taxonomy that divides invariants into Regular Invariants, which have one of a few simple low-level structures, and Protocol Invariants, which capture the higher-level host relationships that make the protocol work.

Building on the insight of this taxonomy, we describe the Kondo methodology for proving the correctness of a distributed protocol modeled as a state machine. The developer first manually devises the Protocol Invariants by proving a synchronous version of the protocol correct. In this simpler version, sends and receives are replaced with atomic variable assignments. The Kondo tool then automatically generates the asynchronous protocol description, Regular Invariants, and proofs that the Regular Invariants are inductive on their own. Finally, Kondo combines these with the synchronous proof into a draft proof of the asynchronous protocol, which may then require a small amount of user effort to complete. Our evaluation shows that Kondo reduces developer effort for a wide variety of distributed protocols.

[e] https://arxiv.org/abs/2309.04664

 

“Compact: Approximating Complex Activation Functions for Secure Computation”

Mazharul Islam, Sunpreet S. Arora, Rahul Chatterjee, Peter Rindal, Maliheh Shirvanian

Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions such as ReLU. However, these techniques are ineffective and/or inefficient for the complex and highly non-linear activation functions used in cutting-edge DNN models.

We present Compact, which produces piece-wise polynomial approximations of complex AFs to enable their efficient use with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and results in near-identical model accuracy. To achieve this, we design Compact with input density awareness and use an application-specific simulated annealing type optimization to generate computationally more efficient approximations of complex AFs. We extensively evaluate Compact on four different machine-learning tasks with DNN architectures that use the popular complex AFs SiLU, GELU, and Mish. Our experimental results show that Compact incurs negligible accuracy loss while being 2x-5x computationally more efficient than state-of-the-art approaches for DNN models with a large number of hidden layers. Our work accelerates the adoption of MPC techniques to provide user data privacy even when the queried DNN models consist of many hidden layers and are trained with complex AFs.
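
To make the core idea concrete, here is a minimal sketch, assuming nothing about Compact's actual algorithm (which adds input-density awareness and a simulated-annealing search): it simply fits low-degree polynomials to GELU on a few hand-picked intervals by least squares, the kind of piecewise-polynomial representation that MPC protocols can evaluate efficiently. The breakpoints and degree are arbitrary illustrative choices.

    # Illustrative only: piecewise polynomial fit of GELU (not the Compact algorithm).
    import numpy as np
    from scipy.special import erf

    def gelu(x):
        return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

    breakpoints = [-6.0, -2.0, 0.0, 2.0, 6.0]   # assumed interval edges
    degree = 3                                   # assumed per-piece degree

    pieces = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        xs = np.linspace(lo, hi, 200)
        coeffs = np.polyfit(xs, gelu(xs), degree)   # least-squares fit on this piece
        pieces.append((lo, hi, coeffs))

    def gelu_approx(x):
        for lo, hi, coeffs in pieces:
            if lo <= x <= hi:
                return np.polyval(coeffs, x)
        return 0.0 if x < breakpoints[0] else x     # tails: GELU(x) is close to 0 or to x

    grid = np.linspace(-5.0, 5.0, 101)
    print("max abs error:", max(abs(gelu(x) - gelu_approx(x)) for x in grid))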

 

[f] https://petsymposium.org/popets/2024/popets-2024-0135.php

Scalable Metadata-Hiding for Privacy-Preserving IoT Systems

Authors: Yunang Chen (University of Wisconsin-Madison), David Heath (University of Illinois Urbana-Champaign), Rahul Chatterjee (University of Wisconsin-Madison), Earlence Fernandes (University of California San Diego)

Abstract: Modern cloud-based IoT services comprise an integrator service and several device vendor services. The vendor services enable users to remotely control their devices, while the integrator serves as a central intermediary, offering a unified interface for managing devices from different vendors. Although such a model is quite beneficial for IoT services to evolve quickly, it also creates a serious privacy concern: the vendor and integrator services observe all interactions between users and devices. Toward this, we propose Mohito, a privacy-preserving IoT system that hides such interactions from both the integrator and the vendors. In Mohito, we protect both the interaction data and the metadata, so that no one learns which user is communicating with which device. By utilizing oblivious key-value storage as a primitive and leveraging the unique communication graph of IoT services, we build a scalable protocol specialized in handling large concurrent traffic, a common demand in IoT systems. Our evaluation shows that Mohito can achieve up to 600x more throughput than the state-of-the-art general-purpose systems that provide similar security guarantees.

 

[g] https://dl.acm.org/doi/10.1145/3650110

Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces

Authors: Asmita Pal, Keerthana Desai, Rahul Chatterjee, Joshua San Miguel

ACM Transactions on Architecture and Code Optimization

Volume 21, Issue 2

Trace-based simulation is a widely used methodology for system design exploration. It relies on realistic traces that represent a range of behaviors necessary to be evaluated, containing a lot of information about the application, its inputs and the underlying system on which it was generated. Consequently, generating traces from real-world executions risks leakage of sensitive information. To prevent this, traces can be obfuscated before release. However, this can undermine their ideal utility, i.e., how realistically a program behavior was captured. To address this, we propose Camouflage, a novel obfuscation framework, designed with awareness of the necessary architectural properties required to preserve trace utility, while ensuring secrecy of the inputs used to generate the trace. Focusing on memory access traces, our extensive evaluation on various benchmarks shows that camouflaged traces preserve the performance measurements of the original execution, with an average τ correlation of 0.66. We model input secrecy as an input indistinguishability problem and show that the average security loss is 7.8%, which is better than traces generated from the state-of-the-art.

 

[h] https://www.usenix.org/conference/usenixsecurity24/presentation/gupta

“I really just leaned on my community for support”: Barriers, Challenges, and Coping Mechanisms Used by Survivors of Technology-Facilitated Abuse to Seek Social Support

Naman Gupta, University of Wisconsin–Madison, Madison Tech Clinic; Kate Walsh, University of Wisconsin–Madison; Sanchari Das, University of Denver; Rahul Chatterjee, University of Wisconsin–Madison, Madison Tech Clinic

Abstract: Technology-facilitated abuse (TFA) from an intimate partner is a growing concern for survivors’ safety and security. Prior research introduced tailored interventions to support the survivors of TFA. However, most survivors do not have access to these interventions or are unaware of the appropriate support networks and resources in their community. We conducted nine semi-structured interviews to examine survivors’ support-seeking dynamics from their social networks and the effectiveness of social networks in addressing survivors’ needs. Through survivors’ lived experiences, we systematize the socio-technical barriers that impede participants from seeking support and the challenges they face when doing so. Further, we identify coping mechanisms used by the survivors despite those barriers and challenges. Through a participatory lens, we echo survivors’ call for action to improve support networks and propose recommendations for technology design to promote safer support-seeking practices and resources, consciousness-raising awareness campaigns, and collaborations with the community. Finally, we call for a restorative-justice-oriented framework that recognizes TFA.

 

[i] https://arxiv.org/abs/2406.01792

The SemGuS Toolkit

Keith J.C. Johnson, Andrew Reynolds, Thomas Reps, Loris D’Antoni

Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers.

 

[j] https://learningtheory.org/colt2024/schedule/poster_206.html

Testable Learning of General Halfspaces with Adversarial Label Noise

Diakonikolas, Ilias; Kane, Daniel M; Liu, Sihan; Zarifis, Nikos

Abstract: We study the task of testable learning of general — not necessarily homogeneous — halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our main result is the first polynomial time tester-learner for general halfspaces that achieves dimension-independent misclassification error. At the heart of our approach is a new methodology to reduce testable learning of general halfspaces to testable learning of nearly homogeneous halfspaces that may be of broader interest.

 

[k] https://arxiv.org/abs/2403.02300

Statistical Query Lower Bounds for Learning Truncated Gaussians

Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family 𝒞 of sets. Specifically, for a fixed but unknown truncation set S ⊆ ℝ^d, we are given access to samples from the Gaussian distribution 𝒩(μ, I) truncated to the set S. The goal is to estimate μ within accuracy ϵ>0 in ℓ2-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is d^{poly(1/ϵ)}, even when the class 𝒞 is simple so that poly(d/ϵ) samples information-theoretically suffice. Concretely, our SQ lower bound applies when 𝒞 is a union of a bounded number of rectangles whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.

 

[l] https://arxiv.org/abs/2307.12840

Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials

Ilias Diakonikolas, Daniel M. Kane

We study the problem of PAC learning a linear combination of k ReLU activations under the standard Gaussian distribution on ℝ^d with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity (dk/ϵ)^{O(k)}, where ϵ>0 is the target accuracy. Prior work had given an algorithm for this problem with complexity (dk/ϵ)^{h(k)}, where the function h(k) scales super-polynomially in k. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the O(k)-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.

[m] https://arxiv.org/abs/2402.03502

How Does Unlabeled Data Provably Help Out-of-Distribution Detection?

Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li

Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at this https URL.
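
The two-stage structure can be pictured with a short sketch. This is not the SAL algorithm (SAL's separation score and its guarantees differ); it only illustrates "separate candidate outliers, then learn an OOD classifier" on synthetic data, with an arbitrary kNN-distance proxy score and an arbitrary 20% threshold.

    # Illustrative two-stage "separate, then learn" sketch; NOT the SAL algorithm.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X_id = rng.normal(0.0, 1.0, size=(500, 2))                 # labeled in-distribution (ID) data
    X_wild = np.vstack([rng.normal(0.0, 1.0, size=(400, 2)),   # unlabeled wild data: mostly ID ...
                        rng.normal(4.0, 1.0, size=(100, 2))])  # ... plus some true outliers

    # Stage 1: separate candidate outliers (proxy score: mean kNN distance to the ID data).
    knn = NearestNeighbors(n_neighbors=5).fit(X_id)
    score = knn.kneighbors(X_wild)[0].mean(axis=1)
    candidates = X_wild[score > np.quantile(score, 0.80)]       # farthest-from-ID 20%

    # Stage 2: train a binary OOD classifier on labeled ID data vs. candidate outliers.
    X_train = np.vstack([X_id, candidates])
    y_train = np.concatenate([np.zeros(len(X_id)), np.ones(len(candidates))])
    ood_clf = LogisticRegression().fit(X_train, y_train)
    print(ood_clf.predict([[5.0, 5.0], [0.0, 0.0]]))            # expect roughly [1, 0]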

[n] https://openreview.net/forum?id=0i6Z9N5MLY

Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas

Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are “on average” either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}( n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
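
For context, the classical Halpern iteration that these methods build on (a standard scheme, stated here in its deterministic form rather than the paper's variance-reduced variants) anchors every step to the initial point: for an operator $T$, $x_{k+1} = \beta_{k+1} x_0 + (1-\beta_{k+1})\, T x_k$, with an anchoring schedule such as $\beta_{k+1} = 1/(k+2)$. The variance-reduced variants replace the exact evaluation $T x_k$ with cheaper stochastic estimates assembled from the $n$ components of the finite sum.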

[o] https://eprint.iacr.org/2024/798

“Incompressible Functional Encryption”

Rishab Goyal, Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma

Incompressible encryption (Dziembowski, Crypto’06; Guan, Wichs, Zhandry, Eurocrypt’22) protects from attackers that learn the entire decryption key but cannot store the full ciphertext. In incompressible encryption, the attacker must try to compress a ciphertext within a pre-specified memory bound before receiving the secret key.

In this work, we generalize the notion of incompressibility to functional encryption. …

 

[p] https://eprint.iacr.org/2024/931

“Leveled Fully-Homomorphic Signatures from Batch Arguments”

Abtin Afshar, Jiaqi Cheng, Rishab Goyal

Fully homomorphic signatures are a significant strengthening of digital signatures, enabling computations on \emph{secretly} signed data. Today, we have multiple approaches to design fully homomorphic signatures such as from lattices, or succinct functional commitments, or indistinguishability obfuscation, or mutable batch arguments. Unfortunately, all existing constructions for homomorphic signatures suffer from one or more limitations. We do not have homomorphic signatures with features such as multi-hop evaluation, context hiding, and fast amortized verification, while relying on standard falsifiable assumptions.

In this work, we design homomorphic signatures satisfying all above properties. We construct homomorphic signatures for polynomial-sized circuits from a variety of standard assumptions such as sub-exponential DDH, standard pairing-based assumptions, or learning with errors. We also discuss how our constructions can be easily extended to the multi-key setting.

 

[q] https://dl.acm.org/doi/10.1145/3641517.3664397

“A Live Demo of Single-Photon Imaging and Applications”

Authors: Sacha Jungerman, Varun Sundar, Mohit Gupta

SIGGRAPH ’24: ACM SIGGRAPH 2024 Emerging Technologies

Single-photon sensors are a novel class of imaging sensors that are sensitive to the individual arrival of photons. In recent years, single-photon sensors have witnessed rapid growth in key sensor characteristics such as array resolution, which, for the first time, has made computer vision and imaging with these emerging sensors possible. This rapid growth has also spurred the development of algorithms that can harness the rich spatio-temporal scene information captured by single-photon sensors, which has led to several new imaging capabilities. These capabilities include imaging in challenging SNR conditions, high-dynamic range imaging, compensating extreme camera motion extents, and even providing the functionality of numerous imaging systems via post-capture processing. Our demo will showcase these exciting capabilities to a wide computer vision and graphics audience, and in doing so, make a case for the mainstream adoption of single-photon technology.

[r] https://openreview.net/forum?id=sVEu295o70

“Understanding when Dynamics-Invariant Data Augmentations Benefit Model-free Reinforcement Learning Updates”

Nicholas Corrado, Josiah P. Hanna

ICLR 2024

Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
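
As a minimal illustration of what "dynamics-invariant" means here (the paper's environments and augmentation functions differ), consider a task whose dynamics and reward are symmetric under reflection about the origin; every observed transition can then be mirrored into a second transition that is guaranteed to be consistent with the true dynamics:

    # Illustrative sketch of a dynamics-invariant augmentation; the tasks and
    # augmentation functions studied in the paper are different.
    from dataclasses import dataclass

    @dataclass
    class Transition:
        state: float
        action: float
        reward: float
        next_state: float
        done: bool

    def reflect(t: Transition) -> Transition:
        """Valid only when the MDP's dynamics and reward are symmetric under x -> -x."""
        return Transition(-t.state, -t.action, t.reward, -t.next_state, t.done)

    def augment(batch):
        # Mirroring doubles state-action coverage without changing reward density;
        # how many augmented transitions are replayed per update (the augmented
        # replay ratio) remains a separate knob.
        return batch + [reflect(t) for t in batch]

    print(augment([Transition(0.3, -1.0, 0.0, 0.2, False)]))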

[s] https://dl.acm.org/doi/10.1145/3579986

“Introduction to the Special Issue on Automotive CPS Safety & Security: Part 1”

Authors: Samarjit Chakraborty, Somesh Jha, Soheil Samii, Philipp Mundhenk

[t] https://arxiv.org/abs/2304.03870

“ASPEST: Bridging the Gap Between Active Learning and Selective Prediction”

Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister

Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.

 

[u] https://openreview.net/forum?id=dwzLn78jq7

On the Scalability and Memory Efficiency of Semidefinite Programs for Lipschitz Constant Estimation of Neural Networks


Zi Wang, Bin Hu, Aaron J Havens, Alexandre Araujo, Yang Zheng, Yudong Chen, Somesh Jha

ICLR 2024

TL;DR: We scale the SDP for Lipschitz constant estimation to practical CIFAR10/ImageNet networks.

Lipschitz constant estimation plays an important role in understanding generalization, robustness, and fairness in deep learning. Unlike naive bounds based on the network weight norm product, semidefinite programs (SDPs) have shown great promise in providing less conservative Lipschitz bounds with polynomial-time complexity guarantees. However, due to the memory consumption and running speed, standard SDP algorithms cannot scale to modern neural network architectures. In this paper, we transform the SDPs for Lipschitz constant estimation into an eigenvalue optimization problem, which aligns with the modern large-scale optimization paradigms based on first-order methods. This is amenable to autodiff frameworks such as PyTorch and TensorFlow, requiring significantly less memory than standard SDP algorithms. The transformation also allows us to leverage various existing numerical techniques for eigenvalue optimization, opening the way for further memory improvement and computational speedup. The essential technique of our eigenvalue-problem transformation is to introduce redundant quadratic constraints and then utilize both Lagrangian and Shor’s SDP relaxations under a certain trace constraint. Notably, our numerical study successfully scales the SDP-based Lipschitz constant estimation to address large neural networks on ImageNet. Our numerical examples on CIFAR10 and ImageNet demonstrate that our technique is more scalable than existing approaches. Our code is available at https://github.com/z1w/LipDiff.
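
For reference, the naive bound alluded to above is the standard layer-wise product: for $f(x) = W_L\,\phi(W_{L-1}\cdots\phi(W_1 x))$ with 1-Lipschitz activations $\phi$, one has $\mathrm{Lip}(f) \le \prod_{i=1}^{L} \lVert W_i \rVert_2$. This product bound is cheap but typically very loose, which is what motivates the less conservative (and, with this paper's reformulation, scalable) SDP-based estimates.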

[v] https://arxiv.org/abs/2310.05385

Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization

Hangdong Zhao, Austen Z. Fan, Xiating Ouyang, Paraschos Koutris

In this paper, we study the complexity of evaluating Conjunctive Queries with negation (CQ¬). First, we present an algorithm with linear preprocessing time and constant delay enumeration for a class of CQs with negation called free-connex signed-acyclic queries. We show that no other queries admit such an algorithm subject to lower bound conjectures. Second, we extend our algorithm to Conjunctive Queries with negation and aggregation over a general semiring, which we call Functional Aggregate Queries with negation (FAQ¬). Such an algorithm achieves constant delay enumeration for the same class of queries, but with a slightly increased preprocessing time which includes an inverse Ackermann function. We show that this surprising appearance of the Ackermann function is probably unavoidable for general semirings, but can be removed when the semiring has specific structure. Finally, we show an application of our results to computing the difference of CQs.

[w] https://arxiv.org/abs/2310.19642

Consistent Query Answering for Primary Keys on Rooted Tree Queries

Paraschos Koutris, Xiating Ouyang, Jef Wijsen

We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal subset of the database satisfying the primary key constraints. For a Boolean query q, the problem CERTAINTY(q) takes a database as input, and asks whether or not each repair satisfies q. The computational complexity of CERTAINTY(q) has been established whenever q is a self-join-free Boolean conjunctive query, or a (not necessarily self-join-free) Boolean path query. In this paper, we take one more step towards a general classification for all Boolean conjunctive queries by considering the class of rooted tree queries. In particular, we show that for every rooted tree query q, CERTAINTY(q) is in FO, NL-hard ∩ LFP, or coNP-complete, and it is decidable (in polynomial time), given q, which of the three cases applies. We also extend our classification to larger classes of queries with simple primary keys. Our classification criteria rely on query homomorphisms and our polynomial-time fixpoint algorithm is based on a novel use of context-free grammar (CFG).

[x] https://dl.acm.org/doi/10.1145/3651588

Tight Bounds of Circuits for Sum-Product Queries

Authors: Austen Z. Fan, Paraschos Koutris, Hangdong Zhao
Proceedings of the ACM on Management of Data, Volume 2, Issue 2

In this paper, we ask the following question: given a Boolean Conjunctive Query (CQ), what is the smallest circuit that computes the provenance polynomial of the query over a given semiring? We answer this question by giving upper and lower bounds. Notably, it is shown that any circuit F that computes a CQ over the tropical semiring must satisfy log |F| ≥ (1-ε) · da-entw for any ε > 0, where da-entw is the degree-aware entropic width of the query. We show a circuit construction that matches this bound when the semiring is idempotent. The techniques we use combine several central notions in database theory: provenance polynomials, tree decompositions, and disjunctive Datalog programs. We extend our results to lower and upper bounds for formulas (i.e., circuits where each gate has outdegree one), and to bounds for non-Boolean CQs.

[y] https://arxiv.org/abs/2403.12436

Evaluating Datalog over Semirings: A Grounding-based Approach

Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen

Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring σ, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over σ: first ground the program into an equivalent system of polynomial equations (i.e. grounding) and then find the least fixpoint of the grounding over σ. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.

[z] https://dl.acm.org/doi/abs/10.1145/3651598

Topology-aware Parallel Joins

Authors: Xiao Hu, Paraschos Koutris

Proceedings of the ACM on Management of Data, Volume 2, Issue 2

We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.

[aa] https://arxiv.org/abs/2311.07377

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee


The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.

[ab] https://arxiv.org/abs/2402.07785

HYPO: Hyperspherical Out-of-Distribution Generalization

Yifei Ming, Haoyue Bai, Julian Katz-Samuels, Yixuan Li

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles — ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at this https URL.

[ac] https://arxiv.org/abs/2402.17888

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang

Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimate scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient p against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25% and 28.19% (FPR95) on CIFAR-100 and ImageNet-1K, respectively.

[ad] https://arxiv.org/abs/2402.15017

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples. In this paper, we study the theoretical justification of this multitask finetuning approach. Our theoretical analysis reveals that with a diverse set of related tasks, this multitask finetuning leads to reduced error in the target task, in comparison to directly adapting the same pretrained model. We quantify the relationship between finetuning tasks and target tasks by diversity and consistency metrics, and further propose a practical task selection algorithm. We substantiate our theoretical claims with extensive empirical evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, improving model performance on target tasks. We believe our study sheds new light on the effective adaptation of foundation models to new tasks that lack abundant labels. Our code is available at this https URL.

[ae] https://dl.acm.org/doi/full/10.1145/3653716

eZNS: An Elastic Zoned Namespace for Commodity ZNS SSDs

Jaehong Min, Chenxingyu Zhao, Ming Liu, and Arvind Krishnamurthy

Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them inadaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and prone to interference across co-located zones. Applications either under-provision zone resources and see unsatisfactory throughput, create over-provisioned zones and incur extra cost, or experience unexpected I/O latencies. We propose eZNS, an elastic-zoned namespace interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, and a hierarchical I/O scheduler with read congestion control and write admission control on the data plane. Together, eZNS enables the transparent use of a ZNS SSD and closes the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by up to 17.7% in throughput and up to 80.3% in tail latency.

[af] https://www.usenix.org/conference/nsdi24/presentation/hou

Understanding Routable PCIe Performance for Composable Infrastructures

Wentao Hou, Jie Zhang, Zeke Wang, Ming Liu

Routable PCIe has become the predominant cluster interconnect to build emerging composable infrastructures. Empowered by PCIe non-transparent bridge devices, PCIe transactions can traverse multiple switching domains, enabling a server to elastically integrate a number of remote PCIe devices as local ones. However, it is unclear how to move data or perform communication efficiently over the routable PCIe fabric without understanding its capabilities and limitations.

This paper presents the design and implementation of rPCIeBench, a software-hardware co-designed benchmarking framework to systematically characterize the routable PCIe fabric. rPCIeBench provides flexible data communication primitives, exposes end-to-end PCIe transaction observability, and enables reconfigurable experiment deployment. Using rPCIeBench, we first analyze the communication characteristics of a routable PCIe path, quantify its performance tax, and compare it with the local PCIe link. We then use it to dissect in-fabric traffic orchestration behaviors and draw three interesting findings: approximate max-min bandwidth partition, fast end-to-end bandwidth synchronization, and interference-free among orthogonal data paths. Finally, we encode gathered characterization insights as traffic orchestration rules and develop an edge constraints relaxing algorithm to estimate PCIe flow transmission performance over a shared fabric. We validate its accuracy and demonstrate its potential to provide an optimization guide to design efficient flow schedulers.

 

[ag] https://arxiv.org/abs/2403.09543

Explorations in Texture Learning

Blaine Hoak, Patrick McDaniel

In this work, we investigate \textit{texture learning}: the identification of textures learned by object classification models, and the extent to which they rely on these textures. We build texture-object associations that uncover new insights about the relationships between texture and object classes in CNNs and find three classes of results: associations that are strong and expected, strong and not expected, and expected but not present. Our analysis demonstrates that investigations in texture learning enable new methods for interpretability and have the potential to uncover unexpected biases.

[ah] https://arxiv.org/abs/2403.19577

A Public and Reproducible Assessment of the Topics API on Real Data

Yohan Beugin, Patrick McDaniel

The Topics API for the web is Google’s privacy-enhancing alternative to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the capability of Topics to trade off both utility and privacy. The central point of contention is largely around the realism of the datasets used in these analyses and their reproducibility; researchers using data collected on a small sample of users or generating synthetic datasets, while Google’s results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest and publicly available dataset of real browsing histories. First, we measure how unique and stable real users’ interests are over time. Then, we evaluate if Topics can be used to fingerprint the users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that 2%, 3%, and 4% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that the information leakage worsens over time, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.

[ai] https://dl.acm.org/doi/10.1145/3656156.3658395

RoboCare Design Workshop: Understanding, Translating, Operationalizing, and Scaling Up Design Knowledge Regarding Robotic Systems for Care Assistance

Laura Stegner, Richard Paluch, Long-Jing Hsu, Sawyer Collins, Yaxin Hu, Marius Greuèl, Naonori Kodate, Claudia Müller, Bilge Mutlu, Selma Šabanović

Robots and other autonomous agents are well-positioned in the research discourse to support the care of people with challenges such as physical and/or cognitive disabilities. However, designing these robots can be complex as it involves considering a wide range of factors (e.g., individual needs, physical environment, technology capabilities, digital literacy), stakeholders (e.g., care recipients, formal and informal caregivers, technology developers), and contexts (e.g., hospitals, nursing homes, outpatient care facilities, private homes). The challenges are in gaining design insights for this unique use case and translating this knowledge into actionable, generalizable guidelines for other designers. This one-day workshop seeks to bring together researchers with diverse expertise and experience across academia, healthcare, and industry, spanning perspectives from multiple disciplines, including design, robotics, and human-computer interaction, with the primary goal being a consensus on best practices for generating and operationalizing design knowledge for robotic systems for care settings.

[aj] https://arxiv.org/abs/2307.09064

Newtonian Program Analysis of Probabilistic Programs

Di Wang, Thomas Reps

Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton’s method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic).

Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode program-execution paths in control-flow hyper-graphs; and presents a linearization method that makes Newton’s method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) provides great generality for designing program analyses.

[ak] https://arxiv.org/abs/2211.06818

CFLOBDDs: Context-Free-Language Ordered Binary Decision Diagrams

Meghana Sistla, Swarat Chaudhuri, Thomas Reps

This paper presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence useful for representing certain classes of functions, matrices, graphs, relations, etc. in a highly compressed fashion. CFLOBDDs share many of the good properties of BDDs, but–in the best case–the CFLOBDD for a Boolean function can be exponentially smaller than any BDD for that function. Compared with the size of the decision tree for a function, a CFLOBDD–again, in the best case–can give a double-exponential reduction in size. They have the potential to permit applications to (i) execute much faster, and (ii) handle much larger problem instances than has been possible heretofore.

CFLOBDDs are a new kind of decision diagram that goes beyond BDDs (and their many relatives). The key insight is a new way to reuse sub-decision-diagrams: components of CFLOBDDs are structured hierarchically, so that sub-decision-diagrams can be treated as standalone “procedures” and reused.

We applied CFLOBDDs to the problem of simulating quantum circuits, and found that for several standard problems the improvement in scalability–compared to simulation using BDDs–is quite dramatic. In particular, the number of qubits that could be handled using CFLOBDDs was larger, compared to BDDs, by a factor of 128x for GHZ; 1,024x for BV; 8,192x for DJ; and 128x for Grover’s algorithm. (With a 15-minute timeout, the number of qubits that CFLOBDDs can handle are 65,536 for GHZ, 524,288 for BV; 4,194,304 for DJ; and 4,096 for Grover’s Algorithm.)

[al] https://arxiv.org/abs/2309.04344

Zero-Shot Robustification of Zero-Shot Models

Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala

Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings — without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% on worst group accuracy, with trivial decrease in overall accuracy over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models and propose a way to further boost performance with a zero-shot adaptation variant.
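
The "remove harmful components" step can be pictured with a simple projection. The sketch below is only that picture, assuming a single hand-supplied spurious direction rather than the LM-derived insight vectors (and the boosting of helpful components) that RoboShot actually uses.

    # Illustrative sketch: remove an unwanted concept direction from an embedding.
    # Not RoboShot itself; the direction here is hand-picked, not derived from an LM.
    import numpy as np

    def remove_direction(z: np.ndarray, v: np.ndarray) -> np.ndarray:
        """Project z onto the orthogonal complement of v, then re-normalize."""
        v = v / np.linalg.norm(v)
        z = z - (z @ v) * v          # subtract the component of z along v
        return z / np.linalg.norm(z) # re-normalize, as is typical for CLIP-style embeddings

    z = np.array([0.8, 0.5, 0.3])            # stand-in image/text embedding
    v_spurious = np.array([0.0, 1.0, 0.0])   # hypothetical harmful/spurious direction
    print(remove_direction(z, v_spurious))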

[am] https://www.usenix.org/conference/atc24/presentation/tabatabai

FBMM: Making Memory Management Extensible With Filesystems

Bijan Tabatabai, James Sorenson, and Michael M. Swift, University of Wisconsin–Madison

New memory technologies like CXL promise diverse memory configurations such as tiered memory, far memory, and processing in memory. Operating systems must be modified to support these new hardware configurations for applications to make use of them. While many parts of operating systems are extensible, memory management remains monolithic in most systems, making it cumbersome to add support for a diverse set of new memory policies and mechanisms.

Rather than creating a whole new extensible interface for memory managers, we propose to instead use the memory management callbacks provided by the Linux virtual file system (VFS) to write memory managers, called memory management filesystems (MFSs). Memory is allocated by creating and mapping a file in an MFS’s mount directory and freed by deleting the file. Use of an MFS is transparent to applications. We call this system File Based Memory Management (FBMM).

Using FBMM, we created a diverse set of standalone memory managers for tiered memory, contiguous allocations, and memory bandwidth allocation, each comprising 500-1500 lines of code. Unlike current approaches that require custom kernels, with FBMM, an MFS can be compiled separately from the kernel and loaded dynamically when needed. We measure the overhead of using filesystems for memory management and found the overhead to be less than 8% when allocating a single page, and less than 0.1% when allocating as little as 128 pages. MFSs perform competitively with kernel implementations, and sometimes better due to simpler implementations.
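
The allocate-by-mapping-a-file model described above can be pictured with ordinary file operations. In FBMM itself these steps happen inside the kernel, transparently to applications, so the sketch below (with a hypothetical mount point) is only a user-space illustration of the lifecycle, not how applications actually interact with an MFS.

    # User-space picture of the MFS lifecycle described above (illustrative only;
    # FBMM performs these steps in the kernel, transparently to applications).
    import mmap
    import os

    MFS_MOUNT = "/mnt/tiered_mfs"    # hypothetical MFS mount directory
    PAGE = 4096

    path = os.path.join(MFS_MOUNT, "region0")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd, 128 * PAGE)     # request 128 pages from the MFS
    buf = mmap.mmap(fd, 128 * PAGE)  # mapping the file backs it with MFS-managed memory

    buf[:5] = b"hello"               # use it like ordinary memory
    print(bytes(buf[:5]))

    buf.close()
    os.close(fd)
    os.unlink(path)                  # deleting the file frees the memory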

[an] https://arxiv.org/abs/2309.01753

On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter σ>0. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be O(σ)-close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as O(σ)-approximation of the original BO, we propose first-order algorithms that find an ϵ-stationary solution by optimizing the penalty formulation with σ=O(ϵ). When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an ϵ-stationary point of the penalty function, using in total O(ϵ^{-3}) and O(ϵ^{-7}) accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and oracles are noisy, respectively. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully single-loop manner, i.e., with O(1) samples per iteration, and achieves the improved oracle-complexity of O(ϵ^{-3}) and O(ϵ^{-5}), respectively.
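
To make the penalty viewpoint concrete, one standard reformulation of this kind (notation ours; the exact weighting and constraints used in the paper may differ) replaces the bilevel problem with upper-level objective $f$ and lower-level objective $g$ by the single-level problem $\min_{x \in X,\, y \in Z}\; f(x, y) + \tfrac{1}{\sigma}\bigl(g(x, y) - \min_{z \in Z} g(x, z)\bigr)$; the abstract's results say that, under suitable conditions, the values and derivatives of such a penalty function are $O(\sigma)$-close to those of the hyper-objective, so optimizing it with $\sigma = O(\epsilon)$ yields $\epsilon$-stationary solutions.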

September 2024

[1] On behalf of the USENIX Security Test of Time Award Committee, I’m delighted to inform you that your paper, “Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing,” originally presented at USENIX Security ’14, has won the 2024 USENIX Security Test of Time Award. The award recognizes outstanding work in computing systems security and privacy research that has had a lasting impact on the community. Congratulations!

https://www.usenix.org/conferences/test-of-time-awards

[2] “Uncertainty and my scientific journey”

Aws Albarghouthi

Programming Languages Mentoring Workshop @ PLDI

https://www.youtube.com/watch?v=kYZiPZ3EwRc

Abstract: I will walk you through my academic journey from graduate school to a tenured faculty position. Reflecting upon the past 15 years, I will highlight the recurring theme of uncertainty, and how it arose in unexpected ways along the way. I will talk about how uncertainty can be a significant source of mental stress and how I’ve tried to understand it and navigate it in different ways.

[3] “UW–Madison Awarded National Center of Academic Excellence in Cyber Research (CAE-R) Designation”

The University of Wisconsin–Madison has been designated as a National Center of Academic Excellence in Cyber Research (CAE-R) by the National Security Agency (NSA). This prestigious designation underscores UW–Madison’s commitment to advancing cybersecurity education and research and recognizes its position as a leader in the field.

“This designation recognizes that our faculty, staff, and students are on the forefront of Cyber Research, and it is amongst the most important research that we do,” says Eric Wilcots, Dean of UW–Madison’s College of Letters & Science. “As a Center of Academic Excellence, we will have the opportunity to address hard questions and deepen our impact.”

https://ls.wisc.edu/news/uw-madison-awarded-prestigious-national-center-of-academic-excellence-for-cyber-research-designation

[4] The CS department plans to make the following awards in the inaugural year:

[5] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2404180

Prof Yong Jae Lee

RI: Small: Understanding and Advancing the Generalization Capabilities of Fake Image Detectors

Abstract: This project develops an integrated research, education, and outreach program to provide a foundation for understanding and advancing the generalization capabilities of fake (AI-generated) image detectors. With the rise and maturity of artificial intelligence (AI) generative models, fake images are proliferating in the digital world at an unprecedented rate. Fake images can cause harm in a variety of ways by deceiving, manipulating, and misinforming individuals and society at large. For example, they can be used for blackmail, disinformation, or financial fraud, etc. To make matters worse, there is no longer a single source of fake images. Synthesized images could take the form of realistic human faces generated using one type of fake image generator or they could take the form of complex scenes generated using another type of fake image generator. One can be almost certain that there will be more methods developed for generating fake images coming in the future. To combat the potential harm caused by fake images, this project aims to advance technology in image forensics, and build a deep understanding as to how AI-generated images differ from real images. In addition to scientific impact, this project performs complementary educational and outreach activities that engage students in research and STEM.

This research provides a foundation for gaining a deeper understanding of, and for advancing fundamental research in, detecting AI-generated images. In particular, it will focus on the generalization properties of fake image detectors. Specifically, it will investigate two main thrusts: (Thrust I) Understanding what makes fake AI-generated images fake, including the role of generative models vs. data, a novel benchmark and toolbox for universal fake image detection, and understanding the features that a fake detector focuses on. In Thrust II, the project will advance universal fake AI-generated image detection, including a novel frequency masking strategy, a few-shot adaptation approach for learning with only a few training examples, and extensions to video to account for spatio-temporal aspects of realism. Both thrusts will be studied in the context of creating generalizable (universal) fake image detection algorithms. The investigators’ wealth of experience in generative models and model robustness, as well as their initial work in this space, makes them well-positioned to formulate and solve the relevant challenges and lays the groundwork for the project.

[6] Invited Talk at ModSim (Workshop on Modeling & Simulation of Systems and Applications)

Title: Designing Better Tools for Power- and Sustainability-Aware Co-Design

Prof Matt Sinclair

Abstract: With the waning of Moore’s Law and the end of Dennard’s Scaling, systems ranging from smartphones to datacenters, hyperscalers, and supercomputers are increasingly turning towards heterogeneity to continue scaling performance and energy efficiency. Thus, modeling increasingly large, complex, diverse accelerators is critical to continued computer architecture and systems innovation. The rise in specialized, compute-heavy workloads like machine learning has given enhanced prominence to hardware accelerators. Thus, the next decades of hardware and system software research must balance efficiency through specialization and flexibility through generalization. To drive those innovations, the community requires credible, open-source infrastructure to study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on architecture simulators to model and prototype these systems. Unfortunately, modern computer architecture simulation and modeling techniques for power and carbon are either too inaccurate, too inflexible (e.g., difficult to extend), or too high level to enable co-design for these increasingly complex, increasingly heterogeneous systems. For sustainable computing to thrive, we believe that simulation and modeling tools can — and must — play a major role, and I describe our efforts in this direction.

Invited tutorial on gem5 at ORNL

Title: Learning How to Use gem5 to Study Large-Scale Workloads

Prof Matt Sinclair

Abstract: The community-developed gem5 infrastructure is a cycle-level computer architecture simulation tool. It models a wide set of diverse components, including processors of various ISAs, caches, main memory, and networks, with a high level of microarchitectural fidelity. During DOE’s Exascale FastForward2 program, AMD released support for modeling GPUs executing HPC applications built for the GCN3 ISA. This work laid the foundation for development and prototyping of full-stack GPU optimizations for existing exascale-class HPC systems such as ORNL’s Frontier supercomputer. This tutorial workshop will introduce the gem5 simulator, how to use it, and how to use a variety of its important features for prototyping full-stack optimizations for large-scale workloads, including the latest AMD MI200 and MI300 ISAs.

[7] Prof Bilge Mutlu is part of a new NSF Engineering Research Center around advancing the state of the art in “dexterous robotics,” led by Northwestern University. The center will receive approximately $52M in funding over 10 years, and UW’s work will focus on programming interfaces for dexterous robots. 

News article: https://www.cs.wisc.edu/2024/08/21/human-augmentation-via-dexterity-erc-receives-nsf-funding

[9] https://news.wisc.edu/gilman-walker-and-arpaci-dusseau-to-assume-new-leadership-roles/

From the article: Isbell has appointed Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, to a new role as special advisor to the provost for computing. “This role will advance our institutional planning to ensure that we remain on the forefront in the rapidly advancing research and education areas associated with computing writ large, including artificial intelligence, data and information sciences in the coming decades,” says Isbell.

[10] New faculty in the CS department this fall:

October 2024

[1] The PODS ‘24 program: https://2024.sigmod.org/program_pods.shtml

Award papers include:

[2] The Belonging & Inclusion Best Paper Award at UIST 2024
https://uist.acm.org/2024/ 

Title: CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

Authors: Jaewook Lee, Andrew D Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, and Yuhang Zhao

Abstract: Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants for suitable augmentation design. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations.

[3] September was a busy month of talks/presentations in India and the Netherlands for Prof Miron Livny, including: 

  1. “OSG and Future of Distributed Computing in the Age of Cloud” at the 8th Asia Tier Centre Forum (ACTF8) in Mumbai, India
  2. “Job Lists in HTCondor {Y[i]=F(X[i]),i=1,n}” at the 2nd Asia HTCondor Workshop in Mumbai, India
  3. “CHTC – Enabling Throughput Computing at All Scales,” IIT Bombay, India
  4. “CHTC – Advancing Throughput Computing for the Physical Sciences,” Tata Institute of Fundamental Research (TIFR), Mumbai, India
  5. “Introduction to HTCondor and OSG” tutorial on IGWN-DHTC-OSG at the Inter-University Centre for Astronomy and Astrophysics (IUCAA) in Pune, India
  6. “Placing a List of Jobs (Work in Progress),” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
  7. “Philosophy and Architecture,” Utrecht Nikhef HTCondor Tutorial, University of Utrecht, the Netherlands
  8. “Placing a List of Jobs (Work in Progress),” Nikhef HTCondor Tutorial – Amsterdam, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
  9. “Philosophy and Architecture,” Nikhef HTCondor Tutorial – Amsterdam, National Institute for Subatomic Physics (Nikhef), Amsterdam, the Netherlands
  10. “What the manual does not tell you – Philosophy and Architecture,” 10th European HTCondor workshop, Amsterdam, the Netherlands
  11. “What the manual does not tell you – History,” 10th European HTCondor workshop, Amsterdam, the Netherlands
  12. “CHTC Vision: Compute and Data Together,” 10th European HTCondor workshop, Amsterdam, the Netherlands

CHTC did not organize these events; all were organized by the locals with CHTC support. In Europe there were more than 60 attendees. 

[4] Prof Cai’s papers include

SCIENCE CHINA

ACM Transactions on Computation Theory (with Ben Young)

[5] https://www.gem5.org/events/isca-2024

Prof Matt Sinclair (and student) hosted a tutorial on using gem5 at ISCA 2024 this past week. 

Notably, this tutorial (and two talks given at a concurrent workshop) marked the first time his group publicly discussed how they have gotten PyTorch and TensorFlow to run at scale in gem5.

[6] Prof Fredrikson (Prof Jha’s former PhD student) receives tenure at CMU

Info: https://x.com/jhasomesh/status/1838708177476493639

Prof Fredrikson’s home page: https://mattfredrikson.com/

His research aims to enable systems that make secure, fair, and reliable use of machine learning. His group focuses on finding ways to understand the unique risks and vulnerabilities that arise from learned components, and on developing methods to mitigate them, often with provable guarantees.

[7] We are excited to announce this year’s Wisconsin Database Affiliates Workshop! For detailed information and the official schedule, please visit the workshop’s website. You may also RSVP here.

The workshop took place on October 10 (Thursday), 8:30am–4:00pm, in Union South Room Northwoods (3rd floor). This year’s event included:

  – Research talks from faculty and graduate students 

  – Industry talks from affiliate companies 

  – A poster and demo session 

  – Free breakfast and lunch

[8] Jin-Yi Cai chaired the committee for the 2024 FOCS Test of Time Awards. See https://focs.computer.org/2024/awards/ for details.

The members of the committee are

Julia Chuzhoy (Toyota Technological Institute at Chicago), 

Costis Daskalakis (MIT), 

Faith Ellen (University of Toronto), 

Shafi Goldwasser (UC Berkeley), and

Leonard Schulman (Caltech).

[9] As organized and related to us by Justin Hines: On Oct 1st, the night before the career fair, we hosted our inaugural Student Org Reception to more casually introduce our student leaders to employers. We had six employers and over 50 students attend. We invited every active student org across CDIS. We will continue to innovate for the students as they continue to participate in our offerings.

[10] https://www.nsf.gov/awardsearch/showAward?AWD_ID=2431419

“CloudLab Phase-IV: Strengthening a Core Resource for Research and Education”

Prof Ming Liu and Prof Shivaram Venkataraman

$12M

Cloud technologies underpin many of the apps and computing services that we use every day, including social networking, online commerce, email, video conferencing, and many more. To be able to do research that drives forward the fundamental architecture of the cloud, scientists need an environment in which they can experiment with the primary building blocks of the cloud: compute power, storage, and networks. CloudLab provides an environment where researchers can build their own clouds, observing and changing how they operate all the way to the bottom layers of software.

CloudLab is a large facility, consisting, as of the start of this award, of more than 1,800 servers hosted at the University of Utah, Clemson University, and the University of Wisconsin-Madison. These servers include cutting-edge compute (such as several different processor architectures, graphics processing units, and field programmable gate arrays), networking (OpenFlow, user-programmable switching), and storage (hard drives, solid state drives) technologies. This phase IV award will support the expansion of this facility by several hundred servers to meet high demand. It will also add new technologies, such as Internet of Things devices, programmable network cards, and non-volatile random access memory, which will in turn support research on data-intensive computing and computing at the edge of the network.

CloudLab is available to researchers who work on the fundamentals of cloud computing. These users do research in areas such as networking, security, and databases; in turn, this work has broad impact, as these are the fundamental technologies upon which we build things such as smart cities, telehealth, online education, and other socially important computer services. Because it is offered at no cost for research and education to academic users, it acts as a level playing field on which institutions large and small, and those with many resources and those with few, can conduct work on an equal footing.

CloudLab can be found online at https://cloudlab.us/, which is the primary interface through which research users interact with the facility, making it accessible from anywhere. This award will support continued operation of the facility, including the aforementioned hardware expansion, development of new features, and user support.

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

[11] “Learning to Price Homogeneous Data”

Keran Chen, Joon Suk Huh, Kirthevasan Kandasamy

Accepted at NeurIPS

Abstract: We study a data pricing problem, where a seller has access to N homogeneous data points (e.g. drawn i.i.d. from some distribution). There are m types of buyers in the market, where buyers of the same type i have the same valuation curve v_i : [N] → [0, 1], where v_i(n) is the value for having n data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for T rounds so as to learn the revenue-optimal pricing curve p : [N] → [0, 1]. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve p_t. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve O(m√T) regret in the stochastic setting and O(m^{3/2}√T) regret in the adversarial setting.
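
As a rough illustration of the online-pricing setup (and not the paper’s algorithm), the sketch below treats a small, hand-discretized family of pricing curves as bandit arms and runs plain UCB against simulated buyers. All curves, valuation functions, buyer types, and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20          # number of data points for sale
T = 5000        # number of market rounds

# Hand-discretized family of pricing curves p: {1..N} -> [0, 1] (illustrative only).
betas = np.linspace(0.2, 1.0, 9)
curves = [np.minimum(1.0, b * np.arange(1, N + 1) / N) for b in betas]

# Two illustrative buyer types with concave ("diminishing returns") valuation curves.
ns = np.arange(1, N + 1)
valuations = [0.9 * np.sqrt(ns / N), 0.6 * (ns / N) ** 0.3]
type_probs = [0.5, 0.5]

def revenue(curve, v):
    """Buyer best-responds: buys the n maximizing v(n) - p(n), if that utility is positive."""
    utility = v - curve
    n_star = int(np.argmax(utility))
    return curve[n_star] if utility[n_star] > 0 else 0.0

# Plain UCB over the finite set of candidate pricing curves.
K = len(curves)
counts = np.zeros(K)
sums = np.zeros(K)
for t in range(T):
    if t < K:
        arm = t                                   # play each curve once first
    else:
        means = sums / counts
        bonus = np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(means + bonus))
    buyer = rng.choice(len(valuations), p=type_probs)
    r = revenue(curves[arm], valuations[buyer])
    counts[arm] += 1
    sums[arm] += r

# Compare the most-played curve against the best curve in hindsight.
exp_rev = [sum(w * revenue(c, v) for v, w in zip(valuations, type_probs)) for c in curves]
print("UCB's most-played curve:", int(np.argmax(counts)))
print("Best curve in hindsight:", int(np.argmax(exp_rev)))
```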

[12] “Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs”

Matthew Zurek, Yudong Chen
NeurIPS ‘24.

We study the sample complexity of learning an ε-optimal policy in an average-reward Markov decision process (MDP) under a generative model. For weakly communicating MDPs, we establish the complexity bound Õ(SAH/ε²), where H is the span of the bias function of the optimal policy and SA is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters S, A, H, and ε, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. We also initiate the study of sample complexity in general (multichain) average-reward MDPs. We argue a new transient time parameter B is necessary, establish an Õ(SA(B+H)/ε²) complexity bound, and prove a matching (up to log factors) minimax lower bound. Both results are based on reducing the average-reward MDP to a discounted MDP, which requires new ideas in the general setting. To optimally analyze this reduction, we develop improved bounds for γ-discounted MDPs, showing that Õ(SAH/((1−γ)²ε²)) and Õ(SA(B+H)/((1−γ)²ε²)) samples suffice to learn ε-optimal policies in weakly communicating and in general MDPs, respectively. Both these results circumvent the well-known minimax lower bound of Ω̃(SA/((1−γ)³ε²)) for γ-discounted MDPs, and establish a quadratic rather than cubic horizon dependence for a fixed MDP instance.
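
For readers less familiar with the generative-model setting, here is a minimal sketch (not the paper’s algorithm): draw a fixed number of transitions per state-action pair, build an empirical model, and solve the resulting discounted MDP by value iteration. All sizes, rewards, and the discount factor are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 2            # small illustrative state and action spaces
gamma = 0.95           # discount factor for the discounted reduction
n_samples = 200        # samples drawn per (state, action) from the generative model

# A random "true" MDP: transition kernel P[s, a] and rewards R[s, a] in [0, 1].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))

# Generative model: for each (s, a), draw next states i.i.d. from P[s, a].
P_hat = np.zeros_like(P)
for s in range(S):
    for a in range(A):
        next_states = rng.choice(S, size=n_samples, p=P[s, a])
        P_hat[s, a] = np.bincount(next_states, minlength=S) / n_samples

# Value iteration on the empirical discounted MDP.
V = np.zeros(S)
for _ in range(2000):
    Q = R + gamma * P_hat @ V      # Q[s, a] = R[s, a] + gamma * sum_s' P_hat[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy with respect to the empirical model
print("greedy policy:", policy)
```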

[13] Received an NIH (Smart and Connected Health R01) award to fund our research on designing and developing AI-powered AR systems to support low vision people in complex activities of daily living. 

Title: SCH: AI-Assisted Vision: Scene-Aware Augmented Reality Systems to Support Low Vision People in Activities of Daily Living

PIs: Yuhang Zhao (PI), Yapeng Tian (co-PI) from UT-Dallas, Jon Froehlich (co-PI) from UWashington, Sanbrita Mondal (co-I) from UW-Madison

Abstract: Uncorrectable vision loss, a.k.a. low vision, can significantly impact activities of daily living (ADLs), affecting independence, health, and social well-being. Magnifiers are the most commonly used low vision aids; however, they largely reduce the visual field. Augmented reality (AR) glasses present a unique opportunity for low vision technology by intelligently recognizing the environment and enhancing users’ vision based on their contexts. However, existing AR solutions mainly rely on image processing techniques, generating pure pixel-level augmentations to the whole scene, such as pixel remapping for visual field loss, and recoloring based on distance. These solutions arbitrarily alter the user’s full visual field with no semantic understanding of the scene, distorting users’ natural vision and diminishing important visual details. As a result, they cannot effectively support complex ADLs that involve constant motion and object interactions, such as cooking in the kitchen, or safely navigating a crowded street. The overall objective of this proposal is to fundamentally advance low vision technology by leveraging state-of-the-art AI techniques and presenting unobtrusive scene-aware AR augmentations that are tailored to users’ environments and visual abilities. Contextualized in two key ADLs (meal preparation, outdoor mobility), the research will investigate how to recognize and augment visual elements in AR from three fundamental dimensions via three specific aims: (A1) Enhancing perception of visual salience, (A2) Enhancing object affordance for safe and efficient interactions, (A3) Enhancing awareness of environmental dynamics. In each aim, the research will contribute to both HCI and AI by designing scene-aware AR augmentations with low vision users and rehabilitation professionals, and developing egocentric video datasets and AI models to enable fast and robust scene recognition in AR. In the final year, a holistic AR system will be developed and evaluated for real-world impact and limitations via field studies (A4). With the AI-powered AR systems for low vision, the project will contribute to the mission of NEI by fundamentally transforming conventional low vision aids and improving independence and quality of life for low vision people in various challenging ADLs. The involvement of low vision participants and rehabilitation professionals throughout the research will also increase their exposure to state-of-the-art technology, potentially increasing technology acceptance and adoption in low vision rehabilitation. 

[14]  Presented at ECCV ‘24

https://eccv.ecva.net/

“Photon Inhibition for Energy-Efficient Single-Photon Imaging” (link to details)

Lucas J. Koerner, Shantanu Gupta, Atul Ingle, Mohit Gupta 

Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost — each photon detection consumes more energy than that of a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called photon inhibition to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real-time, to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection, and demonstrate, both via simulations and real SPC captured data, considerable reduction in photon detections (over 90% of photons) while maintaining task performance metrics. Our work raises the question of “which photons should be detected?”, and paves the way for future energy-efficient single-photon imaging.
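
A highly simplified numpy sketch of the inhibition idea (not the authors’ policies): each pixel keeps a short history of its own detections and is disabled for the next frame once it exceeds a per-window photon budget, so bright regions stop spending avalanche energy on redundant photons. Flux levels, the window, and the budget are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 200, 32, 32
flux = np.tile(np.linspace(0.02, 0.6, W), (H, 1))    # photons/frame, brighter to the right

window, budget = 10, 3      # allow at most `budget` detections per `window` frames per pixel
history = np.zeros((window, H, W), dtype=bool)
detections = inhibited = 0

for t in range(T):
    arrivals = rng.random((H, W)) < flux              # binary single-photon frame
    enabled = history.sum(axis=0) < budget            # inhibition policy from past detections
    detected = arrivals & enabled

    detections += int(detected.sum())
    inhibited += int((arrivals & ~enabled).sum())

    history[t % window] = detected                    # update the per-pixel sliding window

frac_saved = inhibited / (detections + inhibited)
print(f"photon detections avoided: {frac_saved:.1%}")
```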

“Light-in-Flight for a World-in-Motion” (link to details)

Jongho Lee, Ryan J. Suess, Mohit Gupta

Although time-of-flight (ToF) cameras are becoming the sensor-of-choice for numerous 3D imaging applications in robotics, augmented reality (AR) and human-computer interfaces (HCI), they do not explicitly consider scene or camera motion. Consequently, current ToF cameras do not provide 3D motion information, and the estimated depth and intensity often suffer from significant motion artifacts in dynamic scenes. In this paper, we propose a novel ToF imaging method for dynamic scenes, with the goal of simultaneously estimating 3D geometry, intensity, and 3D motion using a single indirect ToF (I-ToF) camera. Our key observation is that we can estimate 3D motion, as well as motion artifact-free depth and intensity by designing optical-flow-like algorithms that operate on coded correlation images captured by an I-ToF camera. Through the integration of a multi-frequency I-ToF approach with burst imaging, we demonstrate high-quality all-in-one (3D geometry, intensity, 3D motion) imaging even in challenging low signal-to-noise ratio scenarios. We show the effectiveness of our approach through thorough simulations and real experiments conducted across a wide range of motion and imaging scenarios, including indoor and outdoor dynamic scenes.

[15] Prof Matt Sinclair gave an invited talk at the DOE Energy-Efficient Computing for Science Workshop (https://web.cvent.com/event/a3dd901a-699e-408c-8a84-81445e6ea64f/summary). 
 

Title: Creating Flexible, High Fidelity Energy Modeling for Future HPC Systems

Authors: Matthew D. Sinclair, Bobby Bruce, William Godoy,  Oscar Hernandez, Jason Lowe-Power, and Shivaram Venkataraman

Abstract: Energy efficient, large scale, co-designed systems will be necessary in the future. As we increasingly adopt AI, workloads are increasingly bounded by new scaling laws, where the loss scales as a power-law with model size, dataset size, and the amount of compute used for training. Thus, to improve accuracy, scaling requires exponentially increasing resources. This shift means that AI growth requires more energy at a faster pace in a manner distinct from prior patterns, making energy efficiency even more critical. Accordingly, future HPC system designs need to optimize increasingly heterogeneous hardware and applications for energy consumption, without significantly compromising performance. To drive these efforts, there is a need for credible, open-source infrastructure that can study the energy effects of new systems early in the design process, before silicon is produced. Such infrastructure will help study concrete changes to the existing state-of-the-art and evaluate the potential delta of more radical changes. Traditionally, developers rely on simulation and modeling techniques to model and prototype energy consumption.  Unfortunately, the tools needed to co-design systems for performance and energy are lacking for both existing and future systems. In this talk, we discuss our vision to design new energy models which can improve accuracy for post-tape out hardware, first principle methods for estimating energy use in future systems by leveraging open-source hardware, and finally integration of energy models into tools such as simulators and development environments.

Link: https://drive.google.com/file/d/1aVnUe9hOqNw8DmWRP9aCbf6P9CTb_9B4/view

[16] CDIS transition announced: https://cdis.wisc.edu/erickson-steps-down-arpaci-dusseau-to-lead-school-of-computer-data-information-sciences/

Tom Erickson, founding director of the School of Computer, Data & Information Sciences (CDIS) and executive associate dean for strategy and innovation at the University of Wisconsin–Madison’s College of Letters & Science, has announced he will step down from his leadership roles effective January 2025. Remzi Arpaci-Dusseau, the Grace Wahba and Vilas Distinguished Achievement Professor of Computer Sciences, will assume leadership of CDIS.

November 2024

[1] ACSAC 2024 Test of Time Award

Year: ACSAC 2009

Title: Semantically Rich Application-Centric Security in Android

Authors: Ongtang, Machigar; McLaughlin, Stephen; Enck, William; McDaniel, Patrick

“On the occasion of ACSAC’s 35th Anniversary, the Steering Committee instituted the inaugural Test of Time Awards. These awards provide an opportunity to honor papers that have been published at ACSAC that have had enduring significance and impact to the security community. The committee considered papers published more than 15 years ago, and discussed their impact on academia, industry, and government.”

From: https://www.acsac.org/2023/program/test_of_time/

[2] Today was the regional round of this year’s International Collegiate Programming Contest (ICPC). UW-Madison participated with 6 teams at a site on the Epic campus. Our teams placed 1st, 4th, 7th, 8th, 10th, and 20th among the 96 teams in our region. Our top team was the only one to solve 9 of the problems. See the scoreboard for more details.

This is the 6th year in a row that UW-Madison places first in the regional competition, and the 16th time since we started participating in 2001. Our top team advances to the North America Championship, where about 15 teams qualify for the world finals. Given our silver medal at the world finals last year, we’re going for gold this year! 😊

Congratulations to all our teams and thanks to the people at Epic for running a regional site on their campus!

The UW-Madison teams that participated in the regional competition:
– Gryffindor (1st place): Anton Cheng, Harry Huang, Tuan Tai Nguyen 

– .sort()[-1] (4th place): Tan Bui, HuaiYuan Jing, Merlin Morton 

– Mini Soda (7th place): Guanlin Chen, Xinchen Shen, Enze Zhang 

– Apples are great (8th place): Brandon Han, Ishan Joshi, Xuan Nghia Nguyen 

– JQKA (10th place): Anda He, Kalyan Sriram, Jierui Xu 

– chr(sum(range(ord(min(str(not())))))) (20th place): Ansh Aggarwal, Anna Sun, Austin Tsai

[3] Four papers accepted at NeurIPS:

The AlCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators (Spotlight)

Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala

Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500×.

https://arxiv.org/abs/2407.11004
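
A toy sketch of the core idea (generate a reusable labeling program once, then run it locally), not the authors’ system: `generate_labeling_program` is a hypothetical stand-in for an LLM API call and simply returns a hard-coded example program here.

```python
# Toy illustration of program-as-annotator: instead of asking a large model for a label
# per example, ask it once for a labeling *program* and apply that program locally.

def generate_labeling_program(task_description: str) -> str:
    # Hypothetical stand-in: in the real workflow this prompt would go to a pretrained model.
    return '''
def label(text):
    positives = {"great", "good", "love", "excellent"}
    negatives = {"bad", "awful", "hate", "terrible"}
    words = set(text.lower().split())
    score = len(words & positives) - len(words & negatives)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "abstain"
'''

program_source = generate_labeling_program("Label movie reviews as positive or negative.")
namespace = {}
exec(program_source, namespace)        # compile the generated program once
label = namespace["label"]

dataset = ["I love this movie it was great", "Awful plot and terrible acting", "It exists"]
print([label(x) for x in dataset])     # labels produced locally, at no per-example API cost
```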

OTTER: Effortless Label Distribution Adaptation of Zero-shot Models 

Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

Abstract: Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have mismatching requirements, such as needing access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching—often by significant margins—in 17 out of 21 datasets.

https://arxiv.org/abs/2404.08461
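
A rough sketch of the general idea (reassigning zero-shot predictions so the induced label distribution matches an estimated downstream prior) using a plain Sinkhorn solver; this is an illustration under made-up numbers, not the authors’ implementation.

```python
import numpy as np

def sinkhorn(cost, row_marg, col_marg, reg=0.5, n_iter=500):
    """Entropy-regularized optimal transport between row and column marginals."""
    K = np.exp(-cost / reg)
    u = np.ones_like(row_marg)
    v = np.ones_like(col_marg)
    for _ in range(n_iter):
        u = row_marg / (K @ v)
        v = col_marg / (K.T @ u)
    return u[:, None] * K * v[None, :]               # transport plan

rng = np.random.default_rng(0)
n, num_classes = 1000, 3

# Pretend zero-shot class probabilities (rows sum to 1); the bias term mimics a model
# whose predictions are skewed by unbalanced pretraining data.
logits = rng.normal(size=(n, num_classes)) + np.array([1.0, 0.0, -1.0])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Estimated label distribution of the downstream task (the only extra input needed).
target_prior = np.array([0.2, 0.3, 0.5])

cost = -np.log(probs + 1e-12)                         # cheap transport = likely label
plan = sinkhorn(cost, np.full(n, 1.0 / n), target_prior)
adjusted_pred = plan.argmax(axis=1)                   # each example routed to its best class

print("naive label freq:   ", np.bincount(probs.argmax(axis=1), minlength=num_classes) / n)
print("adjusted label freq:", np.bincount(adjusted_pred, minlength=num_classes) / n)
```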

Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks 

Tianyi Zhang, Linrong Cai, Jeffrey Li, Nicholas Roberts, Neel Guha, Frederic Sala

Abstract: Weak supervision (WS) is a popular approach for label-efficient learning, leveraging diverse sources of noisy but inexpensive weak labels to automatically annotate training data. Despite heavy usage, the value of WS is challenging to benchmark due to its complexity: the knobs involved include data sources, labeling functions (LFs), aggregation techniques, called label models (LMs), and end model pipelines. Existing evaluation suites tend to be limited, focusing on particular components or specialized use cases, or relying on simplistic benchmark datasets with poor LFs, producing insights that may not generalize to real-world settings. We address these by introducing a new benchmark, BoxWRENCH, designed to more accurately reflect real-world usage of WS. This benchmark features (1) higher class cardinality and imbalance, (2) substantial domain expertise requirements, and (3) linguistic variations found in parallel corpora. We improve upon existing benchmark LFs using a rigorous procedure aimed at mimicking real-world settings. In contrast to existing WS benchmarks, we show that in many practical settings supervised learning requires substantial amounts of labeled data to match WS performance.

https://openreview.net/forum?id=c7SApXZz4b#discussion
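
For context on the weak-supervision pipeline the benchmark evaluates, here is a minimal label-model sketch: a simple majority vote over labeling-function outputs with abstentions. Real label models, including those evaluated in the benchmark, are more sophisticated, and the votes below are invented.

```python
import numpy as np

ABSTAIN = -1

def majority_vote(lf_votes: np.ndarray) -> np.ndarray:
    """Aggregate labeling-function votes (n_examples x n_LFs); -1 means the LF abstained."""
    n, _ = lf_votes.shape
    labels = np.full(n, ABSTAIN)
    for i in range(n):
        votes = lf_votes[i][lf_votes[i] != ABSTAIN]
        if votes.size == 0:
            continue                              # every LF abstained: leave unlabeled
        counts = np.bincount(votes)
        winners = np.flatnonzero(counts == counts.max())
        if winners.size == 1:                     # break ties by abstaining
            labels[i] = winners[0]
    return labels

# Three toy labeling functions voting on five examples over classes {0, 1}.
lf_votes = np.array([
    [0, 0, ABSTAIN],
    [1, ABSTAIN, 1],
    [0, 1, ABSTAIN],             # tie -> abstain
    [ABSTAIN, ABSTAIN, ABSTAIN],
    [1, 1, 0],
])
print(majority_vote(lf_votes))   # [0, 1, -1, -1, 1]
```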

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling 

Harit Vishwakarma, Yi Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

Abstract: Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual annotation. A prominent variant, threshold-based auto-labeling (TBAL), works by finding thresholds on a model’s confidence scores above which it can accurately automatically label unlabeled data. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, we show that such methods fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvement on coverage over the baselines while maintaining error level below 5% and using the same amount of labeled data.

https://arxiv.org/abs/2404.16188
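
A simplified sketch of plain threshold-based auto-labeling, the baseline workflow that Colander plugs into (the learned confidence function itself is not shown): pick the loosest confidence threshold whose validation error stays within the tolerance, then auto-label everything above it. All numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_threshold(conf_val, correct_val, max_error=0.05):
    """Smallest threshold whose empirical error above it is within the tolerance."""
    for t in np.sort(np.unique(conf_val)):
        selected = conf_val >= t
        if selected.sum() == 0:
            break
        error = 1.0 - correct_val[selected].mean()
        if error <= max_error:
            return t
    return None                  # no threshold meets the error target

# Synthetic validation set: higher confidence tends to mean a correct prediction.
conf_val = rng.uniform(size=2000)
correct_val = rng.uniform(size=2000) < (0.5 + 0.5 * conf_val)

t = pick_threshold(conf_val, correct_val)

# Auto-label the unlabeled pool: keep model labels only above the chosen threshold.
conf_pool = rng.uniform(size=10000)
auto_labeled = conf_pool >= t
print(f"threshold={t:.3f}, coverage={auto_labeled.mean():.1%}")
```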

[4] On Friday, Nov. 1, CS Assistant Professor Sharon Li gave a “Friday Lunch” talk with the UW-Madison Center for the Humanities, titled: “Steering Large Language Models by Human Preferences”.

These informal, dynamic (and popular!) Friday Lunch discussions are hosted over a catered lunch at Memorial Union to foster academic discussion and provide a space for interdisciplinary dialogue and community building across campus. Seats are limited – registration is required and available on a first-come basis.

Abstract: Large language models (LLMs) trained on massive datasets exhibit remarkable abilities, but at the same time, these models can inadvertently generate misinformation and harmful outputs. This concern underscores the urgent challenge of language model alignment: ensuring these models’ behaviors agree with human preferences and safety considerations. In recent years, a spectrum of alignment strategies have emerged, with prominent methods showcasing the effectiveness of reinforcement learning with human feedback (RLHF). RLHF has gained widespread adoption among state-of-the-art models, including OpenAI’s GPT-4, Anthropic’s Claude, Google’s Bard, and Meta’s Llama 2-Chat. A pivotal component within RLHF is proximal policy optimization, which employs an external reward model that mirrors human preferences for its optimization process. Despite the promise, RLHF suffers from unstable and resource-intensive training. Furthermore, the need to repeat PPO training when altering the reward model hinders rapid customization to evolving datasets and emerging needs. In this Friday Lunch talk, Assistant Professor Sharon Li will discuss alternative paradigms on how we can achieve alignment without RL training. 

Bio: Sharon Yixuan Li is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. She received a Ph.D. from Cornell University in 2017, advised by John E. Hopcroft. Subsequently, she was a postdoctoral scholar in the Computer Science department at Stanford University. Her research focuses on the algorithmic and theoretical foundations of learning in open worlds. She is the recipient of the AFOSR Young Investigator Program (YIP) award, the NSF CAREER award, the MIT Technology Review TR-35 Award, Forbes “30 Under 30” in Science, and multiple faculty research awards from Google, Meta, and Amazon. Her works received a NeurIPS Outstanding Paper Award, and an ICLR Outstanding Paper Award Honorable Mention in 2022.

[5] “Three Chairs and a Research Booth: How One PhD Advisor Shaped the SC Committee”

by Melyssa Franklin

October, 2024

“An affable, earnest guy, Dr. Barton Miller, the Vilas Distinguished Achievement Professor in the Computer Sciences Department at the University of Wisconsin-Madison (UW), has made an oversized impact on the SC Conference. Since 2012, three of Miller’s graduate students have served as Conference Chair: Dr. Jeffrey Hollingsworth (SC12), Dr. Dorian Arnold (SC23), and now Dr. Phillip Roth (SC24). In addition, many of his advisees have served on the committee in other roles, and their PhD students are now coming along as SC volunteers as well. 

Is there something in the water in Madison? What magic words of wisdom has Dr. Miller imparted to his students to spur them to participate in the SC committee and aspire to leadership roles? 

Speaking to Dr. Miller, it’s clear his graduate students are some of his favorite people. They become good friends – almost family. And to him, SC is like a family reunion. ‘SC is the center of the HPC universe,’ he enthused. ‘Everything that’s happening in the field is there. It creates community.’” 


Read the rest here: https://sc24.supercomputing.org/2024/10/three-chairs-and-a-research-booth-how-one-phd-advisor-shaped-the-sc-committee/

[6] “Apple Day” @ Wisc

November 13th, 2024

“I wanted to share that we are hosting an Apple Day on campus next week, Wednesday, November 13th.  They will provide an information session and then invite students for a 1:1 if they email their resumes to the email provided on the flyer.  

Please encourage all students to attend to learn more.  Although software development background is their focus for interns, there are other opportunities also.  

This is the first engagement since we secured a partnership with their CoreOS team and have lots of opportunity to grow this collaboration.  Thanks for sharing and let me know if you have any questions.”  -Justin Hines

[7] ‘Trial by fire’: UW-Madison hackathon generates tech ideas in 24 hours
Becky Jacobs, Nov 12 2024

Cap Times

“After 24 hours of programming and coding — fueled by plenty of snacks and energy drinks — the organizers of an annual hackathon event at the University of Wisconsin-Madison announced which of the 55 projects would move on as finalists.”

Read the rest:

https://captimes.com/news/education/trial-by-fire-uw-madison-hackathon-generates-tech-ideas-in-24-hours/article_c5c0e150-a06a-11ef-bb55-7f08b2a130ec.html

MadHacks:
https://www.madhacks.io/

[8] “Contracting with a Learning Agent”

Guru Guruganesh · Yoav Kolumbus · Jon Schneider · Inbal Talgam-Cohen · Emmanouil-Vasileios Vlatakis-Gkaragkounis · Joshua Wang · S. Weinberg

Abstract: Real-life contractual relations typically involve repeated interactions between the principal and agent, where, despite theoretical appeal, players rarely use complex dynamic strategies and instead manage uncertainty through learning algorithms. In this paper, we initiate the study of repeated contracts with learning agents, focusing on those achieving no-regret outcomes. For the canonical setting where the agent’s actions result in success or failure, we present a simple, optimal solution for the principal: Initially provide a linear contract, then switch to a zero-scalar contract. This shift causes the agent to “free-fall” through their action space, yielding non-zero rewards for the principal at zero cost. Interestingly, despite the apparent exploitation, there are instances where our dynamic contract can make both players better off compared to the best static contract. We then broaden the scope of our results to general linearly-scaled contracts, and, finally, to the best of our knowledge, we provide the first analysis of optimization against learning agents with uncertainty about the time horizon.

Paper: https://arxiv.org/abs/2401.16198
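
A toy simulation of the dynamic described above, under invented numbers: a Hedge (multiplicative-weights) agent chooses among effort levels, and the principal offers a linear contract for the first half of the horizon and a zero contract afterwards, collecting "free-fall" successes while the agent’s weights decay back toward no effort. This only illustrates the setup, not the paper’s analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
cost = np.array([0.0, 0.15, 0.3])        # agent's cost per action (no / low / high effort)
p_success = np.array([0.1, 0.5, 0.9])    # success probability per action
eta = 0.05                               # Hedge learning rate

weights = np.ones(3)
principal_utility = 0.0

for t in range(T):
    # Principal's strategy: linear contract (pay 0.5 of the unit reward on success)
    # for the first half of the horizon, then a zero contract.
    payment = 0.5 if t < T // 2 else 0.0

    probs = weights / weights.sum()
    action = rng.choice(3, p=probs)
    success = rng.random() < p_success[action]

    principal_utility += (1.0 - payment) if success else 0.0

    # Full-information Hedge update on the agent's expected utility for each action;
    # under the zero contract these utilities are negative for any effort, so the
    # agent gradually "falls" back toward no effort while the principal pays nothing.
    agent_utils = p_success * payment - cost
    weights *= np.exp(eta * agent_utils)
    weights /= weights.sum()

print("principal's average per-round utility:", principal_utility / T)
```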

[9] PRIME grant

PI: Ming Liu

Abstract: EFabric provides on-demand supercomputing solutions to empower AI innovation and deployment. We offer agile, flexible, and low-cost access to customized supercomputing clusters for startups, research labs, grad students, and enterprises. EFabric is built atop emerging load-store interconnects, instead of traditional Ethernet/InfiniBand. It organizes computation resources into different pools and independently scales them based on users’ needs. You can just focus on building AI applications, tuning AI models, and exploring AI deployments without worrying about the underlying computing infrastructure.

https://d2p.wisc.edu/funding/prime/

[10] “Adversarial Machine Learning: A 10-year Perspective”

Patrick McDaniel

Distinguished Lecture, UIUC

Abstract: Machine learning is revolutionizing technology and society. However, like any technology, it can be misused by adversaries and our understanding of the potential threats remains limited. If machine learning is to be part of any critical decision-making process, we must be able to detect and mitigate threats at both the training and inference phases. Over the past decade, I have studied a myriad of threats to machine learning at all stages of its lifecycle and proposed a variety of methods that bring us closer towards trustworthy machine learning systems. In this talk, I will discuss the arc of adversarial machine learning, how threat models have evolved, and how the community has gained new insights into the root causes of the vulnerabilities present in machine learning systems today.

Bio: Patrick McDaniel is the Tsun-Ming Shih Professor of Computer Sciences in the School of Computer, Data & Information Sciences at the University of Wisconsin-Madison. Professor McDaniel is a Fellow of IEEE, ACM and AAAS, a recipient of the SIGOPS Hall of Fame Award and SIGSAC Outstanding Innovation Award, and the director of the NSF Frontier Center for Trustworthy Machine Learning. He also served as the program manager and lead scientist for the Army Research Laboratory’s Cyber-Security Collaborative Research Alliance from 2013 to 2018. Patrick’s research focuses on a wide range of topics in computer and network security and technical public policy. Prior to joining Wisconsin in 2022, he was the William L. Weiss Professor of Information and Communications Technology and Director of the Institute for Networking and Security Research at Pennsylvania State University. 

https://calendars.illinois.edu/detail/1570?eventId=33502561

[11] Reducing the GAP: Improving the Fidelity and Scalability of gem5’s GPU Models

Matt Sinclair

Invited Talk at CAMS ‘24

The breakdown in Moore’s Law and Dennard Scaling is leading to drastic changes in the makeup and constitution of computing systems. For example, a single chip integrates 10-100s of cores and has a heterogeneous mix of general-purpose compute engines and highly specialized accelerators. Traditionally, computer architects have relied on tools like architectural simulators to accurately perform early-stage prototyping and optimizations for the proposed research. However, as systems become increasingly complex and heterogeneous, architectural tools are straining to keep up. In particular, publicly available architectural simulators are often not very representative of the industry parts they intend to represent. This leads to a mismatch in expectations; when prototyping new optimizations researchers may draw the wrong conclusions about the efficacy of proposed optimizations if the tool’s models do not provide high fidelity. Moreover, modeling and simulation tools are also struggling to keep pace with increasingly large, complex workloads from domains such as machine learning (ML). 

In this talk, I will discuss our work on improving the open source, publicly available GPU models in the widely used gem5 simulator. gem5 can run entire systems, including CPUs, GPUs, and accelerators, as well as the operating system, runtime, network, and other related components. Thus, gem5 has the potential to allow users to study the behavior of entire heterogeneous systems. Unfortunately, some of gem5’s publicly available models do not always provide high accuracy relative to their “real” counterparts, especially for the memory subsystem. I will discuss my group’s efforts to overcome these challenges and improve the fidelity of gem5’s GPU models, as well as our ongoing efforts to scalably run modern ML and HPC workloads in frameworks such as PyTorch and TensorFlow in gem5. Collectively, this work significantly enhances the state-of-the-art and enables more widespread adoption of gem5 as an accurate platform for heterogeneous architecture research.

[12] Two papers at Asiacrypt ‘24

https://asiacrypt.iacr.org/2024/

“Non-interactive Blind Signatures: Post-quantum and Stronger Security”

Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav

“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”

Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree

December 2024

[1]  “From Foe to Friend: The Surprising Turn of Mega Constellations in Radio Astronomy”

HotNets ‘24

Cheap spaceflight has ushered in an explosive growth era for Low Earth Orbit (LEO) satellites. While this has brought us LEO satellite megaconstellations for ubiquitous high-speed data, it has also enabled a proliferation of nanosatellites (e.g. CubeSats) launched by diverse organizations. An unfortunate side-effect is harmful interference to sensitive receivers like those of radio astronomy — no place on Earth is safe. How can we enjoy the fruits of the satellite revolution without blinding ourselves to the secrets of the universe?

Networking is the key. This paper proposes InOrbitNet, which aggregates and backhauls traffic from low-capability nanosatellites using highly-capable LEO megaconstellations. By simulating LEO and nanosatellite orbit transitions, we show that orders-of-magnitude reductions in latency and significant increases in capacity are possible as compared to the current non-networked direct-to-ground approach. But more importantly, because LEO megaconstellations are highly capable and tightly managed, this consolidation of RF footprints also allows radio astronomy to be protected from interference.

https://conferences.sigcomm.org/hotnets/2024/papers/hotnets24-62.pdf

and

“WiFi Physical Layer Stays Awake and Responds When it Should Not”

WF-IoT

WiFi communication should be possible only between devices inside the same network. However, we find that all existing WiFi devices send back acknowledgments (ACKs) to even fake packets received from unauthorized WiFi devices outside of their network. Moreover, we find that an unauthorized device can manipulate the power-saving mechanism of WiFi radios and keep them continuously awake by sending specific fake beacon frames to them. Our evaluation of over 5000 devices from 186 vendors confirms that these are widespread issues. We believe these loopholes cannot be prevented, and hence they create privacy and security concerns. Finally, to show the importance of these issues and their consequences, we implement and demonstrate two attacks where an adversary performs battery drain and WiFi sensing attacks just using a tiny WiFi module which costs less than ten dollars.

https://arxiv.org/abs/2301.00269

 

[2] https://chtc.cs.wisc.edu/unravelling-antibiotic-resistance.html

“I have stuck with the OSPool because the access is unprecedented—the amount of compute time you can get, the number of jobs you can run at a time”

Erik Wright, Associate Professor of Biomedical Informatics at the University of Pittsburgh

As bacterial infections evolve to develop resistance to drugs, it is critical to investigate methodologies to slow resistance and, potentially, even reverse it. In his quest to discover new antibiotics to counter resistance, Erik Wright’s project, BioMedInfo, aims to compare bacteria with resistant genomes by grouping them into a network of genes with related functions.

Read more in the link above!

[3] https://new.nsf.gov/events/

[4]  From our lab, led by UW CS students:

Exploring the Use of Robots for Diary Studies

Michael F Xu, Bilge Mutlu

Designing Telepresence Robots to Support Place Attachment

Yaxin Hu, Anjun Zhu, Catalina Toma, Bilge Mutlu

With collaborators:

Making Sense of Public Space for Robot Design

Hannah Pelikan, Bilge Mutlu, Stuart Reeves

(w/ European collaborators)

The Connection-Coordination Rapport (CCR) Scale

Ting-Han Lin, Hannah Dinner, Tsz Long Leung, Bilge Mutlu, Greg Trafton, Sarah Sebo

(w/ U Chicago collaborators)

ImageInThat: Manipulating Images to Convey User Instructions to Robots

Karthik Mahadevan, Blaine Lewis, Jiannan Li, Bilge Mutlu, Anthony Tang, Tovi Grossman

(w/ U Toronto collaborators)

[5] “SYSTEMS, METHODS, AND MEDIA FOR EULERIAN SINGLE-PHOTON COMPUTER VISION”

Authors: Shantanu Gupta, Mohit Gupta

https://www.freepatentsonline.com/y2024/0370974.html

In accordance with some embodiments, systems, methods, and media for Eulerian single-photon computer vision are provided. In some embodiments, the system comprises: an image sensor comprising detectors configured to detect arrival of individual photons, and arranged in an array; a processor programmed to: cause the image sensor to generate a sequence of images representing a scene, each of the images comprising a plurality of pixels; perform, for each of a plurality of three dimensional filters, a convolution between the three dimensional filter and a plurality of frames, wherein each of the plurality of frames is based on one or more of the images of the sequence of images; generate, for each of the plurality of frames, a plurality of filter bank responses each corresponding to a three dimensional filter of the plurality of three dimensional filters; and perform a computer vision process based on the plurality of filter responses.
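
To make the described pipeline more concrete, here is a small sketch (under invented filters and data, not the patented system): binary single-photon frames are convolved with a bank of 3D (time, y, x) filters via scipy, yielding one response volume per filter, from which per-frame responses can be read off.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
T, H, W = 64, 32, 32
frames = (rng.random((T, H, W)) < 0.2).astype(float)   # stand-in binary photon frames

# A tiny 3D filter bank over (time, y, x): a temporal-change filter and a horizontal
# edge filter, purely illustrative choices.
filter_bank = [
    np.array([[[-1.0]], [[0.0]], [[1.0]]]),            # shape (3, 1, 1): temporal derivative
    np.array([[[-1.0, 0.0, 1.0]]]),                     # shape (1, 1, 3): horizontal edge
]

# One response volume per filter; responses[k][t] is the filter-k response at frame t,
# which a downstream computer vision process could then consume.
responses = [convolve(frames, f, mode="nearest") for f in filter_bank]
for k, r in enumerate(responses):
    print(f"filter {k}: response volume {r.shape}, frame-0 mean response {r[0].mean():+.3f}")
```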

[6] Matthew Zurek is the winner of the INFORMS Applied Probability Society Best Student Paper Prize.

https://connect.informs.org/aps/apsawards/awards

The relevant paper is https://arxiv.org/abs/2403.11477

[7] “Non-interactive Blind Signatures: Post-quantum and Stronger Security”

with my students and external collaborator (Foteini Baldimtsi, Jiaqi Cheng, Rishab Goyal, Aayush Yadav)

Asiacrypt 2024

https://eprint.iacr.org/2024/614

“Leakage-Resilient Incompressible Cryptography: Constructions and Barriers”

with my co-advised postdoc and external collaborators (Kaartik Bhushan, Rishab Goyal, Venkata Koppula, Varun Narayanan, Manoj Prabhakaran, Mahesh Sreekumar Rajasree)

https://www.iacr.org/cryptodb/data/paper.php?pubkey=34710

“Multi-Authority Functional Encryption with Bounded Collusions from Standard Assumptions”

with my student (Rishab Goyal, Saikumar Yadugiri)

TCC 2024

https://link.springer.com/chapter/10.1007/978-3-031-78020-2_1

“Incompressible Functional Encryption”

with my co-advised postdoc and external collaborators (Rishab Goyal; Venkata Koppula, Mahesh Sreekumar Rajasree, Aman Verma)

ITCS 2025

https://eprint.iacr.org/2024/798

[8] “Simulating Machine Learning Models at Scale”

Authors: Vishnu Ramadas, Matthew D. Sinclair

Abstract: Traditionally, architectural simulators are used to perform early exploration. However, detailed simulation of modern systems can take extremely long times in existing tools. Furthermore, prototyping optimizations at scale can also be challenging, especially for newly proposed accelerators.  In this talk we discuss our efforts to enhance gem5’s support to make simulating AI/ML workloads practical to run while retaining accuracy. This will enable users to use the same early exploration platform, gem5, for both homogeneous and heterogeneous systems. Specifically, we are extending gem5’s existing simulation infrastructure to create a scalable simulation framework that models diverse system components with varying levels of detail. We will also develop novel mechanisms that identify the most important parts of the applications to simulate at high fidelity while reducing simulation time and add support for multi-chiplet and multi-node (e.g., multi-GPU and multi-accelerator) simulations to drive a flexible, adaptable simulation infrastructure that enables rapid prototyping of different optimized AI and ML algorithms and architectures while ensuring that the changes are representative of real, modern hardware and software.

[9] “RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator”

Jie Zhang, Hongjing Huang, Xuzheng Xu, Xiang Li, Ming Liu, Zeke Wang

Abstract: The emerging microservice/serverless-based cloud programming paradigm and the rising networking speeds leave the RPC stack as the predominant data center tax. Domain-specific hardware acceleration holds the potential to disentangle the overhead and save host CPU cycles. However, state-of-the-art RPC accelerators integrate RPC logic into the CPU or use specialized low-latency interconnects, hardly adopted in commodity servers. To this end, we design and implement RPCAcc, a software-hardware co-designed RPC on-NIC accelerator that enables reconfigurable RPC kernel offloading. RPCAcc connects to the server through the most widely used PCIe interconnect. To grapple with the ramifications of PCIe-induced challenges, RPCAcc introduces three techniques: (a) a target-aware deserializer that effectively batches cross-PCIe writes on the accelerator’s on-chip memory using compacted hardware data structures; (b) a memory-affinity CPU-accelerator collaborative serializer, which trades additional host memory copies for slow cross-PCIe transfers; (c) an automatic field update technique that transparently codifies the schema based on dynamically reconfigured RPC kernels to minimize superfluous PCIe traversals. We prototype RPCAcc using the Xilinx U280 FPGA card. On HyperProtoBench, RPCAcc achieves 3.2× lower serialization time than a comparable RPC accelerator baseline and demonstrates up to 2.6× throughput improvement in the end-to-end cloud workload.

https://arxiv.org/abs/2411.07632

[10] “Optimizing Quantum Circuits, Fast and Slow”

Amanda Xu, Abtin Molavi, Swamit Tannu, Aws Albarghouthi

ASPLOS 2025

https://pages.cs.wisc.edu/~aws/papers/asplos25.pdf

Optimizing quantum circuits is critical: the number of quantum operations needs to be minimized for a successful evaluation of a circuit on a quantum processor. In this paper we unify two disparate ideas for optimizing quantum circuits, rewrite rules, which are fast standard optimizer passes, and unitary synthesis, which is slow, requiring a search through the space of circuits. We present a clean, unifying framework for thinking of rewriting and resynthesis as abstract circuit transformations. We then present a radically simple algorithm, GUOQ, for optimizing quantum circuits that exploits the synergies of rewriting and resynthesis. Our extensive evaluation demonstrates the ability of GUOQ to strongly outperform existing optimizers on a wide range of benchmarks.
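
As a flavor of the rewrite-rule side (the fast passes; unitary resynthesis is the slow, search-based half and is not shown), here is a tiny, conservative peephole optimizer over a gate list. The gate set and rules are illustrative and are not GUOQ’s.

```python
import math

# Gates are tuples: ("h", q), ("x", q), ("cx", c, t), ("rz", theta, q).
SELF_INVERSE = {"h", "x", "cx"}

def peephole(circuit):
    """Repeatedly apply two local rewrite rules until neither fires:
    (1) cancel adjacent identical self-inverse gates on the same qubit(s),
    (2) merge adjacent rz rotations on the same qubit.
    Only list-adjacent gates on identical qubits are touched, so the pass is
    conservative but always sound."""
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(circuit):
            g = circuit[i]
            nxt = circuit[i + 1] if i + 1 < len(circuit) else None
            if nxt and g[0] in SELF_INVERSE and g == nxt:
                i += 2                      # rule 1: G G = I on the same wire(s)
                changed = True
            elif nxt and g[0] == "rz" and nxt[0] == "rz" and g[2] == nxt[2]:
                theta = (g[1] + nxt[1]) % (2 * math.pi)
                if not math.isclose(theta, 0.0, abs_tol=1e-9):
                    out.append(("rz", theta, g[2]))
                i += 2                      # rule 2: merge consecutive rotations
                changed = True
            else:
                out.append(g)
                i += 1
        circuit = out
    return circuit

circ = [("h", 0), ("h", 0), ("rz", 0.3, 1), ("rz", -0.3, 1), ("cx", 0, 1), ("cx", 0, 1), ("x", 2)]
print(peephole(circ))   # -> [('x', 2)]
```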

[11] “Making Serverless Pay-For-Use a Reality with Leopard”

Tingjia Cao, A. Arpaci-Dusseau, R. Arpaci-Dusseau, T. Caraza-Harter

NSDI ‘25

Abstract: Serverless computing has gained traction due to its event-driven architecture and “pay for use” (PFU) billing model. However, our analysis reveals that current billing practices do not align with the true resource consumption. This paper challenges the prevailing SLIM (static, linear, interactive-only model) assumptions that underpin existing billing models, demonstrating that current billing does not realize PFU for realistic workloads. We introduce the Nearly Pay-for-Use (NPFU) billing model, which accommodates varying CPU and memory demands, spot cores, and preemptible memory. We also introduce Leopard, an NPFU-based serverless platform that integrates billing awareness into several major subsystems: CPU scheduler, OOM killer, admission controller, and cluster scheduler. Experimental results indicate that Leopard benefits both providers and users, increasing throughput by more than 2x and enabling cost reductions.
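
A back-of-the-envelope sketch of the billing gap the paper targets (the usage trace, allocations, and rates below are made up): compare a bill computed from static allocations against one computed from what the function actually consumed over its lifetime.

```python
import numpy as np

rng = np.random.default_rng(0)
seconds = 60
cpu_alloc, mem_alloc = 2.0, 1024.0            # allocated cores and MB for the function
cpu_rate, mem_rate = 1e-4, 1e-7               # illustrative $ per core-second / MB-second

# A bursty invocation: modest CPU use with noise, memory ramping up then plateauing.
cpu_used = np.clip(rng.normal(0.3, 0.2, seconds), 0.0, cpu_alloc)
mem_used = np.minimum(mem_alloc, 200.0 + 20.0 * np.arange(seconds))

# SLIM-style bill: charge for the full static allocation for the whole duration.
static_bill = seconds * (cpu_alloc * cpu_rate + mem_alloc * mem_rate)

# Pay-for-use-style bill: charge for what was actually consumed each second.
usage_bill = float((cpu_used * cpu_rate + mem_used * mem_rate).sum())

print(f"static-allocation bill: ${static_bill:.6f}")
print(f"usage-based bill:       ${usage_bill:.6f}  ({usage_bill / static_bill:.0%} of static)")
```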

[12] https://badgerherald.com/opinion/column/2024/12/04/computer-science-student-enrollment-woes-help-department-pull-off-a-miracle/

It begins: “If you spend enough time around computer science majors, one of the things you’re likely to hear is a long series of complaints about course enrollment. That’s been my experience living with two CS majors, and after two years of hearing about the shortcomings of the course enrollment process in the CS department, I decided to take a closer look at the problem myself.” Read the rest at the link above!

February 2025

[1] Vilas Distinguished Achievement Professor awarded to Prof Jerry Zhu.

https://provost.wisc.edu/vilas-distinguished-achievement-professorship/

Vilas Distinguished Achievement Professorships (VDAP) recognize UW–Madison faculty members whose distinguished scholarship has advanced the boundaries of knowledge, and whose excellence also includes teaching or service. Faculty members receiving this award keep the Vilas Distinguished Achievement Professorship title for the duration of their careers.

[2] Prof Jelena Diakonikolas wins NSF CAREER award.

https://www.cs.wisc.edu/2025/01/29/jelena-diakonikolas-receives-nsf-career-award/

Assistant Professor Jelena Diakonikolas, a researcher in the field of large-scale optimization, recently received a National Science Foundation (NSF) CAREER Award for her proposal “Optimization and Learning with Changing Distributions.” The award comes with $700,000 in funding to support research and education initiatives through 2029.

[3] Prof Paul Barford wins Chancellor’s Teaching Innovation Award.

https://news.wisc.edu/2025-distinguished-teaching-award-recipients-announced/

Twelve faculty members have been chosen to receive this year’s Distinguished Teaching Awards, an honor given out since 1953 to recognize some of the university’s finest educators.

[4] Prof Patrick McDaniel gives Distinguished Lecture at Iowa State

Adversarial Machine Learning: A 10-year Perspective, Distinguished Lecture,

Iowa State University, Ames, IA, January, 2025.

Older video of said talk from UIUC: https://mediaspace.illinois.edu/media/t/1_19ei0f5b/180208991

[5] Prof Miron Livny gives talk on OSG and PATh

OSG (Open Science Grid) and PATh (Partnership to Advance Throughput Computing). PATh is a partnership launched by the NSF in 2020 between UW-Madison Center for High Throughput Computing (CHTC) and the OSG Consortium to advance Throughput Computing. Established in 2005, the OSG Consortium operates a fabric of distributed High Throughput Computing (dHTC) services in support of the National Science & Engineering community. The research collaborations, campuses, national laboratories, and software providers that form the consortium are unified in their commitment to advance open science via these services.

[6]  https://www.acm.org/media-center/2025/january/fellows-2024

Dr. Satish Chandra (Google), who did his Ph.D. under former UW CS Prof Jim Larus, is one of this year’s ACM Fellows.  Congratulations Satish! Satish received our department’s Outstanding Graduate Student Research Award in 1997 (see here). To our knowledge (thanks Prof Tom Reps!), Satish is the third UW CS Ph.D. from the PL group who has been named an ACM Fellow, along with Tom Ball (Microsoft) and G. Ramalingam (Microsoft).

[7] The ACM India Council is delighted to announce Dr. Arkaprava (Arka) Basu of IISc as the recipient of the ACM India Early Career Researcher Award for 2024.

https://www.acm.org/articles/acm-india-bulletins/2025/ecr-award-2024

The ACM India Early Career Researcher (ECR) Award recognizes individuals in early stages of their careers, who have made fundamental, innovative, and impactful contributions to the computing field while primarily working in India. The ECR award carries a prize of ₹15 lakhs. Financial support for this award is provided by Persistent Foundation.

Basu is being recognized “for creative and impactful contributions towards efficient memory management for GPUs and CPUs through innovations spanning the software and hardware stacks.” Arkaprava is a leading systems researcher in India with a track record of consistently publishing impactful research in flagship conferences and exhibiting leadership roles at the top conferences of his field. Arkaprava’s research focuses on the software and the hardware of Graphics Processing Units (GPUs), which are the workhorse for most AI tasks in today’s data-driven world. His work takes a holistic view of system design by cutting across the software and hardware stack from the compiler, runtimes, and operating system to the hardware architecture. Besides top academic publications, he also has deep collaborations with and long-term consultancy for tech giants in the field.

Basu is an associate professor at the Indian Institute of Science (IISc), Bangalore. Before joining IISc in 2018, he worked as a full-time member of the research staff at AMD Research in Austin, USA, for four years. Earlier, Arkaprava obtained his PhD in computer science from the University of Wisconsin-Madison, USA in December 2013.

Dissertation: https://research.cs.wisc.edu/multifacet/theses/arka_basu_phd.pdf

The ECR award recipient is chosen by a jury comprising eminent personalities in the computing community. The jury follows a rigorous process wherein each nomination is individually reviewed by all the committee members, then collated and deliberated upon in multiple evaluation rounds, and finally decided in the presence of a Steering Committee-appointed observer.

[8] SET-PAiREd: Designing for Parental Involvement in Learning with an AI-Assisted Educational Robot

Hui-Ru Ho, Nitigya Kargeti, Ziqi Liu, Bilge Mutlu

https://arxiv.org/pdf/2305.02525

AI-assisted learning companion robots are increasingly used in early education. Many parents express concerns about content appropriateness, while they also value how AI and robots could supplement their limited skill, time, and energy to support their children’s learning. We designed a card-based kit, SET, to systematically capture scenarios that have different extents of parental involvement. We developed a prototype interface, PAiREd, with a learning companion robot to deliver LLM-generated educational content that can be reviewed and revised by parents. Parents can flexibly adjust their involvement in the activity by determining what they want the robot to help with. We conducted an in-home field study involving 20 families with children aged 3–5. Our work contributes to an empirical understanding of the level of support parents with different expectations may need from AI and robots and a prototype that demonstrates an innovative interaction paradigm for flexibly including parents in supporting their children.

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning

Christine Lee, David Porfirio, Jessica Wang, Kevin Chenkai Zhao, Bilge Mutlu

Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist users with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three additional core features—a rule translator, flexibility sliders, and a model checker—that engage users in the verification process. Through a user study (n = 12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.

Bridging Generations using AI-Supported Co-Creative Activities

Callie Kim, Arissa Sato, Nathan White, Hui-Ru Ho, Christine Lee, Yuna Hwang, Bilge Mutlu

Intergenerational collaboration between grandparents and grandchildren can be challenging due to differences in technological familiarity. AI has emerged as a promising tool to support collaborative activities, offering flexibility and creative assistance, but its role in facilitating intergenerational collaboration remains underexplored. In this study, we conducted a user study with 29 grandparent-grandchild groups engaged in AI-supported story creation to examine how AI-assisted co-creation can facilitate collaboration. Our findings show that grandchildren managed the technical aspects, while grandparents contributed creative ideas and guided the storytelling. AI played a key role in structuring the activity, facilitating brainstorming, enhancing storytelling, and balancing the contributions of both generations. The process fostered mutual appreciation, with each generation recognizing the strengths of the other, leading to an engaging and cohesive co-creation process. We offer design implications for integrating AI into intergenerational co-creative activities, emphasizing how AI can enhance collaboration across skill levels and technological familiarity.

[9] “GOLDYLOC: Global Optimizations & Lightweight Dynamic Logic for Concurrency”

Authors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair

Venue: ACM Transactions on Architecture & Code Optimization (ACM TACO)

Abstract: Modern accelerators like GPUs are increasingly executing independent operations concurrently to improve the device’s compute utilization. However, effectively harnessing this concurrency on GPUs for important primitives such as general matrix multiplications (GEMMs) remains challenging. Although modern GPUs have significant hardware and software support for GEMMs, their kernel implementations and optimizations typically assume each kernel executes in isolation and can utilize all GPU resources. This approach is highly efficient when kernels execute in isolation, but causes significant resource contention and slowdowns when kernels execute concurrently. Moreover, current approaches often only statically expose and control parallelism within an application, without considering runtime information such as varying input sizes and concurrent applications, often exacerbating contention. These issues limit the performance benefits of concurrently executing independent operations. Accordingly, we propose GOLDYLOC, which considers the global resources across all concurrent operations to identify performant GEMM kernels, which we call globally optimized (GO)-Kernels. Moreover, GOLDYLOC introduces a lightweight dynamic logic that considers the dynamic execution environment for available parallelism and input sizes to execute performant combinations of concurrent GEMMs on the GPU. Overall, GOLDYLOC improves the performance of concurrent GEMMs on a real GPU by up to 2× (18% geomean per workload) and provides up to 2.5× (43% geomean per workload) speedups over sequential execution.
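For readers unfamiliar with the baseline setting, the short PyTorch sketch below issues two independent GEMMs on separate CUDA streams so they may run concurrently on one GPU; this is the scenario in which GOLDYLOC's globally optimized kernels and dynamic logic apply. The sketch only illustrates concurrent GEMM execution and is not part of GOLDYLOC itself.

    # Two independent GEMMs issued on separate CUDA streams; when executed
    # concurrently they contend for SMs, caches, and memory bandwidth, which
    # is the contention GOLDYLOC's globally optimized kernels target.
    import torch

    assert torch.cuda.is_available()
    a1 = torch.randn(4096, 4096, device="cuda")
    b1 = torch.randn(4096, 4096, device="cuda")
    a2 = torch.randn(4096, 4096, device="cuda")
    b2 = torch.randn(4096, 4096, device="cuda")

    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    with torch.cuda.stream(s1):
        c1 = a1 @ b1   # GEMM 1
    with torch.cuda.stream(s2):
        c2 = a2 @ b2   # GEMM 2, may overlap with GEMM 1
    torch.cuda.synchronize()   # wait for both streams to finish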

[10] Prof. Matt Sinclair participated in a panel entitled “AI’s Energy Future: Balancing Growth, the Grid, and the Environment” and gave a talk entitled “Designing Hardware to Meet the Demands of Machine Learning & Artificial Intelligence” at a Wisconsin Energy Institute (WEI) event on AI and energy.

Abstract: As use of artificial intelligence grows, so does the demand for electricity and water to power and cool hyper-scale data centers. Many utilities have proposed building additional fossil fuel power plants or keeping older ones running to meet this new demand. Meanwhile, most major computing companies have made commitments to power their operations with clean energy. In this Forum, we’ll explore the potential demands that the growth of AI and data centers may place on the grid and what can be done to mitigate its environmental impact (e.g. improving energy efficiency of computing, or locating and operating data centers in ways that maximize use of renewable energy). How much will electricity demand from data centers truly grow? How might that growth affect both the environment and other electricity customers? And what can we do to ensure that we address that growth in the most sustainable way?

Link: https://energy.wisc.edu/events/ais-energy-future-balancing-growth-grid-and-environment-forward-energy-forum

[11] PhD Theses recently completed include:

Rishabh Khandelwal

Assessing and Enhancing Users’ Online Data Privacy

Advisor: Fawaz, Kassem

Committee: Ramanathan, Parameswaran (Parmesh); Fawaz, Kassem; Chatterjee, Rahul; Zhao, Yuhang

https://search.library.wisc.edu/digital/AGUNKPQPZ23VHP8Y

Anthony Lawrence George Rebello

Correctness and Performance with Declarative System Calls

Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi

Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Venkataraman, Shivaram; Sinclair, Matthew

https://search.library.wisc.edu/digital/AWAYYAKZRTR7KC8U

Roger Waleffe

Algorithms and Systems for Scalable Machine Learning over Graphs

Advisor: Wright, Steve

Committee: Wright, Steve; Rekatsinas, Theodoros; Venkataraman, Shivaram; Papailiopoulos, Dimitris

https://asset.library.wisc.edu/1711.dl/ZVOS5EFYR2LHA8L/R/file-000e8.pdf

Edward Barton

Practical Single Shot High Dynamic Range Imaging

Advisor: Hu, Yu Hen

Committee: Hu, Yu Hen; Lee, Yong Jae; Li, Yin; Chen, Yudong

https://www.linkedin.com/in/edwardabarton/

Zhenmei Shi

Understanding Training and Adaptation in Feature Learning: From Two-layer Networks to Foundation Models

Advisor: Liang, Yingyu

Committee: Liang, Yingyu; Zhu, Xiaojin; Sala, Frederic; Li, Yin

https://search.library.wisc.edu/digital/ASBHFA7AP6WTKM8K

Jing Liu

Scale, Performance, and Fault-Tolerance in a Filesystem Semi-Microkernel

Advisor: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi

Committee: Arpaci-Dusseau, Andrea; Arpaci-Dusseau, Remzi; Swift, Michael; Yu, Xiangyao; Fawaz, Kassem; Zhang, Irene (MSR)

https://asset.library.wisc.edu/1711.dl/4GNA6MNYXHEDM8Y/R/file-2b9e2.pdf

[12] Master’s Theses (or Projects)

Tommy Chang

Shivaram Venkataraman

EVA: Cost-Efficient Cloud-Based Cluster Scheduling

Neeraj Surawar

Matt Sinclair

(Project) NUMA-Aware Queue Scheduler for Multi-Chiplet GPUs

Jean-Charles Noirot Ferrand

Patrick McDaniel

Extracting the Harmfulness Classifier of Aligned LLMs

Kunyang Li

Patrick McDaniel

On Adversarial and Common Robustness of Parameter-Efficient Fine-Tuning Strategies

Jingxin Du

Prof. Kevin Ponto

Towards the Recreation of More Realistic 3D Virtual Museum Tours

Muhammad Danial Saleem

Thesis (6 total credits)

Xiangyao Yu

Evaluating and Improving Bioinformatics Tools Using DBMS Techniques

Sriram Ashokkumar

Thesis (6 total credits)

Dan Negrut

Project Topic: Multimodal Models and LLMs with robotics using Simulation for Training

Shashwatha Mitra Ghante Banuprakash

Thesis (6 total credits)

Rishab Goyal

Lattice-based Cryptography

Binwei Yao

Thesis (6 total credits)

Junjie Hu

Multi-Community Preference Alignment of LLMs

Aditya Das Sarma

Thesis (6 total credits)

Rahul Chatterjee

Multi-bit flip-flop clustering for power optimization

Jaden Park

Thesis (6 total credits)

Yong Jae Lee

Data Contamination in Vision Language Models

Rose Ceccio

Thesis (6 total credits)

Rahul Chatterjee

The technology used in intimate partner surveillance

Shengyuan Yang

Thesis (6 total credits)

Thomas Reps

Precise Pointer Analysis for Java Streams

Tara Bobinac

Thesis (6 total credits)

Junjie Hu

Leveraging LLMs to Gauge Perspectives on Social Issues in Latinx and American Communities

Prakriti Garg

Thesis (6 total credits)

Sushmita Roy

Graph representation learning framework for principled analysis of orthologous regulatory regions

David Parra

Project (3 total credits)

Mohit Gupta

Practical 3D Time-of-flight imaging with single-photon cameras.

Keith Johnson

Thesis (6 total credits)

Aws Albarghouthi

Tools and Techniques for Semantics-Guided Synthesis

Alexander Peseckis

Project (3 total credits)

Michael Gleicher

Occlusion-Conscious Viewpoint Optimization

Binwei Yao

Thesis (6 total credits)

Junjie Hu

Pluralistic Alignment of Large Language Models

Suenggwan Jo

Thesis (6 total credits)

Fred Sala

Foundation Model usage for forecasting

[13] ICRA 2025 paper

“Shape-programming Robotic Reflectors for Wireless Networks”

Yawen Liu, Akarsh Prabhakara, Jiangyifei Zhu, Shenyi Qiao, Swarun Kumar

(Link not yet available.)

[14] We had one paper accepted at ICLR and one at NAACL. The latter was led by Yijing Zhang, who was an undergraduate in our program.

Changho Shin, John Cooper, Frederic Sala, “Weak-to-Strong Generalization Through the Data-Centric Lens,” ICLR 2025

https://arxiv.org/abs/2412.03881

Weak-to-Strong Generalization Through the Data-Centric Lens: The weak-to-strong generalization phenomenon is the driver for important machine learning applications, including highly data-efficient learning and, most recently, superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (learnable only by a stronger model), since at such points weak predictions can be used by stronger models to learn the challenging patterns. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density, as well as a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm in a wide array of settings.
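To make the overlap-density idea concrete, here is a toy NumPy sketch (not the paper's detection algorithm): assuming per-point indicators of whether an easy pattern and a challenging pattern are present, it estimates the overlap density of several hypothetical data sources and queries the densest one.

    # Toy sketch of overlap density (hypothetical sources and indicators).
    # A point "overlaps" if it carries both an easy pattern (learnable by the
    # weak model) and a challenging pattern (learnable only by the strong model).
    import numpy as np

    rng = np.random.default_rng(0)
    density = {}
    for name, p_easy, p_hard in [("web", 0.8, 0.2), ("curated", 0.5, 0.6), ("synthetic", 0.3, 0.3)]:
        easy = rng.random(10_000) < p_easy    # indicator: easy pattern present
        hard = rng.random(10_000) < p_hard    # indicator: challenging pattern present
        density[name] = float(np.mean(easy & hard))   # estimated overlap density

    best = max(density, key=density.get)
    print(density, "-> query source:", best)

In the paper these indicators are not given; the proposed overlap detection algorithm finds such points in real datasets, and the data-selection procedure comes with a regret bound.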

Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala, “Personalize Your LLM: Fake It Then Align It,” NAACL 2025 Findings

https://yijingz02.github.io/files/CHAMELEON_NAACL_.pdf

Personalize Your LLM: Fake It Then Align It: Personalizing large language models (LLMs) is essential for delivering tailored interactions that improve user experience. Many existing personalization methods require fine-tuning LLMs for each user, rendering them prohibitively expensive for widespread adoption. Although retrieval-based approaches offer a more compute-efficient alternative, they still depend on large, high-quality datasets that are not consistently available for all users. To address this challenge, we propose CHAMELEON, a scalable and efficient personalization approach that uses (1) self-generated personal preference data and (2) representation editing to enable quick and cost-effective personalization. Our experiments on various tasks, including those from the LaMP personalization benchmark, show that CHAMELEON efficiently adapts models to personal preferences, improving instruction-tuned models and outperforming two personalization baselines by an average of 40% across two model architectures.
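Representation editing of the kind CHAMELEON builds on can be illustrated with a generic activation-steering sketch: a forward hook adds a preference direction to one transformer layer's hidden states at inference time. The model, layer index, and random steering vector below are placeholders, and this is not CHAMELEON's actual procedure (which derives the edit from self-generated preference data).

    # Generic activation-steering sketch; placeholders throughout, not CHAMELEON.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"                                  # placeholder model
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    layer = model.transformer.h[6]                       # placeholder layer choice
    steer = torch.randn(model.config.n_embd) * 0.05      # stands in for a learned preference direction

    def add_preference(module, inputs, output):
        # GPT-2 blocks return a tuple whose first element is the hidden states.
        hidden = output[0] + steer.to(output[0].dtype)
        return (hidden,) + output[1:]

    handle = layer.register_forward_hook(add_preference)
    ids = tok("My ideal weekend is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(out[0]))
    handle.remove()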