Fall 2009
The operating systems seminar is held every other Monday afternoon
from 4:00 to 5:00 PM in Computer Sciences & Statistics room 2310.
Keeping up-to-date with current research is a critical task for both students and faculty. A weekly seminar is a fun and social way to keep in touch with others' work. At the seminar, you can eat a few cookies, chitchat about the finer points of mutual exclusion, and exchange ideas with students and faculty working in your field. The seminar schedule is a mix of original research being carried out at the University of Wisconsin, visitors to our department who will update us on their work, and short presentations and discussions of current operating systems papers led by UW students.
To subscribe to our mailing list, please visit the mailman page at https://lists.cs.wisc.edu/mailman/listinfo/os-seminar. The list carries about one or two messages per week announcing the next seminar, plus the occasional newsworthy item about operating systems.
Questions about the seminar and arrangements may be directed to Guoliang Jin (aliang @ [thisServer]).
Schedule
Monday
September 14th
4:00 PM 1221 CS
|
Scalable Middleware for Large (Really Large) Scale Systems
I will discuss the problem of developing tools for large-scale parallel environments. We are especially interested in systems, both leadership-class parallel computers and clusters, that have tens of thousands or even millions of processors. The infrastructure that we have developed to address this problem is called MRNet, the Multicast/Reduction Network. MRNet's approach to scale is to structure control and data flow in a tree-based overlay network (TBON) that allows for efficient request distribution and flexible data reductions.
The second part of this talk will present an overview of the MRNet design, architecture, and computational model, and then discuss several applications of MRNet. The applications include scalable automated performance analysis in Paradyn, a vision clustering application, and, most recently, an effort to develop our first petascale tool, STAT, a scalable stack trace analyzer currently running on hundreds of thousands of processors.
|
|
Wednesday
September 16th
4:00 PM 1221 CS
|
Addressing Usability Challenges in Current and Future Supercomputers
Luiz DeRose
Programming Environments Director, Cray, Inc.
The scale of current and future high-end systems, as well as the increasing system software and architecture complexity, brings a new set of challenges for application developers. In order to achieve high performance on peta-scale systems, application developers need a programming environment that can address and hide the issues of scale and complexity of high-end HPC systems. Users must be supported by intelligent compilers, automatic performance analysis tools, adaptive libraries, and scalable software. In this talk I will present the Cray programming environment features that are being developed and deployed to improve users' productivity on Cray supercomputers. I will also discuss some of the challenges and open research problems that need to be addressed to build the system software for peta-scale systems.
Bio: Dr. Luiz DeRose is a Senior Principal Engineer and the Programming Environments Director at Cray Inc., where he is responsible for the programming environment strategy for all Cray systems. Before joining Cray in 2004, he was a research staff member and the Tools Group Leader at the Advanced Computing Technology Center at IBM Research. Dr. DeRose holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. With more than 20 years of high performance computing experience and a deep knowledge of its programming environments, he has published more than 40 peer-reviewed articles in scientific journals, conferences, and book chapters, primarily on the topics of compilers and tools for high performance computing.
|
|
Monday
October 5th
4:00 PM 2310 CS
|
Heterogeneous Multiprocessing with Satellite Kernels
In this talk I will describe Helios, a new operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. Helios introduces satellite kernels, which export a single, uniform set of OS abstractions across CPUs of disparate architectures and performance characteristics. Access to I/O services such as file systems is made transparent via remote message passing, which extends a standard microkernel message-passing abstraction to a satellite kernel infrastructure. Helios retargets applications to available ISAs by compiling from an intermediate language. To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. Affinity provides a hint to the operating system about whether a process would benefit from executing on the same platform as a service it depends upon.
We developed satellite kernels for an XScale programmable I/O card and for cache-coherent NUMA architectures. We offloaded several applications and operating system components, often by changing only a single line of metadata. We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. On a mail-server benchmark, we show a 39% improvement in performance by automatically splitting the application among multiple NUMA domains.
Bio: Ed Nightingale is a researcher at Microsoft Research in Redmond, WA. His interests include just about anything involving operating systems, distributed systems, or mobile computing.
|
|
Tuesday
October 6th
4:00 PM 2310 CS
|
Tolerating Hardware Device Failures in Software
Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While major operating system and device vendors recommend that drivers detect and recover from hardware failures, we find that there are many drivers that will crash or hang when a device fails. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load.
This paper describes Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement. To facilitate proactive management of device failures, Carburizer can also locate existing driver code that detects device failures and insert missing failure-reporting code. Finally, the Carburizer runtime can detect and tolerate interrupt-related bugs, such as stuck or missing interrupts.
This is a practice talk for SOSP'09.
|
|
Monday
December 7th
4:00 PM 2310 CS
|
Group File Operations for Scalable Tools and Middleware
This is a practice talk for HiPC 2009. It will be a short presentation (25 minutes including questions).
|
Instructions to Speakers
Two weeks before your talk, mail a title and abstract to the seminar coordinators.
Plan to speak for forty-five minutes and answer questions for fifteen. (Shorter practice talks are also welcome.)
You may use whatever medium you prefer. We will provide a Linux/Windows machine, a digital projector, and an analog projector.
After your talk, mail a copy of your slides (.ps or .ppt) to the coordinators to be archived.
Speakers should bring cookies or a snack to share!
Suggestions for Giving a Good (or a Bad) Talk
by Mark D. Hill
by David A. Patterson
by David Messerschmitt
by David Stock
by Bruce Donald
by Peyton et al.
by Ian Parberry
|