
Douglas Thain

thain at cs dot wisc dot edu
Room 3381, 1210 W. Dayton St., Madison WI 53706
phone: 608-262-4050 fax: 608-262-9777

You may now find me at the University of Notre Dame.

Research Summary

My research aims to empower users to discover, harness, and share computing and storage in large-scale computer systems. Such systems, whether they are called clusters, metacomputers, or grids, provide an unfriendly home for applications. Mismatched interfaces, missing resources, and unpredictable performance are all to be expected. In short, failure is the normal case. How can we accomplish significant work in such a hostile environment?

I am creating a variety of tools and techniques to attack this problem within the context of the Condor Project. The foundation is the notion of interposition agents, which allow us to transparently insert new pieces of software under existing applications, providing new services in old abstractions. Two examples of these devices are Parrot, which presents new storage systems as ordinary file system entries, and the Grid Console, which provides an interactive-when-possible terminal for remote applications. Using these tools, one may attach an application to new systems that offer powerful new ways of discovering, moving, and accessing data while dealing with failures. One example is the I/O community, which binds execution and storage sites together using the ClassAd language. Another is the Kangaroo distributed file system, which allows for controlled reconciliation of newly written data over a long time period in the presence of failures. Throughout, the propagation and expression of failures is a persistent complication. To address this, I am developing a body of philosophy on error management in remote execution and distributed I/O, as well as a language, the Fault Tolerant Shell.

Research Software

  • Bypass
  • Condor
  • Fault Tolerant Shell
  • Kangaroo
  • Parrot

Publications

Dissertation

  • Douglas Thain, "Coordinating Access to Computation and Data in Distributed Systems", Ph.D. Thesis, University of Wisconsin-Madison, 2004.

Book Chapters

  • Douglas Thain and Miron Livny, "Building Reliable Clients and Servers", in Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 2003, 2nd edition. ISBN: 1-55860-933-4
  • Douglas Thain, Todd Tannenbaum, and Miron Livny, "Condor and the Grid", in Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors, Grid Computing: Making The Global Infrastructure a Reality, John Wiley, 2003. ISBN: 0-470-85319-0

Invited Journal Articles

  • Douglas Thain, Todd Tannenbaum, and Miron Livny, "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience, to appear in 2004.
  • Douglas Thain and Miron Livny, "Multiple Bypass: Interposition Agents for Distributed Computing", Journal of Cluster Computing, Volume 4, Pages 39-47, 2001.

Refereed Conference and Workshop Papers

  • John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny, "Explicit Control in a Batch Aware Distributed File System", in Proceedings of the First USENIX/ACM Conference on Networked Systems Design and Implementation, San Francisco, CA, March 2004.
  • Douglas Thain and Miron Livny, "Parrot: Transparent User-Level Middleware for Data-Intensive Computing", Workshop on Adaptive Grid Middleware, New Orleans, Louisiana, September 2003.
  • Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny, "Pipeline and Batch Sharing in Grid Workloads", in Proceedings of the Twelfth IEEE Symposium on High Performance Distributed Computing, Seattle, WA, June 2003.
  • Douglas Thain and Miron Livny, "The Ethernet Approach to Grid Computing", in Proceedings of the Twelfth IEEE Symposium on High Performance Distributed Computing, Seattle, WA, June 2003.
  • Oleg Lodygensky, Gilles Fedak, Vincent Néri, Franck Cappello, Miron Livny, and Douglas Thain, "XtremWeb & Condor: sharing resources between internet connected Condor Pool", in Proceedings of Global and Peer-to-Peer Computing on Large Scale Distributed Systems, Tokyo, Japan, May 2003.
  • Douglas Thain and Miron Livny, "Error Scope on a Computational Grid", Proceedings of the Eleventh IEEE Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 23-26, 2002.
  • Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny, "Gathering at the Well: Creating Communities for Grid I/O", Proceedings of Supercomputing 2001, Denver, Colorado, November 2001.
  • Douglas Thain, Jim Basney, Se-Chang Son, and Miron Livny, "The Kangaroo Approach to Data Movement on the Grid", Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing, San Francisco, California, August 7-9, 2001.
  • Douglas Thain and Miron Livny, "Bypass: A tool for building split execution systems", in Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing, Pittsburgh, Pennsylvania, pp 79-85, August 1-4, 2000.

Technical Reports

  • Douglas Thain and Miron Livny, "Parrot: Transparent User-Level Middleware for Data-Intensive Computing", Technical Report 1493(a), Computer Sciences Department, University of Wisconsin, December 2003.
  • Douglas Thain, "The Problem with Grand Unified Frameworks", Technical Report 1492, Computer Sciences Department, University of Wisconsin, November 2003.
  • Douglas Thain and Miron Livny, "The Case for Sparse Files", Technical Report 1464, Computer Sciences Department, University of Wisconsin, January 2003.
  • Douglas Thain, John Bent, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Miron Livny, "The Architectural Implications of Pipeline and Batch Sharing in Scientific Workloads", Technical Report 1463, Computer Sciences Department, University of Wisconsin, January 2003.
  • Douglas Thain and Miron Livny, "Error Management in the Pluggable File System", Technical Report 1448, Computer Sciences Department, University of Wisconsin, October 2002.

(Or, see partial lists recorded by DBLP and ResearchIndex.)

Slides from Selected Talks

Please contact me if you would like to use or modify these slides in your own work.

  • Parrot: Transparent User-Level Middleware for Data-Intensive Computing, AGM 2003, New Orleans, September 2003.
  • The Ethernet Approach to Grid Computing, HPDC 12, Seattle WA, June 2003.
  • Reliability and Troubleshooting with Condor, PPDG Troubleshooting Workshop, December 2002.
  • Error Scope: Theory and Practice, GriPhyN Reliability Workshop, July 2002.
  • Error Management in a Virtual Operating System, OS Seminar, UW, October 2001.
  • The Kangaroo Approach to Data Movement on the Grid, HPDC 10, San Francisco, August 2001.
  • Multiple Bypass: Interposition Agents for Distributed Systems, OS Seminar, UW, September 2000.
  • Bypass: A Tool for Building Distributed Systems, HPDC 9, Pittsburgh, August 2000.

Classes Taught

I teach a variety of classes from the freshman to the graduate level. Of course, my teaching activities are on hold while I write my dissertation.

  • Coordinator, UW Systems Seminar
  • NEEP 602, High Performance Computing (Fall 2001)
  • CS 838, Grid Computing (Spring 2001)
  • CS 537, Introduction to Operating Systems (Spring 2000, Fall 1998, Summer 1998)
  • CS 302, Introduction to Algebraic Language Programming (Fall 1997, Spring 1998)

Hobbies

I enjoy ballroom dancing, teaching inland sailing, and playing the euphonium (also known as the baritone horn). In the winter months, I play some (very) amateur hockey.