
The following graphs compare the checkpoint totals over 10-minute intervals of the day. These graphs show daily checkpointing patterns.
The following graphs show cumulative distributions of checkpoint sizes in our pool.
The following graphs show the daily matchmaking totals. The preemption totals show the total number of matches that required preemption. The matchmaker estimates the bandwidth requirements of job placement and preemption, and rejects a match when its bandwidth requirements would exceed a configured limit.
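The rejection rule above can be illustrated with a minimal sketch. It is not the matchmaker's actual code: the `Match` fields, the `admit_matches` function, and the idea of a cumulative budget per matchmaking pass are all assumptions made for illustration, since the text only says that a configured limit exists.

```python
from dataclasses import dataclass


@dataclass
class Match:
    job_id: str
    placement_mb: float          # estimated data moved to place the job (hypothetical field)
    preemption_mb: float = 0.0   # estimated data moved to preempt a running job; 0 if none


def admit_matches(candidates, limit_mb):
    """Accept matches in order; reject any whose estimated transfer
    would push the running total past limit_mb (assumed cumulative budget)."""
    used_mb = 0.0
    accepted, rejected = [], []
    for m in candidates:
        cost = m.placement_mb + m.preemption_mb
        if used_mb + cost > limit_mb:
            rejected.append(m)
        else:
            used_mb += cost
            accepted.append(m)
    return accepted, rejected


candidates = [Match("a", 50), Match("b", 40, 30), Match("c", 20)]
accepted, rejected = admit_matches(candidates, limit_mb=100)
print([m.job_id for m in accepted], [m.job_id for m in rejected])
```

In this sketch a match that requires preemption is charged for both transfers, so it is more likely to be rejected than a plain placement of the same size.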
The following graphs compare the matchmaking totals over 10-minute intervals of the day. These graphs show daily matchmaking patterns.
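As a rough illustration of how events can be grouped into 10-minute intervals of the day, the sketch below counts timestamps by time-of-day slot regardless of date. The function name `bin_by_time_of_day` and the sample timestamps are hypothetical; the actual log format behind the graphs is not described here.

```python
from collections import Counter
from datetime import datetime


def bin_by_time_of_day(timestamps, minutes=10):
    """Count events per time-of-day slot; slot 0 covers 00:00 to 00:09."""
    counts = Counter()
    for ts in timestamps:
        slot = (ts.hour * 60 + ts.minute) // minutes
        counts[slot] += 1
    return counts


events = [
    datetime(1999, 3, 1, 0, 5),
    datetime(1999, 3, 2, 0, 9),    # different day, same 10-minute slot
    datetime(1999, 3, 1, 13, 25),
]
print(bin_by_time_of_day(events))
```

Folding all days onto one 24-hour axis in this way is what makes a recurring daily pattern visible.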
The following graph shows the cumulative distribution of machine idle times.
The following graph shows, for each machine, the cumulative distribution of idle periods divided into short (less than 1 hour), medium (between 1 and 3 hours), and long (longer than 3 hours). The three percentages do not add up to 100% at each X value because the three datasets are sorted independently.
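A small sketch shows why independent sorting breaks the 100% sum. The per-machine percentages below are made up for illustration, and `sorted_series` is a hypothetical helper, not part of any tool used to produce the graph.

```python
def sorted_series(per_machine_shares):
    """Sort each category's per-machine percentages on its own,
    the way each curve in the graph is sorted independently."""
    keys = ("short", "medium", "long")
    return {k: sorted(m[k] for m in per_machine_shares) for k in keys}


# Made-up shares; each machine's three values sum to 100.
machines = [
    {"short": 70, "medium": 20, "long": 10},
    {"short": 20, "medium": 30, "long": 50},
    {"short": 50, "medium": 40, "long": 10},
]

series = sorted_series(machines)
column_sums = [sum(series[k][i] for k in series) for i in range(len(machines))]
print(series)
print(column_sums)
```

After sorting, the values at a given X position come from different machines, so their sum is no longer constrained to 100.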
The following graph shows the cumulative distribution of total machine idle time divided into short, medium, and long idle periods. Since the percentages in this graph are based on the total time instead of the number of periods, the proportion of longer periods grows. Again, the three datasets are sorted independently.
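The shift toward longer periods when weighting by time can be seen in a minimal sketch for a single machine. The period lengths are invented, `idle_shares` is a hypothetical helper, and the treatment of exactly 1 hour and exactly 3 hours as medium is an assumption, since the text does not say which side the boundaries fall on.

```python
def idle_shares(period_hours):
    """Return (share by number of periods, share by idle time), in percent.
    Assumes a non-empty list of positive period lengths in hours."""
    buckets = {"short": [], "medium": [], "long": []}
    for h in period_hours:
        if h < 1:
            buckets["short"].append(h)
        elif h <= 3:          # boundary handling is an assumption
            buckets["medium"].append(h)
        else:
            buckets["long"].append(h)
    n = len(period_hours)
    total = sum(period_hours)
    by_count = {k: 100 * len(v) / n for k, v in buckets.items()}
    by_time = {k: 100 * sum(v) / total for k, v in buckets.items()}
    return by_count, by_time


by_count, by_time = idle_shares([0.5, 0.5, 2.0, 5.0])
print(by_count)
print(by_time)
```

With these sample values, long periods are a quarter of the periods but well over half of the idle time, which is the effect the caption describes.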
The following graphs show goodput statistics computed by tracking the behavior of a representative set of jobs in the pool. The current representative set contains 10 Intel and Sparc Solaris jobs, with checkpoint sizes of 4, 24, 48, 96, and 180 MB. The cost of remote file access is not included in the overhead statistic (i.e., the representative jobs perform no file I/O), since this cost varies from job to job. The data points are plotted on the first day of each seven-day period.
The following graphs show cumulative distributions of CPU allocation length and of throughput lost when migration fails, for the representative jobs, between 7/16/98 and 3/26/99.
(X, Y) = Y% of CPU allocations were X minutes or shorter in length.
(X, Y) = Y% of total allocated throughput was obtained from CPU allocations X minutes or shorter in length.
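The two readings of (X, Y) above can be computed from a list of allocation lengths, as in the following sketch. The sample lengths are invented and `cdf_points` is a hypothetical helper; it only illustrates the definitions, not how the graphs were actually generated.

```python
def cdf_points(lengths_min):
    """For each distinct length X, return (X, pct_of_allocations, pct_of_time),
    counting allocations of X minutes or shorter. Assumes a non-empty list."""
    xs = sorted(lengths_min)
    n = len(xs)
    total = sum(xs)
    points = {}
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        # Later duplicates overwrite earlier ones, so each X keeps the
        # value that includes every allocation of that length.
        points[x] = (100 * i / n, 100 * running / total)
    return [(x, by_count, by_time) for x, (by_count, by_time) in sorted(points.items())]


for point in cdf_points([10, 30, 60, 100]):
    print(point)
```

The same computation applies to the failure distributions, with lost CPU minutes per failure in place of allocation lengths.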

(X, Y) = Y% of failures lost X minutes or less of CPU time.
(X, Y) = Y% of total lost throughput resulted from individual losses of X minutes or less.