Condor - High Throughput Computing

UW-Madison CS Dept. Condor Pool Goodput Statistics

The following graphs show the daily totals for checkpoint data transfers to the checkpoint server (Received) and from the checkpoint server (Sent).

Checkpoint Transfer Graph (GB); Checkpoint Successful Transfer Graph; Checkpoint Failed Transfer Graph; Checkpoint Data Throughput Graph (Mb/s)

The following graphs compare the checkpoint totals over 10-minute intervals of the day. These graphs show daily checkpointing patterns.

TOD Checkpoint Transfer Graph; TOD Checkpoint Successful Transfer Graph; TOD Checkpoint Failed Transfer Graph; TOD Checkpoint Data Throughput Graph (Mb/s)
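
The time-of-day (TOD) totals can be sketched as a simple binning step. The example below is only an illustration, not the code used to produce these graphs: it assumes each transfer is logged as a (timestamp, byte count) pair, and the names `tod_bin` and `tod_totals` are hypothetical.

```python
from datetime import datetime

BIN_MINUTES = 10
BINS_PER_DAY = 24 * 60 // BIN_MINUTES  # 144 ten-minute intervals per day


def tod_bin(ts: datetime) -> int:
    """Return the index of the 10-minute time-of-day interval containing ts."""
    return (ts.hour * 60 + ts.minute) // BIN_MINUTES


def tod_totals(transfers):
    """Sum transferred bytes per time-of-day interval, across all days.

    transfers: iterable of (datetime, byte_count) pairs (assumed log format).
    """
    totals = [0] * BINS_PER_DAY
    for ts, nbytes in transfers:
        totals[tod_bin(ts)] += nbytes
    return totals


# Two transfers on different days fall into the same 12:00-12:10 interval.
sample = [
    (datetime(1999, 3, 25, 12, 5), 4_000_000),
    (datetime(1999, 3, 26, 12, 7), 24_000_000),
]
totals = tod_totals(sample)
```

Because the date is discarded, activity from every day accumulates in the same 144 intervals, which is what makes a recurring daily pattern visible.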

The following graphs show cumulative distributions of checkpoint sizes in our pool.

Checkpoint Size Graph; Checkpoint Size Graph

The following graphs show the daily matchmaking totals. The preemption totals show the number of matches that required preemption. The matchmaker estimates the bandwidth requirements of job placement and preemption, and rejects a match when its bandwidth requirements would exceed a configured limit.

Daily Total Matches Graph; Daily Total Rejected Matches Graph; Allocated Bandwidth Graph
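
The bandwidth-based rejection described above can be illustrated with a small admission check. This is a hedged sketch rather than the matchmaker's actual logic: the class `BandwidthBudget`, its fields, and the use of a single running total in MB are all assumptions, and the page does not say how the estimates are computed or over what window the limit applies.

```python
class BandwidthBudget:
    """Hypothetical bookkeeping for bandwidth-limited matchmaking."""

    def __init__(self, limit_mb: float):
        self.limit_mb = limit_mb    # configured limit (assumed unit: MB)
        self.allocated_mb = 0.0     # bandwidth already committed to matches
        self.accepted = 0
        self.rejected = 0

    def try_match(self, placement_mb: float, preemption_mb: float = 0.0) -> bool:
        """Accept the match only if its estimated transfers fit under the limit."""
        needed = placement_mb + preemption_mb
        if self.allocated_mb + needed > self.limit_mb:
            self.rejected += 1
            return False
        self.allocated_mb += needed
        self.accepted += 1
        return True


budget = BandwidthBudget(limit_mb=100)
results = [
    budget.try_match(48),        # placement only
    budget.try_match(24, 24),    # placement plus preemption of a running job
    budget.try_match(8),         # would exceed the limit, so it is rejected
    budget.try_match(4),         # exactly reaches the limit, so it is accepted
]
```

In this sketch a match that requires preemption costs more bandwidth than one that does not, since both the evicted job's checkpoint and the new job's placement must be transferred.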

The following graphs compare the matchmaking totals over 10-minute intervals of the day. These graphs show daily matchmaking patterns.

TOD Matches Graph; TOD Rejected Matches Graph; TOD Allocated Bandwidth Graph

The following graph shows the cumulative distribution of machine idle times.

Percent Idle Graph

The following graph shows the cumulative distribution of machine idle periods divided into short (less than 1 hour), medium (between 1 and 3 hours), and long (longer than 3 hours) for each machine. The three percentages do not add up to 100% for each X value because the three datasets are sorted independently.

Idle Periods Graph

The following graph shows the cumulative distribution of total machine idle time divided into short, medium, and long idle periods. Since the percentages in this graph are based on the total time instead of the number of periods, the proportion of longer periods grows. Again, the three datasets are sorted independently.

Idle Time Graph
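
The difference between the two idle-period graphs (share of periods versus share of total idle time) can be seen in a small worked example for a single machine. The sketch below is illustrative only: the function names and the sample idle periods are made up, and the boundary handling (exactly 1 or 3 hours counts as medium) is an assumption based on the definitions above.

```python
def classify(hours: float) -> str:
    """Classify an idle period as short (< 1 h), medium (1-3 h), or long (> 3 h)."""
    if hours < 1:
        return "short"
    if hours <= 3:
        return "medium"
    return "long"


def idle_shares(period_hours):
    """Return (percent of periods, percent of total idle time) per class."""
    counts = {"short": 0, "medium": 0, "long": 0}
    time = {"short": 0.0, "medium": 0.0, "long": 0.0}
    for h in period_hours:
        cls = classify(h)
        counts[cls] += 1
        time[cls] += h
    n = len(period_hours)
    total = sum(period_hours)
    by_count = {k: 100.0 * v / n for k, v in counts.items()}
    by_time = {k: 100.0 * v / total for k, v in time.items()}
    return by_count, by_time


# Hypothetical idle periods (in hours) for one machine.
by_count, by_time = idle_shares([0.5, 0.5, 2.0, 5.0])
# by_count: short 50.0, medium 25.0, long 25.0
# by_time:  short 12.5, medium 25.0, long 62.5
```

For this machine, long periods are a quarter of all idle periods but account for well over half of the idle time, which is the shift toward longer periods that the time-based graph shows.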

The following graphs show goodput statistics computed by tracking the behavior of a representative set of jobs in the pool. The current representative set contains 10 Intel and SPARC Solaris jobs, with checkpoint sizes of 4, 24, 48, 96, and 180 MB. The cost of remote file access is not included in the overhead statistic (i.e., the representative jobs perform no file I/O), since this cost varies from job to job. Each data point is plotted at the first day of its seven-day period.

Goodput Graph; Badput Graph

The following graphs show cumulative distributions for length of CPU allocation and throughput lost when migration fails for the representative jobs, between 7/16/98 and 3/26/99.

(X, Y) = Y% of CPU allocations were X minutes or shorter in length.

Allocation Distribution Graph

(X, Y) = Y% of total allocated throughput was obtained from CPU allocations X minutes or shorter in length.

Allocation Distribution Graph
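
The two allocation distributions above differ only in what is summed: the first counts allocations, the second weights each allocation by its length. The sketch below shows both readings of a point (X, Y); the function names and the sample allocation lengths are hypothetical, and allocation length is used as a stand-in for the throughput obtained from an allocation.

```python
def percent_of_allocations(lengths_min, x):
    """Y such that Y% of CPU allocations were x minutes or shorter."""
    shorter = sum(1 for m in lengths_min if m <= x)
    return 100.0 * shorter / len(lengths_min)


def percent_of_allocated_time(lengths_min, x):
    """Y such that Y% of total allocated time came from allocations of x minutes or shorter."""
    shorter = sum(m for m in lengths_min if m <= x)
    return 100.0 * shorter / sum(lengths_min)


# Hypothetical allocation lengths in minutes.
lengths = [10, 20, 30, 140]
y_count = percent_of_allocations(lengths, 30)      # 75.0
y_time = percent_of_allocated_time(lengths, 30)    # 30.0
```

Here three of the four allocations are 30 minutes or shorter, yet they contribute only 30% of the allocated time. The same pair of calculations applies to the failed-migration distributions, with lost CPU minutes in place of allocation lengths.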

(X, Y) = Y% of failures lost X minutes or less of CPU time.

Failed Migration Distribution Graph

(X, Y) = Y% of total lost throughput resulted from individual losses of X minutes or less.

Failed Migration Distribution Graph


condor-admin@cs.wisc.edu