
How to run Condor as a cycle scavenger under PBS
Overview
The Condor system is designed to scavenge compute cycles on desktop
workstations when interactive users are idle. This same concept can
be applied to scavenging cycles from another batch system running
on the same computer. The main idea is to configure Condor to notice
when the other batch system is idle on the machine, rather than when
an interactive user is idle. While the other system is idle, Condor is
free to run jobs. As soon as the other batch system has work to do,
Condor must preempt or checkpoint its current work.
This page discusses how to configure Condor to do this with PBS, though
the concept works for other batch systems as well.
Condor and PBS
First, configure the Condor startd to run jobs only while the
attribute PBSRunning is False. The attribute is changed dynamically
with the condor_config_val -rset command.
On the worker nodes, add the following to the Condor configuration:
ENABLE_RUNTIME_CONFIG = TRUE
STARTD_SETTABLE_ATTRS_OWNER = PBSRunning
PBSRunning = False
# Only start jobs if PBS is not currently running a job
START_NOPBS = ( $(PBSRunning) == False )
START = $(START) && $(START_NOPBS)
With these settings, Condor starts a job only if the original START
expression is true and no PBS job is running on the node.
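The effect of the combined START expression can be sketched as a small truth table. The shell function below is only a model of the boolean logic, not the ClassAd evaluator Condor actually uses, and the name effective_start is made up for this illustration.

```shell
# Model of: START = $(START) && ( $(PBSRunning) == False )
# Both arguments are the literal strings True or False.
# effective_start is a hypothetical name used only for this sketch.
effective_start() {
    base_start=$1
    pbs_running=$2
    if [ "$base_start" = True ] && [ "$pbs_running" = False ]; then
        echo True
    else
        echo False
    fi
}

# Print the result for all four combinations.
for base in True False; do
    for pbs in True False; do
        echo "START=$base PBSRunning=$pbs -> $(effective_start "$base" "$pbs")"
    done
done
```

Only the combination of a true original START and PBSRunning set to False lets Condor start a job.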
On the PBS side, again on the worker nodes, have PBS tell Condor when
a PBS job is about to run by adding the following to the PBS prologue:
if [ -x /opt/condor/bin/condor_config_val ]; then
    # Tell the startd that PBS is using this node, then apply the change.
    /opt/condor/bin/condor_config_val -rset -startd PBSRunning=True > /dev/null
    /opt/condor/sbin/condor_reconfig -startd > /dev/null
    sleep 2
    # If a slot on this host is still claimed by Condor, vacate it.
    if /opt/condor/bin/condor_status -format '%s ' Name -format '%s \n' State "$(hostname)" 2> /dev/null | grep -q Claimed
    then
        /opt/condor/sbin/condor_vacate > /dev/null
        sleep 2
    fi
fi
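The claimed-slot test in the prologue can be tried out without a running Condor pool. As a rough sketch, the hypothetical helper any_slot_claimed below reads "Name State" lines in the same shape the condor_status -format call prints and succeeds only if some slot is in the Claimed state. The slot and host names in the sample input are invented.

```shell
# Read "Name State" lines on stdin; exit 0 if any slot is Claimed, 1 otherwise.
# any_slot_claimed is a hypothetical helper, not part of Condor.
any_slot_claimed() {
    awk '$2 == "Claimed" { found = 1 } END { exit found ? 0 : 1 }'
}

# Sample input with invented slot names, standing in for condor_status output.
if printf 'slot1@node01 Claimed \nslot2@node01 Unclaimed \n' | any_slot_claimed
then
    echo "a Condor job holds this node; it would be vacated"
fi
```

Matching the State field exactly, rather than searching the whole line, avoids a false match if a machine name happens to contain the word Claimed.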
In the PBS epilogue, tell Condor that it is OK to use this machine again:
if [ -x /opt/condor/bin/condor_config_val ]; then
    # PBS is done with this node; let the startd accept Condor jobs again.
    /opt/condor/bin/condor_config_val -rset -startd PBSRunning=False > /dev/null
    /opt/condor/sbin/condor_reconfig -startd > /dev/null
fi
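The prologue and epilogue share the same two commands and differ only in the value written to PBSRunning. One possible way to avoid the duplication is a small helper that both scripts source. This is a sketch under assumptions: the function names, the CONDOR_BIN and CONDOR_SBIN variables, and the DRY_RUN switch are all invented here, and the /opt/condor paths are simply the ones used above.

```shell
#!/bin/sh
# Hypothetical shared helper for the PBS prologue and epilogue.
# CONDOR_BIN, CONDOR_SBIN and DRY_RUN are names made up for this sketch.
CONDOR_BIN=${CONDOR_BIN:-/opt/condor/bin}
CONDOR_SBIN=${CONDOR_SBIN:-/opt/condor/sbin}

# run CMD ARGS...: execute the command quietly, or only print it if DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@" > /dev/null
    fi
}

# set_pbs_running True|False: update PBSRunning in the startd and reconfigure.
set_pbs_running() {
    case $1 in
        True|False) ;;
        *) echo "usage: set_pbs_running True|False" >&2; return 2 ;;
    esac
    # Do nothing on nodes where Condor is not installed (unless dry-running).
    [ "${DRY_RUN:-0}" = 1 ] || [ -x "$CONDOR_BIN/condor_config_val" ] || return 0
    run "$CONDOR_BIN/condor_config_val" -rset -startd "PBSRunning=$1" &&
    run "$CONDOR_SBIN/condor_reconfig" -startd
}
```

With this in place, the epilogue would reduce to a call such as set_pbs_running False, and running it with DRY_RUN=1 prints the commands instead of executing them, which is handy for checking the script on a node before enabling it.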
Acknowledgements
This code comes from Preston Smith, a smart guy who runs several mixed clusters
at Purdue.
condor-admin@cs.wisc.edu