4.3 The Condor Perl Module
The Condor Perl module facilitates automatic submission and monitoring of
Condor jobs, along with automated administration of Condor. The most common
use of the Perl module is monitoring Condor jobs, which it does by reading
the user log of each job.
The Condor Perl module is made up of several subroutines. Many of them take
other subroutines as arguments; these are used as callbacks that are called
when interesting events happen.
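As a rough illustration of what such a callback looks like, the sketch below defines one as an anonymous subroutine and calls it directly. It assumes that handlers receive their arguments as key/value pairs, as the example program later in this section does. The handler name $on_exit and the cluster and job values are made up, and the direct call only simulates what the module would do when an event occurs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler: unpack the key/value pairs passed to the callback
# and build a message from them.
my $on_exit = sub {
    my %parameters = @_;
    return "Job $parameters{'cluster'}.$parameters{'job'} exited";
};

# Simulate the call the module would make (the values are made up).
my $message = $on_exit->(cluster => 12, job => 0);
print "$message\n";
```

In a real script the subroutine reference would be passed to one of the Register subroutines described below instead of being called by hand.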
1. Submit(command_file)
The Submit subroutine takes a command file name as an argument and
submits it to Condor. The condor_submit program should be in the
user's path. To monitor the job with the Perl module, the user must
specify a log file in the command file. The cluster number of the
submitted job is returned. For more information, see the condor_submit
man page.
2. Vacate(machine)
Vacate the specified machine. The machine may be specified
either by hostname or by sinful string (an address of the form
<IP address:port>). For more information, see the condor_vacate
man page.
3. Reschedule(machine)
Reschedule the specified machine. The machine may be specified either
by hostname or by sinful string. For more information, see
the condor_reschedule man page.
4. RegisterEvicted(sub)
Register an eviction handler that will be called any time a job from
the specified cluster is evicted. The eviction handler will be
called with two arguments: cluster and job. These are the cluster
number and process number of the job that was evicted.
5. RegisterEvictedWithCheckpoint(sub)
Same as RegisterEvicted, except that the handler is called only when
the evicted job was checkpointed.
6. RegisterEvictedWithoutCheckpoint(sub)
Same as RegisterEvicted, except that the handler is called only when
the evicted job was not checkpointed.
7. RegisterExit(sub)
Register a termination handler that is called when a job exits.
The termination handler will be called with two arguments: cluster and
job. These are the cluster and process numbers of the exiting job.
8. RegisterExitSuccess(sub)
Register a termination handler that is called when a job exits without
errors. The termination handler will be called with two arguments:
cluster and job. These are the cluster and process numbers of the
exiting job.
9. RegisterExitFailure(sub)
Register a termination handler that is called when a job exits with
errors. The termination handler will be called with three arguments:
cluster, job, and retval. The cluster and job are the cluster
and process numbers of the exiting job, and retval is the exit
code of the job.
10. RegisterExitAbnormal(sub)
Register a termination handler that is called when a job exits
abnormally (segmentation fault, bus error, ...). The termination handler
will be called with four arguments: cluster, job, signal, and
core. The cluster and job are the cluster and process numbers of
the exiting job. The signal is the signal that the job died with, and
core indicates whether a core file was created and, if so, gives the
full path to the core file.
11. RegisterAbort(sub)
Register a handler that is called when a job is aborted by a user.
12. RegisterJobErr(sub)
Register a handler that is called when a job is not executable.
13. RegisterExecute(sub)
Register an execution handler that is called whenever a job starts
running on a given host. The handler is called with four arguments:
cluster, job, host, and sinful. Cluster and job are the cluster and
process numbers for the job, host is the Internet address of the
machine running the job, and sinful is the Internet address and
command port of the condor_starter supervising the job.
14. RegisterSubmit(sub)
Register a submit handler that is called whenever a job is submitted
with the given cluster. The handler is called with four arguments:
cluster, job, host, and sinful. Cluster and job are the cluster and
process numbers for the job, host is the Internet address of the
machine running the job, and sinful is the Internet address and
command port of the condor_schedd responsible for the job.
15. Monitor(cluster)
Begin monitoring this cluster. Monitor starts a subprocess to watch
the cluster's jobs, so other actions may proceed in the main loop of
the Perl script. Because handlers run in that subprocess, they cannot
communicate back to the main script simply by changing variables.
16. Wait()
Wait until all monitors finish and exit.
17. DebugOn()
Turn debug messages on. This may be useful if you don't understand
what your script is doing.
18. DebugOff()
Turn debug messages off.
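The caveat in the description of Monitor follows from how subprocesses behave in Perl. The sketch below does not use the Condor module at all. It forks a child that stands in for the monitoring subprocess and shows that a counter changed in the child stays unchanged in the parent. The pipe used to report the count back is only one possible workaround and is an assumption of this sketch, not something the module is documented to provide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $evicts = 0;

# Pipe used to carry the child's count back to the parent.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stands in for the monitoring subprocess running a handler.
    close $reader;
    $evicts++;                      # changes only the child's copy
    print {$writer} "$evicts\n";    # report the new value explicitly
    close $writer;
    exit 0;
}

# Parent: read the reported value, then reap the child.
close $writer;
my $reported = <$reader>;
chomp $reported;
close $reader;
waitpid($pid, 0);

print "parent's own counter: $evicts\n";
print "count reported by child: $reported\n";
```

The parent's copy of $evicts is still 0 after the child has incremented its own copy, which is why a handler that needs to pass information to the main script has to use a file, a pipe, or some other explicit channel.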
The following is a simple example of using the Condor Perl module.
#!/usr/bin/perl
use Condor;

$CMD_FILE = 'mycmdfile.cmd';
$evicts = 0;
$vacates = 0;

# Callback for a job that exits without errors: print a summary and quit.
$normal = sub
{
    my %parameters = @_;
    my $cluster = $parameters{'cluster'};
    my $job = $parameters{'job'};
    print "Job $cluster.$job exited normally without errors.\n";
    print "Job was vacated $vacates times and evicted $evicts times\n";
    exit(0);
};

# Callback for an eviction: count it and ask Condor to reschedule.
$evicted = sub
{
    my %parameters = @_;
    my $cluster = $parameters{'cluster'};
    my $job = $parameters{'job'};
    print "Job $cluster, $job was evicted.\n";
    $evicts++;
    &Condor::Reschedule();
};

# Callback for the start of execution: vacate the machine running the job.
$execute = sub
{
    my %parameters = @_;
    my $cluster = $parameters{'cluster'};
    my $job = $parameters{'job'};
    my $host = $parameters{'host'};
    my $sinful = $parameters{'sinful'};
    print "Job running on $sinful, vacating...\n";
    &Condor::Vacate($sinful);
    $vacates++;
};

# Submit the job, register the callbacks, and monitor until the job exits.
$cluster = Condor::Submit($CMD_FILE);
&Condor::RegisterExitSuccess($normal);
&Condor::RegisterEvicted($evicted);
&Condor::RegisterExecute($execute);
&Condor::Monitor($cluster);
&Condor::Wait();
This example program submits the command file 'mycmdfile.cmd' and attempts
to vacate any machine that the job runs on. Each eviction is counted and
followed by a reschedule request. When the job finally exits without errors,
the termination handler prints a summary of what has happened.
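The example above handles only a successful exit. A script may also want to react to failures, which could be sketched as follows. This sketch assumes that the extra handler arguments arrive under keys named after the arguments listed earlier (retval, signal, core), by analogy with the cluster, job, host, and sinful keys used above. It also assumes that core is empty when no core file was written. Neither assumption is confirmed by this section. The names $failed and $abnormal are made up, and the registration step is skipped when the Condor module cannot be loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed key names: they mirror the argument names listed earlier.
my $failed = sub {
    my %parameters = @_;
    return "Job $parameters{'cluster'}.$parameters{'job'} "
         . "exited with code $parameters{'retval'}";
};

my $abnormal = sub {
    my %parameters = @_;
    # Assumption: core is empty or undefined when no core file exists.
    my $core = $parameters{'core'} ? $parameters{'core'} : 'none';
    return "Job $parameters{'cluster'}.$parameters{'job'} "
         . "died on signal $parameters{'signal'} (core file: $core)";
};

# Register and monitor only when the Condor module is available.
if (eval { require Condor; 1 }) {
    my $cluster = Condor::Submit('mycmdfile.cmd');
    Condor::RegisterExitFailure(sub { print $failed->(@_), "\n" });
    Condor::RegisterExitAbnormal(sub { print $abnormal->(@_), "\n" });
    Condor::Monitor($cluster);
    Condor::Wait();
}
```

Keeping the message-building subroutines separate from the registration calls makes it possible to try them with made-up arguments before submitting a real job.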
condor-admin@cs.wisc.edu