condor_submit is the program for submitting jobs for execution under Condor. condor_submit requires a submit description file which contains commands to direct the queuing of jobs. One submit description file may contain specifications for the queuing of many Condor jobs at once. A single invocation of condor_submit may create one or more clusters. A cluster is a set of jobs specified in the submit description file between queue commands for which the executable is not changed. It is advantageous to submit multiple jobs as a single cluster because:
Multiple clusters may be specified within a single submit description file. Each cluster must specify a single executable.
The job ClassAd attribute ClusterId identifies a cluster. See section 2.5.2 for specifics on this attribute.
Note that submission of jobs from a Windows machine requires a stashed password to allow Condor to impersonate the user submitting the job. To stash a password, use the condor_store_cred command. See the condor_store_cred manual page for details.
SUBMIT DESCRIPTION FILE COMMANDS
Each submit description file describes one cluster of jobs to be placed in the Condor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, and different program arguments. The submit description file is the only command-line argument to condor_submit.
The submit description file must contain one executable command and at least one queue command. All of the other commands have default actions.
The commands which can appear in the submit description file are numerous. They are listed here in alphabetical order.
#! /bin/sh
# get the host name of the machine
host=`uname -n`
# grab a standard universe executable designed specifically
# for this host
scp elsewhere@cs.wisc.edu:${host} executable
# The PID MUST stay the same, so exec the new standard universe process.
exec executable ${1+"$@"}
If this command is not present (defined), then the value
defaults to false.
If your job attempts to access a file mentioned in this list, Condor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.
This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.
This option only applies to standard-universe jobs.
These options only apply to standard-universe jobs.
If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1 Mbyte and the block size to 256 KBytes for the file input.data, use this command:
buffer_files = "input.data=(1000000,256000)"
Alternatively, you may use these two options to set the default sizes for all files used by your job:
buffer_size = 1000000
buffer_block_size = 256000
If you do not set these, Condor will use the values given by these two configuration file macros:
DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000
Finally, if no other settings are present, Condor will use a buffer of 512 Kbytes and a block size of 32 Kbytes.
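The order in which these buffer settings apply can be sketched in Python. This is only an illustration of the lookup order described above, not Condor's implementation: the function name and its arguments are hypothetical, each layer is modelled as a (size, block size) pair even though the two values can presumably be set separately, and 1 Kbyte is assumed to be 1024 bytes for the built-in fallback.

```python
# Built-in fallback from the text: 512 Kbytes buffer, 32 Kbytes blocks.
# Assumption: 1 Kbyte = 1024 bytes here; the text does not say.
BUILTIN_DEFAULT = (512 * 1024, 32 * 1024)


def effective_buffer(file_name, buffer_files=None, submit_default=None,
                     config_default=None):
    """Return (buffer_size, block_size) in bytes for one file.

    Each argument models one layer of settings, most specific first:
    buffer_files   -- dict of per-file (size, block) pairs
    submit_default -- (buffer_size, buffer_block_size) from the submit file
    config_default -- (DEFAULT_IO_BUFFER_SIZE, DEFAULT_IO_BUFFER_BLOCK_SIZE)
    """
    if buffer_files and file_name in buffer_files:
        return buffer_files[file_name]
    if submit_default is not None:
        return submit_default
    if config_default is not None:
        return config_default
    return BUILTIN_DEFAULT


per_file = {"input.data": (1000000, 256000)}
print(effective_buffer("input.data", buffer_files=per_file))
print(effective_buffer("other.data", buffer_files=per_file))
```

A file named in buffer_files gets its own sizes; any other file falls through to the next layer that is defined.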
If your job attempts to access any of the files mentioned in this list, Condor will automatically compress them (if writing) or decompress them (if reading). The compression format is the same as that used by GNU gzip.
The files given in this list may be simple file names or complete paths and may include * as a wildcard. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:
compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files must only be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is simply not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.
This option only applies to standard-universe jobs.
The following example will set a job to run at exactly 12:00 pm on January 1st, 2006:
DeferralTime = 1136138400
This example will cause a job to always wait 60 seconds after it arrives at the execution machine before executing:
DeferralTime = (CurrentTime + 60)
To allow for jobs to run even if the deferral time is missed, please refer to deferral_window.
Please note that scheduler universe jobs are unable to use this feature because they are not executed by the condor_starter. A scheduler job will fail to be submitted if deferral_time is defined in its submission file.
In the example below, the deferral_time always evaluates to 60 seconds in the past from the current time, but the job is still allowed to execute because the deferral_window is 120 seconds:
DeferralWindow = 120
DeferralTime = (CurrentTime - 60)
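The deferral examples above can be checked with a short Python sketch. It assumes that "12:00 pm" in the first example means noon at UTC-6, since the text does not name a time zone; under that assumption the value 1136138400 is reproduced. The may_start function is a hypothetical model of the window rule as described here, not the test that the condor_starter actually performs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "12:00 pm" in the example means noon at UTC-6.
CENTRAL_STANDARD = timezone(timedelta(hours=-6))


def epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def may_start(now, deferral_time, deferral_window=0):
    """Hypothetical check: True if a job arriving at time `now` has not
    missed its deferral time by more than `deferral_window` seconds."""
    return now - deferral_time <= deferral_window


noon = datetime(2006, 1, 1, 12, 0, tzinfo=CENTRAL_STANDARD)
print(epoch_seconds(noon))  # 1136138400

now = epoch_seconds(noon)
# Deferral time 60 seconds in the past, window of 120 seconds: still allowed.
print(may_start(now, now - 60, deferral_window=120))  # True
# Same deferral time with no window: the job has missed its time.
print(may_start(now, now - 60))  # False
```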
<parameter>=<value>
Multiple environment variables can be specified by separating them with a semicolon (;) when submitting from a Unix platform, or with a vertical bar (|) when submitting from a Windows platform. These environment variables will be placed (as given) into the job's environment before execution. The length of all characters specified in the environment is currently limited to 10240 characters. Note that spaces are accepted, but rarely desired, characters within parameter names and values. Place spaces within the parameter list only if required.
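As an illustration of the separator and length rules above, the following Python sketch assembles such a list from a dictionary. The helper name environment_value and the variable names in the usage line are made up for the example; only the separators and the 10240-character limit come from the text.

```python
MAX_ENVIRONMENT_LENGTH = 10240  # limit stated in the text


def environment_value(variables, windows=False):
    """Join name/value pairs into one <parameter>=<value> list.

    Uses ';' as the separator for Unix submit machines and '|' for Windows,
    with no spaces added around names, values, or separators.
    """
    separator = "|" if windows else ";"
    value = separator.join(f"{name}={val}" for name, val in variables.items())
    if len(value) > MAX_ENVIRONMENT_LENGTH:
        raise ValueError("environment exceeds 10240 characters")
    return value


# DATA_DIR and RUNS are hypothetical variable names.
print(environment_value({"DATA_DIR": "/scratch", "RUNS": "3"}))
print(environment_value({"DATA_DIR": "C:\\scratch", "RUNS": "3"}, windows=True))
```

Generating the list this way avoids stray spaces, which the text notes are accepted but rarely desired.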
If your job attempts to access a file mentioned in this list, Condor will automatically copy the whole file to the executing machine, where it can be accessed quickly. When your job closes the file, it will be copied back to its original location. This list uses the same syntax as compress_files, shown above.
This option only applies to standard-universe jobs.
Directs Condor to use a new file name in place of an old one. name
describes a file name that your job may attempt to open, and newname
describes the file name it should be replaced with.
newname may include an optional leading
access specifier, local: or remote:. If left unspecified,
the default access specifier is remote:. Multiple remaps can be
specified by separating each with a semicolon.
This option only applies to standard universe jobs.
If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
file_remaps = "dataset.1=other.dataset"
file_remaps = "very.big = local:/bigdisk/bigfile"
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"
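To make the remap syntax concrete, here is a rough Python sketch that parses a file_remaps value into its parts. It is not Condor's parser. It assumes that whitespace around names is insignificant (the examples above appear both with and without spaces) and that only equals signs and semicolons are escaped with a backslash; the function name is hypothetical.

```python
import re


def parse_file_remaps(value):
    """Parse 'name = newname ; ...' into {name: (access, newname)}."""
    remaps = {}
    # Split on semicolons that are not preceded by a backslash.
    for entry in re.split(r"(?<!\\);", value):
        if not entry.strip():
            continue
        # Split on the first equals sign that is not escaped.
        name, newname = re.split(r"(?<!\\)=", entry, maxsplit=1)
        # Remove the backslash from escaped '=' and ';' characters.
        name = re.sub(r"\\([=;])", r"\1", name.strip())
        newname = re.sub(r"\\([=;])", r"\1", newname.strip())
        access = "remote"  # default access specifier per the text
        for prefix in ("local:", "remote:"):
            if newname.startswith(prefix):
                access, newname = prefix[:-1], newname[len(prefix):]
                break
        remaps[name] = (access, newname)
    return remaps


print(parse_file_remaps(
    "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"))
```

For the third example above, very.big maps to /bigdisk/bigfile with local access, and dataset.1 maps to other.dataset with the default remote access.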
More than one queue command allows job submission to separate resources.
The globusscheduler command is required when the universe is grid, and grid_type is one of gt2, gt3, or gt4.
For a grid_type-string of gt2, the single parameter is the required value that may be set with the globusscheduler command. See section 5.3.2 for syntax of the globusscheduler command.
For a grid_type-string of gt3, the single parameter is the required value that may be set with the globusscheduler command. See section 5.3.2 for syntax of the globusscheduler command.
For a grid_type-string of gt4, the first parameter is the required value that may be set with the globusscheduler command. The second parameter is the required value that may be set with the jobmanager_type command. See section 5.3.2 for the syntax of these commands.
For a grid_type-string of condor, the first parameter is the required value that may be set with the remote_schedd command. The second parameter is the required value that may be set with the remote_pool command. See section 5.3.1 for the syntax of these commands.
For a grid_type-string of nordugrid, the single parameter is the required value that may be set with the nordugrid_resource command. See section 5.3.3 for the syntax of this command.
For a grid_type-string of unicore, the first parameter is the required value that may be set with the unicore_u_site command. The second parameter is the required value that may be set with the unicore_v_site command. See section 5.3.4 for the syntax of these commands.
For vanilla or MPI universe jobs where there is a shared file system, it is the current working directory on the machine where the job is executed.
For vanilla, grid, or MPI universe jobs where file transfer mechanisms are utilized (there is not a shared file system), it is the directory on the machine from which the job is submitted where the input files come from, and where the job's output files go to.
For standard universe jobs, it is the directory on the machine from which the job is submitted where the condor_shadow daemon runs; the current working directory for file input and output accomplished through remote system calls.
For scheduler universe jobs, it is the directory on the machine from which the job is submitted where the job runs; the current working directory for file input and output with respect to relative path names.
Note that the path to the executable is not relative to initialdir; if it is a relative path, it is relative to the directory in which the condor_submit command is run.
Note that this command does not refer to the command-line arguments of the program. The command-line arguments are specified by the arguments command.
SIGTSTP which tells the Condor libraries to initiate a checkpoint of the process. For jobs submitted to the vanilla universe, the default is SIGTERM, which is the standard way to terminate a program in Unix.
If your job attempts to access a file mentioned in this list, Condor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above.
local_files = /tmp/*
This option only applies to standard-universe jobs.
For the MPI universe, a single value (max) is required. It is neither a maximum nor a minimum, but the number of machines to be dedicated toward running the job.
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"
The value for each introduced ClassAd is given by the
value of the Name attribute
from the machine ClassAd of a previous execution (match).
As a job is matched, the definitions for these attributes
will roll,
with LastMatchName1 becoming LastMatchName2,
LastMatchName0 becoming LastMatchName1,
and LastMatchName0 being set by the most recent
value of the Name attribute.
An intended use of these job attributes is in the requirements expression. The requirements can allow a job to prefer a match with either the same or a different resource than a previous match.
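The rolling of the LastMatchName attributes can be pictured with a small Python sketch. A plain dictionary stands in for the job ClassAd, and the function name and the list_length parameter are hypothetical; this only models the shifting described above.

```python
def record_match(job_ad, machine_name, list_length):
    """Shift the LastMatchName attributes down by one and store the new name.

    `job_ad` is a plain dict standing in for the job ClassAd; `list_length`
    is the number of LastMatchName attributes to retain.
    """
    # Work from the oldest slot towards the newest so nothing is overwritten early.
    for index in range(list_length - 1, 0, -1):
        older = f"LastMatchName{index - 1}"
        if older in job_ad:
            job_ad[f"LastMatchName{index}"] = job_ad[older]
    job_ad["LastMatchName0"] = machine_name
    return job_ad


ad = {}
for name in ["a", "b", "c", "d"]:  # hypothetical machine names, oldest match first
    record_match(ad, name, list_length=3)
print(ad)
```

After four matches with three slots retained, the oldest name ("a") has rolled off and LastMatchName0 holds the most recent one ("d").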
When a resource claim is being preempted, this expression in the submit file specifies the maximum runtime of the job (in seconds since the job started). This expression has no effect if it is greater than the maximum retirement time provided by the machine policy. If the resource claim is not preempted, this expression and the machine retirement policy are irrelevant. If the resource claim is preempted and your job finishes sooner than the maximum time, then the claim closes gracefully and all is well. If the resource claim is preempted and your job does not finish in time, then the usual preemption procedure is followed (typically a soft kill signal, followed by some time to gracefully shut down, followed by a hard kill signal).
Standard universe jobs and any jobs running with nice_user priority have a default MaxJobRetirementTime of 0, so no retirement time is utilized by default. In all other cases, no default value is provided, so the maximum amount of retirement time is utilized by default.
Setting this expression does not affect your job's resource requirements or preferences. If you want your job to only run on machines with some minimal MaxJobRetirementTime or if you just want to preferentially run on such machines, you must explicitly specify what you want in the requirements and/or rank expressions.
This is a required submit file entry when the universe is grid, and the grid_type is nordugrid.
job-owner@UID_DOMAIN
where UID_DOMAIN is specified by the Condor site administrator. If UID_DOMAIN has not been specified, Condor will send the email to:
job-owner@submit-machine-name
For example: Suppose a job is known to run for a minimum of an hour. If the job exits after less than an hour, the job should be placed on hold and an e-mail notification sent, instead of being allowed to leave the queue.
on_exit_hold = (CurrentTime - JobStartDate) < (60 * $(MINUTE))
This expression places the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became True.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
If left unspecified, this will default to False.
This expression is available for the vanilla, java, and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe.
For example, suppose you have a job that occasionally segfaults, but you know that if you run the job again with the same data, chances are that the job will finish successfully. This is how you would represent that with on_exit_remove (assuming the signal identifier for segmentation fault is 11 on the platform where your job will be running):
on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
This expression will only let the job leave the queue if the job was
not killed by a signal (it exited normally on its own) or if it was
killed by a signal other than 11 (representing segmentation fault).
So, if it was killed by signal 11, it will stay in the job queue.
In any other case of the job exiting,
the job will leave the queue as it normally would have done.
As another example, if your job should only leave the queue if it exited on its own with status 0, you would use this on_exit_remove expression:
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
If the job was killed by a signal or exited with a non-zero exit
status, Condor would leave the job in the queue to run again.
If left unspecified, the on_exit_remove expression will default to True.
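The two on_exit_remove expressions above can be mirrored as Python predicates to see which exits let a job leave the queue. This is only a sketch: real ClassAd evaluation has its own rules for undefined attributes, which None merely approximates here, and the function names are hypothetical.

```python
def leaves_queue_unless_segfault(exit_by_signal, exit_signal=None):
    """Mirror of (ExitBySignal == False) || (ExitSignal != 11)."""
    return (not exit_by_signal) or exit_signal != 11


def leaves_queue_only_on_success(exit_by_signal, exit_code=None):
    """Mirror of (ExitBySignal == False) && (ExitCode == 0)."""
    return (not exit_by_signal) and exit_code == 0


# First expression: only a kill by signal 11 keeps the job in the queue.
print(leaves_queue_unless_segfault(False))      # True
print(leaves_queue_unless_segfault(True, 11))   # False
print(leaves_queue_unless_segfault(True, 9))    # True

# Second expression: only a normal exit with status 0 leaves the queue.
print(leaves_queue_only_on_success(False, 0))   # True
print(leaves_queue_only_on_success(False, 1))   # False
print(leaves_queue_only_on_success(True))       # False
```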
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
This expression is available for the vanilla, java, and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
Note that if your program explicitly opens and writes to a file, that file should not be specified as the output file.
See the Examples section for an example of a periodic_* expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
This expression is available for the vanilla, java and grid universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the schedd, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
This expression is available for the vanilla, java, and grid universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
See the Examples section for an example of a periodic_* expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions. So, the periodic_remove expression takes precedence over the on_exit_remove expression, if the two describe conflicting actions.
This expression is available for the vanilla, java and grid universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the schedd, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
requirements = Memory > 60
rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory and give the job the one with the most memory. See the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
remove_kill_sig = SIGUSR1
remove_kill_sig = 10
If this command is not present, the value of kill_sig is used.
requirements = Memory >= 64 && Mips > 45
Only one requirements command may be present in a submit description file. By default, condor_submit appends the following clauses to the requirements expression:
Also, see the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
If you define should_transfer_files you must also define when_to_transfer_output (described below). For more information about this and other settings related to transferring files, see section 2.5.4.
Note that should_transfer_files is not supported for jobs submitted to the grid universe.
For transferring files other than stdin, see transfer_input_files.
Only the transfer of files is available; the transfer of subdirectories is not supported.
For more information about this and other settings related to transferring files, see section 2.5.4.
For transferring files other than stdout, see transfer_output_files.
For grid universe jobs, to have files other than standard output and standard error transferred from the execute machine back to the submit machine, use transfer_output_files, listing all files to be transferred. These files are found on the execute machine in the working directory of the job.
For more information about this and other settings related to transferring files, see section 2.5.4.
name describes an output file name produced by your job, and newname describes the file name it should be downloaded to. Multiple remaps can be specified by separating each with a semicolon. If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
This is a required submit file entry when the universe is grid, and grid_type is unicore.
This is a required submit file entry when the universe is grid, and grid_type is unicore.
Setting when_to_transfer_output equal to ON_EXIT will cause Condor to transfer the job's output files back to the submitting machine only when the job completes (exits on its own).
The ON_EXIT_OR_EVICT option is intended for fault tolerant jobs which periodically save their own state and can restart where they left off. In this case, files are spooled to the submit machine any time the job leaves a remote site, either because it exited on its own, or was evicted by the Condor system for any reason prior to job completion. The files spooled back are placed in a directory defined by the value of the SPOOL configuration variable. Any output files transferred back to the submit machine are automatically sent back out again as input files if the job restarts.
For more information about this and other settings related to transferring files, see section 2.5.4.
x509userproxy is relevant when the universe is grid, and grid_type is one of gt2, gt3, gt4, or nordugrid.
In addition to commands, the submit description file can contain macros and comments:
<macro_name> = <string>
Three pre-defined macros are supplied by the submit description file parser. The third of the pre-defined macros is only relevant to MPI universe jobs. The $(Cluster) macro supplies the value of the ClusterId job ClassAd attribute, and the $(Process) macro supplies the value of the ProcId job ClassAd attribute. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply a Condor process with its own cluster and process numbers on the command line. The $(Process) macro should not be used for PVM jobs. The $(Node) macro is defined only for MPI universe jobs. It is a unique value assigned for the duration of the job that essentially identifies the machine on which a program is executing.
If the dollar sign ($) is desired as a literal character,
then use
$(DOLLAR)
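The effect of the pre-defined macros can be imitated in a few lines of Python. This is a sketch of the substitution idea only, not the submit description file parser: the function name is hypothetical, macro names are matched with the exact capitalization used above, and unknown macros are left untouched.

```python
import re


def expand_macros(text, cluster, process):
    """Replace $(Cluster), $(Process) and $(DOLLAR) in `text`."""
    values = {"Cluster": str(cluster), "Process": str(process), "DOLLAR": "$"}

    def replace(match):
        # Leave any macro this sketch does not know about as it was written.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\((\w+)\)", replace, text)


# One output file name per job in a three-job cluster (cluster number 12 is made up).
for process in range(3):
    print(expand_macros("out.$(Cluster).$(Process)", cluster=12, process=process))
```

Each job in the cluster receives its own file name (out.12.0, out.12.1, out.12.2) from a single line of the submit description file.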
In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of expressions defined on the resource machine itself (obtained after a match to the machine has been made) into specific expressions within the submit description file. The substitution macro is of the form:
$$(attribute)
A common use of this macro is for the heterogeneous submission of an executable:
executable = povray.$$(opsys).$$(arch)
Values for the opsys and arch attributes are substituted at match time for any given resource. This allows Condor to automatically choose the correct executable for the matched machine.
An extension to the syntax of the substitution macro provides an alternative string to use if the machine attribute within the substitution macro is undefined. The syntax appears as:
$$(attribute:string_if_attribute_undefined)
An example using this extended syntax provides a path name to a required input file. Since the file can be placed in different locations on different machines, the file's path name is given as an argument to the program.
argument = $$(input_file_path:/usr/foo)
On the machine, if the attribute input_file_path is not defined, then the path /usr/foo is used instead.
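A substitution macro, including the fallback form, can be imitated with a regular expression in Python. The sketch below is an illustration under stated assumptions rather than Condor's matching code: the machine ClassAd is a plain dictionary, attribute names are matched exactly although ClassAd attribute names are not case-sensitive, and the function name and the attribute values LINUX and INTEL are chosen only for the example.

```python
import re


def substitute_machine_attributes(text, machine_ad):
    """Expand $$(attribute) and $$(attribute:fallback) using `machine_ad`."""

    def replace(match):
        attribute, fallback = match.group(1), match.group(2)
        if attribute in machine_ad:
            return str(machine_ad[attribute])
        if fallback is not None:
            return fallback
        raise KeyError(f"machine attribute {attribute!r} is undefined")

    # Group 1 is the attribute name; group 2 is the optional text after ':'.
    return re.sub(r"\$\$\((\w+)(?::([^)]*))?\)", replace, text)


machine = {"opsys": "LINUX", "arch": "INTEL"}  # hypothetical matched machine
print(substitute_machine_attributes("povray.$$(opsys).$$(arch)", machine))
print(substitute_machine_attributes("$$(input_file_path:/usr/foo)", machine))
```

The first call selects the executable name for the matched machine; the second falls back to /usr/foo because input_file_path is absent from the dictionary.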
The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is
$ENV(variable)
An example submit description file command that uses this functionality evaluates the submitter's home directory in order to set the path and file name of a log file:
log = $ENV(HOME)/jobs/logfile
The environment variable is evaluated when the submit description file is processed.
The $RANDOM_CHOICE macro allows a random choice to be made from a given list of parameters at submission time. For an expression, if some randomness needs to be generated, the macro may appear as
$RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the parameter values will be chosen.
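The behavior of $RANDOM_CHOICE can be approximated in Python as follows. This is a sketch only: the function name is hypothetical, parameters are assumed to be separated by commas with surrounding spaces ignored, and nothing is implied about how Condor itself draws the value.

```python
import random
import re


def expand_random_choice(text, rng=random):
    """Replace each $RANDOM_CHOICE(a,b,...) with one of its parameters."""

    def replace(match):
        choices = [item.strip() for item in match.group(1).split(",")]
        return rng.choice(choices)

    return re.sub(r"\$RANDOM_CHOICE\(([^)]*)\)", replace, text)


# Each call may print a different value between 0 and 6.
print(expand_random_choice("$RANDOM_CHOICE(0,1,2,3,4,5,6)"))
```

Passing a seeded random.Random instance as rng makes the choice repeatable, which is convenient when testing.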
condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.
####################
#
# submit description file
# Example 1: queuing multiple jobs with differing
# command line arguments and output files.
#
####################
Executable = foo
Universe = standard
Arguments = 15 2000
Output = foo.out1
Error = foo.err1
Queue
Arguments = 30 2000
Output = foo.out2
Error = foo.err2
Queue
Arguments = 45 6000
Output = foo.out3
Error = foo.err3
Queue
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################
Executable = foo
Universe = standard
Requirements = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
Rank = Memory >= 64
Image_Size = 28 Meg
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
condor_submit -a "log = out.log" -a "error = error.log" mysubmitfile
Note that each of the added commands is contained within quote marks because there are space characters within the command.
Including the command
periodic_remove = CumulativeSuspensionTime >
((RemoteWallClockTime - CumulativeSuspensionTime) / 2.0)
in the submit description file causes this to happen.
+WantCheckpoint = False
in the submit description file before the queue command(s).
U.S. Government Rights Restrictions: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of Commercial Computer Software-Restricted Rights at 48 CFR 52.227-19, as applicable, Condor Team, Attention: Professor Miron Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu.
See the Condor Version 6.7.14 Manual for additional notices.