2.5 Submitting a Job

A job is submitted for execution to Condor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. This file contains commands and keywords to direct the queuing of jobs. In the submit description file, Condor finds everything it needs to know about the job. Items such as the name of the executable to run, the initial working directory, and command-line arguments to the program all go into the submit description file. condor_submit creates a job ClassAd based upon the information, and Condor works toward running the job.

The contents of a submit file can save time for Condor users. It is easy to submit multiple runs of a program to Condor. To run the same program 500 times on 500 different input data sets, arrange your data files accordingly so that each run reads its own input, and each run writes its own output. Each individual run may have its own initial working directory, stdin, stdout, stderr, command-line arguments, and shell environment. A program that directly opens its own files will read the file names to use either from stdin or from the command line. A program that opens a static filename every time will need to use a separate subdirectory for the output of each run.
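
As a sketch of the layout described above, the following submit description file queues 500 runs of one program, each in its own subdirectory. The names analyze, in.dat, out.dat, err.dat, and the run_N directories are hypothetical. The $(Process) macro is not described in this section; the sketch assumes it expands to the index of each job within the cluster (0 through 499), and that the input, output, and error file names are resolved relative to initialdir.

```
# Hypothetical layout: run_0/in.dat, run_1/in.dat, ..., run_499/in.dat
executable = analyze
input      = in.dat
output     = out.dat
error      = err.dat

# Assumption: $(Process) expands to 0, 1, ..., 499, one value per job.
initialdir = run_$(Process)
queue 500
```

Because every run works in a directory of its own, a program that always opens the same static file name does not collide with the other runs.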

The condor_submit manual page contains a complete description of how to use condor_submit.


2.5.1 Submit Description File Commands

Each submit description file describes one cluster of jobs to be placed in the Condor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, and different program arguments. The submit description file is the only command-line argument to condor_submit. If the submit description file argument is omitted, condor_submit will read the submit description from standard input.

The submit description file must contain one executable command and at least one queue command. All of the other commands have default actions.
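
A minimal sketch of a file that satisfies this rule follows. The executable name my_program is hypothetical.

```
# Smallest useful submit description file: one executable, one queue.
executable = my_program
queue
```

Every other command takes its default here, so (per the defaults listed below) the job runs in the vanilla universe, and on Unix its stdout and stderr go to /dev/null.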

The commands which can appear in the submit description file are numerous. They are listed here in alphabetical order by category.

BASIC COMMANDS

arguments = <argument_list>
List of arguments to be supplied to the program on the command line. In the Java Universe, the first argument must be the name of the class containing main.

There are two permissible formats for specifying arguments. The new syntax supports uniform quoting of spaces within arguments; the old syntax supports spaces in arguments only in special circumstances.

In the old syntax, arguments are delimited (separated) by space characters. Double-quotes must be escaped with a backslash (i.e. put a backslash in front of each double-quote).

Further interpretation of the argument string differs depending on the operating system. On Windows, your argument string is simply passed verbatim (other than the backslash in front of double-quotes) to the Windows application. Most Windows applications will allow you to put spaces within an argument value by surrounding the argument with double-quotes. In all other cases, there is no further interpretation of the arguments.

Example:

arguments = one \"two\" 'three'

Produces in Unix vanilla universe:

argument 1: one
argument 2: "two"
argument 3: 'three'

Here are the rules for using the new syntax:

  1. Put double quotes around the entire argument string. This distinguishes the new syntax from the old, because these double-quotes are not escaped with backslashes, as required in the old syntax. Any literal double-quotes within the string must be escaped by repeating them.

  2. Use white space (e.g. spaces or tabs) to separate arguments.

  3. To put any white space in an argument, you must surround the space and as much of the surrounding argument as you like with single-quotes.

  4. To insert a literal single-quote, you must repeat it anywhere inside of a single-quoted section.

Example:

arguments = "one ""two"" 'spacey ''quoted'' argument'"

Produces:

argument 1: one
argument 2: "two"
argument 3: spacey 'quoted' argument

Notice that in the new syntax, backslash has no special meaning. This is for the convenience of Windows users.
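
As a further sketch of the new syntax, consider a Windows-style path that contains both backslashes and spaces. The path and the -mode option are hypothetical.

```
arguments = "'C:\Program Files\data set.txt' -mode ""fast"""
```

Applying the four rules above, this should produce:

```
argument 1: C:\Program Files\data set.txt
argument 2: -mode
argument 3: "fast"
```

The backslashes pass through unchanged, the single quotes keep the spaces inside the first argument, and each doubled double-quote becomes one literal double-quote.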

environment = <parameter_list>
List of environment variables.

There are two different formats for specifying the environment variables: the old format and the new format. The old format is retained for backward-compatibility. It suffers from a platform-dependent syntax and the inability to insert some special characters into the environment.

The new syntax for specifying environment values:

  1. Put double quote marks around the entire environment string. This distinguishes the new syntax from the old. The old syntax does not have double quote marks around it. Any literal double quote marks within the string must be escaped by repeating the double quote mark.

  2. Each environment entry has the form

    <name>=<value>
    

  3. Use white space (space or tab characters) to separate environment entries.

  4. To put any white space in an environment entry, surround the space and as much of the surrounding entry as desired with single quote marks.

  5. To insert a literal single quote mark, repeat the single quote mark anywhere inside of a section surrounded by single quote marks.

Example:

environment = "one=1 two=""2"" three='spacey ''quoted'' value'"

Produces the following environment entries:

one=1
two="2"
three=spacey 'quoted' value

Under the old syntax, there are no double quote marks surrounding the environment specification. Each environment entry remains of the form

<name>=<value>
Under Unix, list multiple environment entries by separating them with a semicolon (;). Under Windows, separate multiple entries with a vertical bar (|). There is no way to insert a literal semicolon under Unix or a literal vertical bar under Windows. Note that spaces are accepted, but rarely desired, characters within parameter names and values, because they are treated as literal characters, not separators or ignored white space. Place spaces within the parameter list only if required.

A Unix example:

environment = one=1;two=2;three="quotes have no 'special' meaning"

This produces the following:

one=1
two=2
three="quotes have no 'special' meaning"

If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).

error = <pathname>
A path and file name used by Condor to capture any error messages the program would normally write to the screen (that is, this file becomes stderr). If not specified, the default value of /dev/null is used for submission to a Unix machine. If not specified, error messages are ignored for submission to a Windows machine. More than one job should not use the same error file, since this will cause one job to overwrite the errors of another. The error file and the output file should not be the same file as the outputs will overwrite each other or be lost. For grid universe jobs, error may be a URL that the Globus tool globus_url_copy understands.

executable = <pathname>
An optional path and a required file name of the executable file for this job cluster. Only one executable command within a submit description file is guaranteed to work properly. More than one often works.

If no path or a relative path is used, then the executable file is presumed to be relative to the current working directory of the user as the condor_submit command is issued.

If submitting into the standard universe, then the named executable must have been re-linked with the Condor libraries (such as via the condor_compile command). If submitting into the vanilla universe (the default), then the named executable need not be re-linked and can be any process which can run in the background (shell scripts work fine as well). If submitting into the Java universe, then the argument must be a compiled .class file.

getenv = <True | False>
If getenv is set to True, then condor_submit will copy all of the user's current shell environment variables at the time of job submission into the job ClassAd. The job will therefore execute with the same set of environment variables that the user had at submit time. Defaults to False.

If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
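
A short sketch of the two commands used together follows. The variable names and values are hypothetical.

```
# Copy the submitter's shell environment into the job ClassAd ...
getenv      = True
# ... then override or add individual entries (new syntax).
environment = "LANG=C DATA_DIR='/scratch/my data'"
```

Under the rule above, the job should see the submitter's environment, except that LANG and DATA_DIR take the values given by environment, whichever of the two commands appears first.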

input = <pathname>
Condor assumes that its jobs are long-running, and that the user will not wait at the terminal for their completion. Because of this, the standard files which normally access the terminal (stdin, stdout, and stderr) must refer to files. Thus, the file name specified with input should contain any keyboard input the program requires (that is, this file becomes stdin). If not specified, the default value of /dev/null is used for submission to a Unix machine. If not specified, input is ignored for submission to a Windows machine. For grid universe jobs, input may be a URL that the Globus tool globus_url_copy understands.

Note that this command does not refer to the command-line arguments of the program. The command-line arguments are specified by the arguments command.

log = <pathname>
Use log to specify a file name where Condor will write a log file of what is happening with this job cluster. For example, Condor will place a log entry into this file when and where the job begins running, when the job produces a checkpoint, or moves (migrates) to another machine, and when the job completes. Most users find specifying a log file to be handy; its use is recommended. If no log entry is specified, Condor does not create a log for this cluster.

log_xml = <True | False>
If log_xml is True, then the log file will be written in ClassAd XML. If not specified, XML is not used. Note that the file is an XML fragment; it is missing the file header and footer. Do not mix XML and non-XML within a single file. If multiple jobs write to a single log file, ensure that all of the jobs specify this option in the same way.

notification = <Always | Complete | Error | Never>
Owners of Condor jobs are notified by e-mail when certain events occur. If defined by Always, the owner will be notified whenever the job produces a checkpoint, as well as when the job completes. If defined by Complete (the default), the owner will be notified when the job terminates. If defined by Error, the owner will only be notified if the job terminates abnormally. If defined by Never, the owner will not receive e-mail, regardless of what happens to the job. The statistics included in the e-mail are documented in section 2.6.7.

notify_user = <email-address>
Used to specify the e-mail address to use when Condor sends e-mail about a job. If not specified, Condor defaults to using the e-mail address defined by
job-owner@UID_DOMAIN
where the configuration variable UID_DOMAIN is specified by the Condor site administrator. If UID_DOMAIN has not been specified, Condor sends the e-mail to:
job-owner@submit-machine-name
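
As a brief sketch, the following fragment asks for e-mail only on abnormal termination and sends it to an explicit address. The address is hypothetical.

```
notification = Error
notify_user  = jane@example.org
```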

output = <pathname>
The output file captures any information the program would ordinarily write to the screen (that is, this file becomes stdout). If not specified, the default value of /dev/null is used for submission to a Unix machine. If not specified, output is ignored for submission to a Windows machine. Multiple jobs should not use the same output file, since this will cause one job to overwrite the output of another. The output file and the error file should not be the same file as the outputs will overwrite each other or be lost. For grid universe jobs, output may be a URL that the Globus tool globus_url_copy understands.

Note that if a program explicitly opens and writes to a file, that file should not be specified as the output file.

priority = <integer>
A Condor job priority can be any integer, with 0 being the default. Jobs with higher numerical priority will run before jobs with lower numerical priority. Note that this priority is on a per user basis. One user with many jobs may use this command to order his/her own jobs, and this will have no effect on whether or not these jobs will run ahead of another user's jobs.

queue [number-of-procs]
Places one or more copies of the job into the Condor queue. The optional argument number-of-procs specifies how many times to submit the job to the queue, and it defaults to 1. If desired, any commands may be placed between subsequent queue commands, such as new input, output, error, initialdir, or arguments commands. This is handy when submitting multiple runs into one cluster with one submit description file.
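
The following sketch shows commands placed between queue commands, so that one cluster holds two runs with different arguments and output files. The executable name, argument values, and file names are hypothetical.

```
executable = sim

# First run
arguments = 10
output    = sim.10.out
error     = sim.10.err
queue

# Second run: only the commands that change are repeated.
arguments = 20
output    = sim.20.out
error     = sim.20.err
queue
```

Each run has its own output and error file, which avoids the overwriting problem noted in the descriptions of output and error.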

universe = <vanilla | standard | scheduler | local | grid | java | vm>
Specifies which Condor Universe to use when running this job. The Condor Universe specifies a Condor execution environment. The standard Universe tells Condor that this job has been re-linked via condor_compile with the Condor libraries and therefore supports checkpointing and remote system calls. The vanilla Universe is the default (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise), and is an execution environment for jobs which have not been linked with the Condor libraries. Note: Use the vanilla Universe to submit shell scripts to Condor. The scheduler universe is for a job that should act as a metascheduler. The grid universe forwards the job to an external job management system. Further specification of the grid universe is done with the grid_resource command. The java universe is for programs written for the Java Virtual Machine. The vm universe facilitates the execution of a virtual machine.
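
As a sketch of a java universe submission that combines the rules given for executable and arguments, assume a hypothetical class Hello, compiled to Hello.class, that takes one input file name:

```
universe   = java
# The executable must be a compiled .class file.
executable = Hello.class
# The first argument names the class containing main.
arguments  = Hello input.txt
output     = hello.out
error      = hello.err
queue
```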

COMMANDS FOR MATCHMAKING

rank = <ClassAd Float Expression>
A ClassAd Floating-Point expression that states how to rank machines which have already met the requirements expression. Essentially, rank expresses preference. A higher numeric value equals better rank. Condor will give the job the machine with the highest rank. For example,
        requirements = Memory > 60
        rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory and give the job the machine with the most memory. See section 2.5.3 within the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.

requirements = <ClassAd Boolean Expression>
The requirements command is a boolean ClassAd expression which uses C-like operators. In order for any job in this cluster to run on a given machine, this requirements expression must evaluate to true on the given machine. For example, to require that whatever machine executes a Condor job has at least 64 megabytes of RAM and has a MIPS performance rating greater than 45, use:
requirements = Memory >= 64 && Mips > 45
For scheduler and local universe jobs, the requirements expression is evaluated against the Scheduler ClassAd which represents the condor_schedd daemon running on the submit machine, rather than a remote machine. Like all commands in the submit description file, if multiple requirements commands are present, all but the last one are ignored. By default, condor_submit appends the following clauses to the requirements expression:
  1. Arch and OpSys are set equal to the Arch and OpSys of the submit machine. In other words: unless you request otherwise, Condor will give your job machines with the same architecture and operating system version as the machine running condor_submit.
  2. Disk >= DiskUsage. The DiskUsage attribute is initialized to the size of the executable plus the size of any files specified in a transfer_input_files command. It exists to ensure there is enough disk space on the target machine for Condor to copy over both the executable and needed input files. The DiskUsage attribute represents the maximum amount of total disk space required by the job in kilobytes. Condor automatically updates the DiskUsage attribute approximately every 20 minutes while the job runs with the amount of space being used by the job on the execute machine.
  3. (Memory * 1024) >= ImageSize. This clause ensures that the target machine has enough memory to run your job.
  4. If Universe is set to Vanilla, FileSystemDomain is set equal to the submit machine's FileSystemDomain.
View the requirements of a job which has already been submitted (along with everything else about the job ClassAd) with the command condor_q -l; see the command reference for condor_q. Also, see the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
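
As a sketch of requesting a platform explicitly instead of relying on the default Arch and OpSys clauses, the fragment below names both attributes in the requirements expression. The string values "INTEL" and "LINUX" are assumptions for illustration; the values actually advertised by the machines in a given pool may differ.

```
# Assumed attribute values; check what the machines in the pool advertise.
requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 64)
# Among the machines that qualify, prefer the one with the most memory.
rank         = Memory
```

After submission, condor_q -l shows the full requirements expression, including any clauses that condor_submit appended.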

FILE TRANSFER COMMANDS

should_transfer_files = <YES | NO | IF_NEEDED >
The should_transfer_files setting is used to define if Condor should transfer files to and from the remote machine where the job runs. The file transfer mechanism is used to run jobs which are not in the standard universe (standard universe jobs can instead use remote system calls for file access) on machines which do not have a shared file system with the submit machine. should_transfer_files equal to YES will cause Condor to always transfer files for the job. NO disables Condor's file transfer mechanism. IF_NEEDED will not transfer files for the job if it is matched with a resource in the same FileSystemDomain as the submit machine (and therefore, on a machine with the same shared file system). If the job is matched with a remote resource in a different FileSystemDomain, Condor will transfer the necessary files.

If defining should_transfer_files, you must also define when_to_transfer_output (described below). For more information about this and other settings related to transferring files, see section 2.5.5.

Note that should_transfer_files is not supported for jobs submitted to the grid universe.

stream_error = <True | False>
If True, then stderr is streamed back to the machine from which the job was submitted. If False, stderr is stored locally and transferred back when the job completes. This command is ignored if the job ClassAd attribute TransferErr is False. The default value is True in the grid universe and False otherwise. This command must be used in conjunction with error, otherwise stderr will be sent to /dev/null on Unix machines and ignored on Windows machines.

stream_input = <True | False>
If True, then stdin is streamed from the machine on which the job was submitted. The default value is False. The command is only relevant for jobs submitted to the vanilla or java universes, and it is ignored by the grid universe. This command must be used in conjunction with input, otherwise stdin will be /dev/null on Unix machines and ignored on Windows machines.

stream_output = <True | False>
If True, then stdout is streamed back to the machine from which the job was submitted. If False, stdout is stored locally and transferred back when the job completes. This command is ignored if the job ClassAd attribute TransferOut is False. The default value is True in the grid universe and False otherwise. This command must be used in conjunction with output, otherwise stdout will be sent to /dev/null on Unix machines and ignored on Windows machines.

transfer_executable = <True | False>
This command is applicable to jobs submitted to the grid and vanilla universes. If transfer_executable is set to False, then Condor looks for the executable on the remote machine, and does not transfer the executable over. This is useful for a pre-staged executable; Condor behaves more like rsh. The default value is True.

transfer_input_files = < file1,file2,file... >
A comma-delimited list of all the files to be transferred into the working directory for the job before the job is started. By default, the file specified in the executable command and any file specified in the input command (for example, stdin) are transferred.

Only the transfer of files is available; the transfer of subdirectories is not supported.

For more information about this and other settings related to transferring files, see section 2.5.5.

transfer_output_files = < file1,file2,file... >
This command forms an explicit list of output files to be transferred back from the temporary working directory on the execute machine to the submit machine. Most of the time, there is no need to use this command. Other than for grid universe jobs, if transfer_output_files is not specified, Condor will automatically transfer back all files in the job's temporary working directory which have been modified or created by the job. This is usually the desired behavior. Explicitly listing output files is typically only done when the job creates many files, and the user wants to keep a subset of those files. If there are multiple files, they must be delimited with commas. WARNING: Do not specify transfer_output_files in the submit description file unless there is a really good reason - it is best to let Condor figure things out by itself based upon what the job produces.

For grid universe jobs, to have files other than standard output and standard error transferred from the execute machine back to the submit machine, do use transfer_output_files, listing all files to be transferred. These files are found on the execute machine in the working directory of the job.

For more information about this and other settings related to transferring files, see section 2.5.5.

transfer_output_remaps = < "name = newname ; name2 = newname2 ..." >
This specifies the name (and optionally path) to use when downloading output files from the completed job. Normally, output files are transferred back to the initial working directory with the same name they had in the execution directory. This gives you the option to save them with a different path or name. If you specify a relative path, the final path will be relative to the job's initial working directory.

name describes an output file name produced by your job, and newname describes the file name it should be downloaded to. Multiple remaps can be specified by separating each with a semicolon. If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
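
A sketch with two remaps follows. All file and directory names are hypothetical, and the results directory is assumed to exist already under the job's initial working directory.

```
# result.dat is saved under a relative path (relative to the initial
# working directory); trace.log is saved under an absolute path.
transfer_output_remaps = "result.dat = results/run1.dat ; trace.log = /tmp/run1.trace"
```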

when_to_transfer_output = < ON_EXIT | ON_EXIT_OR_EVICT >

Setting when_to_transfer_output equal to ON_EXIT will cause Condor to transfer the job's output files back to the submitting machine only when the job completes (exits on its own).

The ON_EXIT_OR_EVICT option is intended for fault tolerant jobs which periodically save their own state and can restart where they left off. In this case, files are spooled to the submit machine any time the job leaves a remote site, either because it exited on its own, or was evicted by the Condor system for any reason prior to job completion. The files spooled back are placed in a directory defined by the value of the SPOOL configuration variable. Any output files transferred back to the submit machine are automatically sent back out again as input files if the job restarts.

For more information about this and other settings related to transferring files, see section 2.5.5.
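
The file transfer commands are typically used together. The following sketch, with hypothetical input file names, transfers files only when the matched machine lies outside the submit machine's FileSystemDomain, and brings output back when the job exits on its own:

```
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
# The executable and any input (stdin) file are transferred by default;
# list only the additional files the job reads.
transfer_input_files    = params.txt,table.dat
```

transfer_output_files is deliberately left out, so that Condor returns every file the job created or modified in its temporary working directory.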

POLICY COMMANDS

hold = <True | False>
If hold is set to True, then the submitted job will be placed into the Hold state. Jobs in the Hold state will not run until released by condor_release. Defaults to False.

leave_in_queue = <ClassAd Boolean Expression>
When the ClassAd Expression evaluates to True, the job is not removed from the queue upon completion. This allows the user of a remotely spooled job to retrieve output files in cases where Condor would have removed them as part of the cleanup associated with completion. The job will only exit the queue once it has been marked for removal (via condor_rm, for example) and the leave_in_queue expression has become False. leave_in_queue defaults to False.

on_exit_hold = <ClassAd Boolean Expression>
The ClassAd expression is checked when the job exits, and if True, places the job into the Hold state. If False (the default value when not defined), then nothing happens and the on_exit_remove expression is checked to determine if that needs to be applied.

For example: Suppose a job is known to run for a minimum of an hour. If the job exits after less than an hour, the job should be placed on hold and an e-mail notification sent, instead of being allowed to leave the queue.

  on_exit_hold = (CurrentTime - JobStartDate) < (60 * $(MINUTE))

This expression places the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became True.

periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.

Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe.

on_exit_remove = <ClassAd Boolean Expression>
The ClassAd expression is checked when the job exits, and if True (the default value when undefined), then it allows the job to leave the queue normally. If False, then the job is placed back into the Idle state. If the user job runs under the vanilla universe, then the job restarts from the beginning. If the user job runs under the standard universe, then it continues from where it left off, using the last checkpoint.

For example, suppose a job occasionally segfaults, but chances are that the job will finish successfully if the job is run again with the same data. The on_exit_remove expression can cause the job to run again with the following command. Assume that the signal identifier for the segmentation fault is 11 on the platform where the job will be running.

  on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
This expression lets the job leave the queue if the job was not killed by a signal or if it was killed by a signal other than 11, representing segmentation fault in this example. So, if the job exited due to signal 11, it will stay in the job queue. In any other case of the job exiting, the job will leave the queue as it normally would have done.

As another example, if the job should only leave the queue if it exited on its own with status 0, this on_exit_remove expression works well:

  on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
If the job was killed by a signal or exited with a non-zero exit status, Condor would leave the job in the queue to run again.

periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.

Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.

periodic_hold = <ClassAd Boolean Expression>
This expression is checked periodically at an interval of the number of seconds set by the configuration variable PERIODIC_EXPR_INTERVAL. If it becomes True, the job will be placed on hold. If unspecified, the default value is False.

periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.

Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
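
As a sketch of a periodic_hold expression, the fragment below reuses the CurrentTime and JobStartDate attributes from the on_exit_hold example above. The two-hour limit is an arbitrary illustration.

```
# Hold the job once more than two hours (7200 seconds) have passed
# since JobStartDate.
periodic_hold = (CurrentTime - JobStartDate) > (2 * 60 * 60)
```

Because the expression is only checked once per PERIODIC_EXPR_INTERVAL, the hold may take effect up to one interval after the limit is actually crossed.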

periodic_release = <ClassAd Boolean Expression>
This expression is checked periodically at an interval of the number of seconds set by the configuration variable PERIODIC_EXPR_INTERVAL while the job is in the Hold state. If the expression becomes True, the job will be released.

Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.

periodic_remove = <ClassAd Boolean Expression>
This expression is checked periodically at an interval of the number of seconds set by the configuration variable PERIODIC_EXPR_INTERVAL. If it becomes True, the job is removed from the queue. If unspecified, the default value is False.

See section 9, the Examples section of the condor_submit manual page, for an example of a periodic_remove expression.

periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions. So, the periodic_remove expression takes precedence over the on_exit_remove expression, if the two describe conflicting actions.

Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.

next_job_start_delay = <ClassAd Boolean Expression>
This expression specifies the number of seconds to delay after starting up this job before the next job is started. The maximum allowed delay is specified by the Condor configuration variable MAX_NEXT_JOB_START_DELAY, which defaults to 10 minutes. This command does not apply to scheduler or local universe jobs.

This command has been historically used to implement a form of job start throttling from the job submitter's perspective. It was effective for the case of multiple job submission where the transfer of extremely large input data sets to the execute machine caused machine performance to suffer. This command is no longer useful, as throttling should be accomplished through configuration of the condor_schedd daemon.

COMMANDS SPECIFIC TO THE STANDARD UNIVERSE

allow_startup_script = <True | False>
If True, a standard universe job will execute a script instead of submitting the job, and the consistency check to see if the executable has been linked using condor_compile is omitted. The executable command within the submit description file specifies the name of the script. The script is used to do preprocessing before the job is submitted. The shell script ends with an exec of the job executable, such that the process id of the executable is the same as that of the shell script. Here is an example script that gets a copy of a machine-specific executable before the exec.
 
   #! /bin/sh

   # get the host name of the machine
   host=`uname -n`

   # grab a standard universe executable designed specifically
   # for this host
   scp elsewhere@cs.wisc.edu:${host} executable

   # The PID MUST stay the same, so exec the new standard universe process.
   exec executable ${1+"$@"}
If this command is not present (defined), then the value defaults to false.
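
As a minimal sketch, a submit description file that uses such a script might look like the following. The file names startup.sh, job.out, job.err, and job.log are hypothetical, and the script is assumed to end with an exec of the real standard universe executable, as described above.

```
# Hypothetical file names; startup.sh must exec the real job as its last step.
universe             = standard
executable           = startup.sh
allow_startup_script = True
output               = job.out
error                = job.err
log                  = job.log
queue
```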

append_files = file1, file2, ...

If your job attempts to access a file mentioned in this list, Condor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.

This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.

This option only applies to standard-universe jobs.
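
A short sketch of the running-log use case follows. The executable name simulate and the file name progress.log are placeholders chosen for illustration.

```
# Hypothetical names; progress.log is treated as a running log.
universe     = standard
executable   = simulate
append_files = progress.log
queue
```

With this setting, every write the job makes to progress.log lands at the end of the file, so a job that is evicted and restarted may log the same step more than once.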

buffer_files = <"name = (size,block-size) ; name2 = (size,block-size) ...">
buffer_size = <bytes-in-buffer>
buffer_block_size = <bytes-in-block>
Condor keeps a buffer of recently-used data for each file a job accesses. This buffer is used both to cache commonly-used data and to consolidate small reads and writes into larger operations that get better throughput. The default settings should produce reasonable results for most programs.

These options only apply to standard-universe jobs.

If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1 Mbyte and the block size to 256 Kbytes for the file input.data, use this command:

buffer_files = "input.data=(1000000,256000)"

Alternatively, you may use these two options to set the default sizes for all files used by your job:

buffer_size = 1000000
buffer_block_size = 256000

If you do not set these, Condor will use the values given by these two configuration file macros:

DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000

Finally, if no other settings are present, Condor will use a buffer of 512 Kbytes and a block size of 32 Kbytes.

compress_files = file1, file2, ...

If your job attempts to access any of the files mentioned in this list, Condor will automatically compress them (if writing) or decompress them (if reading). The compression format is the same as that used by GNU gzip.

The files given in this list may be simple file names or complete paths and may include * as a wild card. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:

compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files should be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.

This option only applies to standard universe jobs.
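
To illustrate the combination described above, the following sketch lists the same file in both commands. The executable name reindex and the file name index.gz are hypothetical.

```
# Hypothetical names. index.gz is fetched whole to the execute machine,
# used there in decompressed form, and compressed again when copied back.
universe       = standard
executable     = reindex
compress_files = index.gz
fetch_files    = index.gz
queue
```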

fetch_files = file1, file2, ...
If your job attempts to access a file mentioned in this list, Condor will automatically copy the whole file to the executing machine, where it can be accessed quickly. When your job closes the file, it will be copied back to its original location. This list uses the same syntax as compress_files, shown above.

This option only applies to standard universe jobs.

file_remaps = <"name = newname ; name2 = newname2 ...">

Directs Condor to use a new file name in place of an old one. name describes a file name that your job may attempt to open, and newname describes the file name it should be replaced with. newname may include an optional leading access specifier, local: or remote:. If left unspecified, the default access specifier is remote:. Multiple remaps can be specified by separating each with a semicolon.

This option only applies to standard universe jobs.

If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.

Example One:
Suppose that your job reads a file named dataset.1. To instruct Condor to force your job to read other.dataset instead, add this to the submit file:
file_remaps = "dataset.1=other.dataset"
Example Two:
Suppose that you run many jobs which all read in the same large file, called very.big. If this file can be found in the same place on a local disk in every machine in the pool (say, /bigdisk/bigfile), you can instruct Condor of this fact by remapping very.big to /bigdisk/bigfile and specifying that the file is to be read locally, which will be much faster than reading over the network.
file_remaps = "very.big = local:/bigdisk/bigfile"
Example Three:
Several remaps can be applied at once by separating each with a semicolon.
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"

local_files = file1, file2, ...

If your job attempts to access a file mentioned in this list, Condor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above.

local_files = /tmp/*

This option only applies to standard universe jobs.

want_remote_io = <True | False>
This option controls how a file is opened and manipulated in a standard universe job. If this option is true, which is the default, then the condor_shadow makes all decisions about how each and every file should be opened by the executing job. This entails a network round trip (or more) from the job to the condor_shadow and back again for every single open() in addition to other needed information about the file. If set to false, then when the job queries the condor_shadow for the first time about how to open a file, the condor_shadow will inform the job to automatically perform all of its file manipulation on the local file system on the execute machine and any file remapping will be ignored. This means that there must be a shared file system (such as NFS or AFS) between the execute machine and the submit machine and that ALL paths that the job could open on the execute machine must be valid. The ability of the standard universe job to checkpoint, possibly to a checkpoint server, is not affected by this attribute. However, when the job resumes it will be expecting the same file system conditions that were present when the job checkpointed.
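
A minimal sketch of turning remote I/O off is shown here. It assumes that the submit and execute machines share a file system and that every path the job opens is valid on the execute machine; the executable name simulate is a placeholder.

```
# Hypothetical executable name. Assumes a shared file system (such as NFS
# or AFS); any file_remaps setting would be ignored with this value.
universe       = standard
executable     = simulate
want_remote_io = False
queue
```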

COMMANDS FOR THE GRID

amazon_ami_id = <Amazon EC2 AMI ID>
AMI identifier of the VM image to run for amazon jobs.

amazon_instance_type = <VM Type>
Identifier for the type of VM desired for an amazon job. The default value is "m1.small".

amazon_keypair_file = <pathname>
The complete path and filename of a file into which Condor will write an ssh key for use with amazon jobs. The key can be used to ssh into the virtual machine once it is running.

amazon_private_key = <pathname>
Used for amazon jobs. Path and filename of a file containing the private key to be used to authenticate with Amazon's EC2 service via SOAP.

amazon_public_key = <pathname>
Used for amazon jobs. Path and filename of a file containing the public X509 certificate to be used to authenticate with Amazon's EC2 service via SOAP.

amazon_security_groups = group1, group2, ...
Used for amazon jobs. A list of Amazon EC2 security group names, which should be associated with the job.

amazon_user_data = <data>
Used for amazon jobs. A block of data that can be accessed by the virtual machine job inside Amazon EC2.

amazon_user_data_file = <pathname>
Used for amazon jobs. A file containing data that can be accessed by the virtual machine job inside Amazon EC2.
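
The amazon commands above might be combined as in the following sketch. Every path, the AMI identifier, and the security group name are placeholders. The sketch also assumes that the executable command acts only as a label for an amazon job, which this section does not state; see section 5.3.7 for the authoritative description.

```
# All values are placeholders for illustration.
universe               = grid
grid_resource          = amazon
# Assumption: for amazon jobs this value only labels the job.
executable             = ec2_demo_job
amazon_ami_id          = ami-00000000
amazon_instance_type   = m1.small
amazon_public_key      = /home/user/ec2/cert.pem
amazon_private_key     = /home/user/ec2/pk.pem
amazon_keypair_file    = /home/user/ec2/ssh_key
amazon_security_groups = group1
log                    = ec2_demo.log
queue
```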

globus_rematch = <ClassAd Boolean Expression>
This expression is evaluated by the condor_gridmanager whenever:
  1. the globus_resubmit expression evaluates to True
  2. the condor_gridmanager decides it needs to retry a submission (as when a previous submission failed to commit)
If globus_rematch evaluates to True, then before the job is submitted again to globus, the condor_gridmanager will request that the condor_schedd daemon renegotiate with the matchmaker (the condor_negotiator). The result is this job will be matched again.

globus_resubmit = <ClassAd Boolean Expression>
The expression is evaluated by the condor_gridmanager each time the condor_gridmanager gets a job ad to manage. Therefore, the expression is evaluated:
  1. when a grid universe job is first submitted to Condor-G
  2. when a grid universe job is released from the hold state
  3. when Condor-G is restarted (specifically, whenever the condor_gridmanager is restarted)
If the expression evaluates to True, then any previous submission to the grid universe will be forgotten and this job will be submitted again as a fresh submission to the grid universe. This may be useful if there is a desire to give up on a previous submission and try again. Note that this may result in the same job running more than once. Do not treat this operation lightly.

globus_rsl = <RSL-string>
Used to provide any additional Globus RSL string attributes which are not covered by other submit description file commands or job attributes. Used for grid universe jobs, where the grid resource has a grid-type-string of gt2.

globus_xml = <XML-string>
Used to provide any additional attributes in the GRAM XML job description that Condor writes which are not covered by regular submit description file parameters. Used for grid type gt4 jobs.

grid_resource = <grid-type-string> <grid-specific-parameter-list>
For each grid-type-string value, there are further type-specific values that must be specified. This submit description file command allows each to be given in a space-separated list. Allowable grid-type-string values are amazon, condor, cream, gt2, gt4, gt5, lsf, nordugrid, pbs, and unicore. See section 5.3 for details on the variety of grid types.

For a grid-type-string of amazon, no additional parameters are used. See section 5.3.7 for details.

For a grid-type-string of condor, the first parameter is the name of the remote condor_schedd daemon. The second parameter is the name of the pool to which the remote condor_schedd daemon belongs. See section 5.3.1 for details.

For a grid-type-string of cream, there are three parameters. The first parameter is the web services address of the CREAM server. The second parameter is the name of the batch system that sits behind the CREAM server. The third parameter identifies a site-specific queue within the batch system. See section 5.3.8 for details.

For a grid-type-string of gt2, the single parameter is the name of the pre-WS GRAM resource to be used. See section 5.3.2 for details.

For a grid-type-string of gt4, the first parameter is the name of the WS GRAM service to be used. The second parameter is the name of the WS resource to be used (usually the name of the back-end scheduler). See section 5.3.2 for details.

For a grid-type-string of gt5, the single parameter is the name of the pre-WS GRAM resource to be used, which is the same as for the grid-type-string of gt2. See section 5.3.2 for details.

For a grid-type-string of lsf, no additional parameters are used. See section 5.3.6 for details.

For a grid-type-string of nordugrid, the single parameter is the name of the NorduGrid resource to be used. See section 5.3.3 for details.

For a grid-type-string of pbs, no additional parameters are used. See section 5.3.5 for details.

For a grid-type-string of unicore, the first parameter is the name of the Unicore Usite to be used. The second parameter is the name of the Unicore Vsite to be used. See section 5.3.4 for details.
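
The parameter lists above can be sketched as follows for a few of the grid types. Each line is an alternative, and a given submit description file would contain only one of them. All host, pool, and site names are placeholders, and the form of the gt2 resource name (host followed by a job manager name) is an assumption based on common Globus usage rather than on this section.

```
# Alternatives; use exactly one grid_resource line per submit file.
# All names are placeholders.

# condor: remote condor_schedd name, then the pool it belongs to
grid_resource = condor schedd.example.com pool.example.com

# gt2: the pre-WS GRAM resource (assumed host/jobmanager form)
grid_resource = gt2 gatekeeper.example.com/jobmanager-pbs

# nordugrid: the NorduGrid resource
grid_resource = nordugrid ng.example.com

# unicore: the Usite, then the Vsite
grid_resource = unicore usite.example.com vsite

# amazon, lsf, and pbs take no additional parameters
grid_resource = pbs
```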

keystore_alias = <name>
A string to locate the certificate in a Java keystore file, as used for a unicore job.

keystore_file = <pathname>
The complete path and file name of the Java keystore file containing the certificate to be used for a unicore job.

keystore_passphrase_file = <pathname>
The complete path and file name to the file containing the passphrase protecting a Java keystore file containing the certificate. Relevant for a unicore job.

MyProxyCredentialName = <symbolic name>
The symbolic name that identifies a credential to the MyProxy server. This symbolic name is set as the credential is initially stored on the server (using myproxy-init).

MyProxyHost = <host>:<port>
The Internet address of the host that is the MyProxy server. The host may be specified by either a host name (as in head.example.com) or an IP address (of the form 192.0.2.8). The port number is an integer.

MyProxyNewProxyLifetime = <number-of-minutes>
The new lifetime (in minutes) of the proxy after it is refreshed.

MyProxyPassword = <password>
The password needed to refresh a credential on the MyProxy server. This password is set when the user initially stores credentials on the server (using myproxy-init). As an alternative to using MyProxyPassword in the submit description file, the password may be specified as a command line argument to condor_submit with the -password argument.

MyProxyRefreshThreshold = <number-of-seconds>
The time (in seconds) before the expiration of a proxy that the proxy should be refreshed. For example, if MyProxyRefreshThreshold is set to the value 600, the proxy will be refreshed 10 minutes before it expires.

MyProxyServerDN = <credential subject>
A string that specifies the expected Distinguished Name (credential subject, abbreviated DN) of the MyProxy server. It must be specified when the MyProxy server DN does not follow the conventional naming scheme of a host credential. This occurs, for example, when the MyProxy server DN begins with a user credential.
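
A sketch of the MyProxy commands used together follows. The host name and credential name are placeholders, and port 7512 is assumed here because it is the customary MyProxy port; substitute the values for the actual server. The password is deliberately left out of the file, since it can instead be given to condor_submit with the -password argument.

```
# Placeholder host and credential name; 7512 is an assumed port.
MyProxyHost             = head.example.com:7512
MyProxyCredentialName   = my_credential
# Refresh 10 minutes (600 seconds) before the proxy expires.
MyProxyRefreshThreshold = 600
# Give the refreshed proxy a lifetime of 12 hours (720 minutes).
MyProxyNewProxyLifetime = 720
```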

nordugrid_rsl = <RSL-string>
Used to provide any additional RSL string attributes which are not covered by regular submit description file parameters. Used when the universe is grid, and the type of grid system is nordugrid.

transfer_error = <True | False>
For jobs submitted to the grid universe only. If True, then the error output (from stderr) from the job is transferred from the remote machine back to the submit machine. The name of the file after transfer is given by the error command. If False, no transfer takes place (from the remote machine to submit machine), and the name of the file is given by the error command. The default value is True.

transfer_input = <True | False>
For jobs submitted to the grid universe only. If True, then the job input (stdin) is transferred from the machine where the job was submitted to the remote machine. The name of the file that is transferred is given by the input command. If False, then the job's input is taken from a pre-staged file on the remote machine, and the name of the file is given by the input command. The default value is True.

For transferring files other than stdin, see transfer_input_files.

transfer_output = <True | False>
For jobs submitted to the grid universe only. If True, then the output (from stdout) from the job is transferred from the remote machine back to the submit machine. The name of the file after transfer is given by the output command. If False, no transfer takes place (from the remote machine to submit machine), and the name of the file is given by the output command. The default value is True.

For transferring files other than stdout, see transfer_output_files.
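
As an illustration of the three transfer commands, the sketch below reads stdin from a file that is assumed to be pre-staged on the remote machine, while stdout and stderr are transferred back to the submit machine. The resource name, executable, and file names are all hypothetical.

```
# Hypothetical resource, executable, and file names.
universe        = grid
grid_resource   = gt2 gatekeeper.example.com/jobmanager-pbs
executable      = analyze

# stdin comes from a file already present on the remote machine.
input           = /staged/data/in.dat
transfer_input  = False

# stdout and stderr are transferred back (True is the default).
output          = analyze.out
transfer_output = True
error           = analyze.err
transfer_error  = True

log             = analyze.log
queue
```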

x509userproxy = <full-pathname>
Used to override the default path name for X.509 user certificates. The default location for X.509 proxies is the /tmp directory, which is generally a local file system. Setting this value would allow Condor to access the proxy in a shared file system (for example, AFS). Condor will use the proxy specified in the submit description file first. If nothing is specified in the submit description file, it will use the environment variable X509_USER_PROXY. If that variable is not present, it will search in the default location.

x509userproxy is relevant when the universe is grid, and the type of grid system is one of gt2, gt4, or nordugrid.

COMMANDS FOR PARALLEL, JAVA, and SCHEDULER UNIVERSES

hold_kill_sig = <signal-number>
For the scheduler universe only, signal-number is the signal delivered to the job when the job is put on hold with condor_hold. signal-number may be either the platform-specific name or value of the signal. If this command is not present, the value of kill_sig is used.

jar_files = <file_list>
Specifies a list of additional JAR files to include when using the Java universe. JAR files will be transferred along with the executable and automatically added to the classpath.

java_vm_args = <argument_list>
Specifies a list of additional arguments to the Java VM itself. When Condor runs the Java program, these are the arguments that go before the class name. This can be used to set VM-specific arguments like stack size, garbage-collector arguments and initial property values.
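
A minimal java universe sketch using both commands might read as follows. The class and JAR file names are hypothetical, and the sketch assumes the usual java universe convention that the first argument names the class containing main. The option -Xmx256m is a common JVM flag that caps the heap size.

```
# Hypothetical class and JAR names.
universe     = java
executable   = Main.class
# Assumption: the first argument names the main class.
arguments    = Main
jar_files    = Library.jar
java_vm_args = -Xmx256m
output       = main.out
error        = main.err
log          = main.log
queue
```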

machine_count = <max>
For the parallel universe, a single value (max) is required. It is neither a maximum nor a minimum, but the number of machines to be dedicated toward running the job.
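
For example, a parallel universe job that needs exactly four machines could be sketched as follows. The executable and its argument are placeholders.

```
# Placeholder executable; exactly four machines are dedicated to the job.
universe      = parallel
executable    = /bin/sleep
arguments     = 30
machine_count = 4
log           = parallel.log
queue
```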

remove_kill_sig = <signal-number>
For the scheduler universe only, signal-number is the signal delivered to the job when the job is removed with condor_rm. signal-number may be either the platform-specific name or value of the signal. This example shows it both ways for a Linux signal:
remove_kill_sig = SIGUSR1
remove_kill_sig = 10
If this command is not present, the value of kill_sig is used.

COMMANDS FOR THE VM UNIVERSE

vm_cdrom_files = file1, file2, ...
A comma-separated list of input CD-ROM files.

vm_checkpoint = <True | False>
A boolean value specifying whether or not to take checkpoints. If not specified, the default value is False. In the current implementation, setting both vm_checkpoint and vm_networking to True does not yet work in all cases. Networking cannot be used if a vm universe job uses a checkpoint in order to continue execution after migration to another machine.

vm_macaddr = <MACAddr>
Defines the MAC address that the virtual machine's network interface should have, in the standard format of six groups of two hexadecimal digits separated by colons.

vm_memory = <MBytes-of-memory>
The amount of memory in MBytes that a vm universe job requires.

vm_networking = <True | False>
Specifies whether to use networking or not. In the current implementation, setting both vm_checkpoint and vm_networking to True does not yet work in all cases. Networking cannot be used if a vm universe job uses a checkpoint in order to continue execution after migration to another machine.

vm_networking_type = <nat | bridge>
When vm_networking is True, this definition augments the job's requirements to match only machines with the specified networking. If not specified, then either networking type matches.

vm_no_output_vm = <True | False>
When True, prevents Condor from transferring output files back to the machine from which the vm universe job was submitted. If not specified, the default value is False.

vm_should_transfer_cdrom_files = <True | False>
Specifies whether Condor will transfer CD-ROM files to the execute machine (True) or rely on access through a shared file system (False).

vm_type = <vmware | xen>
Specifies the underlying virtual machine software that this job expects.

vmware_dir = <pathname>
The complete path and name of the directory where VMware-specific files and applications such as the VMDK (Virtual Machine Disk Format) and VMX (Virtual Machine Configuration) reside.

vmware_should_transfer_files = <True | False>
Specifies whether Condor will transfer VMware-specific files located as specified by vmware_dir to the execute machine (True) or rely on access through a shared file system (False). Omission of this required command (for VMware vm universe jobs) results in an error message from condor_submit, and the job will not be submitted.

vmware_snapshot_disk = <True | False>
When True, causes Condor to utilize a VMware snapshot disk for new or modified files. If not specified, the default value is True.
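
The VMware-related commands might be combined as in this sketch. The directory path and the executable name are placeholders, and the sketch assumes that the executable command serves only as a label for a vm universe job.

```
# Placeholder path and label.
universe                     = vm
# Assumption: for vm universe jobs this value only labels the job.
executable                   = vmware_demo_job
vm_type                      = vmware
vm_memory                    = 512
vmware_dir                   = /home/user/vmware_image
# Required for VMware jobs; True transfers the files in vmware_dir.
vmware_should_transfer_files = True
vmware_snapshot_disk         = True
log                          = vmware_demo.log
queue
```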

xen_cdrom_device = <device>
Describes the Xen CD-ROM device when vm_cdrom_files is defined.

xen_disk = file1:device1:permission1, file2:device2:permission2, ...
A comma-separated list of disk files. Each disk file is specified by three colon-separated fields. The first field is the path and file name of the disk file. The second field specifies the device, and the third field specifies permissions.

An example that specifies two disk files:

xen_disk = /myxen/diskfile.img:sda1:w,/myxen/swap.img:sda2:w

xen_initrd = <image-file>
When xen_kernel gives a path and file name for the kernel image to use, this optional command may specify a path to a ramdisk (initrd) image file.

xen_kernel = <included | any | path-to-kernel>
A value of included specifies that the kernel is included in the disk file. A value of any specifies that the kernel is deployed on the execute machine, and its location is given by configuration. If not one of these values, then the value is a path and file name of the kernel to be used.

xen_kernel_params = <string>
A string that is appended to the Xen kernel command line.

xen_root = <string>
A string that is appended to the Xen kernel command line to specify the root device. This string is required when xen_kernel is any or gives a path to a kernel. Omission for this required case results in an error message from condor_submit, and the job will not be submitted.

xen_transfer_files = <list-of-files>
A comma separated list of all files that Condor is to transfer to the execute machine.
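
A Xen job whose kernel is included in the disk image could be sketched as follows. The disk file path and the executable name are placeholders, and the executable is again assumed to act only as a label. Because xen_kernel is set to included, xen_root is not required here.

```
# Placeholder path and label.
universe   = vm
# Assumption: for vm universe jobs this value only labels the job.
executable = xen_demo_job
vm_type    = xen
vm_memory  = 512
xen_kernel = included
xen_disk   = /myxen/diskfile.img:sda1:w
log        = xen_demo.log
queue
```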

ADVANCED COMMANDS

concurrency_limits = <string-list>
A list of resources that this job needs. The resources are presumed to have concurrency limits placed upon them, thereby limiting the number of concurrent jobs in execution which need the named resource. Commas and space characters delimit the items in the list. Each item in the list may specify a numerical value identifying the integer number of resources required for the job. The syntax follows the resource name by a colon character (:) and the numerical value. See section 3.13.12 for details on concurrency limits.
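
As a sketch of the syntax, the following line requests two units of one limited resource and one unit of another. The resource names sw_license and db_connection are hypothetical and would have to match limits defined in the pool's configuration.

```
# Hypothetical resource names; ":2" requests two units of sw_license.
concurrency_limits = sw_license:2, db_connection
```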

copy_to_spool = <True | False>
If copy_to_spool is True, then condor_submit copies the executable to the local spool directory before running it on a remote host. As copying can be quite time consuming and unnecessary, the default value is False for all job universes other than the standard universe. When False, condor_submit does not copy the executable to a local spool directory. The default is True in standard universe, because resuming execution from a checkpoint can only be guaranteed to work using precisely the same executable that created the checkpoint.

coresize = <size>
Should the user's program abort and produce a core file, coresize specifies the maximum size in bytes of the core file which the user wishes to keep. If coresize is not specified in the command file, the system's user resource limit "coredumpsize" is used. This limit is not used in HP-UX and DUX operating systems.

cron_day_of_month = <Cron-evaluated Day>
The set of days of the month for which a deferral time applies. See section 2.12.2 for further details and examples.

cron_day_of_week = <Cron-evaluated Day>
The set of days of the week for which a deferral time applies. See section 2.12.2 for details, semantics, and examples.

cron_hour = <Cron-evaluated Hour>
The set of hours of the day for which a deferral time applies. See section 2.12.2 for details, semantics, and examples.

cron_minute = <Cron-evaluated Minute>
The set of minutes within an hour for which a deferral time applies. See section 2.12.2 for details, semantics, and examples.

cron_month = <Cron-evaluated Month>
The set of months within a year for which a deferral time applies. See section 2.12.2 for details, semantics, and examples.

cron_prep_time = <ClassAd Integer Expression>
Analogous to deferral_prep_time. The number of seconds prior to a job's deferral time that the job may be matched and sent to an execution machine.

cron_window = <ClassAd Integer Expression>
Analogous to the submit command deferral_window. It allows cron jobs that miss their deferral time to begin execution.

See section 2.12.1 for further details and examples.
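
The cron commands might be combined as in the sketch below, which aims for a start at 02:30 on Monday through Friday. It assumes the usual cron field conventions (ranges written as 1-5, with 1 meaning Monday); section 2.12.2 is the authority on the accepted syntax. The window and preparation values are arbitrary illustrations.

```
# Assumes standard cron field syntax; values are illustrative.
cron_minute      = 30
cron_hour        = 2
cron_day_of_week = 1-5
# Allow a start up to 5 minutes late.
cron_window      = 300
# Allow matching up to 2 minutes before the deferral time.
cron_prep_time   = 120
```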

deferral_prep_time = <ClassAd Integer Expression>
The number of seconds prior to a job's deferral time that the job may be matched and sent to an execution machine.

See section 2.12.1 for further details.

deferral_time = <ClassAd Integer Expression>
Allows a job to specify the time at which its execution is to begin, instead of beginning execution as soon as it arrives at the execution machine. The deferral time is an expression that evaluates to a Unix Epoch timestamp (the number of seconds elapsed since 00:00:00 on January 1, 1970, Coordinated Universal Time). Deferral time is evaluated with respect to the execution machine. This option delays the start of execution, but not the matching and claiming of a machine for the job. If the job is not available and ready to begin execution at the deferral time, it has missed its deferral time. A job that misses its deferral time will be put on hold in the queue.

See section 2.12.1 for further details and examples.

Due to implementation details, a deferral time may not be used for scheduler universe jobs.

deferral_window = <ClassAd Integer Expression>
The deferral window is used in conjunction with the deferral_time command to allow jobs that miss their deferral time to begin execution.

See section 2.12.1 for further details and examples.
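
The three deferral commands can be sketched together as follows. The timestamp 1262304000 corresponds to 00:00:00 UTC on January 1, 2010 and is used purely as an illustration; the other two values are arbitrary.

```
# 1262304000 is 00:00:00 UTC on January 1, 2010.
deferral_time      = 1262304000
# Still start the job if it is up to 2 minutes late.
deferral_window    = 120
# Allow matching up to 1 minute before the deferral time.
deferral_prep_time = 60
```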

email_attributes = <list-of-job-ad-attributes>
A comma-separated list of attributes from the job ClassAd. These attributes and their values will be included in the e-mail notification of job completion.

image_size = <size>
This command tells Condor the maximum virtual image size to which you believe your program will grow during its execution. Condor will then execute your job only on machines which have enough resources, (such as virtual memory), to support executing your job. If you do not specify the image size of your job in the description file, Condor will automatically make a (reasonably accurate) estimate about its size and adjust this estimate as your program runs. If the image size of your job is underestimated, it may crash due to inability to acquire more address space, e.g. malloc() fails. If the image size is overestimated, Condor may have difficulty finding machines which have the required resources. size must be in Kbytes, e.g. for an image size of 8 megabytes, use a size of 8000.

initialdir = <directory-path>
Used to give jobs a directory with respect to file input and output. Also provides a directory (on the machine from which the job is submitted) for the user log, when a full path is not specified.

For vanilla universe jobs where there is a shared file system, it is the current working directory on the machine where the job is executed.

For vanilla or grid universe jobs where file transfer mechanisms are utilized (there is not a shared file system), it is the directory on the machine from which the job is submitted where the input files come from, and where the job's output files go to.

For standard universe jobs, it is the directory on the machine from which the job is submitted where the condor_shadow daemon runs; the current working directory for file input and output accomplished through remote system calls.

For scheduler universe jobs, it is the directory on the machine from which the job is submitted where the job runs; the current working directory for file input and output with respect to relative path names.

Note that the path to the executable is not relative to initialdir; if it is a relative path, it is relative to the directory in which the condor_submit command is run.
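
A common use of initialdir is to give each run of the same program its own directory. In the sketch below, the executable name analyze and the directories run_1 and run_2 are hypothetical; each directory is assumed to exist and to contain its own in.dat.

```
# Hypothetical names. The executable path is relative to the directory
# where condor_submit is run, not to initialdir.
universe   = vanilla
executable = analyze
input      = in.dat
output     = out.dat
log        = analyze.log

initialdir = run_1
queue

initialdir = run_2
queue
```

Each queued job reads in.dat from, and writes out.dat and analyze.log to, its own directory.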

job_lease_duration = <number-of-seconds>
For vanilla and java universe jobs only, the duration (in seconds) of a job lease. The default value is twenty minutes for universes that support it. If a job lease is not desired, the value can be explicitly set to 0 to disable the job lease semantics. See section 2.15.4 for details of job leases.

kill_sig = <signal-number>
When Condor needs to kick a job off of a machine, it will send the job the signal specified by signal-number. signal-number needs to be an integer which represents a valid signal on the execution machine. For jobs submitted to the standard universe, the default value is the number for SIGTSTP which tells the Condor libraries to initiate a checkpoint of the process. For jobs submitted to the vanilla universe, the default is SIGTERM which is the standard way to terminate a program in Unix.

load_profile = <True | False>
When True, loads the account profile of the dedicated run account for Windows jobs. May not be used with run_as_owner.

match_list_length = <integer value>
Defaults to the value zero (0). When match_list_length is defined with an integer value greater than zero (0), attributes are inserted into the job ClassAd. The maximum number of attributes defined is given by the integer value. The job ClassAd attributes introduced are given as
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"

The value for each introduced attribute is given by the value of the Name attribute from the machine ClassAd of a previous execution (match). As a job is matched, the definitions for these attributes will roll, with LastMatchName1 becoming LastMatchName2, LastMatchName0 becoming LastMatchName1, and LastMatchName0 being set by the most recent value of the Name attribute.

An intended use of these job attributes is in the requirements expression. The requirements can allow a job to prefer a match with either the same or a different resource than a previous match.
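
As one possible sketch of that use, the following fragment keeps the two most recent match names and asks for a machine that differs from both. It relies on the ClassAd operator =!= (is not identical to), which also gives a definite answer while the LastMatchName attributes are still undefined.

```
# Avoid the two machines on which the job most recently matched.
match_list_length = 2
requirements = (TARGET.Name =!= LastMatchName0) && (TARGET.Name =!= LastMatchName1)
```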

max_job_retirement_time = <integer expression>
An integer-valued expression (in seconds) that does nothing unless the machine that runs the job has been configured to provide retirement time (see section 3.5.8). Retirement time is a grace period given to a job to finish naturally when a resource claim is about to be preempted. No kill signals are sent during a retirement time. The default behavior in many cases is to take as much retirement time as the machine offers, so this command will rarely appear in a submit description file.

When a resource claim is to be preempted, this expression in the submit file specifies the maximum run time of the job (in seconds, since the job started). This expression has no effect, if it is greater than the maximum retirement time provided by the machine policy. If the resource claim is not preempted, this expression and the machine retirement policy are irrelevant. If the resource claim is preempted and the job finishes sooner than the maximum time, the claim closes gracefully and all is well. If the resource claim is preempted and the job does not finish in time, the usual preemption procedure is followed (typically a soft kill signal, followed by some time to gracefully shut down, followed by a hard kill signal).

Standard universe jobs and any jobs running with nice_user priority have a default max_job_retirement_time of 0, so no retirement time is utilized by default. In all other cases, no default value is provided, so the maximum amount of retirement time is utilized by default.

Setting this expression does not affect the job's resource requirements or preferences. For a job to only run on a machine that offers a minimum amount of retirement time, or to preferentially run on such machines, explicitly specify this in the requirements and/or rank expressions.

nice_user = <True | False>
Normally, when a machine becomes available to Condor, Condor decides which job to run based upon user and job priorities. Setting nice_user equal to True tells Condor not to use your regular user priority, but that this job should have last priority among all users and all jobs. So jobs submitted in this fashion run only on machines which no other non-nice_user job wants -- a true "bottom-feeder" job! This is very handy if a user has some jobs they wish to run, but does not wish to use resources that could instead be used to run other people's Condor jobs. Jobs submitted in this fashion have "nice-user." prepended to the owner name when viewed from condor_q or condor_userprio. The default value is False.

noop_job = <ClassAd Boolean Expression>
When this boolean expression is True, the job is immediately removed from the queue, and Condor makes no attempt at running the job. The log file for the job will show a job submitted event and a job terminated event, along with an exit code of 0, unless the user specifies a different signal or exit code.

noop_job_exit_code = <return value>
When noop_job is in the submit description file and evaluates to True, this command allows the job to specify the return value as shown in the job's log file job terminated event. If not specified, the job will show as having terminated with status 0. This overrides any value specified with noop_job_exit_signal.

noop_job_exit_signal = <signal number>
When noop_job is in the submit description file and evaluates to True, this command allows the job to specify the signal number that the job's log event will show the job having terminated with.

remote_initialdir = <directory-path>
The path specifies the directory in which the job is to be executed on the remote machine. This is currently supported in all universes except for the standard universe.

rendezvousdir = <directory-path>
Used to specify the shared file system directory to be used for file system authentication when submitting to a remote scheduler. This should be a path to a preexisting directory.

request_cpus = <num-cpus>
For pools that enable dynamic condor_startd provisioning (see section 3.13.7), the number of CPUs requested for this job.

request_disk = <quantity>
For pools that enable dynamic condor_startd provisioning (see section 3.13.7), the amount of disk space requested for this job.

request_memory = <quantity>
For pools that enable dynamic condor_startd provisioning (see section 3.13.7), the amount of memory space requested for this job.

run_as_owner = <True | False>
For Windows platform jobs only, a boolean value that causes a correctly configured Windows platform machine to execute the job under the login of the submitter. May not be used with load_profile. See section 6.2.4 for administrative details.

+<attribute> = <value>
A line which begins with a '+' (plus) character instructs condor_submit to insert the following attribute into the job ClassAd with the given value.
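
As a minimal sketch of how a few of the commands above might appear together, the following fragment requests resources and inserts a custom attribute into the job ClassAd. The attribute name ProjectName and all of the values are hypothetical. The units assumed here (megabytes for request_memory, kilobytes for request_disk) are not stated in this section and should be checked against the condor_submit manual page.

  # Resource requests; only meaningful in pools that enable
  # dynamic condor_startd provisioning.
  request_cpus   = 2
  # Assumed to be in megabytes.
  request_memory = 1024
  # Assumed to be in kilobytes.
  request_disk   = 2000000

  # Hypothetical custom attribute added to the job ClassAd.
  # String values are enclosed in double quotes.
  +ProjectName   = "protein_folding"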

In addition to commands, the submit description file can contain macros and comments:

Macros
Parameterless macros in the form of $(macro_name) may be inserted anywhere in Condor submit description files. Macros can be defined by lines in the form of
 
        <macro_name> = <string>
Three pre-defined macros are supplied by the submit description file parser. The third of the pre-defined macros is only relevant to MPI applications under the parallel universe. The $(Cluster) macro supplies the value of the ClusterId job ClassAd attribute, and the $(Process) macro supplies the value of the ProcId job ClassAd attribute. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply a Condor process with its own cluster and process numbers on the command line. The $(Node) macro is defined for MPI applications run as parallel universe jobs. It is a unique value assigned for the duration of the job that essentially identifies the machine on which a program is executing.

To use the dollar sign character ($) as a literal, without macro expansion, use

$(DOLLAR)

In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of a ClassAd attribute value defined on the resource machine itself (gotten after a match to the machine has been made) into specific commands within the submit description file. The substitution macro is of the form:

 
$$(attribute)

A common use of this macro is for the heterogeneous submission of an executable:

executable = povray.$$(opsys).$$(arch)
Values for the opsys and arch attributes are substituted at match time for any given resource. This allows Condor to automatically choose the correct executable for the matched machine.

An extension to the syntax of the substitution macro provides an alternative string to use if the machine attribute within the substitution macro is undefined. The syntax appears as:

 
$$(attribute:string_if_attribute_undefined)

An example using this extended syntax provides a path name to a required input file. Since the file can be placed in different locations on different machines, the file's path name is given as an argument to the program.

 
arguments = $$(input_file_path:/usr/foo)
On the machine, if the attribute input_file_path is not defined, then the path /usr/foo is used instead.

A further extension to the syntax of the substitution macro allows the evaluation of a ClassAd expression to define the value. As with all substitution macros, the expression is evaluated after a match has been made. Therefore, the expression may refer to machine attributes by prefacing them with the scope resolution prefix TARGET., as specified in section 4.1.2. To place a ClassAd expression into the substitution macro, square brackets are added to delimit the expression. The syntax appears as:

 
$$([ClassAd expression])
An example of a job that uses this syntax may be one that wants to know how much memory it can use. The application cannot detect this itself, as it would potentially use all of the memory on a multi-slot machine. So the job determines the memory per slot, reducing it by 10% to account for miscellaneous overhead, and passes this as a command line argument to the application. The submit description file would contain
 
arguments=--memory $$([TARGET.Memory * 0.9])

To insert two dollar sign characters ($$) as literals into a ClassAd string, use

$$(DOLLARDOLLAR)

The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is

 
$ENV(variable)
An example submit description file command that uses this functionality evaluates the submitter's home directory in order to set the path and file name of a log file:
 
log = $ENV(HOME)/jobs/logfile
The environment variable is evaluated when the submit description file is processed.

The $RANDOM_CHOICE macro allows a random choice to be made from a given list of parameters at submission time. For an expression, if some randomness needs to be generated, the macro may appear as

 
    $RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the parameter values will be chosen.

Comments
Blank lines and lines beginning with a pound sign ('#') character are ignored by the submit description file parser.
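
A short sketch that combines comments, a user-defined macro, and the pre-defined $(Cluster) and $(Process) macros follows. The macro name run_name and the file names are hypothetical, chosen only for illustration.

  # Lines beginning with '#' are comments and are ignored.
  # run_name is a hypothetical user-defined macro.
  run_name   = trial

  Executable = foo
  Arguments  = $(Cluster) $(Process)
  Output     = $(run_name).$(Cluster).$(Process).out
  Error      = $(run_name).$(Cluster).$(Process).err
  Log        = $(run_name).log
  Queue 3

If this cluster were assigned the ClusterId 42 (an arbitrary number used here only as an example), the three jobs would write their output to trial.42.0.out, trial.42.1.out, and trial.42.2.out, and each job would receive its own cluster and process numbers on the command line.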


2.5.2 Sample submit description files

In addition to the examples of submit description files given in the condor_submit manual page, here are a few more.

2.5.2.1 Example 1

Example 1 is one of the simplest submit description files possible. It queues up one copy of the program foo (which had been created by condor_compile) for execution by Condor. Since no platform is specified, Condor will use its default, which is to run the job on a machine which has the same architecture and operating system as the machine from which it was submitted. No input, output, and error commands are given in the submit description file, so the files stdin, stdout, and stderr will all refer to /dev/null. The program may produce output by explicitly opening a file and writing to it. A log file, foo.log, will also be produced that contains events the job had during its lifetime inside of Condor. When the job finishes, its exit conditions will be noted in the log file. It is recommended that you always have a log file so you know what happened to your jobs.

  ####################                                                    
  # 
  # Example 1                                                            
  # Simple condor job description file                                    
  #                                                                       
  ####################                                                    
                                                                          
  Executable   = foo                                                    
  Universe     = standard                                                    
  Log          = foo.log                                                    
  Queue

2.5.2.2 Example 2

Example 2 queues two copies of the program mathematica. The first copy will run in directory run_1, and the second will run in directory run_2. For both queued copies, stdin will be test.data, stdout will be loop.out, and stderr will be loop.error. There will be two sets of files written, as the files are each written to their own directories. This is a convenient way to organize data if you have a large group of Condor jobs to run. The example file shows program submission of mathematica as a vanilla universe job. This may be necessary if the source and/or object code to program mathematica is not available.

  ####################     
  #                       
  # Example 2: demonstrate use of multiple     
  # directories for data organization.      
  #                                        
  ####################                    
                                         
  Executable = mathematica          
  Universe   = vanilla                   
  input      = test.data                
  output     = loop.out                
  error      = loop.error             
  Log        = loop.log                                                    
                                  
  Initialdir = run_1         
  Queue                         
                               
  Initialdir = run_2      
  Queue
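
The same organization can also be sketched with the $(Process) macro, which avoids repeating the Initialdir and Queue commands for every run. This variant is an illustration rather than part of the original example. Because process numbers start at 0, the directories would need to be named run_0 and run_1 instead of run_1 and run_2.

  Executable = mathematica
  Universe   = vanilla
  input      = test.data
  output     = loop.out
  error      = loop.error
  Log        = loop.log

  # $(Process) is 0 for the first job and 1 for the second,
  # so the jobs run in directories run_0 and run_1.
  Initialdir = run_$(Process)
  Queue 2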

2.5.2.3 Example 3

The submit description file for Example 3 queues 150 runs of program foo which has been compiled and linked for Sun workstations running Solaris 8. This job requires Condor to run the program on machines which have at least 32 megabytes of physical memory, and expresses a preference to run the program on machines with at least 64 megabytes, if such machines are available. It also advises Condor that it will use up to 28 megabytes of memory when running. Each of the 150 runs of the program is given its own process number, starting with process number 0. So, files stdin, stdout, and stderr will refer to in.0, out.0, and err.0 for the first run of the program, in.1, out.1, and err.1 for the second run of the program, and so forth. A log file containing entries about when and where Condor runs, checkpoints, and migrates processes for all the 150 queued programs will be written into the single file foo.log.

  ####################                    
  #
  # Example 3: Show off some fancy features including
  # use of pre-defined macros and logging.
  #
  ####################                                                    

  Executable    = foo                                                    
  Universe      = standard                                                    
  Requirements  = Memory >= 32 && OpSys == "SOLARIS28" && Arch == "SUN4u"
  Rank          = Memory >= 64
  Image_Size    = 28 Meg                                                 

  Error   = err.$(Process)                                                
  Input   = in.$(Process)                                                 
  Output  = out.$(Process)                                                
  Log     = foo.log

  Queue 150


2.5.3 About Requirements and Rank

The requirements and rank commands in the submit description file are powerful and flexible. Using them effectively requires care, and this section presents those details.

Both requirements and rank need to be specified as valid Condor ClassAd expressions; however, default values are set by the condor_submit program if these are not defined in the submit description file. From the condor_submit manual page and the above examples, you see that writing ClassAd expressions is intuitive, especially if you are familiar with the programming language C. There are some pretty nifty expressions you can write with ClassAds. A complete description of ClassAds and their expressions can be found in section 4.1 on page [*].

All of the commands in the submit description file are case insensitive, except for the ClassAd attribute string values. ClassAd attribute names are case insensitive, but ClassAd string values are case preserving.

Note that the comparison operators (<, >, <=, >=, and ==) compare strings case insensitively. The special comparison operators =?= and =!= compare strings case sensitively.
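
The difference can be sketched with two alternative requirements expressions; only one of them would appear in a given submit description file. The string values are chosen for illustration only.

  # Case-insensitive comparison: matches a machine whose OpSys
  # is "LINUX", "Linux", or "linux".
  Requirements = (OpSys == "linux")

  # Case-sensitive comparison: matches only a machine whose OpSys
  # is exactly "LINUX".
  Requirements = (OpSys =?= "LINUX")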

A requirements or rank command in the submit description file may utilize attributes that appear in a machine or a job ClassAd. Within the submit description file (for a job) the prefix MY. (on a ClassAd attribute name) causes a reference to the job ClassAd attribute, and the prefix TARGET. causes a reference to a potential machine or matched machine ClassAd attribute.
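
As a small sketch of the two prefixes used together, the fragment below inserts a hypothetical attribute, MinMemory, into the job ClassAd with the + syntax and then compares it against the Memory attribute of each candidate machine. The attribute name and value are assumptions made for this illustration.

  # Hypothetical job ClassAd attribute, in megabytes.
  +MinMemory   = 64

  # MY. refers to the job ClassAd; TARGET. refers to the machine ClassAd.
  Requirements = (TARGET.Memory >= MY.MinMemory)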

The condor_status command displays statistics about machines within the pool. The -l option displays the machine ClassAd attributes for all machines in the Condor pool. The job ClassAds, if there are jobs in the queue, can be seen with the condor_q -l command. This shows all the defined attributes for current jobs in the queue.

A list of defined ClassAd attributes for job ClassAds is given in the unnumbered Appendix on page [*]. A list of defined ClassAd attributes for machine ClassAds is given in the unnumbered Appendix on page [*].


2.5.3.1 Rank Expression Examples

When considering the match between a job and a machine, rank is used to choose a match from among all machines that satisfy the job's requirements and are available to the user, after accounting for the user's priority and the machine's rank of the job. The rank expressions, simple or complex, define a numerical value that expresses preferences.

The job's rank expression evaluates to one of three values. It can be UNDEFINED, ERROR, or a floating point value. If rank evaluates to a floating point value, the best match will be the one with the largest, positive value. If no rank is given in the submit description file, then Condor substitutes a default value of 0.0 when considering machines to match. If the job's rank of a given machine evaluates to UNDEFINED or ERROR, this same value of 0.0 is used. Therefore, the machine is still considered for a match, but has no rank above any other.

A boolean expression evaluates to the numerical value of 1.0 if true, and 0.0 if false.

The following rank expressions provide examples to follow.

For a job that desires the machine with the most available memory:

   Rank = memory

For a job that prefers to run on a friend's machine on Saturdays and Sundays:

   Rank = ( (clockday == 0) || (clockday == 6) ) \
          && (machine == "friend.cs.wisc.edu")

For a job that prefers to run on one of three specific machines:

   Rank = (machine == "friend1.cs.wisc.edu") || \
          (machine == "friend2.cs.wisc.edu") || \
          (machine == "friend3.cs.wisc.edu")

For a job that wants the machine with the best floating point performance (on Linpack benchmarks):

   Rank = kflops
This particular example highlights a difficulty with rank expression evaluation as currently defined. While all machines have floating point processing ability, not all machines will have the kflops attribute defined. For machines where this attribute is not defined, Rank will evaluate to the value UNDEFINED, and Condor will use a default rank of 0.0 for the machine. The rank expression therefore only distinguishes among machines where the attribute is defined. As a result, the machine with the highest floating point performance may not be the one given the highest rank.

So, it is wise when writing a rank expression to check if the expression's evaluation will lead to the expected resulting ranking of machines. This can be accomplished using the condor_status command with the -constraint argument. This allows the user to see a list of machines that fit a constraint. To see which machines in the pool have kflops defined, use

condor_status -constraint kflops
Alternatively, to see a list of machines where kflops is not defined, use
condor_status -constraint "kflops=?=undefined"
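
One possible way to sidestep the problem within the submit description file itself, offered here only as a sketch, is to require that kflops be defined before ranking on it. The trade-off is that machines without the attribute are no longer considered at all, which may shrink the set of usable machines considerably.

  # Consider only machines that advertise kflops ...
  Requirements = (kflops =!= UNDEFINED)
  # ... and prefer the fastest among them.
  Rank         = kflops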

For a job that prefers specific machines in a specific order:

   Rank = ((machine == "friend1.cs.wisc.edu")*3) + \
          ((machine == "friend2.cs.wisc.edu")*2) + \
           (machine == "friend3.cs.wisc.edu")
If the machine being ranked is "friend1.cs.wisc.edu", then the expression
   (machine == "friend1.cs.wisc.edu")
is true, and gives the value 1.0. The expressions
   (machine == "friend2.cs.wisc.edu")
and
   (machine == "friend3.cs.wisc.edu")
are false, and give the value 0.0. Therefore, rank evaluates to the value 3.0. In this way, machine "friend1.cs.wisc.edu" is ranked higher than machine "friend2.cs.wisc.edu", machine "friend2.cs.wisc.edu" is ranked higher than machine "friend3.cs.wisc.edu", and all three of these machines are ranked higher than others.
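
Working through the same arithmetic for each possible machine gives the complete ordering:

   friend1.cs.wisc.edu   (1.0*3) + (0.0*2) + 0.0 = 3.0
   friend2.cs.wisc.edu   (0.0*3) + (1.0*2) + 0.0 = 2.0
   friend3.cs.wisc.edu   (0.0*3) + (0.0*2) + 1.0 = 1.0
   any other machine     (0.0*3) + (0.0*2) + 0.0 = 0.0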


2.5.4 Submitting Jobs Using a Shared File System

If vanilla, java, or parallel universe jobs are submitted without using the File Transfer mechanism, Condor must use a shared file system to access input and output files. In this case, the job must be able to access the data files from any machine on which it could potentially run.

As an example, suppose a job is submitted from blackbird.cs.wisc.edu, and the job requires a particular data file called /u/p/s/psilord/data.txt. If the job were to run on cardinal.cs.wisc.edu, the file /u/p/s/psilord/data.txt must be available through either NFS or AFS for the job to run correctly.

Condor allows users to ensure their jobs have access to the right shared files by using the FileSystemDomain and UidDomain machine ClassAd attributes. These attributes specify which machines have access to the same shared file systems. All machines that mount the same shared directories in the same locations are considered to belong to the same file system domain. Similarly, all machines that share the same user information (in particular, the same UID, which is important for file systems like NFS) are considered part of the same UID domain.

The default configuration for Condor places each machine in its own UID domain and file system domain, using the full host name of the machine as the name of the domains. So, if a pool does have access to a shared file system, the pool administrator must correctly configure Condor such that all the machines mounting the same files have the same FileSystemDomain configuration. Similarly, all machines that share common user information must be configured to have the same UidDomain configuration.

When a job relies on a shared file system, Condor uses the requirements expression to ensure that the job runs on a machine in the correct UidDomain and FileSystemDomain. In this case, the default requirements expression specifies that the job must run on a machine with the same UidDomain and FileSystemDomain as the machine from which the job is submitted. This default is almost always correct. However, in a pool spanning multiple UidDomains and/or FileSystemDomains, the user may need to specify a different requirements expression to have the job run on the correct machines.

For example, imagine a pool made up of both desktop workstations and a dedicated compute cluster. Most of the pool, including the compute cluster, has access to a shared file system, but some of the desktop machines do not. In this case, the administrators would probably define the FileSystemDomain to be cs.wisc.edu for all the machines that mounted the shared files, and to the full host name for each machine that did not. An example is jimi.cs.wisc.edu.

In this example, a user wants to submit vanilla universe jobs from her own desktop machine (jimi.cs.wisc.edu) which does not mount the shared file system (and is therefore in its own file system domain, in its own world). But, she wants the jobs to be able to run on more than just her own machine (in particular, the compute cluster), so she puts the program and input files onto the shared file system. When she submits the jobs, she needs to tell Condor to send them to machines that have access to that shared data, so she specifies a different requirements expression than the default:

   Requirements = TARGET.UidDomain == "cs.wisc.edu" && \
                  TARGET.FileSystemDomain == "cs.wisc.edu"

WARNING: A job will only run on the machine from which it was submitted if all of the following are true: there is no shared file system, or the Condor pool administrator has not configured the FileSystemDomain setting correctly (the default is that each machine in a pool is in its own file system and UID domain); the job cannot use remote system calls (for example, a vanilla universe job); and the user does not enable Condor's File Transfer mechanism.


2.5.5 Submitting Jobs Without a Shared File System: Condor's File Transfer Mechanism

Condor works well without a shared file system. The user enables the Condor file transfer mechanism when submitting jobs. Condor will transfer any files needed by a job from the machine where the job was submitted into a temporary working directory on the machine where the job is to be executed. Condor executes the job and transfers output back to the submitting machine. The user specifies which files to transfer, and at what point the output files should be copied back to the submitting machine. This specification is done within the job's submit description file.

The default behavior of the file transfer mechanism varies across the different Condor universes, and it differs between UNIX and Windows machines.

2.5.5.1 Default Behavior across Condor Universes and Platforms

For jobs submitted under the standard universe, the existence of a shared file system is not relevant. Access to files (input and output) is handled through Condor's remote system call mechanism. The executable and checkpoint files are transferred automatically, when needed. Therefore, the user does not need to change the submit description file if there is no shared file system.

For the vanilla, java, and parallel universes, access to files (including the executable) through a shared file system is presumed as a default on UNIX machines. If there is no shared file system, then Condor's file transfer mechanism must be explicitly enabled. When submitting a job from a Windows machine, Condor presumes the opposite: no access to a shared file system. It instead enables the file transfer mechanism by default. Submission of a job might need to specify which files to transfer, and/or when to transfer the output files back.

For the grid universe, jobs are to be executed on remote machines, so there would never be a shared file system between machines. See section 5.3.2 for more details.

For the scheduler universe, Condor is only using the machine from which the job is submitted. Therefore, the existence of a shared file system is not relevant.


2.5.5.2 Specifying If and When to Transfer Files

To enable the file transfer mechanism, two commands are placed in the job's submit description file: should_transfer_files and when_to_transfer_output. An example is:

  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT

The should_transfer_files command specifies whether Condor should transfer input files from the submit machine to the remote machine where the job executes. It also specifies whether the output files are transferred back to the submit machine. The command takes on one of three possible values:

  1. YES: Condor always transfers both input and output files.

  2. IF_NEEDED: Condor transfers files if the job is matched with (and to be executed on) a machine in a different FileSystemDomain than the one the submit machine belongs to. If the job is matched with a machine in the local FileSystemDomain, Condor will not transfer files and relies on a shared file system.

  3. NO: Condor's file transfer mechanism is disabled.

The when_to_transfer_output command tells Condor when output files are to be transferred back to the submit machine after the job has executed on a remote machine. The command takes on one of two possible values:

  1. ON_EXIT: Condor transfers output files back to the submit machine only when the job exits on its own.

  2. ON_EXIT_OR_EVICT: Condor will always do the transfer, whether the job completes on its own, is preempted by another job, vacates the machine, or is killed. When the job completes on its own, files are transferred back to the directory where the job was submitted, as expected. For the other cases, files are transferred back at eviction time. These files are placed in the directory defined by the configuration variable SPOOL, not the directory from which the job was submitted. The transferred files are named using the ClusterId and ProcId job ClassAd attributes. The file name takes the form:
       cluster<X>.proc<Y>.subproc0
    
    where <X> is the value of ClusterId, and <Y> is the value of ProcId. As an example, job 735.0 may produce the file
       $(SPOOL)/cluster735.proc0.subproc0
    

    This is only useful if partial runs of the job are valuable. An example of valuable partial runs is when the application produces its own checkpoints.

There is no default value for when_to_transfer_output. If using the file transfer mechanism, this command must be defined. If when_to_transfer_output is specified in the submit description file, but should_transfer_files is not, Condor assumes a value of YES for should_transfer_files.
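
As a sketch, a job whose application writes its own checkpoints might enable transfer on eviction as follows. The executable name my_checkpointing_app and the log file name are hypothetical.

  Executable              = my_checkpointing_app
  Universe                = vanilla
  Log                     = app.log

  should_transfer_files   = YES
  # Also transfer files back if the job is evicted, so that
  # partial results are kept (they are placed under SPOOL).
  when_to_transfer_output = ON_EXIT_OR_EVICT
  Queue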

NOTE: The combination of:

  should_transfer_files = IF_NEEDED
  when_to_transfer_output = ON_EXIT_OR_EVICT
would produce undefined file access semantics. Therefore, this combination is prohibited by condor_submit.
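
By contrast, the following sketch pairs IF_NEEDED with ON_EXIT, which is not the combination prohibited above. With these settings, files would be transferred only when the job is matched with a machine outside the submit machine's FileSystemDomain.

  should_transfer_files   = IF_NEEDED
  when_to_transfer_output = ON_EXIT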

When submitting from a Unix platform, the file transfer mechanism is unused by default. If neither when_to_transfer_output nor should_transfer_files is defined, Condor assumes should_transfer_files = NO.

When submitting from a Windows platform, Condor does not provide any way to use a shared file system for jobs. Therefore, if neither when_to_transfer_output nor should_transfer_files is defined, the file transfer mechanism is enabled by default with the following values:

  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT

2.5.5.3 Specifying What Files to Transfer

If the file transfer mechanism is enabled, Condor will transfer the following files before the job is run on a remote machine.

  1. the executable
  2. the input, as defined with the input command
  3. any jar files (for the Java universe)
If the job requires any other input files, the submit description file should utilize the transfer_input_files command. This comma-separated list specifies any other files that Condor is to transfer to a remote site to set up the execution environment for the job before it is run. These files are placed in the same temporary working directory as the job's executable. At this time, directories cannot be transferred in this way. For example:

  transfer_input_files = file1,file2

As a default, for jobs other than those submitted to the grid universe, any files that are modified or created by the job in the temporary directory at the remote site are transferred back to the machine from which the job was submitted. Most of the time, this is the best option. To restrict the files that are transferred, specify the exact list of files with transfer_output_files. Delimit these file names with a comma. When this list is defined, and any of the files do not exist as the job exits, Condor considers this an error, and re-runs the job.
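
If the list of output files does need to be restricted, the command might look like the following sketch. The file names results.dat and summary.txt are hypothetical.

  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  # Transfer back only these two files from the temporary
  # working directory on the execute machine.
  transfer_output_files   = results.dat, summary.txt

Both files would need to exist in the temporary working directory when the job exits. Otherwise, as described above, Condor treats the missing file as an error.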

WARNING: Do not specify transfer_output_files (for other than grid universe jobs) unless there is a really good reason - it is best to let Condor figure things out by itself based upon what output the job produces.

For grid universe jobs, files to be transferred (other than standard output and standard error) must be specified using transfer_output_files in the submit description file.

2.5.5.4 File Paths for File Transfer

The file transfer mechanism specifies file names and/or paths on both the file system of the submit machine and on the file system of the execute machine. Care must be taken to know which machine (submit or execute) is utilizing the file name and/or path.

Files in the transfer_input_files command are specified as they are accessed on the submit machine. The program (as it executes) accesses files as they are found on the execute machine.

There are three ways to specify files and paths for transfer_input_files:

  1. Relative to the submit directory, if the submit command initialdir is not specified.
  2. Relative to the initial directory, if the submit command initialdir is specified.
  3. Absolute.

Before executing the program, Condor copies the executable, an input file as specified by the submit command input, along with any input files specified by transfer_input_files. All these files are placed into a temporary directory (on the execute machine) in which the program runs. Therefore, the executing program must access input files without paths. Because all transferred files are placed into a single, flat directory, input files must be uniquely named to avoid collision when transferred. A collision causes the last file in the list to overwrite the earlier one.
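
To illustrate the collision, consider the following sketch, in which the directory and file names are hypothetical. Both input files have the name params, so they arrive at the same location in the flat temporary directory and one overwrites the other:

  # Collision: both files arrive on the execute machine as "params".
  transfer_input_files = run_a/params, run_b/params

Giving the files distinct names on the submit machine avoids the problem, and the program then refers to them without paths:

  transfer_input_files = run_a/params_a, run_b/params_b
  Arguments            = params_a params_b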

If the program creates output files during execution, it must create them within the temporary working directory. Condor transfers back all files within the temporary working directory that have been modified or created. To transfer back only a subset of these files, the submit command transfer_output_files is defined. Transfer of files that exist but are not within the temporary working directory is not supported. Condor's behavior in this instance is undefined.

It is okay to create files outside the temporary working directory on the file system of the execute machine (in a directory such as /tmp), if this directory is guaranteed to exist and be accessible on all possible execute machines. However, such a file cannot be transferred back after execution completes.

Here are several examples to illustrate the use of file transfer. The program executable is called my_program, and it uses three command-line arguments as it executes: two input file names and an output file name. The program executable and the submit description file for this job are located in directory /scratch/test.

The directory tree for all these examples:

/scratch/test (directory)
      my_program.condor (the submit description file)
      my_program (the executable)
      files (directory)
          logs2 (directory)
          in1 (file)
          in2 (file)
      logs (directory)

Example 1

This simple example explicitly transfers input files. These input files to be transferred are specified relative to the directory where the job is submitted. The single output file, out1, created when the job is executed will be transferred back into the directory /scratch/test, not the files directory.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs/err.$(cluster)
Output          = logs/out.$(cluster)
Log             = logs/log.$(cluster)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = files/in1, files/in2

Arguments       = in1 in2 out1
Queue

Example 2

This second example is identical to Example 1, except that absolute paths to the input files are specified, instead of relative paths to the input files.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs/err.$(cluster)
Output          = logs/out.$(cluster)
Log             = logs/log.$(cluster)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /scratch/test/files/in1, /scratch/test/files/in2

Arguments       = in1 in2 out1
Queue

Example 3

This third example illustrates the use of the submit command initialdir, and its effect on the paths used for the various files. The expected location of the executable is not affected by the initialdir command. All other files (specified by input, output, transfer_input_files, as well as files modified or created by the job and automatically transferred back) are located relative to the specified initialdir. Therefore, the output file, out1, will be placed in the files directory. Note that the logs2 directory must exist within the files directory for this example to work correctly.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs2/err.$(cluster)
Output          = logs2/out.$(cluster)
Log             = logs2/log.$(cluster)

initialdir      = files

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = in1, in2

Arguments       = in1 in2 out1
Queue
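
The path rule used in Example 3 can be mimicked in a few lines of Python. This is only an illustrative sketch of the rule as stated above, not Condor code: the function name resolve_submit_path and the cluster number 7 are invented for the illustration, and the executable is deliberately left out because initialdir does not affect it.

```python
import posixpath

SUBMIT_DIR = "/scratch/test"  # directory where the job is submitted in these examples

def resolve_submit_path(path, initialdir=""):
    """Return the submit-side location of a file named in the submit file.

    Assumption for this sketch: relative paths are taken relative to
    initialdir, which is itself relative to the submit directory;
    absolute paths are left unchanged.
    """
    base = posixpath.join(SUBMIT_DIR, initialdir)
    return posixpath.normpath(posixpath.join(base, path))

print(resolve_submit_path("in1", "files"))          # /scratch/test/files/in1
print(resolve_submit_path("logs2/err.7", "files"))  # /scratch/test/files/logs2/err.7
print(resolve_submit_path("out1", "files"))         # /scratch/test/files/out1
```

Under these assumptions, the same function with no initialdir reproduces the locations used in Example 1, for example resolve_submit_path("files/in1").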

Example 4 - Illustrates an Error

This example illustrates a job that will fail. The files specified using the transfer_input_files command work correctly (see Example 1). However, relative paths to files in the arguments command cause the executing program to fail. The file system on the submission side may utilize relative paths to files; however, those files are placed into a single, flat, temporary directory on the execute machine.

Note that this specification and submission will cause the job to fail and re-execute.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs/err.$(cluster)
Output          = logs/out.$(cluster)
Log             = logs/log.$(cluster)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = files/in1, files/in2

Arguments       = files/in1 files/in2 files/out1
Queue

This example fails with the following error:

err: files/out1: No such file or directory.
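
To see why the paths in the arguments command fail, it may help to list the names that the transferred files have on the execute machine. The following Python sketch assumes, as described above, that every transferred input file lands in one flat directory under its base name. The helper scratch_names is hypothetical and is not part of Condor.

```python
import posixpath

def scratch_names(transfer_input_files):
    """Return the file names as they would appear in the flat scratch directory.

    Assumption for this sketch: each transferred file keeps only its base
    name, whatever directory it came from on the submit machine.
    """
    paths = [p.strip() for p in transfer_input_files.split(",")]
    return [posixpath.basename(p) for p in paths]

names = scratch_names("files/in1, files/in2")
print(names)                 # ['in1', 'in2']
print("files/in1" in names)  # False
```

The same holds for absolute paths such as /scratch/test/files/in1: only in1 exists in the scratch directory, so the arguments command should name in1, in2, and out1 as in Example 1.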

Example 5 - Illustrates an Error

As with Example 4, this example illustrates a job that will fail. The executing program's use of absolute paths cannot work.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs/err.$(cluster)
Output          = logs/out.$(cluster)
Log             = logs/log.$(cluster)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /scratch/test/files/in1, /scratch/test/files/in2

Arguments = /scratch/test/files/in1 /scratch/test/files/in2 /scratch/test/files/out1
Queue

The job fails with the following error:

err: /scratch/test/files/out1: No such file or directory.

Example 6 - Illustrates an Error

This example illustrates a failure case where the executing program creates an output file in a directory other than the single, flat, temporary directory in which the program executes. The file creation may or may not cause an error, depending on the existence and permissions of the directories on the remote file system.

Further incorrect usage is seen during the attempt to transfer the output file back using the transfer_output_files command. The behavior of Condor for this case is undefined.

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = my_program
Universe        = vanilla
Error           = logs/err.$(cluster)
Output          = logs/out.$(cluster)
Log             = logs/log.$(cluster)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = files/in1, files/in2
transfer_output_files = /tmp/out1

Arguments       = in1 in2 /tmp/out1
Queue

2.5.5.5 Requirements and Rank for File Transfer

The requirements expression for a job must depend on the should_transfer_files command. The job must specify the correct logic to ensure that the job is matched with a resource that meets the file transfer needs. If no requirements expression is in the submit description file, or if the expression specified does not refer to the attributes listed below, condor_submit adds an appropriate clause to the requirements expression for the job. condor_submit appends these clauses with a logical AND, &&, to ensure that the proper conditions are met. Here are the default clauses corresponding to the different values of should_transfer_files:

  1. should_transfer_files = YES results in the addition of the clause (HasFileTransfer). If the job is always going to transfer files, it is required to match with a machine that has the capability to transfer files.

  2. should_transfer_files = NO results in the addition of (TARGET.FileSystemDomain == MY.FileSystemDomain). In addition, Condor automatically adds the FileSystemDomain attribute to the job ad, with whatever string is defined for the condor_schedd to which the job is submitted. If the job is not using the file transfer mechanism, Condor assumes it will need a shared file system, and therefore, a machine in the same FileSystemDomain as the submit machine.

  3. should_transfer_files = IF_NEEDED results in the addition of
      (HasFileTransfer || (TARGET.FileSystemDomain == MY.FileSystemDomain))
    
    If Condor will optionally transfer files, it must require that the machine is either capable of transferring files or in the same file system domain.

To ensure that the job is matched to a machine with enough local disk space to hold all the transferred files, Condor automatically adds the DiskUsage job attribute. This attribute includes the total size of the job's executable and all input files to be transferred. Condor then adds an additional clause to the Requirements expression that states that the remote machine must have at least enough available disk space to hold all these files:

  && (Disk >= DiskUsage)
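
The way these default clauses combine can be sketched in Python. This is a simplified model for illustration, not the logic of condor_submit: it assumes the user's expression does not already refer to HasFileTransfer, FileSystemDomain, or Disk, and the function name effective_requirements is hypothetical.

```python
# Default clauses listed above, keyed by the value of should_transfer_files.
DEFAULT_CLAUSES = {
    "YES": "(HasFileTransfer)",
    "NO": "(TARGET.FileSystemDomain == MY.FileSystemDomain)",
    "IF_NEEDED": "(HasFileTransfer || (TARGET.FileSystemDomain == MY.FileSystemDomain))",
}

def effective_requirements(user_expr, should_transfer_files):
    """Join the user's expression and the default clauses with &&.

    Simplification: the check for attributes the user already references
    is omitted, so the clauses are always appended.
    """
    clauses = []
    if user_expr:
        clauses.append("(" + user_expr + ")")
    clauses.append(DEFAULT_CLAUSES[should_transfer_files])
    clauses.append("(Disk >= DiskUsage)")
    return " && ".join(clauses)

print(effective_requirements('Arch == "INTEL"', "YES"))
# (Arch == "INTEL") && (HasFileTransfer) && (Disk >= DiskUsage)
```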

If should_transfer_files = IF_NEEDED and the job prefers to run on a machine in the local file system domain rather than transfer files (but is still willing to run remotely and transfer files), the rank expression works well. Use:

rank = (TARGET.FileSystemDomain == MY.FileSystemDomain)

The rank expression evaluates to a floating point number, so if other items are considered in ranking the possible machines this job may run on, add the items:

rank = kflops + (TARGET.FileSystemDomain == MY.FileSystemDomain)

The value of kflops can vary widely among machines, so this rank expression will likely not behave as intended: the file system domain comparison contributes at most 1 to the rank, which differences in kflops easily outweigh. To place emphasis on the job running in the same file system domain, but still consider kflops among the machines in the file system domain, weight the part of the rank expression that is matching the file system domains. For example:

rank = kflops + (10000 * (TARGET.FileSystemDomain == MY.FileSystemDomain))
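
A small worked example shows the effect of the weight. The Python sketch below assumes that a true comparison contributes 1 and a false one contributes 0 to the rank, which is what the weighting above relies on. The two machines and their kflops values are invented for illustration.

```python
def rank_unweighted(kflops, same_domain):
    # rank = kflops + (TARGET.FileSystemDomain == MY.FileSystemDomain)
    return kflops + float(same_domain)

def rank_weighted(kflops, same_domain, weight=10000):
    # rank = kflops + (10000 * (TARGET.FileSystemDomain == MY.FileSystemDomain))
    return kflops + weight * float(same_domain)

# Hypothetical machines: (name, kflops, in the submit machine's file system domain)
machines = [("local-slow", 20000, True), ("remote-fast", 25000, False)]

best_unweighted = max(machines, key=lambda m: rank_unweighted(m[1], m[2]))
best_weighted = max(machines, key=lambda m: rank_weighted(m[1], m[2]))

print(best_unweighted[0])  # remote-fast (25000 beats 20001)
print(best_weighted[0])    # local-slow (30000 beats 25000)
```

With these numbers the weight of 10000 is enough to prefer the local machine. A remote machine that leads by more than 10000 kflops would still rank higher, so the weight should be chosen with the spread of kflops values in the pool in mind.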

2.5.6 Environment Variables

The environment under which a job executes often contains information that is potentially useful to the job. Condor allows a user to both set and reference environment variables for a job or job cluster.

Within a submit description file, the user may define environment variables for the job's environment by using the environment command. See the condor_submit manual page at section 9 for more details about this command.

The submitter's entire environment can be copied into the job ClassAd for the job at job submission. The getenv command within the submit description file does this. See the condor_submit manual page at section 9 for more details about this command.

If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
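
The precedence rule can be pictured as a dictionary merge. The following Python sketch is a toy model, not Condor's implementation: the function job_environment and the variable values are hypothetical, and the additional variables that Condor itself sets for a running job are ignored.

```python
def job_environment(submitter_env, environment_cmd, getenv=False):
    """Toy model of the precedence between getenv and environment.

    Start from the submitter's environment only when getenv is true, then
    apply the values given with the environment command so that they win
    on any conflict.
    """
    env = dict(submitter_env) if getenv else {}
    env.update(environment_cmd)
    return env

submitter = {"HOME": "/home/alice", "DATA_DIR": "/old/data"}  # hypothetical values
explicit = {"DATA_DIR": "/scratch/data"}                      # hypothetical values

print(job_environment(submitter, explicit, getenv=True))
# {'HOME': '/home/alice', 'DATA_DIR': '/scratch/data'}
```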

Commands within the submit description file may reference the environment variables of the submitter as a job is submitted. Submit description file commands use $ENV(EnvironmentVariableName) to reference the value of an environment variable. Again, see the condor_submit manual page at section 9 for more details about this usage.

Condor sets several additional environment variables for each executing job that may be useful for the job to reference.

2.5.7 Heterogeneous Submit: Execution on Differing Architectures

If executables are available for the different platforms of machines in the Condor pool, Condor can be allowed the choice of a larger number of machines when allocating a machine for a job. Modifications to the submit description file allow this choice of platforms.

A simplified example is a cross submission. An executable is available for one platform, but the submission is done from a different platform. Given the correct executable, the requirements command in the submit description file specifies the target architecture. For example, a job whose executable is compiled for a Sun 4, submitted from an Intel machine running Linux, would add the requirement

  requirements = Arch == "SUN4x" && OpSys == "SOLARIS251"

Without this requirement, condor_submit will assume that the program is to be executed on a machine with the same platform as the machine where the job is submitted.

Cross submission works for all universes except scheduler and local. See section 5.3.9 for how matchmaking works in the grid universe. The burden is on the user to both obtain and specify the correct executable for the target architecture. To list the architecture and operating systems of the machines in a pool, run condor_status.

2.5.7.1 Vanilla Universe Example for Execution on Differing Architectures

A more complex example of a heterogeneous submission occurs when a job may be executed on many different architectures to gain full use of a diverse architecture and operating system pool. If the executables are available for the different architectures, then a modification to the submit description file will allow Condor to choose an executable after an available machine is chosen.

A special-purpose Machine Ad substitution macro can be used in string attributes in the submit description file. The macro has the form

  $$(MachineAdAttribute)

The $$() instructs Condor to substitute the value of the requested MachineAdAttribute from the machine where the job will be executed.

An example of the heterogeneous job submission has executables available for three platforms: LINUX Intel, Solaris26 Intel, and Solaris 8 Sun. This example uses povray, a popular free rendering engine, to render images.

The substitution macro chooses a specific executable after a platform for running the job is chosen. These executables must therefore be named based on the machine attributes that describe a platform. The executables named

  povray.LINUX.INTEL
  povray.SOLARIS26.INTEL
  povray.SOLARIS28.SUN4u

will work correctly for the macro

  povray.$$(OpSys).$$(Arch)

The executables or links to executables with this name are placed into the initial working directory so that they may be found by Condor. A submit description file that queues three jobs for this example:

  ####################
  #
  # Example of heterogeneous submission
  #
  ####################

  universe     = vanilla
  Executable   = povray.$$(OpSys).$$(Arch)
  Log          = povray.log
  Output       = povray.out.$(Process)
  Error        = povray.err.$(Process)

  Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
                 (Arch == "INTEL" && OpSys == "SOLARIS26") || \
                 (Arch == "SUN4u" && OpSys == "SOLARIS28")

  Arguments    = +W1024 +H768 +Iimage1.pov
  Queue 

  Arguments    = +W1024 +H768 +Iimage2.pov
  Queue 

  Arguments    = +W1024 +H768 +Iimage3.pov
  Queue

These jobs are submitted to the vanilla universe to ensure that once a job is started on a specific platform, it will finish running on that platform. Switching platforms in the middle of job execution cannot work correctly.

There are two common errors made with the substitution macro. The first is the use of a non-existent MachineAdAttribute. If the specified MachineAdAttribute does not exist in the machine's ClassAd, then Condor will place the job in the held state until the problem is resolved.

The second common error occurs due to an incomplete job setup. For example, the submit description file given above specifies three available executables. If one is missing, Condor reports that an executable is missing when it happens to match the job with a resource that requires the missing binary.
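
The substitution itself, and the first of the two errors, can be sketched in Python. This is a toy model for illustration only: the function expand_machine_macros is hypothetical, machine ClassAds are represented as plain dictionaries, and a missing attribute raises KeyError here, whereas Condor places the job in the held state.

```python
import re

def expand_machine_macros(template, machine_ad):
    """Replace each $$(Attr) in template with the value of Attr in machine_ad.

    Raises KeyError when the machine ad lacks the attribute.
    """
    def lookup(match):
        return str(machine_ad[match.group(1)])
    return re.sub(r"\$\$\((\w+)\)", lookup, template)

sun = {"OpSys": "SOLARIS28", "Arch": "SUN4u"}
print(expand_machine_macros("povray.$$(OpSys).$$(Arch)", sun))
# povray.SOLARIS28.SUN4u
```

The resulting name is the file that must exist in the initial working directory for the matched platform, which is why all three executables are needed before the jobs are submitted.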

2.5.7.2 Standard Universe Example for Execution on Differing Architectures

Jobs submitted to the standard universe may produce checkpoints. A checkpoint can then be used to start up and continue execution of a partially completed job. For a partially completed job, the checkpoint and the job are specific to a platform. If the job is migrated to a different machine, correct execution requires that the platform remain the same.

In previous versions of Condor, the author of the heterogeneous submission file would need to write extra policy expressions in the requirements expression to force Condor to choose the same type of platform when continuing a checkpointed job. However, since it is needed in the common case, this additional policy is now automatically added to the requirements expression. The additional expression is added provided the user does not use CkptArch in the requirements expression. Condor will remain backward compatible for those users who have explicitly specified CkptRequirements (implying use of CkptArch) in their requirements expression.

The expression added when the attribute CkptArch is not specified will default to

  # Added by Condor
  CkptRequirements = ((CkptArch == Arch) || (CkptArch =?= UNDEFINED)) && \
                      ((CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED))

  Requirements = (<user specified policy>) && $(CkptRequirements)

The behavior of the CkptRequirements expression and its addition to requirements is as follows. The CkptRequirements expression guarantees correct operation in the two possible cases for a job. In the first case, the job has not produced a checkpoint. The ClassAd attributes CkptArch and CkptOpSys will be undefined, and therefore the meta operator (=?=) evaluates to true. In the second case, the job has produced a checkpoint. The attributes CkptArch and CkptOpSys will be defined, so further execution is restricted to machines of the same platform as the one used just before the checkpoint.

Note that this restriction of platforms also applies to platforms where the executables are binary compatible.

The complete submit description file for this example:

  ####################
  #
  # Example of heterogeneous submission
  #
  ####################

  universe     = standard
  Executable   = povray.$$(OpSys).$$(Arch)
  Log          = povray.log
  Output       = povray.out.$(Process)
  Error        = povray.err.$(Process)

  # Condor automatically adds the correct expressions to ensure that the
  # checkpointed jobs will restart on the correct platform types.
  Requirements = ( (Arch == "INTEL" && OpSys == "LINUX") || \
                   (Arch == "INTEL" && OpSys == "SOLARIS26") || \
                   (Arch == "SUN4u" && OpSys == "SOLARIS28") )

  Arguments    = +W1024 +H768 +Iimage1.pov
  Queue 

  Arguments    = +W1024 +H768 +Iimage2.pov
  Queue 

  Arguments    = +W1024 +H768 +Iimage3.pov
  Queue
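
The two cases handled by the CkptRequirements expression can be checked with a short Python sketch. It is a toy evaluation under simplifying assumptions, not the ClassAd evaluator: ClassAds are plain dictionaries, None stands in for UNDEFINED, and the function name ckpt_requirements is hypothetical.

```python
UNDEFINED = None  # stand-in for the ClassAd value UNDEFINED

def ckpt_requirements(job_ad, machine_ad):
    """Evaluate the CkptRequirements expression for one job and one machine."""
    ckpt_arch = job_ad.get("CkptArch", UNDEFINED)
    ckpt_opsys = job_ad.get("CkptOpSys", UNDEFINED)
    arch_ok = ckpt_arch is UNDEFINED or ckpt_arch == machine_ad["Arch"]
    opsys_ok = ckpt_opsys is UNDEFINED or ckpt_opsys == machine_ad["OpSys"]
    return arch_ok and opsys_ok

linux = {"Arch": "INTEL", "OpSys": "LINUX"}
solaris = {"Arch": "SUN4u", "OpSys": "SOLARIS28"}

fresh_job = {}                                             # no checkpoint yet
resumed_job = {"CkptArch": "INTEL", "CkptOpSys": "LINUX"}  # checkpointed on LINUX Intel

print(ckpt_requirements(fresh_job, solaris))    # True
print(ckpt_requirements(resumed_job, linux))    # True
print(ckpt_requirements(resumed_job, solaris))  # False
```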


condor-admin@cs.wisc.edu