
3.3 Configuration

This section describes how to configure all parts of the Condor system. General information about the configuration files and their syntax is followed by a description of settings that affect all Condor daemons and tools. The settings that control the policy under which Condor will start, suspend, resume, vacate or kill jobs are described in section 3.5 on Startd Policy Configuration.


3.3.1 Introduction to Configuration Files

The Condor configuration files are used to customize how Condor operates at a given site. The basic configuration as shipped with Condor works well for most sites.

Each Condor program will, as part of its initialization process, configure itself by calling a library routine which parses the various configuration files that might be used including pool-wide, platform-specific, machine-specific, and root-owned configuration files. Environment variables may also contribute to the configuration.

The result of configuration is a list of key/value pairs. Each key is a configuration variable name, and each value is a string literal that may utilize macro substitution (as defined below). Note that the string literal value portion of a pair is not an expression, and therefore it is not evaluated. Those configuration variables that express the policy for starting and stopping of jobs appear as expressions in the configuration file. However, these expressions (for configuration) are string literals. At appropriate times, Condor daemons and tools use these strings as expressions, parsing them in order to do evaluation.


3.3.1.1 Ordered Evaluation to Set the Configuration

Multiple files, as well as a program's environment variables, determine the configuration. The order in which attributes are defined is important, as later definitions override existing definitions. The order in which the (multiple) configuration files are parsed is designed to ensure the security of the system. Attributes which must be set a specific way must appear in the last file to be parsed. This prevents both the naive and the malicious Condor user from subverting the system through its configuration. The order in which items are parsed is:

  1. global configuration file
  2. local configuration file
  3. global root-owned configuration file
  4. local root-owned configuration file
  5. specific environment variables prefixed with _CONDOR_

The locations for these files are as given in section 3.2.2.

Some Condor tools utilize environment variables to set their configuration. These tools search for specifically-named environment variables. The variables are prefixed by the string _CONDOR_ or _condor_. The tools strip off the prefix, and utilize what remains as configuration. Because environment variables come last in the ordered evaluation, a definition given in the environment overrides any definition from the configuration files. The security of the system is not compromised, as only specific variables are considered for definition in this manner, not any environment variables with the _CONDOR_ prefix.


3.3.1.2 Configuration File Macros

Macro definitions are of the form:

<macro_name> = <macro_definition>

NOTE: There must be white space between the macro name, the "=" sign, and the macro definition.

Macro invocations are of the form:

$(macro_name)

Macro definitions may contain references to other macros, even ones that are not yet defined (as long as they are eventually defined in the configuration files). All macro expansion is done after all configuration files have been parsed (with the exception of macros that reference themselves, described below).

A = xxx
C = $(A)
is a legal set of macro definitions, and the resulting value of C is xxx. Note that C is actually bound to $(A), not its value.

As a further example,

A = xxx
C = $(A)
A = yyy
is also a legal set of macro definitions, and the resulting value of C is yyy.

A macro may be incrementally defined by invoking itself in its definition. For example,

A = xxx
B = $(A)
A = $(A)yyy
A = $(A)zzz
is a legal set of macro definitions, and the resulting value of A is xxxyyyzzz. Note that invocations of a macro in its own definition are immediately expanded. $(A) is immediately expanded in line 3 of the example. If it were not, then the definition would be impossible to evaluate.
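
As a sketch of how these rules combine, a list can be built up across several definitions while another macro refers to it lazily. The host names and the HOSTALLOW_ADMIN setting are borrowed from other examples in this section, and the comments describe the values the rules above would be expected to produce, not output captured from Condor:

```
ADMIN_MACHINES = condor.cs.wisc.edu
# The self-reference below is expanded immediately, so ADMIN_MACHINES
# now holds both host names.
ADMIN_MACHINES = $(ADMIN_MACHINES), raven.cs.wisc.edu
# HOSTALLOW_ADMIN is bound to $(ADMIN_MACHINES), not to its current
# value; it is expanded only after all files have been parsed.
HOSTALLOW_ADMIN = $(ADMIN_MACHINES)
```

By the same reasoning, B in the example above is bound to $(A) and would be expected to expand to xxxyyyzzz once parsing is complete.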

Recursively defined macros such as

A = $(B)
B = $(A)
are not allowed. They create definitions that Condor refuses to parse.

NOTE: Macros should not be incrementally defined in the LOCAL_ROOT_CONFIG_FILE for security reasons.

All entries in a configuration file must have an operator, which will be an equals sign (=). Identifiers are alphanumerics combined with the underscore character, optionally with a subsystem name and a period as a prefix. As a special case, a line without an operator that begins with a left square bracket will be ignored. The following two-line example treats the first line as a comment, and correctly handles the second line.

[Condor Settings]
my_classad = [ foo=bar ]

To simplify pool administration, any configuration variable name may be prefixed by a subsystem (see the $(SUBSYSTEM) macro in section 3.3.1 for the list of subsystems) and the period (.) character. For configuration variables defined this way, the value is applied to the specific subsystem. For example, the ports that Condor may use can be restricted to a range using the HIGHPORT and LOWPORT configuration variables. If the range of intended ports is different for specific daemons, this syntax may be used.

  MASTER.LOWPORT   = 20000
  MASTER.HIGHPORT  = 20100
  NEGOTIATOR.LOWPORT   =  22000 
  NEGOTIATOR.HIGHPORT  =  22100

Note that all configuration variables may utilize this syntax, but nonsense configuration variables may result. For example, it makes no sense to define

  NEGOTIATOR.MASTER_UPDATE_INTERVAL = 60
since the condor_negotiator daemon does not use the MASTER_UPDATE_INTERVAL variable.

It makes little sense to do so, but Condor will configure correctly with a definition such as

  MASTER.MASTER_UPDATE_INTERVAL = 60
The condor_master uses this configuration variable, and the prefix of MASTER. causes this configuration to be specific to the condor_master daemon.


3.3.1.3 Comments and Line Continuations

A Condor configuration file may contain comments and line continuations. A comment is any line beginning with a "#" character. A continuation is any entry that continues across multiple lines. Line continuation is accomplished by placing the "\" character at the end of any line to be continued onto another. Valid examples of line continuation are

  START = (KeyboardIdle > 15 * $(MINUTE)) && \
  ((LoadAvg - CondorLoadAvg) <= 0.3)
and
  ADMIN_MACHINES = condor.cs.wisc.edu, raven.cs.wisc.edu, \
  stork.cs.wisc.edu, ostrich.cs.wisc.edu, \
  bigbird.cs.wisc.edu
  HOSTALLOW_ADMIN = $(ADMIN_MACHINES)

Note that a line continuation character may currently be used within a comment, so the following example does not set the configuration variable FOO:

  # This comment includes the following line, so FOO is NOT set \
  FOO = BAR
It is a poor idea to use this functionality, as it is likely to stop working in future Condor releases.


3.3.1.4 Executing a Program to Produce Configuration Macros

Instead of reading from a file, Condor may run a program to obtain configuration macros. The vertical bar character (|) as the last character defining a file name provides the syntax necessary to tell Condor to run a program. This syntax may only be used in the definition of the CONDOR_CONFIG environment variable, the LOCAL_CONFIG_FILE configuration variable, or the LOCAL_ROOT_CONFIG_FILE configuration variable.

The command line for the program is formed by the characters preceding the vertical bar character. The standard output of the program is parsed as a configuration file would be.

An example:

LOCAL_CONFIG_FILE = /bin/make_the_config|

Program /bin/make_the_config is executed, and its output is the set of configuration macros.

Note that the configuration is either generated by an executed program or read from one or more files, but not both. When a program is executed, space characters separate the elements of its command line; otherwise, space characters separate the files in the list. The syntax cannot distinguish one use from the other, so only one may be specified.
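
A minimal sketch of both forms follows. The program name comes from the example above and the directory path from a later example in this manual; the file name common.local and the --pool argument are hypothetical, shown only to illustrate that the text before the vertical bar is treated as a command line. Because only one form may be specified, the second definition is left commented out:

```
# Form 1: read configuration from a list of files.
LOCAL_CONFIG_FILE = /full/path/to/configs/common.local, /full/path/to/configs/$(HOSTNAME).local

# Form 2: run a program and parse its standard output instead.
# The --pool argument is an assumed, illustrative option.
# LOCAL_CONFIG_FILE = /bin/make_the_config --pool main|
```

Whatever the program prints is parsed as a configuration file would be, so its output would consist of ordinary macro definitions and comments.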


3.3.1.5 Pre-Defined Macros

Condor provides pre-defined macros that help configure Condor. Pre-defined macros are listed as $(macro_name).

This first set are entries whose values are determined at run time and cannot be overwritten. These are inserted automatically by the library routine which parses the configuration files.

$(FULL_HOSTNAME)
The fully qualified hostname of the local machine (hostname plus domain name).

$(HOSTNAME)
The hostname of the local machine (no domain name).

$(IP_ADDRESS)
The ASCII string version of the local machine's IP address.

$(TILDE)
The full path to the home directory of the Unix user condor, if such a user exists on the local machine.

$(SUBSYSTEM)
The subsystem name of the daemon or tool that is evaluating the macro. This is a unique string which identifies a given daemon within the Condor system. The possible subsystem names are:

This second set of macros are entries whose default values are determined automatically at runtime but which can be overwritten.

$(ARCH)
Defines the string used to identify the architecture of the local machine to Condor. The condor_startd will advertise itself with this attribute so that users can submit binaries compiled for a given platform and force them to run on the correct machines. condor_submit will append a requirement to the job ClassAd that it must run on the same ARCH and OPSYS of the machine where it was submitted, unless the user specifies ARCH and/or OPSYS explicitly in their submit file. See the condor_submit manual page for details.

$(OPSYS)
Defines the string used to identify the operating system of the local machine to Condor. If it is not defined in the configuration file, Condor will automatically insert the operating system of this machine as determined by uname.

$(UNAME_ARCH)
The architecture as reported by uname(2)'s machine field. Always the same as ARCH on Windows.

$(UNAME_OPSYS)
The operating system as reported by uname(2)'s sysname field. Always the same as OPSYS on Windows.

$(PID)
The process ID for the daemon or tool.

$(PPID)
The process ID of the parent process for the daemon or tool.

$(USERNAME)
The user name of the UID of the daemon or tool. It is useful for setting GRIDMANAGER_LOG, as that needs to be done on a per-user basis. For daemons started as root, but running under another UID (typically the user condor), this will be the other UID.

$(FILESYSTEM_DOMAIN)
Defaults to the fully qualified hostname of the machine it is evaluated on. See section 3.3.7, Shared File System Configuration File Entries for the full description of its use and under what conditions you would want to change it.

$(UID_DOMAIN)
Defaults to the fully qualified hostname of the machine it is evaluated on. See section 3.3.7 for the full description of this configuration variable.

Since $(ARCH) and $(OPSYS) will automatically be set to the correct values, we recommend that you do not overwrite them. Only do so if you know what you are doing.
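
The following sketch shows how these macros might be combined to build per-machine and per-user values. GRIDMANAGER_LOG, EMAIL_DOMAIN, and LOCAL_CONFIG_FILE are used because this manual names them as natural uses of these macros; the /tmp location and the GridmanagerLog file name are assumptions made for illustration:

```
# One local configuration file per machine, named by host name.
LOCAL_CONFIG_FILE = /full/path/to/configs/$(HOSTNAME).local

# One log file per user; directory and file name are assumed.
GRIDMANAGER_LOG = /tmp/GridmanagerLog.$(USERNAME)

# Address job email to the submitting machine.
EMAIL_DOMAIN = $(FULL_HOSTNAME)
```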


3.3.2 The Special Configuration Macros $ENV(), $RANDOM_CHOICE(), and $RANDOM_INTEGER()

References to the Condor process's environment are allowed in the configuration files. Environment references use the ENV macro and are of the form:

  $ENV(environment_variable_name)
For example,
  A = $ENV(HOME)
binds A to the value of the HOME environment variable. Environment references are not currently used in standard Condor configurations. However, they can sometimes be useful in custom configurations.

This same syntax is used in the RANDOM_CHOICE() macro to allow a random choice of a parameter within a configuration file. These references are of the form:

  $RANDOM_CHOICE(list of parameters)
This allows a random choice within the parameter list to be made at configuration time. Of the list of parameters, one is chosen when encountered during configuration. For example, if one of the integers 0-8 (inclusive) should be randomly chosen, the macro usage is
  $RANDOM_CHOICE(0,1,2,3,4,5,6,7,8)

The RANDOM_INTEGER() macro is similar to the RANDOM_CHOICE() macro, and is used to select a random integer within a configuration file. References are of the form:

  $RANDOM_INTEGER(min, max [, step])
A random integer within the range min and max, inclusive, is selected at configuration time. The optional step parameter controls the stride within the range, and it defaults to the value 1. For example, to randomly choose an even integer in the range 0-8 (inclusive), the macro usage is
  $RANDOM_INTEGER(0, 8, 2)

See section 7.2 for an actual use of this specialized macro.
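
As an illustration of the two macros side by side, the following sketch uses hypothetical variable names (MY_EVEN and MY_EVEN_BY_LIST are not Condor settings). With a minimum of 0, a maximum of 8, and a step of 2, the candidate values are 0, 2, 4, 6, and 8, so both definitions draw from the same set:

```
MY_EVEN         = $RANDOM_INTEGER(0, 8, 2)
MY_EVEN_BY_LIST = $RANDOM_CHOICE(0,2,4,6,8)
```

Because the choice is made at configuration time, separate daemons or machines reading the same file would presumably end up with different values, which is what makes these macros useful for spreading a setting across a pool.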


3.3.3 Condor-wide Configuration File Entries

This section describes settings which affect all parts of the Condor system. Other system-wide settings can be found in section 3.3.6 on "Network-Related Configuration File Entries", and section 3.3.7 on "Shared File System Configuration File Entries".

CONDOR_HOST
This macro may be used to define the $(NEGOTIATOR_HOST) and is used to define the $(COLLECTOR_HOST) macro. Normally the condor_collector and condor_negotiator would run on the same machine. If for some reason they were not run on the same machine, $(CONDOR_HOST) would not be needed. Some of the host-based security macros use $(CONDOR_HOST) by default. See section 3.6.10, on Setting up IP/host-based security in Condor, for details.

COLLECTOR_HOST
The hostname of the machine where the condor_collector is running for your pool. Normally, it is defined relative to the $(CONDOR_HOST) macro. There is no default value for this macro; COLLECTOR_HOST must be defined for the pool to work properly.

In addition to defining the hostname, this setting can optionally be used to specify the network port of the condor_collector. The port is separated from the hostname by a colon (':'). For example,

    COLLECTOR_HOST = $(CONDOR_HOST):1234
If no port is specified, the default port of 9618 is used. Using the default port is recommended for most sites. It should be changed only if there is a conflict with another service listening on the same network port. For more information about specifying a non-standard port for the condor_collector daemon, see section 3.7.1.
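
A minimal sketch of the usual arrangement, with the central manager's host name chosen purely for illustration:

```
# Hypothetical central manager for the pool.
CONDOR_HOST = condor.cs.wisc.edu

# No port is given, so the default port (9618) is used.
COLLECTOR_HOST = $(CONDOR_HOST)

# Alternative with an explicit, non-default port:
# COLLECTOR_HOST = $(CONDOR_HOST):1234
```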

NEGOTIATOR_HOST
This configuration variable is no longer used. Since version 6.7.4, the port where the condor_negotiator listens is normally dynamically allocated.

For pools running 6.7.3 and older versions: The host name of the machine where the condor_negotiator is running for the pool. Normally, it is defined relative to the $(CONDOR_HOST) macro. There is no default value for this macro; NEGOTIATOR_HOST must be defined for the pool to work properly. This variable may also be used to optionally define a network port for the condor_negotiator daemon, as explained for the COLLECTOR_HOST variable.

CONDOR_VIEW_HOST
The hostname of the machine where the CondorView server is running. This service is optional, and requires additional configuration if you want to enable it. There is no default value for CONDOR_VIEW_HOST. If CONDOR_VIEW_HOST is not defined, no CondorView server is used. See section 3.12.5 for more details.

SCHEDD_HOST
The hostname of the machine where the condor_schedd is running for your pool. This is the host that queues submitted jobs. Note that, in most Condor installations, there is a condor_schedd running on each host from which jobs are submitted. The default value of SCHEDD_HOST is the current host. For most pools, this macro is not defined.

RELEASE_DIR
The full path to the Condor release directory, which holds the bin, etc, lib, and sbin directories. Other macros are defined relative to this one. There is no default value for RELEASE_DIR.

BIN
This directory points to the Condor directory where user-level programs are installed. It is usually defined relative to the $(RELEASE_DIR) macro. There is no default value for BIN.

LIB
This directory points to the Condor directory where libraries used to link jobs for Condor's standard universe are stored. The condor_compile program uses this macro to find these libraries, so it must be defined for condor_compile to function. $(LIB) is usually defined relative to the $(RELEASE_DIR) macro, and has no default value.

LIBEXEC
This directory points to the Condor directory where support commands that Condor needs will be placed. Do not add this directory to a user or system-wide path.

INCLUDE
This directory points to the Condor directory where header files reside. $(INCLUDE) would usually be defined relative to the $(RELEASE_DIR) configuration macro. There is no default value, but if defined, it can make inclusion of necessary header files for compilation of programs (such as those programs that use libcondorapi.a) easier through the use of condor_config_val.

SBIN
This directory points to the Condor directory where Condor's system binaries (such as the binaries for the Condor daemons) and administrative tools are installed. Whatever directory $(SBIN) points to ought to be in the PATH of users acting as Condor administrators. SBIN has no default value.

LOCAL_DIR
The location of the local Condor directory on each machine in your pool. One common option is to use the condor user's home directory, which may be specified with $(TILDE). There is no default value for LOCAL_DIR. For example:
    LOCAL_DIR = $(tilde)

On machines with a shared file system, where either the $(TILDE) directory or another directory you want to use is shared among all machines in your pool, you might use the $(HOSTNAME) macro and have a directory with many subdirectories, one for each machine in your pool, each named by host names. For example:

    LOCAL_DIR = $(tilde)/hosts/$(hostname)
or:
    LOCAL_DIR = $(release_dir)/hosts/$(hostname)

LOG
Used to specify the directory where each Condor daemon writes its log files. The names of the log files themselves are defined with other macros, which use the $(LOG) macro by default. The log directory also acts as the current working directory of the Condor daemons as they run, so if one of them should produce a core file for any reason, it would be placed in the directory defined by this macro. LOG is required to be defined. Normally, $(LOG) is defined in terms of $(LOCAL_DIR).

SPOOL
The spool directory is where certain files used by the condor_schedd are stored, such as the job queue file and the initial executables of any jobs that have been submitted. In addition, for systems not using a checkpoint server, all the checkpoint files from jobs that have been submitted from a given machine will be stored in that machine's spool directory. Therefore, you will want to ensure that the spool directory is located on a partition with enough disk space. If a given machine is only set up to execute Condor jobs and not submit them, it would not need a spool directory (or this macro defined). There is no default value for SPOOL, and the condor_schedd will not function without SPOOL defined. Normally, $(SPOOL) is defined in terms of $(LOCAL_DIR).

EXECUTE
This directory acts as the current working directory of any Condor job that is executing on the local machine. If a given machine is set up only to submit jobs and not execute them, it would not need an execute directory (or this macro defined). There is no default value for EXECUTE, and the condor_startd will not function if EXECUTE is not defined. Normally, $(EXECUTE) is defined in terms of $(LOCAL_DIR).
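
The directory macros above are typically chained together. The sketch below assumes a release directory of /usr/local/condor and local subdirectories named log, spool, and execute; these are illustrative choices, not values stated by this manual:

```
RELEASE_DIR = /usr/local/condor
BIN         = $(RELEASE_DIR)/bin
LIB         = $(RELEASE_DIR)/lib
SBIN        = $(RELEASE_DIR)/sbin

LOCAL_DIR   = $(TILDE)
LOG         = $(LOCAL_DIR)/log
SPOOL       = $(LOCAL_DIR)/spool
EXECUTE     = $(LOCAL_DIR)/execute
```

With a layout like this, moving a machine's local state to another partition would require changing only LOCAL_DIR.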

LOCAL_CONFIG_FILE
Identifies the location of the local, machine-specific configuration file for each machine in the pool. The two most common choices would be putting this file in the $(LOCAL_DIR), or putting all local configuration files for the pool in a shared directory, each one named by host name. For example,
    LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
or,
    LOCAL_CONFIG_FILE = $(release_dir)/etc/$(hostname).local
or, not using the release directory
    LOCAL_CONFIG_FILE = /full/path/to/configs/$(hostname).local

The value of $(LOCAL_CONFIG_FILE) is treated as a list of files, not a single file. The items in the list are delimited by either commas or space characters. This allows the specification of multiple files as the local configuration file, each one processed in the order given (with parameters set in later files overriding values from previous files). A pool can therefore use one global configuration file for multiple platforms, a platform-specific configuration file for each platform, and a local configuration file for each machine; for more information, see section 3.12.2 about Configuring Condor for Multiple Platforms. If the list of files is changed in one of the later read files, the new list replaces the old list, but any files that have already been processed remain processed, and are removed from the new list if they are present to prevent cycles. See section 3.3.1 for directions on using a program to generate the configuration macros that would otherwise reside in one or more files as described here. If LOCAL_CONFIG_FILE is not defined, no local configuration files are processed.

REQUIRE_LOCAL_CONFIG_FILE
Beginning in Condor 6.5.5, it is permissible for files listed in LOCAL_CONFIG_FILE to be missing. This is most useful for sites that have large numbers of machines in the pool, and a local configuration file that uses the $(HOSTNAME) macro in its definition. Instead of having an empty file for every host in the pool, files can simply be omitted. The default value is True, causing Condor to exit with an error, if a file listed as the local configuration file cannot be read. A value of False allows local configuration files to be missing.

LOCAL_CONFIG_DIR
Beginning in Condor 6.7.18, a directory may be used as a container for local configuration files. The files found in the directory are sorted into lexicographical order, and then each file is treated as though it was listed in LOCAL_CONFIG_FILE. LOCAL_CONFIG_DIR is processed before any files listed in LOCAL_CONFIG_FILE, and is checked again after processing the LOCAL_CONFIG_FILE list. It is a list of directories, and each directory is processed in the order it appears in the list. The process is not recursive, so any directories found inside the directory being processed are ignored.
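
The following sketch combines LOCAL_CONFIG_FILE, REQUIRE_LOCAL_CONFIG_FILE, and LOCAL_CONFIG_DIR. The file and directory names are hypothetical; the platform-specific file name built from $(OPSYS) and $(ARCH) is one possible convention, not one prescribed here:

```
# Processed in order; later files override earlier ones.
LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(OPSYS).$(ARCH).local, \
  $(RELEASE_DIR)/etc/$(HOSTNAME).local

# Do not treat a missing per-host file as an error.
REQUIRE_LOCAL_CONFIG_FILE = False

# Every file in this directory is read in lexicographical order.
LOCAL_CONFIG_DIR = $(LOCAL_DIR)/config.d
```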

LOCAL_ROOT_CONFIG_FILE
A comma or space separated list of path and file names specifying the local, root configuration files.

CONDOR_IDS
The User ID (UID) and Group ID (GID) pair that the Condor daemons should run as, if the daemons are spawned as root. This value can also be specified in the CONDOR_IDS environment variable. If the Condor daemons are not started as root, then neither this CONDOR_IDS configuration macro nor the CONDOR_IDS environment variable are used. The value is given by two integers, separated by a period. For example, CONDOR_IDS = 1234.1234. If this pair is not specified in either the configuration file or in the environment, and the Condor daemons are spawned as root, then Condor will search for a condor user on the system, and run as that user's UID and GID. See section 3.6.12 on UIDs in Condor for more details.

CONDOR_ADMIN
The email address that Condor will send mail to if something goes wrong in your pool. For example, if a daemon crashes, the condor_master can send an obituary to this address with the last few lines of that daemon's log file and a brief message that describes what signal or exit status that daemon exited with. There is no default value for CONDOR_ADMIN.

CONDOR_SUPPORT_EMAIL
The email address to be included at the bottom of all email Condor sends out under the label "Email address of the local Condor administrator:". This is the address where Condor users at your site should send their questions about Condor and get technical support. If this setting is not defined, Condor will use the address specified in CONDOR_ADMIN (described above).

MAIL
The full path to a mail sending program that uses -s to specify a subject for the message. On all platforms, the default shipped with Condor should work. Only if you installed things in a non-standard location on your system would you need to change this setting. There is no default value for MAIL, and the condor_schedd will not function unless MAIL is defined.
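
A sketch of the three mail-related settings together, using placeholder addresses and an assumed location for the mail program:

```
# Receives obituaries and other problem reports.
CONDOR_ADMIN = condor-admin@your.domain

# Shown to users at the bottom of Condor email; falls back to
# CONDOR_ADMIN when not defined.
CONDOR_SUPPORT_EMAIL = condor-help@your.domain

# Must accept -s to set the subject; the path is an assumption.
MAIL = /usr/bin/mail
```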

RESERVED_SWAP
Determines how much swap space you want to reserve for your own machine. Condor will not start up more condor_shadow processes if the amount of free swap space on your machine falls below this level. RESERVED_SWAP is specified in megabytes. The default value of RESERVED_SWAP is 5 megabytes.

RESERVED_DISK
Determines how much disk space you want to reserve for your own machine. When Condor is reporting the amount of free disk space in a given partition on your machine, it will always subtract this amount. An example is the condor_startd, which advertises the amount of free space in the $(EXECUTE) directory. The default value of RESERVED_DISK is zero.

LOCK
Condor needs to create lock files to synchronize access to various log files. Because of problems with network file systems and file locking over the years, we highly recommend that you put these lock files on a local partition on each machine. If you do not have your $(LOCAL_DIR) on a local partition, be sure to change this entry.

Whatever user or group Condor is running as needs to have write access to this directory. If you are not running as root, this is whatever user you started up the condor_master as. If you are running as root, and there is a condor account, it is most likely condor. Otherwise, it is whatever you set in the CONDOR_IDS environment variable, or whatever you define in the CONDOR_IDS setting in the Condor config files. See section 3.6.12 on UIDs in Condor for details.

If no value for LOCK is provided, the value of LOG is used.

HISTORY
Defines the location of the Condor history file, which stores information about all Condor jobs that have completed on a given machine. This macro is used by both the condor_schedd, which appends the information, and condor_history, the user-level program used to view the history file. This configuration macro is given the default value of $(SPOOL)/history in the default configuration. If not defined, no history file is kept.

ENABLE_HISTORY_ROTATION
If this is defined to be true, then the history file will be rotated. If it is false, then it will not be rotated, and it will grow indefinitely, to the limits allowed by the operating system. If this is not defined, it is assumed to be true. The rotated files will be stored in the same directory as the history file.

MAX_HISTORY_LOG
Defines the maximum size for the history file, in bytes. It defaults to 20MB. This parameter is only used if history file rotation is enabled.

MAX_HISTORY_ROTATIONS
When history file rotation is turned on, this controls how many backup files there are. It defaults to 2, which means that there may be up to three history files (two backups, plus the history file that is being currently written to). When the history file is rotated, and this rotation would cause the number of backups to be too large, the oldest file is removed.
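
As a sketch, the following settings spell out the documented defaults explicitly. The byte count is an assumed rendering of 20MB; the exact default value in bytes is not stated here:

```
HISTORY                 = $(SPOOL)/history
ENABLE_HISTORY_ROTATION = True
# Approximately 20MB, expressed in bytes.
MAX_HISTORY_LOG         = 20000000
MAX_HISTORY_ROTATIONS   = 2
```

Under these settings the history data would occupy at most about three times MAX_HISTORY_LOG (the current file plus two backups), or roughly 60MB.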

MAX_JOB_QUEUE_LOG_ROTATIONS
The schedd periodically rotates the job queue database file in order to save disk space. This option controls how many rotated files are saved. It defaults to 1, which means there may be up to two job queue files (the previous one, which was rotated out of use, and the current one that is being written to). When the job queue file is rotated, and this rotation would cause the number of backups to be larger than the maximum specified, the oldest file is removed. The primary reason to save one or more rotated job queue files is if you are using Quill, and you want to ensure that Quill keeps an accurate history of all events logged in the job queue file. Quill keeps track of where it last left off when reading logged events, so when the file is rotated, Quill will resume reading from where it last left off, provided that the rotated file still exists. If Quill finds that it needs to read events from a rotated file that has been deleted, it will be forced to skip the missing events and resume reading in the next chronological job queue file that can be found. Such an event should not lead to an inconsistency in Quill's view of the current queue contents, but it would create an inconsistency in Quill's record of the history of the job queue.

DEFAULT_DOMAIN_NAME
If you do not use a fully qualified name in file /etc/hosts (or NIS, etc.) for either your official hostname or as an alias, Condor would not normally be able to use fully qualified names in places that it wants to. You can set this macro to the domain to be appended to your hostname, if changing your host information is not a good option. This macro must be set in the global configuration file (not the $(LOCAL_CONFIG_FILE)). The reason for this is that the configuration file code in Condor needs to know the full hostname in order to set the special $(FULL_HOSTNAME) macro. So, for $(DEFAULT_DOMAIN_NAME) to take effect, Condor must already have read in its value. However, Condor must set the $(FULL_HOSTNAME) special macro before reading the local configuration file, since you might use that macro to define where your local configuration file is. After reading the global configuration file, Condor figures out the right values for $(HOSTNAME) and $(FULL_HOSTNAME) and inserts them into its configuration table.

NO_DNS
A boolean value that defaults to False. When True, Condor constructs host names using the host's IP address together with the value defined for DEFAULT_DOMAIN_NAME.

CM_IP_ADDR
If neither COLLECTOR_HOST nor COLLECTOR_IP_ADDR macros are defined, then this macro will be used to determine the IP address of the central manager (collector daemon). This macro is defined by an IP address.

EMAIL_DOMAIN
By default, if a user does not specify notify_user in the submit description file, any email Condor sends about that job will go to "username@UID_DOMAIN". If your machines all share a common UID domain (so that you would set UID_DOMAIN to be the same across all machines in your pool), but email to user@UID_DOMAIN is not the right place for Condor to send email for your site, you can define the default domain to use for email. A common example would be to set EMAIL_DOMAIN to the fully qualified hostname of each machine in your pool, so users submitting jobs from a specific machine would get email sent to user@machine.your.domain, instead of user@your.domain. You would do this by setting EMAIL_DOMAIN to $(FULL_HOSTNAME). In general, you should leave this setting commented out unless two things are true: 1) UID_DOMAIN is set to your domain, not $(FULL_HOSTNAME), and 2) email to user@UID_DOMAIN will not work.
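
For instance, a pool that meets both conditions might use the following sketch, where your.domain stands in for the site's real domain:

```
# All machines share one UID domain.
UID_DOMAIN = your.domain

# Mail to user@your.domain is not deliverable at this site, so
# address mail to the submitting machine instead.
EMAIL_DOMAIN = $(FULL_HOSTNAME)
```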

CREATE_CORE_FILES
Defines whether or not Condor daemons are to create a core file in the LOG directory if something really bad happens. It is used to set the resource limit for the size of a core file. If not defined, it leaves in place whatever limit was in effect when the Condor daemons (normally the condor_master) were started. This allows Condor to inherit the default system core file generation behavior at startup. For Unix operating systems, this behavior can be inherited from the parent shell, or specified in a shell script that starts Condor. If this parameter is set and True, the limit is increased to the maximum. If it is set to False, the limit is set at 0 (which means that no core files are created). Core files greatly help the Condor developers debug any problems you might be having. By using the parameter, you do not have to worry about tracking down where in your boot scripts you need to set the core limit before starting Condor. You set the parameter to whatever behavior you want Condor to enforce. This parameter defaults to undefined to allow the initial operating system default value to take precedence, and is commented out in the default configuration file.

Q_QUERY_TIMEOUT
Defines the timeout (in seconds) that condor_q uses when trying to connect to the condor_schedd. Defaults to 20 seconds.

PASSWD_CACHE_REFRESH
Condor can cause NIS servers to become overwhelmed by queries for UID and group information in large pools. To avoid this problem, Condor caches UID and group information internally. This setting allows pool administrators to specify (in seconds) how long Condor should wait before it refreshes a cache entry. The default is 300 seconds (5 minutes). This means that if a pool administrator updates the user or group database (for example, /etc/passwd or /etc/group), it can take up to 5 minutes before Condor will have the updated information. This caching feature can be disabled by setting the refresh interval to 0. In addition, the cache can be flushed explicitly by running the command
    condor_reconfig -full
This configuration variable has no effect on Windows.
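
A sketch of the two documented choices follows. The first line simply restates the default, and only one of the two settings would be active in a real configuration file:

    # Refresh cached UID and group entries every 5 minutes (the default)
    PASSWD_CACHE_REFRESH = 300
    # Alternatively, disable the cache entirely
    # PASSWD_CACHE_REFRESH = 0
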
SYSAPI_GET_LOADAVG
If set to False, then Condor will not attempt to compute the load average on the system, and instead will always report the system load average to be 0.0. Defaults to True.

NETWORK_MAX_PENDING_CONNECTS
This specifies a limit to the maximum number of simultaneous network connection attempts. This is primarily relevant to the condor_schedd, which may try to connect to large numbers of startds when claiming them. The negotiator may also connect to large numbers of startds when initiating security sessions used for sending MATCH messages. On Unix, the default for this parameter is eighty percent of the process file descriptor limit. On Windows, the default is 1600.


3.3.4 Daemon Logging Configuration File Entries

These entries control how and where the Condor daemons write their log files. Each of the entries in this section represents multiple macros. There is one for each subsystem (listed in section 3.3.1). The macro name for each substitutes <SUBSYS> with the name of the subsystem corresponding to the daemon.

<SUBSYS>_LOG
The name of the log file for a given subsystem. For example, $(STARTD_LOG) gives the location of the log file for condor_startd.

The actual names of the files are also used in the $(VALID_LOG_FILES) entry used by condor_preen. A change to one of the file names with this setting requires a change to the $(VALID_LOG_FILES) entry as well, or condor_preen will delete your newly named log files.

MAX_<SUBSYS>_LOG
Controls the maximum length in bytes to which a log will be allowed to grow. Each log file will grow to the specified length, then be saved to a file with the suffix .old. The .old files are overwritten each time the log is saved, thus the maximum space devoted to logging for any one program will be twice the maximum length of its log file. A value of 0 specifies that the file may grow without bounds. The default is 1 Mbyte.

TRUNC_<SUBSYS>_LOG_ON_OPEN
If this macro is defined and set to True, the affected log will be truncated and started from an empty file with each invocation of the program. Otherwise, new invocations of the program will append to the previous log file. By default this setting is False for all daemons.
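
As an illustrative sketch for the condor_startd, the following combines the three entries above. The file name StartLog and the 10000000 byte limit are assumptions chosen for the example, not recommendations:

    STARTD_LOG               = $(LOG)/StartLog
    # Rotate to StartLog.old once the log reaches 10000000 bytes
    MAX_STARTD_LOG           = 10000000
    # Keep appending across restarts (the default behavior)
    TRUNC_STARTD_LOG_ON_OPEN = False

With these values the startd logs would occupy at most about twice the configured limit, since a single .old copy is kept.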

<SUBSYS>_LOCK
This macro specifies the lock file used to synchronize append operations to the log file for this subsystem. It must be a separate file from the $(<SUBSYS>_LOG) file, since the $(<SUBSYS>_LOG) file may be rotated and you want to be able to synchronize access across log file rotations. A lock file is only required for log files which are accessed by more than one process. Currently, this includes only the SHADOW subsystem. This macro is defined relative to the $(LOCK) macro. If, for some strange reason, you decide to change this setting, be sure to change the $(VALID_LOG_FILES) entry that condor_preen uses as well.

FILE_LOCK_VIA_MUTEX
This macro setting only works on Win32 - it is ignored on Unix. If set to be True, then log locking is implemented via a kernel mutex instead of via file locking. On Win32, mutex access is FIFO, while obtaining a file lock is non-deterministic. Thus setting to True fixes problems on Win32 where processes (usually shadows) could starve waiting for a lock on a log file. Defaults to True on Win32, and is always False on Unix.

ENABLE_USERLOG_LOCKING
When True (the default value), a user's job log (as specified in a submit description file) will be locked before being written to. If False, Condor will not lock the file before writing.

TOUCH_LOG_INTERVAL
The time interval in seconds between when daemons touch their log files. The change in last modification time for the log file is useful when a daemon restarts after failure or shut down. The last modification date is printed, and it provides an upper bound on the length of time that the daemon was not running. Defaults to 60 seconds.

<SUBSYS>_DEBUG
All of the Condor daemons can produce different levels of output depending on how much information is desired. The various levels of verbosity for a given daemon are determined by this macro. All daemons have the default level D_ALWAYS, and log messages for that level will be printed to the daemon's log, regardless of this macro's setting. Settings are a comma- or space-separated list of the following values:

D_ALL
This flag turns on all debugging output by enabling all of the debug levels at once. There is no need to list any other debug levels in addition to D_ALL; doing so would be redundant. Be warned: this will generate a HUGE amount of output. To obtain a higher level of output than the default, consider using D_FULLDEBUG before using this option.

D_FULLDEBUG
This level provides verbose output of a general nature into the log files. Frequent log messages for very specific debugging purposes would be excluded. In those cases, the messages would be viewed by listing both the other, more specific flag and D_FULLDEBUG in the configuration file.

D_DAEMONCORE
Provides log file entries specific to DaemonCore, such as timers the daemons have set and the commands that are registered. If both D_FULLDEBUG and D_DAEMONCORE are set, expect very verbose output.

D_PRIV
This flag provides log messages about the privilege state switching that the daemons do. See section 3.6.12 on UIDs in Condor for details.

D_COMMAND
With this flag set, any daemon that uses DaemonCore will print out a log message whenever a command comes in. The name and integer of the command, whether the command was sent via UDP or TCP, and where the command was sent from are all logged. The most frequent commands, namely the one used by condor_kbdd to communicate with the condor_startd whenever there is activity on the X server and the one used for keep-alives, are only printed with D_FULLDEBUG enabled. This flag therefore does not flood the logs on its own, and it is best to set it for all daemons.

D_LOAD
The condor_startd keeps track of the load average on the machine where it is running. Both the general system load average and the load average being generated by Condor's activity there are determined. With this flag set, the condor_startd will log a message with the current state of both of these load averages whenever it computes them. This flag only affects the condor_startd.

D_KEYBOARD
With this flag set, the condor_startd will print out a log message with the current values for remote and local keyboard idle time. This flag affects only the condor_startd.

D_JOB
When this flag is set, the condor_startd will send to its log file the contents of any job ClassAd that the condor_schedd sends to claim the condor_startd for its use. This flag affects only the condor_startd.

D_MACHINE
When this flag is set, the condor_startd will send to its log file the contents of its resource ClassAd when the condor_schedd tries to claim the condor_startd for its use. This flag affects only the condor_startd.

D_SYSCALLS
This flag is used to make the condor_shadow log remote syscall requests and return values. This can help track down problems a user is having with a particular job by providing the system calls the job is performing. If any are failing, the reason for the failure is given. The condor_schedd also uses this flag for the server portion of the queue management code. With D_SYSCALLS defined in SCHEDD_DEBUG there will be verbose logging of all queue management operations the condor_schedd performs.

D_MATCH
When this flag is set, the condor_negotiator logs a message for every match.

D_NETWORK
When this flag is set, all Condor daemons will log a message on every TCP accept, connect, and close, and on every UDP send and receive. This flag is not yet fully supported in the condor_shadow.

D_HOSTNAME
When this flag is set, the Condor daemons and/or tools will print verbose messages explaining how they resolve host names, domain names, and IP addresses. This is useful for sites that are having trouble getting Condor to work because of problems with DNS, NIS or other host name resolving systems in use.

D_CKPT
When this flag is set, the Condor process checkpoint support code, which is linked into a STANDARD universe user job, will output some low-level details about the checkpoint procedure into the $(SHADOW_LOG).

D_SECURITY
This flag will enable debug messages pertaining to the setup of secure network communication, including messages for the negotiation of a socket authentication mechanism, the management of a session key cache, and messages about the authentication process itself. See section 3.6.1 for more information about secure communication configuration.

D_PROCFAMILY
Condor often needs to manage an entire family of processes (that is, a process and all descendants of that process). This debug flag will turn on debugging output for the management of families of processes.

D_ACCOUNTANT
When this flag is set, the condor_negotiator will output debug messages relating to the computation of user priorities (see section 3.4).

D_PROTOCOL
Enable debug messages relating to the protocol for Condor's matchmaking and resource claiming framework.

D_PID
This flag is different from the other flags, because it is used to change the formatting of all log messages that are printed, as opposed to specifying what kinds of messages should be printed. If D_PID is set, Condor will always print out the process identifier (PID) of the process writing each line to the log file. This is especially helpful for Condor daemons that can fork multiple helper processes (such as the condor_schedd or condor_collector) so the log file will clearly show which thread of execution is generating each log message.

D_FDS
This flag is different from the other flags, because it is used to change the formatting of all log messages that are printed, as opposed to specifying what kinds of messages should be printed. If D_FDS is set, Condor will always print out the file descriptor that the operating system allocated when the log file was opened. This can be helpful in debugging Condor's use of system file descriptors, as it will generally track the number of file descriptors that Condor has open.

ALL_DEBUG
Used to make all subsystems share a debug flag. Set the parameter ALL_DEBUG instead of changing all of the individual parameters. For example, to turn on all debugging in all subsystems, set ALL_DEBUG = D_ALL.
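
By contrast, a minimal sketch of per-subsystem settings is shown here. The particular combinations of flags are illustrative choices only:

    # Verbose general output plus queue management detail from the schedd
    SCHEDD_DEBUG = D_FULLDEBUG D_SYSCALLS
    # Log incoming commands and load averages in the startd, with a PID on every line
    STARTD_DEBUG = D_COMMAND D_LOAD D_PID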

TOOL_DEBUG
Uses the same values (debugging levels) as <SUBSYS>_DEBUG to describe the amount of debugging information sent to stderr for Condor tools.

SUBMIT_DEBUG
Uses the same values (debugging levels) as <SUBSYS>_DEBUG to describe the amount of debugging information sent to stderr for condor_submit.

Log files may optionally be specified per debug level as follows:

<SUBSYS>_<LEVEL>_LOG
This is the name of a log file for messages at a specific debug level for a specific subsystem. If the debug level is included in $(<SUBSYS>_DEBUG), then all messages of this debug level will be written both to the $(<SUBSYS>_LOG) file and the $(<SUBSYS>_<LEVEL>_LOG) file. For example, $(SHADOW_SYSCALLS_LOG) specifies a log file for all remote system call debug messages.

MAX_<SUBSYS>_<LEVEL>_LOG
Similar to MAX_<SUBSYS>_LOG.

TRUNC_<SUBSYS>_<LEVEL>_LOG_ON_OPEN
Similar to TRUNC_<SUBSYS>_LOG_ON_OPEN.
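
For example, a sketch that sends remote system call messages from the condor_shadow to their own file might read as follows. The file name ShadowSyscallsLog and the size limit are hypothetical:

    SHADOW_DEBUG                      = D_SYSCALLS
    SHADOW_SYSCALLS_LOG               = $(LOG)/ShadowSyscallsLog
    MAX_SHADOW_SYSCALLS_LOG           = 5000000
    TRUNC_SHADOW_SYSCALLS_LOG_ON_OPEN = False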


3.3.5 DaemonCore Configuration File Entries

Please read section 3.9 for details on DaemonCore. There are certain configuration file settings that DaemonCore uses which affect all Condor daemons (except the checkpoint server, shadow, and starter, none of which use DaemonCore yet).

HOSTALLOW...
All macros that begin with either HOSTALLOW or HOSTDENY are settings for Condor's host-based security. See section 3.6.10 on Setting up IP/host-based security in Condor for details on these macros and how to configure them.

ENABLE_RUNTIME_CONFIG
The condor_config_val tool has an option -rset for dynamically setting runtime configuration values (which only affect the in-memory configuration variables). Because of the potential security implications of this feature, by default, Condor daemons will not honor these requests. To use this functionality, Condor administrators must specifically enable it by setting ENABLE_RUNTIME_CONFIG to True, and specify what configuration variables can be changed using the SETTABLE_ATTRS... family of configuration options (described below). Defaults to False.

ENABLE_PERSISTENT_CONFIG
The condor_config_val tool has a -set option for dynamically setting persistent configuration values. These values override options in the normal Condor configuration files. Because of the potential security implications of this feature, by default, Condor daemons will not honor these requests. To use this functionality, Condor administrators must specifically enable it by setting ENABLE_PERSISTENT_CONFIG to True, creating a directory where the Condor daemons will hold these dynamically-generated persistent configuration files (declared using PERSISTENT_CONFIG_DIR, described below), and specifying what configuration variables can be changed using the SETTABLE_ATTRS... family of configuration options (described below). Defaults to False.

PERSISTENT_CONFIG_DIR
Directory where daemons should store dynamically-generated persistent configuration files (used to support condor_config_val -set). This directory should only be writable by root, or the user the Condor daemons are running as (if non-root). There is no default; administrators that wish to use this functionality must create this directory and define this setting. This directory must not be shared by multiple Condor installations, though it can be shared by all Condor daemons on the same host. Keep in mind that this directory should not be placed on an NFS mount where ``root-squashing'' is in effect, or else Condor daemons running as root will not be able to write to it. A directory (only writable by root) on the local file system is usually the best location for this directory.
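
A sketch of the first two steps follows, assuming a hypothetical local directory /var/condor/persistent_config that already exists and is writable only by root. The SETTABLE_ATTRS... entries that name the changeable variables are still required and are not shown:

    ENABLE_PERSISTENT_CONFIG = True
    PERSISTENT_CONFIG_DIR    = /var/condor/persistent_config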

SETTABLE_ATTRS...
All macros that begin with SETTABLE_ATTRS or <SUBSYS>_SETTABLE_ATTRS are settings used to restrict the configuration values that can be changed using the condor_config_val command. See section 3.6.10 on Setting up IP/Host-Based Security in Condor for details on these macros and how to configure them. In particular, section 3.6.10 contains details specific to these macros.

SHUTDOWN_GRACEFUL_TIMEOUT
Determines how long Condor will allow daemons to try their graceful shutdown methods before they do a hard shutdown. It is defined in terms of seconds. The default is 1800 (30 minutes).

<SUBSYS>_ADDRESS_FILE
A complete path to a file that is to contain an IP address and port number for a daemon. Every Condor daemon that uses DaemonCore has a command port where commands are sent. The IP/port of the daemon is put in that daemon's ClassAd, so that other machines in the pool can query the condor_collector (which listens on a well-known port) to find the address of a given daemon on a given machine. When tools and daemons are all executing on the same single machine, communications do not require a query of the condor_collector daemon. Instead, they look in a file on the local disk to find the IP/port. This macro causes daemons to write the IP/port of their command socket to a specified file. In this way, local tools will continue to operate, even if the machine running the condor_collector crashes. Using this file will also generate slightly less network traffic in the pool, since tools including condor_q and condor_rm do not need to send any messages over the network to locate the condor_schedd daemon. This macro is not necessary for the condor_collector daemon, since its command socket is at a well-known port.

The macro is named by substituting <SUBSYS> with the appropriate subsystem string as defined in section 3.3.1.
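
As a sketch for the condor_schedd, with a file name that is an assumption made for this example:

    SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address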

<SUBSYS>_ATTRS or <SUBSYS>_EXPRS
Allows any DaemonCore daemon to advertise arbitrary expressions from the configuration file in its ClassAd. Give the comma-separated list of entries from the configuration file you want in the given daemon's ClassAd. Frequently used to add attributes to machine ClassAds so that a job's rank and requirements expressions can discriminate among machines.

The macro is named by substituting <SUBSYS> with the appropriate subsystem string as defined in section 3.3.1.

<SUBSYS>_EXPRS is a historic setting that functions identically to <SUBSYS>_ATTRS. Use <SUBSYS>_ATTRS.

NOTE: The condor_kbdd does not send ClassAds now, so this entry does not affect it. The condor_startd, condor_schedd, condor_master, and condor_collector do send ClassAds, so those would be valid subsystems to set this entry for.

SUBMIT_EXPRS is not part of the <SUBSYS>_EXPRS family; it is documented in section 3.3.15.

Because of the different syntax of the configuration file and ClassAds, a little extra work is required to get a given entry into a ClassAd. In particular, ClassAds require quote marks (") around strings. Numeric values and boolean expressions can go in directly. For example, if the condor_startd is to advertise a string macro, a numeric macro, and a boolean expression, do something similar to:

    STRING = This is a string 
    NUMBER = 666
    BOOL1 = True
    BOOL2 = CurrentTime >= $(NUMBER) || $(BOOL1)
    MY_STRING = "$(STRING)"
    STARTD_ATTRS = MY_STRING, NUMBER, BOOL1, BOOL2

DAEMON_SHUTDOWN
Starting with Condor version 6.9.3, whenever a daemon is about to publish a ClassAd update to the condor_collector, it will evaluate this expression. If it evaluates to TRUE, the daemon will gracefully shut itself down and will not be restarted by the condor_master (as if it sent itself a condor_off command). The expression is evaluated in the context of the ClassAd that is being sent to the condor_collector, so it can reference any attributes that can be seen with condor_status -long [-daemon_type] (for example, condor_status -long [-master] for the condor_master). Since each daemon's ClassAd will contain different attributes, administrators should define these shutdown expressions specific to each daemon, for example:
    STARTD.DAEMON_SHUTDOWN = when to shutdown the startd
    MASTER.DAEMON_SHUTDOWN = when to shutdown the master
Normally, these expressions would not be necessary, so if not defined, they default to FALSE. One possible use case is for Condor glide-in, to have the condor_startd shut itself down if it has not been claimed by a job after a certain period of time.
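
As a hedged sketch of the glide-in case, the following assumes that the condor_startd ClassAd carries the attributes State, Activity, and EnteredCurrentActivity, and it picks an arbitrary limit of 1200 seconds. The attribute names should be confirmed against the output of condor_status -long before use:

    STARTD.DAEMON_SHUTDOWN = State == "Unclaimed" && Activity == "Idle" && (CurrentTime - EnteredCurrentActivity) > 1200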

NOTE: This functionality does not work in conjunction with Condor's high-availability support (see section 3.10 for more information). If you enable high-availability for a particular daemon, you should not define this expression.

DAEMON_SHUTDOWN_FAST
Identical to DAEMON_SHUTDOWN (defined above), except the daemon will use the fast shutdown mode (as if it sent itself a condor_off command using the -fast option).


3.3.6 Network-Related Configuration File Entries

More information about networking in Condor can be found in section 3.7.

BIND_ALL_INTERFACES
For systems with multiple network interfaces, if this configuration setting is not defined, Condor binds all network sockets to the first interface found, or to the IP address specified with NETWORK_INTERFACE (described below). Starting with version 6.7.13, BIND_ALL_INTERFACES can be set to True to cause Condor to bind to all interfaces on the machine. However, currently Condor is still only able to advertise a single IP address, even if it is listening on multiple interfaces. By default, it will advertise the IP address of the network interface used to contact the collector, since this is the most likely to be accessible to other processes which query information from the same collector. More information about using this setting can be found in section 3.7.2.

NETWORK_INTERFACE
For systems with multiple network interfaces, if this configuration setting is not defined, Condor binds all network sockets to the first interface found. To bind to a specific network interface other than the first one, NETWORK_INTERFACE should be set to the IP address to use. When BIND_ALL_INTERFACES is set to True, this setting simply controls what IP address a given Condor host will advertise. More information about configuring Condor on machines with multiple network interfaces can be found in section 3.7.2.
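
A sketch for a multi-homed host that listens on every interface but advertises one chosen address might read as follows. The address is a placeholder from the reserved documentation range:

    BIND_ALL_INTERFACES = True
    NETWORK_INTERFACE   = 192.0.2.10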

HIGHPORT
Specifies an upper limit of given port numbers for Condor to use, such that Condor is restricted to a range of port numbers. If this macro is not explicitly specified, then Condor will not restrict the port numbers that it uses. Condor will use system-assigned port numbers. For this macro to work, both HIGHPORT and LOWPORT (given below) must be defined.

LOWPORT
Specifies a lower limit of given port numbers for Condor to use, such that Condor is restricted to a range of port numbers. If this macro is not explicitly specified, then Condor will not restrict the port numbers that it uses. Condor will use system-assigned port numbers. For this macro to work, both HIGHPORT (given above) and LOWPORT must be defined.
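
For illustration, the pair below restricts Condor to a hypothetical range of 101 ports. Both macros are defined, as required:

    LOWPORT  = 9600
    HIGHPORT = 9700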

IN_LOWPORT
An integer value that specifies a lower limit of given port numbers for Condor to use on incoming connections (ports for listening), such that Condor is restricted to a range of port numbers. This range implies the use of both IN_LOWPORT and IN_HIGHPORT. A range of port numbers less than 1024 may be used for daemons running as root. Do not specify IN_LOWPORT in combination with IN_HIGHPORT such that the range crosses the port 1024 boundary. Applies only to Unix machine configuration. Use of IN_LOWPORT and IN_HIGHPORT overrides any definition of LOWPORT and HIGHPORT.

IN_HIGHPORT
An integer value that specifies an upper limit of given port numbers for Condor to use on incoming connections (ports for listening), such that Condor is restricted to a range of port numbers. This range implies the use of both IN_LOWPORT and IN_HIGHPORT. A range of port numbers less than 1024 may be used for daemons running as root. Do not specify IN_LOWPORT in combination with IN_HIGHPORT such that the range crosses the port 1024 boundary. Applies only to Unix machine configuration. Use of IN_LOWPORT and IN_HIGHPORT overrides any definition of LOWPORT and HIGHPORT.

OUT_LOWPORT
An integer value that specifies a lower limit of given port numbers for Condor to use on outgoing connections, such that Condor is restricted to a range of port numbers. This range implies the use of both OUT_LOWPORT and OUT_HIGHPORT. A range of port numbers less than 1024 is inappropriate, as not all daemons and tools will be run as root. Applies only to Unix machine configuration. Use of OUT_LOWPORT and OUT_HIGHPORT overrides any definition of LOWPORT and HIGHPORT.

OUT_HIGHPORT
An integer value that specifies an upper limit of given port numbers for Condor to use on outgoing connections, such that Condor is restricted to a range of port numbers. This range implies the use of both OUT_LOWPORT and OUT_HIGHPORT. A range of port numbers less than 1024 is inappropriate, as not all daemons and tools will be run as root. Applies only to Unix machine configuration. Use of OUT_LOWPORT and OUT_HIGHPORT overrides any definition of LOWPORT and HIGHPORT.
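
A sketch with separate incoming and outgoing ranges follows. The port numbers are hypothetical, and both ranges lie entirely above 1024, so neither crosses the boundary described above:

    # Ports used for listening
    IN_LOWPORT   = 9600
    IN_HIGHPORT  = 9700
    # Ports used for outgoing connections
    OUT_LOWPORT  = 9800
    OUT_HIGHPORT = 9900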

UPDATE_COLLECTOR_WITH_TCP
If your site needs to use TCP connections to send ClassAd updates to your collector (which it almost certainly does NOT), set to True to enable this feature. Please read section 3.7.4 on ``Using TCP to Send Collector Updates'' for more details and a discussion of when this functionality is needed. At this time, this setting only affects the main condor_collector for the site, not any sites that a condor_schedd might flock to. If enabled, also define COLLECTOR_SOCKET_CACHE_SIZE at the central manager, so that the collector will accept TCP connections for updates, and will keep them open for reuse. Defaults to False.
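
A sketch of the two settings involved is shown here. The cache size of 1000 is an arbitrary illustrative value, and the second line belongs in the central manager's configuration:

    UPDATE_COLLECTOR_WITH_TCP   = True
    COLLECTOR_SOCKET_CACHE_SIZE = 1000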

TCP_UPDATE_COLLECTORS
The list of collectors which will be updated with TCP instead of UDP. Please read section 3.7.4 on ``Using TCP to Send Collector Updates'' for more details and a discussion of when a site needs this functionality. If not defined, no collectors use TCP instead of UDP.

<SUBSYS>_TIMEOUT_MULTIPLIER
An integer value that defaults to 1. This value multiplies configured timeout values for all targeted subsystem communications, thereby increasing the time until a timeout occurs. This configuration variable is intended for use by developers for debugging purposes, where communication timeouts interfere.

NONBLOCKING_COLLECTOR_UPDATE
A boolean value that defaults to True. When True, the establishment of TCP connections to the condor_collector daemon for a security-enabled pool is done in a nonblocking manner.

NEGOTIATOR_USE_NONBLOCKING_STARTD_CONTACT
A boolean value that defaults to True. When True, the establishment of TCP connections from the condor_negotiator daemon to the condor_startd daemon for a security-enabled pool is done in a nonblocking manner.

The following settings are specific to enabling Generic Connection Brokering or GCB in your Condor pool. More information about GCB and how to configure it can be found in section 3.7.3.

NET_REMAP_ENABLE
A boolean variable, that when defined to True, enables a network remapping service for Condor. The service to use is controlled by NET_REMAP_SERVICE. This boolean value defaults to False.

NET_REMAP_SERVICE
If NET_REMAP_ENABLE is defined to True, this setting controls what network remapping service should be used. Currently, the only value supported is GCB. The default is undefined.

NET_REMAP_INAGENT
A comma- or space-separated list of IP addresses for GCB brokers. Upon start up, the condor_master chooses one at random from among the working brokers in the list. There is no default if not defined.

NET_REMAP_ROUTE
Hosts with the GCB network remapping service enabled that would like to use a GCB routing table specify the full path to their routing table with this setting. There is no default value if undefined.

MASTER_WAITS_FOR_GCB_BROKER
A boolean value that defaults to True. This variable determines the behavior of the condor_master with GCB enabled when no GCB broker is working. That can happen in two situations: at start up of the condor_master, or after the condor_master has successfully communicated with a GCB broker and the communication then fails. If MASTER_WAITS_FOR_GCB_BROKER is True, in either situation the condor_master waits while attempting to find a working GCB broker. If MASTER_WAITS_FOR_GCB_BROKER is False and no broker is working at start up, the condor_master fails and exits, without restarting. If MASTER_WAITS_FOR_GCB_BROKER is False and communication with a previously working broker fails, the condor_master kills all its children, exits, and restarts.

The set up task of condor_glidein explicitly sets MASTER_WAITS_FOR_GCB_BROKER to False in the configuration file it produces.
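
Taken together, a minimal GCB sketch might look like the following. The broker addresses come from the reserved documentation range and are placeholders:

    NET_REMAP_ENABLE            = True
    NET_REMAP_SERVICE           = GCB
    NET_REMAP_INAGENT           = 192.0.2.1, 192.0.2.2
    # Do not wait for a broker; this is the value the glidein set up task writes
    MASTER_WAITS_FOR_GCB_BROKER = False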


3.3.7 Shared File System Configuration File Macros

These macros control how Condor interacts with various shared and network file systems. If you are using AFS as your shared file system, be sure to read section 3.12.1 on Using Condor with AFS. For information on submitting jobs under shared file systems, see section 2.5.3.

UID_DOMAIN
The UID_DOMAIN macro is used to decide under which user to run jobs. If the $(UID_DOMAIN) on the submitting machine is different than the $(UID_DOMAIN) on the machine that runs a job, then Condor runs the job as the user nobody. For example, if the submit machine has a $(UID_DOMAIN) of flippy.cs.wisc.edu, and the machine where the job will execute has a $(UID_DOMAIN) of cs.wisc.edu, the job will run as user nobody, because the two $(UID_DOMAIN)s are not the same. If the $(UID_DOMAIN) is the same on both the submit and execute machines, then Condor will run the job as the user that submitted the job.

A further check attempts to assure that the submitting machine cannot lie about its UID_DOMAIN. Condor compares the submit machine's claimed value for UID_DOMAIN to its fully qualified name. If the two do not end the same, then the submit machine is presumed to be lying about its UID_DOMAIN. In this case, Condor will run the job as user nobody. For example, a job submission to the Condor pool at the UW Madison from flippy.example.com, claiming a UID_DOMAIN of cs.wisc.edu, will run the job as the user nobody.

Because of this verification, $(UID_DOMAIN) must be a real domain name. At the Computer Sciences department at the UW Madison, we set the $(UID_DOMAIN) to be cs.wisc.edu to indicate that whenever someone submits from a department machine, we will run the job as the user who submits it.

Also see SOFT_UID_DOMAIN below for information about one more check that Condor performs before running a job as a given user.

A few details:

An administrator could set UID_DOMAIN to *. This will match all domains, but it is a gaping security hole. It is not recommended.

An administrator can also leave UID_DOMAIN undefined. This will force Condor to always run jobs as user nobody. Running standard universe jobs as user nobody enhances security and should cause no problems, because the jobs use remote I/O to access all of their files. However, if vanilla jobs are run as user nobody, then files that need to be accessed by the job will need to be marked as world readable/writable so the user nobody can access them.

When Condor sends e-mail about a job, Condor sends the e-mail to user@$(UID_DOMAIN). If UID_DOMAIN is undefined, the e-mail is sent to user@submitmachinename.
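
For instance, following the department example above, every submit and execute machine in the pool would carry the same line:

    UID_DOMAIN = cs.wisc.edu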

TRUST_UID_DOMAIN
As an added security precaution when Condor is about to spawn a job, it ensures that the UID_DOMAIN of a given submit machine is a substring of that machine's fully-qualified host name. However, at some sites, there may be multiple UID spaces that do not clearly correspond to Internet domain names. In these cases, administrators may wish to use names to describe the UID domains which are not substrings of the host names of the machines. For this to work, Condor must not do this regular security check. If the TRUST_UID_DOMAIN setting is defined to True, Condor will not perform this test, and will trust whatever UID_DOMAIN is presented by the submit machine when trying to spawn a job, instead of making sure the submit machine's host name matches the UID_DOMAIN. When not defined, the default is False, since it is more secure to perform this test.
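
A sketch for a site whose UID space does not correspond to an Internet domain is shown here. The name physics-cluster is hypothetical, and the host name check is switched off deliberately:

    UID_DOMAIN       = physics-cluster
    TRUST_UID_DOMAIN = True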

SOFT_UID_DOMAIN
A boolean variable that defaults to False when not defined. When Condor is about to run a job as a particular user (instead of as user nobody), it verifies that the UID given for the user is in the password file and actually matches the given user name. However, under installations that do not have every user in every machine's password file, this check will fail and the execution attempt will be aborted. To cause Condor not to do this check, set this configuration variable to True. Condor will then run the job under the user's UID.

SLOTx_USER
The name of a user for Condor to use instead of user nobody, as part of a solution that plugs a security hole whereby a lurker process can prey on a subsequent job run as user nobody. x is an integer associated with slots. On Windows, SLOTx_USER will only work if the credential of the specified user is stored on the execute machine using condor_store_cred. See Section 3.6.12 for more information.

STARTER_ALLOW_RUNAS_OWNER
This is a boolean expression (evaluated with the job ad as the target) that determines whether the job may run under the job owner's account (True) or whether it will run as SLOTx_USER or nobody (False). On Unix, this defaults to True. On Windows, it defaults to False. The job ClassAd may also contain an attribute RunAsOwner, which is logically ANDed with the starter's boolean value. If the job does not specify RunAsOwner, the attribute defaults to True on Unix and to False on Windows. On Unix, if the UidDomain of the machine and job do not match, the job cannot run as the owner anyway, so in that case this setting has no effect. See Section 3.6.12 for more information.

DEDICATED_EXECUTE_ACCOUNT_REGEXP
This is a regular expression (i.e. a string matching pattern) that matches the account name(s) that are dedicated to running Condor jobs on the execute machine and which will never be used for more than one job at a time. The default matches no account name. If you have configured SLOTx_USER to be a different account for each Condor slot, and no non-Condor processes will ever be run by these accounts, then this pattern should match the names of all SLOTx_USER accounts. Jobs run under a dedicated execute account are reliably tracked by Condor, whereas other jobs may spawn processes that Condor fails to detect. Therefore, a dedicated execution account provides more reliable tracking of CPU usage by the job, and it also guarantees that when the job exits, no ``lurker'' processes are left behind. When the job exits, Condor will attempt to kill all processes owned by the dedicated execution account. Example:

SLOT1_USER = cndrusr1
SLOT2_USER = cndrusr2
STARTER_ALLOW_RUNAS_OWNER = False
DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+

You can tell if the starter is in fact treating the account as a dedicated account, because it will print a line such as the following in its log file:

Tracking process family by login "cndrusr1"

EXECUTE_LOGIN_IS_DEDICATED
This configuration setting is deprecated because it cannot handle the case where some jobs run as dedicated accounts and some do not. Use DEDICATED_EXECUTE_ACCOUNT_REGEXP instead.

A boolean value that defaults to False. When True, Condor knows that all jobs are being run by dedicated execution accounts (whether they are running as the job owner or as nobody or as SLOTx_USER). Therefore, when the job exits, all processes running under the same account will be killed.

FILESYSTEM_DOMAIN
The FILESYSTEM_DOMAIN macro is an arbitrary string that is used to decide if two machines (a submitting machine and an execute machine) share a file system. Although the macro name contains the word ``DOMAIN'', the macro is not required to be a domain name. It often is a domain name.

Note that this implementation is not ideal: machines may share some file systems but not others. Condor currently has no way to express this automatically. You can express the need to use a particular file system by adding additional attributes to your machines and submit files, similar to the example given in Frequently Asked Questions, section 7 on how to run jobs only on machines that have certain software packages.

Note that if you do not set $(FILESYSTEM_DOMAIN), Condor defaults to setting the macro's value to be the fully qualified hostname of the local machine. Since each machine will have a different $(FILESYSTEM_DOMAIN), they will not be considered to have shared file systems.
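For illustration, a pool whose machines all mount the same file systems and share one set of user accounts might place something like the following in its global configuration file. The name example.edu is a hypothetical placeholder; any string works, as long as every machine that shares the file system uses the same one.

```
# Same value on every machine that shares the file systems (hypothetical name)
FILESYSTEM_DOMAIN = example.edu
# Same value on every machine that shares the user accounts (hypothetical name)
UID_DOMAIN = example.edu
```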

RESERVE_AFS_CACHE
If your machine is running AFS and the AFS cache lives on the same partition as the other Condor directories, and you want Condor to reserve the space that your AFS cache is configured to use, set this macro to True. It defaults to False.

USE_NFS
This macro influences how Condor jobs running in the standard universe access their files. Condor will redirect the file I/O requests of standard universe jobs to be executed on the machine which submitted the job. Because of this, as a Condor job migrates around the network, the file system always appears to be identical to the file system where the job was submitted. However, consider the case where a user's data files are sitting on an NFS server. The machine running the user's program will send all I/O over the network to the machine which submitted the job, which in turn sends all the I/O over the network a second time back to the NFS file server. Thus, all of the program's I/O is being sent over the network twice.

If this macro is set to True, then Condor will attempt to read/write files without redirecting I/O back to the submitting machine if the submitting machine and the machine running the job are both accessing the same NFS servers (that is, if they are both in the same $(FILESYSTEM_DOMAIN) and in the same $(UID_DOMAIN), as described above). The result is that I/O performed by Condor standard universe jobs is only sent over the network once. While sending all file operations over the network twice might sound really bad, unless you are operating over networks where bandwidth is at a very high premium, practical experience reveals that this scheme offers very little real performance gain. There are also some (fairly rare) situations where this scheme can break down.

Setting $(USE_NFS) to False is always safe. It may result in slightly more network traffic, but Condor jobs are most often heavy on CPU and light on I/O. It also ensures that a remote standard universe Condor job will always use Condor's remote system calls mechanism to reroute I/O and therefore see the exact same file system that the user sees on the machine where she/he submitted the job.

Some gritty details for folks who want to know: If you set $(USE_NFS) to True, and the $(FILESYSTEM_DOMAIN) of both the submitting machine and the remote machine about to execute the job match, and the $(FILESYSTEM_DOMAIN) claimed by the submit machine is indeed found to be a subset of what an inverse lookup to a DNS (domain name server) reports as the fully qualified domain name for the submit machine's IP address (this security measure safeguards against the submit machine lying), then the job will access files using a local system call, without redirecting them to the submitting machine (with NFS). Otherwise, the system call will get routed back to the submitting machine using Condor's remote system call mechanism. NOTE: When submitting a vanilla job, condor_submit will, by default, append requirements to the Job ClassAd that specify that the machine to run the job must be in the same $(FILESYSTEM_DOMAIN) and the same $(UID_DOMAIN).
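The conditions above can be summarized in a short configuration sketch. The domain name example.edu is a hypothetical placeholder, and the fragment assumes the submit machine's fully qualified host name really does end in that domain, so that the inverse DNS lookup check passes.

```
# Submit and execute machines must agree on both domains (hypothetical name)
FILESYSTEM_DOMAIN = example.edu
UID_DOMAIN = example.edu
# Allow standard universe jobs to use local NFS access instead of remote I/O
USE_NFS = True
```

Leaving USE_NFS at False remains the conservative choice; the fragment only shows what has to line up for the direct access path to be taken.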

IGNORE_NFS_LOCK_ERRORS
When set to True, all file locking errors from NFS are ignored. Defaults to False, in which case these errors are not ignored.

USE_AFS
If your machines have AFS, this macro determines whether Condor will use remote system calls for standard universe jobs to send I/O requests to the submit machine, or if it should use local file access on the execute machine (which will then use AFS to get to the submitter's files). Read the setting above on $(USE_NFS) for a discussion of why you might want to use AFS access instead of remote system calls.

One important difference between $(USE_NFS) and $(USE_AFS) is the AFS cache. With $(USE_AFS) set to True, the remote Condor job executing on some machine will start modifying the AFS cache, possibly evicting the machine owner's files from the cache to make room for its own. Generally speaking, since we try to minimize the impact of having a Condor job run on a given machine, we do not recommend using this setting.

While sending all file operations over the network twice might sound really bad, unless you are operating over networks where bandwidth is at a very high premium, practical experience reveals that this scheme offers very little real performance gain. There are also some (fairly rare) situations where this scheme can break down.

Setting $(USE_AFS) to False is always safe. It may result in slightly more network traffic, but Condor jobs are usually heavy on CPU and light on I/O. False ensures that a remote standard universe Condor job will always see the exact same file system that the user sees on the machine where he/she submitted the job. It also ensures that the machine where the job executes does not have its AFS cache modified as a result of the Condor job being there.

However, things may be different at your site, which is why the setting is there.


3.3.8 Checkpoint Server Configuration File Macros

These macros control whether or not Condor uses a checkpoint server. If you are using a checkpoint server, this section describes the settings that the checkpoint server itself needs defined. A checkpoint server is installed separately. It is not included in the main Condor binary distribution or installation procedure. See section 3.8 on Installing a Checkpoint Server for details on installing and running a checkpoint server for your pool.

NOTE: If you are setting up a machine to join the UW-Madison CS Department Condor pool, you should configure the machine to use a checkpoint server, and use ``condor-ckpt.cs.wisc.edu'' as the checkpoint server host (see below).

CKPT_SERVER_HOST
The hostname of a checkpoint server.

STARTER_CHOOSES_CKPT_SERVER
If this parameter is True or undefined on the submit machine, the checkpoint server specified by $(CKPT_SERVER_HOST) on the execute machine is used. If it is False on the submit machine, the checkpoint server specified by $(CKPT_SERVER_HOST) on the submit machine is used.

CKPT_SERVER_DIR
The checkpoint server needs this macro defined to the full path of the directory the server should use to store checkpoint files. Depending on the size of your pool and the size of the jobs your users are submitting, this directory (and its subdirectories) might need to store many Mbytes of data.

USE_CKPT_SERVER
A boolean which determines if you want a given submit machine to use a checkpoint server if one is available. If a checkpoint server isn't available or USE_CKPT_SERVER is set to False, checkpoints will be written to the local $(SPOOL) directory on the submission machine.

MAX_DISCARDED_RUN_TIME
If the shadow is unable to read a checkpoint file from the checkpoint server, it keeps trying only if the job has accumulated more than this many seconds of CPU usage. Otherwise, the job is started from scratch. Defaults to 3600 (1 hour). This setting is only used if $(USE_CKPT_SERVER) is True.
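A minimal sketch of how these macros might fit together follows. The host name ckpt.example.edu and the directory /data/ckpt are hypothetical placeholders, not defaults.

```
# On submit (and execute) machines: use the pool's checkpoint server
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = ckpt.example.edu
# Let the submit machine's CKPT_SERVER_HOST decide which server is used
STARTER_CHOOSES_CKPT_SERVER = False
# Restart from scratch unless more than 1 hour of CPU time would be lost
MAX_DISCARDED_RUN_TIME = 3600

# On the checkpoint server machine itself: where checkpoint files are stored
CKPT_SERVER_DIR = /data/ckpt
```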


3.3.9 condor_ master Configuration File Macros

These macros control the condor_ master.

DAEMON_LIST
This macro determines what daemons the condor_ master will start and keep its watchful eyes on. The list is a comma or space separated list of subsystem names (listed in section 3.3.1). For example,
    DAEMON_LIST = MASTER, STARTD, SCHEDD

NOTE: This configuration variable cannot be changed by using condor_ reconfig or by sending a SIGHUP. To change this configuration variable, restart the condor_ master daemon by using condor_ restart. Only then will the change take effect.

NOTE: On your central manager, your $(DAEMON_LIST) will be different from your regular pool, since it will include entries for the condor_ collector and condor_ negotiator.

NOTE: On machines running Digital Unix, your $(DAEMON_LIST) will also include KBDD, for the condor_ kbdd, which is a special daemon that runs to monitor keyboard and mouse activity on the console. It is only with this special daemon that we can acquire this information on those platforms.
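As a sketch of the central manager case mentioned above, the list might look like the following. It assumes the central manager also runs jobs and accepts submissions; if it does neither, STARTD and SCHEDD could presumably be left out.

```
# Central manager: the regular daemons plus the collector and negotiator
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
```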

DC_DAEMON_LIST
This macro lists the daemons in DAEMON_LIST which use the Condor DaemonCore library. The condor_ master must differentiate between daemons that use DaemonCore and those that don't so it uses the appropriate inter-process communication mechanisms. This list currently includes all Condor daemons except the checkpoint server by default.

<SUBSYS>
Once you have defined which subsystems you want the condor_ master to start, you must provide it with the full path to each of these binaries. For example:
    MASTER          = $(SBIN)/condor_master
    STARTD          = $(SBIN)/condor_startd
    SCHEDD          = $(SBIN)/condor_schedd
These are most often defined relative to the $(SBIN) macro.

The macro is named by substituting <SUBSYS> with the appropriate subsystem string as defined in section 3.3.1.

DAEMONNAME_ENVIRONMENT
For each subsystem defined in DAEMON_LIST, you may specify changes to the environment that daemon is started with by setting DAEMONNAME_ENVIRONMENT, where DAEMONNAME is the name of a daemon listed in DAEMON_LIST. It should use the same syntax for specifying the environment as the environment specification in a condor_ submit file (see page [*]). For example, if you wish to redefine the TMP and CONDOR_CONFIG environment variables seen by the condor_ schedd, you could place the following in the config file:
    SCHEDD_ENVIRONMENT = "TMP=/new/value CONDOR_CONFIG=/special/config"
When the condor_schedd is started by the condor_master, it sees the specified values of TMP and CONDOR_CONFIG.

<SUBSYS>_ARGS
This macro allows the specification of additional command line arguments for any process spawned by the condor_master. List the desired arguments using the same syntax as the arguments specification in a condor_submit file (see page [*]), with one exception: do not escape double-quotes when using the old-style syntax (this is for backward compatibility). Set the arguments for a specific daemon with this macro, and the macro will affect only that daemon. Define one of these for each daemon the condor_master is controlling. For example, set $(STARTD_ARGS) to specify any extra command line arguments to the condor_startd.

The macro is named by substituting <SUBSYS> with the appropriate subsystem string as defined in section 3.3.1.

PREEN
In addition to the daemons defined in $(DAEMON_LIST), the condor_master also starts up a special process, condor_preen, to clean out junk files that have been left lying around by Condor. This macro determines where the condor_master finds the condor_preen binary. Comment out this macro, and condor_preen will not run.

PREEN_ARGS
Controls how condor_ preen behaves by allowing the specification of command-line arguments. This macro works as $(<SUBSYS>_ARGS) does. The difference is that you must specify this macro for condor_ preen if you want it to do anything. condor_ preen takes action only because of command line arguments. -m means you want e-mail about files condor_ preen finds that it thinks it should remove. -r means you want condor_ preen to actually remove these files.

PREEN_INTERVAL
This macro determines how often condor_ preen should be started. It is defined in terms of seconds and defaults to 86400 (once a day).
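Putting the three preen macros together, a configuration that both e-mails the administrator and removes the files once a day might look like this sketch. It assumes the binary is installed under $(SBIN), as with the other daemons.

```
# Location of the preen binary; comment this out to disable preen entirely
PREEN = $(SBIN)/condor_preen
# -m: send e-mail about files found, -r: actually remove them
PREEN_ARGS = -m -r
# Run once a day (value in seconds)
PREEN_INTERVAL = 86400
```

Dropping -r from PREEN_ARGS would leave a report-only setup, which may be a gentler way to start.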

PUBLISH_OBITUARIES
When a daemon crashes, the condor_ master can send e-mail to the address specified by $(CONDOR_ADMIN) with an obituary letting the administrator know that the daemon died, the cause of death (which signal or exit status it exited with), and (optionally) the last few entries from that daemon's log file. If you want obituaries, set this macro to True.

OBITUARY_LOG_LENGTH
This macro controls how many lines of the log file are part of obituaries. This macro has a default value of 20 lines.
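For example, to receive obituaries that carry a longer excerpt of the log than the default, one might write something like the following. The e-mail address is a hypothetical placeholder.

```
# Where obituaries are sent (hypothetical address)
CONDOR_ADMIN = condor-admin@example.edu
# Send e-mail when a daemon dies
PUBLISH_OBITUARIES = True
# Include the last 50 lines of the daemon's log instead of the default 20
OBITUARY_LOG_LENGTH = 50
```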

START_MASTER
If this setting is defined and set to False when the condor_master starts up, the first thing it will do is exit. This appears strange, but perhaps you do not want Condor to run on certain machines in your pool, yet the boot scripts for your entire pool are handled by a centralized system. This is an entry you would most likely find in a local configuration file, not a global configuration file.

START_DAEMONS
This macro is similar to the $(START_MASTER) macro described above. However, when it is set to False, the condor_master does not exit; it simply does not start any of the daemons listed in $(DAEMON_LIST). The daemons may be started at a later time with a condor_on command.
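A sketch of how these two settings might appear in the local configuration file of a single machine follows. Only one of the two would normally be needed, which is why the second is shown commented out.

```
# Keep the condor_master running, but start no daemons until condor_on is used
START_DAEMONS = False

# Alternative: make the condor_master exit immediately on this machine
# START_MASTER = False
```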

MASTER_UPDATE_INTERVAL
This macro determines how often the condor_ master sends a ClassAd update to the condor_ collector. It is defined in seconds and defaults to 300 (every 5 minutes).

MASTER_CHECK_NEW_EXEC_INTERVAL
This macro controls how often the condor_master checks the timestamps of the binaries of the running daemons. If any binary has been modified, the master restarts the corresponding daemon. It is defined in seconds and defaults to 300 (every 5 minutes).

MASTER_NEW_BINARY_DELAY
Once the condor_ master has discovered a new binary, this macro controls how long it waits before attempting to execute the new binary. This delay exists because the condor_ master might notice a new binary while it is in the process of being copied, in which case trying to execute it yields unpredictable results. The entry is defined in seconds and defaults to 120 (2 minutes).

SHUTDOWN_FAST_TIMEOUT
This macro determines the maximum amount of time daemons are given to perform their fast shutdown procedure before the condor_ master kills them outright. It is defined in seconds and defaults to 300 (5 minutes).
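For reference, the four timing macros just described are spelled out below with the default values stated above, so they can be copied and adjusted. All values are in seconds.

```
# ClassAd updates to the condor_collector every 5 minutes
MASTER_UPDATE_INTERVAL = 300
# Look for modified daemon binaries every 5 minutes
MASTER_CHECK_NEW_EXEC_INTERVAL = 300
# Wait 2 minutes before executing a newly discovered binary
MASTER_NEW_BINARY_DELAY = 120
# Allow 5 minutes for fast shutdown before killing daemons outright
SHUTDOWN_FAST_TIMEOUT = 300
```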

MASTER_BACKOFF_CONSTANT and MASTER_<name>_BACKOFF_CONSTANT
When a daemon crashes, condor_master uses an exponential back off delay before restarting it; see the end of this section for a detailed discussion of how these parameters work together. These settings define the constant value of the expression used to determine how long to wait before starting the daemon again (effectively, this becomes the initial backoff time). It is an integer in units of seconds, and defaults to 9 seconds.

$(MASTER_<name>_BACKOFF_CONSTANT) is the daemon-specific form of MASTER_BACKOFF_CONSTANT; if this daemon-specific macro is not defined for a specific daemon, the non-daemon-specific value will be used.

MASTER_BACKOFF_FACTOR and MASTER_<name>_BACKOFF_FACTOR
When a daemon crashes, condor_master uses an exponential back off delay before restarting it; see the end of this section for a detailed discussion of how these parameters work together. This setting is the base of the exponent used to determine how long to wait before starting the daemon again. It is a unitless number and defaults to 2.0.

$(MASTER_<name>_BACKOFF_FACTOR) is the daemon-specific form of MASTER_BACKOFF_FACTOR; if this daemon-specific macro is not defined for a specific daemon, the non-daemon-specific value will be used.

MASTER_BACKOFF_CEILING and MASTER_<name>_BACKOFF_CEILING
When a daemon crashes, condor_master uses an exponential back off delay before restarting it; see the end of this section for a detailed discussion of how these parameters work together. This entry determines the maximum amount of time you want the master to wait between attempts to start a given daemon. (With 2.0 as the $(MASTER_BACKOFF_FACTOR), 1 hour is obtained in 12 restarts). It is defined in terms of seconds and defaults to 3600 (1 hour).

$(MASTER_<name>_BACKOFF_CEILING) is the daemon-specific form of MASTER_BACKOFF_CEILING; if this daemon-specific macro is not defined for a specific daemon, the non-daemon-specific value will be used.

MASTER_RECOVER_FACTOR and MASTER_<name>_RECOVER_FACTOR
A macro to set how long a daemon needs to run without crashing before it is considered recovered. Once a daemon has recovered, the number of restarts is reset, so the exponential back off returns to its initial state. The macro is defined in terms of seconds and defaults to 300 (5 minutes).

$(MASTER_<name>_RECOVER_FACTOR) is the daemon-specific form of MASTER_RECOVER_FACTOR; if this daemon-specific macro is not defined for a specific daemon, the non-daemon-specific value will be used.
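As a sketch, the four back off macros are shown here with their default values, followed by one daemon-specific override. The override assumes that <name> is replaced by the daemon's subsystem name as it appears in DAEMON_LIST (SCHEDD here), and the value 600 is an arbitrary illustration.

```
# Pool-wide defaults, written out explicitly
MASTER_BACKOFF_CONSTANT = 9
MASTER_BACKOFF_FACTOR = 2.0
MASTER_BACKOFF_CEILING = 3600
MASTER_RECOVER_FACTOR = 300

# Assumed daemon-specific form: never wait more than 10 minutes for the schedd
MASTER_SCHEDD_BACKOFF_CEILING = 600
```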

When a daemon crashes, condor_master will restart the daemon after a delay (a back off). The length of this delay is based on how many times the daemon has been restarted, and gets larger after each crash. The equation for calculating this backoff time is given by:

$t = c + k^n$

where t is the calculated time, c is the constant defined by $(MASTER_BACKOFF_CONSTANT), k is the ``factor'' defined by $(MASTER_BACKOFF_FACTOR), and n is the number of restarts already attempted (0 for the first restart, 1 for the next, etc.).
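Because the delay is also limited by $(MASTER_BACKOFF_CEILING), the time actually waited can be summarized as below. The symbol $t_{max}$ is introduced here only for illustration and stands for the value of $(MASTER_BACKOFF_CEILING).

```latex
t = \min\left(c + k^{n},\; t_{max}\right)
```

With the default values, this reads $c = 9$, $k = 2.0$ and $t_{max} = 3600$.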

With default values, after the first crash, the delay would be $t = 9 + 2.0^0$, giving 10 seconds (remember, n = 0). If the daemon keeps crashing, the delay increases.

For example, take the $(MASTER_BACKOFF_FACTOR) (which defaults to 2.0) to the power of the number of times the daemon has already been restarted, and add $(MASTER_BACKOFF_CONSTANT) (which defaults to 9). Thus:

1st crash: n = 0, so: $t = 9 + 2^0 = 9 + 1 = 10\ seconds$

2nd crash: n = 1, so: $t = 9 + 2^1 = 9 + 2 = 11\ seconds$

3rd crash: n = 2, so: $t = 9 + 2^2 = 9 + 4 = 13\ seconds$

...

6th crash: n = 5, so: $t = 9 + 2^5 = 9 + 32 = 41\ seconds$

...

9th crash: n = 8, so: $t = 9 + 2^8 = 9 + 256 = 265\ seconds$

And after 13 crashes, it would be:

13th crash: n = 12, so: $t = 9 + 2^{12} = 9 + 4096 = 4105\ seconds$

This is bigger than the $(MASTER_BACKOFF_CEILING), which defaults to 3600, so the daemon would really be restarted after only 3600 seconds, not 4105. The condor_ master tries again every hour (since the num