This section describes the configuration of the condor_startd to implement the desired policy for when remote jobs should start, be suspended, (possibly) be resumed, be vacated (with a checkpoint), or be killed (no checkpoint). This policy is the heart of Condor's balancing act between the needs and wishes of resource owners (machine owners) and resource users (people submitting their jobs to Condor). Please read this section carefully if you plan to change any of the settings described here, as getting it wrong can have a severe impact on either the owners of machines in your pool (they may ask to be removed from the pool entirely) or the users of your pool (they may stop using Condor).
Before we get into the details, note that the version 6.0 policy expressions differ from those of later versions; a separate section explains the differences between them.
To define your policy, you set expressions in the configuration file (see section 3.3 on Configuring Condor for an introduction to Condor's configuration files). The expressions are evaluated in the context of the machine's ClassAd and a job ClassAd, so they can reference attributes from either ClassAd. This section lists the attributes that are included in the machine's ClassAd and those that are included in a job ClassAd. It then explains the START expression, which describes to Condor what conditions must be met for a machine to start a job, and the RANK expression, which specifies the kinds of jobs a machine prefers to run. A final discussion details how the condor_startd works, including the machine states and activities, to give an idea of what is possible in policy decisions. Two example policy settings are presented.
The condor_startd represents the machine on which it is running to the Condor pool. It publishes characteristics about the machine in its ClassAd to aid matchmaking with resource requests. The values of these attributes can be found by using condor_status -l hostname. On an SMP machine, the startd will break the machine up and advertise it as separate virtual machines, each with its own name and ClassAd. The attributes themselves and what they represent are described below:
In addition, there are a few attributes that are automatically inserted into the machine ClassAd whenever a resource is in the Claimed state:
Finally, there are a few attributes that are only inserted into the machine ClassAd if a job is currently executing. If the resource is claimed but no job is running, none of these attributes will be defined.
One of these attributes holds the job's identifier (for example, 152.3), as you would see it in condor_q on the submitting machine.
The most important expression in the startd (and possibly in all of Condor) is the START expression. This expression describes the conditions that must be met for a machine to service a resource request (in other words, to start a job). This expression (like other expressions) can reference attributes in the machine's ClassAd (such as KeyboardIdle and LoadAvg) or attributes in a job ClassAd (such as Owner, ImageSize, and even Cmd, the name of the executable the requester wants to run). What the START expression evaluates to plays a crucial role in determining the state and activity of a machine.
The Requirements expression, not the START expression, is what is actually used for matching with jobs. The startd defines the Requirements expression as the START expression. However, in situations where a machine wants to make itself unavailable for further matches, it sets its Requirements expression to FALSE, not its START expression. When the START expression locally evaluates to TRUE, the machine advertises the Requirements expression as TRUE and does not publish the START expression.
Normally, the expressions in the machine ClassAd are evaluated against certain request ClassAds in the condor_negotiator to see if there is a match, or against whatever request ClassAd currently has claimed the machine. However, by locally evaluating an expression, the machine only evaluates the expression against its own ClassAd. If an expression cannot be locally evaluated (because it references attributes that are only found in a request ad, such as Owner or ImageSize), the expression is (usually) undefined. See section 4.1 for specifics on how undefined terms are handled in ClassAd expression evaluation.
NOTE: If you have machines with lots of real memory and swap space so the only scarce resource is CPU time, you could use JOB_RENICE_INCREMENT (see section 3.3.12 on condor_starter Configuration File Macros for details) so that Condor starts jobs on your machine with low priority. Then, set up your machines with:
START = True
SUSPEND = False
PREEMPT = False
KILL = False
In this way, Condor jobs will always run and will never be kicked off. However, because they run with ``nice priority'', interactive response on your machines will not suffer. You probably would not notice that Condor was running the jobs, assuming you had enough free memory for the Condor jobs that there was little swapping.
A machine may be configured to prefer certain jobs over others using the RANK expression. It is an expression, like any other in a machine ClassAd. It can reference any attribute found in either the machine ClassAd or a request ad (normally, in fact, it references things in the request ad). The most common use of this expression is likely to configure a machine to prefer to run jobs from the owner of that machine, or by extension, a group of machines to prefer jobs from the owners of those machines.
For example, imagine there is a small research group with 4 machines called tenorsax, piano, bass, and drums. These machines are owned by the 4 users coltrane, tyner, garrison, and jones, respectively.
Assume that there is a large Condor pool in your department, but you spent a lot of money on really fast machines for your group. You want to implement a policy that gives priority on your machines to anyone in your group. To achieve this, set the RANK expression on your machines to reference the Owner attribute and prefer requests where that attribute matches one of the people in your group as in
RANK = Owner == "coltrane" || Owner == "tyner" \
|| Owner == "garrison" || Owner == "jones"
The RANK expression is evaluated as a floating point number. However, as in C, boolean expressions evaluate to either 1 or 0 depending on whether they are TRUE or FALSE. So, if this expression evaluates to 1 (because the remote job is owned by one of the preferred users), that value is larger than the value for any other user (for whom the expression evaluates to 0).
A more complex RANK expression has the same basic setup, where anyone from your group has priority on your machines. The difference is that the machine owner has better priority on their own machine. To set this up for Jimmy Garrison, place the following entry in Jimmy Garrison's local configuration file bass.local:
RANK = (Owner == "coltrane") + (Owner == "tyner") \
+ ((Owner == "garrison") * 10) + (Owner == "jones")
NOTE: The parentheses in this expression are important, because the ``+'' operator has higher default precedence than ``==''.
The use of ``+'' instead of ``||'' allows us to distinguish which terms matched and which ones did not. If anyone not in the John Coltrane quartet was running a job on the machine called bass, the RANK would evaluate numerically to 0, since none of the boolean terms evaluates to 1, and 0+0+0+0 still equals 0.
Suppose Elvin Jones submits a job. His job would match this machine (assuming the START was True for him at that time) and the RANK would numerically evaluate to 1. Therefore, Elvin would preempt the Condor job currently running. Assume that later Jimmy submits a job. The RANK evaluates to 10, since the boolean that matches Jimmy gets multiplied by 10. Jimmy would preempt Elvin, and Jimmy's job would run on Jimmy's machine.
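The arithmetic above can be checked with a small Python model of the RANK expression on bass. This is a sketch, not Condor's ClassAd evaluator: it relies on Python booleans also counting as 1 and 0, the helper names (bass_rank, would_preempt) are hypothetical, and it assumes a new request preempts the running one only when its RANK is strictly higher.

```python
# Sketch of the RANK expression configured on bass (owned by garrison).
# Python treats True/False as 1/0 in arithmetic, mirroring the
# boolean-to-number behavior described for RANK expressions.


def bass_rank(owner: str) -> int:
    """Mirror of:
    RANK = (Owner == "coltrane") + (Owner == "tyner")
         + ((Owner == "garrison") * 10) + (Owner == "jones")
    """
    return ((owner == "coltrane") + (owner == "tyner")
            + ((owner == "garrison") * 10) + (owner == "jones"))


def would_preempt(current_owner: str, new_owner: str) -> bool:
    """Assumption of this sketch: preempt only on a strictly higher RANK."""
    return bass_rank(new_owner) > bass_rank(current_owner)


if __name__ == "__main__":
    for owner in ("someone_else", "jones", "garrison"):
        print(owner, bass_rank(owner))  # 0, 1, 10
    # Elvin (jones) preempts an outsider; Jimmy (garrison) preempts Elvin.
    print(would_preempt("someone_else", "jones"), would_preempt("jones", "garrison"))
```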
The RANK expression is not required to reference the Owner of the jobs. Perhaps there is one machine with an enormous amount of memory, and others with not much at all. You can configure your large-memory machine to prefer to run jobs with larger memory requirements:
RANK = ImageSize
That's all there is to it. The bigger the job, the more this machine wants to run it. It is an altruistic preference, always servicing the largest of jobs, no matter who submitted them. A little less altruistic is John's RANK that prefers his jobs over those with the largest Imagesize:
RANK = ((Owner == "coltrane") * 1000000000000) + ImageSize
This RANK breaks if a job is submitted with an image size of more than 10^12 Kbytes. However, with a job that size, this RANK expression preferring that job would not be Condor's only problem!
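The weighting trick, and where it breaks, can be seen numerically. The Python sketch below assumes ImageSize is a plain number of Kbytes and uses a hypothetical helper name, john_rank; it is not how Condor itself evaluates the expression.

```python
# Sketch of: RANK = ((Owner == "coltrane") * 1000000000000) + ImageSize
WEIGHT = 1_000_000_000_000  # 10**12, the multiplier in the RANK above


def john_rank(owner: str, image_size_kb: int) -> int:
    """Boolean owner test scaled by WEIGHT, plus the job's ImageSize."""
    return (owner == "coltrane") * WEIGHT + image_size_kb


if __name__ == "__main__":
    small_coltrane_job = john_rank("coltrane", 1)
    # An ordinary job from someone else ranks below coltrane's job ...
    print(john_rank("tyner", 500_000) < small_coltrane_job)    # True
    # ... but a job larger than 10**12 Kbytes outranks it.
    print(john_rank("tyner", WEIGHT + 2) > small_coltrane_job)  # True
```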
A machine is assigned a state by Condor. The state depends on whether or not the machine is available to run Condor jobs, and if so, what point in the negotiations has been reached. The possible states are
Figure 3.3 shows the states and the possible transitions between the states.
Within some machine states, activities of the machine are defined. The state has meaning regardless of activity. Differences between activities are significant. Therefore, a ``state/activity'' pair describes a machine. The following list describes all the possible state/activity pairs.
Figure 3.4 gives the overall view of all machine states and activities and shows the possible transitions from one to another within the Condor system. Each transition is labeled with a number on the diagram, and transition numbers referred to in this manual appear in bold.
Various expressions are used to determine when and if many of these state and activity transitions occur. Other transitions are initiated by parts of the Condor protocol (such as when the condor_negotiator matches a machine with a schedd). The following section describes the conditions that lead to the various state and activity transitions.
This section traces through all possible state and activity transitions within a machine and describes the conditions under which each one occurs. Whenever a transition occurs, Condor records when the machine entered its new activity and/or new state. These times are often used to write expressions that determine when further transitions occur. For example, enter the Killing activity if a machine has been in the Vacating activity longer than a specified amount of time.
When the startd is first spawned, the machine it represents enters the Owner state. The machine will remain in this state as long as the START expression locally evaluates to FALSE. If the START expression locally evaluates to TRUE or cannot be locally evaluated (it evaluates to UNDEFINED), transition 1 occurs and the machine enters the Unclaimed state.
As long as the START expression evaluates locally to FALSE, there is no possible request in the Condor system that could match it. The machine is unavailable to Condor and stays in the Owner state. For example, if the START expression is
START = KeyboardIdle > 15 * $(MINUTE) && Owner == "coltrane"
and KeyboardIdle is 34 seconds, then the machine remains in the Owner state. Owner is undefined when the expression is evaluated locally, but that does not matter here: KeyboardIdle > 15 * $(MINUTE) is FALSE, and anything && FALSE is FALSE.
If, however, the START expression is
START = KeyboardIdle > 15 * $(MINUTE) || Owner == "coltrane"
and KeyboardIdle is 34 seconds, then the machine leaves the Owner state and becomes Unclaimed, because FALSE || UNDEFINED is UNDEFINED. So, while this machine is not available to just anybody, it is willing to run jobs that user coltrane has submitted. Any other user's jobs have to wait until KeyboardIdle exceeds 15 minutes. Since coltrane might claim this resource but has not done so yet, the machine goes to the Unclaimed state.
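How UNDEFINED flows through && and || in these two START expressions can be sketched in Python. This is a simplified model, not Condor's ClassAd evaluator: None stands in for UNDEFINED, ERROR is ignored, and the function names (and3, or3, start_and, start_or, leaves_owner_state) are hypothetical.

```python
# Simplified three-valued logic for ClassAd-style && and ||.
# None stands in for UNDEFINED (an assumption of this sketch).
UNDEFINED = None
MINUTE = 60


def and3(a, b):
    """&&: FALSE dominates; otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def or3(a, b):
    """||: TRUE dominates; otherwise UNDEFINED propagates."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def owner_is_coltrane(job_ad=None):
    # Owner lives in the job ad; with no job ad (local evaluation)
    # the comparison is UNDEFINED.
    if job_ad is None or "Owner" not in job_ad:
        return UNDEFINED
    return job_ad["Owner"] == "coltrane"


def start_and(keyboard_idle, job_ad=None):
    # START = KeyboardIdle > 15 * $(MINUTE) && Owner == "coltrane"
    return and3(keyboard_idle > 15 * MINUTE, owner_is_coltrane(job_ad))


def start_or(keyboard_idle, job_ad=None):
    # START = KeyboardIdle > 15 * $(MINUTE) || Owner == "coltrane"
    return or3(keyboard_idle > 15 * MINUTE, owner_is_coltrane(job_ad))


def leaves_owner_state(local_start):
    # Transition 1 happens unless START is locally FALSE.
    return local_start is not False


if __name__ == "__main__":
    print(start_and(34), leaves_owner_state(start_and(34)))  # False False
    print(start_or(34), leaves_owner_state(start_or(34)))    # None True
```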
While in the Owner state, the startd polls the status of the machine every UPDATE_INTERVAL to see if anything has changed that would lead it to a different state. This minimizes the impact on the Owner while the Owner is using the machine. Frequently waking up, computing load averages, checking the access times on files, computing free swap space take time, and there is nothing time critical that the startd needs to be sure to notice as soon as it happens. If the START expression evaluates to TRUE and five minutes pass before the startd notices, that's a drop in the bucket of high-throughput computing.
The machine can only transition to the Unclaimed state from the Owner state. It only does so when the START expression no longer locally evaluates to FALSE. In general, if the START expression locally evaluates to FALSE at any time, the machine will either transition directly to the Owner state or to the Preempting state on its way to the Owner state, if there is a job running that needs preempting.
While in the Unclaimed state, if the START expression locally evaluates to FALSE, the machine returns to the Owner state by transition 2.
When in the Unclaimed state, the RunBenchmarks expression is relevant. If RunBenchmarks evaluates to TRUE while the machine is in the Unclaimed state, then the machine will transition from the Idle activity to the Benchmarking activity (transition 3) and perform benchmarks to determine MIPS and KFLOPS. When the benchmarks complete, the machine returns to the Idle activity (transition 4).
The startd automatically inserts an attribute, LastBenchmark, whenever it runs benchmarks, so commonly RunBenchmarks is defined in terms of this attribute, for example:
BenchmarkTimer = (CurrentTime - LastBenchmark)
RunBenchmarks = $(BenchmarkTimer) >= (4 * $(HOUR))
Here, a macro, BenchmarkTimer, is defined to help write the expression. This macro holds the time since the last benchmark, so when this time reaches 4 hours, the benchmarks run again. The startd keeps a weighted average of these benchmarking results to try to get the most accurate numbers possible, which is why it is desirable for the startd to run them more than once in its lifetime.
NOTE: LastBenchmark is initialized to 0 before benchmarks have ever been run. So, if you want the startd to run benchmarks as soon as the machine is Unclaimed (if it hasn't done so already), include a term for LastBenchmark as in the example above.
NOTE: If RunBenchmarks is defined and set to something other than FALSE, the startd will automatically run one set of benchmarks when it first starts up. To disable benchmarks, both at startup and at any time thereafter, set RunBenchmarks to FALSE or comment it out of the configuration file.
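The RunBenchmarks timing from the example can be modeled in a few lines of Python. The sketch assumes CurrentTime and LastBenchmark are Unix timestamps in seconds; run_benchmarks is a hypothetical helper name, not part of Condor.

```python
MINUTE = 60
HOUR = 60 * MINUTE


def run_benchmarks(current_time: int, last_benchmark: int) -> bool:
    """Mirror of the example configuration:
    BenchmarkTimer = (CurrentTime - LastBenchmark)
    RunBenchmarks  = $(BenchmarkTimer) >= (4 * $(HOUR))
    """
    benchmark_timer = current_time - last_benchmark
    return benchmark_timer >= 4 * HOUR


if __name__ == "__main__":
    now = 1_000_000_000
    print(run_benchmarks(now, 0))               # True: LastBenchmark starts at 0
    print(run_benchmarks(now, now - 3 * HOUR))  # False: ran 3 hours ago
    print(run_benchmarks(now, now - 4 * HOUR))  # True: 4 hours have passed
```

Because LastBenchmark starts at 0, the timer is enormous on a fresh startd, so the first check triggers benchmarks immediately.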
From the Unclaimed state, the machine can go to two other possible states: Matched or Claimed/Idle. Once the condor_negotiator matches an Unclaimed machine with a requester at a given schedd, the negotiator sends a command to both parties, notifying them of the match. If the schedd receives that notification and initiates the claiming procedure with the machine before the negotiator's message gets to the machine, the Matched state is skipped, and the machine goes directly to the Claimed/Idle state (transition 5). However, normally the machine will enter the Matched state (transition 6), even if it is only for a brief period of time.
The Matched state is not very interesting to Condor. Noteworthy in this state is that the machine lies about its START expression while in this state and says that Requirements are false to prevent being matched again before it has been claimed. Also interesting is that the startd starts a timer to make sure it does not stay in the Matched state too long. The timer is set with the MATCH_TIMEOUT configuration file macro. It is specified in seconds and defaults to 300 (5 minutes). If the schedd that was matched with this machine does not claim it within this period of time, the machine gives up, and goes back into the Owner state via transition 7. It will probably leave the Owner state right away for the Unclaimed state again and wait for another match.
At any time while the machine is in the Matched state, if the START expression locally evaluates to FALSE, the machine enters the Owner state directly (transition 7).
If the schedd that was matched with the machine claims it before the MATCH_TIMEOUT expires, the machine goes into the Claimed/Idle state (transition 8).
The Claimed state is certainly the most complex state. It has the most possible activities and the most expressions that determine its next activities. In addition, the condor_checkpoint and condor_vacate commands affect the machine when it is in the Claimed state. In general, there are two sets of expressions that might take effect. They depend on the universe of the request: standard or vanilla. The standard universe expressions are the normal expressions. For example:
WANT_SUSPEND = True
WANT_VACATE = $(ActivationTimer) > 10 * $(MINUTE)
SUSPEND = $(KeyboardBusy) || $(CPUBusy)
...
The vanilla expressions have the string ``_VANILLA'' appended to their names. For example:
WANT_SUSPEND_VANILLA = True
WANT_VACATE_VANILLA = True
SUSPEND_VANILLA = $(KeyboardBusy) || $(CPUBusy)
...
Without specific vanilla versions, the normal versions will be used for all jobs, including vanilla jobs. In this manual, the normal expressions are referenced. The difference exists for the resource owner who might want the machine to behave differently for vanilla jobs, since they cannot checkpoint. For example, owners may want vanilla jobs to remain suspended for longer than standard jobs.
While Claimed, the POLLING_INTERVAL takes effect, and the startd polls the machine much more frequently to evaluate its state.
If the machine owner starts typing on the console again, it is best to notice this as soon as possible to be able to start doing whatever the machine owner wants at that point. For SMP machines, if any virtual machine is in the Claimed state, the startd polls the machine frequently. If already polling one virtual machine, it does not cost much to evaluate the state of all the virtual machines at the same time.
In general, when the startd is going to take a job off a machine (usually because of activity on the machine that signifies that the owner is using the machine again), the startd will go through successive levels of getting the job out of the way. The first and least costly to the job is suspending it. This works for both standard and vanilla jobs. If suspending the job for a short while does not satisfy the machine owner (the owner is still using the machine after a specific period of time), the startd moves on to vacating the job. Vacating a job involves performing a checkpoint so that the work already completed is not lost. If even that does not satisfy the machine owner (usually because it is taking too long and the owner wants their machine back now), the final, most drastic stage is reached: killing. Killing is a quick death to the job, without a checkpoint. For vanilla jobs, vacating and killing are equivalent, although a vanilla job can request to have a specific softkill signal sent to it at vacate time so that the job itself can perform application-specific checkpointing.
The WANT_SUSPEND expression determines if the machine will evaluate the SUSPEND expression to consider entering the Suspended activity. The WANT_VACATE expression determines what happens when the machine enters the Preempting state. It will go to the Vacating activity or directly to Killing. If one or both of these expressions evaluates to FALSE, the machine will skip that stage of getting rid of the job and proceed directly to the more drastic stages.
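The escalation described above can be summarized as a short Python sketch: WANT_SUSPEND and WANT_VACATE only decide whether the gentler stages are tried, and Killing is always the last resort. The function name is hypothetical; the real startd also evaluates SUSPEND, PREEMPT, and KILL along the way, and a suspended job may be resumed instead of escalated.

```python
def eviction_stages(want_suspend: bool, want_vacate: bool) -> list:
    """Stages the startd can escalate through, least to most drastic."""
    stages = []
    if want_suspend:
        stages.append("Suspended")  # Claimed/Suspended: job paused, not evicted
    if want_vacate:
        stages.append("Vacating")   # Preempting/Vacating: graceful eviction
    stages.append("Killing")        # Preempting/Killing: no checkpoint
    return stages


if __name__ == "__main__":
    print(eviction_stages(True, True))    # ['Suspended', 'Vacating', 'Killing']
    print(eviction_stages(False, False))  # ['Killing']
```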
When the machine first enters the Claimed state, it goes to the Idle activity. From there, it has two options. It can enter the Preempting state via transition 9 (if a condor_vacate arrives, or if the START expression locally evaluates to FALSE), or it can enter the Busy activity (transition 10) if the schedd that has claimed the machine decides to activate the claim and start a job.
From Claimed/Busy, the machine can transition to three other state/activity pairs. The startd evaluates the WANT_SUSPEND expression to decide which other expressions to evaluate. If WANT_SUSPEND is TRUE, then the startd evaluates the SUSPEND expression. If WANT_SUSPEND is FALSE, then the startd evaluates the PREEMPT expression and skips the Suspended activity entirely. The possible state/activity destinations from Claimed/Busy, by transition, are:
From the Claimed/Suspended state, the following transitions may occur:
The Preempting state is less complex than the Claimed state. There are two activities. Depending on the value of WANT_VACATE, a machine will be in the Vacating activity (if TRUE) or the Killing activity (if FALSE).
While in the Preempting state (regardless of activity) the machine advertises its Requirements expression as FALSE to signify that it is not available for further matches, either because it is about to transition to the Owner state, or because it has already been matched with one preempting match, and further preempting matches are disallowed until the machine has been claimed by the new match.
The main function of the Preempting state is to get rid of the starter associated with the resource. If the condor_starter associated with a given claim exits while the machine is still in the Vacating activity, then the job successfully completed its checkpoint.
If the machine is in the Vacating activity, it keeps evaluating the KILL expression. As soon as this expression evaluates to TRUE, the machine enters the Killing activity (transition 16).
When the starter exits, or if there was no starter running when the machine enters the Preempting state (transition 9), the other purpose of the Preempting state is completed: notifying the schedd that had claimed this machine that the claim is broken.
At this point, the machine enters either the Owner state by transition 17 (if the job was preempted because the machine owner came back) or the Claimed/Idle state by transition 18 (if the job was preempted because a better match was found). When the machine enters the Killing activity, the startd starts a timer, the length of which is defined by the KILLING_TIMEOUT macro. This macro is defined in seconds and defaults to 30. If this timer expires and the machine is still in the Killing activity, something has gone seriously wrong with the condor_starter, and the startd tries to get rid of the job immediately by sending SIGKILL to all of the condor_starter's children, and then to the condor_starter itself.
Once the starter is gone and the schedd that had claimed the machine is notified that the claim is broken, the machine will either enter the Owner state by transition 19 (if the job was preempted because the machine owner came back) or the Claimed/Idle state by transition 20 (if the job was preempted because a better match was found).
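The numbered transitions that this section spells out in prose can be collected into a lookup table as a reading aid. This Python sketch is built only from the text: the table and the successors helper are hypothetical, some entries name only a state where the text does not pin down the activity, and transitions 11 through 15 (out of Claimed/Busy and Claimed/Suspended) are left out rather than guessed.

```python
# Numbered transitions as described in the prose of this section.
# Transitions 11-15 (out of Claimed/Busy and Claimed/Suspended) are omitted.
TRANSITIONS = {
    1: ("Owner", "Unclaimed"),
    2: ("Unclaimed", "Owner"),
    3: ("Unclaimed/Idle", "Unclaimed/Benchmarking"),
    4: ("Unclaimed/Benchmarking", "Unclaimed/Idle"),
    5: ("Unclaimed", "Claimed/Idle"),
    6: ("Unclaimed", "Matched"),
    7: ("Matched", "Owner"),
    8: ("Matched", "Claimed/Idle"),
    9: ("Claimed/Idle", "Preempting"),
    10: ("Claimed/Idle", "Claimed/Busy"),
    16: ("Preempting/Vacating", "Preempting/Killing"),
    17: ("Preempting/Vacating", "Owner"),
    18: ("Preempting/Vacating", "Claimed/Idle"),
    19: ("Preempting/Killing", "Owner"),
    20: ("Preempting/Killing", "Claimed/Idle"),
}


def successors(source: str) -> dict:
    """Map transition number -> destination for transitions leaving `source`."""
    return {n: dst for n, (src, dst) in TRANSITIONS.items() if src == source}


if __name__ == "__main__":
    print(successors("Matched"))  # {7: 'Owner', 8: 'Claimed/Idle'}
```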
The following section provides two examples of how you might configure the policy at your pool. Each one is described in English, then the actual macros and expressions used are listed and explained with comments. Finally the entire set of macros and expressions are listed in one block so you can see them in one place for easy reference.
These settings are the default as shipped with Condor. They have been used for many years with no problems. The vanilla expressions are identical to the regular ones. (They are not listed here. If not defined, the standard expressions are used for vanilla jobs as well).
The following are macros to help write the expressions clearly.
## These macros are here to help write legible expressions:
MINUTE = 60
HOUR = (60 * $(MINUTE))
StateTimer = (CurrentTime - EnteredCurrentState)
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
ActivationTimer = (CurrentTime - JobStart)
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad = 0.3
HighLoad = 0.5
StartIdleTime = 15 * $(MINUTE)
ContinueIdleTime = 5 * $(MINUTE)
MaxSuspendTime = 10 * $(MINUTE)
MaxVacateTime = 5 * $(MINUTE)
KeyboardBusy = KeyboardIdle < $(MINUTE)
CPU_Idle = $(NonCondorLoadAvg) <= $(BackgroundLoad)
CPU_Busy = $(NonCondorLoadAvg) >= $(HighLoad)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
The WANT_ expressions are set to always try suspending jobs first. If that is not enough, always try to gracefully vacate them, unless they have been running for less than 10 minutes, in which case just kill them instead of trying to checkpoint those 10 minutes of work.
WANT_SUSPEND = True
WANT_VACATE = $(ActivationTimer) > 10 * $(MINUTE)
Finally, definitions of the actual expressions. Start any job if the CPU is idle (as defined by the macro) and the keyboard has been idle long enough.
START = $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)
Suspend a job if the machine is busy.
SUSPEND = $(MachineBusy)
Continue a suspended job if the CPU is idle and the Keyboard has been idle for long enough.
CONTINUE = $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime)
There are two conditions that signal preemption. The first condition is if the job is suspended, but it has been suspended too long. The second condition is if suspension is not desired and the machine is busy.
PREEMPT = ( ($(ActivityTimer) > $(MaxSuspendTime)) && \
(Activity == "Suspended") ) || \
( SUSPEND && (WANT_SUSPEND == False) )
Kill a job if it has been vacating for too long.
KILL = $(ActivityTimer) > $(MaxVacateTime)
Finally, specify periodic checkpointing. For jobs smaller than 60 Mbytes, do a periodic checkpoint every 6 hours. For larger jobs, only checkpoint every 12 hours.
PERIODIC_CHECKPOINT = ( (ImageSize < 60000) && \
($(LastCkpt) > (6 * $(HOUR))) ) || \
( $(LastCkpt) > (12 * $(HOUR)) )
For reference, the entire set of policy settings is included once more without comments:
## These macros are here to help write legible expressions:
MINUTE = 60
HOUR = (60 * $(MINUTE))
StateTimer = (CurrentTime - EnteredCurrentState)
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
ActivationTimer = (CurrentTime - JobStart)
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad = 0.3
HighLoad = 0.5
StartIdleTime = 15 * $(MINUTE)
ContinueIdleTime = 5 * $(MINUTE)
MaxSuspendTime = 10 * $(MINUTE)
MaxVacateTime = 5 * $(MINUTE)
KeyboardBusy = KeyboardIdle < $(MINUTE)
CPU_Idle = $(NonCondorLoadAvg) <= $(BackgroundLoad)
CPU_Busy = $(NonCondorLoadAvg) >= $(HighLoad)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
WANT_SUSPEND = True
WANT_VACATE = $(ActivationTimer) > 10 * $(MINUTE)
START = $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)
SUSPEND = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > $(ContinueIdleTime)
PREEMPT = ( ($(ActivityTimer) > $(MaxSuspendTime)) && \
(Activity == "Suspended") ) || \
( $(MachineBusy) && (WANT_SUSPEND == False) )
KILL = $(ActivityTimer) > $(MaxVacateTime)
PERIODIC_CHECKPOINT = ( (ImageSize < 60000) && \
($(LastCkpt) > (6 * $(HOUR))) ) || \
( $(LastCkpt) > (12 * $(HOUR)) )
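To see how the default expressions react to a particular machine state, they can be mirrored as plain Python predicates. This is a sketch under assumptions: the Snapshot class flattens the machine and job attributes into one hypothetical record, times are in seconds, ImageSize is in Kbytes, and Condor's ClassAd semantics (such as UNDEFINED handling) are not reproduced.

```python
from dataclasses import dataclass

# Constants mirroring the default macros.
MINUTE = 60
HOUR = 60 * MINUTE
BACKGROUND_LOAD = 0.3
HIGH_LOAD = 0.5
START_IDLE_TIME = 15 * MINUTE
CONTINUE_IDLE_TIME = 5 * MINUTE
MAX_SUSPEND_TIME = 10 * MINUTE
MAX_VACATE_TIME = 5 * MINUTE
WANT_SUSPEND = True


@dataclass
class Snapshot:
    """Hypothetical flat view of the machine and job attributes used."""
    current_time: int
    load_avg: float
    condor_load_avg: float
    keyboard_idle: int
    activity: str = "Busy"
    entered_current_activity: int = 0
    job_start: int = 0
    image_size: int = 0  # Kbytes
    last_periodic_checkpoint: int = 0


def cpu_idle(s): return s.load_avg - s.condor_load_avg <= BACKGROUND_LOAD
def cpu_busy(s): return s.load_avg - s.condor_load_avg >= HIGH_LOAD
def keyboard_busy(s): return s.keyboard_idle < MINUTE
def machine_busy(s): return cpu_busy(s) or keyboard_busy(s)
def activity_timer(s): return s.current_time - s.entered_current_activity
def activation_timer(s): return s.current_time - s.job_start
def last_ckpt(s): return s.current_time - s.last_periodic_checkpoint


def want_vacate(s):
    return activation_timer(s) > 10 * MINUTE


def start(s):
    return cpu_idle(s) and s.keyboard_idle > START_IDLE_TIME


def suspend(s):
    return machine_busy(s)


def continue_(s):  # CONTINUE ("continue" is a Python keyword)
    return cpu_idle(s) and s.keyboard_idle > CONTINUE_IDLE_TIME


def preempt(s):
    suspended_too_long = (activity_timer(s) > MAX_SUSPEND_TIME
                          and s.activity == "Suspended")
    return suspended_too_long or (machine_busy(s) and not WANT_SUSPEND)


def kill(s):
    return activity_timer(s) > MAX_VACATE_TIME


def periodic_checkpoint(s):
    return ((s.image_size < 60000 and last_ckpt(s) > 6 * HOUR)
            or last_ckpt(s) > 12 * HOUR)


if __name__ == "__main__":
    now = 1_000_000
    idle = Snapshot(now, load_avg=1.05, condor_load_avg=1.0, keyboard_idle=20 * MINUTE)
    print(start(idle))      # True: CPU idle and keyboard idle for 20 minutes
    typing = Snapshot(now, load_avg=1.0, condor_load_avg=1.0, keyboard_idle=10)
    print(suspend(typing))  # True: keyboard touched within the last minute
```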
Due to a recent increase in the number of Condor users and the size of their jobs (many users here are submitting jobs with an Imagesize of more than 100 Mbytes!), we have had to customize our policy to try to better handle this range of Imagesize.
Whether or not we suspend or vacate jobs is now a function of the ImageSize of the job currently running. We divide ImageSize into three possible categories, which are defined with macros. ImageSize is measured in kilobytes.
BigJob = (ImageSize > (30 * 1024))
MediumJob = (ImageSize <= (30 * 1024) && ImageSize >= (10 * 1024))
SmallJob = (ImageSize < (10 * 1024))
The policy may be summarized by: If the job is Small, it goes through the normal progression of suspend to vacate to kill based on the tried and true times. If the job is Medium, then when a user returns, the job starts vacating the machine right away. The idea is that with an immediate checkpoint, the job will find all its pages still in memory, and checkpointing will be fast. The memory pages will be freed up as soon as the checkpoint completes. If the job was suspended instead, its pages start getting swapped out and when it is time to checkpoint (10 minutes later), the user's pages will be swapped out again, and the user will see reduced performance. In addition, checkpointing will take much longer. If the job is Big, we do not bother checkpointing, since the checkpointing will not finish before the owner gets too upset. It is a waste to put the load on the network and checkpoint server.
The logic for our special policy is tuned with the WANT_ expressions. All other expressions and macros use defaults. We want to suspend jobs if they are Small, and we only want to vacate jobs that are Small or Medium. Vanilla jobs are always suspended, regardless of their size.
WANT_SUSPEND = $(SmallJob)
WANT_VACATE = $(MediumJob) || $(SmallJob)
WANT_SUSPEND_VANILLA = True
WANT_VACATE_VANILLA = True
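The three size categories and the WANT_ settings they drive can be checked with a small Python sketch. The helper names are hypothetical and ImageSize is assumed to be in kilobytes, as stated above. The Medium range includes both of its endpoints, so every size falls into exactly one category.

```python
def job_size(image_size_kb: float) -> str:
    """Classify a job using the BigJob / MediumJob / SmallJob macros."""
    if image_size_kb > 30 * 1024:   # BigJob
        return "Big"
    if image_size_kb >= 10 * 1024:  # MediumJob: 10 to 30 Mbytes inclusive
        return "Medium"
    return "Small"                  # SmallJob


def want_flags(image_size_kb: float, vanilla: bool = False) -> tuple:
    """(WANT_SUSPEND, WANT_VACATE) under the customized policy."""
    if vanilla:
        # WANT_SUSPEND_VANILLA = True, WANT_VACATE_VANILLA = True
        return (True, True)
    size = job_size(image_size_kb)
    # WANT_SUSPEND = $(SmallJob); WANT_VACATE = $(MediumJob) || $(SmallJob)
    return (size == "Small", size in ("Small", "Medium"))


if __name__ == "__main__":
    for kb in (5_000, 20_000, 40_000):
        print(kb, job_size(kb), want_flags(kb))
```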
The expressions follow. Each one is first defined as a macro, and the policy expression itself is then set from that macro. As strange as it seems, we do this because it makes customized settings easier (for example, for testing purposes) while still referencing the defaults. There is a brief example of this at the end of this section.
CS_START = $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)
CS_SUSPEND = $(MachineBusy)
CS_CONTINUE = (KeyboardIdle > $(ContinueIdleTime)) && $(CPU_Idle)
CS_PREEMPT = ( ($(ActivityTimer) > $(MaxSuspendTime)) && \
(Activity == "Suspended") ) || \
( CS_SUSPEND && (WANT_SUSPEND == False) )
CS_KILL = ($(ActivityTimer) > $(MaxVacateTime))
We define the expressions in terms of our special macros.
START = $(CS_START)
SUSPEND = $(CS_SUSPEND)
CONTINUE = $(CS_CONTINUE)
PREEMPT = $(CS_PREEMPT)
KILL = $(CS_KILL)
There are no separate vanilla versions of any of these, since we already have a different WANT_SUSPEND for vanilla jobs, and all of the policy expressions are written in terms of that.
Periodic checkpoints also take image size into account. We periodically checkpoint Big jobs more frequently (every 3 hours), since Big jobs are killed right away at eviction time, so periodic checkpoints are the only way Big jobs make forward progress. However, we do not want frequent periodic checkpoints of all the Big jobs to bog down our network or our checkpoint servers. Small or Medium jobs receive a periodic checkpoint every 12 hours, since they get the privilege of checkpointing at eviction time.
PERIODIC_CHECKPOINT = (($(LastCkpt) > (3 * $(HOUR))) \
&& $(BigJob)) || (($(LastCkpt) > (12 * $(HOUR))) && \
($(SmallJob) || $(MediumJob)))
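The size-dependent PERIODIC_CHECKPOINT expression can be sketched the same way. The function below is hypothetical; it takes the job's ImageSize in kilobytes and the seconds since the last periodic checkpoint (the LastCkpt macro) as plain arguments.

```python
HOUR = 60 * 60


def cs_periodic_checkpoint(image_size_kb: float, seconds_since_ckpt: int) -> bool:
    """Mirror of the customized PERIODIC_CHECKPOINT expression."""
    big = image_size_kb > 30 * 1024  # BigJob
    # SmallJob || MediumJob covers every size that is not BigJob.
    small_or_medium = not big
    return ((seconds_since_ckpt > 3 * HOUR and big)
            or (seconds_since_ckpt > 12 * HOUR and small_or_medium))


if __name__ == "__main__":
    print(cs_periodic_checkpoint(40_000, 4 * HOUR))  # True: Big job, over 3 hours
    print(cs_periodic_checkpoint(5_000, 4 * HOUR))   # False: Small job waits 12 hours
```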
For reference, the entire set of policy settings is given here, without comments:
MINUTE = 60
HOUR = (60 * $(MINUTE))
ActivationTimer = (CurrentTime - JobStart)
StateTimer = (CurrentTime - EnteredCurrentState)
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad = 0.3
HighLoad = 0.5
StartIdleTime = 15 * $(MINUTE)
ContinueIdleTime = 5 * $(MINUTE)
MaxSuspendTime = 10 * $(MINUTE)
MaxVacateTime = 5 * $(MINUTE)
KeyboardBusy = KeyboardIdle < $(MINUTE)
CPU_Idle = $(NonCondorLoadAvg) <= $(BackgroundLoad)
CPU_Busy = $(NonCondorLoadAvg) >= $(HighLoad)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
BigJob = (ImageSize > (30 * 1024))
MediumJob = (ImageSize <= (30 * 1024) && ImageSize >= (10 * 1024))
SmallJob = (ImageSize < (10 * 1024))
WANT_SUSPEND = $(SmallJob)
WANT_VACATE = $(MediumJob) || $(SmallJob)
WANT_SUSPEND_VANILLA = True
WANT_VACATE_VANILLA = True
CS_START = $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)
CS_SUSPEND = $(CPU_Busy) || $(KeyboardBusy)
CS_CONTINUE = (KeyboardIdle > $(ContinueIdleTime)) && $(CPU_Idle)
CS_PREEMPT = ( ($(ActivityTimer) > $(MaxSuspendTime)) && \
(Activity == "Suspended") ) || \
( CS_SUSPEND && (WANT_SUSPEND == False) )
CS_KILL = ($(ActivityTimer) > $(MaxVacateTime))
START = $(CS_START)
SUSPEND = $(CS_SUSPEND)
CONTINUE = $(CS_CONTINUE)
PREEMPT = $(CS_PREEMPT)
KILL = $(CS_KILL)
PERIODIC_CHECKPOINT = (($(LastCkpt) > (3 * $(HOUR))) \
&& $(BigJob)) || (($(LastCkpt) > (12 * $(HOUR))) && \
($(SmallJob) || $(MediumJob)))
This last example shows how the default macros can be used to set up a machine for testing. Suppose we want the machine to behave normally, except if user coltrane submits a job. In that case, we want that job to start regardless of what is happening on the machine. We do not want the job suspended, vacated, or killed. This is reasonable if we know coltrane is submitting very short-running programs for testing purposes. The jobs should be executed right away. This works with any machine (or the whole pool, for that matter) using the following 5 expressions:
START = ($(CS_START)) || Owner == "coltrane"
SUSPEND = ($(CS_SUSPEND)) && Owner != "coltrane"
CONTINUE = $(CS_CONTINUE)
PREEMPT = ($(CS_PREEMPT)) && Owner != "coltrane"
KILL = $(CS_KILL)
Notice that there is nothing special in either the CONTINUE or KILL expressions. Because coltrane's jobs are never suspended, CONTINUE is never evaluated while one of them is running. Similarly, because they are never preempted, KILL is never evaluated for them either.
This section describes how to configure Condor to run jobs only at certain times of the day. In general, we discourage configuring a system like this, since you can often get many useful cycles out of machines, even when their owners say "I'm always using my machine during the day." However, if you submit mostly vanilla jobs or other jobs that cannot checkpoint, it might be a good idea to allow jobs to run only when you know the machines will be idle and will not be interrupted.
To configure this kind of policy, use the ClockMin and ClockDay attributes, defined in section 3.6.1 on "Startd ClassAd Attributes". These are special attributes which the condor_startd automatically inserts into its ClassAd, so you can always reference them in your policy expressions. ClockMin is the number of minutes that have passed since midnight. For example, 8:00am is 8 hours after midnight, or 8 * 60 = 480 minutes, and 5:00pm is 17 hours after midnight, or 17 * 60 = 1020 minutes. ClockDay is the day of the week, with Sunday = 0, Monday = 1, and so on through Saturday = 6.
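The arithmetic above is easy to check in code. This Python sketch derives ClockMin and ClockDay from a datetime; the helper names are hypothetical, and it assumes the datetime represents the machine's local time. Note that Python's weekday() numbers Monday as 0 and Sunday as 6, so it must be shifted to match ClockDay's Sunday = 0 convention.

```python
from datetime import datetime


def clock_min(t: datetime) -> int:
    """Minutes since midnight, matching the ClockMin definition."""
    return t.hour * 60 + t.minute


def clock_day(t: datetime) -> int:
    """Day of week with Sunday = 0, Monday = 1, ..., Saturday = 6."""
    # weekday() gives Monday = 0 ... Sunday = 6; shift by one, wrapping Sunday to 0.
    return (t.weekday() + 1) % 7
```

For example, 1 January 2024 was a Monday, so 8:00am that day gives ClockMin = 480 and ClockDay = 1.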
To make the policy expressions easy to read, we recommend using macros to define the time periods when you want jobs to run or not run. For example, assume regular "work hours" at your site are from 8:00am until 5:00pm, Monday through Friday:
WorkHours = ( (ClockMin >= 480 && ClockMin < 1020) && \
(ClockDay > 0 && ClockDay < 6) )
AfterHours = ( (ClockMin < 480 || ClockMin >= 1020) || \
(ClockDay == 0 || ClockDay == 6) )
Of course, you can fine-tune these settings by changing the definition of AfterHours and WorkHours for your site.
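Because AfterHours is written out by hand rather than as the negation of WorkHours, the two can drift apart when you fine-tune them. A small Python check (an illustrative sketch, not part of Condor; the function names are hypothetical) can confirm that they remain exact complements over every minute of every day of the week:

```python
def work_hours(clock_min: int, clock_day: int) -> bool:
    """Mirror the WorkHours macro."""
    return (480 <= clock_min < 1020) and (0 < clock_day < 6)


def after_hours(clock_min: int, clock_day: int) -> bool:
    """Mirror the AfterHours macro."""
    return (clock_min < 480 or clock_min >= 1020) or (clock_day == 0 or clock_day == 6)


def complementary() -> bool:
    """True if exactly one macro holds at every (ClockMin, ClockDay) pair."""
    return all(work_hours(m, d) != after_hours(m, d)
               for d in range(7)
               for m in range(24 * 60))
```

If you edit one definition, updating the matching function and rerunning complementary() is a quick way to catch a gap (a time when neither holds) or an overlap (a time when both hold).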
Assuming you are using the default policy expressions discussed above, only a few minor changes are required to keep Condor jobs off your machines during work hours:
# Only start jobs after hours.
START = $(AfterHours) && $(CPU_Idle) && KeyboardIdle > $(StartIdleTime)

# Consider the machine busy during work hours, or if the keyboard or
# CPU are busy.
MachineBusy = ( $(WorkHours) || $(CPU_Busy) || $(KeyboardBusy) )
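By default, the MachineBusy macro is used to define the SUSPEND and PREEMPT expressions. If you have changed these expressions at your site, you will need to add $(WorkHours) to your SUSPEND and PREEMPT expressions as appropriate. Note that in the example configuration earlier in this section, SUSPEND and PREEMPT are built from CS_SUSPEND, which is defined directly from CPU_Busy and KeyboardBusy rather than from MachineBusy; with that configuration, changing MachineBusy alone has no effect, and $(WorkHours) must be added to CS_SUSPEND instead.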
Depending on your site, you might also want to avoid suspending jobs during work hours, so that in the morning, if a job is running, it will be immediately preempted, instead of being suspended for some length of time:
WANT_SUSPEND = $(AfterHours)
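The effect of these settings at different times of the week can be sketched in Python. This is a simplified model, not Condor code: the function names are hypothetical, and it assumes that SUSPEND is defined as $(MachineBusy) and that PREEMPT contains a term like CS_PREEMPT's ( CS_SUSPEND && (WANT_SUSPEND == False) ), so a job that should not be suspended is preempted instead.

```python
MINUTE = 60
START_IDLE_TIME = 15 * MINUTE


def work_hours(clock_min: int, clock_day: int) -> bool:
    """WorkHours: 8:00am until 5:00pm, Monday through Friday."""
    return 480 <= clock_min < 1020 and 0 < clock_day < 6


def after_hours_policy(clock_min: int, clock_day: int, cpu_idle: bool,
                       cpu_busy: bool, keyboard_idle: int) -> dict:
    """Model the modified START, MachineBusy and WANT_SUSPEND settings."""
    # AfterHours, as defined above, is exactly the negation of WorkHours.
    after_hours = not work_hours(clock_min, clock_day)
    keyboard_busy = keyboard_idle < MINUTE
    machine_busy = work_hours(clock_min, clock_day) or cpu_busy or keyboard_busy
    want_suspend = after_hours
    # Assumption: SUSPEND is $(MachineBusy), and PREEMPT includes a
    # (SUSPEND && WANT_SUSPEND == False) term.
    return {
        "START": after_hours and cpu_idle and keyboard_idle > START_IDLE_TIME,
        "SUSPEND": machine_busy,
        "WANT_SUSPEND": want_suspend,
        "PREEMPT_WITHOUT_SUSPENDING": machine_busy and not want_suspend,
    }
```

Under these assumptions, at 8:00am on a Monday the machine counts as busy and WANT_SUSPEND is false, so a running job is preempted at once; on a Saturday, owner activity suspends the job as usual.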
This section describes how the current policy expressions differ from the policy expressions in previous versions of Condor. If you have never used Condor version 6.0 or earlier, or you never looked closely at the policy settings, skip this section.
In summary, there is no longer a VACATE expression, and the KILL expression is not evaluated while a machine is claimed. Instead, a PREEMPT expression describes the conditions under which a machine moves from the Claimed state to the Preempting state. Once a machine transitions into the Preempting state, the WANT_VACATE expression controls whether the job is vacated with a checkpoint or killed directly. The KILL expression determines the transition from Preempting/Vacating to Preempting/Killing.
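The transitions summarized above can be written as a small state-transition function. This Python sketch covers only the transitions named in this section, takes each expression as an already-evaluated boolean, and uses a hypothetical next_state name; the real condor_startd state machine has more states, activities and transitions (for example, suspending and continuing within the Claimed state).

```python
def next_state(state: str, activity: str, preempt: bool,
               want_vacate: bool, kill: bool) -> tuple:
    """Return the next (state, activity) pair under the current policy scheme."""
    if state == "Claimed" and activity in ("Busy", "Suspended"):
        # KILL is not evaluated while Claimed; in this model only
        # PREEMPT moves the machine out of Claimed/Busy or Claimed/Suspended.
        if preempt:
            # WANT_VACATE picks checkpointing (Vacating) or a direct kill (Killing).
            return ("Preempting", "Vacating" if want_vacate else "Killing")
    elif state == "Preempting" and activity == "Vacating":
        # KILL moves Preempting/Vacating to Preempting/Killing.
        if kill:
            return ("Preempting", "Killing")
    return (state, activity)
```

Note that passing kill=True for a Claimed machine leaves it where it is, which reflects the fact that KILL no longer plays any role while a machine is claimed.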
In previous versions of Condor, the KILL expression handled three distinct cases (the transitions from Claimed/Busy, Claimed/Suspended and Preempting/Vacating), and the VACATE expression handled two cases (the transitions from Claimed/Busy and Claimed/Suspended). In the current version of Condor, PREEMPT handles the same two cases as the previous VACATE expression, while the KILL expression handles only one case (the transition from Preempting/Vacating to Preempting/Killing). Very complex policies can now be specified by keeping all of the default expressions and tuning only the WANT_VACATE and WANT_SUSPEND expressions. In previous versions, heavy use of the WANT_* expressions required a correspondingly complex KILL expression.