
  
3.4 Installing Contrib Modules

This section describes how to install various contrib modules in the Condor system. Some of these modules are separate, optional pieces that are not included in the main distribution of Condor. Examples are the checkpoint server and DAGMan. Others are integral parts of Condor taken from the development series that have certain features users might want to install. Examples are the new SMP-aware condor_startd and the CondorView collector. Both of these come automatically with Condor version 6.1 and later versions. However, if you do not want to switch over to using only the development binaries, you can install these separate modules while maintaining most of the stable release at your site.

  
3.4.1 Installing CondorView Contrib Modules

To install CondorView for your pool, you need two things:

1.
The CondorView server, which collects historical information.
2.
The CondorView client, a Java applet that views this data.

These are separate modules, and they are installed separately.

  
3.4.2 Installing the CondorView Server Module

The CondorView server is an enhanced version of the condor_collector that logs information on disk, providing a persistent, historical database of your pool state. This includes machine state, as well as the state of jobs submitted by users, and so on. This enhanced condor_collector comes from the version 6.1 development series, but it can be installed in a 6.0 pool. The historical information logging can be turned on or off, so you can install the CondorView collector without using up disk space for historical information if you don't want it.

To install the CondorView server, you download the appropriate binary module for the platform on which you will run CondorView. This does not have to be the same platform as your existing central manager (see below). After you uncompress and untar the module, you will have a directory with a view_server.tar file, a README, and so on. The view_server.tar acts much like the release.tar file for a main release of Condor. It contains all the binaries and supporting files you would install in your release directory:

        sbin/condor_collector
        etc/examples/condor_config.local.view_server

You have two options to choose from when deciding how to install this enhanced condor_collector in your pool:

1.
Replace your existing condor_collector and use the new version for both historical information and the regular role the collector plays in your pool.
2.
Install the new condor_collector and run it on a separate host from your main condor_collector and configure your machines to send updates to both collectors.

If you replace your existing collector with the enhanced version, bugs in the new collector may disrupt your entire pool, because it is development code. If you install the enhanced version on a separate host, any problems with it affect only CondorView, not your entire pool. Unfortunately, installing the CondorView collector on a separate host generates more network traffic (from all the duplicate updates that are sent from each machine in your pool to both collectors). In addition, the installation procedure to have both collectors running is more complicated. Decide for yourself which solution you feel more comfortable with.

What follows are details common to both types of installation.

  
3.4.2.1 Setting up the CondorView Server Module

  Before you install the CondorView collector (as described in the following sections), you have to add a few settings to the local configuration file of that machine to enable historical data collection. These settings are described in detail in the Condor Version 6.1 Administrator's Manual, in the section ``condor_collector Config File Entries''. A short explanation of the entries you must customize is provided below. These entries are also explained in the etc/examples/condor_config.local.view_server file, included in the contrib module. Insert that file into the local configuration file for your CondorView collector host and customize as appropriate at your site.

POOL_HISTORY_DIR    
This is the directory where historical data will be stored. This directory must be writable by whatever user the CondorView collector is running as (usually the user condor). The maximum space used by all the files created by the CondorView server is limited by a configurable setting, POOL_HISTORY_MAX_STORAGE.

NOTE: This directory should be separate and different from the spool or log directories already set up for Condor. There are a few problems putting these files into either of those directories.

KEEP_POOL_HISTORY    
This is a boolean value that determines if the CondorView collector should store the historical information. It is false by default, which is why you must specify it as true in your local configuration file to enable data collection.

Once these settings are in place in the local configuration file for your CondorView server host, you must create the directory you specified in POOL_HISTORY_DIR and make it writable by the user your CondorView collector is running as. This is the same user that owns the CollectorLog file in your log directory. The user is usually condor.
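
The directory setup described above might look like the following sketch. The default path is only a placeholder (it is not a value from the Condor distribution), and the account name condor is the usual case mentioned above rather than a requirement. The ownership change is attempted only when the script runs as root and the account exists.

```shell
# Sketch only: HISTORY_DIR is a placeholder; use the value you gave POOL_HISTORY_DIR.
HISTORY_DIR="${HISTORY_DIR:-${TMPDIR:-/tmp}/condor-viewhist-example}"
# Assumed account name; this is usually "condor".
COLLECTOR_USER="${COLLECTOR_USER:-condor}"

# Create the directory and make it writable by its owner.
mkdir -p "$HISTORY_DIR"
chmod 755 "$HISTORY_DIR"

# chown needs root privileges and an existing account, so guard it.
if [ "$(id -u)" -eq 0 ] && id "$COLLECTOR_USER" >/dev/null 2>&1; then
    chown "$COLLECTOR_USER" "$HISTORY_DIR"
else
    echo "run as root to: chown $COLLECTOR_USER $HISTORY_DIR"
fi
```

Remember to keep this directory separate from the spool and log directories, as noted above.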

Once these steps are completed, you are ready to install the new binaries and you will begin collecting historical information. After that, install the CondorView client contrib module which contains the tools used to query and display this information.

  
3.4.2.2 CondorView Collector as Your Only Collector

To install the new CondorView collector as your main collector, you replace your existing binary with the new one, found in the view_server.tar file. Move your existing condor_collector binary out of the way with the mv command. For example:

        % cd /full/path/to/your/release/directory
        % cd sbin
        % mv condor_collector condor_collector.old
Then, from that same directory, untar the view_server.tar file into your release directory. This will install a new condor_collector binary and an example configuration file. Within 5 minutes, the condor_master will notice the new timestamp on your new condor_collector binary, shut down your existing collector, and spawn the new version. You will see messages about this in the log file for your condor_master (usually MasterLog in your log directory). Once the new collector is running, it is safe to remove the old binary, although you may want to keep it around in case you have problems with the new version and want to revert.

Once this is completed, add configuration file entries to the local configuration file on your central manager to enable historical data collection, as described above in the section ``Setting up the CondorView Server Module''.

  
3.4.2.3 CondorView Collector in Addition to Your Main Collector

Installing the CondorView collector in addition to your regular collector requires a little extra work. First, untar the view_server.tar file into a temporary location (not your main release directory). Copy the sbin/condor_collector file from the temporary location to your main release directory's sbin with a new name (such as condor_collector.view_server).
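
As a rough sketch of the staging step just described, the following shell function unpacks the archive in a scratch directory and copies only the collector binary into place under the new name. The function name stage_view_collector is made up for this illustration; the file names come from the text above.

```shell
# stage_view_collector TARFILE RELEASE_DIR
# Unpacks TARFILE in a scratch directory and copies only the collector binary
# into RELEASE_DIR/sbin under a new name, leaving the existing collector untouched.
stage_view_collector() {
    tarfile=$1
    release_dir=$2
    scratch=$(mktemp -d) || return 1
    tar -xf "$tarfile" -C "$scratch" || return 1
    cp "$scratch/sbin/condor_collector" \
       "$release_dir/sbin/condor_collector.view_server" || return 1
    rm -rf "$scratch"
}
```

A call would look like: stage_view_collector /full/path/to/view_server.tar /full/path/to/your/release/directory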

Next, configure whatever host is going to run your separate CondorView server to spawn this new collector in addition to other daemons it is running. You do this by adding COLLECTOR to the DAEMON_LIST     on this machine and defining what COLLECTOR means. For example:

        DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR
        COLLECTOR = $(SBIN)/condor_collector.view_server
For this change to take effect, you must restart the condor_master on this host. You can do this with the condor_restart command, if you run the command from a machine with administrator access to your pool. See section 3.8 on page [*] for full details of IP/host-based security in Condor.

As a last step, you tell all the machines in your pool to start sending updates to both collectors. You do this by specifying the following setting in your global configuration file:

        CONDOR_VIEW_HOST = full.hostname
where full.hostname is the full hostname of the machine where you are running your CondorView collector.

Once this setting is in place, send a condor_reconfig to all machines in your pool so the changes take effect. This is described in section 3.10.2 on page [*].

  
3.4.3 Installing the CondorView Client Contrib Module

The CondorView Client contrib module is used to automatically generate World Wide Web (WWW) pages displaying usage statistics of your Condor Pool. Included in the module is a shell script which invokes the condor_stats command to retrieve pool usage statistics from the CondorView server and generate HTML pages from the results. Also included is a Java applet which graphically visualizes Condor usage information. Users can interact with the applet to customize the visualization and to zoom in to a specific time frame. Figure 3.2 on page [*] is a screenshot of a web page created by CondorView. To get a further feel for what pages generated by CondorView look like, you can view the statistics for the University of Wisconsin-Madison pool by going to URL http://www.cs.wisc.edu/condor and clicking on Condor View.


  
Figure 3.2: Screenshot of CondorView Client

After unpacking and installing the CondorView Client, a script named make_stats can be invoked to create HTML pages displaying Condor usage for the past hour, day, week, or month. By using the Unix cron facility to periodically execute make_stats, Condor pool usage statistics can be kept up to date automatically. This simple model allows the CondorView Client to be easily installed; no Web server CGI interface is needed.

  
3.4.3.1 Step-by-Step Installation of the CondorView Client

   

1.
First, make certain that you have configured your pool's condor_collector (typically running on the central manager) to log information to disk in order to provide a persistent, historical database of pool statistics. The CondorView Client makes queries over the network against this database. The condor_collector included with version 6.0.x of Condor does not have this database support; you will need to download and install the CondorView Server contrib module. If you are running Condor version 6.1 or above, there is no need to install the CondorView Server contrib module because the condor_collector included in Condor v6.1+ already has the necessary database support. To activate the persistent database logging, add the following entries into the configuration file on your central manager:
    POOL_HISTORY_DIR = /full/path/to/directory/to/store/historical/data 
    KEEP_POOL_HISTORY = True
For full details on these and other condor_collector configuration file entries, see section 3.3.15 on page [*].

2.
Create a directory where CondorView places the HTML files. This directory should be one published by a web server, so HTML files which exist in this directory can be accessed via a web browser. This is referred to as the VIEWDIR directory.

3.
Unpack/untar the CondorView Client contrib module into VIEWDIR. This creates several files and subdirectories within VIEWDIR.

4.
Edit the make_stats script. At the top of this file are six parameters to customize. The parameters are:

ORGNAME    
Set to a brief name identifying your organization, for example ``Univ of Wisconsin''. Do not use slashes or other characters that are special in regular expressions; in particular, avoid the characters / ^ $.

CONDORADMIN    
Set to the email address of the Condor administrator at your site. This email address will appear at the bottom of the web pages.

VIEWDIR    
Set to the full pathname (not a relative path) to the VIEWDIR directory selected in installation step 2. It is the directory that contains the make_stats script.

STATSDIR    
Set to the full pathname of the directory which contains the condor_stats binary. The condor_stats program is included in the <release_dir>/bin directory with Condor version 6.1 and above; for Condor version 6.0.x, the condor_stats program can be found in the CondorView Server contrib module. The value for STATSDIR is added to the PATH parameter by default; see below.

PATH    
Set to a list of subdirectories, separated by colons, where the make_stats script can find awk, bc, sed, date, and condor_stats programs. If you have perl installed, set the path to include the directory where perl is installed as well. Using the following default works on most systems:
 
        PATH=/bin:/usr/bin:$STATSDIR:/usr/local/bin

5.
To create all of the initial HTML files, type
        ./make_stats setup
Open the file index.html to verify things look good.

   

6.
Add the make_stats program to cron. Running make_stats in step 5 created a cronentries file. This cronentries file is ready to be processed by the Unix crontab command. The crontab manual page can familiarize you with the crontab command and the cron daemon. Take a look at the cronentries file; by default, it will run make_stats hour every 15 minutes, make_stats day once an hour, make_stats week twice per day, and make_stats month once per day. These are reasonable defaults. You can add these commands to cron on any system that can access the $(VIEWDIR) and $(STATSDIR) directories, even on a system that does not have Condor installed. The commands do not have to run as user root; in fact, they should probably not run as root. These commands can run as any user that has read/write access to the VIEWDIR. To add these commands to cron, enter:
 
        crontab cronentries

7.
Point your web browser at the VIEWDIR directory, and you are finished with the installation.
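
To make the schedule from step 6 concrete, the sketch below writes a file with crontab entries that match the described defaults. It is an illustration only: the exact minutes and hours are assumptions, VIEWDIR is a placeholder, and the cronentries file actually generated by make_stats setup may look different.

```shell
# Sketch only: VIEWDIR is a placeholder, and the real cronentries file written by
# "make_stats setup" may use different times.
VIEWDIR="${VIEWDIR:-/full/path/to/viewdir}"
CRON_SKETCH="${CRON_SKETCH:-${TMPDIR:-/tmp}/cronentries.example}"

# Fields: minute hour day-of-month month day-of-week command
cat > "$CRON_SKETCH" <<EOF
0,15,30,45 * * * * $VIEWDIR/make_stats hour
5 * * * * $VIEWDIR/make_stats day
10 0,12 * * * $VIEWDIR/make_stats week
20 0 * * * $VIEWDIR/make_stats month
EOF

cat "$CRON_SKETCH"
```

Compare these lines with the cronentries file that make_stats setup produced before loading anything into cron.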

 

  
3.4.4 Installing a Checkpoint Server

      The Checkpoint Server maintains a repository for checkpoint files. Using checkpoint servers reduces the disk requirements of submitting machines in the pool, since the submitting machines no longer need to store checkpoint files locally. Checkpoint server machines should have a large amount of disk space available, and they should have a fast connection to machines in the Condor pool.

If your spool directories are on a network file system, then checkpoint files will make two trips over the network: one between the submitting machine and the execution machine, and a second between the submitting machine and the network file server. If you install a checkpoint server and configure it to use the server's local disk, the checkpoint will travel only once over the network, between the execution machine and the checkpoint server. You may also obtain checkpointing network performance benefits by using multiple checkpoint servers, as discussed below.

NOTE: It is a good idea to pick very stable machines for your checkpoint servers. If individual checkpoint servers crash, the Condor system will continue to operate, although poorly. While the Condor system will recover from a checkpoint server crash as best it can, there are two problems that can (and will) occur:

1.
A checkpoint cannot be sent to a checkpoint server that is not functioning. Jobs will keep trying to contact the checkpoint server, backing off exponentially in the time they wait between attempts. Normally, jobs only have a limited time to checkpoint before they are kicked off the machine. So, if the server is down for a long period of time, chances are that a lot of work will be lost by jobs being killed without writing a checkpoint.

2.
If a checkpoint is not available from the checkpoint server, a job cannot be retrieved, and it will either have to be restarted from the beginning, or the job will wait for the server to come back online. This behavior is controlled with the MAX_DISCARDED_RUN_TIME     parameter in the config file (see section 3.3.6 on page [*] for details). This parameter represents the maximum amount of CPU time you are willing to discard by starting a job over from scratch if the checkpoint server is not responding to requests.

  
3.4.4.1 Preparing to Install a Checkpoint Server

The location of checkpoints changes upon the installation of a checkpoint server. After such a configuration change, currently queued jobs that have checkpoints can no longer find them, so these jobs remain queued indefinitely and never run. It is therefore best to either remove jobs from the queues or let them complete before installing a checkpoint server. It is advisable to shut your pool down before doing any maintenance on your checkpoint server. See section 3.10 on page [*] for details on shutting down your pool.

A graduated installation of the checkpoint server may be accomplished by configuring submit machines as their queues empty.

  
3.4.4.2 Installing the Checkpoint Server Module

To install a checkpoint server, download the appropriate binary contrib module for the platform(s) on which your server will run. Uncompress and untar the file to result in a directory that contains a README, ckpt_server.tar, and so on. The file ckpt_server.tar acts much like the release.tar file from a main release. This archive contains the files:

        sbin/condor_ckpt_server
        sbin/condor_cleanckpts
        etc/examples/condor_config.local.ckpt.server
These new files are not found in the main release, so you can safely untar the archive directly into your existing release directory. condor_ckpt_server is the checkpoint server binary. condor_cleanckpts is a script that can be periodically run to remove stale checkpoint files from your server. The checkpoint server normally cleans all old files itself. However, in certain error situations, stale files can be left that are no longer needed. You may set up a cron job that calls condor_cleanckpts every week or so to automate the cleaning up of any stale files. The example configuration file given with the module is described below.
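
A weekly cron entry for condor_cleanckpts could be built as in the following sketch. The release directory is a placeholder, the Sunday 03:00 schedule is an arbitrary choice, and the sketch assumes the script needs no arguments; check the script itself before relying on that.

```shell
# Sketch only: RELEASE_DIR is a placeholder for your Condor release directory.
RELEASE_DIR="${RELEASE_DIR:-/full/path/to/your/release/directory}"

# Fields: minute hour day-of-month month day-of-week command
# Runs at 03:00 every Sunday (day-of-week 0).
CLEAN_ENTRY="0 3 * * 0 $RELEASE_DIR/sbin/condor_cleanckpts"

echo "$CLEAN_ENTRY"
```

The printed line can be added with crontab -e under an account that is allowed to remove files from the checkpoint server's storage directory.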

After unpacking the module, there are three steps to complete. Each is discussed in its own section:

1.
Configure the checkpoint server.
2.
Start the checkpoint server.
3.
Configure your pool to use the checkpoint server.

  
3.4.4.3 Configuring a Checkpoint Server

  Place settings in the local configuration file of the checkpoint server. The file etc/examples/condor_config.local.ckpt.server contains the needed settings. Insert these into the local configuration file of your checkpoint server machine.

The CKPT_SERVER_DIR     must be customized. The CKPT_SERVER_DIR     attribute defines where your checkpoint files are to be located. It is better if this is on a very fast local file system (preferably a RAID). The speed of this file system will have a direct impact on the speed at which your checkpoint files can be retrieved from the remote machines.

The other optional settings are:

DAEMON_LIST    
(Described in section 3.3.7). To have the checkpoint server managed by the condor_master, the DAEMON_LIST     entry must have MASTER and CKPT_SERVER. Add STARTD if you want to allow jobs to run on your checkpoint server. Similarly, add SCHEDD if you would like to submit jobs from your checkpoint server.

The rest of these settings are the checkpoint server-specific versions of the Condor logging entries, as described in section 3.3.3 on page [*].

CKPT_SERVER_LOG    
The CKPT_SERVER_LOG     is where the checkpoint server log is placed.

MAX_CKPT_SERVER_LOG    
Sets the maximum size of the checkpoint server log before it is saved and the log file restarted.

CKPT_SERVER_DEBUG    
Regulates the amount of information printed in the log file. Currently, the only debug level supported is D_ALWAYS.
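
Put together, the settings above might be appended to the checkpoint server's local configuration file as in this sketch. Every value shown is illustrative: the file location, the /ckpt/condor path, the log file name, and the log size are assumptions, and the example file shipped with the module remains the authoritative list of settings.

```shell
# Sketch only: LOCAL_CONFIG and all values below are placeholders.
LOCAL_CONFIG="${LOCAL_CONFIG:-${TMPDIR:-/tmp}/condor_config.local.example}"

# The quoted 'EOF' keeps the shell from expanding $(LOG), which is a Condor macro.
cat >> "$LOCAL_CONFIG" <<'EOF'
## Checkpoint server settings (values are illustrative)
CKPT_SERVER_DIR     = /ckpt/condor
DAEMON_LIST         = MASTER, CKPT_SERVER
CKPT_SERVER_LOG     = $(LOG)/CkptServerLog
MAX_CKPT_SERVER_LOG = 640000
CKPT_SERVER_DEBUG   = D_ALWAYS
EOF
```

Add STARTD or SCHEDD to DAEMON_LIST only if the checkpoint server should also run or submit jobs, as described above.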

  
3.4.4.4 Start the Checkpoint Server

To start the newly configured checkpoint server, restart Condor on that host to enable the condor_master to notice the new configuration. Do this by sending a condor_restart command from any machine with administrator access to your pool. See section 3.8 on page [*] for full details about IP/host-based security in Condor.

  
3.4.4.5 Configuring your Pool to Use the Checkpoint Server

After the checkpoint server is running, you change a few settings in your configuration files to let your pool know about your new server:

USE_CKPT_SERVER    
This parameter should be set to TRUE (the default).

CKPT_SERVER_HOST    
This parameter should be set to the full hostname of the machine that is now running your checkpoint server.

It is most convenient to set these parameters in your global configuration file, so they affect all submission machines. However, you may configure each submission machine separately (using local configuration files) if you do not want all of your submission machines to start using the checkpoint server at one time. If USE_CKPT_SERVER     is set to FALSE, the submission machine will not use a checkpoint server.

Once these settings are in place, send a condor_reconfig to all machines in your pool so the changes take effect. This is described in section 3.10.2 on page [*].

  
3.4.4.6 Configuring your Pool to Use Multiple Checkpoint Servers

 

It is possible to configure a Condor pool to use multiple checkpoint servers. The deployment of checkpoint servers across the network improves checkpointing performance. In this case, Condor machines are configured to checkpoint to the nearest checkpoint server, so checkpoint-related network traffic stays local to each part of the network.

Once you have multiple checkpoint servers running in your pool, the following configuration changes are required to make them active.

First, USE_CKPT_SERVER     should be set to TRUE (the default) on all submitting machines where Condor jobs should use a checkpoint server. Additionally, STARTER_CHOOSES_CKPT_SERVER     should be set to TRUE (the default) on these submitting machines. When TRUE, this parameter specifies that the checkpoint server specified by the machine running the job should be used instead of the checkpoint server specified by the submitting machine. See section 3.3.6 on page [*] for more details. This allows the job to use the checkpoint server closest to the machine on which it is running, instead of the server closest to the submitting machine. For convenience, set these parameters in the global configuration file.

Second, set CKPT_SERVER_HOST on each machine. As described, this is set to the full hostname of the checkpoint server machine. In the case of multiple checkpoint servers, set this in the local configuration file. It is the hostname of the nearest server to the machine.

Third, send a condor_reconfig to all machines in the pool so the changes take effect. This is described in section 3.10.2 on page [*].

After completing these three steps, the jobs in your pool will send checkpoints to the nearest checkpoint server. On restart, a job will remember where its checkpoint was stored and get it from the appropriate server. After a job successfully writes a checkpoint to a new server, it will remove any previous checkpoints left on other servers.

NOTE: If the configured checkpoint server is unavailable, the job will keep trying to contact that server as described above. It will not use alternate checkpoint servers. This may change in future versions of Condor.
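
How you decide which server is nearest is a site decision. The sketch below assumes a hypothetical policy in which machines are grouped by DNS subdomain and each subdomain has its own checkpoint server; all host names and both function names are invented for the illustration.

```shell
# nearest_ckpt_server HOSTNAME
# Hypothetical site policy: one checkpoint server per DNS subdomain.
# All names below are made up.
nearest_ckpt_server() {
    case $1 in
        *.physics.example.edu) echo ckpt1.physics.example.edu ;;
        *.cs.example.edu)      echo ckpt1.cs.example.edu ;;
        *)                     echo ckpt-default.example.edu ;;
    esac
}

# ckpt_host_setting HOSTNAME
# Prints the line to place in that machine's local configuration file.
ckpt_host_setting() {
    echo "CKPT_SERVER_HOST = $(nearest_ckpt_server "$1")"
}

ckpt_host_setting node12.cs.example.edu
```

The global configuration file would still carry USE_CKPT_SERVER and STARTER_CHOOSES_CKPT_SERVER, while each machine's local file carries the line printed here.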

  
3.4.4.7 Checkpoint Server Domains

The configuration described in the previous section ensures that jobs will always write checkpoints to their nearest checkpoint server. In some circumstances, it is also useful to configure Condor to localize checkpoint read transfers, which occur when the job restarts from its last checkpoint on a new machine. To localize these transfers, we want to schedule the job on a machine which is near the checkpoint server on which the job's checkpoint is stored.

We can say that all of the machines configured to use checkpoint server ``A'' are in ``checkpoint server domain A.'' To localize checkpoint transfers, we want jobs which run on machines in a given checkpoint server domain to continue running on machines in that domain, transferring checkpoint files in a single local area of the network. There are two possible configurations which specify what a job should do when there are no available machines in its checkpoint server domain:

1.
The job can remain idle until a machine in its checkpoint server domain becomes available.
2.
The job can run on a machine in a different checkpoint server domain, transferring its checkpoint across the network.

These two configurations are described below.

The first step in implementing checkpoint server domains is to include the name of the nearest checkpoint server in the machine ClassAd, so this information can be used in job scheduling decisions. To do this, add the following configuration to each machine:

  CkptServer = "$(CKPT_SERVER_HOST)"
  STARTD_EXPRS = $(STARTD_EXPRS), CkptServer
For convenience, we suggest that you set these parameters in the global config file. Note that this example assumes that STARTD_EXPRS     is defined previously in your configuration. If not, then you should use the following configuration instead:
  CkptServer = "$(CKPT_SERVER_HOST)"
  STARTD_EXPRS = CkptServer
Now, all machine ClassAds will include a CkptServer attribute, which is the name of the checkpoint server closest to this machine. So, the CkptServer attribute defines the checkpoint server domain of each machine.

To restrict jobs to one checkpoint server domain, we need to modify the jobs' Requirements expression as follows:

  Requirements = ((LastCkptServer == TARGET.CkptServer) || (LastCkptServer =?= UNDEFINED))
This Requirements expression uses the LastCkptServer attribute in the job's ClassAd, which specifies where the job last wrote a checkpoint, and the CkptServer attribute in the machine ClassAd, which specifies the checkpoint server domain. If the job has not written a checkpoint yet, the LastCkptServer attribute will be UNDEFINED, and the job will be able to execute in any checkpoint server domain. However, once the job performs a checkpoint, LastCkptServer will be defined and the job will be restricted to the checkpoint server domain where it started running.

If instead we want to allow jobs to transfer to other checkpoint server domains when there are no available machines in the current checkpoint server domain, we need to modify the jobs' Rank expression as follows:

  Rank = ((LastCkptServer == TARGET.CkptServer) || (LastCkptServer =?= UNDEFINED))
This Rank expression will evaluate to 1 for machines in the job's checkpoint server domain and 0 for other machines. So, the job will prefer to run on machines in its checkpoint server domain, but if no such machines are available, the job will run in a new checkpoint server domain.
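
The behavior of this expression can be modeled with a small shell function. This is only a model of the logic, not ClassAd evaluation: an empty first argument stands in for UNDEFINED, the comparison is a plain case-sensitive string match, and the function name and host names are invented for the illustration.

```shell
# ckpt_domain_rank LAST_CKPT_SERVER MACHINE_CKPT_SERVER
# Prints 1 when the job has no checkpoint yet (empty first argument stands in
# for UNDEFINED) or when the machine is in the domain of the job's last
# checkpoint server, and 0 otherwise.
ckpt_domain_rank() {
    last=$1
    machine=$2
    if [ -z "$last" ] || [ "$last" = "$machine" ]; then
        echo 1
    else
        echo 0
    fi
}

ckpt_domain_rank ""                ckptA.example.edu   # no checkpoint yet: 1
ckpt_domain_rank ckptA.example.edu ckptA.example.edu   # same domain: 1
ckpt_domain_rank ckptA.example.edu ckptB.example.edu   # different domain: 0
```

Used as Requirements, a result of 0 rules the machine out; used as Rank, it only makes the machine less preferred.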

You can automatically append the checkpoint server domain Requirements or Rank expressions to all STANDARD universe jobs submitted in your pool using APPEND_REQ_STANDARD     or APPEND_RANK_STANDARD    . See section 3.3.13 on page [*] for more details.  
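
A sketch of how one of these settings might be added to the global configuration file follows. The file location is a placeholder, and the assumption that each macro takes the bare expression as its value should be checked against section 3.3.13 before use.

```shell
# Sketch only: GLOBAL_CONFIG is a placeholder path. Use one of the two settings,
# depending on whether jobs must stay in their domain or merely prefer it.
GLOBAL_CONFIG="${GLOBAL_CONFIG:-${TMPDIR:-/tmp}/condor_config.example}"

cat >> "$GLOBAL_CONFIG" <<'EOF'
## Option 1: restrict jobs to the domain of their last checkpoint
APPEND_REQ_STANDARD = ((LastCkptServer == TARGET.CkptServer) || (LastCkptServer =?= UNDEFINED))

## Option 2: prefer that domain, but allow others (commented out here)
# APPEND_RANK_STANDARD = ((LastCkptServer == TARGET.CkptServer) || (LastCkptServer =?= UNDEFINED))
EOF
```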

  
3.4.5 Installing PVM Support in Condor

To install support for PVM in Condor, download the file archive from http://www.cs.wisc.edu/condor/downloads and follow the directions found in the INSTALL file contained in the archive. NOTE: The PVM contrib module version must agree with your installed Condor version.

  
3.4.6 Installing MPI Support in Condor

     

For complete documentation on using MPI in Condor, see the section entitled ``Running MPICH jobs in Condor'' in the version 6.1 manual. This manual can be found at http://www.cs.wisc.edu/condor/manual/v6.1. You must have Condor version 6.1.15 or better in order to use the MPI contrib module.

To install the MPI contrib module, download the appropriate binary module for each platform you plan to use for MPI jobs in Condor. Once you have downloaded, uncompressed, and untarred each module, you will be left with a directory that contains an mpi.tar file, a README, and so on. The mpi.tar acts much like the release.tar file for a main release. It contains all the binaries and supporting files you would install in your release directory:

        sbin/condor_shadow.v61
        sbin/condor_starter.v61
        sbin/rsh

Since these files do not exist in a main release, you can safely untar the mpi.tar directly into your release directory, and you're done installing the MPI contrib module. Again, see the 6.1 manual for instructions on how to use MPI in Condor.

  
3.4.7 Condor Event Daemon

     

The event daemon is an administrative tool for scheduling events in a Condor pool. Every EVENTD_INTERVAL    , for each defined event, the event daemon (eventd) computes an estimate of the time required to complete or prepare for the event. If the time required is less than the time between the next interval and the start of the event, the event daemon activates the event.

Currently, this daemon supports SHUTDOWN     events, which place machines in the owner state during scheduled times. The eventd causes machines to vacate jobs one at a time in anticipation of SHUTDOWN     events. Scheduling this improves performance, because the machines do not all attempt to checkpoint their jobs at the same time. To determine the estimate of the time required to complete a SHUTDOWN     event, the ImageSize values for all running standard universe jobs are totalled and then divided by the maximum bandwidth specified for this event.
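
The estimate described above is a simple division, which the following sketch works through. It assumes that ImageSize values are expressed in Kbytes and the bandwidth in Mbytes/s; the function name and the sample job sizes are invented for the illustration.

```shell
# estimate_vacate_seconds BANDWIDTH_MB_PER_S IMAGE_SIZE_KB...
# Assumption: ImageSize values are in Kbytes and bandwidth is in Mbytes/s.
estimate_vacate_seconds() {
    bandwidth=$1
    shift
    total_kb=0
    # Total the image sizes of all running standard universe jobs.
    for size in "$@"; do
        total_kb=$((total_kb + size))
    done
    # Convert Kbytes to Mbytes, then divide by the bandwidth limit.
    awk -v kb="$total_kb" -v bw="$bandwidth" \
        'BEGIN { printf "%.0f\n", kb / 1024 / bw }'
}

# Three running jobs of 20, 50 and 30 Mbytes at 2.5 Mbytes/s
estimate_vacate_seconds 2.5 20480 51200 30720   # prints 40
```

Here 100 Mbytes of job images at 2.5 Mbytes/s gives an estimate of 40 seconds.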

When a SHUTDOWN event is activated, the eventd contacts all startd daemons that match constraints given in the configuration file, and instructs them to shut down. In response to this instruction, the startd on any machine not running a job will immediately transition to the owner state. Any machine currently running a job will continue to run the job, but will not start any new job. The eventd then sends a vacate command to each startd that is currently running a job. Once the job is vacated, the startd transitions to the owner state.

condor_eventd must run on a machine with administrator access to your pool. See section 3.8 on page [*] for full details about IP/host-based security in Condor.

  
3.4.7.1 Installing the Event Daemon

condor_eventd requires version 6.1.3 or later of condor_startd. So, you should first install either the latest version of the SMP condor_startd contrib module or the latest release of Condor version 6.1.

First, download the condor_eventd contrib module. Uncompress and untar the file to obtain a directory that contains an eventd.tar file. The eventd.tar acts much like the release.tar file from a main release. This archive contains the files:

	sbin/condor_eventd
	etc/examples/condor_config.local.eventd
These are all new files, not found in the main release, so you can safely untar the archive directly into your existing release directory. The file condor_eventd is the eventd binary. The example configuration file is described below.

  
3.4.7.2 Configuring the Event Daemon

The file etc/examples/condor_config.local.eventd contains an example configuration. To define events, first set the EVENT_LIST     macro. This macro contains a list of macro names which define the individual events. The definition of individual events depends on the type of the event. Currently, there is only one event type: SHUTDOWN    . The format for SHUTDOWN     events is

	SHUTDOWN DAY TIME DURATION BANDWIDTH CONSTRAINT RANK
TIME and DURATION are specified in an hours:minutes format. DAY is a string of days, where M = Monday, T = Tuesday, W = Wednesday, R = Thursday, F = Friday, S = Saturday, and U = Sunday. For example, MTWRFSU would specify that the event occurs daily, MTWRF would specify that the event occurs only on weekdays, and SU would specify that the event occurs only on weekends.

The following is an example event daemon configuration:  

EVENT_LIST	= TestEvent, TestEvent2
TestEvent	= SHUTDOWN W 16:00 1:00 2.5 TestEventConstraint TestEventRank
TestEvent2	= SHUTDOWN F 14:00 0:30 6.0 TestEventConstraint2 TestEventRank
TestEventConstraint		= (Arch == "INTEL")
TestEventConstraint2		= (True)
TestEventRank			= (0 - ImageSize)

In this example, the TestEvent is a SHUTDOWN type event, which specifies that all machines whose startd ads match the constraint Arch == "INTEL" should be shut down for one hour starting at 16:00 every Wednesday, and no more than 2.5 Mbytes/s of bandwidth should be used to vacate jobs in anticipation of the shutdown event. According to the TestEventRank, jobs will be vacated in reverse order of their ImageSize (larger jobs first, smaller jobs last). TestEvent2 is a SHUTDOWN type event, which specifies that all machines should be shut down for 30 minutes starting at 14:00 every Friday, and no more than 6.0 Mbytes/s of bandwidth should be used to vacate jobs in anticipation of the shutdown event.
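
To check an event definition before putting it in the configuration file, it can help to split it into its seven fields. The two shell functions below are a sketch for that purpose; they are not part of Condor, their names are invented, and they only check the field count and the DAY letters, not the eventd's actual parsing rules.

```shell
# describe_event "SHUTDOWN DAY TIME DURATION BANDWIDTH CONSTRAINT RANK"
# Splits an event definition on whitespace and prints its seven fields.
describe_event() {
    read -r type days start duration bandwidth constraint rank extra <<EOF
$1
EOF
    if [ -z "$rank" ] || [ -n "$extra" ]; then
        echo "expected exactly 7 fields" >&2
        return 1
    fi
    echo "type=$type days=$days start=$start duration=$duration bandwidth=$bandwidth constraint=$constraint rank=$rank"
}

# event_runs_on DAY_STRING DAY_LETTER
# Succeeds when DAY_LETTER (one of M T W R F S U) appears in DAY_STRING.
event_runs_on() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

describe_event "SHUTDOWN W 16:00 1:00 2.5 TestEventConstraint TestEventRank"
event_runs_on MTWRF W && echo "a weekday event runs on Wednesday"
```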

Note that the DAEMON_LIST     macro (described in section 3.3.7) is defined in the section of settings you may want to customize. If you want the event daemon managed by the condor_master, the DAEMON_LIST     entry must contain both MASTER and EVENTD. Verify that this macro is set to run the correct daemons on this machine. By default, the list also includes SCHEDD and STARTD.

See section 3.3.17 on page [*] for a description of optional event daemon parameters.

  
3.4.7.3 Starting the Event Daemon

To start an event daemon once it is configured to run on a given machine, restart Condor on that given machine to enable the condor_master to notice the new configuration. Send a condor_restart command from any machine with administrator access to your pool. See section 3.8 on page [*] for full details about IP/host-based security in Condor.


condor-admin@cs.wisc.edu