

     
2.10 Extending your Condor pool with Glidein

 

Condor works together with Globus software to provide the capability of submitting Condor jobs to remote computer systems. Globus software provides mechanisms to access and utilize remote resources.

condor_glidein is a program that adds Globus resources to a Condor pool on a temporary basis. While they are part of the pool, these resources are visible to users of the pool, but only the user who added the resources is allowed to use them. In this section, the machine in the Condor pool is called the local node, and the resource added to the local Condor pool is called the remote node.

The following requirements apply to the use of any Globus resource:

1. An X.509 certificate issued by a Globus certificate authority.

2. Access to a Globus resource. You must be a valid Globus user, and the site's Globus administrator must map you to a valid login account on every Globus resource that will be added to the local Condor pool using condor_glidein. More information can be found at http://www.globus.org

3. The environment variable HOME must be set, along with either GLOBUS_INSTALL_PATH or GLOBUS_DEPLOY_PATH.
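
The environment requirement in item 3 can be checked before going further. The following is a minimal sketch, not part of Condor or Globus; the function name check_globus_env is a hypothetical helper introduced here for illustration.

```shell
# Sketch: report which of the general Globus environment requirements are unmet.
# check_globus_env is a hypothetical helper name, not a Condor or Globus tool.
check_globus_env() {
    rc=0
    if [ -z "${HOME:-}" ]; then
        echo "HOME is not set"
        rc=1
    fi
    # Either one of the two Globus path variables is sufficient.
    if [ -z "${GLOBUS_INSTALL_PATH:-}" ] && [ -z "${GLOBUS_DEPLOY_PATH:-}" ]; then
        echo "neither GLOBUS_INSTALL_PATH nor GLOBUS_DEPLOY_PATH is set"
        rc=1
    fi
    return "$rc"
}
```

Calling check_globus_env prints one line per unmet requirement and returns a nonzero status if any is unmet. It only tests that the variables are non-empty; it does not verify that the paths point at a working Globus installation.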

2.10.1 condor_glidein Requirements

In order to use condor_glidein to add a Globus resource to the local Condor pool, there are several requirements beyond the general Globus requirements given above.

1. Use Globus version 1.1 or later.

2. Be an authorized user of the local Condor pool.

3. The local Condor pool configuration file(s) must give HOSTALLOW_WRITE permission to every resource that will be added using condor_glidein. Wildcards are permitted in this specification. For example, every machine at cs.wisc.edu can be covered by adding *.cs.wisc.edu to the HOSTALLOW_WRITE list. Recall that the change takes effect only after all machines in the local pool are sent a reconfigure command.

4. The local Condor pool's configuration file(s) must set GLOBUSRUN to the path of globusrun and SHADOW_GLOBUS to the path of condor_shadow.globus.

5. The PATH must include the common user programs directory /bin, the Globus tools, and the Condor user program directory.

6. The environment variable X509_USER_PROXY must be set and must point to a valid user proxy.
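
The PATH and proxy requirements (items 5 and 6) lend themselves to a quick check from the shell. This is a sketch under stated assumptions: missing_tools and proxy_file_present are hypothetical helper names, and the proxy check only confirms that the file exists and is readable, not that the proxy is still valid.

```shell
# Sketch: list required commands that cannot be found on PATH.
# missing_tools is a hypothetical helper; pass it the command names to look for.
missing_tools() {
    for tool in "$@"; do
        # command -v prints nothing and fails when the name is not found.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Sketch: succeed only if X509_USER_PROXY names a readable, non-empty file.
# This does not check proxy validity or expiry, only that the file is present.
proxy_file_present() {
    [ -n "${X509_USER_PROXY:-}" ] && [ -r "$X509_USER_PROXY" ] && [ -s "$X509_USER_PROXY" ]
}
```

For example, missing_tools globusrun condor_submit condor_status condor_glidein prints the name of each of those programs that is not reachable through PATH, and prints nothing when all are found.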

2.10.2 What condor_glidein Does

condor_glidein first checks that there is a valid proxy and that the necessary files are available to condor_glidein.

condor_glidein then contacts the Globus resource and checks for the presence of the necessary configuration files and Condor executables. If executables matching the required machine architecture, operating system version, and Condor version are not present, a server running at the University of Wisconsin (UW) is contacted to transfer the needed executables.

When the files are correctly in place, Condor daemons are started. condor_glidein does this by creating a submit description file for condor_submit, which runs the condor_master under the Globus universe. As a result, the condor_master begins executing on the Globus resource. The Condor daemons exit gracefully when no jobs have run on them for a configurable period of time; the default is 20 minutes.

The Condor executables on the Globus resource contact the local pool and attempt to join the pool. The START expression for the condor_startd daemon requires that the username of the person running condor_glidein matches the username of the jobs submitted through Condor.

After a short length of time, the Globus resource can be seen in the local Condor pool, as in this example:

% condor_status | grep denal
7591386@denal IRIX65      SGI    Unclaimed  Idle       3.700  24064  0+00:06:35
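
Because the resource takes a short while to appear, the check above can be repeated until it succeeds. The following is a minimal sketch of such a polling loop; wait_for_node and the STATUS_CMD override are hypothetical names introduced here, and the loop relies only on condor_status piped to grep, as in the example above.

```shell
# Sketch: poll the pool until a machine whose name matches a pattern is listed.
# wait_for_node and STATUS_CMD are hypothetical; STATUS_CMD defaults to condor_status.
# Arguments: pattern, number of tries (default 10), seconds between tries (default 30).
wait_for_node() {
    pattern=$1
    tries=${2:-10}
    delay=${3:-30}
    while [ "$tries" -gt 0 ]; do
        # STATUS_CMD is left unquoted on purpose so it may carry arguments.
        if ${STATUS_CMD:-condor_status} | grep -- "$pattern" >/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With these assumptions, wait_for_node denal 20 30 checks up to 20 times, 30 seconds apart, and returns a nonzero status if no matching machine is ever listed.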

Once the Globus resource has been added to the local Condor pool with condor_glidein, job(s) may be submitted. To force a job to run on the Globus resource, specify that Globus resource as a machine requirement in the submit description file. Here is an example from within the submit description file that forces submission to the Globus resource denali.mcs.anl.gov:

      requirements = ( machine == "denali.mcs.anl.gov" ) \
         && FileSystemDomain != "" \
         && Arch != "" && OpSys != ""
This example requires that the job run only on denali.mcs.anl.gov, and it prevents Condor from inserting the filesystem domain, architecture, and operating system attributes as requirements in the matchmaking process. Condor must be told not to use the submission machine's attributes in those cases where the Globus resource's attributes do not match the submission machine's attributes.
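
When several Globus resources are added, the same requirements expression is needed once per host. As a sketch only, the expression above can be generated for a given machine name; glidein_requirements is a hypothetical helper name, and the output reproduces the expression shown above with the host substituted.

```shell
# Sketch: print the requirements expression that pins a job to one glidein host.
# glidein_requirements is a hypothetical helper name, not a Condor tool.
glidein_requirements() {
    host=$1
    # Each printf emits one line; '\\' in the format prints the continuation backslash.
    printf 'requirements = ( machine == "%s" ) \\\n' "$host"
    printf '   && FileSystemDomain != "" \\\n'
    printf '   && Arch != "" && OpSys != ""\n'
}
```

The output of glidein_requirements denali.mcs.anl.gov can be pasted or appended into a submit description file before it is passed to condor_submit.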


condor-admin@cs.wisc.edu