

5.3 Using the Globus Universe

This section describes how to install Condor-G and how to run and manage jobs under the globus universe.


5.3.1 Accessing the Grid with Condor-G

Condor-G allows the user to treat the Grid as a local resource. The same command-line tools that manage local Condor jobs perform basic job management for Grid jobs, such as submitting a job and querying its status.

These are features that Condor has provided for many years. Condor-G extends them to the Grid, providing resource management while retaining fault tolerance and exactly-once execution semantics.

Figure 5.1: Remote Execution by Condor-G on Globus managed resources

Figure 5.1 shows how Condor-G interacts with Globus protocols. Condor-G contains a GASS server, used to transfer the executable, stdin, stdout, and stderr to and from the remote job execution site. Condor-G uses the GRAM protocol to contact the remote Globus Gatekeeper and request that a new job manager be started. GRAM is also used to monitor the job's progress. Condor-G detects and handles failures such as a crash of the remote Globus resource.


5.3.2 Condor-G Installation

There are two ways to obtain and install Condor-G. The first and recommended method utilizes a full installation of Condor together with a contrib module to acquire the ability to submit globus universe jobs. If a pool of machines running Condor Version 6.5.0 already exists, then the path to submitting globus universe jobs is quite short.

The second way to obtain Condor-G uses the GPT-packaged version. GPT is the Grid Packaging Technology from NCSA, the native packaging format for the Globus Toolkit(tm). The GPT-packaged version of Condor-G will install into an existing Globus Toolkit installation. It is not capable of providing the functionality of a complete Condor pool, but it does allow use of the Condor job queuing interface to the Grid. It is appropriate for those who only want to submit jobs to Globus-managed resources.

The following sections detail the installation and start-up of Condor-G for each of these two methods.


5.3.2.1 Full Install with Condor-G Contrib Module

Once Condor has been downloaded, installed, and configured (see manual section 3.2), three steps are necessary before a globus universe job can be submitted:

  1. Obtain the Condor-G contrib module. From the Condor home page, http://www.cs.wisc.edu/condor/, find and click on the Condor-G page. Find and click on the Condor-G contrib module link. After agreeing to the license, find and click on the Condor-G module for the proper platform to begin the transfer.

    After the transfer is complete, you will have received some text files along with the file condor-g.tar. Untar this file in the existing $(release) directory to produce the three files

        sbin/condor_gridmanager
        sbin/gahp_server
        etc/examples/condor_config.local.condor-g
    

  2. Configure for Condor-G. To configure Condor to be able to run globus universe jobs, import the contents of the file etc/examples/condor_config.local.condor-g to the existing configuration file.

    If Condor-G is installed as root, the file set by the configuration variable GRIDMANAGER_LOG must have world-write permission. The Gridmanager runs as the user who submitted the job, so the Gridmanager may not be able to write to the ordinary $(LOG) directory. The example configuration file sets the log file to be

    GRIDMANAGER_LOG = $(LOG)/GridLogs/GridmanagerLog.$(USERNAME)
    
    Use of this definition of GRIDMANAGER_LOG will likely require the creation of the directory $(LOG)/GridLogs. Permissions on this directory should be set by running chmod using the value 1777.

    Another option is to use the commented out configuration, located directly below within the example configuration file, to set GRIDMANAGER_LOG with

    GRIDMANAGER_LOG  = /tmp/GridmanagerLog.$(USERNAME)
    

  3. Run Condor. Directions for running the Condor daemons do not change when using the Condor-G contrib module. See the section of this manual on running Condor for details.
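
Steps 1 and 2 above can be sketched as a small shell function. This is only an illustration under stated assumptions: install_condor_g_module is a hypothetical helper name, and the three directory arguments stand in for the site's own release directory, downloaded tar file, and log directory. It does not import the example configuration file, which still has to be merged into the existing configuration by hand.

```shell
# Sketch only: install_condor_g_module is a hypothetical helper, not part of Condor.
#   $1 = Condor release directory (the manual's $(release))
#   $2 = absolute path to the downloaded condor-g.tar
#   $3 = Condor log directory (the manual's $(LOG))
install_condor_g_module() {
    release_dir=$1
    tarball=$2
    log_dir=$3

    # Step 1: unpack the module inside the release directory. The subshell
    # keeps the caller's working directory unchanged.
    (cd "$release_dir" && tar -xf "$tarball") || return 1

    # Step 2: create the directory used by the example GRIDMANAGER_LOG setting.
    # Mode 1777 is world-writable with the sticky bit set.
    mkdir -p "$log_dir/GridLogs" || return 1
    chmod 1777 "$log_dir/GridLogs"
}
```

The tar file path must be absolute because the function changes directory before unpacking.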


5.3.2.2 GPT NMI release including Condor-G


5.3.3 Running a Globus Universe Job

Under Condor, successful job submission to the Globus universe requires credentials. An X.509 certificate is used to create a proxy, and an account, authorization, or allocation to use a grid resource is required. For more information on proxies and certificates, please consult the Alliance PKI pages at

http://archive.ncsa.uiuc.edu/SCD/Alliance/GridSecurity/

Before submitting a job to Condor under the Globus universe, make sure you have your Grid credentials and have used grid-proxy-init to create a proxy.

A job is submitted for execution to Condor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. The following sample submit description file runs a job on the Origin2000 at NCSA.

executable = test
globusscheduler = modi4.ncsa.uiuc.edu/jobmanager
universe = globus
output = test.out
log = test.log
queue

The executable for this example is transferred from the local machine to the remote machine. By default, Condor transfers the executable. Note that this executable must be compiled for the correct platform.

The globusscheduler command depends on the scheduling software available on the remote resource. This required command changes according to the Grid resource on which the job is to run.

All Condor-G jobs (intended for execution on Globus-controlled resources) are submitted to the globus universe. The universe = globus command is required in the submit description file.

No input file is specified for this example job. Condor transfers the output file produced from the remote machine to the local machine during execution. The log file is maintained on the local machine.

To submit this job to Condor-G for execution on the remote machine, use

condor_submit test.submit
where test.submit is the name of the submit description file.
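
As a rough sketch, a submit description file of the form shown above can be generated by a shell function. The name write_globus_submit and its three arguments are hypothetical, chosen for illustration only; the output contains exactly the commands used in the sample file.

```shell
# Sketch only: write_globus_submit is a hypothetical helper, not a Condor tool.
#   $1 = executable, $2 = globusscheduler contact string,
#   $3 = base name for the output and log files
write_globus_submit() {
    executable=$1
    scheduler=$2
    base=$3

    # universe and globusscheduler are the two commands that every
    # globus universe job requires.
    cat <<EOF
executable = $executable
globusscheduler = $scheduler
universe = globus
output = $base.out
log = $base.log
queue
EOF
}

# Example (not run here): regenerate the sample file and submit it.
#   write_globus_submit test modi4.ncsa.uiuc.edu/jobmanager test > test.submit
#   condor_submit test.submit
```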

Example output from condor_q for this submission looks like:

% condor_q


-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   epaulson     3/26 14:08   0+00:00:00 I  0   0.0  test

1 jobs; 1 idle, 0 running, 0 held

After a short time, Globus accepts the job. Running condor_q again will now result in

% condor_q


-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER         SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   epaulson     3/26 14:08   0+00:01:15 R  0   0.0  test

1 jobs; 0 idle, 1 running, 0 held

Then, very shortly after that, the queue will be empty again, because the job has finished:

% condor_q


-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

A second example of a submit description file runs the Unix ls program on a different Globus resource.

executable = /bin/ls
Transfer_Executable = false
globusscheduler = vulture.cs.wisc.edu/jobmanager
universe = globus
output = ls-test.out
log = ls-test.log
queue

In this example, the executable (the binary) is pre-staged. The executable is on the remote machine, and it is not to be transferred before execution. Note that the required globusscheduler and universe commands are present. The command

Transfer_Executable = FALSE
within the submit description file identifies the executable as being pre-staged. In this case, the executable command gives the path to the executable on the remote machine.

A third example shows how Condor-G can set environment variables for a job. Save the following Perl script with the name env-test.pl, and run the Unix command

chmod 755 env-test.pl
to make the Perl script executable.

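#!/usr/bin/env perl
use strict;
use warnings;

# Print every environment variable visible to the job, sorted by name.
foreach my $key (sort keys(%ENV))
{
   print "$key = $ENV{$key}\n";
}

exit 0;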

Now create the following submit description file, replacing biron.cs.wisc.edu/jobmanager with a resource you are authorized to use:

executable = env-test.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
environment = foo=bar; zot=qux
output = env-test.out
log = env-test.log
queue

When the job has completed, the output file env-test.out should contain something like this:

GLOBUS_GRAM_JOB_CONTACT = https://biron.cs.wisc.edu:36213/30905/1020633947/
GLOBUS_GRAM_MYJOB_CONTACT = URLx-nexus://biron.cs.wisc.edu:36214
GLOBUS_LOCATION = /usr/local/globus
GLOBUS_REMOTE_IO_URL = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633948
HOME = /home/epaulson
LANG = en_US
LOGNAME = epaulson
X509_USER_PROXY = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633951
foo = bar
zot = qux

Of particular interest is the GLOBUS_REMOTE_IO_URL environment variable. Condor-G automatically starts up a GASS remote I/O server on the submitting machine. Because of the potential for either side of the connection to fail, the URL for the server cannot be passed directly to the job. Instead, it is put into a file, and the GLOBUS_REMOTE_IO_URL environment variable points to this file. Remote jobs can read this file and use the URL it contains to access the remote GASS server running inside Condor-G. If the location of the GASS server changes (for example, if Condor-G restarts), Condor-G will contact the Globus gatekeeper and update this file on the machine where the job is running. It is therefore important that all accesses to the remote GASS server check this file for the latest location.
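
The advice to check the file before every access can be sketched in shell as follows. The function name current_gass_url is hypothetical. The sketch assumes only what is stated above: that GLOBUS_REMOTE_IO_URL names a file whose content is the current URL of the GASS server.

```shell
# Sketch only: current_gass_url is a hypothetical helper name.
# Prints the GASS server URL stored in the file that GLOBUS_REMOTE_IO_URL names.
current_gass_url() {
    if [ -z "${GLOBUS_REMOTE_IO_URL:-}" ] || [ ! -r "$GLOBUS_REMOTE_IO_URL" ]; then
        echo "GLOBUS_REMOTE_IO_URL does not name a readable file" >&2
        return 1
    fi
    # Read the file on every call: Condor-G may rewrite it if the server moves.
    # Command substitution drops the trailing newline.
    url=$(cat "$GLOBUS_REMOTE_IO_URL") || return 1
    printf '%s\n' "$url"
}

# Example (not run here): look the URL up again immediately before each transfer.
#   "$GLOBUS_LOCATION/bin/globus-url-copy" \
#       "$(current_gass_url)/etc/hosts" "file://$PWD/temporary.hosts"
```

Calling the function immediately before each transfer, rather than caching its result for the life of the job, is what keeps a long-running job working after Condor-G restarts.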

The following Perl script uses the GASS server in Condor-G to copy an input file to the execute machine. In this example, the remote job just copies /etc/hosts from the submit machine and prints it; a real job would do something more productive with its input.

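#!/usr/bin/env perl
use strict;
use warnings;
use FileHandle;
use Cwd;

STDOUT->autoflush();

# GLOBUS_REMOTE_IO_URL names a file that holds the current URL of the
# GASS server running inside Condor-G.
my $gassUrl = `cat $ENV{GLOBUS_REMOTE_IO_URL}`;
chomp $gassUrl;

$ENV{LD_LIBRARY_PATH} = $ENV{GLOBUS_LOCATION} . "/lib";
my $urlCopy = $ENV{GLOBUS_LOCATION} . "/bin/globus-url-copy";

# globus-url-copy needs a full pathname
my $pwd = getcwd();
print "$urlCopy $gassUrl/etc/hosts file://$pwd/temporary.hosts\n\n";
`$urlCopy $gassUrl/etc/hosts file://$pwd/temporary.hosts`;
die "globus-url-copy failed with status $?\n" if $? != 0;

# Print the copied file to the job's standard output.
open(my $fh, "<", "temporary.hosts") or die "Cannot open temporary.hosts: $!\n";
while (<$fh>) {
    print $_;
}
close($fh);

exit 0;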

Our submit file looks like this:

executable = gass-example.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
output = gass.out
log = gass.log
queue

There are two optional submit description file commands of note: x509userproxy and globusrsl. The x509userproxy command specifies the path to an X.509 proxy. The command is of the form:

x509userproxy = /path/on/file/system
If this optional command is not present in the submit description file, then Condor-G checks the value of the environment variable X509_USER_PROXY for the location of the proxy. If this environment variable is not present, then Condor-G looks for the proxy in the file /tmp/x509up_u0000, where the trailing zeros in this file name are replaced with the Unix user id.
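
The lookup order just described can be sketched as a shell function. The name resolve_proxy_path is hypothetical, and the function only mirrors the order given in the text; it is not Condor-G's own code.

```shell
# Sketch only: resolve_proxy_path is a hypothetical helper that mirrors the
# lookup order described above; it is not Condor-G's own code.
#   $1 = value of x509userproxy from the submit description file (may be empty)
resolve_proxy_path() {
    submit_value=${1:-}
    if [ -n "$submit_value" ]; then
        printf '%s\n' "$submit_value"          # 1. submit description file
    elif [ -n "${X509_USER_PROXY:-}" ]; then
        printf '%s\n' "$X509_USER_PROXY"       # 2. environment variable
    else
        printf '/tmp/x509up_u%s\n' "$(id -u)"  # 3. default file for this user id
    fi
}
```

Running resolve_proxy_path with no argument and with X509_USER_PROXY unset prints the default /tmp path for the current user.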

The globusrsl command is used to add additional attribute settings to a job's RSL string. The format of the globusrsl command is

globusrsl = (name=value)(name=value)
An example of this command in a submit description file is
globusrsl = (project=Test_Project)
This example's attribute name for the additional RSL is project, and the value assigned is Test_Project.
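
To illustrate the format, the following shell sketch assembles a globusrsl line from name=value pairs. The function name make_globusrsl is hypothetical. The sketch does not check that an attribute name is one the remote job manager accepts.

```shell
# Sketch only: make_globusrsl is a hypothetical helper name.
# Each argument is one name=value pair; the output is a globusrsl line.
make_globusrsl() {
    rsl=""
    for pair in "$@"; do
        # Wrap each pair in parentheses and append it to the RSL string.
        rsl="$rsl($pair)"
    done
    printf 'globusrsl = %s\n' "$rsl"
}

# Example (not run here):
#   make_globusrsl project=Test_Project >> test.submit
```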


5.3.4 Configuration and Credential Management

The following are required configuration file entries that relate to submission of globus universe jobs. Condor-G fails if any of these entries are missing. These entries are provided in the file etc/examples/condor_config.local.condor-g that is used during the installation of the Condor-G contrib module.

GRIDMANAGER             = $(SBIN)/condor_gridmanager
GRIDMANAGER_LOG         = $(LOG)/GridmanagerLog
MAX_GRIDMANAGER_LOG     = 64000
GRIDMANAGER_DEBUG       = D_COMMAND
GAHP                    = $(SBIN)/gahp_server

GRIDMANAGER gives the path to the gridmanager daemon. The GRIDMANAGER_LOG and MAX_GRIDMANAGER_LOG entries give the location of the log file and its maximum size. GRIDMANAGER_DEBUG sets a debugging level for the gridmanager daemon. The GAHP entry specifies the location of the required GAHP (Globus ASCII Helper Protocol) server. Details of the protocol may be found at http://www.cs.wisc.edu/condor/gahp/.

Two further configuration file entries apply to the gridmanager daemon. They are relevant to the newest job managers, those from version 2.0 of the Globus software.

GRIDMANAGER_CHECKPROXY_INTERVAL = 600
GRIDMANAGER_MINIMUM_PROXY_TIME = 180

Condor-G periodically checks for an updated proxy at an interval given by the configuration variable GRIDMANAGER_CHECKPROXY_INTERVAL. The value is given in seconds. For example, if you create a 12-hour proxy and re-run grid-proxy-init 6 hours later, Condor-G will check the proxy within this interval and use the new proxy it finds there. The default interval is 10 minutes (600 seconds).

Condor-G also knows when the proxy of each job will expire. If the proxy has not been refreshed by the time it has only GRIDMANAGER_MINIMUM_PROXY_TIME seconds of lifetime left, Condor-G will shut down. In other words, if GRIDMANAGER_MINIMUM_PROXY_TIME is 180 and the proxy is 3 minutes away from expiring, Condor-G will attempt to shut down safely, rather than simply lose contact with the remote job because it can no longer authenticate. The default setting is 3 minutes (180 seconds).
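
The comparison behind this setting can be sketched in shell. The function name proxy_below_minimum is hypothetical, and treating "exactly at the minimum" as too low is an assumption of this sketch rather than documented behavior. The commented example also assumes that the Globus grid-proxy-info command is installed and that its -timeleft option reports the remaining lifetime in seconds.

```shell
# Sketch only: proxy_below_minimum is a hypothetical helper, not Condor-G's code.
#   $1 = seconds left on the proxy
#   $2 = GRIDMANAGER_MINIMUM_PROXY_TIME in seconds (defaults to 180)
# Returns success (0) when the remaining lifetime is at or below the minimum.
proxy_below_minimum() {
    seconds_left=$1
    minimum=${2:-180}
    [ "$seconds_left" -le "$minimum" ]
}

# Example (not run here): warn before the gridmanager would shut down.
#   if proxy_below_minimum "$(grid-proxy-info -timeleft)" 180; then
#       echo "Proxy is about to expire; re-run grid-proxy-init" >&2
#   fi
```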


condor-admin@cs.wisc.edu