This section contains what users need to know to install Condor-G and to run and manage jobs under the globus universe.
Condor-G allows the user to treat the Grid as a local resource: the same command-line tools perform basic job management, such as submitting a job with condor_submit, querying its status with condor_q, and following its history in a job log file. These are features that Condor has provided for many years. Condor-G extends them to the Grid, providing resource management while still providing fault tolerance and exactly-once execution semantics.
Figure 5.1 shows how Condor-G interacts with Globus protocols. Condor-G contains a GASS server, used to transfer the executable, stdin, stdout, and stderr to and from the remote job execution site. Condor-G uses the GRAM protocol to contact the remote Globus Gatekeeper and request that a new job manager be started. GRAM is also used to monitor the job's progress. Condor-G detects and handles failures such as a crash of the remote Globus resource.
The second way to obtain Condor-G uses the GPT-packaged version. GPT is the Grid Packaging Technology from NCSA, the native packaging format for the Globus Toolkit(tm). The GPT-packaged version of Condor-G will install into an existing Globus Toolkit installation. It is not capable of providing the functionality of a complete Condor pool, but it does allow use of the Condor job queuing interface to the Grid. It is appropriate for those who only want to submit jobs to Globus-managed resources.
The following sections detail how to install and start Condor-G using each of these two methods.
Three steps are necessary before a globus universe job can be submitted.
After the transfer is complete, you will have received some text files along with the file condor-g.tar. Untar this file in the existing $(release) directory to produce the three files:
sbin/condor_gridmanager
sbin/gahp_server
etc/examples/condor_config.local.condor-g
If Condor-G is installed as root, the log file named by the configuration variable GRIDMANAGER_LOG must be writable by ordinary users. The gridmanager runs as the user who submitted the job, so it may not be able to write to the ordinary $(LOG) directory. The example configuration file sets the log file to be

GRIDMANAGER_LOG = $(LOG)/GridLogs/GridmanagerLog.$(USERNAME)

Using this definition will likely require creating the directory $(LOG)/GridLogs and setting its permissions by running chmod with the value 1777 (world-writable, with the sticky bit set).

Another option is to use the commented-out configuration located directly below it in the example configuration file, which sets GRIDMANAGER_LOG with

GRIDMANAGER_LOG = /tmp/GridmanagerLog.$(USERNAME)
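The directory setup for the per-user log location can be sketched in Python as below. This is a minimal sketch, assuming it runs as root and that the hypothetical log_dir argument holds the already-expanded value of $(LOG); the mode it applies is the same 1777 passed to chmod above.

```python
import os
import stat

# 1777: read/write/execute for owner, group, and others, plus the sticky bit,
# so every user can create a log file but cannot delete other users' files.
GRID_LOGS_MODE = stat.S_ISVTX | stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def make_grid_log_dir(log_dir):
    """Create <log_dir>/GridLogs with mode 1777 and return its path.

    log_dir is assumed to be the expanded value of $(LOG); this helper is
    illustrative and is not part of Condor-G.
    """
    grid_logs = os.path.join(log_dir, "GridLogs")
    os.makedirs(grid_logs, exist_ok=True)
    # chmod explicitly, because the mode passed to makedirs is masked by the umask.
    os.chmod(grid_logs, GRID_LOGS_MODE)
    return grid_logs
```

Calling it a second time is harmless: the directory is left in place and its mode is reset to 1777.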
Under Condor, successful job submission to the Globus universe requires credentials. An X.509 certificate is used to create a proxy, and an account, authorization, or allocation to use a grid resource is required. For more information on proxies and certificates, please consult the Alliance PKI pages at
http://archive.ncsa.uiuc.edu/SCD/Alliance/GridSecurity/
Before submitting a job to Condor under the Globus universe, make sure you have your Grid credentials and have used grid-proxy-init to create a proxy.
A job is submitted for execution to Condor using the condor_submit command. condor_submit takes as an argument the name of a file called a submit description file. The following sample submit description file runs a job on the Origin2000 at NCSA.
executable = test
globusscheduler = modi4.ncsa.uiuc.edu/jobmanager
universe = globus
output = test.out
log = test.log
queue
The executable for this example is transferred from the local machine to the remote machine; by default, Condor transfers the executable. Note that this executable must be compiled for the platform of the remote machine.
The globusscheduler command depends on the scheduling software available on the remote resource. This required command changes depending on which Grid resource is to execute the job.
All Condor-G jobs, which are intended for execution on Globus-controlled resources, are submitted to the globus universe. The universe = globus command is required in the submit description file.
No input file is specified for this example job. During execution, Condor transfers the output file produced on the remote machine back to the local machine. The log file is maintained on the local machine.
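Because a submit description file is a list of command = value lines ending with queue, it can also be generated by a program. The Python sketch below is one way to do that; the function name render_submit_description and the constant TEST_JOB are hypothetical, values are written verbatim with no quoting, and the required-command check covers only the commands this section calls required (plus executable).

```python
REQUIRED_COMMANDS = ("executable", "universe", "globusscheduler")


def render_submit_description(commands):
    """Render a submit description file from (command, value) pairs.

    Each pair becomes a 'command = value' line and a final 'queue' line
    submits one job. Command names are compared case-insensitively.
    """
    present = {name.lower() for name, _ in commands}
    for required in REQUIRED_COMMANDS:
        if required not in present:
            raise ValueError(f"submit description is missing '{required}'")
    lines = [f"{name} = {value}" for name, value in commands]
    lines.append("queue")
    return "\n".join(lines) + "\n"


# The first example job from the text, as (command, value) pairs.
TEST_JOB = [
    ("executable", "test"),
    ("globusscheduler", "modi4.ncsa.uiuc.edu/jobmanager"),
    ("universe", "globus"),
    ("output", "test.out"),
    ("log", "test.log"),
]
```

Writing render_submit_description(TEST_JOB) to a file named test.submit reproduces the sample submit description file shown earlier.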
To submit this job to Condor-G for execution on the remote machine, use

condor_submit test.submit

where test.submit is the name of the submit description file.
Example output from condor_q for this submission looks like:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  7.0   epaulson        3/26 14:08   0+00:00:00 I  0   0.0  test

1 jobs; 1 idle, 0 running, 0 held

After a short time, Globus accepts the job. Running condor_q again now shows:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  7.0   epaulson        3/26 14:08   0+00:01:15 R  0   0.0  test

1 jobs; 0 idle, 1 running, 0 held

Shortly after that, the queue is empty again, because the job has finished:

% condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held
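Scripts sometimes need to follow a grid job by polling condor_q. The Python sketch below extracts each job's ID and its ST (status) letter, plus the summary counts, from output laid out like the samples above. It assumes exactly that column order; human-readable output is not a stable interface, so treat this as illustrative.

```python
import re

# A job row: ID (cluster.proc), OWNER, SUBMITTED (date and time), RUN_TIME, ST, ...
JOB_ROW = re.compile(
    r"^\s*(?P<id>\d+\.\d+)\s+"  # job ID such as 7.0
    r"\S+\s+"                   # OWNER
    r"\S+\s+\S+\s+"             # SUBMITTED: date and time
    r"\S+\s+"                   # RUN_TIME
    r"(?P<st>[A-Z])\b"          # ST: single-letter status
)
SUMMARY = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def parse_condor_q(text):
    """Return ({job_id: status_letter}, summary_counts or None)."""
    jobs = {}
    summary = None
    for line in text.splitlines():
        row = JOB_ROW.match(line)
        if row:
            jobs[row.group("id")] = row.group("st")
            continue
        totals = SUMMARY.search(line)
        if totals:
            summary = dict(zip(("jobs", "idle", "running", "held"),
                               map(int, totals.groups())))
    return jobs, summary
```

For the second sample this yields {"7.0": "R"} with one running job; for the third it yields an empty job dictionary and a summary of zero jobs, which is the point at which the job has left the queue.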
A second example of a submit description file runs the Unix ls program on a different Globus resource.
executable = /bin/ls
Transfer_Executable = false
globusscheduler = vulture.cs.wisc.edu/jobmanager
universe = globus
output = ls-test.out
log = ls-test.log
queue
In this example, the executable (the binary) is pre-staged: it already resides on the remote machine and is not transferred before execution. Note that the required globusscheduler and universe commands are present. The command

Transfer_Executable = FALSE

within the submit description file identifies the executable as pre-staged. In this case, the executable command gives the path to the executable on the remote machine.
A third example shows how Condor-G can set environment variables for a job. Save the following Perl script with the name env-test.pl, and run the Unix command

chmod 755 env-test.pl

to make the Perl script executable.
#!/usr/bin/env perl
# Print each environment variable visible to the job, sorted by name.
use strict;
use warnings;
foreach my $key (sort keys %ENV) {
    print "$key = $ENV{$key}\n";
}
exit 0;
Now create the following submit description file, replacing biron.cs.wisc.edu/jobmanager with a resource you are authorized to use:

executable = env-test.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
environment = foo=bar; zot=qux
output = env-test.out
log = env-test.log
queue
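To make the mapping from the environment command to the job's variables concrete, here is a Python sketch that splits a value written in the semicolon-separated form used above into name/value pairs. It illustrates the format only; it is not Condor's own parser, the function name is hypothetical, and it does not handle quoting or values that contain a semicolon.

```python
def parse_environment(value):
    """Split 'name=value; name=value' into a dict (illustrative only)."""
    env = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing semicolon
        name, sep, val = entry.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in environment entry: {entry!r}")
        # Only the first '=' separates name from value.
        env[name.strip()] = val.strip()
    return env
```

Applied to foo=bar; zot=qux, it yields the two variables foo and zot that appear at the end of the job's output.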
When the job has completed, the output file env-test.out should contain something like this:
GLOBUS_GRAM_JOB_CONTACT = https://biron.cs.wisc.edu:36213/30905/1020633947/
GLOBUS_GRAM_MYJOB_CONTACT = URLx-nexus://biron.cs.wisc.edu:36214
GLOBUS_LOCATION = /usr/local/globus
GLOBUS_REMOTE_IO_URL = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633948
HOME = /home/epaulson
LANG = en_US
LOGNAME = epaulson
X509_USER_PROXY = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633951
foo = bar
zot = qux
Of particular interest is the GLOBUS_REMOTE_IO_URL environment variable. Condor-G automatically starts up a GASS remote I/O server on the submitting machine. Because of the potential for either side of the connection to fail, the URL for the server cannot be passed directly to the job. Instead, it is put into a file, and the GLOBUS_REMOTE_IO_URL environment variable points to this file. Remote jobs can read this file and use the URL it contains to access the remote GASS server running inside Condor-G. If the location of the GASS server changes (for example, if Condor-G restarts), Condor-G will contact the Globus gatekeeper and update this file on the machine where the job is running. It is therefore important that all accesses to the remote GASS server check this file for the latest location.
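The rule of re-reading the file before every access can be captured in a small helper. This Python sketch assumes only what is described above: GLOBUS_REMOTE_IO_URL holds the path of a file whose contents are the current GASS server URL. The function names are hypothetical.

```python
import os


def current_gass_url(environ=None):
    """Return the GASS server URL currently recorded for this job.

    Reads the file named by GLOBUS_REMOTE_IO_URL on every call, because
    Condor-G rewrites that file if its GASS server moves (for example,
    after Condor-G restarts). Callers should not cache the result.
    """
    environ = os.environ if environ is None else environ
    with open(environ["GLOBUS_REMOTE_IO_URL"]) as url_file:
        return url_file.read().strip()


def gass_url_for(path_on_submit_machine, environ=None):
    """Build a GASS URL for an absolute path on the submitting machine."""
    return current_gass_url(environ) + path_on_submit_machine
```

For instance, gass_url_for("/etc/hosts") gives the kind of source URL that the Perl script in the next example passes to globus-url-copy.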
The following Perl script, gass-example.pl, uses the GASS server in Condor-G to copy an input file to the execute machine. (In this case, the remote job simply copies /etc/hosts from the submitting machine and prints its contents. Hopefully, your job will be a bit more productive.)
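#!/usr/bin/env perl
use strict;
use warnings;
use FileHandle;
use Cwd;
STDOUT->autoflush(1);
# GLOBUS_REMOTE_IO_URL names a file holding the current URL of the GASS
# server inside Condor-G; read the URL from it before each access.
open(my $urlFile, '<', $ENV{GLOBUS_REMOTE_IO_URL})
    or die "cannot read $ENV{GLOBUS_REMOTE_IO_URL}: $!\n";
my $gassUrl = <$urlFile>;
close($urlFile);
chomp $gassUrl;
$ENV{LD_LIBRARY_PATH} = $ENV{GLOBUS_LOCATION} . "/lib";
my $urlCopy = $ENV{GLOBUS_LOCATION} . "/bin/globus-url-copy";
# globus-url-copy needs a full pathname
my $pwd = getcwd();
my @cmd = ($urlCopy, "$gassUrl/etc/hosts", "file://$pwd/temporary.hosts");
print "@cmd\n\n";
system(@cmd) == 0 or die "globus-url-copy failed with status $?\n";
# Print the copied file.
open(my $hosts, '<', "temporary.hosts") or die "cannot open temporary.hosts: $!\n";
while (<$hosts>) {
    print $_;
}
close($hosts);
exit 0;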
The submit description file for this job looks like this:

executable = gass-example.pl
globusscheduler = biron.cs.wisc.edu/jobmanager
universe = globus
output = gass.out
log = gass.log
queue
There are two optional submit description file commands of note: x509userproxy and globusrsl. The x509userproxy command specifies the path to an X.509 proxy. The command is of the form:

x509userproxy = /path/on/file/system

If this optional command is not present in the submit description file, Condor-G checks the environment variable X509_USER_PROXY for the location of the proxy. If this environment variable is not set, Condor-G looks for the proxy in the file /tmp/x509up_u0000, where the trailing zeros in the file name are replaced with the user's Unix user id.
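The lookup order for the proxy can be summarized in a few lines of Python. This is a sketch of the order described above, not Condor-G's code; the function name is hypothetical, submit commands are assumed to be available as a dictionary, and an empty X509_USER_PROXY is treated as unset.

```python
import os


def find_proxy(submit_commands, environ=None, uid=None):
    """Return the proxy path, following the documented lookup order.

    1. the x509userproxy submit command, if present;
    2. otherwise the X509_USER_PROXY environment variable, if set;
    3. otherwise /tmp/x509up_u<uid>.
    """
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid
    for name, value in submit_commands.items():
        if name.lower() == "x509userproxy":  # command names are case-insensitive
            return value
    if environ.get("X509_USER_PROXY"):
        return environ["X509_USER_PROXY"]
    return f"/tmp/x509up_u{uid}"
```

For a user with uid 1234 and neither setting present, the result is /tmp/x509up_u1234.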
The globusrsl command adds attribute settings to a job's RSL (Resource Specification Language) string. The format of the globusrsl command is

globusrsl = (name=value)(name=value)

An example of this command in a submit description file is

globusrsl = (project=Test_Project)

In this example, the additional RSL attribute name is project, and the value assigned is Test_Project.
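When several attributes are involved, the globusrsl line can be generated from a mapping. The Python sketch below follows the (name=value)(name=value) format shown above; the function names are hypothetical, and values are written verbatim, so a value containing spaces or parentheses would need RSL quoting, which this sketch omits.

```python
def rsl_fragment(attributes):
    """Render {'name': 'value', ...} as '(name=value)(name=value)'."""
    return "".join(f"({name}={value})" for name, value in attributes.items())


def globusrsl_line(attributes):
    """Build a globusrsl submit command from a mapping of RSL attributes."""
    return "globusrsl = " + rsl_fragment(attributes)
```

globusrsl_line({"project": "Test_Project"}) reproduces the example command above.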
The following are required configuration file entries that relate to submission of globus universe jobs. Condor-G fails if any of these entries are missing. These entries are provided in the file etc/examples/condor_config.local.condor-g that is used during the installation of the Condor-G contrib module.
GRIDMANAGER = $(SBIN)/condor_gridmanager
GRIDMANAGER_LOG = $(LOG)/GridmanagerLog
MAX_GRIDMANAGER_LOG = 64000
GRIDMANAGER_DEBUG = D_COMMAND
GAHP = $(SBIN)/gahp_server
GRIDMANAGER gives the path to the gridmanager daemon. GRIDMANAGER_LOG gives the location of the gridmanager's log file, and MAX_GRIDMANAGER_LOG gives the maximum size that log file may reach. GRIDMANAGER_DEBUG sets a debugging level for the gridmanager daemon. The GAHP entry specifies the location of the required GAHP (Globus ASCII Helper Protocol) server. Details of the protocol may be found at http://www.cs.wisc.edu/condor/gahp/.
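Since Condor-G fails if any of these entries are missing, it can help to check a configuration file before starting Condor. This Python sketch handles only simple NAME = value lines and whole-line # comments, leaves macro references such as $(SBIN) unexpanded, ignores line continuations, and treats names as case-insensitive (an assumption of the sketch); the function names are hypothetical.

```python
REQUIRED_ENTRIES = ("GRIDMANAGER", "GRIDMANAGER_LOG", "MAX_GRIDMANAGER_LOG",
                    "GRIDMANAGER_DEBUG", "GAHP")


def parse_config(text):
    """Parse simple 'NAME = value' lines into a dict keyed by upper-case name."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        name, sep, value = line.partition("=")
        if sep:
            entries[name.strip().upper()] = value.strip()
    return entries


def missing_gridmanager_entries(text):
    """List the required Condor-G entries that the configuration lacks."""
    entries = parse_config(text)
    return [name for name in REQUIRED_ENTRIES if name not in entries]
```

An empty result means every entry from the list above is present, though not that its value is correct.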
Two further configuration file entries for the gridmanager daemon are relevant to the newest job managers, from version 2.0 of the Globus software:

GRIDMANAGER_CHECKPROXY_INTERVAL = 600
GRIDMANAGER_MINIMUM_PROXY_TIME = 180
Condor-G periodically checks for an updated proxy at an interval, in seconds, given by the configuration variable GRIDMANAGER_CHECKPROXY_INTERVAL. For example, if you create a 12-hour proxy and then re-run grid-proxy-init 6 hours later, Condor-G notices the new proxy within one check interval and uses it. The default interval is 10 minutes (600 seconds).
Condor-G also knows when the proxy of each job will expire. If a proxy has not been refreshed by the time it is GRIDMANAGER_MINIMUM_PROXY_TIME seconds from expiring, Condor-G shuts down. For example, if GRIDMANAGER_MINIMUM_PROXY_TIME is 180 and the proxy is 3 minutes away from expiring, Condor-G attempts to shut down safely, instead of simply losing contact with the remote job because it can no longer authenticate. The default setting is 3 minutes (180 seconds).
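The interplay of the two settings can be worked through with a little arithmetic. The Python sketch below models one proxy lifetime, measuring time in seconds from proxy creation. It assumes, hypothetically, that proxy checks fall at 0, 600, 1200, and so on; the real gridmanager schedule may differ, so the pickup time is an illustration of the worst case of one interval, not a guarantee.

```python
CHECKPROXY_INTERVAL = 600  # GRIDMANAGER_CHECKPROXY_INTERVAL, seconds
MINIMUM_PROXY_TIME = 180   # GRIDMANAGER_MINIMUM_PROXY_TIME, seconds


def next_check_at_or_after(t, interval=CHECKPROXY_INTERVAL):
    """First periodic proxy check at or after time t (assumed schedule)."""
    return -(-t // interval) * interval  # integer ceiling division


def shutdown_deadline(expires_at, minimum=MINIMUM_PROXY_TIME):
    """Time at which Condor-G shuts down if the proxy was never refreshed."""
    return expires_at - minimum


def proxy_outcome(expires_at, refreshed_at=None,
                  interval=CHECKPROXY_INTERVAL, minimum=MINIMUM_PROXY_TIME):
    """Return ('refreshed', t) if a new proxy is picked up in time,
    otherwise ('shutdown', t)."""
    deadline = shutdown_deadline(expires_at, minimum)
    if refreshed_at is not None:
        picked_up = next_check_at_or_after(refreshed_at, interval)
        if picked_up < deadline:
            return ("refreshed", picked_up)
    return ("shutdown", deadline)
```

For the 12-hour proxy in the text (expires_at = 43200), refreshing at 6 hours gives proxy_outcome(43200, 21600) == ("refreshed", 21600), while never refreshing gives ("shutdown", 43020), which is 180 seconds before expiry.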