Condor - High Throughput Computing


Condor or Condor-G: Which one is right for me?

A source of confusion for many new (and old) Condor users is the relationship between Condor and Condor-G. People who are new to grid computing may have heard that Condor-G gives them a way to easily get their jobs running on the resources of the emerging Grids. Managers and users of existing Condor pools also want to link their resources with other pools to establish a larger Grid. These paragraphs explain the differences between Condor-G and Condor, give example use cases, and elaborate on which is the right one to install.

The difference between Condor and Condor-G will become clearer with a brief overview of the software that forms a Condor pool. For this discussion, the software of a Condor pool is divided into two parts. The first part does job management. It keeps track of a user's jobs. You can ask the job management part of Condor to show you the job queue, to submit new jobs to the system, to put jobs on hold, and to request information about jobs that have completed. The other part of the Condor software does resource management. It keeps track of which machines are available to run jobs, how the available machines should be utilized given all the users who want to run jobs on them, and when a machine is no longer available.

A machine with the job management part installed is called a submit machine. A machine with the resource management part installed is called an execute machine. Each machine may have one part or both. For example, a cluster of 10 machines may have one machine as a submit machine, and the other nine as execute machines. At the UW Computer Sciences department, we have both parts installed on all of our 300 users' desktops. This allows anyone in the department to submit from his/her own machine. No one needs to log in to a central server to use the thousand machines in the pool, and every machine is potentially available to run jobs.

What Is Condor-G?

Condor-G is the job management part of Condor. Condor-G lets you submit jobs into a queue, keep a log detailing the life cycle of your jobs, manage your input and output files, and do everything else you expect from a job queuing system. Condor-G gets its name from how it talks to the resource management part. Instead of using the Condor-developed protocols to start running a job on a remote machine, Condor-G uses the Globus Toolkit(tm) to start the job on the remote machine. Condor-G provides a "window to the Grid" for users to both access resources and manage jobs running on remote resources. Use Condor-G to look across the Grid and see instantly how your jobs are doing. You can trust Condor-G to keep watching your jobs while your attention is elsewhere.

Who should want to install Condor-G?

Install Condor-G to submit to resources accessible through a Globus interface. For example, the Origin2000 at Boston University is running LSF, and it also provides a Globus interface. One way of running jobs on this machine is to use 'ssh' to connect to the Origin, and run the LSF submit command. Another way of running jobs on this machine is to use the Globus interface directly to ask the Origin to run a job for you. Better yet, use Condor-G to submit the jobs. Condor-G gives you a consistent interface for submitting jobs, whether to another machine in a local pool or to a resource sitting behind a firewall on the other side of the country. Condor-G users use the same interface whether they run a job on a Linux cluster managed by PBS or through the NQS queue on a Cray T3E.
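
As a rough illustration of that consistent interface, here is a minimal Python sketch that renders a Condor-G submit description. The keywords (universe = globus, globusscheduler) follow the Condor manuals of the Globus 2.x era and may be spelled differently in other releases; the host names, the executable name, and the helper function itself are hypothetical.

```python
def condor_g_submit(executable, gatekeeper, jobmanager, count=1):
    """Render a Condor-G submit description as text.

    Assumption: keyword names are those of Globus 2.x era Condor-G
    (universe = globus); check the manual for the installed version.
    """
    lines = [
        "universe        = globus",
        # The jobmanager suffix selects the remote batch system (pbs, lsf, ...).
        f"globusscheduler = {gatekeeper}/jobmanager-{jobmanager}",
        f"executable      = {executable}",
        "output          = job.$(Cluster).$(Process).out",
        "error           = job.$(Cluster).$(Process).err",
        "log             = job.log",
        # "queue N" places N copies of the job in the queue.
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical gatekeeper hosts: one fronting PBS, one fronting LSF.
pbs_job = condor_g_submit("analyze", "cluster.example.edu", "pbs")
lsf_job = condor_g_submit("analyze", "origin.example.edu", "lsf")
print(pbs_job)
```

Under these assumptions the two descriptions differ only in the globusscheduler line. Either text would be saved to a file and handed to condor_submit, after which the jobs appear in the local Condor-G queue.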

Do not install Condor-G if you want to add a Globus interface to your existing Condor pool. Remember, Condor-G does not *create* a grid service. It only deals with *using* remote grid services.

Of course, much like you can have a machine be both a submit machine and an execute machine in a Condor pool, you can have Condor-G and the Globus interface to a pool installed at the same time. Users will submit jobs through Condor-G in order to have their jobs execute on resources outside the Condor pool. External users will use the Globus interface to submit jobs to be executed inside the Condor pool. (Yes, you could use the submission semantics of Condor-G to force a job to be submitted to the Globus interface on the local Condor pool, but that is a rather circuitous way to run jobs.)

How do I get Condor-G for installation?

See the instructions on the Condor-G web page.

Answers to some frequently asked questions

  1. Do I need to have a Condor pool installed to use Condor-G?
    No, you do not. Condor-G is only the job management part of Condor. It makes perfect sense to install Condor-G on just one machine within an organization. Note that access to remote resources using a Globus interface will be done through that one machine using Condor-G.
  2. I am an existing Condor pool administrator, and I want to be able to let my users access resources managed by Globus, as well as access the Condor pool. Should I install Condor-G?
    Yes.
  3. I want to install Globus on my Condor pool, and have external users submit jobs into the pool. Do I need to install Condor-G?
    No, you do not need to install Condor-G. You need to install Globus, along with its "Condor jobmanager setup package." See the Condor-G home page for a link to Globus.
  4. I just bought a 64-node cluster. Should I install Condor or another system such as PBS, or LSF, or GridEngine?
    It is up to you. We obviously would rather see you run only Condor on your new cluster, but your decision depends on what you're most experienced and comfortable with. If you install a different batch system, you can still install Globus on the front end, and external users can then use Condor-G to submit jobs to run on your cluster.
  5. I am the administrator at Physics, and I have a 64-node cluster running Condor. The administrator at Chemistry is also running Condor on her 64-node cluster. We would like to be able to share resources. Do I need to install Globus and Condor-G?
    You may, but you do not need to. Condor has a feature called flocking, which lets multiple Condor pools share resources. By setting configuration variables, jobs may be executed on either cluster. Flocking is good (better than Condor-G) because all the Condor features continue to work. Examples of such features are checkpointing and remote system calls. Unfortunately, flocking only works between Condor pools. So, access to a resource managed by PBS, for example, will still require Globus and Condor-G to submit jobs into the PBS queue.
  6. Why would I want to use Condor-G?
    If you have more than a trivial set of work to get done, you will find that you spend a great deal of time managing your jobs without Condor. Your time would be better spent working with your results. For example, imagine you want to run 1000 jobs on the NCSA Itanium cluster without Condor-G. To use the Globus interface directly, you will type 'globusrun' 1000 times. Now imagine that you want to check on the status of your jobs. Using 'globus-job-status' 1000 times will not be much fun. And, heaven help you if you find a bug and need to cancel your thousand jobs. This will be followed by resubmitting all 1000 jobs. Under Condor-G, job submission and management are simplified. Condor-G will also ease submission and management of jobs when job flow is complex. For example, suppose the first 100 jobs must run before any of the next 100 jobs, the next 700 may start only once those 200 jobs are done, and the last 100 may run only after the first 900 finish. Condor DAGMan manages all of the dependencies for you. You can avoid writing a rather complex Perl script.
  7. What piece of software is most like Condor?
    Condor is comparable to LSF, Sun Grid Engine, or PBS.
  8. What piece of software is most like Condor-G?
    Condor-G is most like globus-job-submit, or a web job-submission portal.
  9. Can I install both a Globus interface to a Condor pool and Condor-G at the same time?
    Yes. In fact, you might want to. (See questions 2 and 3 above.)
  10. What is Glidein?
    Glidein provides a way to temporarily add a Globus resource to a local Condor pool. Condor-G is not required to use glidein. Glidein uses Globus resource-management software to run jobs. It works by executing portions of the Condor software on the Globus resource. Then, Condor may execute the user's jobs. There are several benefits to working in this way. Checkpoints can be made of the jobs running on the Globus resource, and remote system calls can be supported. Condor can also dynamically schedule jobs across the Grid.
  11. What is planned for future Condor-G versions?
    In the short term, there are a few minor queuing features missing from Condor-G, such as sending job-completion e-mail. Globus 2.2 adds file-staging to Globus resources, so a future Condor-G will be able to transfer all of a job's data files to a remote site, and not just stdin, stdout, and stderr. A Windows port of Globus is expected sometime in late 2002 or 2003, and when that is complete, Condor-G will run on Windows.
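
The staged job flow described in question 6 (100 jobs, then 100, then 700, then 100) could be written as a DAGMan input file. The Python sketch below generates one using the JOB and PARENT ... CHILD statements of the DAGMan input format. The job names, the single shared submit file job.sub, and the helper function are hypothetical; a real workflow would likely use a separate submit file per stage.

```python
def staged_dag(stage_sizes, submit_file="job.sub"):
    """Return DAGMan input text in which each stage waits for the previous one."""
    lines = []
    stages = []
    next_id = 0
    for size in stage_sizes:
        # Give every job a unique node name: job0, job1, ...
        names = [f"job{next_id + i}" for i in range(size)]
        next_id += size
        stages.append(names)
        lines.extend(f"JOB {name} {submit_file}" for name in names)
    # Every job in a stage is a parent of every job in the next stage,
    # so no child starts until the whole previous stage has finished.
    for parents, children in zip(stages, stages[1:]):
        lines.append(f"PARENT {' '.join(parents)} CHILD {' '.join(children)}")
    return "\n".join(lines) + "\n"


dag_text = staged_dag([100, 100, 700, 100])
print(dag_text.count("\n"), "lines of DAG input")
```

Because the second stage already waits for the first, making the third stage wait for the second is enough to guarantee that all 200 earlier jobs are done, and likewise for the last stage and the first 900. The generated text would be saved to a file and submitted with condor_submit_dag.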


Condor home page   -   condor-admin@cs.wisc.edu