LIGO Support Ticket 1663
Ticket Information
Number: support 1663
User: dbrown@ligo.caltech.edu
Email: anderson__AT__ligo.caltech.edu,espinoza_e__AT__ligo.caltech.edu
Status: new
Assigned To: tannenba
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>, Erik Espinoza
<espinoza_e__AT__ligo.caltech.edu>, Todd Tannenbaum
<tannenba__AT__cs.wisc.edu>
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: LIGO: condor_submit problem and a couple of feature requests
Date: Tue, 22 Aug 2006 11:34:00 -0700
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Hi Todd,
I have been using the parallel universe for MPI jobs on our cluster,
which has 4 CPUs per node and, from a user's perspective, I have been
trying to get around the system NEGOTIATOR_POST_JOB_RANK which
attempts to fill the nodes breadth-first instead of depth-first.
The rank I want should prioritize virtual machines on nodes that have
all 4 CPUs available (above those that only have 3, 2 or 1 free)
since communicating over shared memory is less expensive than
Ethernet. I'd also like to make sure the cluster gets filled
depth-first for these parallel jobs.
Reading the manual, I see that NEGOTIATOR_PRE_JOB_RANK is evaluated
first, then the Rank in the submit file, then the
NEGOTIATOR_POST_JOB_RANK. Our pre rank is just
LIGO_NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
so at the submit file level I set my rank to
rank = 1 * (Machine == "node1.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
       2 * (Machine == "node2.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
       3 * (Machine == "node3.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
       4 * (Machine == "node4.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
and so on. The first part of each expression causes the nodes to be
ordered depth first and the second part ranks according to the load
average.
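An expression of this shape is impractical to type by hand for hundreds
of nodes, so it would normally be generated by a script. The following
is a minimal Python sketch, assuming the hostnames follow the
nodeN.ldas-cit.ligo.caltech.edu pattern shown above; the function name
build_rank and its parameters are hypothetical and not part of Condor.

```python
def build_rank(num_nodes, domain="ldas-cit.ligo.caltech.edu",
               load_weight=1000, load_cap=40):
    """Return a submit-file rank line with one weighted term per node."""
    terms = []
    for n in range(1, num_nodes + 1):
        # Node-ordering term followed by the load term, mirroring the
        # hand-written expression in the ticket.
        terms.append(
            f'{n} * (Machine == "node{n}.{domain}") + '
            f'{load_weight} * ({load_cap} - TotalCondorLoadAvg)'
        )
    # Join the per-node terms with "+" and a backslash line continuation.
    return "rank = " + " + \\\n       ".join(terms)


if __name__ == "__main__":
    print(build_rank(4))
```

Calling build_rank(327) gives an expression with the same shape and node
count as the one discussed next.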
The first problem I encountered was that with 327 nodes, condor_submit
segfaults. Shortening the rank line fixes this problem, but if the
expression is too long, I think condor_submit should print an error
rather than just crashing.
Obviously the better solution is to get the sysadmin to change the
NEGOTIATOR_POST_JOB_RANK to do different things for parallel vs
standard/vanilla jobs, and Stuart tells me that he's discussed this
with you.
I then tried changing the rank to just
rank = (40 - 2 * TotalCondorLoadAvg)
to order by reverse condor load on a node. This apparently does what
I want: jobs go to machines with a total condor load average of 0
first and fill up depth first. I don't understand why this happens,
however. If I understand the documentation correctly, after my submit
file rank is evaluated, all machines with a load average of 0 should
have the same rank. On an empty cluster, every machine has the same
rank, so I would have expected NEGOTIATOR_POST_JOB_RANK to be used to
break the degeneracy (which is why I tried the huge rank expression
in the first place: to make sure all machines were ranked before
NEGOTIATOR_POST_JOB_RANK is invoked). This doesn't appear to happen,
however, which is good for my jobs, but confusing. Any ideas?
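The expectation described above can be sketched as a lexicographic sort
on three keys. This is only a toy model of the documented evaluation
order (pre rank, then submit-file rank, then post rank), not the
negotiator's actual implementation; the node names and the PostRank
values are made up for illustration.

```python
def order_machines(machines, job_rank, post_rank, pre_rank=lambda m: 1):
    """Sort candidates best-first: pre rank, then job rank, then post rank."""
    return sorted(
        machines,
        key=lambda m: (pre_rank(m), job_rank(m), post_rank(m)),
        reverse=True,
    )


# Toy empty cluster: every machine has zero load, so the job rank ties
# and only the post rank can separate the candidates.
machines = [
    {"Name": "node1", "TotalCondorLoadAvg": 0.0, "PostRank": 1},
    {"Name": "node2", "TotalCondorLoadAvg": 0.0, "PostRank": 3},
    {"Name": "node3", "TotalCondorLoadAvg": 0.0, "PostRank": 2},
]

ordered = order_machines(
    machines,
    job_rank=lambda m: 40 - 2 * m["TotalCondorLoadAvg"],
    # Stand-in for whatever the site's NEGOTIATOR_POST_JOB_RANK computes.
    post_rank=lambda m: m["PostRank"],
)
print([m["Name"] for m in ordered])
```

In this model the tie on the job rank is broken purely by the post rank,
so node2 is listed first. That is the behaviour the documentation seems
to imply, and it is what makes the observed depth-first filling
surprising.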
Also, I think it would be useful if the Starter advertised the number
of claimed VMs on a node in the ClassAd for each VM, i.e. something
like TotalCpuBusy (similar to TotalCondorLoadAvg). Using
TotalCondorLoadAvg in the rank can be fooled by jobs that claim a
VM but don't produce significant load.
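As a rough illustration of what a TotalCpuBusy value would contain, the
Python sketch below counts claimed VMs per node from a list of machine
ads. It assumes each ad is a plain dict carrying the Machine and State
attributes; the function name and the sample data are hypothetical, and
TotalCpuBusy itself is only the attribute name proposed here, not an
existing one.

```python
def total_cpu_busy(vm_ads):
    """Map each node name to the number of its VMs in the Claimed state."""
    busy = {}
    for ad in vm_ads:
        # Make sure nodes with no claimed VMs still appear with a count of 0.
        busy.setdefault(ad["Machine"], 0)
        if ad["State"] == "Claimed":
            busy[ad["Machine"]] += 1
    return busy


# Made-up sample: two VMs claimed on node1, none on node2.
ads = [
    {"Machine": "node1", "State": "Claimed"},
    {"Machine": "node1", "State": "Claimed"},
    {"Machine": "node1", "State": "Unclaimed"},
    {"Machine": "node2", "State": "Unclaimed"},
]
print(total_cpu_busy(ads))
```

Unlike a load average, this count does not depend on how much CPU the
claiming job actually uses.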
Finally, I think a -parallel option to condor_q would be useful.
Currently condor_q -run only shows the zeroth node in each parallel job:
[dbrown__AT__ldas-grid.ligo Run]$ condor_q -run dbrown
-- Quill: citquill@ligo : <10.14.0.12:5432> : citquill_db
ID          OWNER   SUBMITTED    RUN_TIME    HOST(S)
6414747.0   dbrown  8/22 09:41   0+01:47:54  vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414748.0   dbrown  8/22 09:56   0+01:31:28  vm3__AT__node231.ldas-cit.ligo.caltech.edu
6414749.0   dbrown  8/22 09:58   0+01:30:53  vm3__AT__node257.ldas-cit.ligo.caltech.edu
6414750.0   dbrown  8/22 09:58   0+01:30:18  vm1__AT__node57.ldas-cit.ligo.caltech.edu
6414751.0   dbrown  8/22 10:02   0+01:25:19  vm1__AT__node258.ldas-cit.ligo.caltech.edu
6414827.0   dbrown  8/22 11:00   0+00:27:19  vm1__AT__node26.ldas-cit.ligo.caltech.edu
I think it would be useful if it could also show the nodes that make
up the parallel job, even better if in a tree format, like -dag, i.e.
something like:
[dbrown__AT__ldas-grid.ligo Run]$ condor_q -parallel -dag -run
-- Quill: citquill@ligo : <10.14.0.12:5432> : citquill_db
ID          OWNER/NODENAME  SUBMITTED    RUN_TIME    HOST(S)
6387664.0   lindy           8/19 21:30   2+14:00:22  ldas-grid.ligo.caltech.edu
6396883.0   arogan          8/20 15:11   1+20:19:14  ldas-grid.ligo.caltech.edu
6397751.0   dietz           8/21 03:49   1+07:41:58  ldas-grid.ligo.caltech.edu
6397752.0   |-55c0a28450c   8/21 03:49   1+07:41:38  vm1__AT__node96.ldas-cit.ligo.caltech.edu
6397753.0   |-9c558ab445f   8/21 03:49   1+07:41:36  vm3__AT__node96.ldas-cit.ligo.caltech.edu
6397754.0   |-4391edcc9fb   8/21 03:49   1+07:41:36  vm1__AT__node53.ldas-cit.ligo.caltech.edu
6414747.0   dbrown          8/22 09:41   0+01:49:32  vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0   dbrown          8/22 09:41   0+01:49:32  vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0   |-subproc1      8/22 09:41   0+01:49:32  vm2__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0   |-subproc2      8/22 09:41   0+01:49:32  vm1__AT__node119.ldas-cit.ligo.caltech.edu
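The tree layout requested above could be produced by a small formatting
step along the following lines. This Python sketch is illustrative only:
format_parallel_job is a hypothetical helper, not condor_q code, and it
assumes the list of hosts for a parallel job is already known, with the
zeroth node first.

```python
def format_parallel_job(job_id, owner, hosts):
    """Return display rows: the zeroth node, then one '|-' row per other node."""
    rows = [f"{job_id:<11} {owner:<15} {hosts[0]}"]
    for i, host in enumerate(hosts[1:], start=1):
        # Label the remaining nodes in the style of the -dag tree output.
        label = f"|-subproc{i}"
        rows.append(f"{job_id:<11} {label:<15} {host}")
    return rows


if __name__ == "__main__":
    for row in format_parallel_job(
        "6414747.0", "dbrown", ["vm3.node119", "vm2.node119", "vm1.node119"]
    ):
        print(row)
```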
Cheers,
Duncan.
--
Duncan Brown                 California Institute of Technology
Tapir: (626) 395 8409        MS 18-34, Pasadena, CA 91125, USA
LIGO:  (626) 395 8812        http://www.lsc-group.phys.uwm.edu/~duncan
===========================================================================
Date of creation: Tue Aug 22 13:51:21 2006 (1156272684)
Subject: Actions
Assigned to tannenba by akarp
===========================================================================
Date of actions: Tue Aug 22 15:01:11 2006 (1156276871)