LIGO Support Ticket 1663

Ticket Information
  Number:      support 1663
  User:        dbrown@ligo.caltech.edu
  Email:       anderson__AT__ligo.caltech.edu,espinoza_e__AT__ligo.caltech.edu
  Status:      new
  Assigned To: tannenba
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>,         Erik Espinoza
 <espinoza_e__AT__ligo.caltech.edu>,         Todd Tannenbaum
 <tannenba__AT__cs.wisc.edu>
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: LIGO: condor_submit problem and a couple of feature requests
Date: Tue, 22 Aug 2006 11:34:00 -0700
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>

Hi Todd,

I have been using the parallel universe for MPI jobs on our cluster,
which has 4 CPUs per node. From a user's perspective, I have been
trying to work around the system NEGOTIATOR_POST_JOB_RANK, which
attempts to fill the nodes breadth-first instead of depth-first.

The rank I want should prioritize virtual machines on nodes that have
all 4 CPUs available (above those that only have 3, 2 or 1 free),
since communicating over shared memory is less expensive than over
Ethernet. I'd also like to make sure the cluster gets filled
depth-first for these parallel jobs.

Reading the manual, I see that NEGOTIATOR_PRE_JOB_RANK is evaluated  
first, then the Rank in the submit file, then the  
NEGOTIATOR_POST_JOB_RANK. Our pre rank is just

LIGO_NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED

so at the sub file level I set my rank to

rank = 1 * (Machine == "node1.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
2 * (Machine == "node2.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
3 * (Machine == "node3.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \
4 * (Machine == "node4.ldas-cit.ligo.caltech.edu") + 1000 * (40 - TotalCondorLoadAvg) + \

and so on. The first part of each expression causes the nodes to be  
ordered depth first and the second part ranks according to the load  
average.
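
For illustration only, here is a minimal Python sketch of how an expression of this shape grows with the number of nodes. The helper name build_rank, the node count passed to it, and the assumption that every node follows the nodeN.ldas-cit.ligo.caltech.edu naming pattern are hypothetical; this is not the script that produced the actual submit file.

```python
# Sketch only: build_rank is a hypothetical helper, and the hostname
# pattern is assumed from the node1..node4 examples above.
def build_rank(num_nodes, domain="ldas-cit.ligo.caltech.edu"):
    """Return a submit-file rank line with one term per node."""
    terms = [
        f'{i} * (Machine == "node{i}.{domain}") + 1000 * (40 - TotalCondorLoadAvg)'
        for i in range(1, num_nodes + 1)
    ]
    # Join the terms with " + \" line continuations, as in the submit file.
    return "rank = " + " + \\\n".join(terms)


expr = build_rank(327)
print(len(expr), "characters in the rank expression")
```

With one term per node, the expression for 327 nodes runs to tens of thousands of characters, which may be relevant to the condor_submit behaviour described next.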

The first problem I encountered was that with 327 nodes, condor_submit
segfaults. Shortening the rank line fixes this problem, but if the
expression is too long, I think condor_submit should print an error
rather than just crashing.

Obviously the better solution is to get the sysadmin to change the  
NEGOTIATOR_POST_JOB_RANK to do different things for parallel vs  
standard/vanilla jobs, and Stuart tells me that he's discussed this  
with you.

I then tried changing the rank to just

rank = (40 - 2 * TotalCondorLoadAvg)

to order by reverse condor load on a node. This apparently does what  
I want: jobs go to machines with a total condor load average of 0  
first and fill up depth first. I don't understand why this happens,  
however. If I understand the documentation correctly, after my submit  
file rank is evaluated, all machines with a load average of 0 should  
have the same rank. On an empty cluster, every machine has the same  
rank, so I would have expected NEGOTIATOR_POST_JOB_RANK to be used to  
break the degeneracy (which is why I tried the huge rank expression  
in the first place: to make sure all machines were ranked before  
NEGOTIATOR_POST_JOB_RANK is invoked). This doesn't appear to happen,  
however, which is good for my jobs, but confusing. Any ideas?
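
To make the expectation above concrete, here is a minimal Python sketch of the ordering as I read it from the manual: machines are compared on the pre rank first, then the submit-file rank, then the post rank. The machine list and the post_rank values are invented for illustration (the real NEGOTIATOR_POST_JOB_RANK is not shown in this ticket), and this is not a description of how the negotiator is actually implemented.

```python
# Sketch only: machines and their post_rank values are hypothetical.
machines = [
    {"name": "a", "remote_owner": None, "total_condor_load_avg": 0.0, "post_rank": 1},
    {"name": "b", "remote_owner": None, "total_condor_load_avg": 0.0, "post_rank": 5},
    {"name": "c", "remote_owner": None, "total_condor_load_avg": 1.0, "post_rank": 9},
]


def pre_rank(m):
    # Mirrors LIGO_NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
    return 1 if m["remote_owner"] is None else 0


def job_rank(m):
    # Mirrors the submit-file rank = (40 - 2 * TotalCondorLoadAvg)
    return 40 - 2 * m["total_condor_load_avg"]


def order(candidates):
    # Higher is better; later tuple entries only break ties in earlier ones.
    key = lambda m: (pre_rank(m), job_rank(m), m["post_rank"])
    return sorted(candidates, key=key, reverse=True)


names = [m["name"] for m in order(machines)]
print(names)
```

Under this reading, machines a and b tie on the submit-file rank because both have a load average of 0, so the post rank alone would decide between them.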

Also, I think it would be useful if the Starter advertised the number
of claimed vms on a node in the ClassAd for each vm, i.e. something
like TotalCpuBusy (similar to TotalCondorLoadAvg). Using
TotalCondorLoadAvg in the rank can be fooled by jobs that claim a
vm but don't produce significant load.
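
As a rough sketch of what such an attribute could contain, the following Python computes a per-node count of claimed vms and attaches it to every vm on that node. TotalCpuBusy is only the name proposed above, not an existing attribute, and the input records are made up for the example.

```python
from collections import Counter

# Sketch only: these vm records are hypothetical, not real ClassAds.
vms = [
    {"name": "vm1@node119", "node": "node119", "state": "Claimed"},
    {"name": "vm2@node119", "node": "node119", "state": "Claimed"},
    {"name": "vm3@node119", "node": "node119", "state": "Unclaimed"},
    {"name": "vm1@node57", "node": "node57", "state": "Unclaimed"},
]


def total_cpu_busy(records):
    """Map each vm name to the number of claimed vms on its node."""
    claimed = Counter(r["node"] for r in records if r["state"] == "Claimed")
    return {r["name"]: claimed[r["node"]] for r in records}


busy = total_cpu_busy(vms)
print(busy)
```

Unlike a load average, this count depends only on claim state, so a claimed vm running an idle job would still be counted.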

Finally, I think a -parallel option to condor_q would be useful.
Currently condor_q -run only shows the zeroth node in each parallel job:

[dbrown__AT__ldas-grid.ligo Run]$ condor_q -run dbrown

-- Quill: citquill@ligo : <10.14.0.12:5432> : citquill_db
ID          OWNER           SUBMITTED     RUN_TIME HOST(S)
6414747.0   dbrown          8/22 09:41   0+01:47:54 vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414748.0   dbrown          8/22 09:56   0+01:31:28 vm3__AT__node231.ldas-cit.ligo.caltech.edu
6414749.0   dbrown          8/22 09:58   0+01:30:53 vm3__AT__node257.ldas-cit.ligo.caltech.edu
6414750.0   dbrown          8/22 09:58   0+01:30:18 vm1__AT__node57.ldas-cit.ligo.caltech.edu
6414751.0   dbrown          8/22 10:02   0+01:25:19 vm1__AT__node258.ldas-cit.ligo.caltech.edu
6414827.0   dbrown          8/22 11:00   0+00:27:19 vm1__AT__node26.ldas-cit.ligo.caltech.edu

I think it would be useful if it could also show the nodes that make
up the parallel job, and even better if it used a tree format, like
-dag, i.e. something like:

[dbrown__AT__ldas-grid.ligo Run]$ condor_q -parallel -dag -run

-- Quill: citquill@ligo : <10.14.0.12:5432> : citquill_db
ID          OWNER/NODENAME  SUBMITTED     RUN_TIME HOST(S)
6387664.0   lindy           8/19 21:30   2+14:00:22 ldas-grid.ligo.caltech.edu
6396883.0   arogan          8/20 15:11   1+20:19:14 ldas-grid.ligo.caltech.edu
6397751.0   dietz           8/21 03:49   1+07:41:58 ldas-grid.ligo.caltech.edu
6397752.0    |-55c0a28450c  8/21 03:49   1+07:41:38 vm1__AT__node96.ldas-cit.ligo.caltech.edu
6397753.0    |-9c558ab445f  8/21 03:49   1+07:41:36 vm3__AT__node96.ldas-cit.ligo.caltech.edu
6397754.0    |-4391edcc9fb  8/21 03:49   1+07:41:36 vm1__AT__node53.ldas-cit.ligo.caltech.edu
6414747.0   dbrown          8/22 09:41   0+01:49:32 vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0   dbrown          8/22 09:41   0+01:49:32 vm3__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0    |-subproc1     8/22 09:41   0+01:49:32 vm2__AT__node119.ldas-cit.ligo.caltech.edu
6414747.0    |-subproc2     8/22 09:41   0+01:49:32 vm1__AT__node119.ldas-cit.ligo.caltech.edu

Cheers,
Duncan.

-- 

Duncan Brown                          California Institute of Technology
Tapir: (626) 395 8409                 MS 18-34, Pasadena, CA 91125, USA
LIGO:  (626) 395 8812                 http://www.lsc-group.phys.uwm.edu/~duncan



===========================================================================
Date of creation: Tue Aug 22 13:51:21 2006 (1156272684)
Subject: Actions

Assigned to tannenba by akarp
===========================================================================
Date of actions: Tue Aug 22 15:01:11 2006 (1156276871)