LIGO Support Ticket 14493
Ticket Information
Number: admin 14493
User: anderson@ligo.caltech.edu
Email:
Status: feature
Assigned To: wenger
Date: Mon, 20 Nov 2006 17:26:20 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: LIGO condor_hold/DAGMan enhancement request
This is a request to consider enhancing condor_hold when operating on a
DAGMan job to optionally hold all of the currently running job nodes as well
as the DAGMan job itself. Currently, condor_hold allows all currently running
nodes in the DAG to continue to run.
This is a low priority request since this can apparently already be done with
just two condor commands:
condor_hold jobid
condor_hold -const 'DAGManJobId==jobid'
However, there may be sufficient value added by combining these into a single
command, e.g., enforcing the proper order of these two steps, or adding a
necessary(?) synchronization between them to make sure no new jobs are
scheduled by the dag during the hold process itself.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Mon Nov 20 19:26:57 2006 (1164072419)
Subject: Actions
Assigned to wenger by khahn
===========================================================================
Date of actions: Wed Nov 22 10:35:47 2006 (1164213347)
Date: Wed, 22 Nov 2006 11:12:12 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: khahn <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #14493] LIGO condor_hold/DAGMan enhancement request
Stuart,
> This is a request to consider enhancing condor_hold when operating on a
> DAGMan job to optionally hold all of the currently running job nodes as well
> as the DAGMan job itself. Currently, condor_hold allows all currently running
> nodes in the DAG to continue to run.
>
> This is a low priority request since this can apparently already be done with
> just two condor commands:
>
> condor_hold jobid
> condor_hold -const 'DAGManJobId==jobid'
>
> However, there may be sufficient value added by combining these into a single
> command, e.g., enforcing the proper order of these two steps, or adding a
> necessary(?) synchronization between them to make sure no new jobs are
> scheduled by the dag during the hold process itself.
Okay, I've generated a problem report for this (776).
I'm also going to mark this ticket as a feature request.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Wed Nov 22 11:12:15 2006 (1164215535)
Subject: Actions
Status changed from open to feature by wenger
===========================================================================
Date of actions: Wed Nov 22 11:26:03 2006 (1164216363)
Date: Wed, 29 Nov 2006 11:32:45 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: khahn <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #14493] LIGO condor_hold/DAGMan enhancement request
Stuart,
> This is a request to consider enhancing condor_hold when operating on a
> DAGMan job to optionally hold all of the currently running job nodes as well
> as the DAGMan job itself. Currently, condor_hold allows all currently running
> nodes in the DAG to continue to run.
>
> This is a low priority request since this can apparently already be done with
> just two condor commands:
>
> condor_hold jobid
> condor_hold -const 'DAGManJobId==jobid'
>
> However, there may be sufficient value added by combining these into a single
> command, e.g., enforcing the proper order of these two steps, or adding a
> necessary(?) synchronization between them to make sure no new jobs are
> scheduled by the dag during the hold process itself.
For now, I'd suggest just making a very simple script, something like
this:
#! /bin/csh -f
# $1 is the DAGMan job's cluster id; hold DAGMan and its node jobs in one call.
condor_hold -const "DAGManJobId==$1 || ClusterId==$1"
I guess if you're worried about a race condition, you could hold the
DAGMan itself first, and then the node jobs:
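#! /bin/csh -f
# $1 is the DAGMan job's cluster id; hold the DAGMan job itself first.
condor_hold $1
# Then hold the node jobs that DAGMan has already submitted.
condor_hold -const "DAGManJobId==$1"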
I think that's about as much synchronization as you'd need to worry
about.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Wed Nov 29 11:32:47 2006 (1164821567)
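The two-step approach suggested above (hold the DAGMan job, then its node jobs) could also be sketched as a small Bourne-shell helper that validates the job id before calling condor_hold. This is only an illustration, not part of Condor: the function names dag_node_constraint and hold_dag and the CONDOR_HOLD override variable are hypothetical, and the sketch assumes the job id passed in is the DAGMan job's numeric cluster id.

```shell
#!/bin/sh
# Hypothetical helper functions; the names and the CONDOR_HOLD variable
# are assumptions made for this sketch, not part of Condor.

# Print the constraint that selects the node jobs of DAGMan job $1.
# Fails unless $1 is a non-empty string of digits.
dag_node_constraint() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: hold_dag <DAGMan cluster id>" >&2
            return 1
            ;;
    esac
    printf 'DAGManJobId==%s\n' "$1"
}

# Hold the DAGMan job first, then the node jobs it has already submitted.
# CONDOR_HOLD may name a substitute command, e.g. for a dry run.
hold_dag() {
    constraint=$(dag_node_constraint "$1") || return 1
    hold_cmd=${CONDOR_HOLD:-condor_hold}
    "$hold_cmd" "$1" || return 1
    "$hold_cmd" -constraint "$constraint"
}
```

After sourcing the file, `hold_dag 1234` would run the two condor_hold commands in the order discussed in the thread; setting CONDOR_HOLD to another command name lets the sequence be checked without touching the queue.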