LIGO Support Ticket 14006
Ticket Information
Number: admin 14006
User: dbrown@ligo.caltech.edu
Email: anderson__AT__ligo.caltech.edu
Status: open
Assigned To: nleroy
CC: Alain Roy <roy__AT__cs.wisc.edu>, Stuart Anderson <anderson__AT__ligo.caltech.edu>
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: LIGO: append to stdout/err files on re-execution
Date: Sat, 15 Jul 2006 21:50:04 -0700
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Hi,
I have a job that I only allow to leave the queue if it exits with
status code 0 or is killed by a signal. I do this with
on_exit_remove = (ExitBySignal == True) || (ExitCode == 0)
in my submit file. This works as expected: the job is re-queued if
it exits with a non-zero code. However, the stdout and stderr files
are truncated when the job is rerun. Is there any way to get Condor
to append to these files? My submit file is
universe = parallel
executable = /archive/home/dbrown/bin/condor_mpirun
arguments = --verbose SpEC
log = cluster$(CLUSTER).proc$(PROCESS).subproc0.log
output = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).out
error = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).err
append_files = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).out, cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).err
machine_count = 28
on_exit_remove = (ExitBySignal == True) || (ExitCode != 143)
queue
but the append_files line does not appear to do what I want.
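The truncation described above can be reproduced locally, without Condor. This is a minimal sketch under stated assumptions: `flaky_job` is a hypothetical stand-in for the real program that fails twice and then succeeds, the `while` loop plays the role of Condor re-queuing the job, and a shell status above 128 is taken to mean "killed by a signal" (a program that legitimately exits with such a code would be misclassified). Because each run redirects with `>`, only the final run's output survives.

```shell
#!/bin/sh
# Hypothetical stand-in for the real job: fails (status 1) on its first
# two runs and succeeds (status 0) on the third.
runs=0
flaky_job() {
    runs=$((runs + 1))
    echo "run $runs"
    [ "$runs" -ge 3 ]
}

# Same policy as on_exit_remove = (ExitBySignal == True) || (ExitCode == 0),
# assuming the shell convention that a status above 128 means "killed by a
# signal".
should_remove() {
    [ "$1" -eq 0 ] || [ "$1" -gt 128 ]
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The loop stands in for Condor re-queuing the job after each failed run.
while :; do
    flaky_job >job.out 2>job.err   # '>' truncates, as the ticket reports
    status=$?
    if should_remove "$status"; then
        break
    fi
done

cat job.out   # only the last run's line is left
```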
Cheers,
Duncan.
--
Duncan Brown              California Institute of Technology
Tapir: (626) 395 8409     MS 18-34, Pasadena, CA 91125, USA
LIGO: (626) 395 8812      http://www.lsc-group.phys.uwm.edu/~duncan
===========================================================================
Date of creation: Sat Jul 15 23:57:47 2006 (1153025870)
Subject: Actions
Assigned to nleroy by nleroy
===========================================================================
Date of actions: Wed Jul 19 13:32:00 2006 (1153336712)
From: Nick LeRoy <nleroy__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #14006] LIGO: append to stdout/err files on re-execution
Date: Wed, 19 Jul 2006 14:44:56 -0500
> Hi,
Hello Duncan,
> I have a job that I only allow to leave the queue if it exits with
> status code 0 or is killed by a signal. I do this with
>
> on_exit_remove = (ExitBySignal == True) || (ExitCode == 0)
>
> in my submit file. This works as expected: the job is re-queued if
> it exits with a non-zero code. However, the stdout and stderr files
> are truncated when the job is rerun. Is there any way to get Condor
> to append to these files?
I can certainly see where this would be useful, but there's no way to do it
directly with Condor. However, you should be able to get the same behavior by
submitting a wrapper shell script as your executable. This is just off the
top of my head and I haven't tried it, but how about something like this:
In your submit file:
Executable = my_wrapper
# job.out and job.err should already exist (empty is fine) before the first submit
Transfer_Input_Files = job.out, job.err, my_executable
Transfer_Output_Files = job.out, job.err
# Don't specify these:
# Output =
# Error =
Wrapper script:
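#! /bin/sh
# Append (>>) instead of truncating; exec passes the program's exit
# status or signal straight through to Condor's ExitCode / ExitBySignal.
exec ./my_executable 2>>job.err >>job.out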
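The wrapper idea can be checked locally before trying it under Condor. This is a sketch, not a tested Condor setup: `my_executable` here is a hypothetical stand-in that prints one line to each stream and exits with whatever status is stored in a file named `exit_code`, and the `for` loop stands in for two failed runs followed by a successful one. It checks that `>>` keeps the output of every run, and that with `exec` the wrapper's exit status is the program's own, so `on_exit_remove` still sees the real `ExitCode`.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the real program: one line on each stream, then
# exit with the status stored in the file exit_code.
cat >my_executable <<'EOF'
#!/bin/sh
echo "stdout from run"
echo "stderr from run" >&2
exit "$(cat exit_code)"
EOF
chmod +x my_executable

# The wrapper from the reply, with exec added so the status passes through.
cat >my_wrapper <<'EOF'
#!/bin/sh
exec ./my_executable 2>>job.err >>job.out
EOF
chmod +x my_wrapper

# Under Condor these files are listed in Transfer_Input_Files, so they are
# created (empty) before the first run.
: >job.out
: >job.err

# Two failing runs followed by a successful one, as a requeue would produce.
statuses=""
for code in 1 1 0; do
    echo "$code" >exit_code
    ./my_wrapper
    statuses="$statuses $?"
done

echo "exit statuses:$statuses"
wc -l <job.out   # one line per run: nothing was truncated
```

The `exec` matters for the signal half of the policy: without it, a program killed by a signal makes `/bin/sh` exit normally with status 128 plus the signal number, so `ExitBySignal` would be false.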
> My submit file is
<snip>
> but the append_files line does not appear to do what I want.
If you read the append_files entry in the condor_submit manual carefully,
you'll see that it applies only to the standard universe.
I hope this helps.
-Nick
--
<<< Knock, knock, Neo. >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy__AT__cs.wisc.edu The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences
===========================================================================
Date mail was appended: Wed Jul 19 14:45:02 2006 (1153338303)
Subject: Actions
Ticket resolved by nleroy
===========================================================================
Date of actions: Wed Jul 19 14:45:02 2006 (1153338305)
Subject: Actions
Ticket was reopened by mailnull
===========================================================================
Date of actions: Fri Jul 21 16:10:32 2006 (1153516232)
CC: anderson__AT__ligo.caltech.edu
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #14006] LIGO: append to stdout/err files on re-execution
Date: Fri, 21 Jul 2006 14:02:55 -0700
To: condor-admin__AT__cs.wisc.edu
Hi Nick,
Thanks for the suggestion. It would be a nice feature if Condor could
take care of this internally.
Cheers,
Duncan.
===========================================================================
Date mail was appended: Fri Jul 21 16:10:32 2006 (1153516232)