LIGO Support Ticket 14006

Ticket Information
  Number:      admin 14006
  User:        dbrown@ligo.caltech.edu
  Email:       anderson__AT__ligo.caltech.edu
  Status:      open
  Assigned To: nleroy
CC: Alain Roy <roy__AT__cs.wisc.edu>, Stuart Anderson <anderson__AT__ligo.caltech.edu>
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: LIGO: append to stdout/err files on re-execution
Date: Sat, 15 Jul 2006 21:50:04 -0700
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>

Hi,

I have a job that I only allow to leave the queue if it exits with  
status code 0 or a signal by using

on_exit_remove = (ExitBySignal == True) || (ExitCode == 0)

in my submit file. This works as expected and the job is re-queued if  
it exits with a non-zero code, however the stdout and stderr files  
are truncated when the job is rerun. Is there any way to get condor  
to append to these files? My submit file is

universe = parallel
executable = /archive/home/dbrown/bin/condor_mpirun
arguments = --verbose SpEC
log = cluster$(CLUSTER).proc$(PROCESS).subproc0.log
output = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).out
error = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).err
append_files = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).out, cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).err
machine_count = 28
on_exit_remove = (ExitBySignal == True) || (ExitCode != 143)
queue

but the append_files line does not appear to do what I want.

Cheers,
Duncan.

-- 

Duncan Brown                 California Institute of Technology
Tapir: (626) 395 8409        MS 18-34, Pasadena, CA 91125, USA
LIGO:  (626) 395 8812        http://www.lsc-group.phys.uwm.edu/~duncan



===========================================================================
Date of creation: Sat Jul 15 23:57:47 2006 (1153025870)
Subject: Actions

Assigned to nleroy by nleroy
===========================================================================
Date of actions: Wed Jul 19 13:32:00 2006 (1153336712)
From: Nick LeRoy <nleroy__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #14006] LIGO: append to stdout/err files on re-execution
Date: Wed, 19 Jul 2006 14:44:56 -0500

> Hi,
Hello Duncan,

> I have a job that I only allow to leave the queue if it exits with
> status code 0 or a signal by using
>
> on_exit_remove = (ExitBySignal == True) || (ExitCode == 0)
>
> in my submit file. This works as expected and the job is re-queued if
> it exits with a non-zero code, however the stdout and stderr files
> are truncated when the job is rerun. Is there any way to get condor
> to append to these files?

I can certainly see where this would be useful, but there's no way to do it
directly with Condor.  You should be able to get the same behavior by
submitting a wrapper shell script as your executable.  This is off the top
of my head and I haven't tried it, but how about something like this:

In your submit file:

Executable = my_wrapper
Transfer_Input_Files = job.out, job.err, my_executable
Transfer_Output_files = job.out, job.err
# Don't specify these:
# Output =
# Error =

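Wrapper script:
#!/bin/sh
# Append (>>) stdout and stderr to the transferred files instead of
# truncating them; the script exits with my_executable's exit status.
./my_executable 2>>job.err >>job.out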
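
[A slightly fuller sketch of the wrapper idea above. It has not been tried
against a Condor pool. The function name run_appending and the
"attempt started" marker line are illustrative assumptions, not anything
Condor provides; job.out, job.err and my_executable are the placeholder
names from the fragment above.]

```shell
#!/bin/sh
# run_appending OUT ERR CMD [ARGS...]
# Runs CMD, appending its stdout to OUT and its stderr to ERR, and
# returns CMD's exit status so the caller can pass it on unchanged.
run_appending() {
    out=$1
    err=$2
    shift 2

    # Mark the start of each attempt so reruns can be told apart.
    stamp="=== attempt started $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
    echo "$stamp" >>"$out"
    echo "$stamp" >>"$err"

    # >> appends; a plain > would truncate the file on every rerun.
    "$@" >>"$out" 2>>"$err"
}

# In a real wrapper the last two lines would be something like:
#   run_appending job.out job.err ./my_executable "$@"
#   exit $?
```

[Passing the payload's exit status through matters here, since the
on_exit_remove expression is evaluated against the ExitCode of whatever
Condor ran, which is now the wrapper. Because job.out and job.err are listed
in Transfer_Input_Files, they presumably need to exist before the first
submission; creating them empty with "touch job.out job.err" should be
enough.]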


> My submit file is 
<snip>

> but the append_files line does not appear to do what I want.

If you read the manual entry on append_files carefully, you'll see that it
applies only to the standard universe, so it has no effect on a parallel
universe job like this one.

I hope this helps

-Nick

-- 
           <<< Knock, knock, Neo. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy__AT__cs.wisc.edu              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences

===========================================================================
Date mail was appended: Wed Jul 19 14:45:02 2006 (1153338303)
Subject: Actions

Ticket resolved by nleroy
===========================================================================
Date of actions: Wed Jul 19 14:45:02 2006 (1153338305)
Subject: Actions

Ticket was reopened by mailnull
===========================================================================
Date of actions: Fri Jul 21 16:10:32 2006 (1153516232)
CC: anderson__AT__ligo.caltech.edu
From: Duncan Brown <dbrown__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #14006] LIGO: append to stdout/err files on re-execution
Date: Fri, 21 Jul 2006 14:02:55 -0700
To: condor-admin__AT__cs.wisc.edu

Hi Nick,

Thanks for the suggestion. It would be a nice feature if Condor could
take care of this internally.

Cheers,
Duncan.


-- 

Duncan Brown                 California Institute of Technology
Tapir: (626) 395 8409        MS 18-34, Pasadena, CA 91125, USA
LIGO:  (626) 395 8812        http://www.lsc-group.phys.uwm.edu/~duncan



===========================================================================
Date mail was appended: Fri Jul 21 16:10:32 2006 (1153516232)