LIGO Support Ticket 19162

Ticket Information
  Number:      admin 19162
  User:        anderson@ligo.caltech.edu
  Email:       dabrown__AT__physics.syr.edu,channa__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu,dmoraru__AT__ligo-wa.caltech.edu
  Status:      open
  Assigned To: psilord
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: LIGO: confusion over copy_to_spool
Date: Tue, 31 Mar 2009 21:33:30 -0700
CC: Duncan Brown <dabrown__AT__physics.syr.edu>,        chad hanna
 <channa__AT__ligo.caltech.edu>
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Running Condor version,

$ condor_version
$CondorVersion: 7.2.1 Feb 19 2009 BuildID: 133382 $
$CondorPlatform: X86_64-LINUX_RHEL3 $

I am confused over the copy_to_spool setting. In particular, I have  
set a pool default value of True in the main condor_config file via,

WantRemoteIO = False
WantBadgers = True
Notification = Error
JOB_LEASE_DURATION = 3600
copy_to_spool = True
SUBMIT_EXPRS = WantRemoteIO, Notification, WantBadgers,  
JOB_LEASE_DURATION, copy_to_spool

However, when I set this to False in the submit file for a simple test  
job,

$ cat mysleep.sub
universe = local
executable = mysleep.run
initialdir = /archive/home/anderson
notification = NEVER
log = mysleep.log
output = mysleep.out
error = mysleep.err
getenv = True
copy_to_spool = False
queue

and run this with the command,

$ condor_submit mysleep.sub

I then observe,

# condor_q -long anderson  | grep -i copy_to_spool
copy_to_spool = TRUE


Shouldn't the .sub value take precedence? Is there any chance this is  
just a reporting bug and the .sub value is actually honored?

Thanks.


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date of creation: Tue Mar 31 23:33:49 2009 (1238560432)
Subject: Actions

Assigned to psilord by zmiller
===========================================================================
Date of actions: Wed Apr  1 12:51:56 2009 (1238608316)
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
Date: Fri, 22 May 2009 13:14:32 -0700
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu

This still seems to be the case with,

$ condor_version
$CondorVersion: 7.2.3 May 11 2009 BuildID: 151729 $
$CondorPlatform: X86_64-LINUX_RHEL5 $

Thanks.


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date mail was appended: Fri May 22 15:14:53 2009 (1243023293)
Date: Fri, 22 May 2009 15:17:54 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
CC: condor-admin__AT__cs.wisc.edu, "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool

On Fri, 22 May 2009, Stuart Anderson wrote:

> This still seems to be the case with,
>
> $ condor_version
> $CondorVersion: 7.2.3 May 11 2009 BuildID: 151729 $
> $CondorPlatform: X86_64-LINUX_RHEL5 $

Hmm, maybe this wasn't exactly the same as the problem I was thinking of, 
then.  It was a couple of months ago, I think, so my memory is a bit hazy.

Kent

===========================================================================
Date mail was appended: Fri May 22 15:18:02 2009 (1243023482)
Date: Tue, 7 Jul 2009 16:47:02 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: zmiller <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool

Hello,

We've figured out this scenario and wrote up an explanation which, for those
of you with condor wiki access, can be found here:

http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=557

The crux of the issue is that things added to SUBMIT_EXPRS simply
become attributes placed into the job ad. They don't pass through the
machinery of condor_submit the way a copy_to_spool read directly
from the submit file does (in which case it alters the behavior of
condor_submit).

In essence, the default of copy_to_spool cannot be altered by putting something
into SUBMIT_EXPRS. The default for copy_to_spool is TRUE for standard universe
jobs and FALSE for everything else. So, because you added copy_to_spool
to SUBMIT_EXPRS, it appears in the job ad but has no effect on
anything. You need a copy_to_spool specified in the submit file
for a policy to be set.
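
A minimal sketch of what this means in practice, reusing the file names from
the test job earlier in this ticket. The choice of the vanilla universe is an
assumption made for illustration only. The point is that copy_to_spool is set
in the submit file itself, which is the value condor_submit acts on:

```
# mysleep.sub (illustrative)
universe = vanilla
executable = mysleep.run
log = mysleep.log
output = mysleep.out
error = mysleep.err
# Must be set here. A copy_to_spool entry in SUBMIT_EXPRS only adds an
# attribute to the job ad and does not change what condor_submit does.
copy_to_spool = True
queue
```

Keep in mind that, as observed earlier in this ticket, the copy_to_spool
attribute printed by condor_q -long may show the SUBMIT_EXPRS value rather
than the submit-file value.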

So, I would suppose there is no direct bug here, just somewhat confusing
behavior.

Thank you.

-pete

===========================================================================
Date mail was appended: Tue Jul  7 16:47:07 2009 (1247003228)
CC: Duncan Brown <dabrown__AT__physics.syr.edu>,        chad hanna
 <channa__AT__ligo.caltech.edu>,        Scott Koranda
 <skoranda__AT__gravity.phys.uwm.edu>,        Dan Moraru
 <dmoraru__AT__ligo-wa.caltech.edu>
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
Date: Sat, 11 Jul 2009 11:23:22 -0700
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Pete,
	Thanks for the detailed explanation. Unfortunately, I believe this  
means Condor cannot currently do what I want it to do so please  
consider the following RFE:

* It should be possible to override the default value for the command  
copy_to_spool in the main condor config file for a given pool, while  
retaining the ability for users to change the value if desired for  
individual jobs

as a secondary request:

* A job classad should always reflect the actual state of commands  
when their state is different than the out-of-the-box default, or  
perhaps anytime a config file or user has requested an explicit state  
regardless of the out-of-the-box default value. Put another way, at  
some level condor should enforce a "tell no lie" policy.

Note, the first RFE is much more important than the second one. The
motivation for this is to prevent a novice user from making the
following mistake:

* Compile application
* condor_submit (using the default settings for copy_to_spool)
* observe a problem with some jobs and recompile the application while  
other instances are still running.

Given our use of "USE_NFS = True" and the default setting of
copy_to_spool = False for the default universe (Vanilla), this
leads to unpredictable results, often a bus error or similar, as the
execute machines page in the new executable's bits while jobs are running.

Thanks.


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date mail was appended: Sat Jul 11 13:23:42 2009 (1247336623)
Subject: Comments added

Moved to worry about in the 7.5 series.

Comments added by psilord

===========================================================================
Date comments were added: Fri Sep 25 13:27:43 2009 (1253903263)
Subject: Comments added

If I recall correctly, we're also waiting for a real use case where a problem with
Hrm, my mistake, ignore this comment!

Comments added by psilord

===========================================================================
Date comments were added: Fri Oct  9 12:00:01 2009 (1255107601)