LIGO Support Ticket 19162
Ticket Information
Number: admin 19162
User: anderson@ligo.caltech.edu
Email: dabrown__AT__physics.syr.edu,channa__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu,dmoraru__AT__ligo-wa.caltech.edu
Status: open
Assigned To: psilord
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: LIGO: confusion over copy_to_spool
Date: Tue, 31 Mar 2009 21:33:30 -0700
CC: Duncan Brown <dabrown__AT__physics.syr.edu>, chad hanna
<channa__AT__ligo.caltech.edu>
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
Running Condor version,
$ condor_version
$CondorVersion: 7.2.1 Feb 19 2009 BuildID: 133382 $
$CondorPlatform: X86_64-LINUX_RHEL3 $
I am confused about the copy_to_spool setting. In particular, I have
set a pool default value of True in the main condor_config file via,
WantRemoteIO = False
WantBadgers = True
Notification = Error
JOB_LEASE_DURATION = 3600
copy_to_spool = True
SUBMIT_EXPRS = WantRemoteIO, Notification, WantBadgers, JOB_LEASE_DURATION, copy_to_spool
However, when I set this to False in the submit file for a simple test
job,
$ cat mysleep.sub
universe = local
executable = mysleep.run
initialdir = /archive/home/anderson
notification = NEVER
log = mysleep.log
output = mysleep.out
error = mysleep.err
getenv = True
copy_to_spool = False
queue
and run this with the command,
$ condor_submit mysleep.sub
I then observe,
# condor_q -long anderson | grep -i copy_to_spool
copy_to_spool = TRUE
Shouldn't the .sub value take precedence? Is there any chance this is
just a reporting bug and the .sub value is actually honored?
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Tue Mar 31 23:33:49 2009 (1238560432)
Subject: Actions
Assigned to psilord by zmiller
===========================================================================
Date of actions: Wed Apr 1 12:51:56 2009 (1238608316)
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
Date: Fri, 22 May 2009 13:14:32 -0700
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
This still seems to be the case with,
$ condor_version
$CondorVersion: 7.2.3 May 11 2009 BuildID: 151729 $
$CondorPlatform: X86_64-LINUX_RHEL5 $
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Fri May 22 15:14:53 2009 (1243023293)
Date: Fri, 22 May 2009 15:17:54 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
CC: condor-admin__AT__cs.wisc.edu, "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
On Fri, 22 May 2009, Stuart Anderson wrote:
> This still seems to be the case with,
>
> $ condor_version
> $CondorVersion: 7.2.3 May 11 2009 BuildID: 151729 $
> $CondorPlatform: X86_64-LINUX_RHEL5 $
Hmm, maybe this wasn't exactly the same as the problem I was thinking of,
then. It was a couple of months ago, I think, so my memory is a bit hazy.
Kent
===========================================================================
Date mail was appended: Fri May 22 15:18:02 2009 (1243023482)
Date: Tue, 7 Jul 2009 16:47:02 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: zmiller <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
Hello,
We've figured out this scenario and wrote up an explanation which, for those
of you with condor wiki access, can be found here:
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=557
The crux of the issue is that entries added to SUBMIT_EXPRS simply
become attributes placed into the job ad; they don't pass through the
condor_submit machinery the way a copy_to_spool command read directly
from the submit file does (in that case, it alters the behavior of
condor_submit).
In essence, the default of copy_to_spool cannot be altered by putting
something into SUBMIT_EXPRS. The default for copy_to_spool is TRUE for
standard universe jobs and FALSE for everything else. So, because you
added copy_to_spool to SUBMIT_EXPRS, it appears in the job ad but has no
effect on anything. To set a policy, copy_to_spool must be specified in
the submit file itself.
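A toy Python model of the ordering described above may make the mechanism easier to see. It is not condor_submit's source, and Condor's actual implementation may differ: the function names, the dictionary representation of the submit file, config, and job ad, and the two-step ordering are assumptions made for illustration. The point it shows is that only the submit-file copy_to_spool decides spooling, while a SUBMIT_EXPRS entry of the same name is copied into the ad verbatim, so the advertised value need not match what condor_submit acted on.

```python
# Toy model (hypothetical, NOT condor_submit source): why a SUBMIT_EXPRS
# entry named copy_to_spool shows up in the job ad without affecting spooling.


def universe_default_copy_to_spool(universe: str) -> bool:
    # Per the explanation above: TRUE for standard universe, FALSE otherwise.
    return universe == "standard"


def build_job_ad(submit: dict, config: dict, submit_exprs: list) -> tuple:
    """Return (job_ad, spools_executable) for one submitted job (toy model)."""
    universe = submit.get("universe", "vanilla")

    # Step 1 (assumed): condor_submit interprets copy_to_spool from the
    # submit file; only this value controls whether the executable is spooled.
    if "copy_to_spool" in submit:
        spools = submit["copy_to_spool"].strip().lower() == "true"
    else:
        spools = universe_default_copy_to_spool(universe)

    job_ad = {"Cmd": submit["executable"]}

    # Step 2 (assumed): each name listed in SUBMIT_EXPRS is copied from the
    # config into the ad as-is; nothing in this model ever acts on it.
    for name in submit_exprs:
        job_ad[name] = config[name]

    return job_ad, spools


if __name__ == "__main__":
    # Mirrors the mysleep.sub scenario from the original report.
    submit = {"universe": "local", "executable": "mysleep.run",
              "copy_to_spool": "False"}
    config = {"copy_to_spool": "True"}
    ad, spools = build_job_ad(submit, config, ["copy_to_spool"])
    print(ad["copy_to_spool"], spools)  # True False
```

In this model the ad says True while the executable is not spooled, which matches the confusing condor_q output in the original report.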
So, I would suppose there is no direct bug here, just somewhat confusing
behavior.
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Jul 7 16:47:07 2009 (1247003228)
CC: Duncan Brown <dabrown__AT__physics.syr.edu>, chad hanna
<channa__AT__ligo.caltech.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>, Dan Moraru
<dmoraru__AT__ligo-wa.caltech.edu>
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #19162] LIGO: confusion over copy_to_spool
Date: Sat, 11 Jul 2009 11:23:22 -0700
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
Pete,
Thanks for the detailed explanation. Unfortunately, I believe this
means Condor cannot currently do what I want it to do, so please
consider the following RFE:
* It should be possible to override the default value for the command
copy_to_spool in the main condor config file for a given pool, while
retaining the ability for users to change the value if desired for
individual jobs
As a secondary request:
* A job ClassAd should always reflect the actual state of commands
when their state differs from the out-of-the-box default, or
perhaps whenever a config file or user has requested an explicit state,
regardless of the out-of-the-box default value. Put another way, at
some level Condor should enforce a "tell no lie" policy.
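The two requests above amount to a three-level precedence rule plus honest reporting. The sketch below is hypothetical and describes requested behavior, not an existing Condor feature: the `pool_default` argument stands in for some pool-wide config setting that this ticket asks for but does not name, and the helper names are invented for illustration.

```python
# Hypothetical sketch of the requested behavior (not an existing Condor feature).
# Precedence: per-job submit value > pool-wide default > out-of-the-box default.
from typing import Optional


def parse_bool(text: str) -> bool:
    # Simplified boolean parsing for this sketch only.
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def effective_copy_to_spool(universe: str,
                            pool_default: Optional[str],
                            submit_value: Optional[str]) -> bool:
    if submit_value is not None:   # user override in the .sub file wins
        return parse_bool(submit_value)
    if pool_default is not None:   # admin-set default for the whole pool
        return parse_bool(pool_default)
    return universe == "standard"  # out-of-the-box default


def advertise(job_ad: dict, universe: str,
              pool_default: Optional[str],
              submit_value: Optional[str]) -> dict:
    # "Tell no lie": the ad records the value that was actually used.
    job_ad["copy_to_spool"] = effective_copy_to_spool(
        universe, pool_default, submit_value)
    return job_ad


if __name__ == "__main__":
    print(effective_copy_to_spool("vanilla", "True", None))     # True
    print(effective_copy_to_spool("vanilla", "True", "False"))  # False
    print(effective_copy_to_spool("vanilla", None, None))       # False
```

Keeping the pool default separate from the out-of-the-box default is what lets an administrator flip vanilla jobs to spooling while individual users can still opt out per job.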
Note, the first RFE is much more important than the second one. The
motivation for it is to prevent a novice user from making the
following mistake:
* Compile application
* condor_submit (using the default settings for copy_to_spool)
* observe a problem with some jobs and recompile the application while
other instances are still running.
Given our use of "USE_NFS = True" and the default setting of
copy_to_spool = False for the default (vanilla) universe, this will
produce unpredictable results, often a bus error or similar, as the
execute machines page in the new executable's bits while jobs are running.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Sat Jul 11 13:23:42 2009 (1247336623)
Subject: Comments added
Moved to worry about in the 7.5 series.
Comments added by psilord
===========================================================================
Date comments were added: Fri Sep 25 13:27:43 2009 (1253903263)
Subject: Comments added
If I recall correctly, we're also waiting for a real use case where a problem with
Hrm, my mistake, ignore this comment!
Comments added by psilord
===========================================================================
Date comments were added: Fri Oct 9 12:00:01 2009 (1255107601)