LIGO Support Ticket 1699
Ticket Information
Number: support 1699
User: anderson@ligo.caltech.edu
Email: espinoza_e__AT__ligo.caltech.edu
Status: resolved
Assigned To: adesmet
Date: Sat, 21 Oct 2006 11:46:12 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support__AT__cs.wisc.edu, roy__AT__cs.wisc.edu
CC: Erik Espinoza <espinoza_e__AT__ligo.caltech.edu>
Subject: LIGO error opening standard file
The LIGO Caltech Condor pool running,
> condor_version
$CondorVersion: 6.8.2 Oct 12 2006 $
$CondorPlatform: X86_64-LINUX_RHEL3 $
is generating lots of the following ShadowLog error messages:
10/21 11:29:21 (8310706.0) (17655):error: Error: Couldn't open standard file '/dev/null'
However, this error message may be in error itself. Please consider
the simple patch below to remote_startup.c. While it has not been
compiled or tested, it appears to be fairly obvious from inspection.
It is surprising that the prototype definition,
void _condor_error_retry( const char *format, ... )
did not catch a call with an argument type of char **.
More importantly, do you have any suggestions as to why
open_std_files() might be failing for Shadow processes?
Thanks.
--- condor-6.8.2/src/condor_syscall_lib/remote_startup.c.orig 2006-10-21 11:20:29.000000000 -0700
+++ condor-6.8.2/src/condor_syscall_lib/remote_startup.c 2006-10-21 11:22:34.000000000 -0700
@@ -806,7 +806,7 @@
}
new_fd = open(logical_name[fd],flags,0);
if(new_fd<0) {
- _condor_error_retry("Couldn't open standard file '%s'", logical_name );
+ _condor_error_retry("Couldn't open standard file '%s'", logical_name[fd] );
}
if(new_fd!=fd) {
dup2(fd,new_fd);
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
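
On the question of why the prototype did not catch the call: arguments passed through the "..." of a variadic function are not type-checked against the prototype, so a mismatch with '%s' compiles silently unless the compiler is told that the function takes a printf-style format. The following is a minimal sketch of that idea, not Condor code. It assumes GCC or Clang; report_error(), CHECK_PRINTF, last_error, describe_open_failure() and the contents of logical_name are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* On GCC/Clang the format attribute asks the compiler to check the variadic
   arguments against the format string (reported under -Wformat, which is
   part of -Wall). On other compilers the macro expands to nothing. */
#if defined(__GNUC__)
#define CHECK_PRINTF(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define CHECK_PRINTF(fmt_idx, arg_idx)
#endif

/* Hypothetical buffer holding the most recent message, so it can be inspected. */
static char last_error[256];

/* Hypothetical stand-in for _condor_error_retry(); same shape of prototype. */
static void report_error(const char *format, ...) CHECK_PRINTF(1, 2);

static void report_error(const char *format, ...)
{
    va_list args;

    assert(format != NULL);
    va_start(args, format);
    vsnprintf(last_error, sizeof last_error, format, args);
    va_end(args);
}

/* Assumed layout for illustration: one logical file name per standard fd. */
static const char *logical_name[3] = { "/dev/null", "job.out", "job.err" };

static const char *describe_open_failure(int fd)
{
    /* Passing logical_name here instead of logical_name[fd] would still
       compile against a plain variadic prototype. With the attribute above,
       the compiler can warn that '%s' does not match the argument type. */
    report_error("Couldn't open standard file '%s'", logical_name[fd]);
    return last_error;
}
```

With the attribute in place, building with -Wall should flag the unpatched form of the call, while the patched form, as in describe_open_failure(), compiles cleanly.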
===========================================================================
Date of creation: Sat Oct 21 13:47:24 2006 (1161456447)
Subject: Actions
Assigned to adesmet by adesmet
===========================================================================
Date of actions: Mon Oct 23 11:31:44 2006 (1161621104)
Date: Mon, 23 Oct 2006 12:15:29 -0500
From: Alan De Smet <adesmet__AT__cs.wisc.edu>
To: adesmet <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #1699] LIGO error opening standard file
> However, this error message may be in error itself. Please consider
> the simple patch below to remote_startup.c.
That does look suspiciously wrong and could certainly lead to the
error message reporting the wrong file. I'll give it a quick
test and add your fix to Condor, probably in 6.8.3.
> More importantly, do you have any suggestions as to why
> open_std_files() might be failing for Shadow processes?
Check what the job's input, output, and error paths are. I'm
guessing that one or more of them are not accessible (the input
may not be readable, the output or error might not be creatable).
The error is actually from the standard universe component
compiled into your job. The Shadow is simply reporting the error
on its behalf. Of course, when your job attempts to open those
files, the request goes back to the Shadow.
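
The path check suggested above can be sketched as a small helper. This is only an illustration, not part of Condor: check_std_path() is a hypothetical name, and the sketch assumes it is run on the submit machine as the user who owns the job, since that is where the Shadow would perform the open. Note that probing an output path this way creates the file if it does not already exist.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open a path the way a job's standard file would be needed:
   read-only for input, write-with-create for output or error.
   Returns 0 on success, or the errno value explaining the failure. */
static int check_std_path(const char *path, int for_output)
{
    int flags;
    int fd;

    assert(path != NULL);
    flags = for_output ? (O_WRONLY | O_CREAT) : O_RDONLY;

    /* The mode argument only matters when O_CREAT actually creates the file. */
    fd = open(path, flags, 0644);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}
```

A nonzero result can be passed to strerror() to see whether the problem is a missing directory (ENOENT), a permission problem (EACCES), or something else.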
===========================================================================
Date mail was appended: Mon Oct 23 12:15:30 2006 (1161623731)
Subject: Actions
Ticket resolved by adesmet
===========================================================================
Date of actions: Tue Oct 24 14:55:18 2006 (1161719721)