LIGO Support Ticket 17983
Ticket Information
Number: admin 17983
User: dabrown@physics.syr.edu
Email: anderson__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu
Status: resolved
Assigned To: jfrey
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: LIGO: stderr not captured in 7.0.1 standard universe
Date: Thu, 8 May 2008 09:56:51 -0400
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Hi all,
I have been running a DAG containing standard universe jobs which
fail due to an error in the command line arguments. The jobs exit
with return code 1 and dagman correctly captures them as failures.
The stderr files are written, but are all empty. They should contain
the error message from the code:
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:25 inspiral-817566800-817568848-29160-0.err
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:25 inspiral-817566800-817568848-29160-0.out
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:30 inspiral-817568720-817570768-29175-0.err
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:30 inspiral-817568720-817570768-29175-0.out
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:25 inspiral-817570640-817572688-29161-0.err
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:25 inspiral-817570640-817572688-29161-0.out
-rw-r--r-- 1 dbrown sufaculty 0 May 7 16:25 inspiral-817572560-817574608-29163-0.err
When I run the code on the command line, I see the error message
correctly:
[dbrown@sugar tag]$ ./lalapps_inspiral --trig-end-time 0 \
  --cluster-method window --dynamic-range-exponent 69.0 \
  --disable-rsq-veto --bank-file H1-TMPLTBANK-817577671-2048.xml.gz \
  --high-pass-order 8 --strain-high-pass-order 8 --ifo-tag FIRST \
  --gps-end-time 817579719 --calibrated-data real_8 \
  --channel-name H1:LSC-STRAIN --snr-threshold 5.5 \
  --number-of-segments 15 --trig-start-time 0 --enable-high-pass 30.0 \
  --debug-level 33 --gps-start-time 817577671 --enable-filter-inj-only \
  --high-pass-attenuation 0.1 --chisq-bins 0 --inverse-spec-length 16 \
  --segment-length 1048576 --low-frequency-cutoff 40.0 --pad-data 8 \
  --cluster-window 16 --sample-rate 4096 --chisq-threshold 10.0 \
  --resample-filter ldas --strain-high-pass-atten 0.1 \
  --strain-high-pass-freq 30.0 --segment-overlap 524288 \
  --frame-cache cache/H-H1_RDS_C03_L2-817577663-817592553.cache \
  --chisq-delta 0.2 --bank-veto-subbank-size 1 --approximant FindChirpSP \
  --write-compress --enable-output --order twoPN --spectrum-type median
Condor: Notice: Will checkpoint to ./lalapps_inspiral.ckpt
Condor: Notice: Remote system calls disabled.
./lalapps_inspiral: unrecognized option `--bank-veto-subbank-size'
Any idea why this is not being captured in the stderr file? For
comparison, stdout from another job is captured (the .out file is
non-empty):
-rw-r--r-- 1 dbrown sufaculty 0 May 7 21:15 tmpltbank-817745157-817747205-29812-0.err
-rw-r--r-- 1 dbrown sufaculty 31 May 7 22:23 tmpltbank-817745157-817747205-29812-0.out
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date of creation: Thu May 8 8:56:45 2008 (1210255008)
Subject: Actions
Assigned to jfrey by jfrey
===========================================================================
Date of actions: Fri May 9 9:39:59 2008 (1210343999)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #17983] LIGO: stderr not captured in 7.0.1
standard universe
Date: Fri, 9 May 2008 09:44:19 -0500
Can you try condor_compile'ing the following program with the same
compiler as your application and submitting it to your Condor pool?
------------------
#include <stdio.h>
int main() {
    fprintf( stderr, "I am stderr!\n" );
    return 0;
}
------------------
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Fri May 9 9:44:21 2008 (1210344262)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Fri May 9 9:44:21 2008 (1210344263)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #17983] LIGO: stderr not captured in 7.0.1
standard universe
Date: Thu, 15 May 2008 17:50:16 -0400
To: condor-admin__AT__cs.wisc.edu
Hi Jaime,
[dbrown@sugar stderr]$ cat stderr_test.c
#include <stdio.h>
int main() {
    fprintf( stderr, "I am stderr!\n" );
    return 0;
}
[dbrown@sugar stderr]$ condor_compile gcc -g -o stderr_test stderr_test.c
LINKING FOR CONDOR : /usr/bin/ld -L/usr/lib64 -Bstatic --eh-frame-hdr
  -m elf_x86_64 --hash-style=gnu
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o stderr_test
  /usr/lib64/condor_rt0.o /usr/lib64/crti.o
  /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbeginT.o -L/usr/lib64
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64
  -L/lib/../lib64 -L/usr/lib/../lib64 /tmp/cc0ez2X0.o
  /usr/lib64/libcondorsyscall.a /usr/lib64/libcondor_z.a
  /usr/lib64/libcomp_libstdc++.a /usr/lib64/libcomp_libgcc.a
  /usr/lib64/libcomp_libgcc_eh.a --as-needed --no-as-needed
  -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv
  -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv
  -lcondor_c /usr/lib64/libcomp_libgcc.a /usr/lib64/libcomp_libgcc_eh.a
  --as-needed --no-as-needed
  /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtend.o /usr/lib64/crtn.o
/usr/lib64/libcondorsyscall.a(condor_file_agent.o): In function
  `CondorFileAgent::open(char const*, int, int)':
/home/condor/execute/dir_15919/userdir/src/condor_ckpt/condor_file_agent.C:106:
  warning: the use of `tmpnam' is dangerous, better use `mkstemp'
/usr/lib64/libcondorsyscall.a(special_stubs.o): In function
  `condor_gethostbyaddr':
/home/condor/execute/dir_15919/userdir/src/condor_syscall_lib/special_stubs.C:200:
  warning: Using 'gethostbyaddr' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
/usr/lib64/libcondorsyscall.a(special_stubs.o): In function
  `condor_gethostbyname':
/home/condor/execute/dir_15919/userdir/src/condor_syscall_lib/special_stubs.C:193:
  warning: Using 'gethostbyname' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
/usr/lib64/libcondorsyscall.a(sock.o): In function
  `Sock::getportbyserv(char*)':
/home/condor/execute/dir_15919/userdir/src/condor_io/sock.C:208:
  warning: Using 'getservbyname' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
[dbrown@sugar stderr]$ ./stderr_test
Condor: Notice: Will checkpoint to ./stderr_test.ckpt
Condor: Notice: Remote system calls disabled.
I am stderr!
[dbrown@sugar stderr]$ cat stderr_test.sub
universe = standard
executable = /home/dbrown/projects/daswg/condor/stderr/stderr_test
error = stderr_test.err
output = stderr_test.out
log = stderr_test.log
queue
[dbrown@sugar stderr]$ condor_submit stderr_test.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 40846.
[dbrown@sugar stderr]$ cat stderr_test.log
000 (40846.000.000) 05/15 17:08:11 Job submitted from host: <10.20.1.23:58095>
...
001 (40846.000.000) 05/15 17:08:13 Job executing on host: <10.20.2.33:42389>
...
005 (40846.000.000) 05/15 17:08:13 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
725 - Run Bytes Sent By Job
5683869 - Run Bytes Received By Job
725 - Total Bytes Sent By Job
5683869 - Total Bytes Received By Job
...
[dbrown@sugar stderr]$ cat stderr_test.err
I am stderr!
[dbrown@sugar stderr]$ cat stderr_test.out
[dbrown@sugar stderr]$
Now the inspiral code:
[dbrown@sugar stderr]$ ./lalapps_inspiral --badgers
Condor: Notice: Will checkpoint to lalapps_inspiral.ckpt
Condor: Notice: Remote system calls disabled.
lalapps_inspiral: unrecognized option `--badgers'
[dbrown@sugar stderr]$
[dbrown@sugar stderr]$ ./lalapps_inspiral --badgers 2>/dev/null
[dbrown@sugar stderr]$
This prints nothing, so the error message really is going to stderr.
[dbrown@sugar stderr]$ cat inspiral_test.sub
universe = standard
executable = /home/dbrown/projects/daswg/condor/stderr/lalapps_inspiral
arguments = --badgers
error = inspiral_test.err
output = inspiral_test.out
log = inspiral_test.log
queue
[dbrown@sugar stderr]$ condor_submit inspiral_test.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 40847.
[dbrown@sugar stderr]$ cat inspiral_test.log
000 (40847.000.000) 05/15 17:12:23 Job submitted from host: <10.20.1.23:58095>
...
001 (40847.000.000) 05/15 17:12:38 Job executing on host: <10.20.2.33:42389>
...
005 (40847.000.000) 05/15 17:12:38 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
941 - Run Bytes Sent By Job
14791776 - Run Bytes Received By Job
941 - Total Bytes Sent By Job
14791776 - Total Bytes Received By Job
...
[dbrown@sugar stderr]$ cat inspiral_test.err
[dbrown@sugar stderr]$ cat inspiral_test.out
Both are empty, even though the job exited with status code 1.
Now the error message 'unrecognized option' is coming from getopt, so
I modified stderr_test to do getopt parsing. The source is attached
below.
[dbrown@sugar stderr]$ ./stderr_test
Condor: Notice: Will checkpoint to ./stderr_test.ckpt
Condor: Notice: Remote system calls disabled.
I am stderr!
[dbrown@sugar stderr]$ ./stderr_test --verbose
Condor: Notice: Will checkpoint to ./stderr_test.ckpt
Condor: Notice: Remote system calls disabled.
I am stderr!
I really, really am stderr!
[dbrown@sugar stderr]$ ./stderr_test --badgers
Condor: Notice: Will checkpoint to ./stderr_test.ckpt
Condor: Notice: Remote system calls disabled.
./stderr_test: unrecognized option `--badgers'
[dbrown@sugar stderr]$ cat stderr_test.sub
universe = standard
executable = /home/dbrown/projects/daswg/condor/stderr/stderr_test
arguments = --badgers
error = stderr_test.err
output = stderr_test.out
log = stderr_test.log
queue
[dbrown@sugar stderr]$ condor_submit stderr_test.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 40849.
[dbrown@sugar stderr]$ cat stderr_test.log
000 (40849.000.000) 05/15 17:30:07 Job submitted from host: <10.20.1.23:58095>
...
001 (40849.000.000) 05/15 17:30:10 Job executing on host: <10.20.2.33:42389>
...
005 (40849.000.000) 05/15 17:30:10 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
725 - Run Bytes Sent By Job
5716309 - Run Bytes Received By Job
725 - Total Bytes Sent By Job
5716309 - Total Bytes Received By Job
...
[dbrown@sugar stderr]$ cat stderr_test.err
condor_exec.40849.0: unrecognized option `--badgers'
So now I'm completely confused. If you want to look at the source
code for lalapps_inspiral it is at
<http://www.lsc-group.phys.uwm.edu/cgi-bin/cvs/viewcvs.cgi/lalapps/src/inspiral/inspiral.c?rev=1.272&cvsroot=lal&content-type=text/vnd.viewcvs-markup>
I'll try doing some more experiments, but if you have suggestions of
where to look, I'd appreciate it.
Cheers,
Duncan.
[dbrown@sugar stderr]$ cat stderr_test.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>

int main( int argc, char *argv[] )
{
  int vrbflg = 0;
  struct option long_options[] =
  {
    {"verbose", no_argument, &vrbflg, 1 },
    {0, 0, 0, 0}
  };
  int c;

  while ( 1 )
  {
    /* getopt_long stores long option here */
    int option_index = 0;

    c = getopt_long_only( argc, argv, "v", long_options,
        &option_index );

    /* detect the end of the options */
    if ( c == - 1 )
    {
      break;
    }

    switch ( c )
    {
      case 0:
        /* if this option set a flag, do nothing else
         * now */
        if ( long_options[option_index].flag != 0 )
        {
          break;
        }
        else
        {
          fprintf( stderr, "error parsing option %s with argument %s\n",
              long_options[option_index].name, optarg );
          exit( 1 );
        }
        break;

      /* exit if we get an unknown option */
      case '?':
        exit( 1 );
        break;

      default:
        fprintf( stderr, "unknown error while parsing options (%d)\n", c );
        exit( 1 );
    }
  }

  if ( optind < argc )
  {
    fprintf( stderr, "extraneous command line arguments:\n" );
    while ( optind < argc )
    {
      fprintf ( stderr, "%s\n", argv[optind++] );
    }
    exit( 1 );
  }

  fprintf( stderr, "I am stderr!\n" );
  if ( vrbflg )
  {
    fprintf( stderr, "I really, really am stderr!\n" );
  }

  return 0;
}
===========================================================================
Date mail was appended: Thu May 15 16:50:23 2008 (1210888224)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #17983] LIGO: stderr not captured in 7.0.1
standard universe
Date: Thu, 15 May 2008 18:20:20 -0400
To: condor-admin__AT__cs.wisc.edu
Hi Jaime,
On May 9, 2008, at 10:44 AM, condor-admin response tracking system
wrote:
> Can you try condor_compile'ing the following program with the same
> compiler as your application and submitting it to your Condor pool?
I forgot to mention that this works fine on the Caltech cluster
running FC4:
[dbrown__AT__ldas-pcdev1.ligo inspiral]$ cat inspiral_test.sub
universe = standard
executable = /usr1/dbrown/src/lalapps/src/inspiral/lalapps_inspiral
arguments = --badgers
error = inspiral_test.err
output = inspiral_test.out
log = inspiral_test.log
queue
[dbrown__AT__ldas-pcdev1.ligo inspiral]$ condor_submit inspiral_test.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 2620571.
[dbrown__AT__ldas-pcdev1.ligo inspiral]$ cat inspiral_test.log
000 (2620571.000.000) 05/15 15:18:01 Job submitted from host: <10.14.0.18:50983>
...
001 (2620571.000.000) 05/15 15:18:06 Job executing on host: <10.14.1.18:36563>
...
005 (2620571.000.000) 05/15 15:18:06 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
1225 - Run Bytes Sent By Job
10186707 - Run Bytes Received By Job
1225 - Total Bytes Sent By Job
10186707 - Total Bytes Received By Job
...
[dbrown__AT__ldas-pcdev1.ligo inspiral]$ cat inspiral_test.err
condor_exec.2620571.0: unrecognized option `--badgers'
Cheers,
Duncan.
===========================================================================
Date mail was appended: Thu May 15 17:20:27 2008 (1210890030)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #17983] LIGO: stderr not captured in 7.0.1
standard universe
Date: Mon, 19 May 2008 11:03:51 -0500
> So now I'm completely confused. If you want to look at the source
> code for lalapps_inspiral it is at
>
> <http://www.lsc-group.phys.uwm.edu/cgi-bin/cvs/viewcvs.cgi/lalapps/src/inspiral/inspiral.c?rev=1.272&cvsroot=lal&content-type=text/vnd.viewcvs-markup>
>
> I'll try doing some more experiments, but if you have suggestions of
> where to look, I'd appreciate it.
Can you build stderr_test and lalapps_inspiral on the same machine,
then submit both executables to Condor?
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Mon May 19 11:03:54 2008 (1211213034)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Mon May 19 11:03:54 2008 (1211213035)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #17983] LIGO: stderr not captured in 7.0.1
standard universe
Date: Mon, 19 May 2008 14:07:16 -0400
To: condor-admin__AT__cs.wisc.edu
Hi Jaime,
On May 19, 2008, at 12:03 PM, condor-admin response tracking system
wrote:
> Can you build stderr_test and lalapps_inspiral on the same machine,
> then submit both executables to Condor?
[dbrown@sugar-dev1 stderr]$ condor_compile gcc -Wall -g -o stderr_test stderr_test.c
LINKING FOR CONDOR : /usr/bin/ld -L/usr/lib64 -Bstatic --eh-frame-hdr
  -m elf_x86_64 --hash-style=gnu
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o stderr_test
  /usr/lib64/condor_rt0.o /usr/lib64/crti.o
  /usr/lib/gcc/x86_64-redhat-linux/4.1.1/crtbeginT.o -L/usr/lib64
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.1
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.1
  -L/usr/lib/gcc/x86_64-redhat-linux/4.1.1/../../../../lib64
  -L/lib/../lib64 -L/usr/lib/../lib64 /tmp/ccMlMLAE.o
  /usr/lib64/libcondorsyscall.a /usr/lib64/libcondor_z.a
  /usr/lib64/libcomp_libstdc++.a /usr/lib64/libcomp_libgcc.a
  /usr/lib64/libcomp_libgcc_eh.a --as-needed --no-as-needed
  -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv
  -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv
  -lcondor_c /usr/lib64/libcomp_libgcc.a /usr/lib64/libcomp_libgcc_eh.a
  --as-needed --no-as-needed
  /usr/lib/gcc/x86_64-redhat-linux/4.1.1/crtend.o /usr/lib64/crtn.o
/usr/lib64/libcondorsyscall.a(condor_file_agent.o): In function
  `CondorFileAgent::open(char const*, int, int)':
/home/condor/execute/dir_15919/userdir/src/condor_ckpt/condor_file_agent.C:106:
  warning: the use of `tmpnam' is dangerous, better use `mkstemp'
/usr/lib64/libcondorsyscall.a(special_stubs.o): In function
  `condor_gethostbyaddr':
/home/condor/execute/dir_15919/userdir/src/condor_syscall_lib/special_stubs.C:200:
  warning: Using 'gethostbyaddr' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
/usr/lib64/libcondorsyscall.a(special_stubs.o): In function
  `condor_gethostbyname':
/home/condor/execute/dir_15919/userdir/src/condor_syscall_lib/special_stubs.C:193:
  warning: Using 'gethostbyname' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
/usr/lib64/libcondorsyscall.a(sock.o): In function
  `Sock::getportbyserv(char*)':
/home/condor/execute/dir_15919/userdir/src/condor_io/sock.C:208:
  warning: Using 'getservbyname' in statically linked applications
  requires at runtime the shared libraries from the glibc version used
  for linking
[dbrown@sugar-dev1 stderr]$ cat stderr_inspiral.sub
universe = standard
arguments = --badgers
error = stderr_inspiral.$(cluster).$(process).err
output = stderr_inspiral.$(cluster).$(process).out
log = stderr_inspiral.log
executable = /home/dbrown/projects/daswg/condor/stderr/stderr_test
queue
executable = /home/dbrown/projects/daswg/condor/stderr/lalapps_inspiral
queue
[dbrown@sugar stderr]$ condor_submit stderr_inspiral.sub
Submitting job(s)..
Logging submit event(s)..
1 job(s) submitted to cluster 43046.
1 job(s) submitted to cluster 43047.
[dbrown@sugar stderr]$ tail -f stderr_inspiral.log
000 (43046.000.000) 05/19 12:36:45 Job submitted from host: <10.20.1.23:58095>
...
000 (43047.000.000) 05/19 12:36:45 Job submitted from host: <10.20.1.23:58095>
...
001 (43046.000.000) 05/19 12:41:58 Job executing on host: <10.20.2.44:38245>
...
005 (43046.000.000) 05/19 12:41:58 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
725 - Run Bytes Sent By Job
5716333 - Run Bytes Received By Job
725 - Total Bytes Sent By Job
5716333 - Total Bytes Received By Job
...
001 (43047.000.000) 05/19 12:41:58 Job executing on host: <10.20.2.44:38245>
...
005 (43047.000.000) 05/19 12:41:58 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
941 - Run Bytes Sent By Job
14791884 - Run Bytes Received By Job
941 - Total Bytes Sent By Job
14791884 - Total Bytes Received By Job
...
[dbrown@sugar stderr]$ cat stderr_inspiral.43046.0.err
condor_exec.43046.0: unrecognized option `--badgers'
[dbrown@sugar stderr]$ cat stderr_inspiral.43047.0.err
[dbrown@sugar stderr]$
Nothing in the inspiral stderr.
Cheers,
Duncan.
===========================================================================
Date mail was appended: Mon May 19 13:07:19 2008 (1211220439)
Subject: Comments added
This ticket is being closed since it looks like it was tied to admin #17975,
which I've solved. If the fix for admin #17975 doesn't resolve this problem,
the ticket will be reopened.
Comments added by psilord
===========================================================================
Date comments were added: Mon Jun 2 16:11:01 2008 (1212441062)
Subject: Actions
Ticket resolved by psilord
===========================================================================
Date of actions: Mon Jun 2 16:11:57 2008 (1212441117)