LIGO Support Ticket 2123
Ticket Information
Number: support 2123
User: anderson@ligo.caltech.edu
Email: keppel_d__AT__ligo.caltech.edu,kipp__AT__gravity.phys.uwm.edu,dabrown__AT__physics.syr.edu,skoranda__AT__gravity.phys.uwm.edu,channa__AT__phys.lsu.edu
Status: resolved
Assigned To: psilord
Date: Thu, 11 Oct 2007 19:39:08 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support__AT__cs.wisc.edu, psilord__AT__cs.wisc.edu, Todd
Tannenbaum <tannenba__AT__cs.wisc.edu>
CC: Drew Keppel <keppel_d__AT__ligo.caltech.edu>, Kipp Cannon
<kipp__AT__gravity.phys.uwm.edu>, Brown Duncan
<dabrown__AT__physics.syr.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>
Subject: LIGO: Corrupt file output in the Standard Universe
X-Seen-BY: mailfromd 4.1 lava.cs.wisc.edu
We have found that one of the main LIGO analysis workflows is generating
corrupt output when using the Standard Universe in Condor 6.9.5 running
on FC4 x86_64 machines. This happens whether the static executable is
built on an FC4 x86 or x86_64 machine and it does not happen when running
in the Vanilla Universe. The problem also does not occur when condor_compiling
the same code on the same build machines with Condor version 6.8.5 on either
x86 or x86_64.
The application in question is a C program that has the option of writing
its output as plain ASCII, which works in either the Standard or Vanilla
Universe, or making calls to libz to output gzip compressed files. The file
corruption happens only when selecting compressed output in the Standard
Universe with 6.9.5, in which case the output files are not valid gzip files.
Note that the problem occurs with WantRemoteIO = FALSE; it has not been tested
with WantRemoteIO = TRUE.
While it is possible that Condor 6.9.5 has merely exposed a bug in the
LIGO code, in my opinion there is a significant chance that this is a
Condor bug. Therefore, while we attempt to reduce this down to a simple test
program, I would like to know if there are any other known issues with 6.9.5
standard universe file I/O? It would also be helpful to know if you are aware
of any other Standard Universe applications running under 6.9.5 that use libz
to successfully output their results?
Regardless of the resolution of this immediate problem, this is also a
reminder that I would like to request that we update the existing LIGO
code used by the regular Condor build and regression test system to use
a newer code version that exercises the use of libz.
Drew, Kipp, Duncan,
If I have misrepresented the facts or left anything important out,
please group reply to the automatic email response from the Condor problem
reporting system that includes the condor-support ticket number for this
issue.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Thu Oct 11 21:41:29 2007 (1192156891)
Subject: Actions
Assigned to psilord by psilord
===========================================================================
Date of actions: Fri Oct 12 10:14:18 2007 (1192202058)
Date: Fri, 12 Oct 2007 11:47:12 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
>
> We have found that one of the main LIGO analysis workflows is generating
> corrupt output when using the Standard Universe in Condor 6.9.5 running
> on FC4 x86_64 machines. This happens whether the static executable is
> built on an FC4 x86 or x86_64 machine and it does not happen when running
> in the Vanilla Universe. The problem also does not occur when condor_compiling
> the same code on the same build machines with Condor version 6.8.5 on either
> x86 or x86_64.
Hrm. I can check the differences between those revisions in our repository and
see if anything jumps out as to why it fails.
> While it is possible that Condor 6.9.5 has merely exposed a bug in the
> LIGO code, in my opinion there is a significant chance that this is a
> Condor bug. Therefore, while we attempt to reduce this down to a simple test
> program, I would like to know if there are any other known issues with 6.9.5
> standard universe file I/O? It would also be helpful to know if you are aware
> of any other Standard Universe applications running under 6.9.5 that use libz
> to successfully output their results?
As far as we know, there aren't any other problems (beyond reading and
writing an fd at the same time) which could cause such corruption. While
I don't know of any people using gzipped output at this time, in the
past people had used it and it worked for them.
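Since reading and writing the same descriptor is the one known limitation mentioned here, a regression test could exercise exactly that pattern. The sketch below assumes a POSIX system; the function name rdwr_roundtrip is hypothetical and is not part of Condor. It writes a buffer through one O_RDWR descriptor, seeks back, and reads it through the same descriptor. Built once with condor_compile and once without, it would show whether this access pattern behaves differently between releases.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical regression check: write `data` through one O_RDWR
 * descriptor, rewind that same descriptor, and read the bytes back.
 * Returns 0 if they match, 1 if they differ, -1 on any I/O error
 * (or if `data` is longer than the internal buffer).
 * The scratch file is removed before returning.
 */
int rdwr_roundtrip(const char *path, const char *data)
{
    char buf[256];
    size_t len = strlen(data);
    int fd, rc = -1;

    if (len > sizeof buf)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) == (ssize_t)len      /* write ...              */
        && lseek(fd, 0, SEEK_SET) == 0            /* ... rewind the same fd */
        && read(fd, buf, len) == (ssize_t)len)    /* ... and read it back   */
        rc = (memcmp(buf, data, len) == 0) ? 0 : 1;

    close(fd);
    unlink(path);
    return rc;
}
```

A driver would simply call `rdwr_roundtrip("scratch.tmp", "Can you read me?\n")` and report a nonzero result.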
Does the corruption happen across a checkpoint? Or does it happen if
the process runs to completion without checkpointing?
> Regardless of the resolution of this immediate problem, this is also a
> reminder that I would like to request that we update the existing LIGO
> code used by the regular Condor build and regression test system to use
> a newer code version that exercises the use of libz.
If I could get the exact requirements of what you would like tested under
what conditions (source codes, specific environments--compiler revisions,
etc, etc), I can work to create the tests you desire.
Thank you.
Condor Admin
===========================================================================
Date mail was appended: Fri Oct 12 11:47:20 2007 (1192207640)
Date: Fri, 12 Oct 2007 09:56:24 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 basalt.cs.wisc.edu
On Fri, Oct 12, 2007 at 11:47:20AM -0500, condor-support response tracking system wrote:
> Hello,
>
> > From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
> >
> > We have found that one of the main LIGO analysis workflows is generating
> > corrupt output when using the Standard Universe in Condor 6.9.5 running
> > on FC4 x86_64 machines. This happens whether the static executable is
> > built on an FC4 x86 or x86_64 machine and it does not happen when running
> > in the Vanilla Universe. The problem also does not occur when condor_compiling
> > the same code on the same build machines with Condor version 6.8.5 on either
> > x86 or x86_64.
>
> Hrm. I can check the differences between those revisions in our repository and
> see if anything jumps out as to why it fails.
Thanks.
>
> > While it is possible that Condor 6.9.5 has merely exposed a bug in the
> > LIGO code, in my opinion there is a significant chance that this is a
> > Condor bug. Therefore, while we attempt to reduce this down to a simple test
> > program, I would like to know if there are any other known issues with 6.9.5
> > standard universe file I/O? It would also be helpful to know if you are aware
> > of any other Standard Universe applications running under 6.9.5 that use libz
> > to successfully output their results?
>
> As far as we know, there aren't any other problems (beyond reading and
> writing an fd at the same time) which could cause such corruption. While
> I don't know of any people using gzipped output at this time, in the
> past people had used it and it worked for them.
>
> Does the corruption happen across a checkpoint? Or does it happen if
> the process runs to completion without checkpointing?
It happens without checkpointing.
>
> > Regardless of the resolution of this immediate problem, this is also a
> > reminder that I would like to request that we update the existing LIGO
> > code used by the regular Condor build and regression test system to use
> > a newer code version that exercises the use of libz.
>
> If I could get the exact requirements of what you would like tested under
> what conditions (source codes, specific environments--compiler revisions,
> etc, etc), I can work to create the tests you desire.
>
This is a request to update the current LIGO code test you are running and
the ball is in our court to give you the necessary information.
Duncan,
I believe you provided the original information. Are you willing
to package and distribute an updated code drop to Peter?
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Fri Oct 12 11:56:42 2007 (1192208202)
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
Drew Keppel <keppel_d__AT__ligo.caltech.edu>, Kipp Cannon
<kipp__AT__gravity.phys.uwm.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>, Chad Hanna <channa__AT__phys.lsu.edu>
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Date: Fri, 12 Oct 2007 13:03:18 -0400
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
X-Scanner: InterScan AntiVirus for Sendmail
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
Hi Stuart,
On Oct 12, 2007, at 12:56 PM, Stuart Anderson wrote:
> Duncan,
> I believe you provided the original information, are you willing
> to package and distribute an updated code drop to Peter?
Chad volunteered to do this, so I've added him to this mail.
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Fri Oct 12 12:03:33 2007 (1192208615)
Date: Sat, 13 Oct 2007 13:55:54 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
Peter,
Here is a simple test case that shows the problem.
The platform is FC4 x86_64 with the bundled gcc-4.0.2 and
$ condor_version
$CondorVersion: 6.9.4 Aug 30 2007 $
$CondorPlatform: X86_64-LINUX_RHEL3 $
$ cat bug.c
/* Generates an invalid bug.gz file when condor_compiled with 6.9.4 */
/* [condor-support #2123] */
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    gzFile gzfp;   /* gzFile is itself a pointer type */

    gzfp = gzopen("bug.gz", "w");
    gzprintf(gzfp, "Can you read me?\n");
    gzclose(gzfp);
    return 0;
}
$ gcc bug.c -o bug -lz
$ ./bug
$ zcat bug.gz
Can you read me?
$ condor_compile gcc bug.c -o bug -lz
LINKING FOR CONDOR : /usr/bin/ld -L/usr/lib64 -Bstatic --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o bug /usr/lib64/condor_rt0.o /usr/lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/4.0.2/crtbeginT.o -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.0.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.0.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.0.2/../../../../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.0.2/../../.. -L/lib/../lib64 -L/usr/lib/../lib64 /tmp/ccIeWr6H.o -lz /usr/lib64/libcondorsyscall.a /usr/lib64/libcondor_z.a /usr/lib64/libcomp_libstdc++.a /usr/lib64/libcomp_libgcc.a /usr/lib64/libcomp_libgcc_eh.a --as-needed --no-as-needed -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv -lcondor_c -lcondor_nss_files -lcondor_nss_dns -lcondor_resolv -lcondor_c /usr/lib64/libcomp_libgcc.a /usr/lib64/libcomp_libgcc_eh.a --as-needed --no-as-needed /usr/lib/gcc/x86_64-redhat-linux/4.0.2/crtend.o /usr/lib64/crtn.o
/usr/lib64/libcondorsyscall.a(condor_file_agent.o)(.text+0x29b): In function `CondorFileAgent::open(char const*, int, int)':
: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
/usr/lib64/libcondorsyscall.a(switches.o)(.text+0xa4bb): In function `__gets_chk':
: warning: the `gets' function is dangerous and should not be used.
$ ./bug
Condor: Notice: Will checkpoint to ./bug.ckpt
Condor: Notice: Remote system calls disabled.
$ zcat bug.gz
zcat: bug.gz: invalid compressed data--format violated
This program works with and without condor_compile if condor-6.8.5 is
used instead of 6.9.4. A copy of the source code and the broken
condor_compiled executable can be found at:
http://www.ligo.caltech.edu/~anderson/condor.2123
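zcat only reports that the format was violated. As a first-pass diagnostic (not part of the original report), the sketch below checks the fixed gzip framing defined in RFC 1952 without linking against zlib: the magic bytes 0x1f 0x8b, compression method 8 (deflate), the reserved flag bits, and the little-endian ISIZE field in the 8-byte trailer. For the test string "Can you read me?\n", a correct single-member file should report ISIZE = 17. The name gz_inspect and its result codes are hypothetical. The helper does not inflate the payload or verify the CRC-32, so `gzip --test` remains the authoritative check.

```c
#include <stdio.h>

/* Result codes for the hypothetical gz_inspect() helper. */
enum gz_status {
    GZ_OK = 0,
    GZ_OPEN_FAILED,
    GZ_TOO_SHORT,
    GZ_BAD_MAGIC,
    GZ_BAD_METHOD,
    GZ_BAD_FLAGS
};

/*
 * Check the fixed framing of a single-member gzip file (RFC 1952):
 * the 10-byte header and the 8-byte trailer. On success, *isize (if
 * non-NULL) receives ISIZE, the uncompressed length modulo 2^32.
 * The deflate payload and the CRC-32 are NOT verified here.
 */
enum gz_status gz_inspect(const char *path, unsigned long *isize)
{
    unsigned char hdr[10], trl[8];
    long size = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return GZ_OPEN_FAILED;

    /* 10-byte header + 8-byte trailer is the smallest possible frame. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 18
        || fseek(fp, 0L, SEEK_SET) != 0
        || fread(hdr, 1, sizeof hdr, fp) != sizeof hdr
        || fseek(fp, size - 8, SEEK_SET) != 0
        || fread(trl, 1, sizeof trl, fp) != sizeof trl) {
        fclose(fp);
        return GZ_TOO_SHORT;
    }
    fclose(fp);

    if (hdr[0] != 0x1f || hdr[1] != 0x8b)   /* ID1, ID2 magic bytes   */
        return GZ_BAD_MAGIC;
    if (hdr[2] != 8)                         /* CM: 8 means deflate    */
        return GZ_BAD_METHOD;
    if (hdr[3] & 0xe0)                       /* FLG bits 5-7 reserved  */
        return GZ_BAD_FLAGS;

    /* ISIZE: last 4 bytes of the file, stored little-endian. */
    if (isize != NULL)
        *isize = (unsigned long)trl[4]
               | ((unsigned long)trl[5] << 8)
               | ((unsigned long)trl[6] << 16)
               | ((unsigned long)trl[7] << 24);
    return GZ_OK;
}
```

If bug.gz passes these checks and reports ISIZE = 17 but zcat still fails, the damage is more likely in the deflate stream or the CRC than in the header or trailer.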
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Sat Oct 13 15:56:14 2007 (1192308974)
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Date: Mon, 15 Oct 2007 09:07:53 -0400
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
X-Scanner: InterScan AntiVirus for Sendmail
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
Hi Stuart,
If you condor compile this code with 6.8.5, but then run it under
6.9.4, does it work?
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Mon Oct 15 8:08:03 2007 (1192453686)
Date: Mon, 15 Oct 2007 11:20:44 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> /* Generates an invalid bug.gz file when condor_compile with 6.9.4 */
> /* [condor-support #2123] */
>
> #include<stdio.h>
> #include<zlib.h>
>
> main()
> {
> gzFile *gzfp;
>
> gzfp = gzopen("bug.gz","w");
> gzprintf(gzfp,"Can you read me?\n");
> gzclose(gzfp);
> }
Ok, I've reproduced the problem. Figuring it out....
Thank you.
-pete
===========================================================================
Date mail was appended: Mon Oct 15 11:20:53 2007 (1192465254)
Date: Mon, 15 Oct 2007 09:53:15 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
I didn't personally try this, but the actual application that triggered
this bug report did work when condor_compiled on 6.8.5 and run on 6.9.4.
On Mon, Oct 15, 2007 at 09:07:53AM -0400, Duncan Brown wrote:
> Hi Stuart,
>
> If you condor compile this code with 6.8.5, but then run it under
> 6.9.4, does it work?
>
> Cheers,
> Duncan.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Mon Oct 15 11:53:33 2007 (1192467213)
X-Authentication-Warning: jester.phys.uwm.edu: kipp owned process doing -bs
Date: Mon, 15 Oct 2007 12:34:48 -0500 (CDT)
From: Kipp Cannon <kipp__AT__gravity.phys.uwm.edu>
X-X-Sender: kipp__AT__jester.phys.uwm.edu
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, keppel_d__AT__ligo.caltech.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 basalt.cs.wisc.edu
On Fri, 12 Oct 2007, condor-support response tracking system wrote:
> Hello,
>
> As far as we know, there aren't any other problems (beyond reading and
> writing an fd at the same time) which could cause such corruption. While
> I don't know of any people using gzipped output at this time, in the
> past people had used it and it worked for them.
I don't believe the problem is zlib-related. We have programs that write
XML files, and can write either ASCII text XML or gzipped XML. The
gzipped XML files are corrupt (gzip --test reports a "format violation").
But all we know about the ASCII files is that they are valid XML and that
their payload, a comma-delimited table, decodes correctly. We don't
really have a convenient way to check that all the data in them is
correct.
The reason I don't think it's the gzipping is that we also have programs
that write non-gzipped files that contain time series data. These files
are in a binary in-house format, and are also corrupted when written by
programs condor-compiled against 6.9. If there is a pattern, it might be
8-bit vs. 7-bit files (the ASCII XML being, apparently, the only output
that isn't corrupted).
I don't believe this is a memory bug in our code that has been exposed by
the 6.9 condor compiling process, because it affects two independent code
paths: the .xml.gz output path, and the .gwf time series output path. I
think the only code shared by these two paths is fwrite().
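One way to probe whether the shared stdio layer is at fault is to write the same 8-bit buffer once through fwrite() and once through the raw write() call, then check each copy. The sketch below is hypothetical (compare_write_paths and file_matches are illustrative names, not Condor or LIGO functions). In a condor_compiled binary the read-back also passes through Condor's I/O layer, so a suspect result is best confirmed afterwards with an independent tool such as cmp.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_CHECK 4096  /* largest buffer this sketch can verify */

/* Return 1 if the file at path holds exactly the len bytes in buf. */
static int file_matches(const char *path, const unsigned char *buf, size_t len)
{
    unsigned char tmp[MAX_CHECK];
    FILE *f;
    size_t got;
    int extra;

    if (len > sizeof tmp)
        return 0;
    f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    got = fread(tmp, 1, len, f);
    extra = fgetc(f);              /* EOF here means no trailing bytes */
    fclose(f);
    return got == len && extra == EOF && memcmp(tmp, buf, len) == 0;
}

/* Write buf once with fwrite() and once with write(), then verify both.
 * Returns a bit mask (bit 0: fwrite() copy wrong, bit 1: write() copy
 * wrong), or -1 if a file could not be created or fully written. */
int compare_write_paths(const unsigned char *buf, size_t len,
                        const char *stdio_path, const char *raw_path)
{
    FILE *f;
    int fd;
    int result = 0;

    /* Path 1: buffered stdio. */
    f = fopen(stdio_path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    /* Path 2: direct write() on a file descriptor, bypassing stdio. */
    fd = open(raw_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    if (!file_matches(stdio_path, buf, len))
        result |= 1;
    if (!file_matches(raw_path, buf, len))
        result |= 2;
    return result;
}
```

A return value of 1 would point at stdio, 2 at the write() path, and 3 at both.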
-Kipp
===========================================================================
Date mail was appended: Mon Oct 15 12:34:57 2007 (1192469697)
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Date: Mon, 15 Oct 2007 13:38:45 -0400
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
X-Scanner: InterScan AntiVirus for Sendmail
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
Hi Stuart,
OK, thanks. I was confused, as Chad claimed that setting his path to
use the old compiler still resulted in corrupted data, but perhaps
there was confusion over versions.
Are you going to remove 6.9.4 from the clusters until this is fixed?
Cheers,
Duncan.
On Oct 15, 2007, at 12:53 PM, Stuart Anderson wrote:
> I didn't personally try this, but the actual application that
> triggered
> this bug report did work when condor_compiled on 6.8.5 and run on
> 6.9.4.
>
> On Mon, Oct 15, 2007 at 09:07:53AM -0400, Duncan Brown wrote:
>> Hi Stuart,
>>
>> If you condor compile this code with 6.8.5, but then run it under
>> 6.9.4, does it work?
>>
>> Cheers,
>> Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Mon Oct 15 12:38:58 2007 (1192469938)
Date: Mon, 15 Oct 2007 10:58:12 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
On Mon, Oct 15, 2007 at 01:38:45PM -0400, Duncan Brown wrote:
> Hi Stuart,
>
> OK, thanks. I was confused, as Chad claimed that setting his path to
> use the old compiler still resulted in corrupted data, but perhaps
> there was confusion over versions.
>
> Are you going to remove 6.9.4 from the clusters until this is fixed?
Another option would be to disable just the condor_compile executable?
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Mon Oct 15 12:58:29 2007 (1192471110)
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Date: Mon, 15 Oct 2007 14:55:09 -0400
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
X-Scanner: InterScan AntiVirus for Sendmail
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
On Oct 15, 2007, at 1:58 PM, Stuart Anderson wrote:
> On Mon, Oct 15, 2007 at 01:38:45PM -0400, Duncan Brown wrote:
>> Hi Stuart,
>>
>> OK, thanks. I was confused, as Chad claimed that setting his path to
>> use the old compiler still resulted in corrupted data, but perhaps
>> there was confusion over versions.
>>
>> Are you going to remove 6.9.4 from the clusters until this is fixed?
>
> Another option would be to disable just the condor_compile executable?
Could we drop in the 6.8.5 condor_compile executable? I think paths
are hard coded in this.
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Mon Oct 15 13:55:26 2007 (1192474527)
Date: Mon, 15 Oct 2007 16:40:38 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> Could we drop in the 6.8.5 condor_compile executable? I think paths
> are hard coded in this.
Actually, you'd also need the libcondorzsyscall.a or libcondorsyscall.a
file in addition to the different condor_compile.
(Still debugging....)
Thank you.
-pete
===========================================================================
Date mail was appended: Mon Oct 15 16:40:47 2007 (1192484447)
X-Authentication-Warning: jester.phys.uwm.edu: kipp owned process doing -bs
Date: Mon, 15 Oct 2007 16:47:36 -0500 (CDT)
From: Kipp Cannon <kipp__AT__gravity.phys.uwm.edu>
X-X-Sender: kipp__AT__jester.phys.uwm.edu
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, keppel_d__AT__ligo.caltech.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
Hi,
Attached is a program that does not use zlib and demonstrates the problem.
There is an #if/#else/#endif switch in the code to select between two
different tests. In both cases, on a 64-bit machine with 6.9 installed,
the output of the condor_compiled program differs from that of the program
built without condor_compile; on a 32-bit machine with 6.8 installed, the
two outputs are identical. I compile with
$ gcc -Wall a.c
and
$ condor_compile gcc -Wall a.c
In one test, 10,000 ints generated by random() are written to a file as
binary data using write(). In the second, 10,000 ints are written in
order starting from 0. Comparing hexdumps of the two outputs with diff,
it appears that the occasional 16-bit word is being zeroed. When writing
random ints, there does not appear to be any pattern to which 16-bit words
are zeroed: sometimes just one gets zeroed, sometimes three in a row.
When writing the ints in order, it appears that every 512th word is zeroed,
possibly more: every value written is below 65536, so the upper 16-bit
half of each int (the odd-numbered word) is already zero, and zeroing of
those words cannot be detected with this program.
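The zeroed words above were found by hexdumping both outputs and diffing them. A hedged sketch of the same comparison in C, assuming the native and condor_compiled outputs have been saved under two different file names, prints the index of each 16-bit word that differs. diff_words16 is a hypothetical helper, not part of Condor.

```c
#include <stdio.h>
#include <string.h>

/* Compare two files 16 bits at a time (the granularity at which words
 * were seen being zeroed) and print the index of every word that differs.
 * Returns the number of differing words, or -1 if either file cannot be
 * opened or the two files have different lengths. */
long diff_words16(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long bad = 0;
    long word = 0;

    if (a == NULL || b == NULL) {
        if (a != NULL)
            fclose(a);
        if (b != NULL)
            fclose(b);
        return -1;
    }

    for (;;) {
        unsigned char wa[2], wb[2];
        size_t na = fread(wa, 1, sizeof wa, a);
        size_t nb = fread(wb, 1, sizeof wb, b);

        if (na != nb) {            /* one file ended before the other */
            bad = -1;
            break;
        }
        if (na == 0)               /* both files ended at the same point */
            break;
        if (memcmp(wa, wb, na) != 0) {
            printf("16-bit word %ld (byte offset %ld) differs\n",
                   word, word * 2);
            bad++;
        }
        if (na < sizeof wa)        /* odd trailing byte: nothing left */
            break;
        word++;
    }

    fclose(a);
    fclose(b);
    return bad;
}
```

For example, diff_words16("native.dat", "condor.dat") (file names hypothetical) returns the number of altered words, and a periodic pattern such as every 512th word shows up directly in the printed indices.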
-Kipp
[Attached program, decoded from base64:]

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int f = creat("output.dat", S_IRUSR | S_IWUSR);
    int i;

#if 1
    srandom(0);

    for(i = 0; i < 10000; i++) {
        int x = random();
        write(f, &x, sizeof(x));
    }
#else
    for(i = 0; i < 10000; i++)
        write(f, &i, sizeof(i));
#endif

    close(f);
    return 0;
}
===========================================================================
Date mail was appended: Mon Oct 15 16:47:46 2007 (1192484867)
Date: Mon, 15 Oct 2007 16:51:38 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> I don't believe this is a memory bug in our code that has been exposed by
> the 6.9 condor compiling process, because it affects two independent code
> paths: the .xml.gz output path, and the .gwf time series output path. I
> think the only code shared by these two paths is fwrite().
I've created a test program that, using only fwrite(), produces erroneous
output for binary data. Here is the program:
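#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	FILE *f;
	int i;
	char array[37];

	f = fopen("bug-test", "w");
	if (f == NULL) {
		printf("Sorry.\n");
		exit(EXIT_FAILURE);
	}

	/* Fill with bytes 0..36. array[0] is a NUL byte, so any code that
	   treats this buffer as a C string sees it as empty. */
	for (i = 0; i < 37; i++) {
		array[i] = (char)i;
	}

	/* One 37-byte binary write; no zlib involved. */
	fwrite(array, 37, 1, f);
	fclose(f);
	return EXIT_SUCCESS;
}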
The output was the correct length, but its contents appeared to be garbage
and were sometimes all zeroes. Now that zlib is out of the equation, I should
be able to find this bug in a reasonable amount of time.
Of course, this program (or something similar to it) will end up in our
test suite so this doesn't happen again.
Thank you.
-pete
===========================================================================
Date mail was appended: Mon Oct 15 16:51:46 2007 (1192485107)
Date: Mon, 15 Oct 2007 16:52:14 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
> Attached is a program that does not use zlib and demonstrates the problem.
That's really funny, since I just found one as well. :)
Either way, it helps me a lot in debugging it.
Thanks!
-pete
===========================================================================
Date mail was appended: Mon Oct 15 16:52:22 2007 (1192485142)
Date: Tue, 16 Oct 2007 10:53:17 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
I've found the actual bug and am now in the process of fixing the source
in question and testing it. Hopefully, I should have a new set of libraries
for you today. What architectures would you like them for? RHEL3 x86 and
RHEL3 x86_64?
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Oct 16 10:53:24 2007 (1192550004)
Date: Tue, 16 Oct 2007 10:19:55 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu,
channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
On Tue, Oct 16, 2007 at 10:53:24AM -0500, condor-support response tracking system wrote:
> Hello,
>
> I've found the actual bug and am now in the process of fixing the source
> in question and testing it. Hopefully, I should have a new set of libraries
> for you today. What architectures would you like them for? RHEL3 x86 and
> RHEL3 x86_64?
Peter,
Thanks for the very quick turnaround on this. Yes, we would like
patched files for both of the architectures you indicated. I would also appreciate
a description of the actual bug, its scope, and when it was introduced so we
can evaluate the impact on our previous analysis.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Tue Oct 16 12:20:18 2007 (1192555218)
Date: Tue, 16 Oct 2007 14:53:52 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> Thanks for the very quick turnaround on this. Yes, we would like
> patched files for both of the architectures you indicated. I would also appreciate
> a description of the actual bug, its scope, and when it was introduced so we
> can evaluate the impact on our previous analysis.
The binaries are still building at this time, but I can at least tell you what
happened.
On 6/11/2007, a code warning in condor_ckpt/file_state.C:847 was fixed:
In this method of the CondorFileTable class:
ssize_t CondorFileTable::write( int fd, const void *data, size_t nbyte )
This line of code was found with the warning:
casting away const converting from const char* to char*
int actual = f->write( fp->offset, (char *) data, nbyte );
It was fixed like this, presumably to make it more "legit".
char *_data = strdup(data);
int actual = f->write( fp->offset, _data, nbyte );
free(_data);
However, the original coder didn't realize that the 'data' buffer could
contain binary data (among other problems with the fix). strdup() treats
its argument as a NUL-terminated string, so if the buffer contained a zero
byte, strdup() would end early: f->write() would then receive a pointer to
a shortened buffer and copy garbage from the heap (up to nbyte bytes) into
the fd. Conversely, if there was no zero byte within the first nbyte bytes,
strdup() would copy an area much larger than the nbyte-byte source array,
resulting in a segfault or trashed memory in other parts of the heap. This
would work fine for ASCII and fail miserably for binary, which matches the
evidence you folks presented.
This change was code reviewed, but I don't think the reviewer understood
the fine structure of this area of the code, and the comments and types
around it most likely led them astray.
The fix was to restore the code to its original form, leaving the warning
in place. I also searched the nearby code but did not find any similar
constructs. I added a comment explaining why fixing the warning is
actually very difficult: there is a semantic mismatch between
const void * and char * in the FileTable API, and between those types and
the UNIX system API, whose prototypes differ (depending on the
architecture, write() may take a char *, a void *, or a const void *).
This broken behavior existed in 6.9.3 and 6.9.4. Our test suite did not
catch this error because, surprisingly, the stduniv test suite had no test
that specifically read and wrote binary data with the right edge cases.
So, I will be adding a test to the 6.8 series of Condor (which will
propagate to 6.9 when a merge happens) specifically to test binary
reads/writes in the stduniv, and another to test the use of zlib in the
stduniv. Also, the 6.9.4 manual is being updated to document the known bug.
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Oct 16 14:54:00 2007 (1192564440)
Date: Tue, 16 Oct 2007 13:28:22 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu,
channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
Peter,
I appreciate the detailed explanation and eagerly await the new
binaries. Please confirm that the small set of patched files you will
be providing is completely independent from the rest of Condor and can
be safely dropped on top of an out-of-the-box 6.9.4 install.
Please also confirm that the scope of this bug is restricted to
applications that meet all of the following, and only the following:
1) Condor_compiled with 6.9.3 or 6.9.4
2) Run in the standard universe
3) Output binary data with write() (possibly called from a higher-level
routine like fwrite())
and that the version of Condor running on the pool makes no difference.
P.S. I would recommend advertising this bug more widely than by
retroactively updating the 6.9.4 release notes. I suspect many people read
the release notes once and do not expect them to change after that. I know
I would have been very grateful to learn about this bug via one or more of
the Condor mailing lists if another group had found it first.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Tue Oct 16 15:28:39 2007 (1192566519)
Date: Tue, 16 Oct 2007 15:52:41 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
> I appreciate the detailed explanation and eagerly await the new
> binaries. Please confirm that the small set of patched files you will
> be providing is completely independent from the rest of Condor and can
> be safely dropped on top of an out-of-the-box 6.9.4 install.
I'll test this by hand to ensure it works before giving you the patch
files.
> Please also confirm that the scope of this bug is restricted to
> applications that meet all of the following, and only the following:
> 1) Condor_compiled with 6.9.3 or 6.9.4
> 2) Run in the standard universe
> 3) Output binary data with write() (possibly called from a higher-level
> routine like fwrite())
> and that the version of Condor running on the pool makes no difference.
I confirm the above to be the case. The user job could be running on
an execute node whose Condor version is very different from the submit
machine's, and it would still fail.
The bug only arises in stduniv user jobs, because the libcondorzsyscall.a
or libcondorsyscall.a library is linked into the application by
condor_compile. No other Condor features should be affected.
> P.S. I would recommend advertising this bug more widely than by
> retroactively updating the 6.9.4 release notes. I suspect many people read
> the release notes once and do not expect them to change after that. I know
> I would have been very grateful to learn about this bug via one or more of
> the Condor mailing lists if another group had found it first.
I'll see what I can do.
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Oct 16 15:52:48 2007 (1192567969)
Date: Tue, 16 Oct 2007 14:15:55 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu,
channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
Peter,
To be pedantic, please also confirm that there is no corresponding
call to strdup() that would similarly corrupt binary data on read.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Tue Oct 16 16:16:16 2007 (1192569376)
Date: Tue, 16 Oct 2007 16:19:25 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
On Tue, Oct 16, 2007 at 04:16:16PM -0500, condor-support response tracking system wrote:
> To be pedantic, please also confirm that there is no corresponding
> call to strdup() that would similarly corrupt binary data on read.
I had already checked, and I did not see any corresponding situation for read.
However, before you get these files, I'll ensure the test program does the
right thing for both writing and reading.
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Oct 16 16:19:33 2007 (1192569574)
Date: Tue, 16 Oct 2007 16:50:11 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
I'm still waiting for the test suites to finish to verify that the test is
OK, and I still have to verify the procedure for, and correctness of,
patching a previously installed 6.9.4. So you should get your fixed files
around midday tomorrow.
Thank you.
-pete
===========================================================================
Date mail was appended: Tue Oct 16 16:50:23 2007 (1192571424)
Date: Wed, 17 Oct 2007 12:14:44 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
Hello,
I have created the patch you need. You can find it on the FTP site
at ftp.cs.wisc.edu:
ftp ftp.cs.wisc.edu
  (log in as "ftp" with password "ftp@")
ftp> cd condor/temporary/forligo/6.9.4-stduniv-binary-patch
ftp> bin
ftp> get 6.9.4-stduniv-binary-patch.tar.gz
ftp> bye
Untar the tarball and copy the new libcondorsyscall.a and libcondorzsyscall.a
files, as appropriate for your architecture, into Condor's <appropriate
architecture install directory>/lib directory.
Then relink any existing 6.9.4 stduniv executables with condor_compile and
resubmit them.
This patch is against the exact 6.9.4 revision of Condor (although the
patched libraries are identified differently when you run ident on the
stduniv executables), so there won't be any version compatibility problems.
I'm out for a few hours this afternoon, but I can check in later to see if
things are going OK.
I apologize for this bug. The new test going into the test suite tomorrow
should prevent this type of problem from happening again.
Once I see it working at your site, I'll do a more general announcement.
Thank you.
-pete
===========================================================================
Date mail was appended: Wed Oct 17 12:14:54 2007 (1192641295)
Date: Wed, 17 Oct 2007 10:35:05 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu,
channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
Peter,
Is there a reason that the x86_64 directory only has a patched
libcondorsyscall.a file while the x86 directory also has libcondorzsyscall.a?
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Oct 17 12:35:26 2007 (1192642527)
Date: Wed, 17 Oct 2007 14:00:07 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
> Is there a reason that the x86_64 directory only has a patched
> libcondorsyscall.a file while the x86 directory also has libcondorzsyscall.a?
Yes. There is no libcondorzsyscall.a for the x86_64 platform, because
checkpoint compression is not yet supported on that platform.
Thank you.
-pete
===========================================================================
Date mail was appended: Wed Oct 17 14:00:30 2007 (1192647632)
Date: Wed, 17 Oct 2007 15:56:30 -0500
From: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, keppel_d__AT__ligo.caltech.edu,
kipp__AT__gravity.phys.uwm.edu, dabrown__AT__physics.syr.edu, channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 lava.cs.wisc.edu
Hi,
This patch has been put into place on hydra.phys.uwm.edu and
marlin.phys.uwm.edu, as well as the s0001 node.
These are the only places on the UWM cluster where people
should be condor compiling code.
I did a quick test using the bug.c that Stuart sent earlier,
and I can no longer reproduce the bug.
Stuart, once you have this done at Caltech, do you want to send
a note to the DASWG list to let people know?
I will also put a note in the motd on our cluster.
Scott
===========================================================================
Date mail was appended: Wed Oct 17 15:54:08 2007 (1192654449)
Date: Wed, 17 Oct 2007 15:13:30 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
CC: condor-support response tracking system <condor-support__AT__cs.wisc.edu>,
keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 lava.cs.wisc.edu
The patch libraries have also been confirmed to work on the test code and
the original science code for both 32-bit and 64-bit at Caltech so I will
send out an announcement to our users via DASWG.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Oct 17 17:13:49 2007 (1192659230)
Date: Wed, 24 Oct 2007 11:45:23 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: keppel_d__AT__ligo.caltech.edu, kipp__AT__gravity.phys.uwm.edu,
dabrown__AT__physics.syr.edu, skoranda__AT__gravity.phys.uwm.edu,
channa__AT__phys.lsu.edu
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
Pete,
This patch has been working well for a week so please consider
closing out this ticket.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Oct 24 13:45:39 2007 (1193251539)
Subject: Actions
Ticket resolved by psilord
===========================================================================
Date of actions: Wed Oct 24 16:07:01 2007 (1193260022)
Subject: Actions
Ticket was reopened by mailnull
===========================================================================
Date of actions: Wed Oct 24 16:07:03 2007 (1193260026)
Date: Wed, 24 Oct 2007 16:06:55 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #2123] LIGO: Corrupt file output in the
Standard Universe
> This patch has been working well for a week so please consider
> closing out this ticket.
No problem. I'm resolving it.
Thank you.
-pete
===========================================================================
Date mail was appended: Wed Oct 24 16:07:03 2007 (1193260026)
Subject: Actions
Ticket resolved by psilord
===========================================================================
Date of actions: Mon Oct 29 13:08:38 2007 (1193681318)