LIGO Support Ticket 15780
Ticket Information
Number: admin 15780
User: anderson@ligo.caltech.edu
Email: espinoza_e__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu
Status: pending
Assigned To: jfrey
Date: Tue, 26 Jun 2007 19:06:02 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Erik Espinoza <espinoza_e__AT__ligo.caltech.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>
Subject: LIGO: schedd writes data to /dev/null
Why does the schedd write data to /dev/null?
# condor_version
$CondorVersion: 6.8.5 May 17 2007 $
$CondorPlatform: X86_64-LINUX_RHEL3 $
-bash-3.00$
Here is an strace snapshot of schedd during startup on an FC4 x86_64 machine:
open("/dev/null", O_WRONLY) = 10
fcntl(10, F_GETFL) = 0x8001 (flags
O_WRONLY|O_LARGEFILE)
fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
(Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2ab3876b9000
lseek(10, 0, SEEK_CUR) = 0
write(10, "Job", 3) = 3
close(10) = 0
munmap(0x2ab3876b9000, 4096) = 0
open("/dev/null", O_WRONLY) = 10
fcntl(10, F_GETFL) = 0x8001 (flags
O_WRONLY|O_LARGEFILE)
fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
(Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2ab3876b9000
lseek(10, 0, SEEK_CUR) = 0
write(10, "Machine", 7) = 7
close(10) = 0
munmap(0x2ab3876b9000, 4096) = 0
open("/dev/null", O_WRONLY) = 10
fcntl(10, F_GETFL) = 0x8001 (flags
O_WRONLY|O_LARGEFILE)
fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
(Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2ab3876b9000
lseek(10, 0, SEEK_CUR) = 0
write(10, "Job", 3) = 3
close(10) = 0
munmap(0x2ab3876b9000, 4096) = 0
open("/dev/null", O_WRONLY) = 10
fcntl(10, F_GETFL) = 0x8001 (flags
O_WRONLY|O_LARGEFILE)
fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
(Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2ab3876b9000
lseek(10, 0, SEEK_CUR) = 0
write(10, "Machine", 7) = 7
close(10) = 0
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Tue Jun 26 21:06:24 2007 (1182909987)
Subject: Actions
Assigned to jfrey by jfrey
===========================================================================
Date of actions: Wed Jun 27 14:29:37 2007 (1182972577)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Date: Wed, 27 Jun 2007 14:46:31 -0500
To: condor-admin__AT__cs.wisc.edu
> Why does the schedd write data to /dev/null?
>
> # condor_version
> $CondorVersion: 6.8.5 May 17 2007 $
> $CondorPlatform: X86_64-LINUX_RHEL3 $
> -bash-3.00$
>
> Here is an strace snapshot of schedd during startup on an FC4
> x86_64 machine:
>
> open("/dev/null", O_WRONLY) = 10
> fcntl(10, F_GETFL) = 0x8001 (flags
> O_WRONLY|O_LARGEFILE)
> fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
> (Inappropriate ioctl for device)
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0)
> = 0x2ab3876b9000
> lseek(10, 0, SEEK_CUR) = 0
> write(10, "Job", 3) = 3
> close(10) = 0
Condor has a safe version of sprintf() that allocates enough space
for the formatted string. It calculates the space needed by writing
the string to /dev/null with fprintf() and noting the number of
characters written.
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Wed Jun 27 14:46:22 2007 (1182973583)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Wed Jun 27 14:46:22 2007 (1182973584)
Date: Wed, 27 Jun 2007 13:09:05 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: espinoza_e__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Why not use snprintf()? That is presumably orders of magnitude faster than
performing file I/O, even if there is no spinning media involved.
Thanks.
On Wed, Jun 27, 2007 at 02:46:22PM -0500, condor-admin response tracking system wrote:
> > Why does the schedd write data to /dev/null?
> >
> > # condor_version
> > $CondorVersion: 6.8.5 May 17 2007 $
> > $CondorPlatform: X86_64-LINUX_RHEL3 $
> > -bash-3.00$
> >
> > Here is an strace snapshot of schedd during startup on an FC4
> > x86_64 machine:
> >
> > open("/dev/null", O_WRONLY) = 10
> > fcntl(10, F_GETFL) = 0x8001 (flags
> > O_WRONLY|O_LARGEFILE)
> > fstat(10, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b5b0) = -1 ENOTTY
> > (Inappropriate ioctl for device)
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0)
> > = 0x2ab3876b9000
> > lseek(10, 0, SEEK_CUR) = 0
> > write(10, "Job", 3) = 3
> > close(10) = 0
>
> Condor has a safe version of sprintf() that allocates enough space
> for the formatted string. It calculates the space needed by writing
> the string to /dev/null with fprintf() and noting the number of
> characters written.
>
>
> Thanks and regards,
> Jaime Frey
> UW-Madison Condor Team
>
>
>
>
> ========================================
> MESSAGE INFORMATION
> ========================================
> * From: Jaime Frey <jfrey__AT__cs.wisc.edu>
> * Ticket Email List: anderson__AT__ligo.caltech.edu, espinoza_e__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu
>
> --
> ======================================================================
> This mail was sent from the RUST Mail System
> Please direct all replies to condor-admin__AT__cs.wisc.edu
> Please include the current subject line in your reply.
> ======================================================================
>
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Jun 27 15:53:46 2007 (1182977626)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Date: Wed, 27 Jun 2007 16:38:32 -0500
To: condor-admin__AT__cs.wisc.edu
On Jun 27, 2007, at 3:53 PM, condor-admin response tracking system
wrote:
> Why not use snprintf()? That is presumably orders of magnitude
> faster than
> performing file I/O, even if there is no spinning media involved.
On some platforms, snprintf() doesn't return the full size of the
string when the buffer is too small. On those platforms, you have to
call snprintf() in a loop, malloc()ing bigger and bigger buffers
until it succeeds. It's probably faster than printing to /dev/null
most of the time, but I doubt it's orders of magnitude faster. If we
discover that printing to /dev/null has a noticeable performance hit,
we'd probably change it.
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Wed Jun 27 16:38:28 2007 (1182980309)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Wed Jun 27 16:38:28 2007 (1182980310)
Date: Wed, 27 Jun 2007 15:03:16 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: espinoza_e__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
On Wed, Jun 27, 2007 at 04:38:28PM -0500, condor-admin response tracking system wrote:
> On Jun 27, 2007, at 3:53 PM, condor-admin response tracking system
> wrote:
>
> > Why not use snprintf()? That is presumably orders of magnitude
> > faster than
> > performing file I/O, even if there is no spinning media involved.
>
> On some platforms, snprintf() doesn't return the full size of the
> string when the buffer is too small. On those platforms, you have to
> call snprintf() in a loop, malloc()ing bigger and bigger buffers
Ouch. What platforms are that broken?
> until it succeeds. It's probably faster than printing to /dev/null
> most of the time, but I doubt it's orders of magnitude faster. If we
> discover that printing to /dev/null has a noticeable performance hit,
> we'd probably change it.
Hopefully you are right. The reason I noticed was that when the schedd
was CPU bound after a restart I ran strace and saw this loop of apparently
inefficient system calls.
Thanks for the explanation.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Jun 27 17:22:00 2007 (1182982921)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Date: Wed, 27 Jun 2007 17:33:53 -0500
To: condor-admin__AT__cs.wisc.edu
On Jun 27, 2007, at 5:22 PM, condor-admin response tracking system
wrote:
> On Wed, Jun 27, 2007 at 04:38:28PM -0500, condor-admin response
> tracking system wrote:
>> On Jun 27, 2007, at 3:53 PM, condor-admin response tracking system
>> wrote:
>>
>>> Why not use snprintf()? That is presumably orders of magnitude
>>> faster than
>>> performing file I/O, even if there is no spinning media involved.
>>
>> On some platforms, snprintf() doesn't return the full size of the
>> string when the buffer is too small. On those platforms, you have to
>> call snprintf() in a loop, malloc()ing bigger and bigger buffers
>
> Ouch. What platforms are that broken?
Digital Unix, HPUX and possibly others. Here's a webpage that lists
the different behaviors:
http://perfec.to/vsnprintf/
>> until it succeeds. It's probably faster than printing to /dev/null
>> most of the time, but I doubt it's orders of magnitude faster. If we
>> discover that printing to /dev/null has a noticeable performance hit,
>> we'd probably change it.
>
> Hopefully you are right. The reason I noticed was that when the schedd
> was CPU bound after a restart I ran strace and saw this loop of
> apparently
> inefficient system calls.
It may be worth running strace with -tt to get a quick idea of how
long each open/print/close cycle takes. We could improve things by
leaving the fd to /dev/null open if it's a lot of time.
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Thu Jun 28 9:42:51 2007 (1183041772)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Thu Jun 28 9:42:51 2007 (1183041773)
Date: Thu, 28 Jun 2007 10:08:43 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: espinoza_e__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
On Thu, Jun 28, 2007 at 09:42:51AM -0500, condor-admin response tracking system wrote:
>
> >> until it succeeds. It's probably faster than printing to /dev/null
> >> most of the time, but I doubt it's orders of magnitude faster. If we
> >> discover that printing to /dev/null has a noticeable performance hit,
> >> we'd probably change it.
> >
> > Hopefully you are right. The reason I noticed was that when the schedd
> > was CPU bound after a restart I ran strace and saw this loop of
> > apparently
> > inefficient system calls.
>
> It may be worth running strace with -tt to get a quick idea of how
> long each open/print/close cycle takes. We could improve things by
> leaving the fd to /dev/null open if it's a lot of time.
>
It looks like printing a few character string to /dev/null is taking a few
hundred usec on a 2.6GHz Opteroon, i.e., ~1M clock cycles. I do not know if
this a practicle limit to schedd performance that should be pursued or not.
Thanks.
09:56:48.943746 open("/dev/null", O_WRONLY) = 12
09:56:48.943807 fcntl(12, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
09:56:48.943860 fstat(12, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
09:56:48.943915 ioctl(12, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b320) = -1 ENOTTY (Inappropriate ioctl for device)
09:56:48.943949 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab38780c000
09:56:48.943982 lseek(12, 0, SEEK_CUR) = 0
09:56:48.944020 write(12, "Machine", 7) = 7
09:56:48.944054 close(12) = 0
09:56:48.944085 munmap(0x2ab38780c000, 4096) = 0
09:56:48.944189 open("/dev/null", O_WRONLY) = 12
09:56:48.944231 fcntl(12, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
09:56:48.944265 fstat(12, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
09:56:48.944323 ioctl(12, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff2400b320) = -1 ENOTTY (Inappropriate ioctl for device)
09:56:48.944358 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab38780c000
09:56:48.944390 lseek(12, 0, SEEK_CUR) = 0
09:56:48.944427 write(12, "Job", 3) = 3
09:56:48.944462 close(12) = 0
09:56:48.944492 munmap(0x2ab38780c000, 4096) = 0
09:56:48.944560 open("/dev/null", O_WRONLY) = 12
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Thu Jun 28 12:09:16 2007 (1183050556)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Date: Mon, 2 Jul 2007 11:11:50 -0500
To: condor-admin__AT__cs.wisc.edu
--Apple-Mail-2--1072919271
On Jun 28, 2007, at 12:09 PM, condor-admin response tracking system
wrote:
>>>> until it succeeds. It's probably faster than printing to /dev/null
>>>> most of the time, but I doubt it's orders of magnitude faster.
>>>> If we
>>>> discover that printing to /dev/null has a noticeable performance
>>>> hit,
>>>> we'd probably change it.
>>>
>>> Hopefully you are right. The reason I noticed was that when the
>>> schedd
>>> was CPU bound after a restart I ran strace and saw this loop of
>>> apparently
>>> inefficient system calls.
>>
>> It may be worth running strace with -tt to get a quick idea of how
>> long each open/print/close cycle takes. We could improve things by
>> leaving the fd to /dev/null open if it's a lot of time.
>
> It looks like printing a few character string to /dev/null is
> taking a few
> hundred usec on a 2.6GHz Opteroon, i.e., ~1M clock cycles. I do not
> know if
> this a practicle limit to schedd performance that should be pursued
> or not.
It looks like strace is adding a lot of overhead. I wrote a little
program that is reproduces the two ways of doing an sprintf() that
allocates enough space for the whole string, timed with calls to
gettimeofday(). Here are the results in microseconds:
with strace
-----------
fprintf(/dev/null): 1175
fprintf(null_file): 421
snprintf(): 112
without strace
--------------
fprintf(/dev/null): 253
fprintf(null_file): 21
snprintf(): 5
This is on an Athlon XP1800+ with strace invoked with the -o option.
The middle line excludes the fopen() and fclose(). I've attached the
program in case you want to try it on your machine.
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
--Apple-Mail-2--1072919271
#include <stdio.h>
#include <sys/time.h>
int main()
{
int rc;
char buf[1024];
FILE *null_file;
struct timeval start;
struct timeval stop;
gettimeofday( &start, NULL );
null_file = fopen("/dev/null", "w");
if ( NULL == null_file ) {
fprintf( stderr, "open failed\n" );
exit( 1 );
}
rc = fprintf( null_file, "Hello" );
fclose( null_file );
rc = snprintf( buf, 1024, "Hello" );
gettimeofday( &stop, NULL );
printf( "fprintf(/dev/null): %d\n",
1000000 * (stop.tv_sec - start.tv_sec) +
stop.tv_usec - start.tv_usec );
null_file = fopen("/dev/null", "w");
if ( NULL == null_file ) {
fprintf( stderr, "open failed\n" );
exit( 1 );
}
gettimeofday( &start, NULL );
rc = fprintf( null_file, "Hello" );
rc = snprintf( buf, 1024, "Hello" );
gettimeofday( &stop, NULL );
fclose( null_file );
printf( "fprintf(null_file): %d\n",
1000000 * (stop.tv_sec - start.tv_sec) +
stop.tv_usec - start.tv_usec );
gettimeofday( &start, NULL );
rc = snprintf( buf, 1024, "Hello" );
rc = snprintf( buf, 1024, "Hello" );
gettimeofday( &stop, NULL );
printf( "snprintf(): %d\n",
1000000 * (stop.tv_sec - start.tv_sec) +
stop.tv_usec - start.tv_usec );
return 0;
}
--Apple-Mail-2--1072919271
--Apple-Mail-2--1072919271--
===========================================================================
Date mail was appended: Mon Jul 2 11:11:52 2007 (1183392713)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Mon Jul 2 11:11:52 2007 (1183392714)
Date: Mon, 2 Jul 2007 09:50:19 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: espinoza_e__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
What I take away from this is that fprintf(/dev/null) is 50x slower than
snprintf(), but it is up to you to decide if this is slowing down schedd
performance at all, and if so, how hard would it be to use snprintf() on
platforms with working versions and fall back to fprintf(/dev/null) only
when necessary.
Thanks.
> It looks like strace is adding a lot of overhead. I wrote a little
> program that is reproduces the two ways of doing an sprintf() that
> allocates enough space for the whole string, timed with calls to
> gettimeofday(). Here are the results in microseconds:
>
> with strace
> -----------
> fprintf(/dev/null): 1175
> fprintf(null_file): 421
> snprintf(): 112
>
> without strace
> --------------
> fprintf(/dev/null): 253
> fprintf(null_file): 21
> snprintf(): 5
>
> This is on an Athlon XP1800+ with strace invoked with the -o option.
> The middle line excludes the fopen() and fclose(). I've attached the
> program in case you want to try it on your machine.
>
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Mon Jul 2 11:50:39 2007 (1183395039)
From: Jaime Frey <jfrey__AT__cs.wisc.edu>
Subject: Re: [condor-admin #15780] LIGO: schedd writes data to /dev/null
Date: Mon, 2 Jul 2007 12:08:53 -0500
To: condor-admin__AT__cs.wisc.edu
> What I take away from this is that fprintf(/dev/null) is 50x slower
> than
> snprintf(), but it is up to you to decide if this is slowing down
> schedd
> performance at all, and if so, how hard would it be to use snprintf
> () on
> platforms with working versions and fall back to fprintf(/dev/null)
> only
> when necessary.
It's slower than I expected. It's still unclear if it's worth the
time to change, as we don't know how often this code gets called.
I'll put it down as a low-priority item.
Thanks and regards,
Jaime Frey
UW-Madison Condor Team
===========================================================================
Date mail was appended: Mon Jul 2 12:08:56 2007 (1183396136)
Subject: Actions
Status changed from open to pending by jfrey
===========================================================================
Date of actions: Mon Jul 2 12:08:56 2007 (1183396137)