LIGO Support Ticket 17159

Ticket Information
  Number:      admin 17159
  User:        volodya@mindspring.com
  Email:       anderson__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu,parmor__AT__gravity.phys.uwm.edu,condorligo__AT__aei.mpg.de
  Status:      resolved
  Assigned To: psilord
X-Orbl: [68.79.89.115]
Date: Thu, 1 Nov 2007 09:11:41 -0400 (EDT)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: condor-admin__AT__cs.wisc.edu
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>,         Scott Koranda
 <skoranda__AT__gravity.phys.uwm.edu>,         Paul Armor
 <parmor__AT__gravity.phys.uwm.edu>
Subject: LIGO: fcntl 64bit bug
Content-ID: <alpine.DEB.0.99.0711010904570.20559__AT__node12.an-vo.com>
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---1463809024-2116357048-1193922238=:20559
Content-ID: <alpine.DEB.0.99.0711010904571.20559__AT__node12.an-vo.com>


Hi,

    I have encountered a problem when using the fcntl(fd, F_SETLK, &lock_structure)
function on 64bit platforms.

    It appears that the condor version of fcntl only passes the low 32 bits 
of the structure pointer. This has been tested to happen on condor 
versions 6.8.2, 6.8.6 and 6.9.4 (with standard universe patch), on 
machines using both Intel Core2 and AMD Opteron CPUs.

    I have attached a test program that exhibits the bug. It takes one 
argument - the name of the lock file.

    If one runs it under strace -e raw=fcntl one can see how the high bits 
are chopped off. To check the correctness of the program, move the 
declaration of struct flock to be a global variable - everything will work 
then (but, probably, only on little-endian hardware).
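
As a rough illustration of why a stack-allocated structure fails while a global one works, the sketch below mimics a call path that squeezes a pointer through 32 bits. It is not Condor's code: `truncate_to_32_bits` and `survives_truncation` are hypothetical helpers, and the sample addresses in the note that follows are only typical of x86-64 Linux (stack near 0x7fff..., globals of a non-relocated executable well below 4 GiB).

```c
#include <stdint.h>

/* Hypothetical helper: mimic a call path that keeps only the low
   32 bits of a 64-bit address. */
static uint64_t truncate_to_32_bits(uint64_t addr)
{
    return addr & 0xffffffffULL;
}

/* Returns 1 when the address is unchanged by truncation, i.e. when
   its upper 32 bits were already zero; returns 0 otherwise. */
static int survives_truncation(uint64_t addr)
{
    return truncate_to_32_bits(addr) == addr;
}
```

For example, `survives_truncation(0x7fff12345678ULL)` is 0 because the stack-like address is mangled to 0x12345678, while `survives_truncation(0x00601040ULL)` is 1 because a low, global-like address passes through intact.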

    Command lines:

    gcc lock_test.c -o lock_test_plain
    condor_compile gcc lock_test.c -o lock_test_condor

    strace -e raw=fcntl lock_test_plain lock
    strace -e raw=fcntl lock_test_condor lock

    Let me know whether there is any other info you might need.

                 thank you very much

                          Vladimir Dergachev

---1463809024-2116357048-1193922238=:20559
Content-ID: <alpine.DEB.0.99.0711010903580.20559__AT__node12.an-vo.com>
Content-Description: 

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <alloca.h>
#include <math.h>
#include <sys/time.h>
#include <time.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <fcntl.h>

int lock_file=-1;

void condor_safe_sleep(int seconds)
{
struct timeval timeout;
timeout.tv_sec=seconds;
timeout.tv_usec=1;
select(0, NULL, NULL, NULL, &timeout);
}

static void acquire_lock(char *filename)
{
int i;
struct flock fl;

lock_file=open(filename, O_CREAT | O_RDWR, 0666);
if(lock_file<0) {
	fprintf(stderr, "Could not open file \"%s\" for locking\n", filename);
	return;
	}
fl.l_type=F_WRLCK;
fl.l_whence=SEEK_SET;
fl.l_start=0;
fl.l_len=1;
errno=0;
i=0;
fprintf(stderr, "lock structure address: %p\n", &fl);
while(fcntl(lock_file, F_SETLK, &fl)<0){
	if(i>1) {
		fprintf(stderr, "Waiting for lock: %s\n", strerror(errno));
		i=0;
		}
	condor_safe_sleep(2);
	i++;
	}
}

static void release_lock(void)
{
if(lock_file>=0)close(lock_file);
lock_file=-1;
}

int main(int argc, char * argv[])
{
if(argc<2) {
	fprintf(stderr, "Usage:\n\ttest_lock filename\n");
	exit(-1);
	}
fprintf(stderr, "Acquiring lock for %s\n", argv[1]);
acquire_lock(argv[1]);
fprintf(stderr, "Acquired, waiting for 10 sec\n");
condor_safe_sleep(20);
fprintf(stderr, "Releasing lock\n");
release_lock();
fprintf(stderr, "done\n");
exit(0);
}

---1463809024-2116357048-1193922238=:20559--

===========================================================================
Date of creation: Thu Nov  1  8:11:59 2007 (1193922722)
Subject: Actions

Assigned to psilord by psilord
===========================================================================
Date of actions: Fri Nov  2 13:11:24 2007 (1194027084)
Date: Fri, 30 Nov 2007 14:35:20 -0600
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug

Hello,

> From: Vladimir Dergachev <volodya__AT__mindspring.com>
>     It appears that the condor version of fcntl only passes the low 32 bits 
> of the structure pointer. This has been tested to happen on condor 
> versions 6.8.2, 6.8.6 and 6.9.4 (with standard universe patch), on 
> machines using both Intel Core2 and AMD Opteron CPUs.

After investigation, I'd say this is a bug, but sadly, by design. Condor's
interposition codebase doesn't allow more than two arguments to fcntl(),
with arguments beyond 2 being undefined on the call stack. It looks pretty
difficult to fix at this time.
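
To make the third-argument issue concrete, here is a minimal sketch of a wrapper that forwards the optional argument with the right type for a handful of commands. It is an assumption about how such forwarding could look in general, not a description of Condor's interposition layer; `fcntl_forward` and `demo_forwarded_lock` are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper: forwards fcntl() with the correct third-argument
   type for a few commands. Lock commands take a pointer, which must be
   read as a pointer (not as an int) to keep all 64 bits. */
int fcntl_forward(int fd, int cmd, ...)
{
    va_list ap;
    int ret;

    va_start(ap, cmd);
    switch (cmd) {
    case F_GETFD:
    case F_GETFL:
        ret = fcntl(fd, cmd);                      /* no third argument */
        break;
    case F_SETFD:
    case F_SETFL:
    case F_DUPFD:
        ret = fcntl(fd, cmd, va_arg(ap, int));     /* integer argument */
        break;
    case F_GETLK:
    case F_SETLK:
    case F_SETLKW:
        ret = fcntl(fd, cmd, va_arg(ap, struct flock *)); /* pointer */
        break;
    default:
        errno = EINVAL;                            /* not handled here */
        ret = -1;
        break;
    }
    va_end(ap);
    return ret;
}

/* Demo: take a one-byte write lock through the wrapper, then close the
   file (which drops the lock). Returns 0 on success, -1 on failure. */
int demo_forwarded_lock(const char *path)
{
    struct flock fl = {0};
    int fd = open(path, O_CREAT | O_RDWR, 0666);
    int ret;

    if (fd < 0)
        return -1;
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    ret = fcntl_forward(fd, F_SETLK, &fl);
    close(fd);
    return ret;
}
```

The design point is the `switch`: the wrapper has to know, per command, whether the extra argument is absent, an `int`, or a `struct flock *`, which is why a fixed two-argument signature cannot carry the lock structure.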

It looks like this kind of fcntl() is to lock a region of a file, but
if you actually intend to lock the entire file, then you can use flock()
instead to get advisory locks. So, try that and let me know how it goes.
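
For comparison, an fcntl() record lock can also cover an entire file, since a length of 0 means "through end of file". The following is a minimal sketch for a plain (non-Condor) build; `lock_whole_file`, `unlock_whole_file` and `demo_whole_file_lock` are hypothetical names, and under condor_compile the same call would still be subject to the argument problem described above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type (F_WRLCK, F_RDLCK
   or F_UNLCK) to the whole file without blocking. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;          /* 0 means "from l_start through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 0 if the exclusive lock was obtained, -1 otherwise. */
int lock_whole_file(int fd)
{
    return set_whole_file_lock(fd, F_WRLCK);
}

/* Returns 0 if the lock was released, -1 otherwise. */
int unlock_whole_file(int fd)
{
    return set_whole_file_lock(fd, F_UNLCK);
}

/* Demo: lock and unlock a scratch file. Returns 0 on success. */
int demo_whole_file_lock(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0666);
    int ret;

    if (fd < 0)
        return -1;
    ret = lock_whole_file(fd);
    if (ret == 0)
        ret = unlock_whole_file(fd);
    close(fd);
    return ret;
}
```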

Thank you.

-pete

===========================================================================
Date mail was appended: Fri Nov 30 14:35:29 2007 (1196454929)
X-Orbl: [68.79.89.115]
Date: Fri, 30 Nov 2007 18:56:15 -0500 (EST)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu,
 parmor__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Mailfromd-RBL: IP Address 207.115.20.71 is listed on db.wpbl.info
X-Mailfromd: Total of 1 RBL listing (15 mins)
X-Mailfromd-Greylist-Time: May have had a total greylist delay of 15 minutes
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu



On Fri, 30 Nov 2007, condor-admin response tracking system wrote:
>
> After investigation, I'd say this is a bug, but sadly, by design. Condor's
> interposition codebase doesn't allow more than two arguments to fcntl(),
> with arguments beyond 2 being undefined on the call stack. It looks pretty
> difficult to fix at this time.
>
> It looks like this kind of fcntl() is to lock a region of a file, but
> if you actually intend to lock the entire file, then you can use flock()
> instead to get advisory locks. So, try that and let me know how it goes.

Actually I picked fcntl over flock as the latter does not work over NFS 
(the purpose of locks is to serialize disk access).

Well, so far, everything appears to work when upper address bits are 
zeroed out, so I'll keep using that.

I don't suppose there is a way to have a function that bypasses condor fcntl 
and uses the regular one instead? I don't need locks preserved across 
evictions.

If you could point me to a piece of code that does something similar I can 
try to make a patch.

                        thank you !

                               Vladimir Dergachev



>
> Thank you.
>
> -pete
>
>
> ========================================
> MESSAGE INFORMATION
> ========================================
> * From: Peter Keller <psilord__AT__cs.wisc.edu>
> * Ticket Email List: volodya__AT__mindspring.com, anderson__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu,parmor__AT__gravity.phys.uwm.edu
>
> --
> ======================================================================
> This mail was sent from the RUST Mail System
> Please direct all replies to condor-admin__AT__cs.wisc.edu
> Please include the current subject line in your reply.
> ======================================================================
>

===========================================================================
Date mail was appended: Fri Nov 30 18:15:04 2007 (1196468108)
Date: Fri, 30 Nov 2007 16:24:18 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Vladimir Dergachev <volodya__AT__mindspring.com>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 skoranda__AT__gravity.phys.uwm.edu, parmor__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

On Fri, Nov 30, 2007 at 06:56:15PM -0500, Vladimir Dergachev wrote:
> 
> 
> On Fri, 30 Nov 2007, condor-admin response tracking system wrote:
> >
> >After investigation, I'd say this is a bug, but sadly, by design. Condor's
> >interposition codebase doesn't allow more than two arguments to fcntl(),
> >with arguments beyond 2 being undefined on the call stack. It looks pretty
> >difficult to fix at this time.
> >
> >It looks like this kind of fcntl() is to lock a region of a file, but
> >if you actually intend to lock the entire file, then you can use flock()
> >instead to get advisory locks. So, try that and let me know how it goes.
> 
> Actually I picked fcntl over flock as the latter does not work over NFS 
> (the purpose of locks is to serialize disk access).

flock() over NFS should work. What problem have you observed with this?

-- 
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

===========================================================================
Date mail was appended: Fri Nov 30 18:24:34 2007 (1196468674)
X-Orbl: [68.79.89.115]
Date: Sat, 1 Dec 2007 00:14:57 -0500 (EST)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 skoranda__AT__gravity.phys.uwm.edu, parmor__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Mailfromd-RBL: IP Address 207.115.20.187 is listed on db.wpbl.info
X-Mailfromd: Total of 1 RBL listing (15 mins)
X-Mailfromd-Greylist-Time: May have had a total greylist delay of 15 minutes
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

>>> instead to get advisory locks. So, try that and let me know how it goes.
>>
>> Actually I picked fcntl over flock as the latter does not work over NFS
>> (the purpose of locks is to serialize disk access).
>
> flock() over NFS should work. What problem have you observed with this?

Maybe I am missing something, but on my system man 2 flock says:

NOTES
        flock(2) does not lock files over NFS.  Use fcntl(2) instead: that does
        work over NFS, given a sufficiently  recent  version  of  Linux  and  a
        server which supports locking.

        Since  kernel  2.0, flock(2) is implemented as a system call in its own
        right rather than being emulated in the GNU C  library  as  a  call  to
        fcntl(2).   This  yields  true  BSD  semantics: there is no interaction
        between the types of lock placed by flock(2) and fcntl(2), and flock(2)
        does not detect deadlock.

                      best

                         Vladimir Dergachev

>
> --
> Stuart Anderson  anderson__AT__ligo.caltech.edu
> http://www.ligo.caltech.edu/~anderson
>

===========================================================================
Date mail was appended: Fri Nov 30 23:40:42 2007 (1196487643)
Date: Fri, 30 Nov 2007 23:03:36 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Vladimir Dergachev <volodya__AT__mindspring.com>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 skoranda__AT__gravity.phys.uwm.edu, parmor__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Vladimir,
	My mistake. I recently solved another Condor race condition on a
shared filesystem with flock(), but that was a QFS filesystem not NFS.

On Sat, Dec 01, 2007 at 12:14:57AM -0500, Vladimir Dergachev wrote:
> >
> >flock() over NFS should work. What problem have you observed with this?
> 
> Maybe I am missing something, but on my system man 2 flock says:
> 
> NOTES
>        flock(2) does not lock files over NFS.  Use fcntl(2) instead: that
>        does work over NFS, given a sufficiently recent version of Linux
>        and a server which supports locking.

-- 
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

===========================================================================
Date mail was appended: Sat Dec  1  1:03:55 2007 (1196492637)
Date: Tue, 4 Dec 2007 14:22:56 -0600
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug

Hello,

If your standard universe job is doing remote i/o, then the flock will
be performed on the shadow side with the various shadows "seeing" the same lock
file (if it is in a shared location like /tmp).

If you are using NFS for your remote i/o, then this snippet of code should do
the right thing for you:

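#include <unistd.h>       /* syscall() */
#include <sys/syscall.h>  /* SYS_fcntl */

/* Provided by the Condor standard universe libraries; the prototype
   written here is assumed. */
extern int _condor_get_unmapped_fd(int fd);

int flock_condor_local(int fd, int cmd, void *arg)
{
    int ret = -1;
    int unmapped_fd;

    /* This call would normally go through the file table. However, we want
        to do it locally here, so get the unmapped fd the kernel originally
        gave us for this fd and use that directly. */
    unmapped_fd = _condor_get_unmapped_fd(fd);
    ret = syscall(SYS_fcntl, unmapped_fd, cmd, arg);

    return ret;
}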

This can only be compiled when linking with condor_compile. I would only
use it for your file locking and DO NOT use it for things that return
an fd e.g., F_DUPFD, as it will get that very wrong since this code is
working in a layer below the file descriptor translation table in the
condor libraries.
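
A possible way to use such a helper for the lock/unlock cycle is sketched below. So that it builds without condor_compile, `_condor_get_unmapped_fd()` is replaced by a hypothetical stub (`unmapped_fd_stub`) that returns the descriptor unchanged; the other function names are likewise illustrative, and a 64-bit Linux target is assumed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: stand-in for _condor_get_unmapped_fd() so this sketch
   builds outside condor_compile. Without Condor there is no descriptor
   translation table, so the fd is returned unchanged. */
static int unmapped_fd_stub(int fd)
{
    return fd;
}

/* Same shape as the helper in the message above, using the stub. */
static int fcntl_local(int fd, int cmd, void *arg)
{
    return (int)syscall(SYS_fcntl, unmapped_fd_stub(fd), cmd, arg);
}

/* Open `path` and retry a one-byte write lock up to `max_tries` times,
   sleeping two seconds between attempts. Returns the locked descriptor,
   or -1 on failure. */
int acquire_local_lock(const char *path, int max_tries)
{
    struct flock fl = {0};
    int fd = open(path, O_CREAT | O_RDWR, 0666);
    int i;

    if (fd < 0)
        return -1;
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    for (i = 0; i < max_tries; i++) {
        if (fcntl_local(fd, F_SETLK, &fl) == 0)
            return fd;
        sleep(2);
    }
    close(fd);
    return -1;
}

/* Closing the descriptor releases the lock; only the locking command
   itself bypasses the normal fcntl() path. */
int release_local_lock(int fd)
{
    return close(fd);
}

/* Demo: acquire and release the lock once. Returns 0 on success. */
int demo_local_lock(const char *path)
{
    int fd = acquire_local_lock(path, 3);

    if (fd < 0)
        return -1;
    return release_local_lock(fd);
}
```

Only F_SETLK is routed through the raw system call here, in line with the warning above about commands such as F_DUPFD that return a descriptor.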

Thank you.

-pete

===========================================================================
Date mail was appended: Tue Dec  4 14:23:04 2007 (1196799786)
X-Orbl: [68.79.89.115]
Date: Tue, 4 Dec 2007 17:31:27 -0500 (EST)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu,
 parmor__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Mailfromd-RBL: IP Address 207.115.20.187 is listed on db.wpbl.info
X-Mailfromd: Total of 1 RBL listing (15 mins)
X-Mailfromd-Greylist-Time: May have had a total greylist delay of 15 minutes
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu


Wonderful - thank you very much !

                Vladimir Dergachev


===========================================================================
Date mail was appended: Tue Dec  4 17:00:28 2007 (1196809229)
Date: Fri, 14 Dec 2007 13:35:15 -0600
From: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
To: Vladimir Dergachev <volodya__AT__mindspring.com>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 anderson__AT__ligo.caltech.edu, parmor__AT__gravity.phys.uwm.edu,
 condorligo__AT__aei.mpg.de
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

Hi Vladimir,

Can you please respond to this note when you have tried this
and let us know if it works?

Thanks,

Scott


===========================================================================
Date mail was appended: Fri Dec 14 13:35:38 2007 (1197660939)
X-Orbl: [68.79.89.115]
Date: Sat, 15 Dec 2007 00:59:37 -0500 (EST)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 anderson__AT__ligo.caltech.edu, parmor__AT__gravity.phys.uwm.edu,
 condorligo__AT__aei.mpg.de
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu



On Fri, 14 Dec 2007, Scott Koranda wrote:

> Hi Vladimir,
>
> Can you please respond to this note when you have tried this
> and let us know if it works?

Absolutely. I got distracted by the meeting and have not made a try yet - 
sorry !

I'll make a test run this weekend.

                         best

                           Vladimir Dergachev


===========================================================================
Date mail was appended: Sat Dec 15  0:00:29 2007 (1197698434)
X-Orbl: [68.79.89.115]
Date: Mon, 17 Dec 2007 00:38:04 -0500 (EST)
From: Vladimir Dergachev <volodya__AT__mindspring.com>
X-X-Sender: volodya__AT__node12.an-vo.com
Reply-To: Vladimir Dergachev <volodya__AT__mindspring.com>
To: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
CC: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>,
 anderson__AT__ligo.caltech.edu, parmor__AT__gravity.phys.uwm.edu,
 condorligo__AT__aei.mpg.de
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu



Hi Scott, Pete:

    The function you suggested works perfectly - judging from the 
performance of a test run on the CIT cluster.

                thank you very much !

                     Vladimir Dergachev


===========================================================================
Date mail was appended: Sun Dec 16 23:38:34 2007 (1197869917)
Date: Sun, 16 Dec 2007 21:44:48 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Vladimir Dergachev <volodya__AT__mindspring.com>
CC: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>,         condor-admin
 response tracking system <condor-admin__AT__cs.wisc.edu>,
 parmor__AT__gravity.phys.uwm.edu, condorligo__AT__aei.mpg.de
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu

Peter,
	One quick question before you close this ticket out. How is this
affected, if at all, by the major new glibc port you are doing for
RHEL/CentOS 5, both the workaround and the original attempt to use
3-argument calls to fcntl()?

Thanks.



-- 
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

===========================================================================
Date mail was appended: Sun Dec 16 23:45:24 2007 (1197870327)
Subject: Actions

Ticket resolved by tannenba
===========================================================================
Date of actions: Fri Feb 15 14:02:13 2008 (1203105733)
Subject: Actions

Ticket was reopened by mailnull
===========================================================================
Date of actions: Wed Feb 27 10:34:18 2008 (1204130060)
Date: Wed, 27 Feb 2008 10:34:09 -0600
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #17159] LIGO: fcntl 64bit bug

On Sun, Dec 16, 2007 at 11:45:24PM -0600, condor-admin response tracking system wrote:
> 	One quick question before you close this ticket out. How is this
> affected, if at all, by the major new glibc port you are doing for
> RHEL/CentOS 5, both the workaround and the original attempt to use
> 3-argument calls to fcntl()?

We answered this over a phone call, and the answer is it will not be worked
around.

I'll resolve this ticket now.

Thank you.

-pete

===========================================================================
Date mail was appended: Wed Feb 27 10:34:18 2008 (1204130060)
Subject: Actions

Ticket resolved by tannenba
===========================================================================
Date of actions: Fri Feb 29 13:42:59 2008 (1204314179)