LIGO Support Ticket 7591
Ticket Information
Number: support 7591
User: dabrown@physics.syr.edu
Email:
Status: resolved
Assigned To: wenger
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: LIGO: problem with startd starting backfill jobs
Date: Tue, 5 Aug 2008 11:45:40 -0400
X-Scanner: InterScan AntiVirus for Sendmail
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
Hi Todd,
I am seeing the following error in my Startd logs when they try and
fire up boinc backfill jobs:
8/5 10:56:30 ******************************************************
8/5 10:56:30 ** condor_starter (CONDOR_STARTER) STARTING UP
8/5 10:56:30 ** /usr/sbin/condor_starter
8/5 10:56:30 ** $CondorVersion: 7.0.2 Jun 9 2008 BuildID: 89891 $
8/5 10:56:30 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
8/5 10:56:30 ** PID = 4039
8/5 10:56:30 ** Log last touched 8/4 23:32:11
8/5 10:56:30 ******************************************************
8/5 10:56:30 Using config source: /usr1/condor/condor_config
8/5 10:56:30 Using local config sources:
8/5 10:56:30 /usr1/condor/condor_config.local
8/5 10:56:30 DaemonCore: Command Socket at <10.20.2.55:55356>
8/5 10:56:30 Done setting resource limits
8/5 10:56:31 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
8/5 10:56:31 IO: Failed to read packet header
8/5 10:56:31 ERROR "Assertion ERROR on (result)" at line 207 in file NTsenders.C
8/5 10:56:31 ERROR "LocalUserLog::logStarterError() called before init()" at line 223 in file local_user_log.C
Any idea what is happening? This occurred both before and after
upgrading the schedd to 7.0.4.
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
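The StarterLog excerpt above can be triaged with a short script. This is a minimal Python sketch under a few assumptions. It expects one log record per line, although the e-mail copy wraps some of them. It decodes the errno value through the local platform's tables. On Linux, 104 is ECONNRESET ("Connection reset by peer"), which matches the X86_64-LINUX_RHEL5 platform in the banner, but errno numbers differ on other systems. It also collects the ERROR lines and compares how many times the starter started with how many assertion failures it logged. A reset suggests that the process on the other end of the socket dropped the connection, but the log does not say which daemon that was. The function name `summarize_starter_log` and the marker strings are illustrative assumptions, not part of Condor.

```python
import errno
import os
import re

# Matches the "errno = N" fragment that condor_read() logs after a failed recv().
ERRNO_RE = re.compile(r"errno = (\d+)")
# Marker strings copied from the log excerpt; assumed stable across starts.
STARTUP_MARKER = "condor_starter (CONDOR_STARTER) STARTING UP"
ASSERT_MARKER = 'ERROR "Assertion ERROR'

SAMPLE = """\
8/5 10:56:30 ** condor_starter (CONDOR_STARTER) STARTING UP
8/5 10:56:31 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
8/5 10:56:31 IO: Failed to read packet header
8/5 10:56:31 ERROR "Assertion ERROR on (result)" at line 207 in file NTsenders.C
8/5 10:56:31 ERROR "LocalUserLog::logStarterError() called before init()" at line 223 in file local_user_log.C
"""


def summarize_starter_log(text):
    """Summarize a StarterLog excerpt (one log record per line assumed)."""
    errnos = []
    for match in ERRNO_RE.finditer(text):
        code = int(match.group(1))
        # Names and messages come from the *local* platform's errno tables.
        name = errno.errorcode.get(code, "unknown")
        errnos.append((code, name, os.strerror(code)))
    error_lines = [line for line in text.splitlines() if ' ERROR "' in line]
    return {
        "errnos": errnos,
        "error_lines": error_lines,
        "startups": text.count(STARTUP_MARKER),
        "assertion_failures": text.count(ASSERT_MARKER),
    }


if __name__ == "__main__":
    summary = summarize_starter_log(SAMPLE)
    for code, name, message in summary["errnos"]:
        print(f"errno {code}: {name} ({message})")
    print(f"starts: {summary['startups']}, "
          f"assertion failures: {summary['assertion_failures']}")
    for line in summary["error_lines"]:
        print(line)
```

Run against a full StarterLog, equal counts of startups and assertion failures would hint that the failure happens on every backfill start. This is only a rough signal, because a starter can also exit for unrelated reasons.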
===========================================================================
Date of creation: Tue Aug 5 10:45:44 2008 (1217951146)
Subject: Actions
Assigned to wenger by wenger
===========================================================================
Date of actions: Tue Aug 5 11:06:12 2008 (1217952372)
Date: Tue, 5 Aug 2008 11:15:20 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-support__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-support #7591] LIGO: problem with startd starting backfill jobs
Duncan,
I'm grabbing this because I'm on RUST duty this week, and Todd is on
"grant-writing vacation" right now.
Anyhow, I'll take a look at it. Does this relate to some previous RUST
ticket that Todd handled or something like that?
I don't have any great ideas about this offhand -- I'll have to take a
look at the code where the error is coming from.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Tue Aug 5 11:19:40 2008 (1217953180)
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #7591] LIGO: problem with startd starting backfill jobs
Date: Tue, 5 Aug 2008 12:22:48 -0400
To: condor-support__AT__cs.wisc.edu
Hi Kent,
Nope, this is a new error. Thanks for looking into it.
Cheers,
Duncan.
===========================================================================
Date mail was appended: Tue Aug 5 11:22:52 2008 (1217953373)
Date: Tue, 5 Aug 2008 13:08:01 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-support__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-support #7591] LIGO: problem with startd starting backfill jobs
Duncan,
> I am seeing the following error in my Startd logs when they try and
> fire up boinc backfill jobs:
One other thing -- could you send the job classad of one of the boinc
jobs that's causing this error to show up?
Thanks...
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Tue Aug 5 13:12:56 2008 (1217959976)
From: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-support #7591] LIGO: problem with startd starting backfill jobs
Date: Tue, 5 Aug 2008 16:04:21 -0400
To: condor-support__AT__cs.wisc.edu
Hi Kent,
How do I get the classad of a backfill job?
Cheers,
Duncan.
===========================================================================
Date mail was appended: Tue Aug 5 15:04:57 2008 (1217966698)
Date: Wed, 6 Aug 2008 13:54:30 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-support #7591] LIGO: problem with startd starting backfill jobs
Duncan,
> How do I get the classad of a backfill job?
Ah, yeah, I wasn't thinking too well when I asked that.
Taking another tack, a few more things:
* What previous version of Condor were you running that exhibited this
problem?
* Do you know if this happens *every* time a backfill job is started?
* Can you send us your configuration file that has the backfill-related
configuration?
* Can you send a full StarterLog and StartLog that include the error?
Kent Wenger
Condor Team
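Kent's request for the backfill-related configuration can be answered with a small extraction script. This is a minimal sketch that makes several assumptions. It reads the two config sources named in the StarterLog header (/usr1/condor/condor_config and /usr1/condor/condor_config.local). It assumes the knob names ENABLE_BACKFILL, BACKFILL_SYSTEM, START_BACKFILL, EVICT_BACKFILL and the BOINC_ prefix from the Condor manual's backfill section. It handles plain `NAME = value` lines and backslash continuations but does not expand `$(MACRO)` references, so it prints raw values. `backfill_settings` is a hypothetical helper name.

```python
from pathlib import Path

# Knob names from the backfill/BOINC section of the Condor manual (assumed
# sufficient for triage; extend the tuple if a site uses others).
BACKFILL_PREFIXES = ("ENABLE_BACKFILL", "BACKFILL_SYSTEM", "START_BACKFILL",
                     "EVICT_BACKFILL", "BOINC_")

# The two config sources the StarterLog reported reading.
CONFIG_FILES = ("/usr1/condor/condor_config", "/usr1/condor/condor_config.local")


def backfill_settings(lines):
    """Yield (name, value) for backfill-related assignments in config lines.

    Joins lines that end in a backslash, skips comments and blank lines,
    and leaves $(MACRO) references unexpanded.
    """
    pending = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.endswith("\\"):
            pending += line[:-1]  # continuation: keep accumulating
            continue
        line = (pending + line).strip()
        pending = ""
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        # Knob names are treated as case-insensitive, so compare upper-cased.
        if name.upper().startswith(BACKFILL_PREFIXES):
            yield name, value


def main():
    for path in map(Path, CONFIG_FILES):
        if not path.is_file():
            print(f"# {path}: not found")
            continue
        print(f"# {path}")
        with path.open() as fh:
            for name, value in backfill_settings(fh):
                print(f"{name} = {value}")


if __name__ == "__main__":
    main()
```

Printing raw values rather than expanded ones keeps the output identical to what is actually written in the files, which is usually what a support request wants to see.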
===========================================================================
Date mail was appended: Wed Aug 6 13:58:02 2008 (1218049083)
Subject: Actions
Ticket resolved by wenger
===========================================================================
Date of actions: Fri Aug 29 13:53:31 2008 (1220036011)