LIGO Support Ticket 18994

Ticket Information
  Number:      admin 18994
  User:        anderson@ligo.caltech.edu
  Email:       
  Status:      resolved
  Assigned To: nleroy
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: LIGO: condor_startd core dump
Date: Mon, 9 Feb 2009 09:27:26 -0800
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

The condor_startd from this version,

[root@node154 log]# condor_version
$CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $
$CondorPlatform: X86_64-LINUX_RHEL3 $

generated a core dump file on an FC4 x86_64 machine when the LOG
filesystem ran out of free space. Since Condor was going to shut down
with exit status 44 anyway, this is not a high-priority bug, but here
are the log file entry and the stack trace from the core file.

Thanks.


[root@node154 log]# tail StartLog
2/9 07:20:28 Received TCP command 404 (DEACTIVATE_CLAIM_FORCIBLY) from <10.14.0.18:34143>, access level DAEMON
2/9 07:20:28 Calling HandleReq <command_handler> (0)
2/9 07:20:28 slot4: Called deactivate_claim_forcibly()
2/9 07:20:28 Return from HandleReq <command_handler> (handler: 0.000s, sec: 0.000s)
2/9 07:20:28 Return from Handler <DaemonCore::HandleReqSocketHandler>
2/9 07:20:28 Received UDP command 443 (RELEASE_CLAIM) from <10.14.0.18:35175>, access level DAEMON
2/9 07:20:28 Calling HandleReq <command_release_claim> (0)
2/9 07:20:28 slot4: State change: received RELEASE_CLAIM command
2/9 07:20:28 slot4: Changing state and activity: Claimed/Idle -> Preempting/Vacating
2/9 07:20:2[root@node154 log]#
[root@node154 log]#
[root@node154 log]# gdb  /usr/sbin/condor_startd core
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `condor_startd -f'.
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /usr/lib64/libstdc++.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.5
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2

#0  0x0000000000520f20 in WriteCoreDump ()
(gdb) where
#0  0x0000000000520f20 in WriteCoreDump ()
#1  0x000000000051933f in linux_sig_coredump ()
#2  <signal handler called>
#3  0x00000000004be450 in Claim::vacate ()
#4  0x00000000004d3def in Resource::leave_preempting_state ()
#5  0x00000000004d0569 in ResState::enter_action ()
#6  0x00000000004cf66f in ResState::change ()
#7  0x00000000004d7365 in Resource::change_state ()
#8  0x00000000004d2f0d in Resource::kill_claim ()
#9  0x00000000004cb5bb in ResMgr::walk ()
#10 0x00000000004e20e8 in do_cleanup ()
#11 0x0000000000525076 in _condor_dprintf_exit ()
#12 0x0000000000524764 in debug_unlock ()
#13 0x0000000000524173 in _condor_dprintf_va ()
#14 0x00000000004d60da in Resource::dprintf_va ()
#15 0x00000000004d61f5 in Resource::dprintf ()
#16 0x00000000004d3faa in Resource::leave_preempting_state ()
#17 0x00000000004d05ec in ResState::enter_action ()
#18 0x00000000004cf66f in ResState::change ()
#19 0x00000000004d7365 in Resource::change_state ()
#20 0x00000000004d2e4f in Resource::release_claim ()
#21 0x00000000004db1bb in command_release_claim ()
#22 0x000000000050ceed in DaemonCore::HandleReq ()
#23 0x0000000000509812 in DaemonCore::HandleReq ()
#24 0x00000000005092bb in DaemonCore::CallSocketHandler ()
#25 0x0000000000508ef4 in DaemonCore::Driver ()
#26 0x000000000051bcb6 in main ()
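
Read bottom-up, the trace appears to show re-entrancy: Resource::leave_preempting_state (frame #16) calls dprintf, the log write fails, _condor_dprintf_exit runs do_cleanup, and the cleanup path re-enters Resource::leave_preempting_state (frame #4) before Claim::vacate takes the signal. The following is a minimal sketch, not part of the original report, for spotting functions that occur more than once in gdb "where" output. The helper name repeated_frames is hypothetical, and a repeated name is only a hint: overloads such as the two DaemonCore::HandleReq frames also show up.

```python
import re
from collections import defaultdict

# Matches gdb "where" lines such as:
#   #16 0x00000000004d3faa in Resource::leave_preempting_state ()
# Lines like "#2  <signal handler called>" do not match and are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(")


def repeated_frames(backtrace):
    """Map each function seen more than once to its frame numbers."""
    frames = defaultdict(list)
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames[match.group(2)].append(int(match.group(1)))
    return {name: nums for name, nums in frames.items() if len(nums) > 1}


# Frames copied verbatim from the "where" output above.
TRACE = """\
#0  0x0000000000520f20 in WriteCoreDump ()
#1  0x000000000051933f in linux_sig_coredump ()
#2  <signal handler called>
#3  0x00000000004be450 in Claim::vacate ()
#4  0x00000000004d3def in Resource::leave_preempting_state ()
#5  0x00000000004d0569 in ResState::enter_action ()
#6  0x00000000004cf66f in ResState::change ()
#7  0x00000000004d7365 in Resource::change_state ()
#8  0x00000000004d2f0d in Resource::kill_claim ()
#9  0x00000000004cb5bb in ResMgr::walk ()
#10 0x00000000004e20e8 in do_cleanup ()
#11 0x0000000000525076 in _condor_dprintf_exit ()
#12 0x0000000000524764 in debug_unlock ()
#13 0x0000000000524173 in _condor_dprintf_va ()
#14 0x00000000004d60da in Resource::dprintf_va ()
#15 0x00000000004d61f5 in Resource::dprintf ()
#16 0x00000000004d3faa in Resource::leave_preempting_state ()
#17 0x00000000004d05ec in ResState::enter_action ()
#18 0x00000000004cf66f in ResState::change ()
#19 0x00000000004d7365 in Resource::change_state ()
#20 0x00000000004d2e4f in Resource::release_claim ()
#21 0x00000000004db1bb in command_release_claim ()
#22 0x000000000050ceed in DaemonCore::HandleReq ()
#23 0x0000000000509812 in DaemonCore::HandleReq ()
#24 0x00000000005092bb in DaemonCore::CallSocketHandler ()
#25 0x0000000000508ef4 in DaemonCore::Driver ()
#26 0x000000000051bcb6 in main ()
"""

if __name__ == "__main__":
    for name, nums in repeated_frames(TRACE).items():
        print(name, nums)
```

The frames between the two leave_preempting_state entries (#5 through #15) are the ones worth reading to see how the second entry was reached.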


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date of creation: Mon Feb  9 11:27:34 2009 (1234200456)
Subject: Actions

Assigned to nleroy by nleroy
===========================================================================
Date of actions: Mon Feb  9  9:04:53 2009 (1234201496)
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18994] LIGO: condor_startd core dump
Date: Mon, 9 Feb 2009 09:45:56 -0800
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

The core file may be found at:
http://www.ligo.caltech.edu/~anderson/condor.18994

Thanks.

--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date mail was appended: Mon Feb  9 11:46:01 2009 (1234201561)
Subject: Comments added

Made note of this in gittrac #323.

Comments added by tannenba

===========================================================================
Date comments were added: Fri Mar 13 16:24:08 2009 (1236979449)
Subject: Actions

Ticket resolved by tannenba
===========================================================================
Date of actions: Fri Mar 13 16:24:30 2009 (1236979470)