LIGO Support Ticket 17465
Ticket Information
Number: admin 17465
User: anderson@ligo.caltech.edu
Email: skoranda__AT__gravity.phys.uwm.edu
Status: bug
Assigned To: psilord
Date: Sun, 10 Feb 2008 21:56:07 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
Subject: condor_master core dump
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
# condor_version
$CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
$CondorPlatform: I386-LINUX_RHEL3 $
is prone to condor_master dumping core on startup if you make a
mistake in condor_config and point CONDOR_HOST to an invalid hostname.
The resulting core file does not appear to be very useful, but if you
want it, please let me know.
# ls -l core
-rw-r--r-- 1 root root 962560 Feb 10 21:41 core
# file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'condor_master'
# ident core
core:
$CondorPlatform: I386-LINUX_RHEL3 $
$CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
$CondorPlatform: I386-LINUX_RHEL3 $
# gdb /usr/sbin/condor_master core
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/usr/sbin/condor_master'.
warning: svr4_current_sos: Can't read pathname for load map: Input/output error
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_dns.so.2
#0 0x0810c426 in WriteCoreDump ()
(gdb) where
#0 0x0810c426 in WriteCoreDump ()
#1 0x080fb552 in linux_sig_coredump ()
#2 0x00cae420 in ?? ()
#3 0x0000000b in ?? ()
#4 0x00000033 in ?? ()
#5 0x00000000 in ?? ()
This occurred on an FC4 machine:
# cat /etc/redhat-release
Fedora Core release 4 (Stentz)
# uname -a
Linux ldas-gridmon 2.6.17-1.2143_FC4smp #1 SMP Sat Jul 15 16:21:47 EDT 2006 i686 i686 i386 GNU/Linux
Here is the MasterLog:
2/10 21:41:30 ******************************************************
2/10 21:41:30 ** condor_master (CONDOR_MASTER) STARTING UP
2/10 21:41:30 ** /usr/sbin/condor_master
2/10 21:41:30 ** $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
2/10 21:41:30 ** $CondorPlatform: I386-LINUX_RHEL3 $
2/10 21:41:30 ** PID = 3941
2/10 21:41:30 ** Log last touched time unavailable (No such file or directory)
2/10 21:41:30 ******************************************************
2/10 21:41:30 Using config source: /usr1/condor/condor_config
2/10 21:41:30 Using local config sources:
2/10 21:41:30 /usr1/condor/condor_config.local
2/10 21:41:30 DaemonCore: Command Socket at <10.12.0.13:58917>
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 Log file not found in config file: VIEW_SERVER_LOG
2/10 21:41:30 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 3942
2/10 21:41:31 Calling HandleReq <HandleChildAliveCommand> (0)
2/10 21:41:31 Return from HandleReq <HandleChildAliveCommand>
2/10 21:41:35 Failed to start non-blocking update to unknown.
Presumably the case of gethostbyname(CONDOR_HOST) returning HOST_NOT_FOUND
is not properly caught.
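
To illustrate the kind of check I suspect is missing, here is a minimal
sketch in C. It is not taken from the Condor source; resolve_ipv4() is a
hypothetical helper name, and I am only assuming that the failing code path
uses the return value of gethostbyname() without testing it for NULL.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not a Condor function): resolve `name` to an IPv4
 * address. Returns 0 on success and -1 on any failure, so the caller
 * never dereferences a NULL hostent pointer. */
int resolve_ipv4(const char *name, struct in_addr *out)
{
    struct hostent *he;

    if (name == NULL || out == NULL || name[0] == '\0') {
        return -1;
    }

    he = gethostbyname(name);
    if (he == NULL) {
        /* gethostbyname() reports failures through h_errno, not errno. */
        if (h_errno == HOST_NOT_FOUND) {
            fprintf(stderr, "host not found: %s\n", name);
        } else {
            fprintf(stderr, "lookup of %s failed (h_errno=%d)\n",
                    name, h_errno);
        }
        return -1;
    }

    /* Only accept a well-formed IPv4 answer. */
    if (he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof(*out) ||
        he->h_addr_list[0] == NULL) {
        return -1;
    }

    memcpy(out, he->h_addr_list[0], sizeof(*out));
    return 0;
}
```

A caller would pass the configured CONDOR_HOST value as `name` and, on a
-1 return, log the problem and exit or retry instead of continuing with an
unresolved address.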
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Sun Feb 10 23:56:24 2008 (1202709387)
Subject: Actions
Assigned to tannenba by tannenba
===========================================================================
Date of actions: Wed Feb 13 10:21:13 2008 (1202920233)
Subject: Actions
Status changed from new to bug by tannenba
===========================================================================
Date of actions: Wed Feb 13 10:21:13 2008 (1202920242)
Subject: Actions
Assigned to psilord by pfc
===========================================================================
Date of actions: Thu Feb 21 10:32:01 2008 (1203612093)
Date: Sat, 19 Jul 2008 17:31:58 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #17465] condor_master core dump
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
Was this fixed? If not, please prepend the string LIGO to the subject so it
shows up at http://www.cs.wisc.edu/condor/ligo-tickets/index-all.html
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Sat Jul 19 19:32:12 2008 (1216513932)
Subject: Actions
Subject changed from condor_master core dump to LIGO: by psilord
===========================================================================
Date of actions: Mon Aug 4 16:58:34 2008 (1217887114)
Subject: Actions
Subject changed from LIGO: to LIGO by psilord
===========================================================================
Date of actions: Mon Aug 4 16:59:40 2008 (1217887180)
Subject: Actions
Subject changed from LIGO to LIGO\ by psilord
===========================================================================
Date of actions: Mon Aug 4 16:59:58 2008 (1217887198)
Subject: Actions
Subject changed from LIGO\ to LIGO: by psilord
===========================================================================
Date of actions: Mon Aug 4 17:00:20 2008 (1217887220)
Date: Mon, 4 Aug 2008 17:01:43 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #17465] LIGO: condor_master core dump
On Sat, Jul 19, 2008 at 07:32:12PM -0500, condor-admin response tracking system wrote:
> Was this fixed? If not, please prepend the string LIGO to the subject so it
> shows up at http://www.cs.wisc.edu/condor/ligo-tickets/index-all.html
Not fixed, subject changed.
-pete
===========================================================================
Date mail was appended: Mon Aug 4 17:01:45 2008 (1217887306)