LIGO Support Ticket 17465

Ticket Information
  Number:      admin 17465
  User:        anderson@ligo.caltech.edu
  Email:       skoranda__AT__gravity.phys.uwm.edu
  Status:      bug
  Assigned To: psilord
Date: Sun, 10 Feb 2008 21:56:07 -0800
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
Subject: condor_master core dump
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

# condor_version
$CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
$CondorPlatform: I386-LINUX_RHEL3 $

This version is prone to the condor_master dumping core on startup if you
make a mistake in condor_config and point CONDOR_HOST to an invalid hostname.

The resulting core file does not appear to be that useful, but if you
want it please let me know.

# ls -l core
-rw-r--r--  1 root root 962560 Feb 10 21:41 core

# file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'condor_master'

# ident core
core:
     $CondorPlatform: I386-LINUX_RHEL3 $
     $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
     $CondorPlatform: I386-LINUX_RHEL3 $

# gdb /usr/sbin/condor_master core
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `/usr/sbin/condor_master'.

warning: svr4_current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/libm.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_dns.so.2

#0  0x0810c426 in WriteCoreDump ()
(gdb) where
#0  0x0810c426 in WriteCoreDump ()
#1  0x080fb552 in linux_sig_coredump ()
#2  0x00cae420 in ?? ()
#3  0x0000000b in ?? ()
#4  0x00000033 in ?? ()
#5  0x00000000 in ?? ()

This occurred on an FC4 machine:

# cat /etc/redhat-release 
Fedora Core release 4 (Stentz)

# uname -a
Linux ldas-gridmon 2.6.17-1.2143_FC4smp #1 SMP Sat Jul 15 16:21:47 EDT 2006 i686 i686 i386 GNU/Linux


Here is the MasterLog:

2/10 21:41:30 ******************************************************
2/10 21:41:30 ** condor_master (CONDOR_MASTER) STARTING UP
2/10 21:41:30 ** /usr/sbin/condor_master
2/10 21:41:30 ** $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
2/10 21:41:30 ** $CondorPlatform: I386-LINUX_RHEL3 $
2/10 21:41:30 ** PID = 3941
2/10 21:41:30 ** Log last touched time unavailable (No such file or directory)
2/10 21:41:30 ******************************************************
2/10 21:41:30 Using config source: /usr1/condor/condor_config
2/10 21:41:30 Using local config sources: 
2/10 21:41:30    /usr1/condor/condor_config.local
2/10 21:41:30 DaemonCore: Command Socket at <10.12.0.13:58917>
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 IPVERIFY: unable to resolve IP address of ldas-condori
2/10 21:41:30 Log file not found in config file: VIEW_SERVER_LOG
2/10 21:41:30 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 3942
2/10 21:41:31 Calling HandleReq <HandleChildAliveCommand> (0)
2/10 21:41:31 Return from HandleReq <HandleChildAliveCommand>
2/10 21:41:35 Failed to start non-blocking update to unknown.


Presumably the case of gethostbyname(CONDOR_HOST) failing with HOST_NOT_FOUND
is not properly handled.
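
To illustrate the kind of check I mean, here is a minimal Python sketch. It is
not Condor's code (Condor is written in C++, and I have not looked at the
relevant source); the function name resolve_or_none is hypothetical. The idea
is only that a failed lookup gets turned into a value the caller can test,
rather than letting the failure propagate.

```python
import socket
from typing import Optional


def resolve_or_none(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if the lookup fails.

    Hypothetical helper for illustration; not part of Condor.
    """
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError,
        # so an unresolvable name ends up here instead of crashing.
        return None


if __name__ == "__main__":
    # "no-such-host.invalid" uses the reserved .invalid TLD, which should
    # never resolve; it stands in for a mistyped CONDOR_HOST.
    for name in ("127.0.0.1", "no-such-host.invalid"):
        addr = resolve_or_none(name)
        if addr is None:
            print(f"cannot resolve {name}; refusing to send updates")
        else:
            print(f"{name} resolves to {addr}")
```

With a check along these lines, a bad CONDOR_HOST would presumably produce a
clear error in the MasterLog instead of a core file.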

Thanks.

-- 
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

===========================================================================
Date of creation: Sun Feb 10 23:56:24 2008 (1202709387)
Subject: Actions

Assigned to tannenba by tannenba
===========================================================================
Date of actions: Wed Feb 13 10:21:13 2008 (1202920233)
Subject: Actions

Status changed from new to bug by tannenba
===========================================================================
Date of actions: Wed Feb 13 10:21:13 2008 (1202920242)
Subject: Actions

Assigned to psilord by pfc
===========================================================================
Date of actions: Thu Feb 21 10:32:01 2008 (1203612093)
Date: Sat, 19 Jul 2008 17:31:58 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #17465] condor_master core dump
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Was this fixed? If not, please prepend the string LIGO to the subject so it
shows up at http://www.cs.wisc.edu/condor/ligo-tickets/index-all.html

Thanks.

-- 
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson

===========================================================================
Date mail was appended: Sat Jul 19 19:32:12 2008 (1216513932)
Subject: Actions

Subject changed from condor_master core dump to LIGO: by psilord
===========================================================================
Date of actions: Mon Aug  4 16:58:34 2008 (1217887114)
Subject: Actions

Subject changed from LIGO: to LIGO by psilord
===========================================================================
Date of actions: Mon Aug  4 16:59:40 2008 (1217887180)
Subject: Actions

Subject changed from LIGO to LIGO\ by psilord
===========================================================================
Date of actions: Mon Aug  4 16:59:58 2008 (1217887198)
Subject: Actions

Subject changed from LIGO\ to LIGO: by psilord
===========================================================================
Date of actions: Mon Aug  4 17:00:20 2008 (1217887220)
Date: Mon, 4 Aug 2008 17:01:43 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #17465] LIGO: condor_master core dump

On Sat, Jul 19, 2008 at 07:32:12PM -0500, condor-admin response tracking system wrote:
> Was this fixed? If not, please prepend the string LIGO to the subject so it
> shows up at http://www.cs.wisc.edu/condor/ligo-tickets/index-all.html

Not fixed, subject changed.

-pete

===========================================================================
Date mail was appended: Mon Aug  4 17:01:45 2008 (1217887306)