LIGO Support Ticket 18999

Ticket Information
  Number:      admin 18999
  User:        dabrown@physics.syr.edu
  Email:       anderson__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu
  Status:      pending
  Assigned To: psilord
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>,
 Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Fwd: [Condor] Problem sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 08:53:58 -0500

--Apple-Mail-3-416284763

Hi,

My schedd just crashed with the error below. The only thing that I did  
that might have caused this is putting some parallel universe jobs on  
hold.

Cheers,
Duncan.

Begin forwarded message:

> From: Condor <condor__AT__sugar.phy.syr.edu>
> Date: February 10, 2009 8:12:42 AM EST
> To: dabrown__AT__physics.syr.edu
> Subject: [Condor] Problem sugar.phy.syr.edu: condor_schedd exited (1)
>
> This is an automated email from the Condor system
> on machine "sugar.phy.syr.edu".  Do not reply.
>
> "/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
> Condor will automatically restart this process in 10 seconds.
>
> *** Last 20 line(s) of file /usr1/condor/log/SchedLog:
> 2/10 08:12:08 (pid:5229) ZKM: setting default map to condor@sugar
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current user is 'condor'
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current domain is 'sugar'
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current FQU is 'condor@sugar'
> Stack dump for process 5229 at timestamp 1234271529 (15 frames)
> condor_schedd(dprintf_dump_stack+0xc0)[0x5ada4b]
> condor_schedd(_Z18linux_sig_coredumpi+0x16)[0x5a13ba]
> /lib64/libc.so.6[0x36aee301b0]
> condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
> condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
> condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
> condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53ab30]
> condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x535b59]
> condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x598827]
> condor_schedd(_ZN10DaemonCore9HandleReqEi+0x36)[0x598bb6]
> condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c7)[0x5991af]
> condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59a9fa]
> condor_schedd(main+0x18a2)[0x5a3610]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
> condor_schedd(__gxx_personality_v0+0x319)[0x4f7b19]
> *** End of file SchedLog
>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Questions about this message or Condor in general?
> Email address of the local Condor administrator: dabrown__AT__physics.syr.edu
> The Official Condor Homepage is http://www.cs.wisc.edu/condor
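The stack dump in the forwarded message follows the usual glibc backtrace layout of module(symbol+offset)[address]. As a minimal sketch (not part of Condor; the names parse_frame and parse_stack are hypothetical), the frames can be pulled out of a SchedLog excerpt like this, after which the mangled C++ symbols could be passed to a demangler such as c++filt:

```python
import re

# Matches frames such as "condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]".
# The "(symbol+offset)" part is optional, as in "/lib64/libc.so.6[0x36aee301b0]".
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[\s]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict for one stack frame, or None if the line is not a frame."""
    # Drop any leading email quote markers ("> ") before matching.
    text = line.lstrip("> ").strip()
    match = FRAME_RE.match(text)
    if match is None:
        return None
    frame = match.groupdict()
    frame["symbol"] = frame["symbol"] or None
    return frame


def parse_stack(lines):
    """Keep only the lines that look like stack frames."""
    return [f for f in (parse_frame(l) for l in lines) if f is not None]


sample = [
    "> 2/10 08:12:08 (pid:5229) ZKM: post-map: current FQU is 'condor@sugar'",
    "> /lib64/libc.so.6[0x36aee301b0]",
    "> condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]",
    "> condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]",
]

for frame in parse_stack(sample):
    print(frame["module"], frame["symbol"], frame["address"])
```

Only the frame lines survive the filter; ordinary log lines such as the ZKM entries are skipped.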

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-3-416284763


--Apple-Mail-3-416284763--

===========================================================================
Date of creation: Tue Feb 10  7:54:09 2009 (1234274051)
Subject: Actions

Assigned to wenger by wenger
===========================================================================
Date of actions: Tue Feb 10 10:42:08 2009 (1234284129)
Date: Tue, 10 Feb 2009 14:07:29 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)

Duncan,

> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.

Hmm, interesting.

Did it create a core file?  If so, getting that would help a lot.

Also, what version/platform is your schedd?

Kent Wenger
Condor Team

===========================================================================
Date mail was appended: Tue Feb 10 14:07:32 2009 (1234296453)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 17:36:31 -0500

--Apple-Mail-17-447637525

Hi Kent,

On Feb 10, 2009, at 3:07 PM, condor-admin response tracking system  
wrote:
>> My schedd just crashed with the error below. The only thing that I did
>> that might have caused this is putting some parallel universe jobs on
>> hold.
>
> Hmm, interesting.
>
> Did it create a core file?  If so, getting that would help a lot.

I should probably know this, but where does the schedd dump core?

> Also, what version/platform is your schedd?

[root@sugar log]# rpm -qa | grep condor
condor-7.2.0-1

It's the x86_64 RHEL5 shared RPM.

[root@sugar log]# uname -a
Linux sugar.phy.syr.edu 2.6.18-92.1.6.el5xen #1 SMP Wed Jun 25 14:13:10 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@sugar log]# rpm -qa | grep glibc
glibc-headers-2.5-24
compat-glibc-2.3.4-2.26
glibc-2.5-24
glibc-devel-2.5-24
glibc-2.5-18
compat-glibc-2.3.4-2.26
glibc-common-2.5-24
compat-glibc-headers-2.3.4-2.26
glibc-devel-2.5-24
glibc-2.5-24

Cheers,
Duncan.


-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-17-447637525


--Apple-Mail-17-447637525--

===========================================================================
Date mail was appended: Tue Feb 10 16:36:38 2009 (1234305398)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 14:44:10 -0800

It is probably the schedd's current working directory, which you can
find with:

[root@ldas-grid log]# lsof -p `pgrep condor_schedd` | grep cwd
condor_sc 27020 condor  cwd    DIR      8,21       4096   9124417 /usr1/condor/log
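
On Linux, the working directory that lsof reports can also be read from the /proc filesystem. The following is a minimal sketch under that assumption (the helper name process_cwd is hypothetical, and reading another user's process normally requires root):

```python
import os


def process_cwd(pid):
    """Return the current working directory of `pid`, or None if unreadable."""
    try:
        # /proc/<pid>/cwd is a symbolic link to the process's working directory.
        return os.readlink(f"/proc/{pid}/cwd")
    except OSError:
        # No /proc on this platform, no such process, or permission denied.
        return None


# Demonstrate on the current process; for a daemon, pass its PID instead.
print(process_cwd(os.getpid()))
```

A core file written with a relative name would be expected to land in that directory, which is why the log directory is the first place to look.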

Thanks.


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date mail was appended: Tue Feb 10 16:44:16 2009 (1234305857)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 17:55:32 -0500

--Apple-Mail-19-448778450

No core file that I can see:

[root@sugar log]# ls -l
total 5926356
-rw------- 1 condor condor          0 Feb 10 11:11 InstanceLock
-rw-r--r-- 1 condor condor    4641494 Feb 10 17:38 MasterLog
prw------- 1 condor condor          0 Feb 10 17:54 procd_pipe.SCHEDD
prw------- 1 condor condor          0 Feb 10 17:54 procd_pipe.SCHEDD.watchdog
-rw-r--r-- 1 condor condor 1189186493 Feb 10 17:54 SchedLog
-rw-r--r-- 1 condor condor 2000000096 Feb  5 15:44 SchedLog.old
-rw-r----- 1 condor condor          0 Apr 29  2008 ShadowLock
-rw-r--r-- 1 condor condor  846112564 Feb 10 17:54 ShadowLog
-rw-r--r-- 1 condor condor 2000000168 Nov 18 14:26 ShadowLog.old
-rw-r--r-- 1 condor condor   22677050 Jan 31 02:12 StarterLog

[root@sugar log]# condor_version
$CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $
$CondorPlatform: X86_64-LINUX_RHEL5 $

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-19-448778450


--Apple-Mail-19-448778450--

===========================================================================
Date mail was appended: Tue Feb 10 16:55:45 2009 (1234306545)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 20:45:44 -0800


On Feb 10, 2009, at 2:55 PM, Duncan Brown wrote:

> No core file that I can see:


Duncan,
If you are starting Condor from a shell with "ulimit -c 0" to keep
user-generated core files from polluting your filesystem(s), you might
want to add "CREATE_CORE_FILES = True" to your Condor configuration file
so you can still get Condor daemon core files.
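
The core-file limit that "ulimit -c" reports can also be inspected programmatically. This is a minimal sketch using Python's standard resource module (Unix only); it reports the limit of the process that runs it, inherited from the launching shell, and is an illustration rather than anything Condor itself does. The helper names core_dump_limit and describe are hypothetical:

```python
import resource


def core_dump_limit():
    """Return the (soft, hard) core-file size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Turn a raw rlimit value into a readable string."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    if limit == 0:
        return "disabled (equivalent to 'ulimit -c 0')"
    return f"{limit} bytes"


soft, hard = core_dump_limit()
print("soft core limit:", describe(soft))
print("hard core limit:", describe(hard))
```

A soft limit of zero in the shell that starts the daemons is the situation described above, where no core file appears unless the limit is raised for the daemons.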

Thanks.


--
Stuart Anderson  anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson




===========================================================================
Date mail was appended: Tue Feb 10 22:45:54 2009 (1234327554)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 12 Feb 2009 10:28:59 -0500

--Apple-Mail-18-594785878

Hi Stuart,

I already have that:

CREATE_CORE_FILES       = True

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-18-594785878


--Apple-Mail-18-594785878--

===========================================================================
Date mail was appended: Thu Feb 12  9:29:05 2009 (1234452545)
Date: Thu, 12 Feb 2009 09:54:54 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)

Duncan,

> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.

Sorry it's taken a while for me to make much progress on this.  I was 
frantically fixing DAGMan bugs prior to the 7.3.0 release the last couple 
of days.  It seems like that's under control now, so I'm taking a look at 
your problem again.

Kent Wenger
Condor Team

===========================================================================
Date mail was appended: Thu Feb 12  9:54:57 2009 (1234454097)
Date: Thu, 12 Feb 2009 14:55:22 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)

Duncan,

> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.

It sounds like you've seen this only once -- is that right?  We're 
thinking that it is a combination of $$ expansion and parallel universe -- 
that may not be something that we test very well.

Kent Wenger
Condor Team

===========================================================================
Date mail was appended: Thu Feb 12 14:55:24 2009 (1234472124)
Subject: Actions

Assigned to gthain by wenger
===========================================================================
Date of actions: Thu Feb 12 15:00:04 2009 (1234472404)
Date: Thu, 12 Feb 2009 15:00:47 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)

Duncan,

I'm sending this ticket over to Greg Thain -- he's more expert on the 
schedd than I am.

Kent Wenger
Condor Team

===========================================================================
Date mail was appended: Thu Feb 12 15:00:49 2009 (1234472450)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)
Date: Thu, 12 Feb 2009 16:59:09 -0500
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
 definitions=2009-02-12_06:2009-02-10,2009-02-12,2009-02-12 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu

--Apple-Mail-25-618195306

Yes, this was a one-off when we had parallel universe jobs running and  
I did a condor_hold.

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-25-618195306


--Apple-Mail-25-618195306--

===========================================================================
Date mail was appended: Thu Feb 12 15:59:16 2009 (1234475957)
Date: Wed, 18 Feb 2009 16:48:47 -0600
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
 condor_schedd exited (1)


>>> My schedd just crashed with the error below. The only thing that I  
>>> did
>>> that might have caused this is putting some parallel universe jobs on
>>> hold.
>>>       
Duncan:

Well, holding and releasing parallel jobs with $$ expressions _should_ 
work.  I've double-checked the code and tested it here on a simple job, 
and it seemed to work.  There might be something specific to the $$ 
expression that could cause a problem -- do you still have the job ad 
around, and if so, could you send me the output of condor_history -l xxx 
for it, where xxx is the job id?

Thanks,

_greg
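
A minimal sketch of how the $$ references in a job ad could be listed once the condor_history -l output is available. It assumes the ad is plain "Attr = value" text; the sample ad below, including its attribute values, is invented for illustration and is not taken from the crashing job.

```python
import re

# Hypothetical job ad in the "Attr = value" form; values are invented
# for illustration and do not come from the job discussed in this ticket.
SAMPLE_AD = '''\
ClusterId = 1234
ProcId = 0
Cmd = "/home/user/bin/run_$$(Arch)"
Args = "--platform $$(OpSys)"
Owner = "user"
'''

# Matches $$(Name) and captures Name.
DOLLAR_DOLLAR = re.compile(r"\$\$\(([^)]*)\)")

def find_dollar_dollar(ad_text):
    """Return {attribute: [referenced names]} for every $$() reference."""
    hits = {}
    for line in ad_text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip blank or malformed lines
        refs = DOLLAR_DOLLAR.findall(value)
        if refs:
            hits[name.strip()] = refs
    return hits

result = find_dollar_dollar(SAMPLE_AD)
print(result)
```

Attributes that show up here are the ones the schedd has to expand, so they are the first place to look for an unusual expression.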

===========================================================================
Date mail was appended: Wed Feb 18 16:48:51 2009 (1234997332)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Sun, 15 Mar 2009 09:10:35 -0700
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
 definitions=2009-03-15_01:2009-03-13,2009-03-15,2009-03-15 signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
 ipscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0
 reason=mlx engine=5.0.0-0811170000 definitions=main-0903150119
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

Hi Greg,

Sorry for the delay in replying to you. As punishment, my schedd  
crashed three times in the last few days with the same error:

This is an automated email from the Condor system
on machine "sugar.phy.syr.edu".  Do not reply.

"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
Condor will automatically restart this process in 10 seconds.

*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
3/14 10:12:34 (pid:4093) ZKM: setting default map to condor@sugar
3/14 10:12:34 (pid:4093) ZKM: post-map: current user is 'condor'
3/14 10:12:34 (pid:4093) ZKM: post-map: current domain is 'sugar'
3/14 10:12:34 (pid:4093) ZKM: post-map: current FQU is 'condor@sugar'
Stack dump for process 4093 at timestamp 1237039956 (15 frames)
condor_schedd(dprintf_dump_stack+0xc0)[0x5ada4b]
condor_schedd(_Z18linux_sig_coredumpi+0x16)[0x5a13ba]
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53ab30]
condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x535b59]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x598827]
condor_schedd(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x94)[0x59aaea]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x28c)[0x599174]
condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59a9fa]
condor_schedd(main+0x18a2)[0x5a3610]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
condor_schedd(__gxx_personality_v0+0x319)[0x4f7b19]
*** End of file SchedLog
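
A rough sketch of how the frames in a dump like the one above can be split into module, mangled symbol, offset, and address for easier comparison. The regular expression is an assumption based only on the line shapes visible in this ticket. Passing the symbols through a demangler such as c++filt should turn the two frames after the libc signal frame into AttrList::ResetName() and dollarDollarExpand(int, int, ClassAd*, ClassAd*).

```python
import re

# Four frames copied from the stack dump above.
DUMP = """\
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
"""

FRAME = re.compile(
    r"^(?P<module>[^\[(]+)"                                   # binary or library
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"   # optional symbol+offset
    r"\[(?P<address>0x[0-9a-f]+)\]$"                          # return address
)

def parse_frames(text):
    """Return one dict per recognised frame line, in dump order."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append(match.groupdict())
    return frames

frames = parse_frames(DUMP)
for frame in frames:
    print(frame["module"], frame["symbol"], frame["offset"], frame["address"])
```

Comparing only the symbol column between two dumps ignores the offsets and addresses, which change whenever the binary is rebuilt.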

I don't think anyone was putting jobs on hold at the time. I'll check  
with the user to see if they did anything funny. Where should I look  
for more debugging information?

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




===========================================================================
Date mail was appended: Sun Mar 15 11:10:46 2009 (1237133447)
Date: Mon, 16 Mar 2009 09:34:10 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)

condor-admin response tracking system wrote:
> Hi Greg,
>
> Sorry for the delay in replying to you. As punishment, my schedd  
> crashed three times in the last few days with the same error:
I'd like to see a bit more of the SchedLog -- say 100 lines before the 
crash.  Do you know which job it is?  I'd like to see a copy of the job 
ad, too (e.g. with condor_history -l).

-Greg
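
One way to pull the requested context out of a log, sketched under the assumption that each crash is introduced by a line containing "Stack dump for process", as in the excerpts in this ticket. The function name and its parameters are hypothetical.

```python
from collections import deque

def lines_before_crash(lines, marker="Stack dump for process", context=100):
    """Return up to `context` lines preceding the first line containing `marker`."""
    window = deque(maxlen=context)  # keeps only the most recent lines
    for line in lines:
        if marker in line:
            return list(window)
        window.append(line)
    return []  # no crash marker found

# Self-contained demonstration on synthetic lines.
sample = [f"line {i}" for i in range(250)]
sample.append("Stack dump for process 4093 at timestamp 1237039956 (15 frames)")
sample.append("after the crash")
excerpt = lines_before_crash(sample)
print(len(excerpt), excerpt[0], excerpt[-1])
```

On the real machine the input would be the open SchedLog file (/usr1/condor/log/SchedLog in this ticket) instead of the synthetic list; if the log has rotated since the crash, the older file would need to be searched as well.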


===========================================================================
Date mail was appended: Mon Mar 16  9:34:15 2009 (1237214055)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 26 Mar 2009 12:10:36 -0400
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu

--Apple-Mail-17--68884715

Hi Greg,

Could you send me an ssh public key and I'll give you access to the  
log files?

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-17--68884715


--Apple-Mail-17--68884715--

===========================================================================
Date mail was appended: Thu Mar 26 11:17:01 2009 (1238084221)
Date: Thu, 26 Mar 2009 11:26:58 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)

This is a multi-part message in MIME format.
--------------000409030106040305010501

condor-admin response tracking system wrote:
> --Apple-Mail-17--68884715
> Content-Type: text/plain;
> 	charset=US-ASCII;
> 	format=flowed;
> 	delsp=yes
> Content-Transfer-Encoding: 7bit
>
> Hi Greg,
>
> Could you send me an ssh public key and I'll give you access to the  
> log files?
>
>   
Attached.

-Greg


--------------000409030106040305010501

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvOYdASjK8r6a0w9BTOD5Pr/U8IJbYOji4IxGY5OCiph3bAkMRcxm2ULbzvaM5ByBSGY0vtBRjRomGMi2ZQB3wc3byYAlMoJEK4e1RJGygmB3WUocj1e9to0I+ZZmFYPONIAMlOB8pIVGinZ4B4c9Eweaex3qzQg7Uq1xoR9c+jE= gthain__AT__chevre.cs.wisc.edu

--------------000409030106040305010501--

===========================================================================
Date mail was appended: Thu Mar 26 11:27:03 2009 (1238084823)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 26 Mar 2009 12:33:38 -0400
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
 definitions=2009-03-26_09:2009-03-25,2009-03-26,2009-03-26 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

--Apple-Mail-23--67502898

Thanks. You now have access as root__AT__condor.phy.syr.edu (the central  
manager) and on sugar.phy.syr.edu (the schedd). The config files are in  
/opt/condor and the log files are in /usr1/condor.

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




--Apple-Mail-23--67502898


--Apple-Mail-23--67502898--

===========================================================================
Date mail was appended: Thu Mar 26 11:33:42 2009 (1238085223)
CC: condor-admin__AT__cs.wisc.edu, anderson__AT__ligo.caltech.edu,
 skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Wed, 6 May 2009 07:36:14 -0400
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
 definitions=2009-05-06_06:2009-04-28,2009-05-06,2009-05-06 signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
 ipscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0
 reason=mlx engine=5.0.0-0811170000 definitions=main-0905060054
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu

Hi Greg,

I don't know if you got a chance to look into this, but the problem  
appears to have gone away after I upgraded the pool to 7.2.2.

Cheers,
Duncan.

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




===========================================================================
Date mail was appended: Wed May  6  6:36:28 2009 (1241609788)
Date: Wed, 06 May 2009 20:45:29 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)

condor-admin response tracking system wrote:
> Hi Greg,
> 
> I don't know if you got a chance to look into this, but the problem  
> appears to have gone away after I upgraded the pool to 7.2.2.

Interesting.  I'm going to keep this ticket open, though, just in case.

-Greg


===========================================================================
Date mail was appended: Wed May  6 20:45:39 2009 (1241660739)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Fri, 8 May 2009 08:44:57 -0400
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
 definitions=2009-05-08_06:2009-04-28,2009-05-08,2009-05-08 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Hi Greg,

Good call, it just happened again:

"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
Condor will automatically restart this process in 10 seconds.

*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
5/8 02:19:53 (pid:22609) ZKM: setting default map to condor@sugar
5/8 02:19:53 (pid:22609) ZKM: post-map: current user is 'condor'
5/8 02:19:53 (pid:22609) ZKM: post-map: current domain is 'sugar'
5/8 02:19:53 (pid:22609) ZKM: post-map: current FQU is 'condor@sugar'
Stack dump for process 22609 at timestamp 1241763593 (15 frames)
condor_schedd(dprintf_dump_stack+0xb7)[0x5ae42e]
condor_schedd(_Z18linux_sig_coredumpi+0x2c)[0x5a1f3a]
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0xc)[0x537f88]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x5336f0]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x53427e]
condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53b2d0]
condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x5362f7]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x599357]
condor_schedd(_ZN10DaemonCore9HandleReqEi+0x36)[0x5996e6]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c7)[0x599cdf]
condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59b52a]
condor_schedd(main+0x18c3)[0x5a41b5]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
condor_schedd(__gxx_personality_v0+0x321)[0x4f8119]
*** End of file SchedLog
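
The "at timestamp" value in each stack dump header is a Unix epoch time. A small sketch of converting the two values quoted in this ticket to UTC so they can be lined up with the local-time log lines; the helper name is hypothetical.

```python
from datetime import datetime, timezone

def dump_time_utc(epoch_seconds):
    """Convert a stack dump timestamp (seconds since the epoch) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# pid -> timestamp, copied from the two stack dump headers in this ticket.
STAMPS = {4093: 1237039956, 22609: 1241763593}

for pid, stamp in STAMPS.items():
    print(pid, dump_time_utc(stamp).isoformat())
# 4093 2009-03-14T14:12:36+00:00
# 22609 2009-05-08T06:19:53+00:00
```

Both results are four hours ahead of the neighbouring SchedLog lines (10:12:34 and 02:19:53), which is consistent with the log being written in US Eastern Daylight Time.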

Cheers,
Duncan.

On May 6, 2009, at 9:45 PM, condor-admin response tracking system wrote:

> condor-admin response tracking system wrote:
>> Hi Greg,
>>
>> I don't know if you got chance to look into this, but the problem
>> appears to have gone away after I upgraded the pool to 7.2.2.
>
> Interesting.  I'm going to keep this ticket open, though just in case.
>
> -Greg
>
>
>
> ========================================
> MESSAGE INFORMATION
> ========================================
> * From: Greg Thain <gthain__AT__cs.wisc.edu>
> * Ticket Email List: dabrown__AT__physics.syr.edu, anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
>

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




===========================================================================
Date mail was appended: Fri May  8  7:45:14 2009 (1241786715)
CC: condor-admin__AT__cs.wisc.edu, anderson__AT__ligo.caltech.edu,
 skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Mon, 11 May 2009 08:38:24 -0400
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.8161:2.4.5,1.2.40,4.0.166
 definitions=2009-05-11_05:2009-04-28,2009-05-11,2009-05-11 signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
 ipscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0
 reason=mlx engine=5.0.0-0811170000 definitions=main-0905110073
X-Mailfromd-RBL: IP Address 128.230.18.82 is listed on ix.dnsbl.manitu.net
X-Mailfromd: Total of 1 RBL listing (15 mins)
X-Mailfromd-Greylist-Time: May have had a total greylist delay of 15 minutes
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Hi Greg,

This happened again this morning, followed by a second email from the
schedd (below). Could you take a look at this? It is becoming a
problem.

Cheers,
Duncan.

"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 4.
Condor will automatically restart this process in 11 seconds.

*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node074.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node056.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) DedicatedScheduler creating Allocations for reconnected job (2847951.0)
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot1__AT__node008.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node008.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node061.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node061.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node032.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node032.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot1__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node029.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) ERROR "spawnJobs(): allocation node has no matches!" at line 2156 in file dedicated_scheduler.cpp
*** End of file SchedLog


On May 6, 2009, at 7:36 AM, Duncan Brown wrote:

> Hi Greg,
>
> I don't know if you got a chance to look into this, but the problem
> appears to have gone away after I upgraded the pool to 7.2.2.
>
> Cheers,
> Duncan.
>
> On Mar 26, 2009, at 12:33 PM, Duncan Brown wrote:
>
>> Thanks. You have root__AT__condor.phy.syr.edu (the central manager) and  
>> sugar.phy.syr.edu (the schedd). The config files are in /opt/condor  
>> and the log files are in /usr1/condor
>>
>> Cheers,
>> Duncan.
>>
>> On Mar 26, 2009, at 12:27 PM, condor-admin response tracking system  
>> wrote:
>>
>>> condor-admin response tracking system wrote:
>>>> Hi Greg,
>>>>
>>>> Could you send me an ssh public key and I'll give you access to the
>>>> log files?
>>>>
>>>>
>>> Attached.
>>>
>>> -Greg
>>>
>>>
>>> --------------000409030106040305010501
>>> Content-Type: text/plain;
>>> name="id_rsa.pub"
>>> Content-Transfer-Encoding: 7bit
>>> Content-Disposition: inline;
>>> filename="id_rsa.pub"
>>>
>>> ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvOYdASjK8r6a0w9BTOD5Pr/ 
>>> U8IJbYOji4IxGY5OCiph3bAkMRcxm2ULbzvaM5ByBSGY0vtBRjRomGMi2ZQB3wc3byYAlMoJEK4e1RJGygmB3WUocj1e9to0I 
>>> +ZZmFYPONIAMlOB8pIVGinZ4B4c9Eweaex3qzQg7Uq1xoR9c+jE= gthain__AT__chevre.cs.wisc.edu
>>>
>>> --------------000409030106040305010501--
>>>
>>>
>>> ========================================
>>> MESSAGE INFORMATION
>>> ========================================
>>> * From: Greg Thain <gthain__AT__cs.wisc.edu>
>>> * Ticket Email List: dabrown__AT__physics.syr.edu, anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
>>
>> -- 
>>
>> Duncan Brown                          Room 263-1, Department of Physics,
>> Assistant Professor of Physics        Syracuse University, NY 13244, USA
>> Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan
>>
>>
>>
>
> -- 
>
> Duncan Brown                          Room 263-1, Department of Physics,
> Assistant Professor of Physics        Syracuse University, NY 13244, USA
> Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan
>
>
>

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




===========================================================================
Date mail was appended: Mon May 11  8:56:33 2009 (1242050194)
Subject: Actions

Assigned to psilord by psilord
===========================================================================
Date of actions: Fri Oct  9 16:11:31 2009 (1255122691)
Subject: Comments added

gthain is too busy, so I'm taking over this ticket from him.

Comments added by psilord

===========================================================================
Date comments were added: Fri Oct  9 16:11:36 2009 (1255122696)
Date: Fri, 9 Oct 2009 16:13:51 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)

Hello,

Greg Thain is too busy to work on this ticket, so I've taken it over from him.

Please let me know if I need to gain access to any machines in order to debug
this problem and what credentials you need.

Thank you.

Peter Keller

===========================================================================
Date mail was appended: Fri Oct  9 16:13:57 2009 (1255122838)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)
Date: Mon, 12 Oct 2009 08:49:17 -0400
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.8161:2.4.5,1.2.40,4.0.166
 definitions=2009-10-12_04:2009-09-29,2009-10-12,2009-10-12 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu

Hi Pete,

Can you send me your ssh public key and I'll give you access to sugar.  
I haven't seen this bug in a while, so it's not a high priority at the  
moment, though.

Cheers,
Duncan.

On Oct 9, 2009, at 5:13 PM, condor-admin response tracking system wrote:

> Hello,
>
> Greg Thain is too busy to work on this ticket, so I've taken it over from him.
>
> Please let me know if I need to gain access to any machines in order to debug
> this problem and what credentials you need.
>
> Thank you.
>
> Peter Keller
>
>
> ========================================
> MESSAGE INFORMATION
> ========================================
> * From: Peter Keller <psilord__AT__cs.wisc.edu>
> * Ticket Email List: dabrown__AT__physics.syr.edu, anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
>

-- 

Duncan Brown                          Room 263-1, Department of Physics,
Assistant Professor of Physics        Syracuse University, NY 13244, USA
Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan




===========================================================================
Date mail was appended: Mon Oct 12  7:49:24 2009 (1255351765)
Date: Fri, 20 Nov 2009 11:04:34 -0600
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
 sugar.phy.syr.edu: condor_schedd exited (1)

On Mon, Oct 12, 2009 at 07:49:24AM -0500, condor-admin response tracking system wrote:
> Can you send me your ssh public key and I'll give you access to sugar.  
> I haven't seen this bug in a while, so it's not a high priority at the  
> moment, though.

Here you go:

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAsnTMmrh3/N1GKW0MThhBQ8wF5sfsA/y7YI51bXKTMc615DHezgpgkzT+RjJLfqSrFH5thmCYdHcWSKDBXQs5G862F64HtHlM3w4tPhi/Aw45461Ak65cki31L/4Qv3g5QfbXOlYu3NA5aHyPhNtaExSKc9H4JoOk6wvzBysjfa8= psilord__AT__merlin.cs.wisc.edu

Thank you.

-pete

===========================================================================
Date mail was appended: Fri Nov 20 11:04:39 2009 (1258736679)
Subject: Actions

Status changed from open to pending by psilord
===========================================================================
Date of actions: Fri Nov 20 11:04:45 2009 (1258736685)