LIGO Support Ticket 18999
Ticket Information
Number: admin 18999
User: dabrown@physics.syr.edu
Email: anderson__AT__ligo.caltech.edu,skoranda__AT__gravity.phys.uwm.edu
Status: pending
Assigned To: psilord
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Fwd: [Condor] Problem sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 08:53:58 -0500
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
definitions=2009-02-10_06:2009-02-10,2009-02-10,2009-02-09 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
--Apple-Mail-3-416284763
Hi,
My schedd just crashed with the error below. The only thing that I did
that might have caused this is putting some parallel universe jobs on
hold.
Cheers,
Duncan.
Begin forwarded message:
> From: Condor <condor__AT__sugar.phy.syr.edu>
> Date: February 10, 2009 8:12:42 AM EST
> To: dabrown__AT__physics.syr.edu
> Subject: [Condor] Problem sugar.phy.syr.edu: condor_schedd exited (1)
>
> This is an automated email from the Condor system
> on machine "sugar.phy.syr.edu". Do not reply.
>
> "/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
> Condor will automatically restart this process in 10 seconds.
>
> *** Last 20 line(s) of file /usr1/condor/log/SchedLog:
> 2/10 08:12:08 (pid:5229) ZKM: setting default map to condor@sugar
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current user is 'condor'
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current domain is 'sugar'
> 2/10 08:12:08 (pid:5229) ZKM: post-map: current FQU is 'condor@sugar'
> Stack dump for process 5229 at timestamp 1234271529 (15 frames)
> condor_schedd(dprintf_dump_stack+0xc0)[0x5ada4b]
> condor_schedd(_Z18linux_sig_coredumpi+0x16)[0x5a13ba]
> /lib64/libc.so.6[0x36aee301b0]
> condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
> condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
> condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
> condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53ab30]
> condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x535b59]
> condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x598827]
> condor_schedd(_ZN10DaemonCore9HandleReqEi+0x36)[0x598bb6]
> condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c7)[0x5991af]
> condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59a9fa]
> condor_schedd(main+0x18a2)[0x5a3610]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
> condor_schedd(__gxx_personality_v0+0x319)[0x4f7b19]
> *** End of file SchedLog
>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Questions about this message or Condor in general?
> Email address of the local Condor administrator: dabrown__AT__physics.syr.edu
> The Official Condor Homepage is http://www.cs.wisc.edu/condor
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-3-416284763--
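The stack dump quoted in the SchedLog excerpt above is easier to read once each frame is split into module, mangled symbol, offset, and address. The following is a minimal sketch of that step, assuming the frames follow the `module(symbol+offset)[address]` layout seen in this ticket; the names `FRAME_RE` and `parse_frames` are illustrative and not part of Condor.

```python
import re

# Matches frames shaped like the ones in the SchedLog excerpt, e.g.
#   condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
#   /lib64/libc.so.6[0x36aee301b0]
FRAME_RE = re.compile(
    r"(?P<module>[^\s(\[>]+)"                                    # binary or shared library
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"   # optional symbol+offset
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"                           # absolute address
)


def parse_frames(text):
    """Return one dict per stack frame found in `text`, in order."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.groupdict())
    return frames


if __name__ == "__main__":
    excerpt = (
        "> condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]\n"
        "> condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]\n"
        "> /lib64/libc.so.6[0x36aee301b0]\n"
    )
    for frame in parse_frames(excerpt):
        print(frame["module"], frame["symbol"], frame["offset"])
```

The extracted `symbol` values are still mangled C++ names; a demangler such as `c++filt` from binutils can turn them into readable signatures if it is installed.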
===========================================================================
Date of creation: Tue Feb 10 7:54:09 2009 (1234274051)
Subject: Actions
Assigned to wenger by wenger
===========================================================================
Date of actions: Tue Feb 10 10:42:08 2009 (1234284129)
Date: Tue, 10 Feb 2009 14:07:29 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
Duncan,
> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.
Hmm, interesting.
Did it create a core file? If so, getting that would help a lot.
Also, what version/platform is your schedd?
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Tue Feb 10 14:07:32 2009 (1234296453)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 17:36:31 -0500
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
definitions=2009-02-10_13:2009-02-10,2009-02-10,2009-02-10 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
--Apple-Mail-17-447637525
Hi Kent,
On Feb 10, 2009, at 3:07 PM, condor-admin response tracking system wrote:
>> My schedd just crashed with the error below. The only thing that I did
>> that might have caused this is putting some parallel universe jobs on
>> hold.
>
> Hmm, interesting.
>
> Did it create a core file? If so, getting that would help a lot.
I should probably know this, but where does the schedd dump core?
> Also, what version/platform is your schedd?
[root@sugar log]# rpm -qa | grep condor
condor-7.2.0-1
It's the x86_64 RHEL5 shared RPM.
[root@sugar log]# uname -a
Linux sugar.phy.syr.edu 2.6.18-92.1.6.el5xen #1 SMP Wed Jun 25 14:13:10 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@sugar log]# rpm -qa | grep glibc
glibc-headers-2.5-24
compat-glibc-2.3.4-2.26
glibc-2.5-24
glibc-devel-2.5-24
glibc-2.5-18
compat-glibc-2.3.4-2.26
glibc-common-2.5-24
compat-glibc-headers-2.3.4-2.26
glibc-devel-2.5-24
glibc-2.5-24
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-17-447637525--
===========================================================================
Date mail was appended: Tue Feb 10 16:36:38 2009 (1234305398)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 14:44:10 -0800
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
It is probably the schedd's current working directory:
[root@ldas-grid log]# lsof -p `pgrep condor_schedd` | grep cwd
condor_sc 27020 condor cwd DIR 8,21 4096 9124417 /usr1/condor/log
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
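The lsof check above looks up the schedd's current working directory, which is where Linux normally writes a core file when the kernel's core pattern is a plain relative name. A minimal sketch of the same lookup without lsof, assuming a Linux host with /proc mounted; the helper name `process_cwd` is hypothetical.

```python
import os


def process_cwd(pid):
    """Return the current working directory of `pid`, or None if unreadable.

    Reads the /proc/<pid>/cwd symlink, so it only works on systems that
    provide /proc and only for processes the caller is allowed to inspect.
    """
    try:
        return os.readlink(f"/proc/{pid}/cwd")
    except OSError:
        return None


if __name__ == "__main__":
    # Demonstrates the call on this script's own process; for the schedd,
    # pass the pid reported by `pgrep condor_schedd` and run as root or condor.
    print(process_cwd(os.getpid()))
```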
===========================================================================
Date mail was appended: Tue Feb 10 16:44:16 2009 (1234305857)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 17:55:32 -0500
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
definitions=2009-02-10_13:2009-02-10,2009-02-10,2009-02-10 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 gypsum.cs.wisc.edu
--Apple-Mail-19-448778450
No core file that I can see:
[root@sugar log]# ls -l
total 5926356
-rw------- 1 condor condor 0 Feb 10 11:11 InstanceLock
-rw-r--r-- 1 condor condor 4641494 Feb 10 17:38 MasterLog
prw------- 1 condor condor 0 Feb 10 17:54 procd_pipe.SCHEDD
prw------- 1 condor condor 0 Feb 10 17:54 procd_pipe.SCHEDD.watchdog
-rw-r--r-- 1 condor condor 1189186493 Feb 10 17:54 SchedLog
-rw-r--r-- 1 condor condor 2000000096 Feb 5 15:44 SchedLog.old
-rw-r----- 1 condor condor 0 Apr 29 2008 ShadowLock
-rw-r--r-- 1 condor condor 846112564 Feb 10 17:54 ShadowLog
-rw-r--r-- 1 condor condor 2000000168 Nov 18 14:26 ShadowLog.old
-rw-r--r-- 1 condor condor 22677050 Jan 31 02:12 StarterLog
[root@sugar log]# condor_version
$CondorVersion: 7.2.0 Dec 19 2008 BuildID: 121001 $
$CondorPlatform: X86_64-LINUX_RHEL5 $
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-19-448778450--
===========================================================================
Date mail was appended: Tue Feb 10 16:55:45 2009 (1234306545)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Tue, 10 Feb 2009 20:45:44 -0800
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
On Feb 10, 2009, at 2:55 PM, Duncan Brown wrote:
> No core file that I can see:
Duncan,
If you are starting Condor from a shell with "ulimit -c 0" to keep
lots of user-generated core files from polluting your filesystem(s),
you might want to add "CREATE_CORE_FILES = True" to your Condor
configuration file so you can still get Condor daemon core files.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
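The "ulimit -c 0" case mentioned above can be checked from the environment that launches Condor by reading the core-file size limit directly. This is a rough sketch using Python's standard `resource` module (Unix only); the function name `core_dumps_allowed` is illustrative, and the check reports the limit of the calling process, not of an already running daemon.

```python
import resource


def core_dumps_allowed(limits):
    """Return True if the soft core-size limit permits writing a core file.

    `limits` is a (soft, hard) pair as returned by resource.getrlimit.
    A soft limit of 0 is what "ulimit -c 0" sets; RLIM_INFINITY means unlimited.
    """
    soft, _hard = limits
    return soft != 0


if __name__ == "__main__":
    current = resource.getrlimit(resource.RLIMIT_CORE)
    print("soft/hard core limit:", current)
    print("core dumps allowed:", core_dumps_allowed(current))
```

Processes started from the same shell inherit this limit, which is why the thread suggests the CREATE_CORE_FILES setting as a way to still get daemon core files.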
===========================================================================
Date mail was appended: Tue Feb 10 22:45:54 2009 (1234327554)
CC: condor-admin__AT__cs.wisc.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Stuart Anderson <anderson__AT__ligo.caltech.edu>
Subject: Re: [condor-admin #18999] Fwd: [LIGO] [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 12 Feb 2009 10:28:59 -0500
X-Proofpoint-Virus-Version: vendor=fsecure engine=1.12.7400:2.4.4,1.2.40,4.0.166
definitions=2009-02-12_05:2009-02-10,2009-02-12,2009-02-12 signatures=0
X-Proofpoint-Spam-Reason: safe
X-Seen-BY: mailfromd 4.1 granite.cs.wisc.edu
--Apple-Mail-18-594785878
Hi Stuart,
I already have that:
CREATE_CORE_FILES = True
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-18-594785878--
===========================================================================
Date mail was appended: Thu Feb 12 9:29:05 2009 (1234452545)
Date: Thu, 12 Feb 2009 09:54:54 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
Duncan,
> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.
Sorry it's taken a while for me to make much progress on this. I was
frantically fixing DAGMan bugs prior to the 7.3.0 release the last couple
of days. It seems like that's under control now, so I'm taking a look at
your problem again.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Thu Feb 12 9:54:57 2009 (1234454097)
Date: Thu, 12 Feb 2009 14:55:22 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
Duncan,
> My schedd just crashed with the error below. The only thing that I did
> that might have caused this is putting some parallel universe jobs on
> hold.
It sounds like you've seen this only once -- is that right? We're
thinking that it is a combination of $$ expansion and parallel universe --
that may not be something that we test very well.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Thu Feb 12 14:55:24 2009 (1234472124)
Subject: Actions
Assigned to gthain by wenger
===========================================================================
Date of actions: Thu Feb 12 15:00:04 2009 (1234472404)
Date: Thu, 12 Feb 2009 15:00:47 -0600 (CST)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
Duncan,
I'm sending this ticket over to Greg Thain -- he's more expert on the
schedd than I am.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Thu Feb 12 15:00:49 2009 (1234472450)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
Date: Thu, 12 Feb 2009 16:59:09 -0500
--Apple-Mail-25-618195306
Yes, this was a one-off when we had parallel universe jobs running and
I did a condor_hold.
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-25-618195306--
===========================================================================
Date mail was appended: Thu Feb 12 15:59:16 2009 (1234475957)
Date: Wed, 18 Feb 2009 16:48:47 -0600
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] Fwd: [Condor] Problem sugar.phy.syr.edu:
condor_schedd exited (1)
>>> My schedd just crashed with the error below. The only thing that I
>>> did
>>> that might have caused this is putting some parallel universe jobs on
>>> hold.
>>>
Duncan:
Well, holding and releasing parallel jobs with $$ expressions _should_
work. I've double-checked the code and tested it here on a simple job,
and it seemed to work. There might be something specific to the $$
expression that could cause a problem -- do you still have the job ad
around, and if so, could you send me the output of condor_history -l xxx
for it, where xxx is the job id?
Thanks,
_greg
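One way to act on this request is to scan the long-format job ad for $$
references before sending it. The following is only a rough Python
sketch: the helper names are hypothetical, the sample ad fragment is
made up for illustration (it is not from this ticket), and it assumes
that $$ references appear in the ad in the $$(Name) form and that
condor_history -l prints one "Attr = value" pair per line.

```python
import re
import subprocess

# Assumption: $$ references show up in attribute values as "$$(Name)".
DOLLAR_DOLLAR = re.compile(r"\$\$\(([^)]*)\)")


def parse_long_ad(text):
    """Parse 'Attr = value' lines (long format) into a dict."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def dollar_dollar_refs(ad):
    """Return {attribute: [referenced names]} for values containing $$(...)."""
    refs = {}
    for name, value in ad.items():
        found = DOLLAR_DOLLAR.findall(value)
        if found:
            refs[name] = found
    return refs


def history_ad(job_id):
    """Run condor_history -l for one job id and parse what it prints."""
    out = subprocess.run(
        ["condor_history", "-l", job_id],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_long_ad(out)


# Hypothetical ad fragment, for illustration only.
SAMPLE = 'Cmd = "/bin/echo"\nArgs = "$$(Arch) $$(OpSys)"\nOwner = "someuser"\n'
print(dollar_dollar_refs(parse_long_ad(SAMPLE)))
```

On a machine with Condor installed, dollar_dollar_refs(history_ad("xxx"))
would list the attributes worth a closer look, where xxx is the job id.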
===========================================================================
Date mail was appended: Wed Feb 18 16:48:51 2009 (1234997332)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Sun, 15 Mar 2009 09:10:35 -0700
Hi Greg,
Sorry for the delay in replying to you. As punishment, my schedd
crashed three times in the last few days with the same error:
This is an automated email from the Condor system
on machine "sugar.phy.syr.edu". Do not reply.
"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
3/14 10:12:34 (pid:4093) ZKM: setting default map to condor@sugar
3/14 10:12:34 (pid:4093) ZKM: post-map: current user is 'condor'
3/14 10:12:34 (pid:4093) ZKM: post-map: current domain is 'sugar'
3/14 10:12:34 (pid:4093) ZKM: post-map: current FQU is 'condor@sugar'
Stack dump for process 4093 at timestamp 1237039956 (15 frames)
condor_schedd(dprintf_dump_stack+0xc0)[0x5ada4b]
condor_schedd(_Z18linux_sig_coredumpi+0x16)[0x5a13ba]
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53ab30]
condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x535b59]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x598827]
condor_schedd(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x94)[0x59aaea]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x28c)[0x599174]
condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59a9fa]
condor_schedd(main+0x18a2)[0x5a3610]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
condor_schedd(__gxx_personality_v0+0x319)[0x4f7b19]
*** End of file SchedLog
I don't think anyone was putting jobs on hold at the time. I'll check
with the user to see if they did anything funny. Where should I look
for more debugging information?
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Sun Mar 15 11:10:46 2009 (1237133447)
Date: Mon, 16 Mar 2009 09:34:10 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
condor-admin response tracking system wrote:
> Hi Greg,
>
> Sorry for the delay in replying to you. As punishment, my schedd
> crashed three times in the last few days with the same error:
I'd like to see a bit more of the schedd log -- say 100 lines before the
crash. Do you know which job it is? I'd like to see a copy of the job ad,
too (e.g. with condor_history -l).
-Greg
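The 100 lines of context requested here can be pulled out of the
SchedLog mechanically. This is a minimal sketch rather than an official
tool: the function names are hypothetical, and it assumes each crash is
announced by a line containing "Stack dump for process", as in the
excerpts in this ticket.

```python
from collections import deque

# Marker text taken from the SchedLog excerpts quoted in this ticket.
MARKER = "Stack dump for process"


def lines_before_crash(lines, context=100, marker=MARKER):
    """Yield, for each marker line, up to `context` lines that precede it."""
    window = deque(maxlen=context)
    for line in lines:
        if marker in line:
            yield list(window)
        window.append(line)


def crash_context(path, context=100):
    """Read a log file and return one list of context lines per crash."""
    with open(path, errors="replace") as log:
        return list(lines_before_crash(log, context))


# Tiny in-memory demonstration with placeholder log lines.
demo = [
    "a\n",
    "b\n",
    "c\n",
    "Stack dump for process 1 at timestamp 0 (15 frames)\n",
    "d\n",
]
print(list(lines_before_crash(demo, context=2)))
```

Calling crash_context("/usr1/condor/log/SchedLog") on the schedd host
would return one block of context per stack dump found in the current
log file; rotated logs would need to be read separately.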
===========================================================================
Date mail was appended: Mon Mar 16 9:34:15 2009 (1237214055)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 26 Mar 2009 12:10:36 -0400
--Apple-Mail-17--68884715
Hi Greg,
Could you send me an ssh public key and I'll give you access to the
log files?
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-17--68884715--
===========================================================================
Date mail was appended: Thu Mar 26 11:17:01 2009 (1238084221)
Date: Thu, 26 Mar 2009 11:26:58 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
This is a multi-part message in MIME format.
--------------000409030106040305010501
condor-admin response tracking system wrote:
> Hi Greg,
>
> Could you send me an ssh public key and I'll give you access to the
> log files?
>
>
Attached.
-Greg
--------------000409030106040305010501
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvOYdASjK8r6a0w9BTOD5Pr/U8IJbYOji4IxGY5OCiph3bAkMRcxm2ULbzvaM5ByBSGY0vtBRjRomGMi2ZQB3wc3byYAlMoJEK4e1RJGygmB3WUocj1e9to0I+ZZmFYPONIAMlOB8pIVGinZ4B4c9Eweaex3qzQg7Uq1xoR9c+jE= gthain__AT__chevre.cs.wisc.edu
--------------000409030106040305010501--
===========================================================================
Date mail was appended: Thu Mar 26 11:27:03 2009 (1238084823)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Thu, 26 Mar 2009 12:33:38 -0400
--Apple-Mail-23--67502898
Thanks. You have root__AT__condor.phy.syr.edu (the central manager) and
sugar.phy.syr.edu (the schedd). The config files are in /opt/condor
and the log files are in /usr1/condor
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
--Apple-Mail-23--67502898--
===========================================================================
Date mail was appended: Thu Mar 26 11:33:42 2009 (1238085223)
CC: condor-admin__AT__cs.wisc.edu, anderson__AT__ligo.caltech.edu,
skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Wed, 6 May 2009 07:36:14 -0400
Hi Greg,
I don't know if you got a chance to look into this, but the problem
appears to have gone away after I upgraded the pool to 7.2.2.
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Wed May 6 6:36:28 2009 (1241609788)
Date: Wed, 06 May 2009 20:45:29 -0500
From: Greg Thain <gthain__AT__cs.wisc.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
condor-admin response tracking system wrote:
> Hi Greg,
>
> I don't know if you got chance to look into this, but the problem
> appears to have gone away after I upgraded the pool to 7.2.2.
Interesting. I'm going to keep this ticket open, though, just in case.
-Greg
===========================================================================
Date mail was appended: Wed May 6 20:45:39 2009 (1241660739)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Fri, 8 May 2009 08:44:57 -0400
Hi Greg,
Good call, it just happened again:
"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 1.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
5/8 02:19:53 (pid:22609) ZKM: setting default map to condor@sugar
5/8 02:19:53 (pid:22609) ZKM: post-map: current user is 'condor'
5/8 02:19:53 (pid:22609) ZKM: post-map: current domain is 'sugar'
5/8 02:19:53 (pid:22609) ZKM: post-map: current FQU is 'condor@sugar'
Stack dump for process 22609 at timestamp 1241763593 (15 frames)
condor_schedd(dprintf_dump_stack+0xb7)[0x5ae42e]
condor_schedd(_Z18linux_sig_coredumpi+0x2c)[0x5a1f3a]
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0xc)[0x537f88]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x5336f0]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x53427e]
condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1e8e)[0x53b2d0]
condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xef)[0x5362f7]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x3959)[0x599357]
condor_schedd(_ZN10DaemonCore9HandleReqEi+0x36)[0x5996e6]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c7)[0x599cdf]
condor_schedd(_ZN10DaemonCore6DriverEv+0x172e)[0x59b52a]
condor_schedd(main+0x18c3)[0x5a41b5]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x36aee1d8b4]
condor_schedd(__gxx_personality_v0+0x321)[0x4f8119]
*** End of file SchedLog
Cheers,
Duncan.
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
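To check that the crash after the upgrade is the same one as before,
the stack dumps can be compared by symbol name while ignoring offsets
and addresses, which shift between builds. The sketch below is
illustrative only: the regular expression and helper name are
assumptions about the frame layout seen in this ticket, and only the
four innermost frames below the signal handler are compared, copied
from the 3/14 and 5/8 dumps.

```python
import re

# Frame layout as it appears in the dumps: module(symbol+offset)[address],
# where the "(symbol+offset)" part may be absent.
FRAME = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def symbols(trace):
    """Return (module, symbol) pairs, dropping offsets and addresses."""
    pairs = []
    for line in trace.strip().splitlines():
        match = FRAME.match(line.strip())
        if match:
            pairs.append((match.group("module"), match.group("symbol")))
    return pairs


# Innermost frames from the 3/14 dump.
MARCH = """
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0x14)[0x5377f2]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x532f64]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x533af2]
"""

# The same frames from the 5/8 dump, taken after the upgrade.
MAY = """
/lib64/libc.so.6[0x36aee301b0]
condor_schedd(_ZN8AttrList9ResetNameEv+0xc)[0x537f88]
condor_schedd(_Z18dollarDollarExpandiiP7ClassAdS0_+0x11ba)[0x5336f0]
condor_schedd(_Z8GetJobAdiib+0x11c)[0x53427e]
"""

# _ZN8AttrList9ResetNameEv is the mangled form of AttrList::ResetName().
print(symbols(MARCH) == symbols(MAY))
```

Matching symbols in these frames suggest the same code path, ending in
AttrList::ResetName() called from dollarDollarExpand, although this
comparison alone does not identify the root cause.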
===========================================================================
Date mail was appended: Fri May 8 7:45:14 2009 (1241786715)
CC: condor-admin__AT__cs.wisc.edu, anderson__AT__ligo.caltech.edu,
skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: Duncan Brown <dabrown__AT__physics.syr.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Mon, 11 May 2009 08:38:24 -0400
Hi Greg,
This happened again this morning, followed by a second email from the
schedd (below). Any chance you could take a look at this? It is
becoming a problem.
Cheers,
Duncan.
"/usr/sbin/condor_schedd" on "sugar.phy.syr.edu" exited with status 4.
Condor will automatically restart this process in 11 seconds.
*** Last 20 line(s) of file /usr1/condor/log/SchedLog:
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node074.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node056.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) DedicatedScheduler creating Allocations for reconnected job (2847951.0)
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot1__AT__node008.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node008.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node061.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node061.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node020.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node032.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node032.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node016.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot1__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot4__AT__node045.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node029.sugar.phy.syr.edu to reconnect to
5/11 06:20:31 (pid:15792) ERROR "spawnJobs(): allocation node has no matches!" at line 2156 in file dedicated_scheduler.cpp
*** End of file SchedLog
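For triage, an excerpt like the one above can be condensed into a per-host count of the slots the dedicated scheduler could not reconnect to, plus the final ERROR message. The following is only a minimal sketch and not part of Condor: the function name `summarize` and the regular expressions are illustrative assumptions based solely on the log lines quoted in this ticket. It accepts both "@" and the archive's "__AT__" spelling, and it tolerates entries that a mail client has wrapped across two lines.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this ticket:
#   ... couldn't find machine slot2__AT__node074.sugar.phy.syr.edu to reconnect to
# The archive writes "@" as "__AT__", so both spellings are accepted.
MISSING = re.compile(r"couldn't find machine (slot\d+)(?:@|__AT__)(\S+)")

# The quoted ERROR text may be wrapped over two lines; [^"] also matches newlines.
FATAL = re.compile(r'ERROR "([^"]*)"')


def summarize(log_text):
    """Return (Counter of unreachable slots per host, list of ERROR messages)."""
    per_host = Counter(host for _slot, host in MISSING.findall(log_text))
    # Collapse any line wrapping inside the quoted error message.
    errors = [" ".join(msg.split()) for msg in FATAL.findall(log_text)]
    return per_host, errors


# Small sample copied from the excerpt (hypothetical usage).
sample = """5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot2__AT__node020.sugar.phy.syr.edu
to reconnect to
5/11 06:20:31 (pid:15792) Dedicated Scheduler:: couldn't find machine slot3__AT__node020.sugar.phy.syr.edu
to reconnect to
5/11 06:20:31 (pid:15792) ERROR "spawnJobs(): allocation node has no
matches!" at line 2156 in file dedicated_scheduler.cpp
"""

hosts, errors = summarize(sample)
for host, count in hosts.most_common():
    print(host, count)
for message in errors:
    print("ERROR:", message)
```

Run against the whole SchedLog rather than the last 20 lines, a summary like this would show whether the unreachable slots cluster on a few nodes before the spawnJobs() error. It says nothing about the cause.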
On May 6, 2009, at 7:36 AM, Duncan Brown wrote:
> Hi Greg,
>
> I don't know if you got a chance to look into this, but the problem
> appears to have gone away after I upgraded the pool to 7.2.2.
>
> Cheers,
> Duncan.
>
> On Mar 26, 2009, at 12:33 PM, Duncan Brown wrote:
>
>> Thanks. You have root__AT__condor.phy.syr.edu (the central manager) and
>> sugar.phy.syr.edu (the schedd). The config files are in /opt/condor
>> and the log files are in /usr1/condor
>>
>> Cheers,
>> Duncan.
>>
>> On Mar 26, 2009, at 12:27 PM, condor-admin response tracking system
>> wrote:
>>
>>> This is a multi-part message in MIME format.
>>> --------------000409030106040305010501
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>> Content-Transfer-Encoding: 7bit
>>>
>>> condor-admin response tracking system wrote:
>>>>
>>>> Hi Greg,
>>>>
>>>> Could you send me an ssh public key and I'll give you access to the
>>>> log files?
>>>>
>>>>
>>> Attached.
>>>
>>> -Greg
>>>
>>>
>>> --------------000409030106040305010501
>>> Content-Type: text/plain;
>>> name="id_rsa.pub"
>>> Content-Transfer-Encoding: 7bit
>>> Content-Disposition: inline;
>>> filename="id_rsa.pub"
>>>
>>> ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAvOYdASjK8r6a0w9BTOD5Pr/
>>> U8IJbYOji4IxGY5OCiph3bAkMRcxm2ULbzvaM5ByBSGY0vtBRjRomGMi2ZQB3wc3byYAlMoJEK4e1RJGygmB3WUocj1e9to0I
>>> +ZZmFYPONIAMlOB8pIVGinZ4B4c9Eweaex3qzQg7Uq1xoR9c+jE= gthain__AT__chevre.cs.wisc.edu
>>>
>>> --------------000409030106040305010501--
>>>
>>>
>>> ========================================
>>> MESSAGE INFORMATION
>>> ========================================
>>> * From: Greg Thain <gthain__AT__cs.wisc.edu>
>>> * Ticket Email List: dabrown__AT__physics.syr.edu, anderson__AT__ligo.caltech.edu
>>> ,skoranda__AT__gravity.phys.uwm.edu
>>
>> --
>>
>> Duncan Brown Room 263-1, Department of Physics,
>> Assistant Professor of Physics Syracuse University, NY 13244, USA
>> Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
>>
>>
>>
>
> --
>
> Duncan Brown Room 263-1, Department of Physics,
> Assistant Professor of Physics Syracuse University, NY 13244, USA
> Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
>
>
>
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Mon May 11 8:56:33 2009 (1242050194)
Subject: Actions
Assigned to psilord by psilord
===========================================================================
Date of actions: Fri Oct 9 16:11:31 2009 (1255122691)
Subject: Comments added
gthain is too busy, so I'm taking over this ticket from him.
Comments added by psilord
===========================================================================
Date comments were added: Fri Oct 9 16:11:36 2009 (1255122696)
Date: Fri, 9 Oct 2009 16:13:51 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Hello,
Greg Thain is too busy to work on this ticket, so I've taken it over from him.
Please let me know if I need to gain access to any machines in order to debug
this problem and what credentials you need.
Thank you.
Peter Keller
===========================================================================
Date mail was appended: Fri Oct 9 16:13:57 2009 (1255122838)
CC: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu
From: Duncan Brown <dabrown__AT__physics.syr.edu>
To: condor-admin__AT__cs.wisc.edu
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
Date: Mon, 12 Oct 2009 08:49:17 -0400
Hi Pete,
Can you send me your ssh public key and I'll give you access to sugar.
I haven't seen this bug in a while, so it's not a high priority at the
moment, though.
Cheers,
Duncan.
On Oct 9, 2009, at 5:13 PM, condor-admin response tracking system wrote:
> Hello,
>
> Greg Thain is too busy to work on this ticket, so I've taken it over
> from him.
>
> Please let me know if I need to gain access to any machines in order
> to debug
> this problem and what credentials you need.
>
> Thank you.
>
> Peter Keller
>
>
> ========================================
> MESSAGE INFORMATION
> ========================================
> * From: Peter Keller <psilord__AT__cs.wisc.edu>
> * Ticket Email List: dabrown__AT__physics.syr.edu, anderson__AT__ligo.caltech.edu
> ,skoranda__AT__gravity.phys.uwm.edu
>
--
Duncan Brown Room 263-1, Department of Physics,
Assistant Professor of Physics Syracuse University, NY 13244, USA
Phone: (315) 443 5993 http://www.gravity.phy.syr.edu/~duncan
===========================================================================
Date mail was appended: Mon Oct 12 7:49:24 2009 (1255351765)
Date: Fri, 20 Nov 2009 11:04:34 -0600
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18999] LIGO: Fwd: [Condor] Problem
sugar.phy.syr.edu: condor_schedd exited (1)
On Mon, Oct 12, 2009 at 07:49:24AM -0500, condor-admin response tracking system wrote:
> Can you send me your ssh public key and I'll give you access to sugar.
> I haven't seen this bug in a while, so it's not a high priority at the
> moment, though.
Here you go:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAsnTMmrh3/N1GKW0MThhBQ8wF5sfsA/y7YI51bXKTMc615DHezgpgkzT+RjJLfqSrFH5thmCYdHcWSKDBXQs5G862F64HtHlM3w4tPhi/Aw45461Ak65cki31L/4Qv3g5QfbXOlYu3NA5aHyPhNtaExSKc9H4JoOk6wvzBysjfa8= psilord__AT__merlin.cs.wisc.edu
Thank you.
-pete
===========================================================================
Date mail was appended: Fri Nov 20 11:04:39 2009 (1258736679)
Subject: Actions
Status changed from open to pending by psilord
===========================================================================
Date of actions: Fri Nov 20 11:04:45 2009 (1258736685)