LIGO Support Ticket 18077
Ticket Information
Number: admin 18077
User: skoranda@gravity.phys.uwm.edu
Email: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu, parmor__AT__gravity.phys.uwm.edu, gskelton__AT__gravity.phys.uwm.edu
Status: resolved
Assigned To: wenger
Date: Wed, 4 Jun 2008 08:05:34 -0500
From: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>, Scott Koranda
<skoranda__AT__gravity.phys.uwm.edu>, Paul Armor
<parmor__AT__gravity.phys.uwm.edu>, Gregory Skelton
<gskelton__AT__gravity.phys.uwm.edu>
Subject: LIGO: condor_schedd periodically crashing
Hi,
The condor_schedd running at UWM on the Nemo cluster head node
marlin.phys.uwm.edu started periodically crashing last night.
It has crashed 6 times in the past 8 hours.
Below are the 6 emails received from Condor about the crashes.
The SchedLog that covers this time period is available at
http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/SchedLog.UWM-Nemo.06-04-08.gz
This is for Condor 7.0.1 running on FC4 x86_64.
I can tell you that this schedd has exclusively been
responsible for a set of 5 large DAGs that were submitted on
5/28 and have been running since. Only the one user has any
jobs in the queue on this schedd at the moment.
There are two other condor_schedd daemons in this pool. One has a
fair number of jobs but is not crashing; the other is empty.
Thanks,
Scott
Date: Tue, 3 Jun 2008 21:39:48 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/3 20:48:12 (pid:3826) Return from Handler <<Negotiator Command>>
6/3 20:48:15 (pid:3826) Calling HandleReq <handle_q> (0)
6/3 20:48:15 (pid:3826) ZKM: setting default map to condor__AT__nemo.phys.uwm.edu
6/3 20:48:15 (pid:3826) Return from HandleReq <handle_q>
6/3 20:48:15 (pid:3826) DaemonCore: pid 4908 exited with status 25600, invoking reaper 3 <child_exit>
6/3 20:48:15 (pid:3826) Shadow pid 4908 for job 2535296.0 exited with status 100
6/3 20:48:15 (pid:3826) DaemonCore: return from reaper for pid 4908
6/3 20:48:15 (pid:3826) Starting add_shadow_birthdate(2535297.0)
6/3 20:48:15 (pid:3826) Started shadow for job 2535297.0 on "<192.168.3.134:60818>", (shadow pid = 4910)
6/3 20:48:15 (pid:3826) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/3 20:48:15 (pid:3826) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/3 20:48:20 (pid:3826) Calling HandleReq <handle_q> (0)
6/3 20:48:20 (pid:3826) ZKM: setting default map to condor__AT__nemo.phys.uwm.edu
6/3 20:48:20 (pid:3826) Return from HandleReq <handle_q>
6/3 20:48:20 (pid:3826) DaemonCore: pid 29052 exited with status 25600, invoking reaper 3 <child_exit>
6/3 20:48:20 (pid:3826) Shadow pid 29052 for job 2526156.0 exited with status 100
6/3 20:48:20 (pid:3826) Checking consistency running and runnable jobs
6/3 20:48:20 (pid:3826) Tables are consistent
6/3 20:48:20 (pid:3826) Rebuilt prioritized runnable job list in 0.005s.
6/3 20:48:20 (pid:3826) DaemonCore: return from reaper for pid 29052
*** End of file SchedLog
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Questions about this message or Condor in general?
Email address of the local Condor administrator: condor-admin__AT__gravity.phys.uwm.edu
The Official Condor Homepage is http://www.cs.wisc.edu/condor
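Note on the log excerpt above: DaemonCore reports the raw wait status (25600) while the schedd reports the shadow's exit code (100). The two agree if 25600 is read as a conventional POSIX-style wait status, where the exit code sits in the high byte. The following is a minimal sketch of that decoding, assuming the usual Linux bit layout; the function name decode_wait_status is made up for illustration and is not part of Condor.

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit wait status into its conventional fields.

    Assumes the common Linux layout: low 7 bits hold the terminating
    signal (0 if none), bit 7 is the core-dump flag, and the next
    byte holds the exit code for a normal exit.
    """
    signal_number = status & 0x7F
    core_dumped = bool(status & 0x80)
    exit_code = (status >> 8) & 0xFF
    exited_normally = signal_number == 0
    return {
        "exited_normally": exited_normally,
        "exit_code": exit_code if exited_normally else None,
        "signal": signal_number or None,
        "core_dumped": core_dumped,
    }


# Status value copied from the SchedLog lines above.
print(decode_wait_status(25600))
```

Under that assumption the shadow exited normally with code 100 and was not killed by a signal, which matches the "exited with status 100" lines.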
Date: Tue, 3 Jun 2008 23:00:18 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/3 22:11:16 (pid:5786) Return from Handler <to startd <192.168.5.112:52989>>
6/3 22:11:16 (pid:5786) Calling Handler <to startd <192.168.3.122:38392>>
6/3 22:11:16 (pid:5786) Return from Handler <to startd <192.168.3.122:38392>>
6/3 22:11:16 (pid:5786) Calling Handler <to startd <192.168.3.200:57500>>
6/3 22:11:16 (pid:5786) Return from Handler <to startd <192.168.3.200:57500>>
6/3 22:11:16 (pid:5786) Calling Handler <to startd <192.168.5.112:52989>>
6/3 22:11:16 (pid:5786) Return from Handler <to startd <192.168.5.112:52989>>
6/3 22:11:18 (pid:5786) Starting add_shadow_birthdate(2526135.0)
6/3 22:11:18 (pid:5786) Started shadow for job 2526135.0 on "<192.168.3.122:38392>", (shadow pid = 6864)
6/3 22:11:18 (pid:5786) Starting add_shadow_birthdate(2526136.0)
6/3 22:11:18 (pid:5786) Started shadow for job 2526136.0 on "<192.168.5.112:52989>", (shadow pid = 6865)
6/3 22:11:18 (pid:5786) Starting add_shadow_birthdate(2526134.0)
6/3 22:11:18 (pid:5786) Started shadow for job 2526134.0 on "<192.168.3.116:51764>", (shadow pid = 6866)
6/3 22:11:18 (pid:5786) Starting add_shadow_birthdate(2526156.0)
6/3 22:11:18 (pid:5786) Started shadow for job 2526156.0 on "<192.168.3.200:57500>", (shadow pid = 6867)
6/3 22:11:18 (pid:5786) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/3 22:11:18 (pid:5786) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/3 22:11:18 (pid:5786) DaemonCore: pid 6867 exited with status 25600, invoking reaper 3 <child_exit>
6/3 22:11:18 (pid:5786) Shadow pid 6867 for job 2526156.0 exited with status 100
6/3 22:11:18 (pid:5786) DaemonCore: return from reaper for pid 6867
*** End of file SchedLog
Date: Wed, 4 Jun 2008 00:20:31 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/3 23:37:47 (pid:7771) Return from Handler <to startd <192.168.4.237:44172>>
6/3 23:37:47 (pid:7771) Calling Handler <to startd <192.168.3.150:37449>>
6/3 23:37:47 (pid:7771) Return from Handler <to startd <192.168.3.150:37449>>
6/3 23:37:47 (pid:7771) Calling Handler <to startd <192.168.5.142:37515>>
6/3 23:37:47 (pid:7771) Return from Handler <to startd <192.168.5.142:37515>>
6/3 23:37:47 (pid:7771) Calling Handler <to startd <192.168.4.237:44172>>
6/3 23:37:47 (pid:7771) Return from Handler <to startd <192.168.4.237:44172>>
6/3 23:37:47 (pid:7771) Calling Handler <to startd <192.168.5.142:37515>>
6/3 23:37:47 (pid:7771) Return from Handler <to startd <192.168.5.142:37515>>
6/3 23:37:49 (pid:7771) Starting add_shadow_birthdate(2526156.0)
6/3 23:37:49 (pid:7771) Started shadow for job 2526156.0 on "<192.168.4.237:44172>", (shadow pid = 8882)
6/3 23:37:49 (pid:7771) Starting add_shadow_birthdate(2526136.0)
6/3 23:37:49 (pid:7771) Started shadow for job 2526136.0 on "<192.168.3.150:37449>", (shadow pid = 8883)
6/3 23:37:49 (pid:7771) Starting add_shadow_birthdate(2526158.0)
6/3 23:37:49 (pid:7771) Started shadow for job 2526158.0 on "<192.168.5.142:37515>", (shadow pid = 8884)
6/3 23:37:49 (pid:7771) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/3 23:37:49 (pid:7771) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/3 23:37:49 (pid:7771) DaemonCore: pid 8882 exited with status 25600, invoking reaper 3 <child_exit>
6/3 23:37:49 (pid:7771) Shadow pid 8882 for job 2526156.0 exited with status 100
6/3 23:37:49 (pid:7771) DaemonCore: return from reaper for pid 8882
*** End of file SchedLog
Date: Wed, 4 Jun 2008 01:40:47 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/4 00:55:12 (pid:9668) Return from Handler <to startd <192.168.5.184:36097>>
6/4 00:55:12 (pid:9668) Calling Handler <to startd <192.168.4.133:37093>>
6/4 00:55:12 (pid:9668) Return from Handler <to startd <192.168.4.133:37093>>
6/4 00:55:12 (pid:9668) Calling Handler <to startd <192.168.6.16:40058>>
6/4 00:55:12 (pid:9668) Return from Handler <to startd <192.168.6.16:40058>>
6/4 00:55:12 (pid:9668) Calling Handler <to startd <192.168.5.158:59265>>
6/4 00:55:12 (pid:9668) Return from Handler <to startd <192.168.5.158:59265>>
6/4 00:55:14 (pid:9668) Starting add_shadow_birthdate(2526135.0)
6/4 00:55:14 (pid:9668) Started shadow for job 2526135.0 on "<192.168.6.16:40058>", (shadow pid = 10775)
6/4 00:55:14 (pid:9668) Starting add_shadow_birthdate(2526134.0)
6/4 00:55:14 (pid:9668) Started shadow for job 2526134.0 on "<192.168.4.133:37093>", (shadow pid = 10776)
6/4 00:55:14 (pid:9668) Starting add_shadow_birthdate(2526156.0)
6/4 00:55:14 (pid:9668) Started shadow for job 2526156.0 on "<192.168.5.158:59265>", (shadow pid = 10777)
6/4 00:55:14 (pid:9668) Starting add_shadow_birthdate(2526136.0)
6/4 00:55:14 (pid:9668) Started shadow for job 2526136.0 on "<192.168.5.184:36097>", (shadow pid = 10778)
6/4 00:55:14 (pid:9668) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/4 00:55:14 (pid:9668) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/4 00:55:14 (pid:9668) DaemonCore: pid 10777 exited with status 25600, invoking reaper 3 <child_exit>
6/4 00:55:14 (pid:9668) Shadow pid 10777 for job 2526156.0 exited with status 100
6/4 00:55:14 (pid:9668) DaemonCore: return from reaper for pid 10777
*** End of file SchedLog
Date: Wed, 4 Jun 2008 03:01:01 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/4 02:16:54 (pid:11584) Return from Handler <to startd <192.168.6.53:55337>>
6/4 02:16:54 (pid:11584) Calling Handler <to startd <192.168.6.57:59200>>
6/4 02:16:54 (pid:11584) Return from Handler <to startd <192.168.6.57:59200>>
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526164.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526164.0 on "<192.168.5.240:33854>", (shadow pid = 12710)
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526156.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526156.0 on "<192.168.4.197:43853>", (shadow pid = 12711)
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526185.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526185.0 on "<192.168.6.53:55337>", (shadow pid = 12712)
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526161.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526161.0 on "<192.168.4.151:54791>", (shadow pid = 12713)
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526188.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526188.0 on "<192.168.6.57:59200>", (shadow pid = 12714)
6/4 02:16:56 (pid:11584) Starting add_shadow_birthdate(2526158.0)
6/4 02:16:56 (pid:11584) Started shadow for job 2526158.0 on "<192.168.4.13:38859>", (shadow pid = 12715)
6/4 02:16:56 (pid:11584) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/4 02:16:56 (pid:11584) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/4 02:16:57 (pid:11584) DaemonCore: pid 12711 exited with status 25600, invoking reaper 3 <child_exit>
6/4 02:16:57 (pid:11584) Shadow pid 12711 for job 2526156.0 exited with status 100
6/4 02:16:57 (pid:11584) DaemonCore: return from reaper for pid 12711
*** End of file SchedLog
Date: Wed, 4 Jun 2008 04:21:16 -0500
From: Condor User <condor__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__gravity.phys.uwm.edu
Subject: [Condor] Problem marlin.phys.uwm.edu: condor_schedd killed (unresponsive)
This is an automated email from the Condor system
on machine "marlin.phys.uwm.edu". Do not reply.
"/opt/condor/sbin/condor_schedd" on "marlin.phys.uwm.edu" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /opt/condor/home/log/SchedLog:
6/4 03:32:21 (pid:13521) Return from Handler <to startd <192.168.5.144:54963>>
6/4 03:32:21 (pid:13521) Calling Handler <to startd <192.168.5.14:54354>>
6/4 03:32:21 (pid:13521) Return from Handler <to startd <192.168.5.14:54354>>
6/4 03:32:21 (pid:13521) Calling Handler <to startd <192.168.4.212:49148>>
6/4 03:32:21 (pid:13521) Return from Handler <to startd <192.168.4.212:49148>>
6/4 03:32:21 (pid:13521) Calling Handler <to startd <192.168.4.246:37819>>
6/4 03:32:21 (pid:13521) Return from Handler <to startd <192.168.4.246:37819>>
6/4 03:32:23 (pid:13521) Starting add_shadow_birthdate(2526136.0)
6/4 03:32:23 (pid:13521) Started shadow for job 2526136.0 on "<192.168.5.14:54354>", (shadow pid = 14515)
6/4 03:32:23 (pid:13521) Starting add_shadow_birthdate(2526156.0)
6/4 03:32:23 (pid:13521) Started shadow for job 2526156.0 on "<192.168.5.144:54963>", (shadow pid = 14516)
6/4 03:32:23 (pid:13521) Starting add_shadow_birthdate(2526161.0)
6/4 03:32:23 (pid:13521) Started shadow for job 2526161.0 on "<192.168.4.212:49148>", (shadow pid = 14517)
6/4 03:32:23 (pid:13521) Starting add_shadow_birthdate(2526158.0)
6/4 03:32:23 (pid:13521) Started shadow for job 2526158.0 on "<192.168.4.246:37819>", (shadow pid = 14518)
6/4 03:32:23 (pid:13521) Sent ad to central manager for jclayton__AT__nemo.phys.uwm.edu
6/4 03:32:23 (pid:13521) Sent ad to 1 collectors for jclayton__AT__nemo.phys.uwm.edu
6/4 03:32:24 (pid:13521) DaemonCore: pid 14516 exited with status 25600, invoking reaper 3 <child_exit>
6/4 03:32:24 (pid:13521) Shadow pid 14516 for job 2526156.0 exited with status 100
6/4 03:32:24 (pid:13521) DaemonCore: return from reaper for pid 14516
*** End of file SchedLog
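Note on the six notifications above: their Date headers can be compared directly to see how regularly the schedd was being killed. The following is a minimal sketch that computes the gaps between consecutive notifications; the timestamps are copied from the Date headers in this ticket, and the variable names are illustrative only.

```python
from datetime import datetime

# Date headers of the six "condor_schedd killed (unresponsive)" emails,
# copied from this ticket (all in the same -0500 time zone).
sent = [
    "2008-06-03 21:39:48",
    "2008-06-03 23:00:18",
    "2008-06-04 00:20:31",
    "2008-06-04 01:40:47",
    "2008-06-04 03:01:01",
    "2008-06-04 04:21:16",
]

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in sent]

# Minutes between each notification and the next one.
gaps_minutes = [
    (later - earlier).total_seconds() / 60
    for earlier, later in zip(times, times[1:])
]

for start, gap in zip(sent, gaps_minutes):
    print(f"{start} -> next kill after {gap:.1f} min")
```

Each gap comes out at a little over 80 minutes. This only describes the spacing of the notifications; it does not by itself identify the cause.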
===========================================================================
Date of creation: Wed Jun 4 8:02:41 2008 (1212584564)
Subject: Actions
Assigned to wenger by wenger
===========================================================================
Date of actions: Thu Jun 5 11:03:11 2008 (1212681791)
Date: Thu, 5 Jun 2008 12:15:34 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
Scott,
> The condor_schedd running at UWM on the Nemo cluster head node
> marlin.phys.uwm.edu started periodically crashing last night.
> It has crashed 6 times in the past 8 hours.
>
> Below are the 6 emails received from Condor about the crashes.
>
> The SchedLog that covers this time period is available at
>
> http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/SchedLog.UWM-Nemo.06-04-08.gz
Okay, I grabbed that log file and we are looking into the problem.
> This is for Condor 7.0.1 running on FC4 x86_64.
>
> I can tell you that this schedd has exclusively been
> responsible for a set of 5 large DAGs that were submitted on
> 5/28 and have been running since. Only the one user has any
> jobs in the queue on this schedd at the moment.
>
> There are two other condor_schedd daemons in this pool. One has a
> fair number of jobs but is not crashing; the other is empty.
Hmm, I wonder if it is some kind of load-related problem, then.
I guess we'll find out.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Thu Jun 5 12:15:37 2008 (1212686138)
Date: Fri, 20 Jun 2008 16:40:35 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: wenger <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
Scott,
> The condor_schedd running at UWM on the Nemo cluster head node
> marlin.phys.uwm.edu started periodically crashing last night.
> It has crashed 6 times in the past 8 hours.
>
> Below are the 6 emails received from Condor about the crashes.
>
> The SchedLog that covers this time period is available at
>
> http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/SchedLog.UWM-Nemo.06-04-08.gz
Did you get rid of that file? Unfortunately I got distracted from this
ticket for a while, and I'm getting back to it now. But I forgot to grab
the SchedLog file earlier.
Sorry about that...
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Fri Jun 20 16:40:36 2008 (1213998037)
Date: Fri, 20 Jun 2008 16:50:58 -0500
From: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: anderson__AT__ligo.caltech.edu, parmor__AT__gravity.phys.uwm.edu,
gskelton__AT__gravity.phys.uwm.edu
Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
Hi,
> Scott,
>
> > The condor_schedd running at UWM on the Nemo cluster head node
> > marlin.phys.uwm.edu started periodically crashing last night.
> > It has crashed 6 times in the past 8 hours.
> >
> > Below are the 6 emails received from Condor about the crashes.
> >
> > The SchedLog that covers this time period is available at
> >
> > http://www.lsc-group.phys.uwm.edu/lscdatagrid/downloads/SchedLog.UWM-Nemo.06-04-08.gz
>
> Did you get rid of that file? Unfortunately I got distracted from this
> ticket for a while, and I'm getting back to it now. But I forgot to grab
> the SchedLog file earlier.
>
> Sorry about that...
>
No worries. It is still there.
The problem was transient. We have not seen it since.
Cheers,
Scott
===========================================================================
Date mail was appended: Fri Jun 20 16:47:56 2008 (1213998480)
Date: Fri, 20 Jun 2008 16:57:15 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
Scott,
> No worries. It is still there.
Hmm, I must have made a typo, because I tried to get it and failed. But I
got it now.
> The problem was transient. We have not seen it since.
Okay, glad to hear that it went away. I'll see what I can figure out.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Fri Jun 20 16:57:19 2008 (1213999039)
Date: Thu, 22 Oct 2009 16:38:52 -0500
From: Scott Koranda <skoranda__AT__gravity.phys.uwm.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Stuart Anderson <anderson__AT__ligo.caltech.edu>
Subject: [condor-admin #18077] LIGO: condor_schedd periodically crashing
>> From: "condor-admin response tracking system" <condor-admin__AT__cs.wisc.edu>
>> Date: June 20, 2008 2:57:19 PM PDT
>> To: skoranda__AT__gravity.phys.uwm.edu
>> Cc: anderson__AT__ligo.caltech.edu, skoranda__AT__gravity.phys.uwm.edu,
>> parmor__AT__gravity.phys.uwm.edu, gskelton__AT__gravity.phys.uwm.edu
>> Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
>> Reply-To: condor-admin__AT__cs.wisc.edu
>>
>> Scott,
>>
>>> No worries. It is still there.
>>
>> Hmm, I must have made a typo, because I tried to get it and failed. But I
>> got it now.
>>
>>> The problem was transient. We have not seen it since.
>>
>> Okay, glad to hear that it went away. I'll see what I can figure out.
>>
>> Kent Wenger
>> Condor Team
>>
I believe this ticket can be closed.
Thanks much,
Scott
===========================================================================
Date mail was appended: Thu Oct 22 16:38:54 2009 (1256247535)
Date: Fri, 23 Oct 2009 09:22:30 -0500 (CDT)
From: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
To: condor-admin response tracking system <condor-admin__AT__cs.wisc.edu>
CC: "R. Kent Wenger" <wenger__AT__cs.wisc.edu>
Subject: Re: [condor-admin #18077] LIGO: condor_schedd periodically crashing
Scott,
> I believe this ticket can be closed.
Okay, I'll do that. Thanks for the update.
Kent Wenger
Condor Team
===========================================================================
Date mail was appended: Fri Oct 23 9:22:36 2009 (1256307759)
Subject: Actions
Ticket resolved by wenger
===========================================================================
Date of actions: Fri Oct 23 9:24:05 2009 (1256307845)