LIGO Support Ticket 7840
Ticket Information
Number: support 7840
User: anderson@ligo.caltech.edu
Email: jabadie__AT__ligo.caltech.edu
Status: resolved
Assigned To: psilord
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: LIGO: matchmaking confusion
Date: Sat, 26 Sep 2009 16:21:44 -0700
CC: Josh Abadie <jabadie__AT__ligo.caltech.edu>
X-Seen-BY: mailfromd 4.1 obsidian.cs.wisc.edu
I am confused about why certain matches are being made for jobs in the
LIGO CIT Condor pool, which is running:
[root@ldas-pcdev1 ~]# condor_version
$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
$CondorPlatform: X86_64-LINUX_RHEL5 $
In particular, jobs in the Vanilla Universe with an ImageSize in
excess of a slot Memory size are being matched.
For example, job 20779335.0 is reporting ImageSize = 5000000 (i.e.,
5GB) and is being matched to a machine ad that has Memory = 2500
(i.e., 2.5 GB).
The MatchLog on the central manager reports,
9/26 02:08:18 Matched 20779335.0 acsearle@ligo <10.14.0.18:55622> preempting none <10.14.2.158:46819> slot5__AT__node408.ldas-cit.ligo.caltech.edu
while this slot is advertising 2.5GByte,
[root@ldas-pcdev1 ~]# condor_status node408 | egrep "slot5|Mem"
Name                    OpSys  Arch    State    Activity  LoadAv  Mem   ActvtyTime
slot5__AT__node408.ldas LINUX  X86_64  Claimed  Busy      1.120   2500  0+00:08:48
however, the schedd shows 5GByte,
ID          OWNER     SUBMITTED   RUN_TIME    ST  PRI  SIZE    CMD
20779335.0  acsearle  9/25 15:56  0+09:39:41  H   0    4882.8  wpipeline search -
as confirmed with,
[root@ldas-pcdev1 ~]# condor_q -long 20779335.0 | grep Image
RequestMemory = ceiling(ImageSize / 1024.000000)
ImageSize_RAW = 4771600
ImageSize = 5000000
This is resulting in lots of job failures and futile job restarts on
other slots.
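
As a rough cross-check of the numbers above, the following Python sketch reproduces the RequestMemory arithmetic from the job ad and compares it with the Memory value advertised by slot5. It assumes ImageSize is reported in KiB and Memory in MB, which is consistent with the division by 1024 in the RequestMemory expression. The helper names request_memory_mb and fits are hypothetical and are not part of Condor.

```python
import math

# Values copied from the condor_q and condor_status output in this ticket.
IMAGE_SIZE_KIB = 5000000  # job 20779335.0 ImageSize (assumed to be in KiB)
SLOT_MEMORY_MB = 2500     # Memory advertised by slot5 on node408 (assumed MB)


def request_memory_mb(image_size_kib):
    """Mirror the job ad's RequestMemory = ceiling(ImageSize / 1024)."""
    return math.ceil(image_size_kib / 1024.0)


def fits(image_size_kib, slot_memory_mb):
    """Hypothetical check: does the requested memory fit in the slot?"""
    return request_memory_mb(image_size_kib) <= slot_memory_mb


requested = request_memory_mb(IMAGE_SIZE_KIB)
print("RequestMemory (MB):", requested)
print("Fits in slot5:", fits(IMAGE_SIZE_KIB, SLOT_MEMORY_MB))
```

Under these assumptions the job needs about 4883 MB, nearly twice the 2500 MB that slot5 advertises, and this agrees with the 4882.8 shown in the SIZE column of condor_q.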
Here is the full Machine ClassAd (including the other slots on the
same machine), followed by the full job ClassAd. Any help in
understanding whether we have misconfigured our pool or if this is a
bug would be greatly appreciated.
Thanks.
[root@ldas-pcdev1 ~]# condor_status -long node408
MyType = "Machine"
TargetType = "Job"
Name = "slot1__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006648
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 1
VirtualMachineID = 1
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.110000
LoadAvg = 1.110000
KeyboardIdle = 156019
ConsoleIdle = 156019
Memory = 800
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1254006097
Activity = "Busy"
EnteredCurrentActivity = 1254006097
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 209
TotalTimeClaimedIdle = 1029
TotalTimeClaimedBusy = 96912
TotalTimeBackfillIdle = 330
TotalTimeBackfillBusy = 57534
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "omega@ligo"
RemoteOwner = "omega@ligo"
ClientMachine = "node499.ldas-cit.ligo.caltech.edu"
JobId = "103482.0"
GlobalJobId = "node499.ldas-cit.ligo.caltech.edu#103482.0#1254006096"
JobStart = 1254006097
ImageSize = 892456
TotalJobRunTime = 551
TotalClaimRunTime = 551
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 855
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006648
UpdatesTotal = 839
UpdatesSequenced = 839
UpdatesLost = 9
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot2__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006649
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 2
VirtualMachineID = 2
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.120000
KeyboardIdle = 156019
ConsoleIdle = 156019
Memory = 800
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1254006124
Activity = "Busy"
EnteredCurrentActivity = 1254006124
TotalTimeOwnerIdle = 6
TotalTimeUnclaimedIdle = 1
TotalTimeMatchedIdle = 380
TotalTimeClaimedIdle = 1918
TotalTimeClaimedBusy = 89073
TotalTimePreemptingVacating = 2
TotalTimeBackfillIdle = 465
TotalTimeBackfillBusy = 64176
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "omega@ligo"
RemoteOwner = "omega@ligo"
ClientMachine = "node499.ldas-cit.ligo.caltech.edu"
JobId = "103485.0"
GlobalJobId = "node499.ldas-cit.ligo.caltech.edu#103485.0#1254006107"
JobStart = 1254006124
ImageSize = 964512
TotalJobRunTime = 525
TotalClaimRunTime = 525
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 1040
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006649
UpdatesTotal = 1023
UpdatesSequenced = 1024
UpdatesLost = 4
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot3__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006650
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 3
VirtualMachineID = 3
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.120000
KeyboardIdle = 156019
ConsoleIdle = 156019
Memory = 1400
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1254004277
Activity = "Busy"
EnteredCurrentActivity = 1254004277
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 258
TotalTimeClaimedIdle = 1168
TotalTimeClaimedBusy = 83056
TotalTimePreemptingVacating = 1
TotalTimeBackfillIdle = 307
TotalTimeBackfillBusy = 71226
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "omega@ligo"
RemoteOwner = "omega@ligo"
ClientMachine = "node499.ldas-cit.ligo.caltech.edu"
JobId = "103351.0"
GlobalJobId = "node499.ldas-cit.ligo.caltech.edu#103351.0#1254004265"
JobStart = 1254004277
ImageSize = 948788
TotalJobRunTime = 2373
TotalClaimRunTime = 2373
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 833
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006650
UpdatesTotal = 817
UpdatesSequenced = 818
UpdatesLost = 5
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot4__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006651
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 4
VirtualMachineID = 4
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.120000
KeyboardIdle = 156019
ConsoleIdle = 156019
Memory = 1400
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1254004277
Activity = "Busy"
EnteredCurrentActivity = 1254004277
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 179
TotalTimeClaimedIdle = 905
TotalTimeClaimedBusy = 85251
TotalTimeBackfillIdle = 254
TotalTimeBackfillBusy = 69427
TotalTimeBackfillKilling = 1
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "omega@ligo"
RemoteOwner = "omega@ligo"
ClientMachine = "node499.ldas-cit.ligo.caltech.edu"
JobId = "103352.0"
GlobalJobId = "node499.ldas-cit.ligo.caltech.edu#103352.0#1254004265"
JobStart = 1254004277
ImageSize = 966696
TotalJobRunTime = 2374
TotalClaimRunTime = 2374
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 748
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006651
UpdatesTotal = 738
UpdatesSequenced = 739
UpdatesLost = 5
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot5__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006652
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 5
VirtualMachineID = 5
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.120000
KeyboardIdle = 156019
ConsoleIdle = 156019
Memory = 2500
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1254006124
Activity = "Busy"
EnteredCurrentActivity = 1254006124
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 89
TotalTimeClaimedIdle = 443
TotalTimeClaimedBusy = 137763
TotalTimePreemptingVacating = 1075
TotalTimeBackfillIdle = 1361
TotalTimeBackfillBusy = 15287
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "omega@ligo"
RemoteOwner = "omega@ligo"
ClientMachine = "node499.ldas-cit.ligo.caltech.edu"
JobId = "103486.0"
GlobalJobId = "node499.ldas-cit.ligo.caltech.edu#103486.0#1254006107"
JobStart = 1254006124
ImageSize = 1026692
TotalJobRunTime = 528
TotalClaimRunTime = 528
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 1092
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006652
UpdatesTotal = 1083
UpdatesSequenced = 1083
UpdatesLost = 2
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot6__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006653
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 6
VirtualMachineID = 6
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.120000
KeyboardIdle = 156024
ConsoleIdle = 156024
Memory = 2500
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1253951079
Activity = "Busy"
EnteredCurrentActivity = 1253951085
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 48
TotalTimeClaimedIdle = 409
TotalTimeClaimedBusy = 146838
TotalTimePreemptingVacating = 76
TotalTimeBackfillIdle = 576
TotalTimeBackfillBusy = 8072
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "acsearle@ligo"
RemoteOwner = "acsearle@ligo"
ClientMachine = "ldas-pcdev1.ligo.caltech.edu"
JobId = "20780625.0"
GlobalJobId = "ldas-pcdev1.ligo.caltech.edu#20780625.0#1253919662"
JobStart = 1253951085
ImageSize = 4435860
TotalJobRunTime = 55568
TotalClaimRunTime = 55568
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 759
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006653
UpdatesTotal = 751
UpdatesSequenced = 750
UpdatesLost = 1
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot7__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006654
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 7
VirtualMachineID = 7
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.140000
LoadAvg = 1.140000
KeyboardIdle = 156024
ConsoleIdle = 156024
Memory = 2500
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1253989744
Activity = "Busy"
EnteredCurrentActivity = 1253989746
TotalTimeOwnerIdle = 7
TotalTimeUnclaimedIdle = 1
TotalTimeUnclaimedBenchmarking = 5
TotalTimeMatchedIdle = 526
TotalTimeClaimedIdle = 2975
TotalTimeClaimedBusy = 117323
TotalTimeClaimedRetiring = 1
TotalTimePreemptingVacating = 2562
TotalTimeBackfillIdle = 3562
TotalTimeBackfillBusy = 29059
TotalTimeBackfillKilling = 5
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "acsearle@ligo"
RemoteOwner = "acsearle@ligo"
ClientMachine = "ldas-pcdev1.ligo.caltech.edu"
JobId = "20781154.0"
GlobalJobId = "ldas-pcdev1.ligo.caltech.edu#20781154.0#1253919741"
JobStart = 1253989745
ImageSize = 1491988
TotalJobRunTime = 16908
TotalClaimRunTime = 16908
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 1913
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006654
UpdatesTotal = 1905
UpdatesSequenced = 1904
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"
MyType = "Machine"
TargetType = "Job"
Name = "slot8__AT__node408.ldas-cit.ligo.caltech.edu"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
MyCurrentTime = 1254006647
Machine = "node408.ldas-cit.ligo.caltech.edu"
PublicNetworkIpAddr = "<10.14.2.158:46819>"
DedicatedScheduler = "DedicatedScheduler__AT__ldas-pcdev1.ligo.caltech.edu"
COLLECTOR_HOST_STRING = "ldas-condori"
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
SlotID = 8
VirtualMachineID = 8
ExecutableSize = 10
JobUniverse = 5
NiceUser = FALSE
VirtualMemory = 2047981
TotalDisk = 103849644
Disk = 12981206
CondorLoadAvg = 1.120000
LoadAvg = 1.140000
KeyboardIdle = 828
ConsoleIdle = 156019
Memory = 4150
Cpus = 1
StartdIpAddr = "<10.14.2.158:46819>"
Arch = "X86_64"
OpSys = "LINUX"
UidDomain = "ligo"
FileSystemDomain = "ligo"
HasIOProxy = TRUE
CheckpointPlatform = "LINUX X86_64 2.6.x normal 0xffffffffff600000"
TotalVirtualMemory = 16383848
TotalCpus = 8
TotalMemory = 16058
KFlops = 1529347
Mips = 4471
LastBenchmark = 1253937262
TotalLoadAvg = 9.000000
TotalCondorLoadAvg = 8.980000
ClockMin = 970
ClockDay = 6
TotalSlots = 8
TotalVirtualMachines = 8
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_0"
JavaMFlops = 665.608765
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,HasRemoteSyscalls,HasCheckpointing"
HasVM = FALSE
HibernationLevel = 0
HibernationState = "NONE"
CanHibernate = TRUE
HardwareAddress = "00:23:8b:77:10:a1"
Subnet = "255.255.0.0"
IsWakeOnLanSupported = TRUE
IsWakeOnLanEnabled = TRUE
IsWakeAble = TRUE
WakeOnLanSupportedFlags = "Magic Packet"
WakeOnLanEnabledFlags = "Magic Packet"
CpuBusyTime = 0
CpuIsBusy = FALSE
TimeToLive = 2147483647
State = "Claimed"
EnteredCurrentState = 1253919428
Activity = "Busy"
EnteredCurrentActivity = 1253919437
TotalTimeOwnerIdle = 6
TotalTimeMatchedIdle = 10
TotalTimeClaimedIdle = 150
TotalTimeClaimedBusy = 152682
TotalTimeBackfillIdle = 29
TotalTimeBackfillBusy = 3141
TotalTimeBackfillKilling = 1
Start = TRUE
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = 0
LastFetchWorkSpawned = 0
LastFetchWorkCompleted = 0
NextFetchWorkDelay = -1
CurrentRank = 0.000000
RemoteUser = "acsearle@ligo"
RemoteOwner = "acsearle@ligo"
ClientMachine = "ldas-pcdev1.ligo.caltech.edu"
JobId = "20779361.0"
GlobalJobId = "ldas-pcdev1.ligo.caltech.edu#20779361.0#1253919425"
JobStart = 1253919437
ImageSize = 2240096
TotalJobRunTime = 87210
TotalClaimRunTime = 87210
MonitorSelfTime = 1254006634
MonitorSelfCPUUsage = 0.204175
MonitorSelfImageSize = 28584.000000
MonitorSelfResidentSetSize = 4820
MonitorSelfAge = 0
MonitorSelfRegisteredSocketCount = 10
DaemonStartTime = 1253850620
UpdateSequenceNumber = 543
MyAddress = "<10.14.2.158:46819>"
LastHeardFrom = 1254006647
UpdatesTotal = 533
UpdatesSequenced = 533
UpdatesLost = 6
UpdatesHistory = "0x00000000000000000000000000000000"
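
The eight machine ads above can be summarized with a small Python sketch that lists, for each slot on node408, the advertised Memory next to the ImageSize the same ad reports for its running job. The figures are copied from the ads. The unit assumptions (Memory in MB, ImageSize in KiB) and the helper name oversubscribed are illustrative and are not taken from Condor itself.

```python
import math

# (SlotID, Memory in MB, ImageSize in KiB of the job running in that slot),
# copied from the condor_status -long output for node408 above.
SLOTS = [
    (1, 800, 892456),
    (2, 800, 964512),
    (3, 1400, 948788),
    (4, 1400, 966696),
    (5, 2500, 1026692),
    (6, 2500, 4435860),
    (7, 2500, 1491988),
    (8, 4150, 2240096),
]


def oversubscribed(slots):
    """Return (slot_id, memory_mb, needed_mb) for slots whose running job
    has an image size, rounded up to whole MB, above the slot's Memory."""
    result = []
    for slot_id, memory_mb, image_kib in slots:
        needed_mb = math.ceil(image_kib / 1024.0)
        if needed_mb > memory_mb:
            result.append((slot_id, memory_mb, needed_mb))
    return result


for slot_id, memory_mb, needed_mb in oversubscribed(SLOTS):
    print("slot%d: Memory=%d MB, job image ~%d MB" % (slot_id, memory_mb, needed_mb))

print("Sum of slot Memory (MB):", sum(memory for _, memory, _ in SLOTS))
```

With these figures, slots 1, 2 and 6 are each running a job whose image exceeds the slot's Memory. The slot Memory values sum to 16050 MB against an advertised TotalMemory of 16058, and the largest slot (4150 MB) is still smaller than the roughly 4883 MB that job 20779335.0 would request.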
[root@ldas-pcdev1 ~]# condor_q -long 20779335.0
-- Submitter: ldas-pcdev1.ligo.caltech.edu : <10.14.0.18:55622> : ldas-pcdev1.ligo.caltech.edu
MyType = "Job"
TargetType = "Machine"
ClusterId = 20779335
QDate = 1253919419
CompletionDate = 0
Owner = "acsearle"
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
Notification = ERROR
WantBadgers = TRUE
JOB_LEASE_DURATION = 3600
copy_to_spool = TRUE
CondorVersion = "$CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $"
CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL5 $"
RootDir = "/"
Iwd = "/mnt/qfs3/acsearle/omega/analyses/prc/JW1/H1L1V1/JW1_SGQ9/+0+0+0/3.350/."
JobUniverse = 5
Cmd = "/archive/home/acsearle/opt/omega/adaptive/bin/wpipeline"
MinHosts = 1
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
JobPrio = 0
User = "acsearle@ligo"
NiceUser = FALSE
Environment = "_CONDOR_ANCESTOR_19859=19860:1253853850:637169728
SHLIB_PATH=/opt/ldg-4.8/globus/lib LSCSOFTCVS=:pserver:acsearle__AT__gravity.phys.uwm.edu
:2402/usr/local/cvs/lscsoft GLOBUS_OPTIONS=-Xms256M' '-Xmx1024M
PAC_ANCHOR=/opt/ldg-4.8 SHLVL=4 PWD=/archive/home/acsearle/omega/
analyses/prc/JW1/H1L1V1/JW1_SGQ9/+0+0+0/3.350
LSC_DATAGRID_SERVER_LOCATION=/opt/ldg VDT_LOCATION=/opt/ldg-4.8
SSH_CLIENT=10.14.0.28' '55922' '22
_CONDOR_ANCESTOR_19860=14284:1253919379:2685419256 CVS_RSH=ssh
VDT_POSTINSTALL_README=/opt/ldg-4.8/post-install/README PATH=/archive/
home/omega/opt/omega/bin:/archive/home/acsearle/local/bin:/opt/lscsoft/
lalapps/bin:/opt/lscsoft/lalstochastic/bin:/opt/lscsoft/lal/bin:/opt/
lscsoft/glue/bin:/opt/lscsoft/libframe/bin:/opt/lscsoft/libmetaio/bin:/
opt/lscsoft/framecpp/bin:/opt/lscsoft/dol/bin:/opt/lscsoft/root/bin:/
opt/ldg-4.8/apache/bin:/opt/ldg-4.8/pyglobus-url-copy/bin:/opt/ldg-4.8/
unixodbc/bin:/opt/ldg-4.8/edg/sbin:/opt/ldg-4.8/glite/sbin:/opt/
ldg-4.8/glite/bin:/usr/sbin:/usr/bin:/opt/ldg-4.8/ant/bin:/opt/ldg-4.8/
jdk1.5/bin:/opt/ldg-4.8/mysql5/bin:/opt/ldg-4.8/wget/bin:/opt/ldg-4.8/
logrotate/sbin:/opt/ldg-4.8/gpt/sbin:/opt/ldg-4.8/globus/bin:/opt/
ldg-4.8/globus/sbin:/opt/pacman-3.28/bin:/opt/ldg-4.8/vdt/sbin:/opt/
ldg-4.8/vdt/bin:/opt/ldg-4.8/ldg-server/bin:/usr/kerberos/bin:/usr/
bin:/bin:/usr/sbin:/sbin:/ligotools/bin:/ldcg/matlab_r2008a/bin
LALSTOCHASTIC_LOCATION=/opt/lscsoft/lalstochastic GLITE_LOCATION_LOG=/
opt/ldg-4.8/glite/log VDT_INSTALL_LOG=vdt-install.log
S6_SEGMENT_SERVER=ldbdi://10.14.20.73 ROOTSYS=/opt/lscsoft/root
VDTSETUP_CONDOR_LOCATION=/usr DYLD_LIBRARY_PATH=/opt/lscsoft/
lalstochastic/lib64:/opt/lscsoft/lal/lib64:/opt/lscsoft/glue/lib64/
python2.4/site-packages:/opt/lscsoft/framecpp/lib64:/opt/ldg-4.8/
globus/lib GLOBUS_PATH=/opt/ldg-4.8/globus X509_CERT_DIR=/opt/ldg-4.8/
globus/TRUSTED_CA GLOBUS_TCP_PORT_RANGE=40000,45000 LAL_PREFIX=/opt/
lscsoft/lal LIGO_DATAFIND_SERVER=10.14.20.73:80 VOMS_USERCONF=/opt/
ldg-4.8/glite/etc/vomses ONLINEDQ=/online/DQ LDG_SOFTWARE_LOCATION=http://www.ldas-sw.ligo.caltech.edu/ldg_dist/ldg4.8/software
INPUTRC=/etc/inputrc ONLINEHOFT=/online/frames/hoft LSCSOFT_PREFIX=/
opt/lscsoft ROOT_LOCATION=/opt/lscsoft/root PKG_CONFIG_PATH=/opt/
lscsoft/lalstochastic/lib64/pkgconfig:/opt/lscsoft/lal/lib64/
pkgconfig:/opt/lscsoft/libframe/lib64/pkgconfig:/opt/lscsoft/libmetaio/
lib64/pkgconfig:/opt/lscsoft/framecpp/lib64/pkgconfig:/opt/lscsoft/dol/
lib64/pkgconfig:/opt/lscsoft/root/lib64/pkgconfig: GLITE_LOCATION_TMP=/
opt/ldg-4.8/glite/tmp LIGOTOOLS=/ligotools SSH_TTY=/dev/pts/0
GLITE_LOCATION=/opt/ldg-4.8/glite GLITE_LOCATION_VAR=/opt/ldg-4.8/
glite/var LIBPATH=/opt/ldg-4.8/globus/lib:/usr/lib:/lib SHELL=/bin/
bash LDG_INSTALL_LOG=/opt/ldg-4.8/ldg-server/etc/ldg-install.log
FRAMECPP_PREFIX=/opt/lscsoft/framecpp LDG_DIRECTORY=/opt/ldg-4.8/ldg-
server MAIL=/var/spool/mail/acsearle VDTSETUP_CONDOR_CONFIG=/usr1/
condor/condor_config VOMS_LOCATION=/opt/ldg-4.8/glite
_CONDOR_MAX_DAGMAN_LOG=0 MANPATH=/opt/lscsoft/lalapps/share/man:/opt/
lscsoft/lalstochastic/share/man:/opt/lscsoft/lal/share/man:/opt/
lscsoft/libframe/man:/opt/lscsoft/libmetaio/man:/opt/lscsoft/framecpp/
share/man:/opt/lscsoft/root/man:/usr/man:/opt/ldg-4.8/globus/man::/opt/
ldg-4.8/vdt/man:/opt/ldg-4.8/perl/man:/opt/ldg-4.8/expat/man:/opt/
ldg-4.8/logrotate/man:/opt/ldg-4.8/wget/man:/opt/ldg-4.8/mysql5/man:/
opt/ldg-4.8/jdk1.5/man:/opt/ldg-4.8/glite/share/man:/opt/ldg-4.8/edg/
share/man:/opt/ldg-4.8/apache/man GLUE_PREFIX=/opt/lscsoft/glue
MYSQL_UNIX_PORT=/opt/ldg-4.8/vdt-app-data/mysql5/var/mysql.sock
LALSTOCHASTIC_PREFIX=/opt/lscsoft/lalstochastic PERL5LIB=/opt/ldg-4.8/
ldg-server/lib:/opt/ldg-4.8/perl/lib/5.8.0:/opt/ldg-4.8/perl/lib/5.8.0/
x86_64-linux-thread-multi:/opt/ldg-4.8/perl/lib/site_perl/5.8.0:/opt/
ldg-4.8/perl/lib/site_perl/5.8.0/x86_64-linux-thread-multi:/opt/
ldg-4.8/vdt/lib: GLUE_LOCATION=/opt/lscsoft/glue ANT_HOME=/opt/ldg-4.8/
ant USER=acsearle SSH_CONNECTION=10.14.0.28' '55922' '10.14.0.18' '22
DOL_LOCATION=/opt/lscsoft/dol LD_LIBRARY_PATH=/archive/home/acsearle/
opt/sqlite/lib:/archive/home/acsearle/opt/lscsoft/lal/lib:/opt/lscsoft/
lalstochastic/lib64:/opt/lscsoft/lal/lib64:/opt/lscsoft/glue/lib64/
python2.4/site-packages:/opt/lscsoft/libframe/lib64:/opt/lscsoft/
libmetaio/lib64:/opt/lscsoft/framecpp/lib64:/opt/lscsoft/dol/lib64:/
opt/lscsoft/root/lib64:/opt/ldg-4.8/tclglobus/lib:/opt/ldg-4.8/apache/
lib:/opt/ldg-4.8/myodbc/lib:/opt/ldg-4.8/unixodbc/lib:/opt/ldg-4.8/
glite/lib64:/opt/ldg-4.8/glite/lib:/opt/ldg-4.8/jdk1.5/jre/lib/i386:/
opt/ldg-4.8/jdk1.5/jre/lib/i386/server:/opt/ldg-4.8/jdk1.5/jre/lib/
i386/client:/opt/ldg-4.8/mysql5/lib/mysql:/opt/ldg-4.8/berkeley-db/
lib:/opt/ldg-4.8/expat/lib:/opt/ldg-4.8/globus/lib:/opt/ldg-4.8/globus/
lib:/opt/ldg-4.8/globus/lib:/ligotools/lib HOSTNAME=ldas-pcdev1
PYTHONPATH=/opt/lscsoft/lalapps/lib64/python2.4/site-packages:/opt/
lscsoft/lalapps/lib/python2.4/site-packages:/opt/lscsoft/glue/lib64/
python2.4/site-packages:/opt/lscsoft/glue/lib/python2.4/site-packages:/
opt/lscsoft/libframe/lib64/python:/opt/lscsoft/libmetaio/lib64/python:/
opt/ldg-4.8/globus/lib64/python: X509_CADIR=/opt/ldg-4.8/globus/
TRUSTED_CA CATALINA_OPTS=-Dorg.globus.wsrf.container.persistence.dir=/
opt/ldg-4.8/vdt-app-data/globus/persisted ODBCINI=/opt/ldg-4.8/
unixodbc/etc/odbc.ini LAL_LOCATION=/opt/lscsoft/lalapps HOME=/archive/
home/acsearle _CONDOR_DAGMAN_CONFIG_FILE=/mnt/qfs3/acsearle/omega/
analyses/prc/JW1/H1L1V1/JW1_SGQ9/+0+0+0/3.350/dagman.config
LOGNAME=acsearle EDG_LOCATION=/opt/ldg-4.8/edg GPT_LOCATION=/opt/
ldg-4.8/gpt MATLABPATH=/ligotools/matlab _=/usr/bin/condor_submit
GLOBUS_ERROR_VERBOSE=true JAVA_HOME=/opt/ldg-4.8/jdk1.5
CONDOR_ID=20779215.0 G_BROKEN_FILENAMES=1 FRAMECPP_LOCATION=/opt/
lscsoft/framecpp GLOBUS_LOCATION=/opt/ldg-4.8/globus LANG=C
CONDOR_PARENT_ID=ldas-pcdev1:14284:1253919379 CONDOR_CONFIG=/usr1/
condor/condor_config HISTSIZE=1000 LSC_SEGFIND_SERVER=ldas-
cit.ligo.caltech.edu MATLAB=/ldcg/matlab_r2008a GLOBUS_MYSQL_PATH=/opt/
ldg-4.8/mysql5 PACMAN_LOCATION=/opt/pacman-3.28 CONDOR_LOCATION=/usr
X509_VOMS_DIR=/opt/ldg-4.8/glite/vomsdir TERM=xterm-color
LDG_LOCATION=/opt/ldg-4.8
CVSROOT=:pserver:acsearle__AT__gravity.phys.uwm.edu:2402/usr/local/cvs/
lscsoft LSCSOFTPREFIX=/archive/home/acsearle/opt/lscsoft LESSOPEN=|/
usr/bin/lesspipe.sh' '%s X509_USER_PROXY=/tmp/x509up_p13922.fileo94RhR.
1 _CONDOR_DAGMAN_LOG=/mnt/qfs3/acsearle/omega/analyses/prc/JW1/H1L1V1/
JW1_SGQ9/+0+0+0/3.350/analysis.dag.dagman.out LSC_DATAFIND_SERVER=ldas-
cit.ligo.caltech.edu"
JobNotification = 0
WantRemoteIO = FALSE
UserLog = "/usr1/acsearle/JW1/H1L1V1/JW1_SGQ9/+0+0+0/3.350/analysis.dag.log"
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
StreamIn = FALSE
Out = "./output/870144795-870148395/stdout.txt"
StreamOut = FALSE
Err = "./output/870144795-870148395/stderr.txt"
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "NO"
TransferFiles = "NEVER"
ExecutableSize_RAW = 9
ExecutableSize = 10
DiskUsage_RAW = 9
DiskUsage = 10
RequestMemory = ceiling(ImageSize / 1024.000000)
RequestDisk = DiskUsage
Requirements = (Memory >= 2048) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
FileSystemDomain = "ligo"
JobLeaseDuration = 3600
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
OnExitHold = FALSE
OnExitRemove = TRUE
LeaveJobInQueue = FALSE
Args = "search -p parameters.txt -f framecache.txt -o ./output/870144795-870148395 -t/usr1/acsearle/JW1/H1L1V1/JW1_SGQ9/+0+0+0/3.350 870144795 870148395"
DAGNodeName = "870144795-870148395"
DAGParentNodeNames = ""
DAGManJobId = 20779215
GlobalJobId = "ldas-pcdev1.ligo.caltech.edu#20779335.0#1253919422"
ProcId = 0
AutoClusterId = 8
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,JobStart,DiskUsage,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
JobStartDate = 1253919438
ImageSize_RAW = 4771600
ImageSize = 5000000
LastHoldReason = "The system macro SYSTEM_PERIODIC_HOLD expression '((NumShadowExceptions > 20) || (NumShadowStarts > 100))' evaluated to TRUE"
LastHoldReasonCode = 3
LastHoldReasonSubCode = 0
LastRejMatchReason = "no match found"
LastRejMatchTime = 1254006151
WantMatchDiagnostics = TRUE
LastMatchTime = 1254006151
NumJobMatches = 93
OrigMaxHosts = 1
StartdPrincipal = "10.14.2.92"
JobLastStartDate = 1254006124
JobCurrentStartDate = 1254006151
NumShadowStarts = 6
JobRunCount = 6
NumJobStarts = 93
LastJobLeaseRenewal = 1254006154
RemoteSysCpu = 0.000000
RemoteUserCpu = 1.000000
LastVacateTime = 1254006154
BytesSent = 0.000000
BytesRecvd = 0.000000
RemoteWallClockTime = 34781.000000
LastRemoteHost = "slot6__AT__node342.ldas-cit.ligo.caltech.edu"
LastPublicClaimId = "<10.14.2.92:46549>#1253850619#2039#..."
LastPublicClaimIds = ""
CurrentHosts = 0
LastSuspensionTime = 0
MaxHosts = 1
HoldReasonCode = 1
JobStatus = 5
HoldReason = "via condor_hold (by user condor)"
EnteredCurrentStatus = 1254006173
LastReleaseReason = "via condor_release (by user condor)"
ServerTime = 1254006958
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date of creation: Sat Sep 26 18:21:58 2009 (1254007322)
Subject: Actions
Assigned to psilord by psilord
===========================================================================
Date of actions: Mon Sep 28 10:10:33 2009 (1254150633)
Date: Mon, 28 Sep 2009 10:18:50 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #7840] LIGO: matchmaking confusion
Hello,
> From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
>
> I am confused about why certain matches are being made for jobs in the
> LIGO CIT condor pool running,
I looked at the Start attribute in the machine ads you sent me, and in
each of them it was:
Start = TRUE
which explains why they are running the jobs.
Running 'condor_config_val -v start' will tell you where the errant policy is defined.
Thank you.
-pete
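The effect of Start = TRUE can be illustrated with a toy model of
two-sided matching, in which a match needs both the job's Requirements
and the machine's Start policy to accept. This is only a sketch in
Python, not the ClassAd evaluator: the function names are hypothetical,
the machine's Disk and FileSystemDomain values are assumed for
illustration, and the stricter Start policy at the end is a made-up
example rather than a recommended configuration.

```python
# Toy model of two-sided matching. NOT the real ClassAd evaluator.

# Job attributes taken from the job ad in this ticket (ImageSize in KB).
job = {"ImageSize": 5000000, "DiskUsage": 10, "FileSystemDomain": "ligo"}

# Machine attributes: Memory (MB), Arch and OpSys are from the ticket;
# Disk and FileSystemDomain are ASSUMED values for illustration.
machine = {
    "Memory": 2500,
    "Arch": "X86_64",
    "OpSys": "LINUX",
    "Disk": 1000000,             # assumption
    "FileSystemDomain": "ligo",  # assumption
}


def job_requirements(my, target):
    """The job's Requirements expression as submitted (no ImageSize clause)."""
    return (
        target["Memory"] >= 2048
        and target["Arch"] == "X86_64"
        and target["OpSys"] == "LINUX"
        and target["Disk"] >= my["DiskUsage"]
        and target["FileSystemDomain"] == my["FileSystemDomain"]
    )


def start_true(my, target):
    """Machine policy Start = TRUE: accepts every job."""
    return True


def start_with_memory_check(my, target):
    """Hypothetical stricter policy: the job image must fit in slot memory."""
    return my["Memory"] * 1024 >= target["ImageSize"]


def matches(job_ad, machine_ad, start_policy):
    """A match needs both sides to accept each other."""
    return job_requirements(job_ad, machine_ad) and start_policy(machine_ad, job_ad)


print(matches(job, machine, start_true))               # machine never objects
print(matches(job, machine, start_with_memory_check))  # machine rejects the job
```

With Start = TRUE the only memory test left is whatever the job itself
puts in its Requirements.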
===========================================================================
Date mail was appended: Mon Sep 28 10:18:56 2009 (1254151136)
Date: Wed, 7 Oct 2009 00:36:04 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: psilord <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #7840] LIGO: matchmaking confusion
Hello,
Did you get my response to this ticket I had sent some days ago?
Thank you.
-pete
===========================================================================
Date mail was appended: Wed Oct 7 0:36:11 2009 (1254893771)
CC: jabadie__AT__ligo.caltech.edu
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-support__AT__cs.wisc.edu
Subject: Re: [condor-support #7840] LIGO: matchmaking confusion
Date: Tue, 6 Oct 2009 22:54:52 -0700
X-Seen-BY: mailfromd 4.1 silica.cs.wisc.edu
The specific job in this ticket is probably doing the right thing
because of the Requirements statement that did not have our normal
expression of "&& ((Memory * 1024) >= ImageSize)". I will follow up
with the user to see if/how they did that, but for now please go ahead
and close this ticket.
Thanks.
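The numbers in this ticket make the difference concrete. The sketch
below is plain Python arithmetic, with variable names chosen here for
illustration. It compares the Memory clause the job actually carried
against the usual clause quoted above, assuming Memory is in MB and
ImageSize is in KB as stated in the original report.

```python
import math

memory_mb = 2500          # Memory advertised by the matched slot
image_size_kb = 5000000   # ImageSize from the job ad

# RequestMemory = ceiling(ImageSize / 1024.000000), in MB
request_memory_mb = math.ceil(image_size_kb / 1024.0)

# Clause present in this job's Requirements
job_clause = memory_mb >= 2048

# Usual clause that was missing: (Memory * 1024) >= ImageSize
usual_clause = (memory_mb * 1024) >= image_size_kb

print("RequestMemory (MB):", request_memory_mb)
print("Memory >= 2048:", job_clause)
print("(Memory * 1024) >= ImageSize:", usual_clause)
```

The submitted clause is satisfied by a 2500 MB slot, while the usual
clause would have rejected it, since 2500 * 1024 = 2560000 is less than
5000000.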
On Oct 6, 2009, at 10:36 PM, condor-support response tracking system
wrote:
> Hello,
>
> Did you get my response to this ticket I had sent some days ago?
>
> Thank you.
>
> -pete
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
===========================================================================
Date mail was appended: Wed Oct 7 0:55:09 2009 (1254894909)
Date: Wed, 7 Oct 2009 01:00:19 -0500
From: Peter Keller <psilord__AT__cs.wisc.edu>
To: condor-support response tracking system <condor-support__AT__cs.wisc.edu>
Subject: Re: [condor-support #7840] LIGO: matchmaking confusion
On Wed, Oct 07, 2009 at 12:55:09AM -0500, condor-support response tracking system wrote:
> The specific job in this ticket is probably doing the right thing
> because of the Requirements statement that did not have our normal
> expression of "&& ((Memory * 1024) >= ImageSize)". I will follow up
> with the user to see if/how they did that, but for now please go ahead
> and close this ticket.
OK, I'll close it, but be aware that the particular machine mentioned
will start any job, because its Start expression is TRUE.
Thank you.
-pete
===========================================================================
Date mail was appended: Wed Oct 7 1:00:22 2009 (1254895222)
Subject: Actions
Ticket resolved by psilord
===========================================================================
Date of actions: Fri Oct 9 10:13:18 2009 (1255101198)