7.2 Stable Release Series 6.2
This is the second stable release series of Condor.
All of the new features developed in the 6.1 series are now considered
stable, supported features of Condor.
New releases in the 6.2 series should happen infrequently and will include
only bug fixes and support for new platforms.
New features will be added and tested in the 6.3 development series.
The details of each version are described below.
7.2.1 Version 6.2.2
New Features:
- Condor administrators can now specify
CONDOR_SUPPORT_EMAIL in their config file.
This address is included at the bottom of all email Condor sends out.
Previously, the CONDOR_ADMIN setting was used for this, but
at many sites, the address where the Condor daemons should send email
about administrative problems is not the same address that users
should use for technical support.
If your site has different addresses for these two things, you can now
specify them properly.
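As a rough illustration of the split between the two addresses, the
following Python sketch picks the address to print at the bottom of an
outgoing message. The function name and the dictionary standing in for
the config file are hypothetical, and falling back to CONDOR_ADMIN when
CONDOR_SUPPORT_EMAIL is unset is an assumption based on the previous
behavior described above, not a statement about Condor's source code.

```python
def footer_address(config):
    """Return the address to show at the bottom of Condor email.

    `config` is a plain dict standing in for the config file (hypothetical).
    Assumption: when CONDOR_SUPPORT_EMAIL is missing or empty, the older
    behavior applies and CONDOR_ADMIN is used instead.
    """
    support = config.get("CONDOR_SUPPORT_EMAIL", "").strip()
    if support:
        return support
    return config.get("CONDOR_ADMIN", "").strip()


# Example addresses are placeholders, not real sites.
both = {"CONDOR_ADMIN": "root@pool.example.org",
        "CONDOR_SUPPORT_EMAIL": "help@pool.example.org"}
admin_only = {"CONDOR_ADMIN": "root@pool.example.org"}
print(footer_address(both))        # help@pool.example.org
print(footer_address(admin_only))  # root@pool.example.org
```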
Bugs Fixed:
- The FILESYSTEM_DOMAIN and UID_DOMAIN settings were
not being automatically appended to the requirements of a Vanilla-universe job,
possibly causing it to be matched to machines where it could not run
successfully. This has been fixed.
- The getdirentries call was not being trapped, causing a few applications
to fail when running inside Condor.
- When the NT shadow and the Unix shadow were used in conjunction with each
other during the submission of heterogeneous jobs, they conflicted in where
they should store their Condor internal files. This would cause extremely
long hangs where the job was listed as running but no job actually started.
This has been fixed and now you can mix the NT shadow and Unix shadow
together just fine during heterogeneous submits.
- Numerous additions/clarifications to sections of the manual to bring
the manual up to date with the source base.
- Fixed a bug where if you set MAX_JOBS_RUNNING to zero in the
config files, the schedd would fail with a floating point error.
- PVM support for Solaris 2.8 was mistakenly turned off in 6.2.1;
it has been turned back on.
- Added exit status of Condor tools and daemons to the manual.
- Fixed a bug in the schedd where it would segfault if it manipulated
class ads in a certain way.
- Fixed stat() and lstat() to ask the shadow where the file being
located resides, instead of assuming that it always requires
remote I/O.
- Fixed a bug where Condor would incorrectly inform you that it
didn't have read/write permission on a file located in NFS when you
actually did have permission to read/write it.
- Removed mention of condor_master_off since it was deprecated a
long time ago and is not available anymore.
- Removed mention of condor_reconfig_schedd since it was deprecated a
long time ago and is not available anymore.
- Fixed a bug where the schedd would occasionally EXCEPT with an
``ATTR_JOB_STATUS not found'' error soon after a condor_rm
command had been invoked.
- Fixed a bug in the Negotiator and the Schedd where they would segfault
while reading a corrupted log file.
- Fixed a bug where we used to EXCEPT if we couldn't email the
administrator of the pool. This no longer causes an EXCEPT.
- In the condor_startd, messages were being printed into the log
file about load average and idle time computations whenever
D_FULLDEBUG was included in the STARTD_DEBUG config
file setting.
Now, to see the periodic messages printing out the current load
averages, you must include D_LOAD in STARTD_DEBUG, and
to see the messages about idle time, you must include D_IDLE.
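The new filtering of load average and idle time messages can be modeled
with a short Python sketch. This is only an illustration of the rule
stated above: the function names and the "load"/"idle" labels are
hypothetical, and treating STARTD_DEBUG as a whitespace-separated list
of flags is an assumption.

```python
def enabled_flags(startd_debug):
    """Split a STARTD_DEBUG value into a set of flag names.

    Assumption: flags are separated by whitespace.
    """
    return set(startd_debug.split())


def should_log(message_kind, startd_debug):
    """Decide whether a periodic startd message is written to the log.

    message_kind is "load" or "idle" (hypothetical labels for this sketch).
    Per the note above, D_FULLDEBUG alone no longer enables either kind.
    """
    required = {"load": "D_LOAD", "idle": "D_IDLE"}[message_kind]
    return required in enabled_flags(startd_debug)


print(should_log("load", "D_FULLDEBUG"))         # False
print(should_log("load", "D_FULLDEBUG D_LOAD"))  # True
print(should_log("idle", "D_FULLDEBUG D_LOAD"))  # False
```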
Known Bugs:
7.2.2 Version 6.2.1
New Features:
- The condor_userlog command is now available on Windows NT.
- Jobs run in stand-alone checkpointing mode can now take a -_condor_nowarn
argument, which silences the warnings from the system call library when you
perform a checkpoint-unsafe action, such as opening a file for reading and
writing.
Bugs Fixed:
- When using heterogeneous specifications of an executable between
NT and UNIX, Condor could get confused if the vanilla job had run on
an NT machine, vacated without being completed, and then restarted as
a standard universe job on UNIX. The job would be labeled as running in
the queue, but not perform any work. This has been fixed.
- The entries in the environment
option in a submit file now correctly override the variables brought in
from the getenv option on Windows NT.
In previous versions of CondorNT, the job would get an environment with the
variable defined multiple times. This bug did not affect UNIX versions of
Condor.
- Some service packs of Windows NT had bugs that prevented Condor
from determining the file permissions on input and output files. 6.2.1
uses a different set of APIs to determine the permissions and works
properly across all service packs.
- In versions of Condor previous to 6.2.0, the registry would slowly
grow on Windows NT and sometimes become corrupted. This was fixed in 6.2.0,
but if a previously-corrupted registry was detected Condor aborted. In 6.2.1,
this has been turned into a warning, as it doesn't need to be a fatal error.
- Fixed a memory-corruption bug in the condor_collector.
- PVM resources in Condor were unable to have more than one
@ symbol in a name.
- TRANSFER_FILES is now set to ON_EXIT
on UNIX by default for the vanilla universe. Previously, users submitting
from UNIX to NT needed to explicitly enable it or include the executable in
the list of input files for the job to run.
- If TRANSFER_FILES was set to TRUE,
files created during the job's run would be transferred whenever the job was
vacated and transferred to the next machine the job ran on, but would not be
transferred back to the submit machine when the job finally exited for the
last time. This has been fixed.
- Determining the current working directory was broken in stand-alone
checkpointing.
- A job's standard output and standard error can now go to the same file.
- When STARTD_HAS_BAD_UTMP is set to TRUE, the
condor_startd now detects activity on the /dev/pts devices.
- The condor_negotiator in 6.2.0 could incorrectly reject a job
that should have been successfully matched if it previously rejected a
job. If the same jobs were sent to the condor_negotiator in a different
order, the match that should succeed would. In 6.2.1, the order is no longer
important, and previous rejections will not prevent future matches.
- The getdents, getdirents, and statfs system calls now work correctly in
cross-platform submissions.
- condor_compile is better able to detect which version of Linux
it is running on and which flags it should pass to the linker. This should
help Condor users on non-Red Hat distributions.
- Fixed a bug in the condor_startd that would cause the daemon
to crash if you set the POLLING_INTERVAL macro to a value
greater than 60.
- In condor_q, dash-arguments (e.g., -pool, -run, etc.) were being
parsed incorrectly such that the same arguments specified without a
dash would be interpreted as if the dash were present, making it
impossible to specify ``pool'' or ``globus'' or ``run'' as an owner
argument.
- Fixed bug in condor_submit that would cause certain submit
file directives to be silently ignored if you used the wrong attribute
name.
Now, all submit file attributes can use the same names you see in the
job ClassAd (what you'd see with condor_q -long).
For example, you can now use ``CoreSize = 0'' or ``core_size = 0''
in your submit file, and either one would be recognized.
- A static limit on the number of clusters the condor_schedd
would accept from the condor_negotiator was removed.
- On Windows NT, if a job's log file was in a non-existent location,
both the condor_submit and the condor_schedd would crash.
- Encountering unsupported system calls could cause Condor to corrupt the
signal state of the job.
- Fixed some of the error messages in condor_submit so that they
are all consistently formatted.
- Fixed a bug in the Linux standard universe where calloc(2)
would not return zero-filled memory.
- condor_rm, condor_hold and condor_release will now return
a non-zero exit status on failure, and only return 0 on success.
Previously, they always returned status 0.
- If a user accidentally put ``notify_user = false'' in their submit
file, Condor used to treat that as a valid entry.
Now, condor_submit prints out a warning in this case, telling the
user that they probably want to use ``notification = never'' instead.
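The notify_user check in the list above can be sketched in Python as
follows. This is not condor_submit's actual code: the function name, the
simplified line-by-line parsing, and the wording of the warning are all
assumptions made for illustration.

```python
def check_notify_user(submit_text):
    """Return warning strings for suspicious notify_user lines.

    Simplified parsing (assumption): one "name = value" pair per line,
    '#' starts a comment line, names are compared case-insensitively.
    """
    warnings = []
    for lineno, line in enumerate(submit_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = (part.strip() for part in stripped.split("=", 1))
        if name.lower() == "notify_user" and value.lower() == "false":
            warnings.append(
                f"line {lineno}: notify_user = {value} looks like a mistake; "
                "did you mean notification = never?"
            )
    return warnings


# A made-up submit file containing the mistake on its second line.
sample = "executable = a.out\nnotify_user = false\nqueue\n"
for warning in check_notify_user(sample):
    print(warning)
```

A submit file that uses ``notification = never'' instead produces no
warnings in this sketch.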
Known Bugs:
- It may be possible to checkpoint with an open socket on IRIX 6.2.
On restart, the job will abort and go back into the queue.
7.2.2.1 Version 6.2.0
New Features Over the 6.0 Release Series
- Support for running multiple jobs on SMP (Symmetric Multi-Processor)
machines.
New Features Over the Last Development Series: 6.1.17
- If CkptArch isn't specified in the job submission file's
Requirements attribute, then automatically add this expression:
CkptRequirements = ((CkptArch == Arch) || (CkptArch =?= UNDEFINED)) &&
((CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED))
to the Requirements expression. This allows for users who specify
a heterogeneous submission to not have to worry about having their checkpoints
incorrectly starting up on architectures for which they were not designed
to run.
- The APPEND_REQ_<universe> config file entries now get
added to the expressions before Condor adds its internal
default expressions. This allows the sysadmin to override any default
policy that Condor enforces.
- There is now a single APPEND_REQUIREMENTS attribute
that will get appended to every universe's Requirements
expression unless a specific APPEND_REQ_STANDARD or
APPEND_REQ_VANILLA expression is defined.
- Increased certain networking parameters to help alleviate the
condor_shadow's inability to contact the condor_schedd during heavy load
of the system.
- Added a condor_glidein man page to the manual.
- Some of the log messages in the condor_startd were modified to
be more clear and to provide more information.
- Added a new attribute to the condor_startd ClassAd when the
machine is claimed, RemoteOwner.
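To see how the CkptRequirements expression in the list above behaves,
the following Python sketch models it with three-valued logic. It is an
illustration only, not Condor's ClassAd evaluator: None stands in for
UNDEFINED, the helper names are hypothetical, plain Python equality is
used for defined values, and reading CkptArch and CkptOpSys from the job
and Arch and OpSys from the machine is an assumption about how the
attributes are resolved.

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value


def eq(a, b):
    """Models ==: the result is UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def is_undefined(a):
    """Models `a =?= UNDEFINED`, which is always TRUE or FALSE."""
    return a is UNDEFINED


def or3(a, b):
    """Three-valued OR: TRUE wins, otherwise UNDEFINED propagates."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def and3(a, b):
    """Three-valued AND: FALSE wins, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def ckpt_requirements(job, machine):
    """Evaluate the CkptRequirements expression for one job/machine pair."""
    arch_ok = or3(eq(job.get("CkptArch"), machine.get("Arch")),
                  is_undefined(job.get("CkptArch")))
    opsys_ok = or3(eq(job.get("CkptOpSys"), machine.get("OpSys")),
                   is_undefined(job.get("CkptOpSys")))
    return and3(arch_ok, opsys_ok)


# Attribute values below are illustrative placeholders.
machine = {"Arch": "INTEL", "OpSys": "LINUX"}
never_checkpointed = {}
same_platform = {"CkptArch": "INTEL", "CkptOpSys": "LINUX"}
other_platform = {"CkptArch": "SUN4u", "CkptOpSys": "SOLARIS28"}
print(ckpt_requirements(never_checkpointed, machine))  # True
print(ckpt_requirements(same_platform, machine))       # True
print(ckpt_requirements(other_platform, machine))      # False
```

In this model, a job that has not yet checkpointed can match any
machine, while a job with a checkpoint matches only machines of the
platform on which the checkpoint was written.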
Bugs fixed since 6.1.17
- On NT, the Registry would increase in size while Condor was
servicing jobs. This has been fixed.
- Added utmpx support for Solaris 2.8 to fix a problem where
KeyBoardIdle wasn't being set correctly.
- When doing a condor_hold under NT, the job was removed instead of
held. This has been fixed.
- When using the -master argument to condor_restart, the
condor_master used to exit instead of restarting.
Now, the condor_master correctly restarts itself in this case.
Known Bugs:
- STARTD_HAS_BAD_UTMP does not work if set to True on Solaris
2.8. However, since utmpx support is enabled, you shouldn't
need to do this normally.
- condor_kbdd doesn't work properly under Compaq Tru64 5.1, and
as a result, resources may not leave the ``Unclaimed'' state
regardless of keyboard or pty activity. Compaq Tru64 5.0a and earlier
do work properly.
condor-admin@cs.wisc.edu