In general, this release of Condor for Windows NT works the same as the Unix release of Condor.
However, the following items are not supported in this version:
for details on the file transfer mechanism.
Except for the items listed above, nearly everything works the same way in Condor NT as it does in the Unix release. This release is based on the Condor Version 6.5.0 source tree, so its feature set is the same as that of Condor Version 6.5.0 for Unix. For instance, all of the following work in Condor NT:
This section describes how Condor NT starts and stops jobs. It is intended for the Condor administrator or advanced user who is already familiar with the material in the Administrators' Manual and wants to know exactly what Condor NT does when starting and stopping jobs.
When Condor NT is about to start a job, the condor_startd on the execute machine spawns a condor_starter process. The condor_starter then creates:
Next, the condor_starter (hereafter called the starter) contacts the condor_shadow process (hereafter called the shadow), which runs on the submit machine, and pulls over the job's executable and input files. These files are placed into the job's temporary working directory. After all files have been received, the starter spawns the user's executable as user "condor-run-dir_XXX", with its current working directory set to the temporary working directory (that is, $(EXECUTE)/dir_XXX).
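The staging step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Condor's implementation: the function `stage_and_run` and its parameters are hypothetical, `job_id` stands in for the `XXX` suffix, and the job runs as the current user rather than under a dynamically created account.

```python
import os
import shutil
import subprocess
from typing import List, Tuple


def stage_and_run(execute_dir: str, job_id: str,
                  input_files: List[str], argv: List[str]) -> Tuple[str, int]:
    """Create execute_dir/dir_<job_id>, copy the job's files in, and run argv there.

    Hypothetical helper: Condor NT additionally runs the job under a
    dynamically created account, which this sketch does not attempt.
    """
    work_dir = os.path.join(execute_dir, "dir_" + job_id)
    os.makedirs(work_dir)  # raises if the directory already exists
    for path in input_files:
        shutil.copy2(path, work_dir)  # keep timestamps of the transferred files
    # The job's current working directory is its temporary working directory.
    completed = subprocess.run(argv, cwd=work_dir)
    return work_dir, completed.returncode
```

Because the job starts in its own directory, it can open its input files by bare name and any output it writes lands in the same place.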
While the job is running, the starter closely monitors the CPU usage and image size of all processes started by the job. Every 20 minutes the starter sends this information, along with the total size of all files contained in the job's temporary working directory, to the shadow. The shadow then inserts this information into the job's ClassAd so that policy and scheduling expressions can make use of this dynamic information.
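A minimal Python sketch of the disk-usage part of the periodic report follows. The function names and the dictionary keys in `build_update` are hypothetical and are not Condor's ClassAd attribute names; CPU time and image size are simply passed in, because measuring them per process is platform-specific.

```python
import os
from typing import Dict, Union

UPDATE_INTERVAL_SECONDS = 20 * 60  # the 20-minute reporting period described above


def total_dir_size(work_dir: str) -> int:
    """Sum the sizes, in bytes, of all non-symlink files under work_dir."""
    total = 0
    for root, _dirs, files in os.walk(work_dir):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The running job may delete a file between listing and stat.
                continue
    return total


def build_update(work_dir: str, cpu_seconds: float,
                 image_size_kb: int) -> Dict[str, Union[int, float]]:
    """Assemble one periodic report (hypothetical keys, not real attribute names)."""
    return {
        "CpuSeconds": cpu_seconds,
        "ImageSizeKb": image_size_kb,
        "WorkDirBytes": total_dir_size(work_dir),
    }
```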
If the job exits on its own (that is, the job completes), the starter first terminates any processes started by the job that may still be running because the job did not clean up after itself. The starter then examines the job's temporary working directory for any files that have been created or modified and sends these files back to the shadow on the submit machine. The shadow places these files into the initialdir specified in the submit description file; if no initialdir was specified, the files go into the directory from which the user invoked condor_submit. Once all the output files are safely transferred back, the job is removed from the queue. If, however, the condor_startd forcibly kills the job before all output files are transferred, the job is not removed from the queue but instead returns to the Idle state.
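The "created or modified" test can be approximated by snapshotting the working directory before the job starts and comparing afterwards. The sketch below assumes that a change in size or modification time marks a file as modified; Condor's actual test may differ. `snapshot`, `changed_files`, and `copy_back` are hypothetical names, and `dest_dir` stands in for initialdir.

```python
import os
import shutil
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, modification time in nanoseconds).
Snapshot = Dict[str, Tuple[int, int]]


def snapshot(work_dir: str) -> Snapshot:
    """Record size and mtime for every file under work_dir."""
    snap: Snapshot = {}
    for root, _dirs, files in os.walk(work_dir):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            snap[os.path.relpath(full, work_dir)] = (st.st_size, st.st_mtime_ns)
    return snap


def changed_files(before: Snapshot, after: Snapshot) -> List[str]:
    """Paths that are new since `before` or whose size or mtime differ."""
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


def copy_back(work_dir: str, dest_dir: str, paths: List[str]) -> None:
    """Copy the given relative paths into dest_dir, recreating subdirectories."""
    for rel in paths:
        target = os.path.join(dest_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(work_dir, rel), target)
```

Taking the first snapshot right after the input files are staged means unchanged inputs are not sent back.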
If the condor_startd decides to vacate a job prematurely, the starter sends a WM_CLOSE message to the job. If the job spawned multiple child processes, the WM_CLOSE message is sent only to the parent process (that is, the one started by the starter). WM_CLOSE is the preferred way to terminate a process on Windows NT, because it allows the job to clean up and free any resources it has allocated. When the job exits, the starter cleans up any processes left behind. At this point, if transfer_files is set to ONEXIT (the default) in the job's submit description file, the job switches from the Running state to the Idle state, and no files are transferred back. If transfer_files is set to ALWAYS, then any files in the job's temporary working directory that were created or modified are first sent back to the submit machine. This time, however, the shadow places these so-called intermediate files into a subdirectory created in the $(SPOOL) directory on the submit machine ($(SPOOL) is specified in Condor's configuration file). The job then switches back to the Idle state until Condor finds a different machine on which to run it. When the job starts again, Condor places into the job's temporary working directory the executable and input files as before, plus any files stored in the submit machine's $(SPOOL) directory for that job.
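The vacate-time decision reduces to a small rule. This Python sketch is illustrative only: `vacate_plan` and `restart_files` are hypothetical helpers, and treating any transfer_files value other than ONEXIT or ALWAYS as an error is an assumption of the sketch.

```python
from typing import List, Tuple


def vacate_plan(changed_files: List[str],
                transfer_files: str = "ONEXIT") -> Tuple[str, List[str]]:
    """Return (next job state, files to send to the submit machine's spool area)."""
    mode = transfer_files.upper()
    if mode == "ONEXIT":
        # Default: nothing is sent back when the job is vacated.
        return "Idle", []
    if mode == "ALWAYS":
        # Intermediate files go to a per-job subdirectory of $(SPOOL).
        return "Idle", list(changed_files)
    raise ValueError("unsupported transfer_files value: " + transfer_files)


def restart_files(executable: str, inputs: List[str],
                  spooled: List[str]) -> List[str]:
    """Files placed in the new temporary working directory on restart."""
    return [executable, *inputs, *spooled]
```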
NOTE: A Windows console process that needs to do special cleanup work at vacate time can intercept a WM_CLOSE message via the Win32 SetConsoleCtrlHandler() function; a WM_CLOSE message generates a CTRL_CLOSE_EVENT. See SetConsoleCtrlHandler() in the Win32 documentation for more information.
NOTE: The default handler in Windows NT for a WM_CLOSE message causes the process to exit. The job could be coded to ignore the message and not exit, but eventually the condor_startd will get impatient and hard-kill the job (if that is the policy the administrator wants).
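The same polite-request-then-hard-kill pattern can be sketched with Python's subprocess module. This is a POSIX analog, not what Condor NT does: on POSIX, `Popen.terminate()` sends SIGTERM (playing the role of WM_CLOSE) and `Popen.kill()` sends SIGKILL. On Windows, `terminate()` calls TerminateProcess, which is already a hard kill, so the sketch does not model WM_CLOSE there. The `vacate` function and its `grace_seconds` parameter are hypothetical.

```python
import subprocess
import sys


def vacate(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask the job to exit, then hard-kill it if it is still running after the grace period."""
    proc.terminate()  # polite request: SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # the job ignored the request: hard kill (SIGKILL on POSIX)
        proc.wait()
        return "killed"


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(vacate(child, grace_seconds=5.0))
```

A job that installs its own SIGTERM handler can do its cleanup within the grace period, much as a Windows console job can handle CTRL_CLOSE_EVENT.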
Finally, after the job has exited and any files have been transferred back, the starter deletes the temporary working directory, the temporary account, the WindowStation, and the Desktop before exiting itself. If the starter terminates abnormally, the condor_startd attempts the cleanup. If the condor_startd also disappears (for example, if the entire machine was hard power-cycled), the condor_startd cleans up when Condor is restarted.
On the execute machine, the user job runs using the access token of an account that Condor creates dynamically and that has bare-bones access rights and privileges. For instance, if your machines are configured so that only Administrators have write access to C:\WINNT, then no Condor job running on that machine can write anything there. The only files the job should be able to access on the execute machine are files accessible by group Everyone and files in the job's temporary working directory.
On the submit machine, Condor permits the File Transfer mechanism to read only files that the submitting user can read, and to write only files that the submitting user can write. For example, say only Administrators can write to C:\WINNT on the submit machine, and a user gives the following to condor_submit:
executable = mytrojan.exe
initialdir = c:\winnt
output = explorer.exe
queue
Unless that user is in group Administrators, Condor will not permit
explorer.exe to be overwritten.
If the submitting user's account disappears between the time condor_submit was run and the time the job runs, Condor cannot check whether the now-defunct submitting user has read/write access to a given file. In this case, Condor ensures that group "Everyone" has read or write access to any file the job subsequently tries to read or write. This accommodates network setups in which a user account exists only while the user is logged in.
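The submit-side read/write rules can be modeled as a permission check. The sketch below uses a hypothetical in-memory ACL (path mapped to principal mapped to rights) instead of real Windows security descriptors, and treats a user absent from `user_groups` as a deleted account; `may_access` is an illustrative name, not a Condor or Win32 function.

```python
from typing import Dict, Set

# Hypothetical ACL model: file path -> principal (user or group) -> set of rights.
Acl = Dict[str, Dict[str, Set[str]]]


def may_access(path: str, right: str, user: str,
               user_groups: Dict[str, Set[str]], acl: Acl) -> bool:
    """Return True if `user` may exercise `right` ("read" or "write") on `path`."""
    entries = acl.get(path, {})
    if user in user_groups:
        # Existing account: the user's own entry, the user's groups, and Everyone count.
        principals = {user, "Everyone"} | user_groups[user]
    else:
        # Account has disappeared: only rights granted to group Everyone count.
        principals = {"Everyone"}
    return any(right in entries.get(p, set()) for p in principals)
```

With an ACL that grants write access on explorer.exe only to Administrators, an ordinary user's `output = explorer.exe` is refused, while a member of Administrators is allowed.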
Condor also protects the job queue. If the integrity of the job queue were compromised, a malicious user could remove other users' jobs or even change the executable a user's job will run. To guard against this, in Condor's default configuration all connections to the condor_schedd (the process that manages the job queue on a given machine) are authenticated using Windows NT's SSPI security layer. The user is authenticated using the same challenge-response protocol that NT uses to authenticate users to Windows NT file servers. Once authenticated, the only users allowed to edit a job entry in the queue are:
To protect the actual job queue files themselves, the Condor NT installation program will automatically set permissions on the entire Condor release directory so that only Administrators have write access.
Finally, Condor NT has all the IP/host-based security mechanisms present in the full version of Condor. See section 3.7.4 for complete information on how to allow or deny access to Condor based on machine host name or IP address.
Unix machines and Windows NT machines running Condor can happily coexist in the same Condor pool. Jobs submitted on Windows NT can run on Windows NT or Unix, and jobs submitted on Unix can run on Unix or Windows NT. If the submit description file does not specify a requirements expression, the default is to require the execute machine to have the same architecture and operating system as the submit machine.
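The default matching rule can be sketched as follows. Arch and OpSys are the machine ClassAd attributes that describe architecture and operating system, but representing ClassAds as Python dictionaries and requirements as Python callables is a simplification, and `effective_requirements` is a hypothetical name. When the job supplies its own requirements, this sketch uses them unchanged, which may not match exactly what condor_submit does.

```python
from typing import Callable, Dict, Optional

Ad = Dict[str, str]  # simplified ClassAd: attribute name -> value
Requirement = Callable[[Ad], bool]


def effective_requirements(submit_machine: Ad,
                           user_requirements: Optional[Requirement] = None) -> Requirement:
    """Use the job's own requirements if given; otherwise match the submit machine."""
    if user_requirements is not None:
        return user_requirements
    arch, opsys = submit_machine["Arch"], submit_machine["OpSys"]
    # Default: the execute machine must have the same Arch and OpSys as the submit machine.
    return lambda machine: machine["Arch"] == arch and machine["OpSys"] == opsys
```

A job submitted from an NT machine therefore matches only NT machines of the same architecture unless its requirements expression explicitly allows other platforms.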
There is no need to run more than one Condor central manager, even if you have both Unix and NT machines. The Condor central manager itself can run on either Unix or NT; there is no advantage to choosing one over the other. Here at the University of Wisconsin-Madison, for instance, we have hundreds of Unix (Solaris, Linux, Irix, etc.) and Windows NT machines in our Computer Science Department Condor pool. Our central manager runs on Windows NT, and everything works well.