Hawkeye v1.0 Release Notes 1. Hawkeye Installation instructions 1.1 By Hand Enclosed in this tarball there is a Hawkeye tarball. To install this, you can create a 'hawkeye' user, or run as a normal user. Do the following: $ mkdir /home/hawkeye $ cd /home/hawkeye -or- $ mkdir ~/hawkeye $ cd ~/hawkeye Then: $ tar xfz hawkeye-linux-0.1.tar.gz $ mkdir log execute spool In this directory, you'll find the following created: bin sbin etc install modules hawkeye_install 1.2 Using hawkeye_install Run the ./hawkeye_install perl script. You can specify the target directory on the command line; if not, it will prompt you for the directory. hawkeye_install will ask you questions pertaining to the hostname of this computer, and the hostname of the computer that you wish to run as your Hawkeye "Central Manager". Like Condor, all Hawkeye installations need one Central Manager. Data collection modules are no longer shipped with Hawkeye; they are now packaged separately, and can be found on the Hawkeye web page. 2. Module installation Prepackaged Hawkeye modules are available through the Hawkeye web page. These may be installed with the hawkeye_install_module script, which is packaged with Hawkeye. These are packaged as gzipped "tar balls". These are named, by convention, "name-version.tar.gz", so version 1.0 of module "who" would be named "who-v1.0.tar.gz". A single module package can contain one or more modules. The "all" module package contains all currently availble modules. In the simplest case, a module package may be installed with a single command (in this example, the "who" module package is installed: $ hawkeye_install_module who-1.0.tar.gz With no other command line parameters, the "who" module package will be installed without asking any questions. The module package installer also takes the following options: -a : Ask before installing each module. By default, the module package installer will install all modules speicifed on the command line. -c=file : Specify Hawkeye config file. Normally, the module package installer will try to find the Hawkeye configuration file itself. If this fails, or if you wish to override this, specify the Hawkeye configureation file with this option. -d=dir : Specify modules directory. Normally, the module package installer will install the modules in the directory specified by the MODULES parameter in the Hawkeye ocnfiguration file. If you wish to override this behaviour, specify the target directory with this option. -h : Dump help -m : Manual mode; asks lots of questions. By default, the module package installer will use default values for the module parameters. Specifying this option will cause the installer to ask you about each of these. This option implies "-a". -n : Don't update the config file. Normally, the module package installer will automatically update your Hawkeye configuration file. If you specify this option, it won't update your config. 2. Hawkeye Configuration File The hawkeye_config file should be editted to suit your needs. You should have only one collector running in your pool. Because Hawkeye is based on Condor, it's configuarion language is the same as Condor's. Read the Condor manual for general configuration information. Note that there's no negotiator or schedd in Hawkeye. The bottom of the config file contains a section like: MODULES = $(HAWKEYE)/modules STARTD_CRON_NAME = Hawkeye HAWKEYE_JOBS = MODULES defines the directory where the Hawkeye modules are located. STARTD_CRON_NAME defines the "logical name" by which the modules can be referenced. This is named as it is because it is the "cron" subsytem of the "startd" daemon. This setting is used to define the parameter which contains the "job list". Because in the above example, we set STARTD_CRON_NAME = Hawkeye, we now list our jobs with HAWKEYE_JOBS. HAWKEYE_JOBS define the collection modules. At install time, no modules are configured to run. Entries are added to it by the Hawkeye module package installer. The module package installer adds modules as a block in the configuration file. Below is an example block configuring the the "w" and "vmstat" modules. ## ## Configuration for Module w ## Monitors the users, uptime, and load averages of the system ## (based on the "w" program) HAWKEYE_JOBS = $(HAWKEYE_JOBS) w:w_:$(MODULES)/w:5m: ## Parameters for module w: ## ## Configuration for Module vmstat ## Monitors the status of the Virtual Memory (VM) subsystem You may ## want set the vmstat parameters by hand HAWKEYE_JOBS = $(HAWKEYE_JOBS) vmstat:vmstat_:$(MODULES)/vmstat:1s:continuous ## Parameters for module vmstat: ## HAWKEYE_VMSTAT_PERIOD ## Sets the frequency with which vmstat collects data (i.e. 30s, 5m) ## Default: 30s HAWKEYE_VMSTAT_PERIOD = 30s In the above example 2 are defined (w and vmstat). Each job entry is of the form: name:prefix:executable:period:{options} name - Defines the logical name of the module. These must be unique. prefix - This is prepended to the attributes that the module creates for it's ClassAd. For example, if the prefix is "xxx_", and the attribute is abc = "xyzzy", the attribute published to the ClassAd will be "xxx_abc = xyzzy". These should be unique, as well. executable - This specifies the executable to run the module itself. period - This defines the frequency with which Hawkeye launches the module. If an earlier instance of the module is already running, Hawkeye's behavior is defined by the nokill option (see below). In "continuous mode", in which the module runs constantly and outputs data periodically (see below and look at top for an example), the period instead defines the delay between when the module exits and when Hawkeye restarts it. The modifiers "s", "m", and "h" for seconds, minutes, and hours (respectively) are recognized. Thus, 5m means run every five minutes. If no modifier is specified, seconds is assumed. A value of 0 is automatically reset to 1 in continuous mode, so that Hawkeye won't busy-loop if the job fails to execute for some reason. A value of 0 is marked as an error for other job modes, since it's an invalid period. options - Several options are currently defined: continuous - Specifies continuous mode. If a non-zero period is specified, then the next run will start period time after the previous run ends. kill / nokill - If, when it's time to run the module the next time (in non-continuous mode), the job is still running, "nokill" will cause the startd to not kill the job, and to let it continue running, instead. By default, or with a "kill" option, the job will be killed and the next run will start when the current run dies. The modules's command-line arguments, environment, and CWD can also be set via the following config parameters: Hawkeye__ARGS= The module's "logical name" is _always_ assigned to ARGV[1], and these parameters will be appended to the ARGV array. Hawkeye__ENV= This is used to specify environmennt variables passed to the module. If a specified variable is already in environment, this will override it. For example: Hawkeye__ENV=var1=string1:var2=string2 This will add the "var1" and "var2" variables to the module's environment. Hawkeye__CWD= This is used to set the initial working directory of the module. By default (if you don't specifiy this), the working directory is the same as that of the startd. Set the HAWKEYE_CONFIG environment variable to point at the config: $ export HAWKEYE_CONFIG=/home/hawkeye/hawkeye_config -or- $ export HAWKEYE_CONFIG=$HOME/hawkeye/hawkeye_config You may want to add hawkeye/bin and hawkeye/sbin to your path. Then, start the hawkeye master process: $ ./sbin/hawkeye_master Verify that it's running: $ ps axuww|grep hawkeye user 11525 0.0 0.2 3592 1460 ? S 22:55 0:00 ./sbin/hawkeye_master user 11526 0.0 0.3 4500 1780 ? S 22:55 0:00 hawkeye_startd -f Check the status: $ ./bin/hawkeye_status Name OpSys Arch State Activity LoadAv Mem ActvtyTime myhost LINUX INTEL Owner Idle 0.000 511 0+00:01:36 Machines Owner Claimed Unclaimed Matched Preempting INTEL/LINUX 1 1 0 0 0 0 Total 1 1 0 0 0 0 To add more systems to your poll, set them up in a similar manner, setting their HAWKEYE_HOST to the master machine. Make sure that you don't start a collector on these machines; while it won't cause any harm, it can lead to confusion. 4. Current modules Currently the following modules are available: df - Used to monitor free disk space (based on the "df" program) Memory - Monitors system memory usage Network Errors - Monitors the network status Open Files - Monitors the number of open file descriptors top - Monitors the top processes on the system, based on a number of attributes (based on the "top" program) uptime - Monitors the uptime and load averages of the system (based on the "uptime" program) w - Monitors the users, uptime, and load averages of the system (based on the "w" program) who - Monitors uses logged on the system (based on the "who" program) gdmphb - Monitors the status of GDMP for the US/CMS test bed Condor Node - Monitors an individual Condor node Condor Pool - Monitors the health of a Condor pool (Requires the separate Condor-Absent contrib Condor module which should be available soon). Grid Probe - Designed to make monitoring a Grid easier (Requires the separate Grid-Probe contrib module which should be available soon). It's the module being used to monitor the USCMS test bed. 5. Creating modules All current Hawkeye modules are either shell or perl scripts, although there is no reason that they need to be scripts. Modules are invoked with the ARGV array: ARGV[0] = ARGV[1] = ARGV[2..n] can be specified via the Hawkeye__ARGS parameter as discussed above. All scripts need to output one or more "attribute / value" pairs, one line at a time. At the end of each ClassAd, a line starting with a "-" should be generated. While optional for "periodic" modules, This line is required for continuous modules (to separate the ClassAds). In general, for periodic modules, it should be the last output: attr1 = value1 attr2 = value2 - For string (non-numeric) values, the string must be enclosed in double quotes: stringattr = "string" Modules that produce an array of like values should (by convention) also produce an INDEX attribute: list_1 = value1 list_2 = value2 list_3 = value3 INDEX = "1 2 3" - If multiple attributes are produced for each of these, then a (again, by convention), a FIELDS attribute should be produced: list_1_a = value1a list_1_b = "value1b" list_1_c = value1c list_2_a = value2a list_2_b = "value2b" list_2_c = value2c list_3_a = value3a list_3_b = "value3b" list_3_c = value3c INDEX = "1 2 3" FIELDS = "a b c" - The 'top' and 'df' modules produce these both, and are good (although complex) examples. It may be easier to look at their outputs than the code that actually produces it. list_1_a = value1a list_1_b = "value1b" list_1_c = value1c list_2_a = value2a list_2_b = "value2b" list_2_c = value2c list_3_a = value3a list_3_b = "value3b" list_3_c = value3c INDEX = "1 2 3" FIELDS = "a b c" - Note that the "prefix" is prepended to the start of each attribte, so if the prefix is "mymod_", the above will translate to the following startd classad attribtes: mymod_list_1_a = value1a mymod_list_1_b = "value1b" mymod_list_1_c = value1c mymod_list_2_a = value2a mymod_list_2_b = "value2b" mymod_list_2_c = value2c mymod_list_3_a = value3a mymod_list_3_b = "value3b" mymod_list_3_c = value3c mymod_INDEX = "1 2 3" mymod_FIELDS = "a b c" Note that the "-" is *not* inserted into the ClassAd. 5.1 Module Packages Module packages are a tarball of the module executable file(s) with a "Hawkeye module descriptor" file. For example, a tar directory of the "who" module reveals: $ tar tvfz vmstat-1.0-rc1.tar.gz -rw-r--r-- nleroy/nleroy 367 2003-02-24 12:28:31 vmstat.hawk -rwxr-xr-x nleroy/nleroy 4844 2002-12-05 09:39:55 vmstat The file "vmstat.hawk" is the "module descriptor" (which is read by the module package installer), and the file "vmstat" is the executable file (which is copied into the moduleds directory by the module package installer). Below is the descriptor for the vmstat module: ----------------FILE START-------------------------- # Data file describing the VMSTAT module ModuleFiles: vmstat Description: Monitors the status of the Virtual Memory (VM) subsystem You may want set the vmstat parameters by hand Default: vmstat options: continuous period: 1s prefix: vmstat_ # It's parameters parameter: HAWKEYE_VMSTAT_PERIOD = 30s Sets the frequency with which vmstat collects data (i.e. 30s, 5m) ----------------FILE END-------------------------- Below is a quick overview of the syntax of these files. ModuleFiles: name This specifies the list of files that need to be copied to the modules directory. The first of these is the executable which is specified in the config file. Description: After this line, specify free form text which describes your module. This is terminated by an empty line, or a line with one of these keywords. Default: name This is the default "logical name" by which the module will be installed in the config file. Options: option-list This specifies the default (space separated) list of options for the module in the config file. Period: period-spec This specifies the default period for the module; this uses the same syntax that is used in the config file. Prefix: prefix-string This specifies the default ClassAd prefix for the module; this uses the same syntax that is used in the config file. Parameter: parameter-name [= value] Text describing the module This specifies a parameter for the module's configuration. If the "= value" portion is specified, this specifies the default value for this parameter. The text following the parameter line is free form text describing the parameter. This is terminiated by an empty line, or a line with any one of these keywords.