Next: 3.12 Setting Up for Up: 3. Administrators' Manual Previous: 3.10 The High Availability   Contents   Index



3.11 Quill

This section gives an overview of installing, deploying, and using quill, and describes the aspects of quill that matter for deployment.


3.11.1 Theory of Operation

Quill consists of three components:

  1. The condor_quill server, which maintains the job queue and history tables in the database.

  2. A modified condor_q tool, which can be used to query the database tables.

  3. A modified condor_history tool.

In this release, the two modified query tools replace their traditional counterparts: in addition to querying the database, they still let users query the schedd and the history file, respectively, as before.


3.11.2 Installing and Configuring Quill

Here is an overview of the steps needed to install and use quill, followed by a description of each step.

  1. Install the postgres server and client libraries if they are not already installed
  2. Configure postgres to suit quill
  3. Unpack and build the quill server and client tools
  4. Modify condor_config with quill-related options
  5. Invoke the quill daemon and start querying it
  6. Miscellaneous issues (database schema, security, etc.)

Each of the above steps is detailed below.

  1. The Postgres Server

    Quill uses the postgres server as its backend and the postgres client library, libpq, to talk to the server. Version 8.0 of postgres will be included as part of the condor externals, but quill has also been tested with earlier versions of postgres (specifically 7.4).

    The postgres source can also be obtained from:

    http://www.postgresql.org/ftp/source/

    Installation instructions are detailed in: http://www.postgresql.org/docs/8.0/static/installation.html

  2. Configuration of Postgres

    The following steps need to be taken after postgres is installed. These are done only once; quill takes care of all other database creation and maintenance tasks.

    1. The quill daemon and the client tools connect to the database as users ``quillwriter'' and ``quillreader'' respectively. These are database users, not operating system users (two completely different things). If those users do not already exist, add them using the createuser command in the postgres bin directory, and assign each of them a password; the quill tools use these passwords to connect to the database securely. User ``quillreader'' must not be allowed to create databases or users. User ``quillwriter'' must not be allowed to create users, but it must be allowed to create databases. The following commands create the two users with the appropriate permissions (be ready to enter the corresponding passwords when prompted):

      /path/to/postgres/bin/directory/createuser quillreader \
      	--no-createdb --no-adduser --pwprompt
      
      /path/to/postgres/bin/directory/createuser quillwriter \
      	--createdb --no-adduser --pwprompt
      

    2. Postgres must be configured to accept TCP/IP connections. In version 7, this is done by setting tcpip_socket = true in the postgresql.conf file. In version 8, that setting was replaced by the listen_addresses variable in the same file. Set it appropriately, e.g. listen_addresses = '*' (here, '*' means any IP interface).

    3. Postgres must also be configured to accept TCP/IP connections from specific hosts; this is what enables remote connections. This is done in the pg_hba.conf file, which usually resides in the postgres server's data directory. Although host-based access rules vary from site to site, basically one needs to allow access from every host that will use this database server, whether it is the quill daemon writing to the server or the condor_q tool querying it. For example, to give the database users ``quillreader'' and ``quillwriter'' password-enabled access to all databases on the current machine from any machine in the 128.105.0.0/255.255.0.0 network, add the following:

      host all quillreader 128.105.0.0 255.255.0.0 password
      host all quillwriter 128.105.0.0 255.255.0.0 password

      Note that in addition to the database specified by QUILL_DB_NAME in the condor_config file, the quill daemon also needs access to the database 'template1', because it must connect to template1 in order to create the QUILL_DB_NAME database in the first place.

    Once the server is up and running and the client libraries are installed, we can now go ahead and install quill.

  3. Compiling and linking Quill

    Quill has been fully integrated into the condor build system: condor_quill appears as a directory under the top-level src/ directory of the condor source, and building the condor source automatically builds quill and both of its query tools as well.

  4. Modifying condor_config

    Now that we have built and installed quill, it's time to tweak the condor_config file to include the quill-related options.

    The following variables need to either be modified or added:

    The first one is DAEMON_LIST. Add QUILL to this list as shown below:

    DAEMON_LIST                     = MASTER, etc. etc.,  QUILL
    

    Add .quillwritepassword to the VALID_SPOOL_FILES variable so that condor_preen does not delete this file as junk:

    VALID_SPOOL_FILES	= job_queue.log, etc. etc., .quillwritepassword
    

    Next, specify where the quill binary resides and what its start-up arguments are:

    QUILL                           = $(SBIN)/condor_quill
    QUILL_ARGS                      = -f
    

    Quill writes to its own log, just as the other daemons do. This log can be checked to see quill's run-time behavior and any malfunctions.

    QUILL_LOG       = $(LOG)/QuillLog
    

    The following options go in the daemon-specific (in this case, quill) section with appropriately modified values to suit the local environment:

    QUILL_ENABLED           = TRUE
    QUILL_NAME              = some-unique-quill-name.cs.wisc.edu
    QUILL_DB_NAME           = database-for-some-unique-quill-name
    QUILL_DB_IP_ADDR        = databaseipaddress:port
    # the following parameter is in seconds
    QUILL_POLLING_PERIOD    = 10
    # the following parameter is in hours
    QUILL_HISTORY_CLEANING_INTERVAL = 24
    # the following parameter is in days
    QUILL_HISTORY_DURATION  = 30
    QUILL_IS_REMOTELY_QUERYABLE = TRUE
    QUILL_DB_QUERY_PASSWORD =  password-for-database-user-quillreader
    QUILL_ADDRESS_FILE      = $(LOG)/.quill_address
    

    The time-valued parameters use the units given in the comments above. The next step gives a brief overview of how to query quill.

  5. Invoking the quill daemon and querying it.

    Once the condor_config file is updated with the above settings, the quill daemon can be started either by restarting condor with condor_restart or simply by starting condor with condor_master. The master starts and manages all the daemons in the DAEMON_LIST variable, which now includes QUILL.

    The condor_quill daemon maintains a database mirror of the job_queue and history logs, which can be queried using condor_q and condor_history, respectively. Both tools retain all their old functionality, i.e. condor_q can still query the schedd and condor_history can still query the history file.

    They also retain all their old options and gain some new ones made possible by the database. For example, as before, both can be queried by job id (cluster.proc), owner, dags, io, cputime, etc. In addition, just as remote schedds can be queried, remote quill databases can be queried for job queue and historical information; this new functionality relies on Postgres's support for remote queries.

    The -help option can be used to look at all the options supported by both tools.
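
The quill settings in step 4 are plain NAME = value lines. As a minimal illustration, the hypothetical Python helpers below (not Condor's own configuration parser) read such lines, split QUILL_DB_IP_ADDR into address and port, and convert the time-valued parameters to seconds using the units stated in their comments. They assume one setting per line, treat lines starting with # as comments, and do not expand $(...) macros; the address 128.105.0.10 and port 5432 (the postgres default) in the demo are made-up sample values.

```python
# Hypothetical helpers, not part of Condor. Assumes one NAME = value setting
# per line, '#' comment lines, and no $(MACRO) expansion.

# Units stated in the configuration comments, expressed in seconds.
UNIT_SECONDS = {
    "QUILL_POLLING_PERIOD": 1,                   # seconds
    "QUILL_HISTORY_CLEANING_INTERVAL": 60 * 60,  # hours
    "QUILL_HISTORY_DURATION": 24 * 60 * 60,      # days
}


def parse_config(text):
    """Map each variable name to its raw (unexpanded) string value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def db_endpoint(settings):
    """Split QUILL_DB_IP_ADDR ("address:port") into (address, port)."""
    address, _, port = settings["QUILL_DB_IP_ADDR"].rpartition(":")
    return address, int(port)


def period_seconds(settings, name):
    """Convert one of the time-valued quill parameters to seconds."""
    return int(settings[name]) * UNIT_SECONDS[name]


if __name__ == "__main__":
    sample = """
    QUILL_DB_IP_ADDR        = 128.105.0.10:5432
    # the following parameter is in days
    QUILL_HISTORY_DURATION  = 30
    """
    cfg = parse_config(sample)
    print(db_endpoint(cfg))                               # ('128.105.0.10', 5432)
    print(period_seconds(cfg, "QUILL_HISTORY_DURATION"))  # 2592000
```

Using rpartition keeps everything before the last colon as the address, so only the final field is parsed as the port.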

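The address and mask columns of the pg_hba.conf entries in step 2 work together: a mask of 255.255.0.0 keeps the first 16 bits, so those entries admit any client whose address begins with 128.105. The following Python sketch checks whether such an entry admits a given client address and database user. The helper names are hypothetical, it handles only the six-field address/mask form shown above, and it is not how postgres itself reads the file; the client addresses in the demo are made up.

```python
import ipaddress


def parse_hba_line(line):
    """Split a six-field 'host DATABASE USER ADDRESS MASK METHOD' entry.

    Only the address/mask form shown above is handled; real pg_hba.conf
    files also support CIDR addresses, 'local' entries, and other options.
    """
    kind, database, user, address, mask, method = line.split()
    return {
        "kind": kind,
        "database": database,
        "user": user,
        # ip_network accepts a dotted-quad netmask after the slash.
        "network": ipaddress.ip_network(f"{address}/{mask}"),
        "method": method,
    }


def entry_admits(entry, client_ip, user):
    """True if the entry covers a TCP/IP connection by `user` from `client_ip`.

    The database column is not checked in this sketch.
    """
    return (
        entry["kind"] == "host"
        and entry["user"] in ("all", user)
        and ipaddress.ip_address(client_ip) in entry["network"]
    )


if __name__ == "__main__":
    reader = parse_hba_line("host all quillreader 128.105.0.0 255.255.0.0 password")
    print(reader["network"])                                    # 128.105.0.0/16
    print(entry_admits(reader, "128.105.7.20", "quillreader"))  # True
    print(entry_admits(reader, "10.1.2.3", "quillreader"))      # False: outside the network
    print(entry_admits(reader, "128.105.7.20", "quillwriter"))  # False: different user
```

This is why both a ``quillreader'' and a ``quillwriter'' line are needed: each entry names a single database user.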

3.11.3 Examples

  1. Query a remote quill daemon on regular.cs.wisc.edu for all the jobs in the queue
    	condor_q -name quill@regular.cs.wisc.edu
    	condor_q -name schedd@regular.cs.wisc.edu
    
    There are two ways to reach a quill daemon: directly, using its name as specified by the QUILL_NAME variable in step 4 of the installation instructions above, or indirectly, by querying the schedd by its name. In the latter case, condor_q detects whether that schedd is serviced by a database and, if so, queries the database directly. In both cases, the IP address and port of the database server hosting this quill daemon's data, and the name of the database, are taken from the QUILL_DB_IP_ADDR and QUILL_DB_NAME values advertised to the collector in the QUILL_AD (sent by the quill daemon) and in the SCHEDD_AD (sent by the schedd).

  2. Query a remote quill daemon on regular.cs.wisc.edu for all historical jobs belonging to owner 'akini'.
    	condor_history -name quill@regular.cs.wisc.edu akini
    

  3. Query the local quill daemon for the average time spent in the queue for all non-completed jobs.
    	condor_q -avgqueuetime
    
    This is a new query. -avgqueuetime is defined as the average of (currenttime - jobsubmissiontime) over all jobs that are neither completed (JobStatus == 4) nor removed (JobStatus == 3).

  4. Query the local quill daemon for all historical jobs completed since Apr 1, 2005 at 13h 00m.
    	condor_history -completedsince '04/01/2005 13:00'
    
    This is also a new query. It fetches all jobs that entered the 'Completed' state on or after the specified timestamp. The timestamp follows Postgres's date/time syntax rules, since they cover most format options; see http://www.postgresql.org/docs/8.0/static/datatype-datetime.html#AEN4516 for the various timestamp formats.
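
The lookup described in example 1 can be sketched as follows. This is a hypothetical Python illustration, not condor_q's code: ads are modeled as plain dictionaries, and the attribute keys (Name, QUILL_DB_IP_ADDR, QUILL_DB_NAME, IsRemotelyQueryable) are assumptions borrowed from the names used in this section; the real ads may spell them differently. The ads, address, and database name in the demo are made up.

```python
def find_database(ads, name):
    """Return (address, port, db_name) for the database serving `name`, or None.

    `ads` stands in for the QUILL_AD and SCHEDD_AD records held by the
    collector. Returns None if no ad matches, if the matching ad names no
    database (a schedd without quill), or if remote queries are disallowed.
    """
    for ad in ads:
        if ad.get("Name") != name:
            continue
        if not ad.get("IsRemotelyQueryable", True):
            return None
        endpoint = ad.get("QUILL_DB_IP_ADDR")
        if endpoint is None:
            return None  # not serviced by a database: query the schedd itself
        address, _, port = endpoint.rpartition(":")
        return address, int(port), ad["QUILL_DB_NAME"]
    return None


if __name__ == "__main__":
    # Made-up ads; both point at the same database, as in example 1.
    ads = [
        {"Name": "quill@regular.cs.wisc.edu",
         "QUILL_DB_IP_ADDR": "128.105.0.10:5432", "QUILL_DB_NAME": "quilldb"},
        {"Name": "schedd@regular.cs.wisc.edu",
         "QUILL_DB_IP_ADDR": "128.105.0.10:5432", "QUILL_DB_NAME": "quilldb"},
    ]
    print(find_database(ads, "quill@regular.cs.wisc.edu"))   # ('128.105.0.10', 5432, 'quilldb')
    print(find_database(ads, "schedd@regular.cs.wisc.edu"))  # ('128.105.0.10', 5432, 'quilldb')
```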

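The two new queries in examples 3 and 4 reduce to simple computations over job attributes. The Python sketch below applies them to in-memory job records rather than the database. It assumes the submission time is the QDate attribute and the completion time is CompletionDate, both as Unix timestamps (they appear as int columns of History_Horizontal in section 3.11.4); the function names are hypothetical, and unlike the real command it takes an already-parsed datetime instead of a Postgres timestamp string.

```python
from datetime import datetime, timezone

REMOVED = 3    # JobStatus value for removed jobs
COMPLETED = 4  # JobStatus value for completed jobs


def avg_queue_time(jobs, now):
    """Mean of (now - QDate) over jobs that are neither completed nor removed.

    Each job is a dict with integer 'QDate' (submission time, Unix seconds)
    and 'JobStatus'. Returns None if no job qualifies.
    """
    waits = [now - job["QDate"] for job in jobs
             if job["JobStatus"] not in (COMPLETED, REMOVED)]
    return sum(waits) / len(waits) if waits else None


def completed_since(jobs, since):
    """Jobs that reached the Completed state at or after the datetime `since`."""
    cutoff = since.timestamp()
    return [job for job in jobs
            if job["JobStatus"] == COMPLETED and job["CompletionDate"] >= cutoff]


if __name__ == "__main__":
    jobs = [
        {"QDate": 100, "JobStatus": 1},                       # idle
        {"QDate": 300, "JobStatus": 2},                       # running
        {"QDate": 0, "JobStatus": 4, "CompletionDate": 500},  # completed
    ]
    print(avg_queue_time(jobs, now=400))  # 200.0, the mean of 300 and 100
    since = datetime.fromtimestamp(450, tz=timezone.utc)
    print(len(completed_since(jobs, since)))  # 1
```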

3.11.4 Quill and Its RDBMS Schema

Quill uses a relatively simple database schema with only 7 tables and 2 views. The tables can be broadly divided into those that store job queue information and those that store historical information.

The job queue part of the schema closely follows condor's classad data model: each row in these tables describes one <attribute,value> pair of a classad. The schema also mirrors condor's distinction between a ClusterAd, which stores the attributes common to all jobs in a cluster, and a ProcAd, which stores the attributes specific to each job. Finally, numeric and string-valued attributes are stored separately.

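Thus there are four job queue tables, one for each combination of ad type (ClusterAd or ProcAd) and value type (numeric or string).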

In addition to the <attribute,value> pair, each row contains the cluster id (cid) and, in the ProcAd tables, also the proc id (pid).
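
The four-way split can be illustrated with a short Python sketch operating on an in-memory classad modeled as a dict. It is not quill's code: the function names are hypothetical, rows are plain tuples shaped like (cid, attr, val) for ClusterAds and (cid, pid, attr, val) for ProcAds, and the rule for what counts as numeric is an assumption. The second helper rebuilds the full ad from both row sets, which is conceptually what the schema's unifying views provide.

```python
def split_ad(ad, cid, pid=None):
    """Split one classad into (numeric_rows, string_rows).

    A ClusterAd (pid is None) yields (cid, attr, val) rows; a ProcAd yields
    (cid, pid, attr, val) rows.
    """
    key = (cid,) if pid is None else (cid, pid)
    numeric, strings = [], []
    for attr, val in ad.items():
        row = key + (attr, val)
        # Only int and float values count as numeric in this sketch;
        # all other values go to the string rows.
        if type(val) in (int, float):
            numeric.append(row)
        else:
            strings.append(row)
    return numeric, strings


def unify(numeric, strings):
    """Rebuild attr -> value from both row sets, as a unifying view would."""
    return {row[-2]: row[-1] for row in numeric + strings}


if __name__ == "__main__":
    proc_ad = {"Owner": "akini", "JobPrio": 0, "RemoteUserCpu": 12.5}
    num, txt = split_ad(proc_ad, cid=12, pid=3)
    print(num)  # [(12, 3, 'JobPrio', 0), (12, 3, 'RemoteUserCpu', 12.5)]
    print(txt)  # [(12, 3, 'Owner', 'akini')]
    print(unify(num, txt) == proc_ad)  # True
```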

Since each classad is split across two tables (string and numeric), there are views that unify them into a single entity in order to simplify queries.

Finally, the job queue part of the schema also contains a table that stores metadata about the job_queue.log file.

This table always contains exactly one row, which describes the last time quill polled the job_queue.log file.

The historical tables, on the other hand, are designed slightly differently. Instead of a purely vertical data model (each row an <attribute,value> pair), two tables together represent the complete job classad. Their schemas are as follows:

  1. History_Horizontal (cid int, pid int, EnteredHistoryTable timestamp with time zone, Owner text, QDate int, RemoteWallClockTime int, RemoteUserCpu float, RemoteSysCpu float, ImageSize int, JobStatus int, JobPrio int, Cmd text, CompletionDate int, LastRemoteHost text, primary key(cid,pid))

  2. History_Vertical (cid int, pid int, attr text, val text, primary key (cid, pid, attr))

Each historical job ad is divided into its horizontal and vertical counterparts for query performance. While it is easier to store classads in a vertical table, queries on vertical tables generally perform worse than those on horizontal tables, since the latter have far fewer records. However, because job ads in Condor do not have a fixed schema (users can define their own attributes), a purely horizontal schema would contain a lot of null values. Quill therefore uses a hybrid schema: attributes that are frequently queried (via condor_history) are stored in the History_Horizontal table, and all other attributes are stored vertically (just as in the Cluster/Proc tables above) in the History_Vertical table. History_Horizontal also contains all the attributes needed to service the short form of the condor_history command (i.e. without the -l option).
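
The hybrid split can be sketched in Python using the History_Horizontal columns listed above. This is a hypothetical illustration, not quill's code: EnteredHistoryTable is not modeled, attribute matching is case-sensitive (a simplification, since classad attribute names are case-insensitive), and MyProject and Retries in the demo are made-up user-defined attributes.

```python
# Columns of History_Horizontal besides cid, pid, and EnteredHistoryTable.
HORIZONTAL_COLUMNS = (
    "Owner", "QDate", "RemoteWallClockTime", "RemoteUserCpu", "RemoteSysCpu",
    "ImageSize", "JobStatus", "JobPrio", "Cmd", "CompletionDate", "LastRemoteHost",
)


def split_history_ad(ad, cid, pid):
    """Return (horizontal_row, vertical_rows) for one historical job ad."""
    # Start with every horizontal column NULL (None); missing attributes stay NULL.
    horizontal = {"cid": cid, "pid": pid, **dict.fromkeys(HORIZONTAL_COLUMNS)}
    vertical = []
    for attr, val in ad.items():
        if attr in HORIZONTAL_COLUMNS:
            horizontal[attr] = val
        else:
            # History_Vertical.val is text, so other values are stored as strings.
            vertical.append((cid, pid, attr, str(val)))
    return horizontal, vertical


if __name__ == "__main__":
    ad = {"Owner": "akini", "JobStatus": 4, "MyProject": "climate", "Retries": 2}
    row, extra = split_history_ad(ad, cid=5, pid=0)
    print(row["Owner"], row["JobStatus"], row["Cmd"])  # akini 4 None
    print(extra)  # [(5, 0, 'MyProject', 'climate'), (5, 0, 'Retries', '2')]
```

The NULLs in the horizontal row are limited to the fixed, frequently queried columns, while arbitrary user attributes cost one vertical row each.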

The resulting hybrid schema has proven to be the most efficient at servicing condor_history queries. The job queue tables (Cluster and Proc) were not designed in this hybrid manner because job queues are not nearly as large as the history; a purely vertical schema works well for them.


3.11.5 Quill and Security

There are several layers of security in Quill, some provided by condor and others provided by the database. First, all access to the database is password-protected.

  1. As mentioned in step 2 of the installation instructions above, the query tools condor_q and condor_history connect to the database as user ``quillreader''. The password for this user can differ from one database to another, so each quill daemon advertises it to the collector. The query tools obtain the password from the collector and use it to connect to the database. Access by the ``quillreader'' user is read-only, which is sufficient for the query tools; the quill daemon enforces this with the SQL GRANT command when it first creates the tables in the database. Note that access to the ``quillreader'' password itself can be blocked by blocking access to the collector, a feature Condor already supports.

  2. The quill daemon, on the other hand, needs read and write access to the database, so it connects as user ``quillwriter'', which has owner privileges on the database. Since this gives the ``quillwriter'' user full access, its password cannot be stored in a public place such as the collector. Instead, the ``quillwriter'' password is stored in a file called .quillwritepassword in the condor spool directory, and appropriate read/write protections on this file secure access to the database. The site administrator must create and protect this file; if it does not exist where expected, the condor_quill daemon logs an error and exits.

  3. Finally, as mentioned in step 4 of the installation instructions above (QUILL_IS_REMOTELY_QUERYABLE), site administrators can use the IsRemotelyQueryable attribute in the ``quill ad'' that the quill daemon advertises to the collector to prevent remote condor query tools from reading the database.
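
This section only says that .quillwritepassword needs "appropriate" protections. A site administrator might verify the file before starting condor with a check like the Python sketch below. It is a hypothetical helper, not the daemon's own check, and it assumes owner-only access (mode 0600) is the desired policy, POSIX file permissions, and a file containing just the password on one line.

```python
import os
import stat


def read_quillwriter_password(spool_dir):
    """Return the password stored in <spool_dir>/.quillwritepassword.

    Raises FileNotFoundError if the file is missing and PermissionError if
    group or other users have any access to it.
    """
    path = os.path.join(spool_dir, ".quillwritepassword")
    mode = os.stat(path).st_mode  # FileNotFoundError if absent
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} must not be accessible by group or others")
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as spool:
        path = os.path.join(spool, ".quillwritepassword")
        with open(path, "w") as f:
            f.write("example-password\n")
        os.chmod(path, 0o600)  # owner read/write only
        print(read_quillwriter_password(spool))  # example-password
```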


3.11.6 Maintenance of Quill

There are virtually no maintenance issues in Quill. Once started, it checks whether all the necessary database structures (the database itself, tables, indices, and views) are present and creates any that are missing. It also purges old historical jobs according to the configured policy (see step 4 of the installation instructions above) and performs garbage collection in the database using the postgres VACUUM ANALYZE command. Of course, if Quill is shut down and the database is no longer needed, it can be dropped using the postgres dropdb command.
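
The purge policy can be pictured as a simple cutoff computation. The Python sketch below is hypothetical: it assumes a job's age is measured from the EnteredHistoryTable column of History_Horizontal and that the retention window is QUILL_HISTORY_DURATION days; the function names are invented, and quill's actual SQL is not shown in this section.

```python
from datetime import datetime, timedelta, timezone


def purge_cutoff(now, duration_days):
    """Jobs that entered the history tables before this instant count as old."""
    return now - timedelta(days=duration_days)


def old_jobs(history, now, duration_days):
    """Return (cid, pid) keys of history rows older than the retention window."""
    cutoff = purge_cutoff(now, duration_days)
    return [(row["cid"], row["pid"]) for row in history
            if row["EnteredHistoryTable"] < cutoff]


if __name__ == "__main__":
    now = datetime(2005, 5, 1, tzinfo=timezone.utc)
    history = [
        {"cid": 1, "pid": 0, "EnteredHistoryTable": datetime(2005, 3, 1, tzinfo=timezone.utc)},
        {"cid": 2, "pid": 0, "EnteredHistoryTable": datetime(2005, 4, 20, tzinfo=timezone.utc)},
    ]
    print(purge_cutoff(now, 30).date())  # 2005-04-01
    print(old_jobs(history, now, 30))    # [(1, 0)]
```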


condor-admin@cs.wisc.edu