dnqs

Many Unix jobs are not interactive and require prolonged periods of computation. It is anti-social to run such jobs on a workstation, monopolising its interactive facilities; instead they should be submitted to the Distributed Network Queuing System (dnqs), where they will be run in a controlled sequence with other batch jobs, unattended by the user.

dnqs consists of a set of queues distinguished by the various requirements of the jobs submitted to them, e.g. machine, execution time. These queues dispatch jobs in turn to the time-shared computers (currently Aidan and Finan) and to a set of other machines which are dedicated to this purpose. Aidan and Finan may run a set of batch jobs concurrently, as well as servicing many other interactive activities. The dedicated machines run only one job at a time, that job having exclusive use of the host machine.

Note that dnqs jobs may be submitted from any ISS Solaris system.

Constructing a batch job

To use dnqs, a job must first be constructed. A job is simply a set of UNIX commands (just as would be typed if the work were being done interactively at a workstation), assembled in the correct sequence in a file. However, batch jobs run without the human supervision present in interactive work, and errors cannot be fixed "on the fly". If jobs are to run successfully, possible pitfalls should be anticipated and provision made for them in the job itself.
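As an illustration of anticipating pitfalls, here is a minimal sketch of a job file. It is not taken from the dnqs examples: the file name big_calc matches the one used later on this page, but input.dat, result.dat, the WORKDIR variable and its default directory are all hypothetical, and sort merely stands in for a real computation. The source does not say which shell interprets a job, so the sketch keeps to plain Bourne shell syntax.

```shell
# Write a hypothetical job file called big_calc in the current directory.
cat > big_calc <<'EOF'
# big_calc: sketch of a batch job (all file and directory names are hypothetical).

# Stop at the first failing command, since nobody is watching to intervene.
set -e

# Work in a known directory rather than relying on where the job starts.
WORKDIR="${WORKDIR:-$HOME/big_calc_run}"
cd "$WORKDIR"

# Check that the input exists before starting a long computation.
if [ ! -r input.dat ]; then
    echo "big_calc: cannot read $WORKDIR/input.dat" >&2
    exit 1
fi

# The real computation goes here; sort stands in for it in this sketch.
sort -n input.dat > result.dat
echo "big_calc: finished, results in $WORKDIR/result.dat"
EOF
```

Messages written with echo go to the standard output and error streams, so in a batch run they would be collected with the rest of the job's output rather than shown on a screen.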

Queues available

Queues have been set up for jobs which require "small", "medium", "large" and "extra large" amounts of processing time (for example sunp_s, sunp_m, sunp_l and sunp_xl).

Note that each queue imposes a limit on per-process CPU time. These limits may change, and additional limits may be added, at short notice. The number and locations of the queues are also subject to change in the light of experience.

Queuename     Max CPU time   Max VM (Megabytes)   Machine
sunp_s        3 hours        250                  aidan (400MHz)
sunp_m        12 hours       250                  aidan (400MHz)
sunp_l        24 hours       250                  aidan (400MHz)
sunp_xl       5 days         250                  aidan (400MHz)
sun750m_m     24 hours       600                  batch1 (750MHz)
sun750m_l     7 days         600                  batch1 (750MHz)
sun750m_xl    28 days        600                  batch1 (750MHz)
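Choosing a queue amounts to finding the smallest CPU limit that covers the job's expected run time. The following sketch shows that lookup as a shell function. pick_queue is a hypothetical helper, not a dnqs command; the limits are copied from the table above (converted to whole hours) and may be out of date, and the virtual memory limit is not considered.

```shell
# pick_queue HOURS [FAMILY]: print the smallest queue whose CPU limit covers HOURS.
# FAMILY is "sunp" (default) or "sun750m". HOURS must be a whole number.
pick_queue() {
    hours=$1
    family=${2:-sunp}
    case $family in
        sunp)    limits="sunp_s:3 sunp_m:12 sunp_l:24 sunp_xl:120" ;;
        sun750m) limits="sun750m_m:24 sun750m_l:168 sun750m_xl:672" ;;
        *) echo "pick_queue: unknown family $family" >&2; return 1 ;;
    esac
    # Entries are ordered from shortest to longest limit, so the first fit wins.
    for entry in $limits; do
        queue=${entry%%:*}
        limit=${entry##*:}
        if [ "$hours" -le "$limit" ]; then
            echo "$queue"
            return 0
        fi
    done
    echo "pick_queue: no $family queue allows $hours hours" >&2
    return 1
}

pick_queue 10          # prints sunp_m
pick_queue 30 sun750m  # prints sun750m_l
```

A job needing more than 250 Megabytes of virtual memory would have to use one of the sun750m queues whatever its run time.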

Submitting a batch job

The actual submission of the job is done using the qsub command, specifying the selected queue and the name of the file containing the job, e.g.

qsub sun750m_m big_calc

This puts the job (contained here in a file called big_calc) into the selected queue (here, sun750m_m) where it will await its turn for execution.
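Because nobody is present when the job eventually runs, it can be worth checking the job file before it is queued. The sketch below wraps the qsub call shown above in a small check. submit_job and the QSUB variable are hypothetical and not part of dnqs; QSUB exists only so that the function can be tried on a machine without dnqs.

```shell
# submit_job QUEUE JOBFILE: check the job file, then hand it to qsub.
# Setting QSUB=echo prints the arguments instead of submitting anything.
submit_job() {
    queue=$1
    jobfile=$2
    if [ -z "$queue" ] || [ -z "$jobfile" ]; then
        echo "usage: submit_job QUEUE JOBFILE" >&2
        return 2
    fi
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$jobfile" ]; then
        echo "submit_job: $jobfile is missing or empty" >&2
        return 1
    fi
    "${QSUB:-qsub}" "$queue" "$jobfile"
}
```

For example, QSUB=echo submit_job sun750m_m big_calc shows the arguments that would be passed to qsub, provided big_calc exists and is not empty.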

A unique number is assigned to the job. This number is used to investigate the job's progress and to identify its output, e.g.

Your job "big_calc" (935678577) has been submitted to queue sun750m_m.
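The job number can be picked out of this message with standard tools, which is convenient when it is needed later in a script. The sketch below assumes the message has exactly the form shown above; extract_job_id is a hypothetical helper, not a dnqs command.

```shell
# extract_job_id: read a qsub confirmation message on standard input and
# print the job number found between the parentheses.
extract_job_id() {
    sed -n 's/.*(\([0-9][0-9]*\)).*/\1/p'
}

message='Your job "big_calc" (935678577) has been submitted to queue sun750m_m.'
job_id=$(printf '%s\n' "$message" | extract_job_id)
echo "$job_id"   # prints 935678577
```

If qsub writes its confirmation to standard output (an assumption, not confirmed here), the message could be piped straight into extract_job_id instead of being stored in a variable first.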

A number of options are available for the qsub command, and the Unix man command describes these: type

man qsub

Of particular interest is -M which causes the user to be e-mailed with details of the execution of the job.

Output from dnqs jobs

Generally, when a job runs it produces output, which may be directed explicitly to files named in the job or to the standard output and error streams.

For interactive jobs, the standard streams are often displayed on the screen. Of course, this is not possible for batch jobs and so their contents are collected into files which are kept in a directory called dnqs_outputs in the user's home directory. The names of these files contain the job number for identification, e.g.

935678577.stdout and 935678577.stderr
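Given a job number, the names of its output files follow directly from the convention above. The sketch below builds the two paths and checks whether anything was written to the error stream. Both functions and the DNQS_OUTPUTS variable are hypothetical; DNQS_OUTPUTS only allows the directory to be overridden when trying the functions out.

```shell
# dnqs_output_paths JOBID: print the expected stdout and stderr file paths.
dnqs_output_paths() {
    outdir="${DNQS_OUTPUTS:-$HOME/dnqs_outputs}"
    echo "$outdir/$1.stdout"
    echo "$outdir/$1.stderr"
}

# job_had_errors JOBID: succeed if the job's stderr file exists and is not empty.
job_had_errors() {
    [ -s "${DNQS_OUTPUTS:-$HOME/dnqs_outputs}/$1.stderr" ]
}

# Print the two paths for the sample job number used on this page.
dnqs_output_paths 935678577
```

A non-empty stderr file does not necessarily mean the job failed, since some programs write warnings or progress messages there, but it is usually the first place to look.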

Examples

The directory /usr/local/dnqs/examples contains simple examples of the use of dnqs and the file /usr/local/dnqs/examples/README shows how to try out these examples.

A particular example, qexample, which has its own man page, shows a more sophisticated scheme (for optimising file activity). This should not be contemplated until you have a complete understanding of simple dnqs usage.

Other dnqs commands

A number of other utility commands are available.

qstat

show dnqs queue status, including settings and limits

qrm

remove a dnqs job from the queue

qdate

convert a job number to a date and time (the time the job was submitted, not the time it started)

qacct

display accumulated records of all dnqs jobs

quser

display records only for the specified user

See each command's man page for details of its purpose and use.

dnqs was developed in the USA and modified for use at Newcastle University. Consequently it may be found that the man pages are not completely accurate; however, all locally written documentation should be authoritative.