Torque
======

We are using a cluster environment called `Torque`_. There is a large number of
similar systems with different sets of tools, implementation styles and
licenses. Many of them resemble the original Portable Batch System (**PBS**),
which was developed for NASA.

There are **3 main components** to Torque:

1. A main server that accepts queueing and scheduling commands
2. A separate scheduling system that handles resource allocation policies
3. Machine-oriented mini-servers that handle the processing on the nodes
   themselves

As a user you will only interact with the main server, through a set of
commands, most importantly ``qsub``.

Queues
------

Torque manages a number of queues, each of which can hold thousands of jobs
waiting for execution. Once you have prepared a job (:doc:`Jobs`) you can place
it in a queue. The jobs of all users go to the same central queue(s):

.. image:: ../img/queue.svg
   :width: 100%

The scheduler uses a fair-share sliding-window algorithm to decide which job to
pick from that queue and start on some node. Assume that in the image above
jobs on the right-hand side were submitted earlier than those on the left. It
is entirely possible that the next job to be started is not a green one but a
blue or red one. The decision depends on the amount of resources a job requires
and the amount of resources its owner has used in the last 7 days.
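
The following Python sketch illustrates the idea of a fair-share decision over
a sliding window. It is not the scheduler's actual algorithm: the names
``Job``, ``recent_usage`` and ``pick_next``, the core-hour accounting and the
scoring rule are assumptions made for illustration only.

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime, timedelta

   WINDOW = timedelta(days=7)  # sliding window length taken from the text


   @dataclass
   class Job:
       owner: str
       cores: int            # requested CPU cores
       submitted: datetime


   def recent_usage(history, owner, now):
       """Sum the core-hours an owner consumed inside the sliding window.

       history is a list of (owner, finished_at, core_hours) tuples
       (a hypothetical record format).
       """
       return sum(
           core_hours
           for who, finished_at, core_hours in history
           if who == owner and now - finished_at <= WINDOW
       )


   def pick_next(queue, history, free_cores, now):
       """Return the feasible job whose owner used the least, or None."""
       feasible = [job for job in queue if job.cores <= free_cores]
       if not feasible:
           return None
       # Lower recent usage wins; earlier submission breaks ties.
       return min(
           feasible,
           key=lambda job: (recent_usage(history, job.owner, now), job.submitted),
       )

Under this rule a job submitted later can overtake an older one if its owner
has consumed fewer resources inside the window, and a job that does not fit
into the currently free cores is skipped.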

Each queue has its own parameter set and resource targets. On the tardis there
are 4 queues:

+ ``default``
+ ``longwall`` for jobs that need more than 36 hours
+ ``testing`` for very short jobs only
+ ``gpu`` for jobs that need a GPU
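
A minimal sketch of how the queue descriptions above could translate into a
choice at submission time. The function ``pick_queue`` and its
``testing_limit_hours`` parameter are hypothetical; the text only says that
``testing`` is for very short jobs, so the 0.25 hour default is an assumed
placeholder, not a documented limit. The resulting name would be passed to
``qsub`` with its ``-q`` option.

.. code-block:: python

   def pick_queue(walltime_hours, needs_gpu=False, testing_limit_hours=0.25):
       """Map a job's requirements to one of the queue names listed above."""
       if needs_gpu:
           return "gpu"
       if walltime_hours > 36:                      # threshold from the list
           return "longwall"
       if walltime_hours <= testing_limit_hours:    # assumed cut-off
           return "testing"
       return "default"


   for hours, gpu in [(0.1, False), (12, False), (48, False), (2, True)]:
       print(f"{hours:>5} h, gpu={gpu}: qsub -q {pick_queue(hours, gpu)}")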

Resources
---------

There are **3** important resources used for accounting, reservations and scheduling:

1. CPU cores
2. Amount of physical memory
3. Time

A job is a piece of code that requires a combination of those 3 resources to
run correctly. You can therefore request each of those resources separately.
For instance, a computational problem *might* consist of the following:

A
   10,000 single-threaded jobs, each running for only a couple of minutes with
   a memory footprint of 100 MB.

B
   20 jobs, each able to use as many local processors as possible and
   requiring 10 GB of memory, with an unknown or varying running time.

C
   A single job that is able to utilize the whole cluster at once using MPI.


All of the above requirements need to be represented in a job description so
that Torque knows how many resources to allocate. This is especially important
for large jobs: when the queue holds many smaller jobs, some of them have to be
actively held back so that the larger jobs do not starve. The batch system is
constantly partitioning all of the cluster resources to maintain optimal
efficiency and fairness.
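
To make this concrete, the sketch below renders the three example workloads A,
B and C as ``#PBS`` resource directives of the kind a Torque job script
contains. The ``nodes=...:ppn=...``, ``mem`` and ``walltime`` resource names
are standard Torque options; the helper ``pbs_directives`` and every concrete
number not given in the text (the walltimes, 16 cores per node, 8 nodes, 64 GB
for C) are assumptions chosen for illustration.

.. code-block:: python

   def pbs_directives(nodes, ppn, mem, walltime):
       """Render one resource request as #PBS lines for a job script."""
       return [
           f"#PBS -l nodes={nodes}:ppn={ppn}",   # CPU cores: nodes x cores per node
           f"#PBS -l mem={mem}",                 # physical memory
           f"#PBS -l walltime={walltime}",       # time limit as HH:MM:SS
       ]


   examples = {
       # A: one of many short single-threaded jobs with a 100 MB footprint
       "A": pbs_directives(nodes=1, ppn=1, mem="100mb", walltime="00:10:00"),
       # B: all cores of one node (16 assumed), 10 GB, generous time limit
       "B": pbs_directives(nodes=1, ppn=16, mem="10gb", walltime="24:00:00"),
       # C: an MPI job spanning several nodes (8 x 16 cores assumed)
       "C": pbs_directives(nodes=8, ppn=16, mem="64gb", walltime="12:00:00"),
   }

   for name, lines in examples.items():
       print(f"# workload {name}")
       print("\n".join(lines))

For workload B the running time is unknown, so the walltime shown is only an
upper bound picked for the example.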


.. _Torque: http://www.adaptivecomputing.com/products/open-source/torque