Commit 03be694f authored by Michael Krause 🎉

moved pbs to generic rm

parent 15daf226
@@ -50,13 +50,13 @@ List of Contents
introduction/rules
.. toctree::
:maxdepth: 1
:caption: Batch System
:maxdepth: 3
:caption: Resource Manager
rm/general
rm/torque
rm/slurm
pbs/torque
pbs/jobs
pbs/commands
pbs/resources
.. toctree::
:maxdepth: 1
......
Torque
======
We are using a cluster environment called `Torque`_. There is a large number of
similar systems with different sets of tools, implementation styles and
licenses. Many of them are somewhat similar to the original Portable Batch
System (**PBS**), developed by NASA.
There are **3 main components** to Torque:
1. Main Server accepting queueing and scheduling commands
2. Separate Scheduling System that handles resource allocation policies
3. Machine Oriented Mini-Servers that handle the processing on the nodes themselves
As a user you are only going to interact with the main server, through a set of
commands, most importantly ``qsub``.
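For instance, a minimal interaction might look like this (a sketch; the job file
name is just a placeholder):

.. code-block:: bash

   # submit a prepared job description and print the assigned job ID
   qsub myjob.job

   # list your own queued and running jobs
   qstat -u $USER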
General
=======
A main component of every HPC system is called the resource manager (RM).
Sometimes it's also called a batch system. There are quite a number of systems
out there, commercial, free and open source, or a mixture of both. They all try
to solve a similar problem, but they are not compatible with each other. Some
notable examples are:
+ PBS
+ Sun/Oracle Grid Engine
+ Torque / PBSpro
+ Condor
+ LSF
+ SLURM
We have been using a resource manager called Torque for many years now and it
has worked quite well. Unfortunately, the open source part of the project is no
longer maintained very well, and the lack of proper GPU support led us to switch
to SLURM. We will gradually migrate from Torque to SLURM (starting in 2018), so
you will find documentation and example commands for both systems on this
page.
Queues
------
Torque manages a number of queues that can hold thousands of jobs that are
subject to execution. Once you have prepared a job (:ref:`Jobs`) you can place it
inside a queue. All the jobs of all users go to the same central
queue(s):
The RM usually manages a number of queues that can hold thousands of jobs that
are subject to execution. Once you have prepared a job
(:ref:`torque_jobs`, :ref:`slurm_jobs`) you can place it inside a queue. All
the jobs of all users go to the same central queue(s):
.. image:: ../img/queue.svg
:width: 100%
The scheduler uses a fair share sliding window algorithm to decide what job to
pick out of that queue and start it on some node. Assume that in the image
above jobs on the right hand side have been submitted earlier than those on the
left side. It is absolutely possible that the next feasible job in that queue
is not a green one but a blue or red one. The decision depends on the amount of
resources a job requires and the amount of resources the corresponding job
owner has used in the last 7 days.
The scheduler part of the RM uses different, configurable priority-based
algorithms to decide what job to pick out of that queue and start it on some
node. For Torque specifically, the scheduler implements fair share scheduling
for every user over a window of 7 days. Another global objective for the
scheduler (Torque or SLURM) is to maximize resource utilization while
simultaneously ensuring that every job will eventually start.
Assume that in the image above jobs on the right hand side have been submitted
earlier than those on the left side. It is absolutely possible that the next
feasible job in that queue is not a green one but a blue or red one. The
decision depends on the amount of resources a job requires and the amount of
resources the corresponding job owner has used in the last 7 days.
Each queue has different parameter sets and resource targets. On the tardis there are 3 queues:
Each queue has different parameter sets and resource targets. On the tardis
there are 4 distinct queues:
+ ``default``
+ ``longwall`` for jobs that need more than 36 hours
@@ -44,13 +56,14 @@ Each queue has different parameter sets and resource targets. On the tardis ther
Resources
---------
There are **3** important resources used for accounting, reservations and scheduling:
There are **4** important resources used for accounting, reservations and scheduling:
1. CPU cores
2. Amount of physical Memory
2. Amount of physical memory
3. Time
4. Generic resources (gres, usually a GPU)
A job is a piece of code that requires a combination of those 3 resources to run
A job is a piece of code that requires a combination of those resources to run
correctly. Thus you can request each of those resources separately. For
instance a computational problem *might* consist of the following:
@@ -61,15 +74,19 @@ B
20 jobs, each of which can use as many local processors as possible, requires 10GB of memory, and has an unknown or varying running time.
C
A single job that is able to utilize the whole cluster at once using MPI.
A single job that is able to utilize the whole cluster at once using a network
layer such as the Message Passing Interface (MPI).
All of the above requirements need to be represented with a job description so
Torque knows how many resources to acquire. This is especially important with
the RM knows how many resources to acquire. This is especially important with
large jobs when there are a lot of other, smaller jobs in the queue that need
to be actively held back so the larger jobs won't starve. The batch system is
constantly partitioning all of the cluster resources to maintain optimal
efficiency and fairness.
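To make this concrete, here is a rough sketch of how the same request (2 cores,
8GB of memory, 24 hours of running time) could be expressed in both systems; the
values are made up and the details are covered in the Torque and SLURM sections
below:

.. code-block:: bash

   # Torque/PBS style: one node with 2 cores, 8GB of memory, 24h walltime
   qsub -l nodes=1:ppn=2 -l mem=8gb -l walltime=24:00:00 myjob.job

   # SLURM style: the same resources requested via sbatch
   sbatch --cpus-per-task 2 --mem 8GB --time 24:0:0 myjob.job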
.. _Torque: http://www.adaptivecomputing.com/products/open-source/torque
.. important::

   The need for GPU scheduling is the reason we are switching from Torque to
   SLURM. If you want to submit CUDA jobs, you **have** to use SLURM.
SLURM **(new)**
===============
`SLURM`_ is the resource manager we introduced to the tardis system in 2018. It
is similar to Torque in its main concepts, but the commands and syntax differ
a lot. Although there is a compatibility layer to translate the q-command
family to SLURM, we will refer to the native commands on this page. Right now you
need SLURM to schedule GPUs, but during the course of the year we will gradually
move nodes from Torque to SLURM to motivate everyone to switch to the new
system.
The structure of this section is similar to that of the Torque section.
.. include:: slurm/jobs.rst
.. include:: slurm/commands.rst
.. include:: slurm/resources.rst
.. _SLURM: https://slurm.schedmd.com/
Commands
--------
TBD
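Until this section is written, here is a rough mapping from the Torque commands
documented below to their native SLURM counterparts (a sketch, not a complete
reference):

.. code-block:: bash

   sbatch myjob.job        # submit a job file                  (Torque: qsub)
   squeue -u $USER         # query your queued and running jobs (Torque: qstat)
   scancel <jobid>         # delete a job                       (Torque: qdel)
   scontrol update jobid=<jobid> TimeLimit=12:0:0   # alter a waiting job (Torque: qalter)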
.. _slurm_jobs:
Example Jobs
------------
We mentioned job files as parameters to sbatch in the general section. They are
a convenient way of collecting job properties without clobbering the command
line. It's also useful to programmatically create a job description and capture
it in a file.
SLURM's job concepts go far beyond those of Torque, and this document only tries to match the Torque feature set. For a comprehensive list, check out the official `slurm documentation`_.
Simple Jobs
+++++++++++
The simplest job file just consists of a list of shell commands to be executed.
In that case it is equivalent to a shell script. Note that, in contrast to
Torque, SLURM job files have to start with a hash-bang (``#!``) line.
Example ``simple_job.job``
.. code-block:: bash

   #!/bin/bash
   cd project/
   ./run_simulation.py
You can then submit that job with ``sbatch simple_job.job``.
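On success, ``sbatch`` prints the ID of the newly created job, which you can use
to look it up in the queue; the ID below is of course just an example:

.. code-block:: bash

   $ sbatch simple_job.job
   Submitted batch job 4242
   $ squeue -j 4242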
Now that is rarely sufficient. In most cases you are going to need some
resource requests and a state variable as you are very likely to submit
multiple similar jobs. It is possible to add SLURM parameters (see
:ref:`slurm_resources`) inside the job file.
Example ``job_with_resources.job``
.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name myjob
   #SBATCH --partition gpu
   #SBATCH --time 24:0:0
   #SBATCH --cpus-per-task 2
   #SBATCH --mem 32GB
   #SBATCH --gres gpu:1
   #SBATCH --mail-type NONE
   #SBATCH --chdir .
   ./run_simulation.py
This would create a job called **myjob** in the GPU queue (partition) that
needs **24 hours** of running time, **32 gigabytes of RAM**, **2 processors**,
and a single GPU of any type. It will **not send any e-mails** and start in the
**current directory**.
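The same options can also be passed to ``sbatch`` directly on the command line,
in which case they take precedence over the ``#SBATCH`` lines in the job file. A
short sketch using the values from above:

.. code-block:: bash

   # command-line options override the corresponding #SBATCH lines
   sbatch --partition gpu --time 24:0:0 --mem 32GB --gres gpu:1 job_with_resources.job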
Interactive Jobs
++++++++++++++++
Sometimes it may be useful to get a quick shell on one of the compute nodes.
Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the master itself,
there are cases when this is dangerous, for example when your tests quickly
require lots of memory. In that case you should move those tests to one of the
compute nodes:
.. code-block:: bash

   srun --job-name test --pty /bin/bash
This will submit a job that requests a shell. The submission will block until
the job gets scheduled. Note that we did not use ``sbatch`` here, but the similar
command ``srun``. The fundamental distinction here is that ``srun`` will block
until the command or job gets scheduled, while ``sbatch`` puts the job into the
queue and returns immediately.
When there are lots of jobs in the queue the scheduling might take some time.
To speed things up you can submit to the testing queue, which only allows jobs
with a very short running time. Example:
.. code-block:: bash

   [krause@master ~/slurmtests] srun --partition gpu --job-name test --pty /bin/bash
   [krause@gpu-1 ~/slurmtests]
.. _slurm_job_wrappers:
Job Wrappers
++++++++++++
TBD, check here later.
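Until then, the wrapper pattern described in the Torque section below (looping
over some input space, writing a small job file and submitting it in each
iteration) also works with ``sbatch``. A minimal sketch with made-up subject IDs
and script name:

.. code-block:: bash

   #!/bin/bash
   # create and submit one small job per subject
   for subject in 01 02 03 ; do
       echo '#!/bin/bash'                              >  tmp.job
       echo "#SBATCH --job-name sim-${subject}"        >> tmp.job
       echo "./run_simulation.py --subject ${subject}" >> tmp.job
       sbatch tmp.job
   done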
.. _slurm documentation: https://slurm.schedmd.com/quickstart.html
.. _slurm_resources:
Resources
---------
TBD
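Until this section is written, the ``#SBATCH`` options already used in the job
examples above cover the four resource types from the general section (a sketch,
not a complete list):

.. code-block:: bash

   #SBATCH --cpus-per-task 2   # CPU cores
   #SBATCH --mem 32GB          # physical memory
   #SBATCH --time 24:0:0       # running time
   #SBATCH --gres gpu:1        # generic resources, e.g. a GPU
   #SBATCH --partition gpu     # target queue (called a partition in SLURM)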
Torque **(EOL)**
================
`Torque`_ is a **PBS**-based resource manager, originally developed by NASA and
currently maintained by Adaptive Computing Inc. A necessary but problematic
requirement for Torque is the independent scheduler. For a long time we used
the open source product Maui, which hadn't seen an update in many years and
became incompatible with some of the more recent features of Torque.
.. include:: torque/jobs.rst
.. include:: torque/commands.rst
.. include:: torque/resources.rst
.. _Torque: http://www.adaptivecomputing.com/products/open-source/torque
Important Commands
==================
------------------
This is a list of the core PBS-related commands.
Submitting
----------
++++++++++
Submit a job description stored in a file called ``jobfile``. The return value
of ``qsub`` is either an error or the job ID in the form of ``ID@tardis.mpib-berlin.mpg.de``.
@@ -62,7 +62,7 @@ Examples:
Querying
--------
++++++++
To query the state of a single job, run:
@@ -119,7 +119,7 @@ Examples:
2 0
Selecting
---------
+++++++++
The reverse command to ``qstat <jobid>`` is ``qselect``. This is useful to
generate a number of active job IDs matching some properties. This is extremely
@@ -143,7 +143,7 @@ Examples:
Deleting
--------
++++++++
Sometimes it is necessary to delete jobs from the queue with ``qdel [job id]``.
Either because you realized the code is not doing what it's supposed to be
@@ -172,7 +172,7 @@ but you can use grep and xargs:
Altering
--------
++++++++
It may be useful to change job parameters while the jobs are waiting in the queue.
This is more efficient than deleting and re-submitting the jobs because their
......
.. _jobs:
Example Jobs
============
------------
We mentioned job files as parameters to qsub in the last section. They are
a convenient way of collecting job properties without clobbering the command
@@ -9,7 +9,7 @@ line. It's also useful to programmatically create a job description and capture
it in a file.
Simple Jobs
-----------
+++++++++++
The simplest job file just consists of a list of shell commands to be executed.
In that case it is equivalent to a shell script.
@@ -50,7 +50,7 @@ called **$HOME/logs/**. It will **not send any e-mails** and start in the **curr
directory**.
Interactive Jobs
----------------
++++++++++++++++
Sometimes it may be useful to get a quick shell on one of the compute nodes.
@@ -75,10 +75,10 @@ This will submit a job that requests a shell. The submission will block until th
krause@ood-9:~> $
.. _job_wrappers:
.. _torque_job_wrappers:
Job Wrappers
------------
++++++++++++
Another common pattern is to create a script that would loop over some input
space and then create a job-file line-by-line and submit it at the end of each
@@ -143,7 +143,7 @@ A different syntax to get exactly the same thing:
.. _job_array:
Environment Variables
----------------------
++++++++++++++++++++++
There are a number of **environment variables** available to each job, for instance:
......
.. _resources:
.. _torque_resources:
Resources and Options
=====================
---------------------
Here is a list of common PBS options. You can either use these options directly
with ``qsub`` or add them as meta-parameters in a job file. In the latter case
......