.. _slurm_jobs:

Example Jobs
------------

We mentioned job files as parameters to ``sbatch`` in the general section. They are
a convenient way of collecting job properties without cluttering the command
line. It is also useful to programmatically create a job description and capture
it in a file.
SLURM's job concepts far exceed those of Torque, and this document only tries to cover the most common ones. For a comprehensive list, check out the official `slurm documentation`_.

Simple Jobs
+++++++++++

The simplest job file just consists of a list of shell commands to be executed.
In that case it is equivalent to a shell script. Note that, in contrast to
Torque, SLURM job files have to start with a hash-bang (``#!``) line.

Example ``simple_job.job``

.. code-block:: bash

   #!/bin/bash
   cd project/
   ./run_simulation.py


You can then submit that job with ``sbatch simple_job.job``.
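
For reference, ``sbatch`` prints the ID the scheduler assigned to the job and
returns immediately; ``squeue`` then lists the job while it is pending or
running. The job ID, partition, and node name below are placeholders for
whatever your cluster reports:

.. code-block:: bash

   # job ID, partition, and node name below are illustrative
   [krause@master ~] sbatch simple_job.job
   Submitted batch job 123456
   [krause@master ~] squeue -u krause
     JOBID PARTITION     NAME   USER ST   TIME  NODES NODELIST(REASON)
    123456   default simple_j krause  R   0:05      1 node-1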


Now that is rarely sufficient. In most cases you are going to need some
resource requests and, since you are very likely to submit multiple similar
jobs, some kind of state variable (a parametrized sketch follows the next
example). It is possible to add SLURM parameters (see :doc:`slurm_resources`)
inside the job file.

Example ``job_with_resources.job``

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name myjob
   #SBATCH --partition gpu
   #SBATCH --time 24:0:0
   #SBATCH --cpus-per-task 2
   #SBATCH --mem 32GB
   #SBATCH --gres gpu:1
   #SBATCH --mail-type NONE
   #SBATCH --chdir .

   ./run_simulation.py


This would create a job called **myjob** in the GPU queue (partition) that
needs **24 hours** of running time, **32 gigabytes of RAM**, **2 processors**,
and a single GPU of any type. It will **not send any e-mails** and will start
in the **current directory**.
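
One way to pass a state variable is as a command line argument: everything
after the job file name on the ``sbatch`` command line is handed to the script
itself. The following is only a sketch; it assumes ``run_simulation.py``
accepts a subject ID as its first argument, which may not match your actual
script.

Example ``parametrized_job.job``

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name myjob
   #SBATCH --mem 8GB

   # Arguments after the job file name on the sbatch command line
   # show up as $1, $2, ... inside the script.
   # Assumption: run_simulation.py takes a subject ID as its first argument.
   SUBJECT="$1"
   ./run_simulation.py "$SUBJECT"

You could then submit one job per subject:

.. code-block:: bash

   for subject in 01 02 03; do
       sbatch parametrized_job.job "$subject"
   done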

Interactive Jobs
++++++++++++++++

Sometimes it may be useful to get a quick shell on one of the compute nodes.

Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the master itself,
there are cases when this is dangerous, for example when your tests quickly
require lots of memory. In that case you should move those tests to one of the
compute nodes:

.. code-block:: bash

   srun --job-name test --pty /bin/bash

This will submit a job that requests a shell. The submission will block until
the job gets scheduled. Note that we did not use ``sbatch`` here but the
similar command ``srun``. The fundamental distinction is that ``srun`` blocks
until the command or job gets scheduled, while ``sbatch`` puts the job into
the queue and returns immediately.
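
For instance, running a short command with ``srun`` only returns once the job
has been scheduled and the command has finished, with its output streamed back
to your terminal (``hostname`` here is just an illustration):

.. code-block:: bash

   [krause@master ~] srun --job-name test hostname
   gpu-1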

When there are lots of jobs in the queue, the scheduling might take some time.
To speed things up you can submit to the testing queue, which only allows jobs
with a very short running time. Example:

.. code-block:: bash

    [krause@master ~/slurmtests] srun --partition gpu --job-name test --pty /bin/bash
    [krause@gpu-1 ~/slurmtests]
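
The resource options from the job files above work the same way with ``srun``.
For example, to get an interactive shell with two processors and more memory
(the values are just illustrative):

.. code-block:: bash

   srun --cpus-per-task 2 --mem 8GB --job-name test --pty /bin/bash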


.. _slurm_job_wrappers:

Job Wrappers
++++++++++++

TBD, check here later.

.. _slurm documentation: https://slurm.schedmd.com/quickstart.html