.. _slurm_jobs:

Example Jobs
------------

SLURM's job concepts go beyond those of Torque, and this document tries to
map between the two. For a comprehensive overview, check out the official
`slurm documentation`_.

Simple Jobs
+++++++++++

The simplest job file consists of just a list of shell commands to be
executed. In that case it is equivalent to a shell script. Note that, in
contrast to Torque, SLURM jobs have to start with a hash-bang (#!) line.

Example ``simple_job.job``

.. code-block:: bash

   #!/bin/bash
   ./run_simulation.py


To submit this job run:

.. code-block:: bash

   sbatch simple_job.job
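
On success, ``sbatch`` prints the ID of the newly created job. To check on or
cancel a submitted job, something along these lines should work (``123456`` is
just a placeholder for the real job ID):

.. code-block:: bash

   # list your own pending and running jobs
   squeue -u $USER

   # cancel a job, using the ID that sbatch printed
   scancel 123456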

Now that alone is rarely sufficient. In most cases you are going to need some
resource requests and a state variable, as you are very likely to submit
multiple similar jobs. It is possible to add SLURM parameters (see
:ref:`slurm_resources`) inside the job file.

Example ``job_with_resources.job``

.. code-block:: bash


   #!/bin/bash
   #SBATCH --job-name myjob
   #SBATCH --partition gpu
   #SBATCH --time 24:0:0
   #SBATCH --cpus-per-task 2
   #SBATCH --mem 32GB
   #SBATCH --gres gpu:1
   #SBATCH --mail-type NONE
   #SBATCH --workdir .

   ./run_simulation.py


This would create a job called "myjob" in the GPU partition that requests 24
hours of running time, 32 GB of RAM, a single GPU of any type, and 2
processors. It will not send any e-mails and will start in the current
directory.
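
Options passed on the ``sbatch`` command line take precedence over the
``#SBATCH`` lines inside the file, so a one-off change does not require
editing the script. For example:

.. code-block:: bash

   # same job file, but this particular run only gets 2 hours and 16GB
   sbatch --time 2:0:0 --mem 16GB job_with_resources.job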

Interactive Jobs
++++++++++++++++

Sometimes it may be useful to get a quick shell on one of the compute nodes.

Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the login node
itself, there are cases when this is dangerous, for example when your tests
quickly require lots of memory. In that case you should move those tests to
one of the compute nodes:

.. code-block:: bash

   srun --pty bash

This will submit a job that requests a shell and block until the job gets
scheduled. Note that we do not use ``sbatch`` here but the related command
``srun``. The fundamental distinction is that ``srun`` blocks until the
command or job gets scheduled, while ``sbatch`` puts the job into the queue
and returns immediately. The parameter ``--pty`` allocates a pseudo-terminal
to the program so input/output works as expected.

When there are lots of jobs in the queue, scheduling might take some time. To
speed things up you can submit to the test partition, which only allows jobs
with a very short running time:

.. code-block:: bash

    srun -p test --pty /bin/bash

Other useful examples are:


.. code-block:: bash

   # get a quick R session with 2 cores in the test partition
   srun -p test -c 2 --pty R

   # Start the most recent Matlab with 32GB
   # two bash commands need to be passed to bash -c
   srun --mem 32g --pty bash -c 'module load matlab ; matlab'

   # Start some python script with 1 GPU and 2 cores
   srun -p gpu --gres gpu -c 2 python3 main.py


.. _slurm_job_wrappers:

Job Wrappers
++++++++++++

Usually users want to collect a number of jobs into batches and submit them
with one command. There are a number of approaches to do that. The most
straightforward way is to use a minimal ``submit.sh`` shell script that could
look a bit like this:

.. code-block:: bash

    #!/bin/bash

    for sub in $(seq -w 1 15) ; do
        echo '#!/bin/bash'                    > job.slurm
        echo "#SBATCH --job-name main_$sub"  >> job.slurm
        echo "#SBATCH --cpus 2"              >> job.slurm
        echo "python main.py $sub"           >> job.slurm
        sbatch job.slurm
        rm -f job.slurm
    done
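
Instead of writing a temporary file, the job script can also be fed to
``sbatch`` on standard input. A minimal sketch of the same loop using a
here-document:

.. code-block:: bash

    #!/bin/bash

    for sub in $(seq -w 1 15) ; do
        # sbatch reads the job script from stdin when no file is given
        sbatch <<EOF
    #!/bin/bash
    #SBATCH --job-name main_$sub
    #SBATCH --cpus-per-task 2
    python main.py $sub
    EOF
    done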


This can be condensed down into a single line with the ``--wrap`` option to
``sbatch``. Here SLURM will create a job file on the fly, add the #!-line, and
append the wrapped string to that file. This syntax is used a lot in the
examples in this document.

.. code-block:: bash

   for sub in $(seq -w 1 15) ; do
      sbatch -c 2 -J main_$sub --wrap "python3 main.py $sub"
   done

.. _slurm documentation: https://slurm.schedmd.com/quickstart.html