.. _slurm_resources:

Resources
---------

This is a list of common SLURM options. You can either use these options
directly with ``sbatch``/``srun`` or add them as meta-parameters in a job file.
In the latter case the options need the prefix ``#SBATCH`` and must be stated
in the first section of the file, before the actual commands. The complete list
can be found in ``man sbatch``.
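
For example, a minimal job file could look like this (the option values and
the command are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name example-job
#SBATCH --time 1:0:0
#SBATCH --mem 1GB

# the actual commands follow after the #SBATCH section
echo "job running on $(hostname)"
```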

``#SBATCH --job-name job-name``
   Sets the name of the job. This is mostly useful when submitting lots of
   similar jobs in a loop.

``#SBATCH --time 24:0:0``
   Sets the expected maximum running time for the job. When a job **exceeds**
   this limit it will **be terminated**.


``#SBATCH --mem 10GB``
   Sets another resource requirement: memory. Exceeding this value is even
   more critical than exceeding the running time, as you might interfere with
   other jobs on the node. Therefore such a job needs to be **terminated as
   well**.

``#SBATCH --cpus-per-task 2``
    Requests 2 CPUs for the job. This only makes sense if your code is
    multi-threaded and can actually utilize the cores.
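
Inside a running job, SLURM exports the allocated CPU count as
``SLURM_CPUS_PER_TASK``, which you can use to size your thread pool, e.g. for
OpenMP programs. A sketch (the fallback to a single thread outside of SLURM is
an assumption):

```shell
#!/bin/bash
# Limit OpenMP to the CPUs SLURM allocated for this task;
# fall back to a single thread when run outside of SLURM.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "$OMP_NUM_THREADS"
```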

``#SBATCH --workdir project/data``
    Sets the working directory of the job. Every time a job gets started it
    will spawn a shell on some node. To start out in a particular directory,
    use this option. *Otherwise* the first command of your job should always
    be ``cd project/data``.


``#SBATCH --output /home/mpib/krause/logs/slurm-%j.out``
    Specifies the location where SLURM will save the job's log file. By
    default (unlike Torque) the *stdout* and *stderr* streams are merged into
    this output file. The ``%j`` variable will be replaced with the SLURM job
    id. To save the error stream to a separate file, use ``--error``. If you
    specify neither ``--output`` nor ``--error``, the (combined) log is stored
    in the current working directory. Another difference to Torque is that the
    log file is available right away and contents are streamed into it during
    the lifetime of the job (you can follow incoming data with ``tail -f
    slurm.out``).

``#SBATCH --output /dev/null``
    To discard the standard output log entirely, use the special file
    ``/dev/null``.


``#SBATCH --mail-user krause[,knope,dwyer]``
    Send an e-mail to a single user or a list of users for some configured mail
    types (see below).

``#SBATCH --mail-type NONE,[OTHER,EVENT,TYPES]``
    + **NONE** default (no mail)
    + **BEGIN** send an e-mail when the job begins
    + **FAIL** send an e-mail when the job has failed
    + **END** send an e-mail when the job has finished

    Check out ``man sbatch`` for more mail types.

``#SBATCH --dependency=afterok:Job-Id[:Job2-Id...]``
    Adds a dependency to the current job. It will only be started or tagged as
    startable when the job with id *Job-Id* has finished successfully. You can
    provide more than one id using a colon as a separator.
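
When submitting from a script, ``sbatch --parsable`` prints just the job id,
which makes chaining jobs straightforward. A sketch (``step1.job`` and
``step2.job`` are hypothetical job files):

```shell
# submit the first job and capture its id
jobid=$(sbatch --parsable step1.job)
# the second job only starts once the first has finished successfully
sbatch --dependency=afterok:${jobid} step2.job
```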

``#SBATCH --gres gpu:1 --partition gpu``
    Request a single GPU of any kind. It's also necessary to specify
    a different partition using the ``--partition/-p`` flag.
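
Putting both options together, a minimal GPU job file might look like this
(the partition name is taken from above; the command is illustrative):

```shell
#!/bin/bash
#SBATCH --job-name gpu-example
#SBATCH --partition gpu
#SBATCH --gres gpu:1

# SLURM restricts the job to the allocated GPU(s); print what we got
nvidia-smi
```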