Commit 8bcb2208 authored by Michael Krause

rm: add slurm GPU usage and examples

parent 78eb9bc6
@@ -8,14 +8,13 @@ a little. We switched, because it is much more flexible than Torque, actively
maintained, and supports sophisticated GPU scheduling. We will gradually move
nodes from Torque to SLURM to motivate everyone to familiarize themselves with
the new system.
.. important::

   **Users familiar with Torque** can check out the :ref:`slurm_transition` for a quick start:
.. include:: slurm/transition.rst
.. include:: slurm/jobs.rst
.. include:: slurm/commands.rst
.. include:: slurm/resources.rst
.. include:: slurm/gpus.rst
.. _SLURM: https://slurm.schedmd.com/
Using GPUs
----------
With the release of SLURM we introduced a number of dedicated nodes with two
flavors of Nvidia GPUs attached to them, to be used with CUDA-enabled code.
Right now the following nodes are available:

======== =============== ====== ===== =========
Nodename GPU Type Memory Count Partition
======== =============== ====== ===== =========
gpu-1 GTX 1080 TI 12 GB 2 test
-------- --------------- ------ ----- ---------
gpu-2 GTX 1080 8 GB 3 gpu
-------- --------------- ------ ----- ---------
gpu-3 GTX 1080 8 GB 3 gpu
-------- --------------- ------ ----- ---------
gpu-4 Quadro RTX 5000 16 GB 4 gpu
-------- --------------- ------ ----- ---------
gpu-5 Quadro RTX 5000 16 GB 4 gpu
-------- --------------- ------ ----- ---------
gpu-6 Quadro RTX 5000 16 GB 4 gpu
-------- --------------- ------ ----- ---------
gpu-7 Quadro RTX 5000 16 GB 4 gpu
======== =============== ====== ===== =========
Both the 12GB 1080 TI and the 8GB 1080 are grouped under the name **1080**. The
short name for the more powerful Quadro cards is **rtx5k**.
To request any GPU, use ``-p gpu --gres gpu:1``, or ``-p test --gres gpu:1``
if you want to test things. The ``gres`` parameter is very flexible and also
allows you to request a specific GPU group (**1080** or **rtx5k**).
For example, to request 2 Geforce 1080 cards, use ``--gres gpu:1080:2``. This
will effectively hide all other GPUs and grant exclusive usage of the devices.
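For instance, a one-off check of which devices such a request makes visible
could look like this; a minimal sketch using the partitions and GPU groups
listed above (``nvidia-smi -L`` simply lists the visible cards):

```shell
# Ask for two GPUs from the 1080 group on the gpu partition and list
# the devices that are visible inside the allocation.
srun -p gpu --gres gpu:1080:2 nvidia-smi -L
```

Only the two granted cards should show up in the listing, since all other
GPUs on the node are hidden from the job.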
You can use the ``nvidia-smi`` tool in an interactive job or the node-specific
charts to get an idea of the device's utilization.
Any code that supports CUDA up to version 10.1 should work out of the box;
that includes Python's pygpu or Matlab's GPU-enabled libraries.
.. note::

   It is also possible to pass a requested GPU into a **singularity
   container**. You have to pass the ``--nv`` flag to any
   singularity calls, however.
Example: Request an interactive job (``srun --pty``) with 4 cores, 8 GB of memory, and a single card from the **rtx5k** group. Instead of ``/bin/bash`` we use the shell from a singularity container and tell singularity to prepare an Nvidia environment with ``singularity shell --nv``:
.. code::

   srun --pty -p gpu --gres gpu:rtx5k:1 -c 4 --mem 8gb \
       singularity shell --nv /data/container/unofficial/fsl/fsl-6.0.3.sif
   Singularity> hostname
   gpu-4
   Singularity> nvidia-smi
   Tue Jul 14 18:38:14 2020
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: 10.1     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  Quadro RTX 5000     Off  | 00000000:3B:00.0 Off |                  Off |
   | 33%   28C    P8    10W / 230W |      0MiB / 16095MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+

   +-----------------------------------------------------------------------------+
   | Processes:                                                       GPU Memory |
   |  GPU       PID   Type   Process name                             Usage      |
   |=============================================================================|
   |  No running processes found                                                 |
   +-----------------------------------------------------------------------------+
   Singularity>
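The same resources can of course also be requested non-interactively in a
batch script. A hedged sketch of such a script, using the container path from
the example above; the job name and the workload script ``my_analysis.sh`` are
hypothetical placeholders:

```shell
#!/bin/bash
#SBATCH --job-name gpu-demo
#SBATCH --partition gpu
#SBATCH --gres gpu:rtx5k:1
#SBATCH -c 4
#SBATCH --mem 8gb

# Only the granted card is visible inside the job.
nvidia-smi

# Run the actual CUDA-enabled workload inside the container,
# again passing --nv so singularity prepares the Nvidia environment.
singularity exec --nv /data/container/unofficial/fsl/fsl-6.0.3.sif ./my_analysis.sh
```

Submit it with ``sbatch`` as usual; the ``#SBATCH`` directives mirror the
``srun`` flags from the interactive example.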
@@ -27,14 +27,6 @@ can be found in ``man sbatch``.
Requests 2 CPUs for the job. This only makes sense if your code is
multi-threaded and can actually utilize the cores.
``#SBATCH --gres gpu:1 --partition gpu``
   Ask for a single GPU of any kind. It's also necessary to specify
   a different partition. The ``gres`` parameter is very flexible and you can
   also specify the GPU type (``1080`` or ``rtx5k``). For example, to request
   2 Geforce 1080 cards, use ``--gres gpu:1080:2``. This will effectively hide
   all other GPUs on the system. You can test it out by running ``nvidia-smi``
   in an interactive job.
``#SBATCH --workdir project/data``
Sets the working directory of the job. Every time a job gets started it
will spawn a shell on some node. To initially jump to some directory use
@@ -74,3 +66,7 @@ can be found in ``man sbatch``.
tagged as startable when another job with id *Job-Id* finished
successfully. You can provide more than one id using a colon as
a separator.
``#SBATCH --gres gpu:1 --partition gpu``
   Request a single GPU of any kind. It's also necessary to specify
   a different partition using the ``--partition/-p`` flag.
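The dependency option described above can be used to chain jobs into a small
pipeline from the command line as well. A minimal sketch; the script names
``preprocess.sh`` and ``analyse.sh`` are hypothetical, and ``--parsable``
makes ``sbatch`` print just the job id so it can be captured:

```shell
#!/bin/bash
# Submit a preprocessing job, then an analysis job that only becomes
# startable once the first job finished successfully (afterok).
jid=$(sbatch --parsable preprocess.sh)
sbatch --dependency afterok:${jid} analyse.sh
```

More than one job id can be listed after ``afterok``, separated by colons.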