
.. _gpu_list:

Using GPUs
----------

With the release of SLURM we introduced a number of dedicated nodes equipped
with Nvidia GPUs of two architectures, for use with CUDA-enabled code.

Right now we have these nodes available:

======== =============== ====== ===== ========= ============
Nodename GPU Type        Memory Count Partition Architecture
======== =============== ====== ===== ========= ============
gpu-1    GTX 1080 TI     12 GB  2     test      Pascal
-------- --------------- ------ ----- --------- ------------
gpu-2    GTX 1080        8 GB   3     gpu       Pascal
-------- --------------- ------ ----- --------- ------------
gpu-3    GTX 1080        8 GB   3     gpu       Pascal
-------- --------------- ------ ----- --------- ------------
gpu-4    Quadro RTX 5000 16 GB  4     gpu       Turing
-------- --------------- ------ ----- --------- ------------
gpu-5    Quadro RTX 5000 16 GB  4     gpu       Turing
-------- --------------- ------ ----- --------- ------------
gpu-6    Quadro RTX 5000 16 GB  4     gpu       Turing
-------- --------------- ------ ----- --------- ------------
gpu-7    Quadro RTX 5000 16 GB  4     gpu       Turing
======== =============== ====== ===== ========= ============

Both the 12 GB 1080 TI and the 8 GB 1080 are grouped under the name **pascal**.
The short name for the more powerful Quadro cards is **turing**.

To request any GPU, use ``-p gpu --gres gpu:1``, or ``-p test --gres gpu:1`` if
you only want to test things. The ``gres`` parameter also accepts a GPU
group/architecture (**pascal** or **turing**). For example, to request two
GeForce 1080 cards, use ``--gres gpu:pascal:2``. This effectively hides all
other GPUs and grants exclusive use of the requested devices.
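
The same request can also be made from a batch script. The following is a
minimal sketch, not a site-provided template: the file name ``gpu-job.sh`` and
the resource values are placeholders, and it assumes that SLURM exports
``CUDA_VISIBLE_DEVICES`` for the allocated cards, which is common for GPU
``gres`` setups but depends on the cluster configuration.

.. code:: bash

   #!/bin/bash
   # gpu-job.sh (hypothetical file name); submit with: sbatch gpu-job.sh

   # Partition, same as -p gpu on the command line.
   #SBATCH --partition=gpu
   # Two cards from the pascal group.
   #SBATCH --gres=gpu:pascal:2
   # Four cores, same as -c 4.
   #SBATCH --cpus-per-task=4
   #SBATCH --mem=8gb

   # Assumption: SLURM sets CUDA_VISIBLE_DEVICES for the granted GPUs.
   echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-not set}"

   # List the devices this job can actually see.
   nvidia-smi -L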

You can use the ``nvidia-smi`` tool in an interactive job or the node-specific
charts to get an idea of the devices' utilization.
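
For a compact view inside a running job, ``nvidia-smi`` can also print selected
fields as CSV. This is a sketch using the tool's standard query options; the
chosen fields and the 5-second refresh interval are arbitrary examples.

.. code:: bash

   # One-off snapshot of utilization and memory for each visible GPU.
   nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
     --format=csv

   # Repeat the same query every 5 seconds until interrupted with Ctrl-C.
   nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
     --format=csv -l 5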

Any code that supports CUDA up to version 10.1 should work out of the box. This
includes Python's pygpu and MATLAB's GPU-enabled libraries.

.. note::

    It is also possible to pass a requested GPU into a **singularity
    container**. To do so, pass the ``--nv`` flag to every
    ``singularity`` call.

Example: Request an interactive job (``srun --pty``) with 4 cores, 8 GB of
memory and a single card from the *turing* group. Instead of ``/bin/bash`` we use
the shell from a singularity container and tell singularity to prepare an
Nvidia environment with ``singularity shell --nv``:

.. code::

   srun --pty -p gpu --gres gpu:turing:1 -c 4 --mem 8gb \
     singularity shell --nv /data/container/unofficial/fsl/fsl-6.0.4.sif

   Singularity> hostname
   gpu-4

   Singularity> nvidia-smi
   Tue Jul 14 18:38:14 2020
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 418.74       Driver Version: 418.74       CUDA Version: 10.1     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  Quadro RTX 5000     Off  | 00000000:3B:00.0 Off |                  Off |
   | 33%   28C    P8    10W / 230W |      0MiB / 16095MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+
                                                                               
   +-----------------------------------------------------------------------------+
   | Processes:                                                       GPU Memory |
   |  GPU       PID   Type   Process name                             Usage      |
   +-----------------------------------------------------------------------------+
   |=============================================================================|
   |  No running processes found                                                 |
   +-----------------------------------------------------------------------------+
   Singularity>
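
The same container can also be used non-interactively. As a sketch, reusing the
image path and resource request from the example above, ``singularity exec
--nv`` runs a single command inside the container instead of opening a shell.
Here it lists the GPUs visible to the job:

.. code:: bash

   # Run one command in the container on a turing card, then exit.
   srun -p gpu --gres gpu:turing:1 -c 4 --mem 8gb \
     singularity exec --nv /data/container/unofficial/fsl/fsl-6.0.4.sif \
     nvidia-smi -L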