Commit ea4384b3 authored by Michael Krause

Clarify and introduce new slurm partition

Assume that in the image above jobs on the right hand side have been submitted
earlier than those on the left side. It is absolutely possible that the next
feasible job in that queue is not a green one but a blue or red one. The
decision depends on the amount of resources a job requires and the amount of
resources the corresponding job owner has used in the last 7 days.
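The interplay between a job's requested resources and its owner's recent usage
can be sketched with a toy priority function. This is only an illustration of
the idea, not SLURM's actual fair-share algorithm; all names and numbers below
are invented:

```python
from dataclasses import dataclass

@dataclass
class Job:
    owner: str
    cores: int  # cores requested by this job

# Hypothetical CPU-hours each owner consumed in the last 7 days.
recent_usage = {"alice": 500.0, "bob": 10.0}

def priority(job: Job) -> float:
    # Toy rule: smaller requests and lighter recent usage rank first.
    return 1.0 / (job.cores * (1.0 + recent_usage.get(job.owner, 0.0)))

queue = [Job("alice", 4), Job("bob", 4), Job("bob", 64)]
for job in sorted(queue, key=priority, reverse=True):
    print(job.owner, job.cores)
```

With these made-up numbers, both of bob's jobs overtake alice's small job,
because alice has consumed far more resources recently.
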
Each queue/partition has different parameter sets and resource targets:
+ ``long`` for jobs that need more than 24 hours
+ ``short`` (default) for regular jobs; nodes are shared with the ``long`` partition
+ ``quick`` a dedicated group of cores for urgent jobs
+ ``test`` for short jobs up to 1 hour (debugging, development); a GPU is available
+ ``gpu`` dedicated nodes for jobs that need a GPU

=============== ====================== ========== =========================
Partition       Size                   Time Limit :ref:`GPUs <gpu_list>`
=============== ====================== ========== =========================
long            1248 cores (shared)    unlimited  None
--------------- ---------------------- ---------- -------------------------
short (default) 1504 cores (shared)    24 hours   None
--------------- ---------------------- ---------- -------------------------
quick           120 cores (dedicated)  2 hours    None
--------------- ---------------------- ---------- -------------------------
test            4 cores (dedicated)    1 hour     2
--------------- ---------------------- ---------- -------------------------
gpu             176 cores (dedicated)  unlimited  22
=============== ====================== ========== =========================

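As a sketch of how a partition is chosen at submission time, a minimal batch
script might look like the following. The script name, program, and resource
values are made up for illustration; the ``#SBATCH`` directives themselves are
standard SLURM options:

```shell
#!/bin/bash
# example_job.sh -- hypothetical job script; all resource values are illustrative
#SBATCH --partition=quick     # use the dedicated "quick" partition for urgent jobs
#SBATCH --time=01:30:00       # stay below the 2 hour limit of "quick"
#SBATCH --mem=4G              # memory for the job
#SBATCH --cpus-per-task=2     # local cores for this task

./my_analysis                 # hypothetical program
```

The script would be submitted with ``sbatch example_job.sh``; a job whose
requested time exceeds the partition's limit is refused or left pending by the
scheduler.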
---------
A job is a piece of code that requires a combination of those resources to run
correctly. Thus you can request each of those resources separately. For
instance a computational problem *might* consist of the following:
A. 10,000 single-threaded jobs, each running only a couple of minutes and with a memory footprint of 100MB.
B. 20 jobs, where each can use as many local processors as possible, requiring 10GB of memory each, with an unknown or varying running time.
C. A single job that is able to utilize the whole cluster at once using a network layer such as the Message Passing Interface (MPI).
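The three cases above map naturally onto different submission styles. The
following ``sbatch`` invocations are a sketch under assumed site settings
(script names are hypothetical, and limits such as the maximum array size
depend on the local configuration):

```shell
# Case A: a job array of many small, short tasks
sbatch --array=1-10000 --time=00:10:00 --mem=100M small_task.sh

# Case B: few jobs, each with many local cores and 10GB of memory,
# on the "long" partition because the running time is unknown
sbatch --partition=long --cpus-per-task=16 --mem=10G big_task.sh

# Case C: one MPI job spanning many nodes; inside the script,
# "srun" launches the individual MPI ranks
sbatch --ntasks=256 mpi_job.sh
```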
All of the above requirements need to be represented with a job description so
efficiency and fairness.
.. important::

   The need for GPU scheduling is the reason we switched from Torque to
   SLURM. If you want to submit CUDA jobs, you **have** to use SLURM.