Commit ea4384b3, authored May 03, 2021 by Michael Krause (tardis-doc)

Clarify and introduce new slurm partition
rm/general.rst
...
...
@@ -45,22 +45,30 @@ Assume that in the image above jobs on the right hand side have been submitted
earlier than those on the left side. It is absolutely possible that the next
feasible job in that queue is not a green one but a blue or red one. The
decision depends on the amount of resources a job requires and the amount of
resources the corresponding job owner has used in the last 7 days.
Each queue/partition has different parameter sets and resource targets:
**Torque**
+ ``default``
+ ``longwall`` for jobs that need more than 36 hours
+ ``testing`` for very short jobs only
**Slurm**
+ ``short`` (default) regular jobs, nodes shared with ``long`` partition
+ ``quick`` dedicated group of cores for urgent jobs
+ ``test`` short jobs up to 1 hour (debug, develop), GPU available
+ ``gpu`` dedicated nodes for jobs that need a GPU
=============== ====================== ========== =========================
Partition Size Time Limit :ref:`GPUs <gpu_list>`
=============== ====================== ========== =========================
long 1248 cores (shared) unlimited None
--------------- ---------------------- ---------- -------------------------
short (default) 1504 cores (shared) 24 hours None
--------------- ---------------------- ---------- -------------------------
quick 120 cores (dedicated) 2 hours None
--------------- ---------------------- ---------- -------------------------
test 4 cores (dedicated) 1 hour 2
--------------- ---------------------- ---------- -------------------------
gpu 176 cores (dedicated) unlimited 22
=============== ====================== ========== =========================
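For illustration, a job is directed to one of these partitions with the ``--partition`` flag; the following batch-script sketch assumes a standard Slurm setup, and the script and program names are hypothetical:

```shell
#!/bin/bash
# Hypothetical job script illustrating partition selection.
# Partition names match the table above; adjust resources to your job.
#SBATCH --partition=quick      # dedicated cores, 2 hour limit
#SBATCH --time=01:30:00        # must stay within the partition's time limit
#SBATCH --mem=4G
#SBATCH --job-name=demo

srun ./my_analysis             # hypothetical program name
```

Omitting ``--partition`` sends the job to ``short``, the default partition.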
---------
...
...
@@ -75,15 +83,9 @@ A job is a piece of code that requires a combination of those resources to run
correctly. Thus you can request each of those resources separately. For
instance a computational problem *might* consist of the following:

A. 10,000 single-threaded jobs, each running only a couple of minutes with a memory footprint of 100MB.
B. 20 jobs where each can use as many local processors as possible, requiring 10GB of memory each, with an unknown or varying running time.
C. A single job that is able to utilize the whole cluster at once using a network layer such as the Message Passing Interface (MPI)
All of the above requirements need to be represented with a job description so
...
...
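Requirements like those of case A are typically expressed as a Slurm array job; a minimal sketch, assuming a standard Slurm setup (the counts, limits, and program name are illustrative):

```shell
#!/bin/bash
# Sketch of case A: many short single-threaded tasks as one array job.
#SBATCH --array=1-10000%200    # 10,000 tasks, at most 200 running at once
#SBATCH --cpus-per-task=1
#SBATCH --mem=100M             # matches the 100MB footprint described above
#SBATCH --time=00:10:00

srun ./process_item "${SLURM_ARRAY_TASK_ID}"   # hypothetical per-item program
```

The ``%200`` throttle keeps a large array from monopolizing the shared partitions.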
@@ -96,5 +98,5 @@ efficiency and fairness.

.. important::

   The need for GPU scheduling is the reason we switched from Torque to
   SLURM. If you want to submit CUDA jobs, you **have** to use SLURM.
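Accordingly, a CUDA job would request the ``gpu`` partition and a GPU via ``--gres``; a minimal sketch, assuming a standard Slurm GRES configuration (the binary name is hypothetical):

```shell
#!/bin/bash
# Sketch: requesting one GPU on the dedicated gpu partition.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1           # one GPU; available types depend on the node
#SBATCH --mem=8G

srun ./my_cuda_app             # hypothetical CUDA binary
```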