We mentioned job files as parameters to sbatch in the general section. They are
a convenient way of collecting job properties without clobbering the command
line. It's also useful to programmatically create a job description and capture
it in a file.

SLURM job concepts far exceed those of Torque and this document will try to
match those. For a comprehensive list, check out the official `slurm
documentation`_.

Simple Jobs
+++++++++++

Example ``simple_job.job``

.. code-block:: bash

   #!/bin/bash

   cd project/
   ./run_simulation.py

To submit this job run:

.. code-block:: bash

   sbatch simple_job.job
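
If the submission succeeds, ``sbatch`` answers with the id of the new job,
roughly like this (the job id is just an illustrative placeholder):

.. code-block:: bash

   sbatch simple_job.job
   # prints something like: Submitted batch job 123456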

Now that is rarely sufficient. In most cases you are going to need some
resource requests and a state variable as you are very likely to submit
multiple similar jobs. It is possible to add SLURM parameters (see
:doc:`slurm_resources`) inside the job file.

Example ``job_with_resources.job``

.. code-block:: bash

   #!/bin/bash

   #SBATCH --job-name myjob
   #SBATCH --partition gpu
   #SBATCH --time 24:00:00
   #SBATCH --cpus-per-task 2
   #SBATCH --mem 32GB
   #SBATCH --gres gpu:1
   #SBATCH --mail-type NONE
   #SBATCH --chdir .

   ./run_simulation.py

This would create a job called **myjob** in the GPU queue (partition) that
needs **24 hours** of running time, **32 gigabytes of RAM**, a single GPU of
any type, and **2 processors**. It will **not send any e-mails** and will
start in the **current directory**.
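
Any of these options can also be passed directly on the ``sbatch`` command
line, where they take precedence over the corresponding ``#SBATCH`` lines in
the file. A minimal sketch, reusing the job file above but asking for more
time and memory:

.. code-block:: bash

   sbatch --time 48:00:00 --mem 64GB job_with_resources.job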

Interactive Jobs
++++++++++++++++

Sometimes it may be useful to get a quick shell on one of the compute nodes.

Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the login node itself,
there are cases when this is dangerous, for example when your tests quickly
require lots of memory. In that case you should move those tests to one of the
compute nodes:

.. code-block:: bash

   srun --pty bash

This will submit a job that requests a shell. The submission will block until
the job gets scheduled. Note that we did not use ``sbatch`` here, but the
similar command ``srun``. The fundamental distinction is that ``srun`` blocks
until the command or job gets scheduled, while ``sbatch`` puts the job into
the queue and returns immediately. The parameter ``--pty`` allocates a
pseudo-terminal to the program so input/output works as expected.
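
A short sketch of that difference, using a hypothetical ``job.slurm`` file:

.. code-block:: bash

   # returns immediately; output ends up in slurm-<jobid>.out by default
   sbatch job.slurm

   # blocks until a node is allocated, then runs the command in the foreground
   srun hostname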

When there are lots of jobs in the queue the scheduling might take some time.
To speed things up you can submit to the testing queue, which only allows jobs
with a very short running time. Example:

.. code-block:: bash

   [krause@master ~/slurmtests] srun -p test --pty /bin/bash
   [krause@gpu-1 ~/slurmtests]

Other useful examples are:

.. code-block:: bash

   # get a quick R session with 2 cores in the test partition
   srun -p test -c 2 --pty R
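
Along the same lines, a hedged sketch of an interactive session that also
reserves some memory (the partition name and program are placeholders for
whatever your cluster provides):

.. code-block:: bash

   # 2 cores and 8 GB of RAM in the test partition, then an interactive Python shell
   srun -p test -c 2 --mem 8G --pty python3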

Job Wrappers
++++++++++++

Usually users want to collect a number of jobs into batches and submit them
with one command. There are a number of approaches to do that. The most
straightforward way is to use a minimal ``submit.sh`` shell script that could
look a bit like this:

.. code-block:: bash

   #!/bin/bash

   # write a small job file for each input, submit it, then clean up
   for sub in $(seq -w 1 15) ; do
       echo '#!/bin/bash'                   > job.slurm
       echo "#SBATCH --job-name main_$sub" >> job.slurm
       echo "#SBATCH --cpus-per-task 2"    >> job.slurm
       echo "python main.py $sub"          >> job.slurm
       sbatch job.slurm
       rm -f job.slurm
   done

This can be condensed down into a single line with the ``--wrap`` option to