We mentioned job files as parameters to sbatch in the general section. They are
a convenient way of collecting job properties without cluttering the command
line. It's also useful to programmatically create a job description and capture
it in a file.
SLURM job concepts far exceed those of Torque and this document will only try
to match those. For a comprehensive list, check out the official `slurm
documentation`_.
Simple Jobs
+++++++++++
...
...
Example ``simple_job.job``
.. code-block:: bash
#!/bin/bash
cd project/
./run_simulation.py
To submit this job run:
.. code-block:: bash
sbatch simple_job.job
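
``sbatch`` replies with the id of the newly created job (the number below is
just a placeholder), which you can use to keep an eye on it, for example with
``squeue``:

.. code-block:: bash

   $ sbatch simple_job.job
   Submitted batch job 12345
   $ squeue -u $USER    # list your pending and running jobs
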
Now that is rarely sufficient. In most cases you are going to
need some resource requests and a state variable as you are very likely to
submit multiple similar jobs. It is possible to add SLURM parameters (see
:ref:`slurm_resources`) inside the job file.
Example ``job_with_resources.job``
...
...
#SBATCH --mem 32GB
#SBATCH --gres gpu:1
#SBATCH --mail-type NONE
#SBATCH --chdir .
./run_simulation.py
This would create a job called "myjob" in the GPU partition that needs 24 hours
of running time, 32GB of RAM, a single GPU of any type, and 2 processors. It
will not send any e-mails and will start in the current directory.
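
Since every ``#SBATCH`` line is just a regular ``sbatch`` option written into
the file, the same request could also be made directly on the command line
(options given there take precedence over the ones in the file). A rough
equivalent, assuming the directives described above, would be:

.. code-block:: bash

   sbatch --job-name myjob --partition gpu --time 24:0:0 \
          --cpus-per-task 2 --mem 32GB --gres gpu:1 \
          --mail-type NONE --chdir . job_with_resources.job
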
Interactive Jobs
++++++++++++++++
...
...
Sometimes it may be useful to get a quick shell on one of the compute nodes.
Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the login node itself,
there are cases where this is dangerous, for example when your tests quickly
require lots of memory. In that case you should move those tests to one of the
compute nodes:
.. code-block:: bash
srun --pty bash
This will submit a job that requests a shell. The submission will block until
the job gets scheduled. Note that we do not use `sbatch`, but the similar
command `srun`. The fundamental distinction here is that `srun` will block
until the command or job gets scheduled, while `sbatch` puts the job into the
queue and returns immediately. The parameter ``--pty`` allocates a
pseudo-terminal to the program so input/output works as expected.
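
To see the difference in practice: while the allocation is pending, ``srun``
reports what it is waiting for and only then hands you the shell (job id and
node name below are placeholders):

.. code-block:: bash

   $ srun --pty bash
   srun: job 12346 queued and waiting for resources
   srun: job 12346 has been allocated resources
   [krause@gpu-1 ~]$
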
When there are lots of jobs in the queue the scheduling might take some time.
To speed things up you can submit to the testing queue which only allows jobs
with a very short running time. Example:
.. code-block:: bash
srun -p test --pty /bin/bash
Other useful examples are:
.. code-block:: bash
# get a quick R session with 2 cores in the test partition
srun --partition test --cpus-per-task 2 --pty R
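
Along the same lines you can attach extra resources to an interactive shell,
for example a GPU (using the partition and ``--gres`` names from the batch
example above):

.. code-block:: bash

   # interactive shell on a GPU node with one GPU allocated
   srun --partition gpu --gres gpu:1 --pty bash
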
Job Wrappers
++++++++++++
Usually users want to collect a number of jobs into batches and submit them
with one command. There are a number of approaches to do that. The most
straightforward way is to use a minimal ``submit.sh`` shell script that could
look a bit like this:
.. code-block:: bash
#!/bin/bash
for sub in $(seq -w 1 15) ; do
echo '#!/bin/bash' > job.slurm
echo "#SBATCH --job-name main_$sub" >> job.slurm
echo "#SBATCH --cpus-per-task 2" >> job.slurm
echo "python main.py $sub" >> job.slurm
sbatch job.slurm
rm -f job.slurm
done
This can be condensed down into a single line with the ``--wrap`` option to