.. _slurm_jobs:

Example Jobs
------------

SLURM's job concepts go beyond those of Torque, and this document only covers
the most common ones. For a comprehensive list, check out the official
`slurm documentation`_.

.. _slurm documentation: https://slurm.schedmd.com/quickstart.html

Simple Jobs
+++++++++++

The simplest job file just consists of a list of shell commands to be
executed; in that case it is equivalent to a shell script. Note that, in
contrast to Torque, SLURM jobs have to start with a hash-bang (``#!``) line.

Example ``simple_job.job``

.. code-block:: bash

   #!/bin/bash
   ./run_simulation.py

To submit this job run:

.. code-block:: bash

   sbatch simple_job.job

This alone is rarely sufficient. In most cases you will need some resource
requests and a variable that distinguishes the jobs, as you are very likely
to submit multiple similar ones (see :ref:`slurm_job_wrappers`). It is
possible to add SLURM parameters (see :ref:`slurm_resources`) inside the job
file.

Example ``job_with_resources.job``

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name myjob
   #SBATCH --partition gpu
   #SBATCH --time 24:0:0
   #SBATCH --cpus-per-task 2
   #SBATCH --mem 32G
   #SBATCH --gres gpu:1
   #SBATCH --mail-type NONE
   #SBATCH --workdir .

   ./run_simulation.py

This would create a job called "myjob" in the GPU partition that requests 24
hours of running time, 32 GB of RAM, a single GPU of any type, and 2
processors. It will not send any e-mails and starts in the current directory.

Interactive Jobs
++++++++++++++++

Sometimes it may be useful to get a quick shell on one of the compute nodes.
Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the login node
itself, there are cases when this is dangerous, for example when your tests
quickly require lots of memory. In that case you should move those tests to
one of the compute nodes:

.. code-block:: bash

   srun --pty bash

This will submit a job that requests a shell. Note that we do not use
``sbatch`` here but the related command ``srun``: ``srun`` blocks until the
job gets scheduled and runs the command in the foreground, while ``sbatch``
puts the job into the queue and returns immediately. The parameter ``--pty``
allocates a pseudo-terminal to the program so input/output works as expected.

When there are lots of jobs in the queue, scheduling might take some time. To
speed things up you can submit to the test partition, which only allows jobs
with a very short running time:

Example:

.. code-block:: bash

   srun -p test --pty /bin/bash

Other useful examples are:

.. code-block:: bash

   # get a quick R session with 2 cores in the test partition
   srun -p test -c 2 --pty R

   # Start the most recent Matlab with 32GB
   # two bash commands need to be passed to bash -c
   srun --mem 32g --pty bash -c 'module load matlab ; matlab'

   # Start some python script with 1 GPU and 2 cores
   srun -p gpu --gres gpu:1 -c 2 python3 main.py
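Inside such an interactive shell a few quick checks confirm that the
allocation and the software environment look as expected before you submit
jobs in bulk. A minimal sketch; the ``matlab`` module and the ``main.py`` call
are only the examples used elsewhere on this page and stand in for your own
software:

.. code-block:: bash

   # run inside an interactive shell, e.g. one started with: srun -c 2 --pty bash
   echo $SLURM_JOB_ID          # confirms you are inside a SLURM allocation
   echo $SLURM_CPUS_ON_NODE    # number of CPUs granted on this node
   module load matlab          # does the required module load cleanly?
   which python3               # is the expected interpreter on the PATH?
   python3 main.py 01          # try a single run before submitting in bulk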
.. _slurm_job_wrappers:

Job Wrappers
++++++++++++

Usually users want to collect a number of jobs into batches and submit them
with one command. There are several ways to do that. The most straightforward
one is a minimal ``submit.sh`` shell script that could look a bit like this:

.. code-block:: bash

   #!/bin/bash
   for sub in $(seq -w 1 15) ; do
       echo '#!/bin/bash'                  >  job.slurm
       echo "#SBATCH --job-name main_$sub" >> job.slurm
       echo "#SBATCH --cpus-per-task 2"    >> job.slurm
       echo "python main.py $sub"          >> job.slurm
       sbatch job.slurm
       rm -f job.slurm
   done

This can be condensed down into a single line with the ``--wrap`` option to
sbatch:

.. code-block:: bash

   for sub in $(seq -w 1 15) ; do \
       sbatch -c 2 -J main_$sub --wrap "python3 main.py $sub" ; \
   done
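The temporary ``job.slurm`` file can also be avoided, because ``sbatch`` reads
the batch script from standard input when no script file is given. A sketch of
the same loop using a here-document; job names, resources, and the ``main.py``
call are taken from the example above:

.. code-block:: bash

   #!/bin/bash
   # Same batch as submit.sh, but the job script is passed to sbatch
   # on stdin via a here-document instead of a temporary file.
   for sub in $(seq -w 1 15) ; do
       sbatch <<EOF
   #!/bin/bash
   #SBATCH --job-name main_$sub
   #SBATCH --cpus-per-task 2
   python main.py $sub
   EOF
   done

Whether to prefer the temporary file, the here-document, or ``--wrap`` is
largely a matter of taste: the file is easiest to inspect while debugging,
while the other two leave nothing to clean up.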