.. _torque_jobs:

Example Jobs
------------

We mentioned job files as arguments to ``qsub`` in the last section. They are
a convenient way of collecting job properties without cluttering the command
line. They also make it easy to create a job description programmatically and
capture it in a file.

Simple Jobs
+++++++++++

The simplest job file just consists of a list of shell commands to be executed.
In that case it is equivalent to a shell script.

Example ``simple_job.pbs``

.. code-block:: bash

   cd project/
   ./run_simulation.py


You can then submit that job with ``qsub simple_job.pbs``.
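
On success, ``qsub`` prints the id of the newly created job. You can check
whether the job is still queued or already running with ``qstat``, for example:

.. code-block:: bash

   qsub simple_job.pbs
   qstat -u $USER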


Now that is rarely sufficient. In most cases you are going to need some
resource requests and a parameter that varies from job to job, as you are very
likely to submit many similar jobs. It is possible to add qsub parameters (see
:ref:`torque_resources`) inside the job file.

Example ``job_with_resources.pbs``

.. code-block:: bash

    #PBS -N myjob
    #PBS -l walltime=10:0:0
    #PBS -l mem=32gb
    #PBS -j oe
    #PBS -o $HOME/logs/
    #PBS -m n
    #PBS -d .

    ./run_simulation.py

This would create a job called **myjob** that requests **10 hours** of running
time and **32 gigabytes of RAM**, with a **joined** stdout/stderr stream stored
in the folder **$HOME/logs/**. It will **not send any e-mails** and will start
in the **current directory**.
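
Note that the directory given with ``-o`` is not created for you; if
``$HOME/logs/`` does not exist yet, the job output may end up somewhere else or
get lost. A minimal submission could therefore look like this:

.. code-block:: bash

   mkdir -p $HOME/logs
   qsub job_with_resources.pbs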

Interactive Jobs
++++++++++++++++

Sometimes it may be useful to get a quick shell on one of the compute nodes.

Before submitting hundreds or thousands of jobs you might want to run some
simple checks to ensure all the paths are correct and the software is loading
as expected. Although you can usually run these tests on the master itself,
there are cases when this is dangerous, for example when your tests quickly
require lots of memory. In that case you should move those tests to one of the
compute nodes:

.. code-block:: bash

   qsub -I

This will submit a job that requests an interactive shell. The submission will block until the job gets scheduled; when there are lots of jobs in the queue this might take some time. To speed things up you can submit to the testing queue, which only allows jobs with a very short running time:

.. code-block:: bash

    krause@master:~> $ qsub -I -q testing
    qsub: waiting for job 4465022.master.tardis.mpib-berlin.mpg.de to start
    qsub: job 4465022.master.tardis.mpib-berlin.mpg.de ready

    krause@ood-9:~> $
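
Resource requests (see :ref:`torque_resources`) can be combined with ``-I`` as
well. If your quick test needs more memory or time than the defaults, request
it explicitly, for example:

.. code-block:: bash

   qsub -I -l mem=8gb,walltime=2:0:0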


.. _torque_job_wrappers:

Job Wrappers
++++++++++++

Another common pattern is a wrapper script that loops over some input space,
builds a job file line by line, and submits it at the end of each iteration.
Creating a file line by line can be done with the shell redirection operators
``>`` and ``>>``.

To **create or overwrite** a file ``tmp.file`` with a single line you could:

.. code-block:: bash

   echo "this is some line" > tmp.file

To **append** to an existing file you can use ``>>``:

.. code-block:: bash

   echo "this is a second line" >> tmp.file


Using these operators a **job creation wrapper** could look like this:


.. code-block:: bash

    #!/bin/bash

    for input_file in INPUT/* ; do
        echo "#PBS -m n"                          > tmp.pbs
        echo "#PBS -o /dev/null"                 >> tmp.pbs
        echo "#PBS -j oe"                        >> tmp.pbs
        echo "#PBS -l walltime=96:0:0"           >> tmp.pbs
        echo "#PBS -d ."                         >> tmp.pbs

        echo "./run_analysis.py $input_file"     >> tmp.pbs
        qsub tmp.pbs
    done

    rm -f tmp.pbs

A different syntax that achieves exactly the same thing, using a here document
and ``sed``:

.. code-block:: bash

    #!/bin/bash

    cat > tmp.pbs <<EOF
    #PBS -m n
    #PBS -o /dev/null
    #PBS -j oe
    #PBS -l walltime=96:00:00
    #PBS -d .

    ./run_analysis.py %VAR1%
    EOF

    for input_file in INPUT/* ; do
        # use | as the sed delimiter here, since $input_file contains slashes
        sed "s|%VAR1%|${input_file}|" tmp.pbs | qsub
    done

    rm -f tmp.pbs
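
A third variation avoids rewriting the job file altogether and passes the
changing value through an environment variable with qsub's ``-v`` option. This
is only a sketch; the variable name ``INPUT_FILE`` is chosen here purely for
illustration:

.. code-block:: bash

    #!/bin/bash

    # static job file, the input is filled in at execution time
    cat > tmp.pbs <<EOF
    #PBS -m n
    #PBS -o /dev/null
    #PBS -j oe
    #PBS -l walltime=96:00:00
    #PBS -d .

    ./run_analysis.py "\$INPUT_FILE"
    EOF

    for input_file in INPUT/* ; do
        # -v places INPUT_FILE in the job's environment at submission time
        qsub -v INPUT_FILE="$input_file" tmp.pbs
    done

    rm -f tmp.pbs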

.. _job_array:

Environment Variables
++++++++++++++++++++++

There are a number of **environment variables** available to each job, for instance:

.. code-block:: bash


    krause@ood-32:~> $ env | grep PBS | sort
    PBS_ARRAYID=1
    PBS_ENVIRONMENT=PBS_BATCH
    PBS_JOBCOOKIE=22E3D79E015B9F5EE9E205EBF8CB64E7
    PBS_JOBID=4294864-1.master.tardis.mpib-berlin.mpg.de
    PBS_JOBNAME=STDIN-1
    PBS_MOMPORT=15003
    PBS_NODEFILE=/var/spool/torque/aux//4294864-1.master.tardis.mpib-berlin.mpg.de
    PBS_NODENUM=0
    PBS_NUM_NODES=1
    PBS_NUM_PPN=1
    PBS_O_HOME=/home/mpib/krause
    PBS_O_HOST=master.tardis.mpib-berlin.mpg.de
    PBS_O_LANG=de_DE.UTF-8
    PBS_O_LOGNAME=krause
    PBS_O_MAIL=/var/mail/krause
    PBS_O_PATH=/home/mpib/krause/bin:/opt/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
    PBS_O_QUEUE=route
    PBS_O_SHELL=/bin/bash
    PBS_O_WORKDIR=/home/mpib/krause
    PBS_QUEUE=default
    PBS_SERVER=master
    PBS_TASKNUM=1
    PBS_VERSION=TORQUE-2.4.16
    PBS_VNODENUM=0

These variables are set individually for each job and can be used in your
scripts. You could create a **unique directory or file** based on
``$PBS_JOBID``, for example. Another extremely useful one is ``$PBS_ARRAYID``:
when you submit a job array with the ``-t`` parameter, this variable holds the
current index. So you could easily run multiple unique jobs like this:

.. code-block:: bash

   echo './run_simulation $PBS_ARRAYID' | qsub -t 1-100 -d .

This will create 100 jobs with indices between 1 and 100, each starting in the
current working directory. **Note** the use of the **single quotes** here. This
is important, as you do not want to evaluate the variable ``$PBS_ARRAYID`` at
the time of submission but at the time of execution!

Alternatively you can escape the ``$`` character like this:

.. code-block:: bash

   echo "./run_simulation \$PBS_ARRAYID" | qsub -t 1-100 -d .