Python
======

Python is probably the most versatile programming ecosystem available on the Tardis.
It's an interpreted language that is easy to read, quick to prototype in and comes with a huge number of useful libraries.

Versions
--------

Users need to be aware that two major versions are still in use and that they are not fully compatible with each other.
Although Python 3.0 was released back in 2008, its predecessor Python 2.7 is still around and not all libraries and packages have migrated.
Since the system version on Debian/stable is still 2.7, the :file:`python` program always points to Python 2 (2.7).
The :file:`python3` program designates the counterpart (3.5).

The system-wide Python and its core libraries are stable but quite old (even in the Python 3 branch).
You can use environment modules to switch to a newer version.

Example:

.. code-block:: bash

   # system libraries
   python --version
   Python 2.7.13
   python3 --version
   Python 3.5.3

   # load new version for python 3
   module load python/3.6
   python3 --version
   Python 3.6.3

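A script can also check at startup which interpreter it is running under, which helps when :file:`python` and :file:`python3` resolve to different versions.
The following is a minimal sketch using only the standard library; the helper name ``require_python`` and the ``(3, 5)`` threshold are illustrative assumptions, not part of the Tardis setup:

.. code-block:: python

   import sys


   def require_python(minimum=(3, 5)):
       """Return True if the running interpreter is at least `minimum`."""
       return sys.version_info[:2] >= tuple(minimum)


   # Stop early with a readable message instead of failing later
   # on syntax or library differences between Python 2 and 3.
   if not require_python((3, 5)):
       sys.exit("This script needs Python 3.5 or newer, found " + sys.version.split()[0])

   print("Running Python", sys.version.split()[0], "from", sys.executable)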


Packages
--------

Some important packages are already installed system-wide.
To see if they are available, simply try to import them:

.. code-block:: python

   Python 3.5.1 (default, Mar 14 2016, 16:32:54)
   [GCC 4.7.2] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import lxml
   >>> lxml.__path__
   ['/opt/software/python/3.5/lib/python3.5/site-packages/lxml']

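If you would rather test for a package without importing it, the standard library can look it up for you.
This is a sketch under the assumption that you only care about top-level package names; ``package_location`` is a hypothetical helper:

.. code-block:: python

   import importlib.util


   def package_location(name):
       """Return the file a top-level package would be loaded from, or None."""
       spec = importlib.util.find_spec(name)
       if spec is None:
           return None
       return spec.origin


   # The package is located but not imported, so this is cheap and has no side effects.
   for name in ("lxml", "numpy"):
       print(name, "->", package_location(name) or "not installed")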

If you need a more recent version or a package that isn't installed yet, the easiest way is to use Python's default package manager :file:`pip`.

It comes separately for Python 2 and Python 3:

.. code-block:: bash

   pip --version
   pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
   pip3 --version
   pip 9.0.1 from /opt/software/python/3.6.3/lib/python3.6/site-packages (python 3.6)

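Because each interpreter has its own pip, one way to avoid installing into the wrong Python is to run pip as a module of the interpreter you intend to use (``python3 -m pip``).
The sketch below builds such a command from within Python; ``pip_command`` is a hypothetical helper and not part of pip itself:

.. code-block:: python

   import sys


   def pip_command(*args):
       """Build a pip command line bound to the running interpreter."""
       return [sys.executable, "-m", "pip"] + list(args)


   # Print the command; pass the list to subprocess.check_call() to actually run it.
   print(" ".join(pip_command("install", "--user", "numpy")))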


To install a package in your home directory, you can simply run :program:`pip3 install --user <packagename>`.

Example (install numpy):

.. code-block:: bash

   pip3 install --user numpy
   Collecting numpy
     Downloading numpy-1.11.2-cp35-cp35m-manylinux1_x86_64.whl (15.6MB)
       100% |████████████████████████████████| 15.6MB 58kB/s
   Installing collected packages: numpy
   Successfully installed numpy-1.11.2
   Python 3.5.1 (default, Mar 14 2016, 16:32:54)
   [GCC 4.7.2] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import numpy as np
   >>> np.__version__, np.__path__
   ('1.11.2', ['/home/mpib/krause/.local/lib/python3.5/site-packages/numpy'])

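To see where ``--user`` installs end up for the interpreter you are running, you can ask the standard library's ``site`` module.
A small sketch; the printed paths depend on your account and Python version:

.. code-block:: python

   import site
   import sys

   # Directory that "pip install --user" writes to for this interpreter.
   user_site = site.getusersitepackages()

   print("user site-packages:", user_site)
   print("user site enabled: ", site.ENABLE_USER_SITE)
   print("on sys.path:       ", user_site in sys.path)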

Remember to always install on the Tardis login node, in case a package needs
special development files that aren't installed on the computation hosts.

.. note::

   The Python Package Index (`PyPI`_), pip's source, contains a lot of user-provided, custom and sometimes old and unstable software. Make sure that what you're installing is actually the package that you want. Usually the project's installation notes tell you what the package is called on PyPI.

Virtual Environments
--------------------

Sometimes, especially for reproducibility, it is useful to freeze the Python
package versions to a specific release.  You could create a
:file:`requirements.txt` with version numbers and run
``pip3 install --user -r requirements.txt`` every time you switch projects, but
it is far more convenient to use a separate Python environment for each project.

Starting with Python 3.4, the program `pyvenv`_ helps you manage different environments.
Creating an environment copies the current system version's python and pip to a new directory with the environment's name.
Every package you install or upgrade is confined to that directory, and you can switch between environments very easily.
For convenience we also installed a module called `virtualenvwrapper`_, which provides three important commands to handle environments: ``mkvirtualenv``, ``workon``, and ``deactivate``.
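
``pyvenv`` is a front end to the standard library's ``venv`` module, so an environment can also be created from Python itself.
The following sketch assumes Python 3.4 or newer and creates a throwaway environment in a temporary directory; ``create_env`` is a hypothetical helper, and pip is left out (``with_pip=False``) to keep the example quick:

.. code-block:: python

   import tempfile
   import venv
   from pathlib import Path


   def create_env(target, with_pip=False):
       """Create a virtual environment at `target` and return its config file path."""
       venv.create(str(target), with_pip=with_pip)
       return Path(target) / "pyvenv.cfg"


   # Build an environment named "project" in a scratch directory and show its settings.
   with tempfile.TemporaryDirectory() as scratch:
       cfg = create_env(Path(scratch) / "project")
       print(cfg.read_text())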

**Create a new virtual environment**

.. code-block:: bash

   $ mkvirtualenv --python=$(which python3) project


**Activate and use the virtual environment**

.. code-block:: bash


   # without virtual environment
   python --version
   Python 2.7.13

   # with virtual environment
   workon project
   python --version
   Python 3.5.3
   which python
   /home/mpib/krause/.virtualenvs/project/bin/python
   pip --version
   pip 9.0.1 from /home/mpib/krause/.virtualenvs/project/lib/python3.5/site-packages (python 3.5)
   (project) [krause@master ~] pip install numpy
   Collecting numpy
   Using cached https://files.pythonhosted.org/packages/fe/94/7049fed8373c52839c8cde619acaf2c9b83082b935e5aa8c0fa27a4a8bcc/numpy-1.15.1-cp35-cp35m-manylinux1_x86_64.whl
   Installing collected packages: numpy
   Successfully installed numpy-1.15.1



**Deactivate the virtual environment**

.. code-block:: bash

  (project) $ deactivate
  $

Note how :file:`virtualenv` is also managing your shell prompt, so you always
know which Python environment you are currently running. All your virtual
environments created this way reside in your home directory under
:file:`~/.virtualenvs/`. In theory you could just run the virtual python
interpreter that is installed in :file:`~/.virtualenvs/<env_name>/bin/python`
directly. It is much more convenient to use the wrapper functions though.
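
The shell prompt is not available inside a running script, for example in a batch job.
In that case the interpreter itself can report whether it belongs to a virtual environment.
A minimal sketch, assuming either the standard ``venv`` mechanism (``sys.base_prefix``) or an older virtualenv release (``sys.real_prefix``); ``in_virtualenv`` is a hypothetical helper:

.. code-block:: python

   import sys


   def in_virtualenv():
       """Return True if the running interpreter lives in a virtual environment."""
       # Older virtualenv releases set sys.real_prefix; venv sets sys.base_prefix.
       base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
       return base != sys.prefix


   print("virtual environment active:", in_virtualenv())
   print("environment prefix:", sys.prefix)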

**Upgrade a virtual environment**

If you intend to upgrade the python version in a virtualenv from 3.X to 3.Y you
will have to rebuild the virtualenv and install new packages, similar to
R major version upgrades. The process looks something like this:


.. code-block:: bash

    # 1. activate the old environment
    workon ENVNAME
    # 2. freeze the environment / packages with version
    pip freeze > env.txt
    # 3. verify env.txt makes sense
    cat env.txt
    # 4. leave the environment
    deactivate
    # 5. remove it
    rmvirtualenv ENVNAME
    # 6. switch to your desired python version
    module load python/3.7
    # 7. rebuild the environment with the new python
    mkvirtualenv -p $(which python3) ENVNAME
    # 8. reinstall all packages from the freeze file
    pip install -r env.txt
    # 9. inspect version incompatibilities, some old packages might not work/build with your new python
    diff env.txt <(pip freeze) # should output nothing or the difference of the old and new env

.. important::

   To use the virtualenvwrapper convenience functions (``workon`` etc.) in a Torque/SLURM
   job file, add one of the following lines to your job definition:
   ``source /etc/bash_completion`` **or** ``module load virtualenvwrapper``
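
Step 9 of the upgrade procedure above boils down to comparing two ``pip freeze`` listings.
If the plain ``diff`` output is hard to read, a few lines of Python can summarise it.
This is only a sketch: it assumes every relevant line has the form ``name==version`` and silently skips anything else (editable installs, URLs, comments); the function names are made up for this example:

.. code-block:: python

   def parse_freeze(text):
       """Map package name -> pinned version from ``pip freeze`` output."""
       pins = {}
       for line in text.splitlines():
           line = line.strip()
           if not line or line.startswith("#") or "==" not in line:
               continue
           name, version = line.split("==", 1)
           pins[name.lower()] = version
       return pins


   def compare_freezes(old_text, new_text):
       """Return (packages missing in the new env, packages whose version changed)."""
       old, new = parse_freeze(old_text), parse_freeze(new_text)
       missing = sorted(set(old) - set(new))
       changed = {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}
       return missing, changed


   # Example with two made-up freeze listings.
   old = "numpy==1.15.1\nscipy==1.1.0\n"
   new = "numpy==1.15.1\nscipy==1.3.0\n"
   print(compare_freezes(old, new))  # ([], {'scipy': ('1.1.0', '1.3.0')})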

Conda
-----

Another approach to virtual environments (and in fact a whole virtual operating
system) is provided by a third-party, commercial Python distribution called
`Anaconda`_ (:file:`conda`). Though discouraged for smaller projects, you can
use an environment module to load and activate a (mini)conda distribution on the
Tardis:


.. code-block:: bash

   [krause@master ~] module avail conda

   -------- /opt/environment/modules --------
   conda/4.7.10
   [krause@master ~] module load conda
   [krause@master ~] conda -V
   conda 4.7.10

Once loaded, just like with ``pyvenv`` or ``virtualenv``, you can create and manage
multiple conda environments and keep specific Python versions and their library
dependencies in them. Note, however, that conda will also download and manage
a large number of system libraries, which *may* make bugs very hard to track down
and could lead to unexpected reproducibility issues. Some software, however, can
only be installed with conda, and I strongly recommend limiting the use of conda
to those specific projects.

One example of those projects is Theano and its optional dependency pygpu. To
install Theano (or other conda-only packages) you can create a new environment:

.. code-block:: bash

   [krause@master ~] module load conda                  # activate conda itself
   [krause@master ~] conda create --yes --name theano
   Collecting package metadata (current_repodata.json): done
   Solving environment: done
   [...]

   [krause@master ~] conda activate theano              # activate a conda env
   (theano) [krause@master ~] # now you can install packages into the env
   (theano) [krause@master ~] conda install --yes numpy scipy mkl
   [...]
   (theano) [krause@master ~] conda install --yes theano pygpu
   (theano) [krause@master ~] which python
   /home/beegfs/krause/.conda/envs/theano/bin/python
   (theano) [krause@master ~] python
   Python 3.7.4 (default, Aug 13 2019, 20:35:49)
   [GCC 7.3.0] :: Anaconda, Inc. on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import theano as t
   WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
   >>>

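To confirm from inside Python that the interpreter belongs to the activated conda environment, you can inspect what ``conda activate`` leaves behind.
The sketch below relies on the ``CONDA_DEFAULT_ENV`` and ``CONDA_PREFIX`` environment variables and on the :file:`conda-meta` directory that conda keeps in each environment; ``conda_env_info`` is a hypothetical helper:

.. code-block:: python

   import os
   import sys


   def conda_env_info(environ=None):
       """Collect what the process knows about the active conda environment."""
       environ = os.environ if environ is None else environ
       return {
           # Set by "conda activate"; None when no environment is active.
           "name": environ.get("CONDA_DEFAULT_ENV"),
           "prefix": environ.get("CONDA_PREFIX"),
           # True if the running interpreter itself lives in a conda environment.
           "interpreter_in_conda_env": os.path.isdir(
               os.path.join(sys.prefix, "conda-meta")
           ),
       }


   print(conda_env_info())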

To deactivate (and possibly remove) an existing conda environment, run:

.. code-block:: bash

   (theano) [krause@master ~] conda deactivate
   [krause@master ~] # deactivated, safe to remove

   [krause@master ~] conda remove --yes --name theano --all
   Remove all packages in environment /home/beegfs/krause/.conda/envs/theano:
   [...]


.. _PyPI: https://pypi.python.org/pypi
.. _pyvenv: https://packaging.python.org/installing/#creating-virtual-environments
.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest
.. _Anaconda: https://docs.conda.io/projects/conda