Singularity
===========

(Most documentation here and much valuable background work was done by Dominique Hansen https://github.com/hansendx)

Singularity enables users to package their computational research environments,
making them more portable and improving the repeatability and reproducibility
of results. It can build and run so-called containers, whose content can be
predefined in special text files called recipes.  A container, once built, can
be run on any system that has singularity itself installed, even if it
contains software not normally supported on that system. Since a singularity
container (outwardly) consists of only one file, it is easy to distribute to
other computers.  This improves portability and guarantees a consistent work
environment even if workflows are performed across a heterogeneous set of
computers.  It also enables users to choose their own software in environments
like HPC clusters without root privileges and without depending on system
administrators.  All that is needed is a singularity installation on the
system.  A container can also be published alongside research data and
results, helping to ensure good scientific practice by making results more
easily testable.  Singularity is, however, limited to software that runs on
Linux-based operating systems.

**Singularity in short:**

 * Helps users to build their software environment how they want it and use it where they want it (within limits).
 * Helps the research community by supporting the improvement of research software reproducibility.
 * Root privileges may be needed to build containers, but not to use them.
 * Can "only" run software compatible with Linux distributions.


Usage
-----

Singularity can be used through the system shell/terminal.

.. code-block:: shell

   # Get help
   singularity --help
   # Pull from some registry
   singularity pull python-3.8-alpine.sif docker://python:3.8-alpine
   # Build from recipe
   singularity build my-image.sif my-container.recipe
   # Run default entry point
   singularity run python-3.8-alpine.sif
   # Execute other programs from the container
   singularity exec python-3.8-alpine.sif /bin/sh

Getting Containers
------------------

Pre-built containers can be obtained from container repositories. There are
multiple sources for that.

SingularityHub
~~~~~~~~~~~~~~

An open source, research-focused repository is `Singularity Hub
<https://www.singularity-hub.org/>`_, which hosts containers created by the user
community.

Singularity Hub allows users to link their GitHub account and projects with its
repository.  `Recipe <recipe_>`_ files, stored in chosen branches and
`conforming to specific naming conventions
<https://github.com/singularityhub/singularityhub.github.io/wiki/Automated-Build#finding-recipes>`_,
are then built and published on the repository.  The repository's
content can be browsed and searched using the webpage.  Containers are
grouped in collections, defined by the username of the owner.  Every container
in a collection has a name and a version_.

**Every container visible on Singularity Hub can be downloaded with the singularity pull command:**

.. code-block:: shell

    singularity pull shub://collection_name/container_name:version

The :code:`shub://` prefix tells singularity that the source to download from is a singularity registry (sregistry).


Docker Integration
~~~~~~~~~~~~~~~~~~

Docker is another containerization solution. It is more focused on industry
and large-scale software systems, while singularity is research-focused. Docker
is not suited for shared systems like HPC clusters, since it causes problems
with user permissions_. There is, however, a lot of software already
containerized and available in its container repository, `Docker Hub
<https://hub.docker.com/>`_.  Singularity can load docker images and convert
them into the singularity format, which can reduce the work of building
containers. The command to pull a docker container looks like this:

.. code-block:: shell

    singularity pull docker://python
    # Or with version specified
    singularity pull docker://python:3.7.0b3-alpine

Docker Hub stores some images that come from official partners, like the python
example. Those do not have names of the form :code:`username/container`.
Images created by normal users, however, do follow this naming scheme.


.. _version:

Version Tags
~~~~~~~~~~~~

Version tags are used by sregistries and Singularity Hub to distinguish
different versions of containers with the same name.
They are set during the building and publishing process. Singularity Hub takes
its version tags from the suffix (everything after the first :code:`.`) of
the recipe_ file's name.
A recipe_ named **Singularity.1.0** would result in the version tag **1.0**.
A recipe_ named only **Singularity** would be assigned the
special tag **latest**. Addressing an image with :code:`shub://` without a
version tag will lead to the use of the **latest** tag.
The following statements should therefore be equivalent:

.. code-block:: shell

    singularity pull shub://collection/container
    singularity pull shub://collection/container:latest

To improve reproducibility, a version tag should always be specified.
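
The naming rule above can be sketched in plain shell. The helper name
:code:`recipe_tag` is hypothetical and only mirrors the rule as described
here; it is not part of singularity or Singularity Hub.

.. code-block:: shell

    # Hypothetical helper: derive the version tag from a recipe file name.
    recipe_tag() {
        name=$(basename "$1")
        case "$name" in
            # Suffix present: keep everything after the first "."
            *.*) printf '%s\n' "${name#*.}" ;;
            # No suffix: the special tag applies
            *)   printf '%s\n' "latest" ;;
        esac
    }

    recipe_tag Singularity.1.0   # prints: 1.0
    recipe_tag Singularity       # prints: latest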

Container Architecture
----------------------

The default singularity settings are configured to automatically mount the
user's :code:`$HOME` directory and the current location :code:`$PWD` into
a container when it starts one. All other file systems need to be explicitly
mapped with the argument :code:`-B`. This behaviour is very convenient, but it
is also risky for reproducibility. For instance, if you start an R session
and install a package in a container, it will actually be placed in your home
folder outside of the container and give you the impression that the container
is somehow writable when it is actually not.
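
As a minimal sketch of how to limit this effect, the wrapper below runs a
command without the automatic home mount and binds one extra directory
explicitly. It assumes a singularity 3.x installation, where the
:code:`--no-home` and :code:`--bind` (long form of :code:`-B`) options are
available; the function name :code:`run_isolated` and the :code:`data`
directory are hypothetical.

.. code-block:: shell

    # Hypothetical wrapper: run a command in a container without the host's $HOME.
    run_isolated() {
        image=$1
        shift
        # --no-home: skip the automatic $HOME mount (assumes singularity 3.x)
        # --bind:    map the host directory ./data to /data inside the container
        singularity exec --no-home --bind "$PWD/data:/data" "$image" "$@"
    }

It could be called as :code:`run_isolated my-image.sif ls /data`. The current
directory is still mounted, so this mainly helps when working outside
:code:`$HOME`.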

The container has its own binaries, meaning that software installed on the host
should normally not be available in a container.

Containers are normally single image-files in the `SquashFS
<https://en.wikipedia.org/wiki/SquashFS>`_ format.  SquashFS is read-only by
design.

.. _permissions:

Permissions
~~~~~~~~~~~

The user's privileges do not change inside the container.  This differs from
other container solutions like docker, which change the user's role to the one
set in the container and thereby also change their privileges.  (This would
allow `Privilege escalation
<https://en.wikipedia.org/wiki/Privilege_escalation>`_ on shared systems.)
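
One way to observe this is to compare the numeric user ID on the host with the
one reported inside a container. The sketch below assumes an image that
contains the :code:`id` utility; the function name :code:`same_user_inside` is
hypothetical.

.. code-block:: shell

    # Hypothetical check: succeeds if the user ID inside the container
    # matches the user ID on the host.
    same_user_inside() {
        image=$1
        host_uid=$(id -u)
        container_uid=$(singularity exec "$image" id -u)
        [ "$host_uid" = "$container_uid" ]
    }

A call such as :code:`same_user_inside python-3.8-alpine.sif && echo "same user"`
should then print the message when run as a regular user.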

.. _build:

Building a Container
--------------------

To give a new container the most visibility, it is possible to build a docker
container first and then build a singularity container on top of it, as shown
in `Docker Integration`_.
Check out the `official Docker guide <https://docs.docker.com/get-started/part2/>`_ to
see how to build a docker container.

Building a container from a singularity recipe requires root privileges.
Users who want to build singularity containers themselves need access to
a (Linux) computer where it is possible to obtain those root privileges.  The most
common way to build a container is by using a recipe_ file and the singularity
build command:

.. code-block:: shell

    sudo singularity build my_image.sif my_recipe_file

.. _recipe:

Recipe Files
~~~~~~~~~~~~

A recipe is a plain text file that serves as a blueprint for building
a container. It should begin with two lines looking somewhat like this:

.. code-block:: shell

    Bootstrap: shub
    From: collection/container:version

This defines the basis of the image.  :code:`From:` defines the base image of
the container and :code:`Bootstrap:` the type of source it is obtained from
(here Singularity Hub). The target of :code:`From:` can also be a container at
a private registry or on Docker Hub, given a matching :code:`Bootstrap:` value.
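
For example, a recipe that builds on an image from Docker Hub could start as
follows. This is only a sketch; the image name is the one used in the pull
example in the Usage section.

.. code-block:: shell

    Bootstrap: docker
    From: python:3.8-alpine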

The :code:`%post` section can be used to install software and change the
container in general.  It is mostly made up of commands that would otherwise
be typed into the user's terminal during manual installation.  To install the
editor vim, for example, one would add the following to the recipe:


.. code-block:: shell

    %post

    apt-get update
    apt-get -y install vim

:code:`apt-get` must be run with the :code:`-y` flag since the building of the
image is not interactive.
Without it, :code:`apt-get install` would prompt for user input and cause the
build to fail.
Keep in mind that such commands must **always** be told to run
noninteractively, so that they do not cause errors and stop a potentially very
long build process prematurely.
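
On Debian-based images, some packages ask configuration questions even when
:code:`-y` is given. A common way to suppress them, shown here as a sketch and
assuming a Debian or Ubuntu base image, is to set :code:`DEBIAN_FRONTEND` for
the build:

.. code-block:: shell

    %post

    # Assumption: Debian/Ubuntu base image. Tells debconf not to ask questions.
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    apt-get -y install vim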

It is advised to delete no longer needed packages and files
at the end of the post section:

.. code-block:: shell

    # Remove software no longer needed:
    apt-get purge -y software_i_used_only_to_do_installation
    # Remove packages that nothing depends on anymore:
    apt-get autoremove -y
    # Remove outdated downloaded package archives:
    apt-get autoclean -y
    # Remove the package lists created by "apt-get update":
    rm -rf /var/lib/apt/lists/*

Detailed documentation of singularity recipes is available in the `official
recipe documentation <https://sylabs.io/guides/3.5/user-guide/definition_files.html>`_.

.. _consistent:

Problems with Consistency
~~~~~~~~~~~~~~~~~~~~~~~~~

Building a container from a recipe_ on separate occasions can lead to
containers that differ from each other. The difference depends strongly on
the way the software inside the container is installed at build time.
Installing from software repositories using :code:`apt-get install`, for
example, can lead to the installation of different versions at different times.
Base images used to build upon can also be subject to change, either because
they also rely on software repositories for building or because they were
deliberately modified. This means that adding to the recipe_ file and
rebuilding the container to gain additional functionality could compromise its
consistent functionality as a whole.

Keeping Containers Consistent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to add software to a container without reinstalling its
already installed software.

 1. Creating a new recipe_ that takes the old image as base image

    * Either using an image from a repository:

      .. code-block:: shell

          Bootstrap: shub
          From: collection_name/container_name:OLDVERSION

    * Or building upon a local image:

      .. code-block:: shell

          Bootstrap: localimage
          From: oldimage.simg

 2. Creating a sandbox and manually installing software through its shell.

    Either as a writable image file:

    .. code-block:: shell

        sudo singularity build --writable writable.simg oldimage.simg
        sudo singularity shell --writable writable.simg
        # Install software

    .. _sandbox:

    Or as a sandbox directory, which is converted back into an image afterwards:

    .. code-block:: shell

        sudo singularity build --sandbox sandbox_folder oldimage.simg
        sudo singularity shell --writable sandbox_folder
        # Install software
        sudo singularity build newimage.simg sandbox_folder
        sudo rm -rf sandbox_folder

Just for testing new software, a writable overlay can be used.

.. code-block:: shell

    # Create overlay
    singularity image.create overlay.img
    # Run overlay
    sudo singularity shell --overlay overlay.img oldimage.simg
    # Install software

To use the changes made in the overlay, the overlay always has to be specified
with the :code:`--overlay` flag.

Using a sandbox will break the reproducibility of the container, since it
is no longer possible to trace the installation process in a recipe_ file.
An overlay, on the other hand, will break the reproducibility of produced
results, since it changes the behavior of the base container.
This is, however, only an issue if the container is to be published or shared
with others.  With method 1, reproducibility is preserved if the base image is
publicly available or made available.
This can, however, lead to a hard-to-follow chain of recipes and images.

It is also possible to try to obtain third party software only from sources
that are sure to always provide the exact same version and that also enable the
user to choose specific versions.
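
A minimal sketch of such pinning in a :code:`%post` section is shown below. It
assumes a Debian-based image with :code:`pip` available; the package names are
only illustrative, and :code:`VERSION` is a placeholder that has to be replaced
with a version actually offered by the repository.

.. code-block:: shell

    %post

    apt-get update
    # apt accepts "package=version" to request one exact version
    apt-get -y install vim=VERSION
    # pip accepts "package==version" for the same purpose
    pip install requests==2.22.0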

A best practice for developing containers is described in `Keeping Containers
Reproducible`_.

Keeping Containers Reproducible
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To make the content of a container comprehensible, everything used to create it
should be kept in a recipe_ file. Since building a research environment is
often an iterative process, writing everything into a recipe_ beforehand can be
difficult.  To make development of a container easier, a sandbox_ can be used
to install software interactively and iteratively. The commands used to set up
the sandbox should then be documented in a recipe_ from which the final
production container can be built. The container and its recipe_ can then be
distributed. It is, however, possible that the final container is not a perfect
replica of the sandbox (see `Problems with Consistency <consistent_>`_).

Pros and Cons of Container Creation Methods
+++++++++++++++++++++++++++++++++++++++++++

Installing into Sandbox and creating a container from it:

 * Pro:

   * Interactive, errors can be corrected immediately.
   * No redundant installation processes: Shorter installation time.
   * No need to keep recipe file up to date.

 * Contra:

   * Changes, breaking the functionality, cannot be easily reversed.
   * Hampers reproducibility.

Iteratively adding to the recipe file and building from it:

 * Pro:

   * Makes used software transparent: Helps reproducibility.
   * Functionality breaking changes can be reversed.
   * Can be published on Singularity Hub.

 * Contra:

   * May lead to containers that differ in more than the additional
     software.
   * Recipe file has to be maintained during development.

Iteratively using old containers as base
(this would require always letting Singularity Hub build the container):

 * Pro:

   * The already functioning part of the environment is not built again.
   * Comprehension of the building process is less easy but still possible.
   * Functionality breaking changes can be reversed.
   * Can be published on Singularity Hub.

 * Contra:

   * A more or less complicated chain of containers has to be inspected to
     comprehend the resulting container's function.
   * Several recipe files have to be maintained.
   * Software only needed for building, like make or git, has to be
     reinstalled in every iteration if it is removed each time to keep the
     containers from becoming unnecessarily large.
   * Needs Singularity Hub to be reproducible.

None of these methods will protect against changes caused by the installation
of the new software.

.. _arch:


Singularity Documentation
-------------------------

For more detailed documentation,
see the `official singularity documentation <https://sylabs.io/docs/>`_.