Singularity
===========

(Most documentation here and much valuable background work was done by Dominique Hansen https://github.com/hansendx)

Singularity enables users to package their computational research environments, making them more mobile and improving the repeatability and reproducibility of results.
It can build and run so-called containers, whose content can be predefined in special text files called recipes.
A container, once built, can be run on any system that has Singularity installed, even if it contains software not normally supported on that system.

Since a Singularity container (outwardly) consists of only one file, it is easy to distribute to other computers.
This improves mobility and guarantees a consistent work environment even if workflows are performed across a heterogeneous set of computers.
It also enables users to choose the software they use in environments like HPC clusters without root privileges and without depending on system administrators.
All that is needed is a Singularity installation on the system.

A container can also be published alongside research data and results, helping to ensure good scientific practice by making results more easily testable.

Singularity is, however, limited to software executable on Linux-based operating systems.

**Singularity in short:**

* Helps users to build their software environment how they want it and use it where they want it (with limits).
* Helps the research community by supporting the improvement of research software reproducibility.
* Root privileges may be necessary to build containers, but not to use them.
* Can "only" run software compatible with Linux distributions.

Usage
-----

Singularity can be used through the system shell/terminal.

.. code-block:: shell

    # Get help
    singularity --help

    # Pull from some registry
    singularity pull python-3.8-alpine.sif docker://python:3.8-alpine

    # Build from recipe
    singularity build my-image.sif my-container.recipe

    # Run default entry point
    singularity run python-3.8-alpine.sif

    # Execute other programs from the container
    singularity exec python-3.8-alpine.sif /bin/sh

Getting Containers
------------------

Pre-built containers can be obtained from container repositories. There are multiple sources for that.

SingularityHub
~~~~~~~~~~~~~~

`Singularity Hub `_ is an open-source, research-focused repository that hosts containers created by the user community.
Singularity Hub allows users to link their GitHub account and projects with its repository.
`Recipe `_ files, stored in chosen branches and `conforming to specific naming conventions `_, are then built and published on the repository.

The repository's content can be browsed and searched on its webpage.
Containers are grouped in collections, defined by the username of the owner.
Every container in a collection has a name and a version_.

**Every container visible on Singularity Hub can be downloaded using Singularity's pull command:**

.. code-block:: shell

    singularity pull shub://collection_name/container_name:version

The :code:`shub://` prefix tells Singularity that the source to download from is a Singularity registry (sregistry).

Docker Integration
~~~~~~~~~~~~~~~~~~

Docker is another containerization solution.
It is more focused on industry and large-scale software systems, while Singularity is research focused.
Docker is not suited for shared systems like HPC clusters, since it causes problems with user permissions_.
There is, however, a lot of software already containerized and available at its container repository `Docker Hub `_.
Singularity can load Docker images and convert them into the Singularity format, which can reduce the work of building containers.
The command to pull a Docker container would look somewhat like this:

.. code-block:: shell

    singularity pull docker://python
    # Or with version specified
    singularity pull docker://python:3.7.0b3-alpine

Docker Hub stores some images that come from official partners, like the python example.
Those do not have names in the form of :code:`username/container`.
Images created by normal users do, however, follow this naming scheme.

.. _version:

Version Tags
~~~~~~~~~~~~

Version tags are used by sregistries and Singularity Hub to distinguish different versions of containers with the same name.
They are set during the building and publishing process.
Singularity Hub takes its version tags from the suffix (everything after the first occurrence of :code:`.`) of the recipe_ file's name.
A recipe_ named **Singularity.1.0** would result in the version tag **1.0**.
A recipe_ named only **Singularity** would be assigned the special tag **latest**.

Addressing an image with :code:`shub://` without a version tag will lead to the use of the **latest** tag.
The following statements should therefore be equivalent:

.. code-block:: shell

    singularity pull shub://collection/container
    singularity pull shub://collection/container:latest

To improve reproducibility, a version tag should always be specified.

Container Architecture
----------------------

The default Singularity settings automatically mount the user's :code:`$HOME` directory and the current location :code:`$PWD` into a container when it starts.
All other file systems need to be explicitly mapped with the argument *-B*.

This behaviour is very convenient, but it is also dangerous regarding reproducibility.
For instance, if you start an R session and install a package in a container, it will actually be placed in your home folder outside of the container and give you the impression that the container is somehow writable, when it is actually not.

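
The effect of the automatic mounts can be limited when a container is started.
The following is a minimal sketch, not a definitive setup: it assumes that the installed Singularity version provides the :code:`--contain` option, which replaces the host's :code:`$HOME` and :code:`/tmp` with empty directories, and that :code:`-B` takes a :code:`source:destination` pair.
The image name :code:`my-image.sif` and both directories are placeholders.

.. code-block:: shell

    IMAGE=my-image.sif            # placeholder image name
    HOST_DIR=/data/project        # placeholder directory on the host
    CONTAINER_DIR=/mnt/project    # placeholder mount point in the container

    # Only run if singularity and the image are actually available.
    if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
        # --contain: do not share $HOME and /tmp with the host
        # -B:        make HOST_DIR visible as CONTAINER_DIR
        singularity exec --contain -B "$HOST_DIR:$CONTAINER_DIR" "$IMAGE" ls "$CONTAINER_DIR"
    else
        echo "skipped: singularity or $IMAGE not available"
    fi

Started this way, a package installed from inside the container should no longer end up silently in the host's home folder; only the explicitly bound directory is shared.
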
The container has its own binaries, meaning that software installed on the host should normally not be available in a container.
Containers are normally single image files in the `SquashFS `_ format.
SquashFS is read-only by design.

.. _permissions:

Permissions
~~~~~~~~~~~

The user's privileges do not change inside the container.
This is different from other container solutions like Docker, which change the user's role to the one set in the container and thereby also change their privileges.
(This would allow `Privilege escalation `_ on shared systems.)

.. _build:

Building a Container
--------------------

To gain the most visibility for a new container, it is possible to build a Docker container first and then build a Singularity container on top of it, as seen above.
Check out the `official Docker guide `_ to see how to build a Docker container.

Building a container from a Singularity recipe requires root privileges.
Users who want to build Singularity containers themselves need access to a (Linux) computer where they can obtain root privileges.
The most common way to build a container is by using a recipe_ file and the singularity build command:

.. code-block:: shell

    sudo singularity build my_image.sif my_recipe_file

.. _recipe:

Recipe Files
~~~~~~~~~~~~

A recipe is a plain text file that serves as a blueprint for building a container.
It should begin with two lines looking somewhat like this:

.. code-block:: shell

    Bootstrap: shub
    From: collection/container:version

This defines the basis of the image.
:code:`From:` defines the base image of the container and :code:`Bootstrap:` the type of image.
The target of :code:`From:` can also be a container at a private registry or Docker.

The :code:`%post` section can be used to install software and change the container in general.
It is mostly made up of commands that would just be typed into the user's terminal during manual installation.

To install the editor vim, one would for example add the following to the recipe:

.. code-block:: shell

    %post
        apt-get update
        apt-get -y install vim

:code:`apt-get` must be run with the :code:`-y` flag since the building of the image is not interactive.
Without it, :code:`apt-get install` would prompt for user input and cause the build to fail.
It should be kept in mind that such commands must **always** be told to run noninteractively, so that they do not cause errors and prematurely stop a potentially very long build.

It is advised to delete packages and files that are no longer needed at the end of the post section:

.. code-block:: shell

    # Remove software no longer needed
    # (-y, because the build is not interactive):
    apt-get purge -y software_i_used_only_to_do_installation

    # Remove the package list
    # created by "apt-get update",
    # downloaded package archives and
    # packages not used by anything:
    apt-get autoclean -y
    apt-get autoremove -y
    rm -rf /var/lib/apt/lists/*

Detailed documentation of Singularity recipes is given in the `official recipe documentation `_.

Problems with Consistency
~~~~~~~~~~~~~~~~~~~~~~~~~

.. _consistent:

Building a container from a recipe_ on separate occasions can lead to containers that differ from each other.
The difference depends strongly on the way the software inside the container is installed at build time.
Installing from software repositories using :code:`apt-get install`, for example, can lead to the installation of different versions at different times.

Base images used to build upon can also be subject to change, either because they also rely on software repositories for building or because they were deliberately modified.

This means that adding to the recipe_ file and rebuilding the container with additional functionality could compromise its consistent functionality as a whole.

Keeping Containers Consistent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to add software to a container without reinstalling its already installed software.

1. Creating a new recipe_ that takes the old image as base image

   * Either using an image from a repository:

     .. code-block:: shell

         Bootstrap: shub
         From: collection_name/container_name:OLDVERSION

   * Or building upon a local image:

     .. code-block:: shell

         Bootstrap: localimage
         From: oldimage.simg

2. Creating a sandbox and manually installing software through its shell.

   .. code-block:: shell

       sudo singularity build --writable writable.simg oldimage.simg
       sudo singularity shell --writable writable.simg
       # Install software

   .. _sandbox:

   .. code-block:: shell

       sudo singularity build --sandbox sandbox_folder oldimage.simg
       sudo singularity shell --writable sandbox_folder
       # Install software
       sudo singularity build newimage.simg sandbox_folder
       sudo rm -rf sandbox_folder

Just for testing new software, a writable overlay can be used.

.. code-block:: shell

    # Create overlay
    singularity image.create overlay.img
    # Run overlay
    sudo singularity shell --overlay overlay.img oldimage.simg
    # Install software

To use the changes made in the overlay, its use always has to be specified with the :code:`--overlay` flag.

Using a sandbox will break the reproducibility of the container, since it is no longer possible to follow the installation process in a recipe_ file.
An overlay, on the other hand, will break the reproducibility of produced results, since it changes the behavior of the base container.
This is, however, only an issue if the container is to be published or shared with others.

With method 1, reproducibility is kept if the base image is publicly available or made available.
This can, however, lead to a hard-to-follow chain of recipes and images.

It is also possible to try to obtain third-party software only from sources that are sure to always provide the exact same version and that also enable the user to choose specific versions.

A best practice for developing containers is described in `Keeping Containers Reproducible`_.

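
The idea of choosing specific versions can be sketched in a recipe as follows.
This is an illustrative assumption rather than a prescribed workflow: the base image :code:`python:3.8-alpine` is the one from the usage examples, and the package :code:`requests` in version :code:`2.25.1` is merely a placeholder for any software whose source lets the user request an exact version.

.. code-block:: shell

    Bootstrap: docker
    From: python:3.8-alpine

    %post
        # Request one exact version instead of whatever is current
        pip install --no-cache-dir requests==2.25.1

Note that a tag such as :code:`3.8-alpine` can still be re-published with different content, so pinning narrows the problem described in `Problems with Consistency`_ but does not remove it.
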

Keeping Containers Reproducible
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To make the content of a container comprehensible, everything used to create it should be kept in a recipe_ file.
Since building a research environment is often an iterative process, writing everything into a recipe_ beforehand can be difficult.
To make development of a container easier, a sandbox_ can be used to install software interactively and iteratively.
The commands used to set up the sandbox should then be documented in a recipe_ from which the final production container can be built.
A container and a recipe_ can then be distributed.
It is, however, possible that the final container is not a perfect replica of the sandbox (see `Problems with Consistency`_).

Pros and Cons of Container Creation Methods
+++++++++++++++++++++++++++++++++++++++++++

Installing into a sandbox and creating a container from it:

* Pro:

  * Interactive, errors can be corrected immediately.
  * No redundant installation processes: Shorter installation time.
  * No need to keep recipe file up to date.

* Contra:

  * Changes that break functionality cannot be easily reversed.
  * Hampers reproducibility.

Iteratively adding to the recipe file and building from it:

* Pro:

  * Makes used software transparent: Helps reproducibility.
  * Functionality-breaking changes can be reversed.
  * Can be published on Singularity Hub.

* Contra:

  * May lead to containers that differ in more than the additional software.
  * Recipe file has to be maintained during development.

Iteratively using old containers as base (this would have to incorporate always letting Singularity Hub build the container):

* Pro:

  * The already functioning part of the environment is not built again.
  * Comprehension of the building process is less easy but still possible.
  * Functionality-breaking changes can be reversed.
  * Can be published on Singularity Hub.

* Contra:

  * A more or less complicated chain of containers has to be inspected to comprehend the resulting container's function.
  * Several recipe files have to be maintained.
  * Software needed only for building, like make or git, has to be installed again in every iteration if the containers are to be kept from growing unnecessarily large.
  * Needs Singularity Hub to be reproducible.

None of these methods will protect against changes caused by the installation of the new software.

.. _arch:

Singularity Documentation
-------------------------

For more detailed documentation, see the `official singularity documentation `_.