Pseudonyms
==========

Scientific data should never be stored with a subject's name. Instead,
Castellum provides pseudonyms that can be used to link the data back to the
subject. Anyone who wants to get in contact with a subject should have to go
through Castellum.

.. warning::
    Traces of contact data can also exist in the systems that are used for
    communication, e.g. email servers or payment providers.

A subject can have many different pseudonyms in different domains. Castellum
automatically creates a new domain for each study. There can be more than one
domain per study as well as *general domains* that are not connected to studies
at all. You can think of domains as "coding lists" that are handled by
Castellum in the background.

Pseudonyms are only unique (and therefore useful) in the context of a domain.
Whenever you use a pseudonym, make sure that it is clear which domain it
belongs to. If in doubt, store the domain along with the pseudonym.

It is up to you to decide on the granularity of domains. For example, you could
use a single domain for all bio samples, or separate domains for blood, saliva,
stool, etc.
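
The rule that a pseudonym is only meaningful together with its domain can be
illustrated with a small in-memory sketch. ``PseudonymStore`` and its methods
are hypothetical names, not part of Castellum's API, and the alphabet and
length are simplified placeholders; in Castellum the mapping lives in a
database with strict access control.

.. code-block:: python

    from __future__ import annotations

    import secrets
    import string


    class PseudonymStore:
        """Hypothetical in-memory stand-in for Castellum's pseudonym database."""

        # simplified placeholder alphabet (not Castellum's default)
        ALPHABET = string.digits + string.ascii_uppercase

        def __init__(self) -> None:
            # (domain, pseudonym) -> subject id: a pseudonym is only unique per domain
            self._subjects: dict[tuple[str, str], int] = {}
            # (domain, subject id) -> pseudonym: one pseudonym per subject and domain
            self._pseudonyms: dict[tuple[str, int], str] = {}

        def get_or_create(self, domain: str, subject_id: int, length: int = 8) -> str:
            key = (domain, subject_id)
            if key not in self._pseudonyms:
                while True:
                    candidate = "".join(secrets.choice(self.ALPHABET) for _ in range(length))
                    # only a collision within the *same* domain forces a retry
                    if (domain, candidate) not in self._subjects:
                        break
                self._pseudonyms[key] = candidate
                self._subjects[(domain, candidate)] = subject_id
            return self._pseudonyms[key]

        def lookup(self, domain: str, pseudonym: str) -> int | None:
            # a bare pseudonym without its domain cannot be resolved
            return self._subjects.get((domain, pseudonym))


    store = PseudonymStore()
    study_a = store.get_or_create("study-a", subject_id=42)
    blood = store.get_or_create("blood", subject_id=42)
    # same subject, independent pseudonyms: keep the domain next to each one
    print(("study-a", study_a), ("blood", blood))

Because ``lookup`` needs both parts, a record that only carries a bare
pseudonym cannot be resolved reliably, which is why the domain should be
stored alongside it.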

Using study pseudonyms
----------------------

Whenever you collect data in the context of a study, it should be stored with a
study pseudonym. Pseudonyms can also be printed on questionnaires or passed to
external survey services.

Relevant guides:

-   :ref:`study-domains`
-   :ref:`subject-by-pseudonym`
-   :ref:`subject-by-general-pseudonym`
-   :ref:`subject-get-pseudonym`

.. todo::
    -   attribute export

.. _general-domains:

Using pseudonyms from general domains
-------------------------------------

Central repositories (e.g. for bio samples or IQ scores) often store data that
is not related to a specific study. In these cases, you can use pseudonyms from
a *general domain*.
Because these pseudonyms are the same across all studies, access to them is
highly restricted. Both the user and the study need to be authorized for the
general domain before its pseudonyms show up in the list of pseudonyms.
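
The "both must be authorized" rule can be expressed as a single conjunction.
The sketch below is a hypothetical model for illustration only;
``GeneralDomain``, ``pseudonyms_visible``, and ``visible_general_domains`` are
not Castellum names, and the real permissions are managed through the admin
interface.

.. code-block:: python

    from __future__ import annotations

    from dataclasses import dataclass, field


    @dataclass
    class GeneralDomain:
        """Hypothetical model of a general domain and its authorizations."""

        name: str
        authorized_users: set[str] = field(default_factory=set)
        authorized_studies: set[str] = field(default_factory=set)


    def pseudonyms_visible(domain: GeneralDomain, user: str, study: str) -> bool:
        # both conditions must hold; authorizing only the user or only the study is not enough
        return user in domain.authorized_users and study in domain.authorized_studies


    def visible_general_domains(domains: list[GeneralDomain], user: str, study: str) -> list[str]:
        return [d.name for d in domains if pseudonyms_visible(d, user, study)]


    domains = [
        GeneralDomain("bio-samples", {"alice"}, {"study-a"}),
        GeneralDomain("iq-scores", {"alice", "bob"}, {"study-b"}),
    ]
    print(visible_general_domains(domains, "alice", "study-a"))  # ['bio-samples']
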
Relevant guides:
-   :ref:`admin-general-domains`
-   :ref:`admin-users`
-   :ref:`study-domains`
-   :ref:`subject-get-pseudonym`
-   :ref:`subject-delete`

Deleting domains
----------------

It is possible to delete a domain and all related pseudonyms. Once a pseudonym
is deleted, it is no longer possible to find the corresponding contact
information. Note, however, that additional steps might be necessary for full
anonymization of scientific data (e.g. image data).

The date when a study domain should be deleted is usually defined in the ethics
application and the study consent form.
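
Conceptually, deleting a domain means dropping every link between a pseudonym
in that domain and a subject. The sketch below uses a plain dictionary as a
hypothetical stand-in for Castellum's database; ``delete_domain`` is an
illustrative name, not a Castellum function.

.. code-block:: python

    from __future__ import annotations


    def delete_domain(links: dict[tuple[str, str], int], domain: str) -> int:
        """Drop every (domain, pseudonym) -> subject link of one domain; return the count."""
        doomed = [key for key in links if key[0] == domain]
        for key in doomed:
            del links[key]
        return len(doomed)


    # hypothetical coding lists: the same pseudonym may exist in several domains
    links = {("study-a", "K7Q2"): 1, ("study-a", "X9M4"): 2, ("blood", "K7Q2"): 3}
    removed = delete_domain(links, "study-a")
    print(removed, links)  # 2 {('blood', 'K7Q2'): 3}

Data labeled ``K7Q2`` in study-a can no longer be traced to subject 1, while
the unrelated ``K7Q2`` in the blood domain is unaffected.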

How pseudonyms are generated
----------------------------

Castellum generates random pseudonyms and stores them in a database.

An alternative approach for generating pseudonyms would be to calculate an
encrypted hash over immutable, subject-related information (e.g. name, date of
birth). That approach would have the benefit of not relying on a central
infrastructure to store the pseudonyms. However, in cases where such a central
infrastructure with strict access control is feasible, Castellum's approach is
much simpler. For more information on these two approaches, see `Anforderungen
an den datenschutzkonformen Einsatz von Pseudonymisierungslösungen (German)
<https://www.de.digital/DIGITAL/Redaktion/DE/Digital-Gipfel/Download/2018/p9-datenschutzkonformer-einsatz-von-pseudonymisierungsloesungen.pdf>`_.
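
For contrast, the hash-based alternative can be sketched with Python's
standard ``hmac`` and ``hashlib`` modules. Castellum does not work this way;
HMAC-SHA256 as the keyed hash, the input normalization, and the truncation to
10 hex characters are assumptions made only to illustrate the idea, and
``derived_pseudonym`` is a hypothetical name.

.. code-block:: python

    import hashlib
    import hmac
    import unicodedata


    def derived_pseudonym(secret_key: bytes, name: str, date_of_birth: str) -> str:
        """Hypothetical hash-based pseudonym: same inputs and key give the same result."""
        # normalize the immutable attributes so formatting differences
        # (case, surrounding whitespace, Unicode composition) do not matter
        normalized = "|".join(
            unicodedata.normalize("NFC", part).strip().casefold()
            for part in (name, date_of_birth)
        )
        digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:10].upper()


    print(derived_pseudonym(b"example-key", "Erika Mustermann", "1964-08-12"))

No lookup table is needed because the pseudonym is recomputed from the
attributes, but a misspelled name silently yields an unrelated pseudonym, and
anyone holding the key and the attributes can re-identify the subject.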

The algorithm that is used to generate pseudonyms can be configured. The
default algorithm uses digits and uppercase letters. To avoid mix-ups, the
letters "O", "I", "S", and "B" never appear in a pseudonym. When a user enters
one of these letters, it is automatically replaced by "0", "1", "5", or "8",
respectively. A single typo is guaranteed to be detected. This algorithm is
also available as a `standalone Python package
<https://pypi.org/project/castellum-pseudonyms/>`_ so you can validate
pseudonyms in your scripts and pipelines.
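
The generation and validation rules described above can be approximated in a
few lines. This is a sketch under stated assumptions, not the implementation of
the castellum-pseudonyms package: the 32-symbol alphabet follows from the
description, but the default length of 8 and the mod-32 check symbol are
assumptions, and the package may use a different check scheme. A plain sum like
this catches every single substituted character, but unlike schemes such as the
Damm algorithm it does not catch swapped adjacent characters.

.. code-block:: python

    import secrets
    import string

    # digits plus uppercase letters without the easily confused O, I, S, B: 32 symbols
    ALPHABET = "".join(c for c in string.digits + string.ascii_uppercase if c not in "OISB")
    # letters a user might type instead of the visually similar digits
    LOOKALIKES = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8"})


    def _checksum(body: str) -> str:
        # assumption: sum of symbol values mod 32; changing any one symbol shifts
        # the sum by 1..31, so a single substitution always changes the check symbol
        return ALPHABET[sum(ALPHABET.index(c) for c in body) % len(ALPHABET)]


    def generate(length: int = 8) -> str:
        """Random pseudonym: (length - 1) random symbols plus one check symbol."""
        body = "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
        return body + _checksum(body)


    def normalize(user_input: str) -> str:
        # trim, uppercase, and map O/I/S/B to 0/1/5/8
        return user_input.strip().upper().translate(LOOKALIKES)


    def validate(user_input: str) -> bool:
        p = normalize(user_input)
        if len(p) < 2 or any(c not in ALPHABET for c in p):
            return False
        return _checksum(p[:-1]) == p[-1]

For example, ``validate("oooooooo")`` returns ``True``: normalization turns the
input into ``"00000000"``, and the check symbol of ``"0000000"`` is ``"0"``.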