Commit 26965254 authored by Hayat

Merge branch 'refine-privacy' into 'main'

restructure "privacy" section to "pseudonyms"

See merge request !104
parents a33e2784 daed23eb
Pipeline #12987 passed with stages in 34 seconds
......@@ -12,7 +12,7 @@ Welcome to Castellum's documentation!
overview
features
roles
privacy
pseudonyms
security
faqs
......
Privacy
=======
At its core, Castellum is about splitting a subject's data into little pieces.
On the one hand this means that users can only access the pieces that are
necessary for them. On the other hand this means that Castellum contains the
necessary information to put all the pieces back together, e.g. so it can be
deleted on request.
Contact data
------------
Pseudonyms
==========
Contact details are stored in Castellum itself. This means that anyone who
wants to get in contact with a subject needs to go through Castellum.
Scientific data should never be stored with a subject's name. Instead,
Castellum provides pseudonyms that can be used to link the data back to the
subject. Anyone who wants to get in contact with a subject should have to go
through Castellum.
.. warning::
Traces of contact data can also exist in the systems that are used for
communication, e.g. email servers or payment providers.
Pseudonyms
----------
Scientific data should never be stored with a subject's name. Instead,
Castellum automatically generates and stores random pseudonyms that can be used
to link the data back to the subject.
.. note::
An alternative approach for generating pseudonyms would be to calculate an
encrypted hash over immutable, subject-related information (e.g. name, date
of birth).
That approach would have the benefit of not relying on a central
infrastructure to store the pseudonyms. However, in cases where such a
central infrastructure with strict access control is feasible, Castellum's
approach is much simpler.
For more information on these two approaches, see `Anforderungen an den
datenschutzkonformen Einsatz von Pseudonymisierungslösungen (German)
<https://www.de.digital/DIGITAL/Redaktion/DE/Digital-Gipfel/Download/2018/p9-datenschutzkonformer-einsatz-von-pseudonymisierungsloesungen.pdf>`_.
.. note::
The algorithm that is used to generate pseudonyms can be configured. The
algorithm that is used by default produces alphanumeric strings with 20
bits of entropy and two check digits that are guaranteed to detect single
errors. It is also available as a `standalone package
<https://pypi.org/project/castellum-pseudonyms/>`_.
A subject can have many different pseudonyms in different domains. Castellum
automatically creates a new domain for each study. There can be more than one
domain per study as well as *general domains* that are not connected to studies
at all.
.. warning::
Pseudonyms are only unique (and therefore useful) within their domain.
Whenever you use a pseudonym, make sure that it is clear which domain it
belongs to. If in doubt, store the domain along with the pseudonym.
Pseudonyms are only unique (and therefore useful) in the context of a domain.
Whenever you use a pseudonym, make sure that it is clear which domain it
belongs to. If in doubt, store the domain along with the pseudonym.
It is up to you to decide on the granularity of domains. For example, you could
use a single domain for all bio samples, or you could use separate domains for
blood, saliva, stool, and so on.
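The advice to store the domain along with the pseudonym can be illustrated
with a minimal sketch. The class name ``PseudonymRef``, the domain names, and
the pseudonym value are hypothetical and only serve the example; they are not
part of Castellum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudonymRef:
    """A pseudonym together with the domain it belongs to (hypothetical helper)."""

    domain: str
    pseudonym: str


# The same pseudonym string in two different domains identifies
# two unrelated entries, so the pair is used as the lookup key.
blood = PseudonymRef(domain="blood-samples", pseudonym="ABC123")
saliva = PseudonymRef(domain="saliva-samples", pseudonym="ABC123")

samples = {blood: "sample 17", saliva: "sample 42"}
```

Keying records by the pair rather than by the bare pseudonym avoids mixing up
entries if two domains happen to contain the same string.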
Using study pseudonyms
~~~~~~~~~~~~~~~~~~~~~~
----------------------
Whenever you collect data in the context of a study, it should be stored with a
study pseudonym. Pseudonyms can also be printed on questionnaires or passed to
......@@ -80,7 +42,7 @@ Relevant guides:
.. _general-domains:
Using pseudonyms from general domains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------------------------
Central repositories (e.g. for bio samples or IQ scores) often store data that
is not related to a specific study. In these cases, you can use pseudonyms from
......@@ -97,3 +59,22 @@ Relevant guides:
- :ref:`study-domains`
- :ref:`subject-get-pseudonym`
- :ref:`subject-delete`
How pseudonyms are generated
----------------------------
Castellum generates random pseudonyms and stores them in a database.
An alternative approach for generating pseudonyms would be to calculate an
encrypted hash over immutable, subject-related information (e.g. name, date of
birth). That approach would have the benefit of not relying on a central
infrastructure to store the pseudonyms. However, in cases where such a central
infrastructure with strict access control is feasible, Castellum's approach is
much simpler. For more information on these two approaches, see `Anforderungen
an den datenschutzkonformen Einsatz von Pseudonymisierungslösungen (German)
<https://www.de.digital/DIGITAL/Redaktion/DE/Digital-Gipfel/Download/2018/p9-datenschutzkonformer-einsatz-von-pseudonymisierungsloesungen.pdf>`_.
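The alternative approach described above could look roughly like the following
sketch. It is not what Castellum does. It assumes that the "encrypted hash" is
realized as a keyed hash (HMAC-SHA256 from the Python standard library); the
function name ``derive_pseudonym``, the normalization rules, and the output
length are assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata


def derive_pseudonym(secret_key: bytes, name: str, date_of_birth: str, length: int = 16) -> str:
    """Derive a pseudonym from immutable subject data using a keyed hash.

    Hypothetical sketch: the same inputs and key always give the same
    pseudonym, so no central store is needed, but the key must stay secret.
    """
    # Normalize the input so that trivial spelling differences
    # (case, surrounding whitespace, Unicode forms) do not change the result.
    normalized = unicodedata.normalize("NFKC", f"{name.strip().lower()}|{date_of_birth}")
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

Anyone holding the key can recompute a pseudonym from the subject's data, which
is why this approach shifts the protection effort from the database to the key.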
The algorithm that is used to generate pseudonyms can be configured. The
default algorithm produces alphanumeric strings with 20 bits of entropy and two
check digits that are guaranteed to detect single errors. It is also available
as a `standalone package <https://pypi.org/project/castellum-pseudonyms/>`_.
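To make the idea of random pseudonyms with check digits more concrete, here is
a minimal sketch. It is not the algorithm used by Castellum or by the
standalone package. The alphabet, the weighted checksum modulo 37, and the
function names are assumptions chosen so that the example has 20 bits of
entropy and two check symbols that detect any single-character substitution.

```python
import secrets

# 32 symbols, so each symbol carries 5 bits (assumed alphabet without I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
PAYLOAD_LENGTH = 4  # 4 symbols * 5 bits = 20 bits of entropy
MODULUS = 37  # prime larger than the alphabet size


def _checksum(payload: str) -> str:
    """Return two check symbols for the payload."""
    # Weighted sum modulo a prime: changing one symbol shifts the sum by
    # weight * difference, which is never a multiple of 37.
    total = sum((i + 1) * ALPHABET.index(c) for i, c in enumerate(payload)) % MODULUS
    # The value 0..36 does not fit into one symbol, so it is written as two.
    return ALPHABET[total // 32] + ALPHABET[total % 32]


def generate() -> str:
    """Generate a random pseudonym followed by its two check symbols."""
    payload = "".join(secrets.choice(ALPHABET) for _ in range(PAYLOAD_LENGTH))
    return payload + _checksum(payload)


def is_valid(pseudonym: str) -> bool:
    """Check length, alphabet, and check symbols."""
    if len(pseudonym) != PAYLOAD_LENGTH + 2:
        return False
    if any(c not in ALPHABET for c in pseudonym):
        return False
    return _checksum(pseudonym[:PAYLOAD_LENGTH]) == pseudonym[PAYLOAD_LENGTH:]
```

A typo in a single character then makes ``is_valid`` return ``False``, so a
mistyped pseudonym is rejected instead of silently pointing at another subject.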