Commit 37cd044f authored by Bengfort

Merge branch 'general-pseudonyms' into 'master'

Document general pseudonyms

See merge request !78
parents 71e58099 8429e96d
Pipeline #10771 passed with stages
in 35 seconds
......@@ -13,10 +13,12 @@ Relevant guides:
- :ref:`subject-search`
- :ref:`subject-create`
- :ref:`subject-edit`
- :ref:`subject-by-pseudonym`
- :ref:`subject-get-pseudonym`
- :ref:`subject-to-be-deleted`
- :ref:`subject-delete`
- :ref:`study-create`
- :ref:`study-domains`
- :ref:`study-delete`
- :ref:`data-protection-dashboard`
- :ref:`subject-export`
......
......@@ -13,8 +13,9 @@ Manage Users
5. Add the appropriate global :ref:`roles`
6. Add the appropriate :ref:`privacy-level`
7. Set an expiration date
8. Click on one of the saving options
7. Add the appropriate **general domains**
8. Set an expiration date
9. Click on one of the saving options
.. _admin-unlock:
......@@ -121,3 +122,16 @@ case there is a two step process:
up in the data protection dashboard.
The legal basis for each subject can be found in the subject detail view.
.. _admin-general-domains:
Manage general domains
----------------------
1. Click on **Admin** on the front page
2. Go to **Domains**
3. Click on **Add Domain** (oval with grey background)
4. Enter a name
5. Leave the ``object_id`` and ``content_type`` fields empty
6. Click on one of the saving options
.. _subject-by-pseudonym:
Find subject by study pseudonym
===============================
1. Click on **Studies** on the front page
2. In the list of studies, find the study and click **Execution**
3. Go to the **By pseudonym** tab
4. Enter the pseudonym. If there is more than one study domain, you also have
to select the correct domain.
.. _subject-get-pseudonym:
Get the pseudonyms of a subject
===============================
1. Click on **Studies** on the front page
2. In the list of studies, find the study and click **Execution**
3. In the list of participating subjects, click **Details**
4. In the subject overview, the pseudonym is listed among the subject's
contact data and operational hints
.. note::
The pseudonyms are only shown once you click a button. Each access to a
pseudonym is monitored to detect abuse.
......@@ -172,6 +172,18 @@ your study:
links can be inserted as standard text.
.. _study-domains:
Manage study pseudonym domains
------------------------------
In the **Pseudonym domains** tab you can add a new domain or change the name of
an existing domain.
If there are general domains you can also define which general domains need to
be accessed in the context of this study.
.. _study-members:
Manage study members
......
.. _subject-get-pseudonym:
Get the pseudonym of a subject
==============================
1. Click on **Studies** on the front page
2. In the list of studies where you are a member, click **Execution**
next to the name of the study and its contact person
3. In the list of participating subjects, click **Details**
4. In the subject overview, the pseudonym is listed among the subject's
contact data and operational hints
......@@ -267,7 +267,7 @@ sufficient legal basis to keep the data.
This is only available for users with permissions that are granted to staff
members who are data protection coordinators or the like.
If deletion was requested by a subject you should follow your institutes
If deletion was requested by a subject you should follow your institute's
rules on verifying the identity of the requester.
In order to delete the externally and internally stored data of a subject,
......@@ -281,10 +281,18 @@ please proceed as follows:
- Contact the responsible person for each study and ask them to delete
all collected data of the subject concerned. Identify the subject using
the study pseudonym that is displayed.
- Once the responsible contact person has conformed the deletion of all
- Once the responsible contact person has confirmed the deletion of all
data, delete the participation record using the **Delete** button.
3. Once all participation have been deleted you will see a message saying
3. If you see a message saying **This subject may still have data in general
domains.**, proceed as follows:
- Click on **Pseudonyms** next to **General pseudonym domains** to get a
list of pseudonyms.
- Contact the responsible person for each general domain and make sure
that all data is deleted.
4. Once all participations have been deleted you will see a message saying
**Are you sure you want to permanently delete this subject and all related
data?** You can now click **Confirm** and the subject will be deleted.
......
......@@ -12,6 +12,7 @@ Welcome to Castellum's documentation!
overview
features
roles
privacy
security
faqs
......@@ -22,7 +23,7 @@ Welcome to Castellum's documentation!
guides/two-factor-authentication
guides/subject-management
guides/study-management
guides/subject-get-pseudonym
guides/pseudonyms
guides/data-protection
guides/consent-management
......
Privacy
=======
At its core, Castellum is about splitting a subject's data into little pieces.
On the one hand this means that users can only access the pieces that are
necessary for them. On the other hand this means that Castellum contains the
necessary information to put all the pieces back together, e.g. so that the
data can be deleted on request.
Contact data
------------
Contact details are stored in Castellum itself. This means that anyone who
wants to get in contact with a subject needs to go through Castellum.
.. warning::
Traces of contact data can also exist in the systems that are used for
communication, e.g. email servers or payment providers.
Pseudonyms
----------
Scientific data should never be stored with a subject's name. Instead,
Castellum automatically generates and stores random pseudonyms that can be used
to link the data back to the subject.
.. note::
An alternative approach for generating pseudonyms would be to calculate an
encrypted hash over immutable, subject-related information (e.g. name, date
of birth).
That approach would have the benefit of not relying on a central
infrastructure to store the pseudonyms. However, in cases where such a
central infrastructure with strict access control is feasible, Castellum's
approach is much simpler.
For more information on these two approaches, see `Anforderungen an den
datenschutzkonformen Einsatz von Pseudonymisierungslösungen (German)
<https://www.de.digital/DIGITAL/Redaktion/DE/Digital-Gipfel/Download/2018/p9-datenschutzkonformer-einsatz-von-pseudonymisierungsloesungen.pdf>`_.
.. note::
The algorithm that is used to generate pseudonyms can be configured. The
algorithm that is used by default produces alphanumeric strings with 20
bits of entropy and two check digits that are guaranteed to detect single
errors. It is also available as a `standalone package
<https://pypi.org/project/castellum-pseudonyms/>`_.
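The idea of a random pseudonym with a check symbol can be illustrated with a
minimal sketch. This is **not** the algorithm of the ``castellum-pseudonyms``
package: the alphabet, the single check symbol (instead of two check digits),
and the function names ``generate`` and ``is_valid`` are assumptions made for
illustration only.

```python
import secrets

# Hypothetical alphabet: 32 symbols, so each symbol carries 5 bits.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
BITS_PER_SYMBOL = 5


def check_symbol(body: str) -> str:
    """Return a check symbol: the sum of all symbol values modulo 32."""
    total = sum(ALPHABET.index(c) for c in body)
    return ALPHABET[total % len(ALPHABET)]


def generate(bits: int = 20) -> str:
    """Generate a random body with `bits` of entropy plus one check symbol."""
    length = bits // BITS_PER_SYMBOL
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return body + check_symbol(body)


def is_valid(pseudonym: str) -> bool:
    """Check that the last symbol matches the checksum of the rest."""
    if len(pseudonym) < 2 or any(c not in ALPHABET for c in pseudonym):
        return False
    return check_symbol(pseudonym[:-1]) == pseudonym[-1]
```

In this sketch, replacing any single symbol changes the sum by a non-zero
amount modulo 32, so ``is_valid`` rejects every single-symbol substitution.
Transposed symbols are not detected by such a simple checksum.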
A subject can have many different pseudonyms in different domains. Castellum
automatically creates a new domain for each study. There can be more than one
domain per study as well as *general domains* that are not connected to studies
at all.
.. warning::
Pseudonyms are only unique (and therefore useful) within their domain.
Whenever you use a pseudonym, make sure that it is clear which domain it
belongs to. If in doubt, store the domain along with the pseudonym.
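One way to follow this advice in an external data store is to always key
records by the pair of domain and pseudonym. The following is a minimal sketch
with made-up domain names, pseudonyms, and records; ``QualifiedPseudonym`` and
``lookup`` are hypothetical names and not part of Castellum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedPseudonym:
    """A pseudonym together with the domain it belongs to."""
    domain: str
    pseudonym: str


# The same pseudonym string may occur in two domains and refer to
# two different subjects, so the domain is part of the key.
records = {
    QualifiedPseudonym("blood-samples", "A1B2C"): "sample 17",
    QualifiedPseudonym("saliva-samples", "A1B2C"): "sample 4",
}


def lookup(domain: str, pseudonym: str):
    """Return the record for this domain and pseudonym, or None."""
    return records.get(QualifiedPseudonym(domain, pseudonym))
```

Looking up ``"A1B2C"`` without a domain would be ambiguous here, which is why
the sketch does not offer such a function.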
It is up to you to decide on the granularity of your domains. For example, you
could use a single domain for all bio samples. Or you could use separate
domains for blood, saliva, stool, and so on.
Using study pseudonyms
~~~~~~~~~~~~~~~~~~~~~~
Whenever you collect data in the context of a study, it should be stored with a
study pseudonym. Pseudonyms can also be printed on questionnaires or passed to
external survey services.
Relevant guides:
- :ref:`study-domains`
- :ref:`subject-by-pseudonym`
- :ref:`subject-get-pseudonym`
.. todo::
- attribute export
Using pseudonyms from general domains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Central repositories (e.g. for bio samples or IQ scores) often store data that
is not related to a specific study. In these cases, you can use pseudonyms from
a *general domain*.
Because these pseudonyms are the same across all studies, access to them is
highly restricted. Both the user and the study need to be authorized for a
general domain before its pseudonyms show up in the list of pseudonyms. This
also means that, even though general
domains exist independently of studies, they can only be accessed through
studies.
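The rule that both the user and the study need to be authorized can be read as
a set intersection. The following sketch only illustrates that rule. The
function name and the domain names are hypothetical and do not reflect
Castellum's actual implementation.

```python
def visible_general_domains(user_domains: set, study_domains: set) -> set:
    """Return the general domains that both the user and the study may access."""
    return user_domains & study_domains


# Example: the user is authorized for two general domains,
# but the current study only for one of them.
user_domains = {"bio-samples", "iq-scores"}
study_domains = {"iq-scores"}
visible = visible_general_domains(user_domains, study_domains)
```

With these example values, only ``"iq-scores"`` is visible. A user without any
study that is authorized for a general domain sees none of its pseudonyms.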
Relevant guides:
- :ref:`admin-general-domains`
- :ref:`admin-users`
- :ref:`study-domains`
- :ref:`subject-get-pseudonym`
- :ref:`subject-delete`
Database split
--------------
In Castellum, contact data is handled in a database server which is separated
from everything else to provide an additional barrier.
This provides a clear structure for developers that should help avoid
critical data leaks. Even if an attacker is able to dump a whole table or even
a whole database, this structure still limits the impact.
However, it is important to understand that the barrier between recruitment and
contact data is not that high. Since Castellum has full access to both, an
attacker who compromises Castellum can also gain full access. Spreading the
system across several
databases on different servers or even in different organizations does not help
much if there is still a single point of entry.
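In a Django project, a split like this can be expressed with a database
router. The following is a minimal sketch, assuming that the contact models
live in an app called ``contacts`` and that a database alias of the same name
is configured. Both names are assumptions and may differ from Castellum's
actual setup.

```python
from types import SimpleNamespace


class ContactsRouter:
    """Route models of the (assumed) contacts app to a separate database."""

    contact_apps = {"contacts"}
    contact_db = "contacts"

    def _db_for(self, model):
        if model._meta.app_label in self.contact_apps:
            return self.contact_db
        return None  # no opinion, Django falls back to the default database

    def db_for_read(self, model, **hints):
        return self._db_for(model)

    def db_for_write(self, model, **hints):
        return self._db_for(model)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Contact tables exist only in the contacts database,
        # all other tables only outside of it.
        if app_label in self.contact_apps:
            return db == self.contact_db
        return db != self.contact_db


# Stand-ins for Django model classes, only used to exercise the router.
contact_model = SimpleNamespace(_meta=SimpleNamespace(app_label="contacts"))
other_model = SimpleNamespace(_meta=SimpleNamespace(app_label="recruitment"))
router = ContactsRouter()
```

Such a router is activated by listing it in Django's ``DATABASE_ROUTERS``
setting. As explained above, it only structures access inside the application
and does not protect against an attacker who controls the application itself.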
......@@ -41,6 +41,7 @@ Relevant guides:
- :ref:`study-members`
- :ref:`study-sessions`
- :ref:`study-recruitment-settings`
- :ref:`study-domains`
- :ref:`study-finish`
- :ref:`study-delete`
- :ref:`set-up-external-scheduler`
......
......@@ -81,60 +81,6 @@ user's privacy level is controlled via the special permissions
``privacy_level_1`` and ``privacy_level_2``. The three levels (0-2) accord to
the data security levels of the Max Planck Society.
Pseudonyms
----------
There are generally two approaches to generate pseudonyms:
- Calculate an encrypted hash over immutable, subject-related information
(e.g. name, date of birth)
- Generate a random pseudonym and store it in a mapping table
The former approach has the benefit of not relying on a central infrastructure.
However, in cases where such a central infrastructure with strict access
control is feasible, the latter approach is much simpler.
Castellum implements the latter approach.
For more information on these two approaches, see `Anforderungen an den
datenschutzkonformen Einsatz von Pseudonymisierungslösungen (german)
<https://www.de.digital/DIGITAL/Redaktion/DE/Digital-Gipfel/Download/2018/p9-datenschutzkonformer-einsatz-von-pseudonymisierungsloesungen.pdf>`_.
The algorithm that is used to generate pseudonyms can be configured. The
algorithm that is used by default produces alphanumeric strings with 20 bits of
entropy and two checkdigits that are guaranteed to detect single errors. It is
also available as a `standalone package
<https://pypi.org/project/castellum-pseudonyms/>`_.
Data separation
---------------
Implementation
~~~~~~~~~~~~~~
We chose to split the data into three different categories:
- Scientific data is handled outside of castellum. Castellum only
provides the pseudonyms that are used to map this data to subjects.
- Data relevant for recruitment is handled in castellum.
- Contact data is also handled in castellum, but in a separate database
to provide an additional barrier.
Security Considerations
~~~~~~~~~~~~~~~~~~~~~~~
The described architecture provides a clear structure for developers
that should help avoiding critical data leaks. Even if an attacker is
able to dump a whole table or even a whole database, this structure
still limits the impact.
However, it is important to understand that the barrier between
recruitment and contact data is not that high. Since castellum has full
access to both, an attacker can also gain full access. Spreading the
system across several databases on different servers or even in
different organizations does not help much if there is still a single
point of entry.
Monitoring
----------
......