
add castellum_import script

Bengfort requested to merge castellum-import into main

This adds a script to import studies from castellum. I am not 100% convinced this is the right approach though.

With this we have 3 import scripts:

  • ldap_sync_all_users to import all users from LDAP (~750)
  • arc_import to import studies from the old ARC study registration (~220)
  • castellum_import to import studies from castellum (~420)

ARC study registration overlaps much more with this project in terms of features, so there is much more we can work with (e.g. budgets, deployments, storage location). Castellum overlaps much less, so each imported record is less useful. On the other hand, Castellum has far more studies.

  • I am pretty sure some studies exist in both ARC and Castellum, but with different names. During my tests, only a single study from Castellum was skipped because the name already existed.
  • I used some heuristics to find a contact person for each study. But in 110 cases I had to fall back to info@mpib-berlin.mpg.de. I believe most of these cases occur because the contact person for that study is no longer at the institute. So these might be cases we have to take a closer look at anyway.
  • Castellum does not store information about when a study was created. I used the optional field Studies.sessions_start instead. ~200 studies did not have a value for that field, so I used 1970-01-01. The creation date is relevant because it is used to sort the study list. It is also relevant because we cannot easily change the value later.
  • Castellum does not have a concept of departments, so I used the default department of the contact user. However, that value is not very reliable either. Users can be in multiple department groups in LDAP, and so far I have not found a good heuristic for deciding which of them is the most relevant.
  • updates
    • ldap_sync_all_users can be run again. It will add new accounts and disable (but not delete) old ones.
    • arc_import and castellum_import can be run again. They will skip importing a study if a study of the same name already exists. So if we rename or delete a study manually and then run the import script again, it will be created a second time. This limits our ability to do manual adjustments before the migration is complete.
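
To make the re-run behaviour concrete, here is a minimal sketch of a name-based skip with the fallback values described above. It is not the code in this merge request: `SourceStudy`, `import_studies`, the field names and the plain dict standing in for the study table are all hypothetical. Only the fallback address and the 1970-01-01 date come from the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Fallback values taken from the description above.
FALLBACK_CONTACT = "info@mpib-berlin.mpg.de"
FALLBACK_CREATED = date(1970, 1, 1)


@dataclass
class SourceStudy:
    # Hypothetical shape of a record read from Castellum.
    name: str
    sessions_start: Optional[date] = None
    contact_email: Optional[str] = None


def import_studies(source, registry):
    """Import studies into `registry` (a dict keyed by study name).

    A study is skipped if its name is already present. Returns the
    names that were created and the names that were skipped.
    """
    created, skipped = [], []
    for study in source:
        if study.name in registry:
            # The name is the only identity check, so a renamed or
            # deleted study no longer matches and is imported again.
            skipped.append(study.name)
            continue
        registry[study.name] = {
            "created_at": study.sessions_start or FALLBACK_CREATED,
            "contact": study.contact_email or FALLBACK_CONTACT,
        }
        created.append(study.name)
    return created, skipped


if __name__ == "__main__":
    registry = {}
    source = [
        SourceStudy("Memory Study", date(2019, 5, 1), "someone@example.org"),
        SourceStudy("Sleep Study"),  # no start date, no contact
    ]
    print(import_studies(source, registry))  # first run creates both
    print(import_studies(source, registry))  # second run skips both

    # A manual rename breaks the match, so the next run re-creates it.
    registry["Sleep Study (renamed)"] = registry.pop("Sleep Study")
    print(import_studies(source, registry))
```

One way around the duplicate problem would be to store the source system's ID on each imported study and match on that instead of the name, but that is an assumption about what Castellum and ARC expose, not something this merge request does.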
Edited by Bengfort
