
Draft: Expand Study API

Bengfort requested to merge metadata into main

replaces !74 (closed).

This is a collection of some ideas to expand the study API:

  • Update the JSON API so it contains all currently available data
  • Provide data in different formats, e.g. Dublin Core or MARC21
  • Provide data via an OAI-PMH API

The general goal is to improve interoperability with other tools, or at least to prepare for that. On a meta level the goal is also to learn more about available metadata schemas.
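
As a rough illustration of the Dublin Core idea, here is a minimal sketch that serializes a study as an `oai_dc` record using only the standard library. It assumes a study is available as a plain dict; the keys `title`, `description`, `contact`, and `keywords` and the function name `study_to_dublin_core` are hypothetical and not taken from the actual model. Only the two namespace URIs are fixed by the Dublin Core and OAI-PMH specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by Dublin Core and OAI-PMH.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use the conventional prefixes instead of auto-generated ns0, ns1.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def study_to_dublin_core(study: dict) -> str:
    """Serialize a study dict (hypothetical keys) as an oai_dc record."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name, value):
        # Skip empty values so the record contains no empty elements.
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = str(value)

    add("title", study.get("title"))
    add("description", study.get("description"))
    add("creator", study.get("contact"))
    # Dublin Core elements are repeatable: one dc:subject per keyword.
    for keyword in study.get("keywords", []):
        add("subject", keyword)

    return ET.tostring(root, encoding="unicode")


record = study_to_dublin_core({
    "title": "Example study",
    "contact": "Jane Doe",
    "keywords": ["memory", "aging"],
})
print(record)
```

The same record could later be wrapped in an OAI-PMH `GetRecord` response, since `oai_dc` is the one metadata format every OAI-PMH repository has to support.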

Open questions:

  • What is the primary object?
    • In many cases there is a 1-1 mapping between first author, study, dataset, storage location, and publication. Even where that is not the case, I think the study could be a useful proxy to access everything else. It is a rather fuzzy term though. Most metadata schemas I found are concerned with other aspects of the research lifecycle. We could go with datasets instead, but a dataset has no distinguishing title or description right now. We might want to restructure that to better fit existing metadata schemas.
  • Which metadata schemas do we want to support?
    • https://fairsharing.org/search lists 4079 metadata schemas and controlled vocabularies. That is a lot!
    • All of this is heavily based on XML and linked data, which have been out of style in web development for 20 years or so.
    • Popular schemas include:
      • Dublin Core: a small set of values that is suitable for a wide range of objects
      • schema.org: a large and generic schema mostly used for search engine optimization
      • MARC21: very old and well established in the library world
      • DataCite: for datasets
      • DOI metadata kernel: a small set of mostly technical information required for every DOI
      • ISA (investigation, study, assay): a schema that mentions studies, so it might be interesting
      • XNAT: a software platform for imaging that has published its data model
  • Apart from the schema, it might also be interesting to look into controlled vocabularies to restrict the available keywords (or at least nudge users to use controlled terms).
  • I added a MARC21 implementation, but I am not at all sure whether it is correct. I barely found any information on this and the format is not very nice to work with.
  • I am not sure if this is worth the effort at all. There is such a proliferation of metadata schemas that they are barely useful.
  • About the simple JSON export: Is there any data we want to keep secret?
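
One possible way to approach the last question is an explicit allowlist, so that newly added fields stay private until someone decides otherwise. A minimal sketch, assuming a study is available as a plain dict; all field names and the function name `export_study` are hypothetical and not taken from the actual model:

```python
import json

# Hypothetical field names; only these are ever exported.
PUBLIC_FIELDS = ("id", "title", "description", "keywords")


def export_study(study: dict) -> str:
    """Return a JSON document containing only allowlisted fields."""
    public = {key: study[key] for key in PUBLIC_FIELDS if key in study}
    return json.dumps(public, sort_keys=True)


print(export_study({
    "id": 1,
    "title": "Example study",
    "internal_notes": "not for publication",  # silently dropped
}))
```

The alternative, a denylist of secret fields, is shorter to write but leaks by default whenever a new field is added.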
