Metadata of datasets

Related Issues: #18

Document Status: In Progress

Section authors:

Philipp S. Sommer, Marie Ryan, Linda Baldewein, Hatef Takyar, Andrea Pörsch, Beate Geyer, Lars Buntemeyer, Emanuel Söding, Nikolaus Groll, Ludwid Lierhammer, Klaus Getzlaff, Tilman Dinter

The metadata of the datasets is a central aspect of the model data explorer. We do not want to mimic a metadata portal such as GeoNetwork; nevertheless, we need information to

  1. let the user know what he or she is looking at

  2. link datasets to groups

  3. link datasets to people

Each author and group has a dedicated site where all the related datasets are listed. So we need to find a way to uniquely identify authors and associate the datasets with them.

We will implement a manual way where you can select the authors and edit the metadata through a web interface, but it should also be possible to interpret metadata standards automatically.

In the following sections, we will describe what metadata we implement in the model data explorer, and how.

Note

This document only covers global metadata of a dataset. Variable related metadata (units, standard name, etc.) shall be handled in a different document.

Todo

Document variable related metadata

Required metadata

Datasets in the model data explorer must define the following metadata attributes:

title

A one-line description of the dataset

Optional metadata

Optional but recommended metadata attributes are:

contacts

A list of authors that have some role related to the dataset. They participated in the generation, are responsible for providing the data, etc.

institutions

The institutions that are responsible for the dataset

projects

Related projects that provided funding for the generation of the dataset

bbox

The bounding box of the geographic region of the data

abstract

A short description of the dataset

data_relations

DataCite relation types (see Relations between Users, Groups and datasets and https://support.datacite.org/docs/relationtype_for_citation)

temporal_extent

The temporal window that is covered by the dataset

Todo

document geographic and temporal resolution as well?

Todo

Add descriptive spatial extent (such as global, continental, etc.)

Todo

add creation, publication and revision date

Interpretation of standards

The items mentioned in the previous sections are encoded in the metadata standards that we support, namely the netCDF header and the INSPIRE ISO standard. Our aim is to develop readers for each standard that transform the corresponding conventions into the metadata scheme of the model data explorer (see next section, Implementation details).

The exact database structure that allows this interpretation is however part of a different user story, namely #20.

CF-Conventions

For netCDF headers (and NcML, a special markup language used by THREDDS), we want to develop guidelines based on the Binding Regulations for Storing Data as netCDF Files. For this purpose, we will transform the guidelines into a web-based format and enhance them with templates to make them easier to apply.

The guidelines are based on the CF-Conventions and extended by further attributes that are mainly motivated by the Conversion methodology to INSPIRE developed at the Geomar.

Todo

The UnidataDD2MI.xsl methodology needs to be elaborated further.

INSPIRE ISO

ISO-conformant XML files will be read using the owslib Python library. We will base the format on the UnidataDD2MI.xsl file that has been developed by Franziska Weng (Geomar) and Andrea Pörsch (GFZ) and is currently still work in progress.
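The following is a minimal sketch of what such a reader might extract from an ISO record. To stay dependency-free, it uses Python's standard xml.etree.ElementTree module instead of owslib; the function name read_iso_metadata and the shortened sample record are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by ISO 19139 metadata records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily shortened ISO record for illustration only.
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example dataset</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>An example abstract.</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def read_iso_metadata(xml_text):
    """Return title and abstract of an ISO record (None if missing)."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        ".//gmd:CI_Citation/gmd:title/gco:CharacterString", namespaces=NS
    )
    abstract = root.findtext(
        ".//gmd:abstract/gco:CharacterString", namespaces=NS
    )
    return {"title": title, "abstract": abstract}


metadata = read_iso_metadata(SAMPLE)
```

A real reader would additionally have to cope with multiple identification blocks and with records that omit optional elements.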

Implementation

Note

Illustration of object attributes vs. object relations

Object attribute vs. object relations

from django.db import models


class Author(models.Model):

    name = models.CharField(max_length=100)
    email = models.EmailField(max_length=100)

class Dataset(models.Model):

    title = models.CharField(max_length=50)
    abstract = models.CharField(max_length=50, null=True, blank=True)
    contacts = models.ManyToManyField(Author)

The metadata items described above are represented in the model data explorer as properties of Django objects, which in turn translate into connections and attributes in a relational database. For this document, we will keep it simple and distinguish two metadata types: attributes and relations.

Attributes are simple properties of a dataset, such as its title. Relations describe how the dataset is connected to other items in the database. For instance, a dataset won’t have an authors string property; instead, it will define a connection to author objects, where each author holds attributes of its own, such as a name and an email address.

An example is shown in the illustration above, Object attribute vs. object relations.

Attributes

A Dataset defines a handful of simple attributes: title, abstract, bounding box (bbox) and the temporal extent (start and end, with the plain-text fallbacks start_s and end_s); see the graph about Attributes of a dataset.

digraph model_graph {
  // Dotfile by Django-Extensions graph_models
  // Created: 2023-02-14 14:40
  // Cli Options: --output /home/docs/checkouts/readthedocs.org/user_builds/mde-prototype/checkouts/develop/source/tmp_graph.dot templateapp

  fontname = "Roboto"
  fontsize = 8
  splines  = true
  rankdir = "TB"

  node [
    fontname = "Roboto"
    fontsize = 8
    shape = "plaintext"
  ]

  edge [
    fontname = "Roboto"
    fontsize = 8
  ]

  // Labels


  templateapp_models_Dataset [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    Dataset
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">abstract</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">TextField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">bbox</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">JSONField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">end</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">DateTimeField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">end_s</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">start</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">DateTimeField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT COLOR="#7B7B7B" FACE="Roboto">start_s</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT COLOR="#7B7B7B" FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">title</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]




  // Relations


}

Attributes of a dataset

class Dataset(models.Model):

    title = models.CharField(max_length=50)
    abstract = models.TextField(max_length=10000, null=True, blank=True)
    bbox = models.JSONField(null=True, blank=True)
    start = models.DateTimeField(null=True, blank=True)
    end = models.DateTimeField(null=True, blank=True)

    start_s = models.CharField(max_length=50, null=True, blank=True)
    end_s = models.CharField(max_length=50, null=True, blank=True)

Title

The title is a short, human-readable string that describes the purpose of the dataset in one sentence.

Interpretation of the title

For netCDF files, the CF-Conventions define a global title attribute that will be used.

For INSPIRE ISO records, we are using the <gmd:title> tag of the CI_Citation element.

Abstract

The abstract is a longer human-readable description of the dataset that describes the content, purpose and methodology in a bit more detail.

Interpretation of the abstract

For netCDF files, we will look for a global summary or abstract attribute.

For INSPIRE ISO records, we are using the <gmd:abstract> tag.
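The netCDF lookup for title and abstract could be sketched as follows. The function read_title_and_abstract is hypothetical and works on a plain mapping of global attributes; how that mapping is obtained from the file is left out. Preferring summary over abstract when both are present is an assumption, not a decision taken in this document.

```python
def read_title_and_abstract(global_attrs):
    """Pick title and abstract from a mapping of netCDF global attributes."""
    title = global_attrs.get("title")
    # Assumption: prefer ``summary`` and fall back to ``abstract``.
    abstract = global_attrs.get("summary") or global_attrs.get("abstract")
    return {"title": title, "abstract": abstract}


# Hypothetical global attributes of a netCDF file.
example = read_title_and_abstract(
    {"title": "Example model run", "abstract": "A longer description."}
)
```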

Bounding box

The bbox is a JSONField (or optionally we can also make it a georeferenced polygon) that defines the region where this dataset can be applied.

Interpretation of the bounding box

For netCDF files, we will look for the global geospatial_lon_min, geospatial_lat_min, geospatial_lon_max and geospatial_lat_max attributes, as well as a Bbox attribute.

For INSPIRE ISO records, we are using the EX_GeographicBoundingBox element in the <gmd:geographicElement> tag, namely westBoundLongitude, eastBoundLongitude, southBoundLatitude and northBoundLatitude.
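As a rough sketch, the four geospatial_* attributes of a netCDF file could be turned into a JSON-serializable value for the bbox field like this. The function name read_bbox and the west/south/east/north layout of the result are assumptions; the document does not fix the JSON structure, and the Bbox attribute is not handled here.

```python
BBOX_ATTRIBUTES = (
    "geospatial_lon_min",
    "geospatial_lat_min",
    "geospatial_lon_max",
    "geospatial_lat_max",
)


def read_bbox(global_attrs):
    """Build a bounding box dict, or return None if it cannot be derived."""
    try:
        west, south, east, north = (
            float(global_attrs[key]) for key in BBOX_ATTRIBUTES
        )
    except (KeyError, TypeError, ValueError):
        # Attribute missing or not numeric: no bounding box available.
        return None
    # Assumed JSON layout for the ``bbox`` field.
    return {"west": west, "south": south, "east": east, "north": north}
```

Returning None for incomplete input matches the fact that bbox is an optional attribute of the dataset.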

Temporal extent

The temporal extent is stored in two DateTimeField attributes, start and end, that define a time window. We expect two ISO-formatted timestamps here, one for the start and one for the end of the coverage.

This might not always be possible, as Python's datetime type does not support paleo dates (its earliest representable year is 1). We will therefore also add the attributes start_s and end_s, which accept plain text.

Interpretation of the temporal extent

For netCDF files, we will look for the global StartTime, StopTime, time_coverage_start and time_coverage_end attributes.

For INSPIRE ISO records, we are using the EX_TemporalExtent element in the <gmd:temporalElement> tag, namely beginPosition and endPosition.
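The fallback from timestamps to plain text could look roughly like the following sketch. The helper parse_time_bound is hypothetical: it tries to parse one bound of the temporal extent as an ISO timestamp (for start or end) and otherwise hands the raw text back (for start_s or end_s).

```python
from datetime import datetime


def parse_time_bound(value):
    """Return ``(timestamp, text)`` for one bound of the temporal extent.

    Exactly one of the two entries is set: the parsed timestamp if the
    value is a valid ISO timestamp, otherwise the unchanged text.
    """
    try:
        return datetime.fromisoformat(value), None
    except ValueError:
        # For example paleo dates, which datetime cannot represent.
        return None, value


start, start_s = parse_time_bound("2000-01-01T00:00:00")
paleo, paleo_s = parse_time_bound("-21000-01-01")
```

Keeping the original text for unparsable values means that no information from the source file is lost, even if the dataset cannot be filtered by date.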

Authors and Contact Persons

Todo

Add OrcID

Authors can have a dedicated role when they are related to a dataset. These roles describe how the authors have been involved in the generation of the dataset (motivated by the available roles for the CI_RoleCode tag in INSPIRE, see the Roles tab).

In the metadata display on the frontend, we will then group the contributors by their role, such that it is clearly visible who is the responsible contact person.

digraph model_graph {
  // Dotfile by Django-Extensions graph_models
  // Created: 2023-02-14 14:40
  // Cli Options: --output /home/docs/checkouts/readthedocs.org/user_builds/mde-prototype/checkouts/develop/source/tmp_graph.dot templateapp

  fontname = "Roboto"
  fontsize = 8
  splines  = true
  rankdir = "TB"

  node [
    fontname = "Roboto"
    fontsize = 8
    shape = "plaintext"
  ]

  edge [
    fontname = "Roboto"
    fontsize = 8
  ]

  // Labels


  templateapp_models_Author [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    Author
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">email</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">EmailField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">name</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_DatasetContact [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    DatasetContact
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>author</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>dataset</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">role</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_Dataset [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    Dataset
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">title</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]




  // Relations

  templateapp_models_DatasetContact -> templateapp_models_Author
  [label=" author (datasetcontact)"] [arrowhead=none, arrowtail=dot, dir=both];

  templateapp_models_DatasetContact -> templateapp_models_Dataset
  [label=" dataset (datasetcontact)"] [arrowhead=none, arrowtail=dot, dir=both];


}

class Author(models.Model):

    name = models.CharField(max_length=100)
    email = models.EmailField(max_length=100)

class DatasetContact(models.Model):

    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    dataset = models.ForeignKey("Dataset", on_delete=models.CASCADE)
    role = models.CharField(max_length=30)

class Dataset(models.Model):

    title = models.CharField(max_length=50)
    contacts = models.ManyToManyField(Author, through=DatasetContact)

Roles are taken from the INSPIRE ISO standard, https://inspire.ec.europa.eu/metadata-codelist/ResponsiblePartyRole. The role of an Author in a Dataset can be one of the following:

Code

Description

resourceProvider

Party that supplies the resource.

custodian

Party that accepts accountability and responsibility for the data and ensures appropriate care and maintenance of the resource.

owner

Party that owns the resource.

user

Party who uses the resource.

distributor

Party who distributes the resource.

originator

Party who created the resource.

pointOfContact

Party who can be contacted for acquiring knowledge about or acquisition of the resource.

principalInvestigator

Key party responsible for gathering information and conducting research.

processor

Party who has processed the data in a manner such that the resource has been modified.

publisher

Party who published the resource.

author

Party who authored the resource.
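Since role is stored as a free CharField, it may be useful to check values against the codes listed above. The following is a small sketch under that assumption; the names ROLES and validate_role are hypothetical and not part of the model data explorer.

```python
# Role codes from the INSPIRE ResponsiblePartyRole code list above.
ROLES = (
    "resourceProvider",
    "custodian",
    "owner",
    "user",
    "distributor",
    "originator",
    "pointOfContact",
    "principalInvestigator",
    "processor",
    "publisher",
    "author",
)


def validate_role(role):
    """Return the role unchanged, or raise ValueError for unknown codes."""
    if role not in ROLES:
        raise ValueError(f"Unknown role {role!r}; expected one of {ROLES}")
    return role
```

In the Django model, such a tuple could also feed the choices option of the role field, so that the check happens during form and model validation.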

Interpretation of Authors and Contact Persons

Although the CF-Conventions define an originator attribute, the information it carries is rather limited. We therefore aim to follow the suggestions of the UnidataDD2MI.xsl file of Franziska Weng (see INSPIRE ISO) and introduce further netCDF attributes such as creator_email, originator_email, contact_email, pi_email, contributor_role, etc.

For INSPIRE ISO records, the implementation is straightforward: the information will be taken from the CI_ResponsibleParty tags.
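To illustrate the netCDF side, the sketch below collects contacts from the *_email attributes mentioned above and assigns each a role code. The mapping from attribute prefix to role (PREFIX_TO_ROLE) is purely an assumption made for this example; the actual mapping still has to be defined in the netCDF guidelines.

```python
# Hypothetical mapping from netCDF attribute prefixes to role codes.
PREFIX_TO_ROLE = {
    "creator": "author",
    "originator": "originator",
    "contact": "pointOfContact",
    "pi": "principalInvestigator",
}


def read_contacts(global_attrs):
    """Collect ``{"email", "role"}`` entries from netCDF global attributes."""
    contacts = []
    for prefix, role in PREFIX_TO_ROLE.items():
        email = global_attrs.get(f"{prefix}_email")
        if email:
            contacts.append({"email": email, "role": role})
    return contacts


# Hypothetical global attributes of a netCDF file.
contacts = read_contacts(
    {"title": "Example", "pi_email": "pi@example.org"}
)
```

Each entry could then be matched against existing Author objects by email address before a DatasetContact is created.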

Projects

Projects are data groups within the Model Data Explorer Framework (see Data Group). As such, a project is also a relation between two objects in the database (see the Graph tab).

This relation can also be equipped with permissions, namely can_edit, can_view and can_list (see Datasets and data groups). These permissions need to be approved by both sides: the data group (project) owner and the dataset.
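The two-sided approval can be illustrated with a plain Python stand-in for the RelationPermission model shown further below. This is only a sketch: the is_active property is hypothetical and simply expresses that a permission takes effect once both sides have approved it.

```python
from dataclasses import dataclass


@dataclass
class RelationPermission:
    """Plain-Python stand-in for the Django model of the same name."""

    name: str
    left_approved: bool = False
    right_approved: bool = False

    @property
    def is_active(self):
        # Hypothetical helper: both sides must have approved.
        return self.left_approved and self.right_approved


permission = RelationPermission("can_view", left_approved=True)
pending = permission.is_active  # still waiting for the other side
permission.right_approved = True
granted = permission.is_active
```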

A relation can also be made visible or invisible, which will determine whether the group is listed explicitly on the detail page of the dataset or not.

Note

A dataset can also be related to other types of data groups, such as institutions, using the same methodology. We will distinguish projects from platforms, etc. based on the DS_InitiativeTypeCode identifier (see Data Group and #20).

digraph model_graph {
  // Dotfile by Django-Extensions graph_models
  // Created: 2023-02-14 14:40
  // Cli Options: --output /home/docs/checkouts/readthedocs.org/user_builds/mde-prototype/checkouts/develop/source/tmp_graph.dot templateapp

  fontname = "Roboto"
  fontsize = 8
  splines  = true
  rankdir = "TB"

  node [
    fontname = "Roboto"
    fontsize = 8
    shape = "plaintext"
  ]

  edge [
    fontname = "Roboto"
    fontsize = 8
  ]

  // Labels


  templateapp_models_DataGroup [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    DataGroup
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">name</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_RelationPermission [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    RelationPermission
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">left_approved</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">BooleanField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">name</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">right_approved</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">BooleanField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_DatasetDataGroupRelation [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    DatasetDataGroupRelation
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>data_group</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>dataset</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">visible</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">BooleanField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_Dataset [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    Dataset
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">title</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]




  // Relations

  templateapp_models_DatasetDataGroupRelation -> templateapp_models_DataGroup
  [label=" data_group (datasetdatagrouprelation)"] [arrowhead=none, arrowtail=dot, dir=both];

  templateapp_models_DatasetDataGroupRelation -> templateapp_models_Dataset
  [label=" dataset (datasetdatagrouprelation)"] [arrowhead=none, arrowtail=dot, dir=both];

  templateapp_models_DatasetDataGroupRelation -> templateapp_models_RelationPermission
  [label=" permissions (datasetdatagrouprelation)"] [arrowhead=dot arrowtail=dot, dir=both];


}

Representation of the Dataset - Project relation.

class DataGroup(models.Model):
    name = models.CharField(max_length=100)


class RelationPermission(models.Model):

    name = models.CharField(max_length=20)
    left_approved = models.BooleanField(default=False)
    right_approved = models.BooleanField(default=False)


class DatasetDataGroupRelation(models.Model):

    data_group = models.ForeignKey(DataGroup, on_delete=models.CASCADE)
    dataset = models.ForeignKey("Dataset", on_delete=models.CASCADE)
    permissions = models.ManyToManyField(RelationPermission)
    visible = models.BooleanField(default=True)


class Dataset(models.Model):

    title = models.CharField(max_length=50)
    data_groups = models.ManyToManyField(
        DataGroup, through=DatasetDataGroupRelation
    )

Interpretation of Projects

netCDF files can define a project, program, projects or project_name attribute. We will search for matching names among the data groups whose DS_InitiativeTypeCode is project and suggest them to the data submitter. This will also be documented in the netCDF guidelines (see CF-Conventions).

For INSPIRE ISO records, we will look for MD_AggregateInformation entries that define a DS_AssociationTypeCode of largerWorkCitation and match the MD_Identifier against the available data groups.
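For the netCDF case, the name matching might be sketched as follows. The function suggest_projects and the case-insensitive substring comparison are assumptions; the actual matching strategy is not defined in this document.

```python
# netCDF global attributes that may name a project (see above).
PROJECT_ATTRIBUTES = ("project", "program", "projects", "project_name")


def suggest_projects(global_attrs, project_names):
    """Return the known project names mentioned in the netCDF attributes."""
    mentioned = " ".join(
        str(global_attrs.get(key, "")) for key in PROJECT_ATTRIBUTES
    ).casefold()
    # Assumption: a case-insensitive substring match is good enough
    # for making suggestions to the data submitter.
    return [name for name in project_names if name.casefold() in mentioned]


# Hypothetical attribute value and data group names.
suggestions = suggest_projects(
    {"projects": "Alpha; Gamma"}, ["alpha", "Beta"]
)
```

Because the result is only a suggestion, false positives from the substring match can still be rejected by the data submitter.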

Institutions

Todo

Add ROR ID

Internally, institutions are handled the same way as projects, as both are represented as data groups in the model data explorer. Only the interpretation of the metadata standards differs.

Interpretation of Institutions

netCDF files can define an institution or creator_institution attribute, together with a corresponding institution_references attribute. These will be matched against the names of institutions available in the database to make suggestions to the data submitter.

For INSPIRE ISO records, institutions will be identified from the organisationName in a CI_ResponsibleParty (see Authors and Contact Persons above).

Other relations

Other relations are references to internal or external resources, such as related studies or datasets. They are commonly described by DataCite related identifiers, see https://support.datacite.org/docs/relationtype_for_citation.

However, neither the CF-Conventions nor INSPIRE define such a relation type. Both do allow adding supplementary studies (see below), and we will store this information as a DatasetReference object (see the Graph tab).

If the URI corresponds to a handle in the model data explorer, however, we can also directly transform it into a relation between datasets (see Graph tab) and suggest that Dataset A is a supplement to Dataset B (see Datasets).

digraph model_graph {
  // Dotfile by Django-Extensions graph_models
  // Created: 2023-02-14 14:40
  // Cli Options: --output /home/docs/checkouts/readthedocs.org/user_builds/mde-prototype/checkouts/develop/source/tmp_graph.dot templateapp

  fontname = "Roboto"
  fontsize = 8
  splines  = true
  rankdir = "TB"

  node [
    fontname = "Roboto"
    fontsize = 8
    shape = "plaintext"
  ]

  edge [
    fontname = "Roboto"
    fontsize = 8
  ]

  // Labels


  templateapp_models_Dataset [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    Dataset
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">title</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_DatasetReference [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    DatasetReference
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>dataset</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">description</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">uri</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">URLField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]

  templateapp_models_DatasetRelation [label=<
    <TABLE BGCOLOR="white" BORDER="1" CELLBORDER="0" CELLSPACING="0">
    <TR><TD COLSPAN="2" CELLPADDING="5" ALIGN="CENTER" BGCOLOR="#1b563f">
    <FONT FACE="Roboto" COLOR="white" POINT-SIZE="10"><B>
    DatasetRelation
    </B></FONT></TD></TR>
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>id</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>BigAutoField</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>left</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto"><B>right</B></FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto"><B>ForeignKey (id)</B></FONT>
    </TD></TR>
  
  
  
    <TR><TD ALIGN="LEFT" BORDER="0">
    <FONT FACE="Roboto">relation_type</FONT>
    </TD><TD ALIGN="LEFT">
    <FONT FACE="Roboto">CharField</FONT>
    </TD></TR>
  
  
    </TABLE>
    >]




  // Relations

  templateapp_models_DatasetReference -> templateapp_models_Dataset
  [label=" dataset (datasetreference)"] [arrowhead=none, arrowtail=dot, dir=both];

  templateapp_models_DatasetRelation -> templateapp_models_Dataset
  [label=" left (left_relation)"] [arrowhead=none, arrowtail=dot, dir=both];

  templateapp_models_DatasetRelation -> templateapp_models_Dataset
  [label=" right (right_relation)"] [arrowhead=none, arrowtail=dot, dir=both];


}

class Dataset(models.Model):

    title = models.CharField(max_length=50)

class DatasetReference(models.Model):

    dataset = models.ForeignKey(Dataset, on_delete=models.CASCADE)
    description = models.CharField(max_length=400)
    uri = models.URLField(max_length=300)

class DatasetRelation(models.Model):

    left = models.ForeignKey(Dataset, on_delete=models.CASCADE, related_name="left_relation")
    right = models.ForeignKey(Dataset, on_delete=models.CASCADE, related_name="right_relation")
    relation_type = models.CharField(max_length=30)

Interpretation of relations

netCDF files can define global references and doi attributes. We will check here for common DOI patterns and use this to extract the uri for the DatasetReference (see the Graph tab above).

INSPIRE encodes references as MD_AggregateInformation with a specific DS_AssociationTypeCode, namely crossReference, so we will use the MD_Identifier of these tags. If the MD_Identifier is listed as gmd:code, we will assume it is a DOI and transform it to the corresponding URL; otherwise, we take it as the description of the DatasetReference and try to find a URL in it.
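The DOI detection mentioned for the netCDF references and doi attributes could be sketched like this. The regular expression is a commonly used approximation of DOI syntax rather than a complete validator, and the names DOI_PATTERN and find_doi_urls are hypothetical.

```python
import re

# Approximate DOI syntax: "10.", a registrant code, "/" and a suffix.
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')


def find_doi_urls(text):
    """Return resolver URLs for all DOI-like strings found in the text."""
    urls = []
    for doi in DOI_PATTERN.findall(text):
        # Drop punctuation that most likely belongs to the sentence.
        doi = doi.rstrip(".,;)")
        urls.append(f"https://doi.org/{doi}")
    return urls


# Hypothetical content of a netCDF ``references`` attribute.
urls = find_doi_urls(
    "See doi:10.1234/abc.def, and https://doi.org/10.5555/xyz."
)
```

The resulting URLs could then be stored in the uri field of a DatasetReference, with the original text kept as its description.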