Resources:
The core of the Cross Domain Interoperability Framework (CDIF) is a set of implementation-independent content that must be specified in any CDIF-conformant metadata. This core set is supplemented by a more extensive set of metadata properties that are expected to apply to any information resource of interest, but are optional in the model. These optional properties might not be applicable in some situations or, more commonly, are unknown, not available, or not provided for some reason.
This recommendation is a synthesis of various metadata schemes, including ISO 19115-1:2014, schema.org conventions from ESIPFed Science on Schema.org and Ocean Data net, DCAT, DCAT-AP, and FDO Kernel Attributes-2.0. These core content requirements are scoped for a broad spectrum of resource types; other fields will be added in the CDIF extension profiles.
Information Model
Required
If the content of a required element does not provide useful information, the metadata is considered useless for even the most rudimentary discovery use cases. Conformant metadata MUST provide valid values: an identifier for the described resource, a meaningful title that identifies the resource, either a URL or Distribution object (details later) that enables access to the resource, a statement of any licensing, usage, or access constraints (i.e., Rights), an identifier for the type of resource described in the metadata, and identifier(s) for the specification of the metadata serialisation.
Resource identifier (1 entry): A globally unique, resolvable identifier for the resource described by the metadata record.
Title (1 entry): Succinct (preferably <250 characters) name of the resource; should be sufficient to uniquely identify the resource for a human user.
Distribution: URL, Distribution object, or Access Instructions (1 entry): There are several options. If the resource is a single digital object accessible online, provide a URL that will retrieve the resource. If the resource has multiple representations, or to provide users more information about the resource representation, a Distribution Object should be used to document the various possible representations and component files with a URL for each. Metadata for distributions through an API that allows query, filter, or processing as part of a data access request are described in the Queryable Distribution Interfaces (API) section, below. If the resource is not accessible online, provide a URL to a landing page that describes how to access the resource.
Rights (1 to many entries): Information about required access permissions, licences, contractual requirements, use constraints, and security constraints. Might be described in text or through links to external documents.
Resource type (1 to many): A scoped name (label with classification scheme) that specifies the kind of resource described by the metadata. The resource type might be used to determine validation requirements specific to descriptions for that kind of resource.
Metadata profile identifier (1 to many): Identifier for metadata specification (profile) used to create this metadata record. Generally this will be populated automatically if the metadata is created using CDIF aware tools.
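The cardinalities of the required items above can be checked mechanically. The following is a minimal Python sketch, assuming the record is held as a plain dictionary. The key names (`resource_identifier`, `title`, and so on) and the example URLs are hypothetical, chosen only for this illustration; they are not CDIF property names. The check covers presence and counts only, not whether a value is meaningful.

```python
# Hypothetical key names mirroring the required content items above;
# they are illustrative and not part of any CDIF serialisation.
REQUIRED = {
    "resource_identifier": (1, 1),
    "title": (1, 1),
    "distribution": (1, 1),
    "rights": (1, None),            # None means "no upper bound"
    "resource_type": (1, None),
    "metadata_profile": (1, None),
}


def as_list(value):
    """Normalise a missing, single, or multiple value to a list of non-empty entries."""
    if value is None:
        return []
    items = value if isinstance(value, (list, tuple)) else [value]
    return [v for v in items if v not in ("", None)]


def check_required(record):
    """Return a list of problems; an empty list means all counts are satisfied."""
    problems = []
    for key, (low, high) in REQUIRED.items():
        count = len(as_list(record.get(key)))
        if count < low:
            problems.append(f"{key}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{key}: expected at most {high}, found {count}")
    return problems


example = {
    "resource_identifier": "https://example.org/id/dataset-1",
    "title": "Example dataset used only for illustration",
    "distribution": "https://example.org/data/dataset-1.csv",
    "rights": ["https://creativecommons.org/licenses/by/4.0/"],
    "resource_type": ["Dataset"],
    # "metadata_profile" is deliberately omitted to show a reported problem.
}

for problem in check_required(example):
    print(problem)
```

A real validator would more likely be expressed against the chosen serialisation (for example with a schema language); this sketch only shows the counting logic.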
Recommended
Other properties that should be specified if possible and relevant. All are optional.
Description (0..1 entry): Inform users about the resource’s content, context, provenance, and any other information deemed useful for future cross-domain usage.
Originators (0 to many entries): One or more parties (person or organisation) that have a role related to the origin of the resource, e.g., author or editor. Each party has a name (label), identifier, and optional contact information.
Modified Date (0..1 entry): Date (not temporal extent) when the most recent changes to the resource were completed. Use ISO 8601 date and time format. Alternative date formatting must be machine-readable and consistent across all datasets.
Distribution Agent (0..1 entry): The party (person or organisation) to contact about accessing the resource. Each party has a name (label), identifier, and optional contact information. If there are multiple distribution options with different contact points, the Distribution Agent should be specified as part of the Distribution Object.
Checksum (0 or 1): A string value calculated from the content of a digital object that allows verification that the content of the object has not been modified. Even insignificant changes to the content of the file will change its checksum. The algorithm used to calculate the checksum must be documented. See also RFC 6920, ‘Naming Things with Hashes’, which establishes ways to identify checksum algorithms and to represent checksum values as a URI. Note that checksums apply to specific digital objects, typically a unique resource representation. Non-digital resources do not have checksums; their representations can have checksums. See implementation notes in Appendix 1.
Funding (0 to many entries): Cite funding sources (grants, contracts, etc.). Each source has a grant or contract identifier, source organisation, and label.
Keyword (0 to many entries): Distinguish ‘tags’ and ‘controlled terms’. Tags are simply words that a metadata creator thinks will be useful for users to identify resources of interest. Controlled terms are words defined in a vocabulary that minimally includes the word (a fixed string to identify the term for humans) and a definition. Each term represents some concept. More semantically rich vocabularies would include resolvable identifiers, source information, and links to related terms (see Cox et al., 2021). One common set of relationships in a vocabulary is a kind-of hierarchy linking broader to narrower concepts. Controlled terms should minimally be represented with a label and scheme name that identifies the source vocabulary; ideally a term URI and scheme URI would be included for more accurate identification and data integration.
Policies (0 to many entries): Policies used in management of the described resource, including whether the content may be changed (mutable or immutable), any scheduled updates, what is the expected lifetime for resource availability, what (if any) is the maintenance schedule, versioning, documentation for changes and change requests. Explicit support for specific policy frameworks can be included (e.g., CARE).
Publication Date (0 or 1): Date (not temporal extent) when the resource was made accessible. Use a ‘year’ or ISO 8601 date and time format. Alternative date formatting must be machine-readable and consistent across all datasets. If no publication date is known, estimate the publication date range, enter the oldest year as the publication date, and include the estimated date range in the Description field.
Other related agents (0 to many entries): Recognition for others who have contributed to the production of the resource but are not recognized as authors/creators. Includes a variety of roles such as maintainer, publisher, point of contact, copyright holder, and contributor (see e.g. DataCite contributor types, ISO 19115-1 role code).
Related resources (0 to many entries): Links to related data, publications, annotation, data sources, software used, etc. Links have at least a label, relationship type, and resolvable target resource identifier.
Version (0 or 1): If the resource is versioned, specify the label for this version. Version labels should follow a scheme that allows alphanumeric sorting reflecting the order of version release.
Provenance (0..many): For discovery, provide information about datasets that were used in the creation of the described resource and specify sensors, platforms, software, algorithms etc. used to acquire information contained in the resource. Details about workflows, activity sequences, the association of sensors etc. with specific variables, and individuals associated with particular activities in a workflow require use of the CDIF prov extension (TBD).
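The Checksum entry above can be illustrated with a short Python sketch. It assumes SHA-256 as the algorithm (the model does not prescribe one) and builds an RFC 6920 `ni` URI without an authority component, which names the algorithm and carries the digest in a single string. The function names and sample bytes are hypothetical.

```python
import base64
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def ni_uri(data):
    """Build an RFC 6920 'ni' URI (empty authority) for bytes, using sha-256."""
    raw = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url encoding with the trailing '=' padding removed.
    value = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return f"ni:///sha-256;{value}"


original = b"station,temperature\nA1,12.5\n"
modified = b"station,temperature\nA1,12.6\n"   # one character differs

print(sha256_hex(original))
print(ni_uri(original))
# A single changed character yields a different checksum.
print(sha256_hex(original) != sha256_hex(modified))
```

For large files, the same digest can be computed incrementally by reading the file in chunks and calling `update()` on the hash object rather than loading the whole file into memory.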
Properties for metadata management
These elements provide information for the operation of a distributed catalogue system with harvesting of metadata between catalogue servers. Values should be populated automatically by metadata creation tools, requiring no user input. Some providers might not include this information in metadata interchange files.
Metadata Date (0..1 entry): Last metadata update/creation date-time stamp in ISO 8601 date and time format. This may be automatically updated on metadata import if a metadata format conversion is necessary.
Metadata Contact Agent (0..1 entry): The party responsible for metadata content and accuracy. The Agent object includes a name (label), identifier, and optional contact information.
Metadata Identifier (0..1 entry): The identifier for the digital object that contains the metadata.
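Since these values are meant to be populated by tools rather than by users, the Metadata Date can be generated at save or import time. The sketch below is one way to do this in Python, assuming UTC and whole-second precision; the `metadata_date` key and both function names are hypothetical and not part of CDIF.

```python
from datetime import datetime, timezone


def metadata_timestamp(now=None):
    """Return an ISO 8601 date-time string in UTC, truncated to whole seconds."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).replace(microsecond=0).isoformat()


def touch(record):
    """Return a copy of the record with its (hypothetical) metadata_date refreshed."""
    updated = dict(record)
    updated["metadata_date"] = metadata_timestamp()
    return updated


fixed = datetime(2024, 5, 1, 12, 30, 15, 123456, tzinfo=timezone.utc)
print(metadata_timestamp(fixed))   # prints 2024-05-01T12:30:15+00:00
```

Using an explicit UTC offset avoids ambiguity when records are harvested between catalogue servers in different time zones.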
Implementation
The current recommended implementation uses the schema.org vocabulary, with a few entities and properties from other vocabularies to fill gaps; see Implementation of metadata content items. For background on JSON, JSON-LD and the general implementation patterns CDIF is using, see Schema.org implementation notes.
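To give a feel for what a schema.org-based JSON-LD record looks like, the following Python sketch assembles one as a dictionary and serialises it with the standard `json` module. The choice of schema.org properties for each content item, the placement of `dcterms:conformsTo`, the context URIs, and all example identifiers are assumptions made for illustration; the authoritative mapping is the one given in Implementation of metadata content items.

```python
import json

# Illustrative record only: the property choices are assumptions,
# not the CDIF mapping, and every URL is a placeholder.
record = {
    "@context": {
        "schema": "https://schema.org/",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/id/dataset-1",            # Resource identifier
    "@type": "schema:Dataset",                            # Resource type
    "schema:name": "Example dataset used only for illustration",   # Title
    "schema:url": "https://example.org/data/dataset-1.csv",        # Distribution
    "schema:license": "https://creativecommons.org/licenses/by/4.0/",  # Rights
    "schema:dateModified": "2024-05-01",                  # Modified Date
    "dcterms:conformsTo": {                               # Metadata profile identifier
        "@id": "https://example.org/profile/core"
    },
}

serialised = json.dumps(record, indent=2)
print(serialised)
```

Building the record as a dictionary first keeps the content items separate from the serialisation step, so the same data could be written to another format if needed.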
- Cox, S. J. D., Gonzalez-Beltran, A. N., Magagna, B., & Marinescu, M.-C. (2021). Ten simple rules for making a vocabulary FAIR. PLOS Computational Biology, 17(6), e1009041. https://doi.org/10.1371/journal.pcbi.1009041