Implementation of metadata content items
The following table maps the metadata content items described in the Metadata Content Requirements section to the schema.org JSON-LD keys to use in metadata serialization. Some example metadata documents follow. The ‘Obl.’ column specifies the cardinality obligation for the property: ‘1’ means exactly one value is required; ‘1..*’ means at least one value is required; ‘0..1’ means the property is optional and at most one value can be provided; ‘0..*’ means the property is optional and more than one value can be provided. Properties with a path from “subjectOf” describe the metadata record rather than the resource.
| CDIF content item | Obl. | Schema.org implementation | Scope note |
|---|---|---|---|
| Metadata identifier | 1 | "subjectOf"/"@id":{URI}, or "@id":{URI} in a node with "identifier":"@id" of the node containing the resource description | The URI for the metadata record should be the "@id" value of the 'subjectOf' element in the JSON instance document tree, or the "@id":{URI} of a separate graph node with "identifier":"@id" of the node containing the resource description. |
| Resource identifier | 1 | "identifier":{URI} | The URI for the resource that is the subject of the metadata record should be the "identifier": value for the root of the JSON instance document tree |
| Title | 1 | "name":{string} | A set of words that should uniquely identify the described resource for human use, in the scope of the metadata catalog containing this metadata record. |
| Distribution | 1 | "url":{URL} | If the metadata is about a single digital object. |
| | | "distribution": { "@type": "DataDownload", "contentUrl": {URL}, ... } | If the metadata is about an abstract, non-digital, or physical resource that has multiple distributions with different URL, encodingFormat, or conformsTo properties. Each distribution is considered a distinct digital object. The DataDownload MUST include the contentUrl, and SHOULD include encodingFormat and dcterms:conformsTo to specify the media type and the specification or profile documenting the specific serialization conventions for the download content. |
| Rights | 1..* | "license":{text or URI} or "conditionsOfAccess":{text or URI} | URL to a license document, or a text explanation of restrictions on use. There might be multiple links to documents specifying related security, privacy, usage, sharing, and similar concerns. |
| Metadata profile identifier | 1 | "subjectOf"/"dcterms:conformsTo": {identifier} | Use Dublin Core terms property. The value for Base CDIF metadata is 'CDIF_basic_1.0' [tbd; this should be a PID]. Different profiles extending this must define unique identifier strings to use here. Note that the schema.org schemaVersion is used to indicate the version of the schema.org vocabulary, but in general this is not needed for CDIF. |
| Metadata date | 0..1 | "subjectOf"/"dateModified":{Date or DateTime} | Use ISO 8601 format. The most recent update date for the metadata content. Harvesters use this to determine whether they have already harvested and processed this record. |
| Metadata contact | 0..1 | "subjectOf"/"maintainer":{Person or Organization} | Should include a name and contact point (institutional e-mail is best) for the agent responsible for metadata content. This is the contact point to report problems with metadata content. Person and Organization are Agent objects with various properties. |
| Resource type | 1 | "@type":{schema.org type} | Use the most specific [Schema.org resource type](https://schema.org/docs/full.html) that is applicable. Multiple values can be provided, but they must be logically consistent. |
| | 0..* | "additionalType": [{DefinedTerm or URI}, ...] | If a more specific resource type needs to be specified, add a text or URI value here that identifies the type. MUST be consistent with the "@type". To simplify parsing, always encode as an array. |
| Description | 0..1 | "description": {string} | Free text, with as much detail as is feasible |
| Originators | 0..* | "creator" : [{Person or Organization}, ...] | The value is a schema.org person or organization. To simplify parsing, always encode as an array. Use ORCID or other PID to identify person or organization where possible |
| Publication Date | 0..1 | "datePublished" : {date time} | Date on which the resource was made publicly accessible. Use ISO 8601 format. |
| Modification Date | 1 | "dateModified" : {date time} | Date of the most recent update to the resource content. If the Publication Date is not provided, it defaults to the Modification Date. Use ISO 8601 format. |
| GeographicExtent (named place) | 0..* | "spatialCoverage": { "@type": "Place", "name": {string} or {schema:DefinedTerm} } | To specify location with place names. If the names are from a gazetteer, use schema:DefinedTerm to provide a name, identifier, and inDefinedTermSet to fully document the concept. |
| GeographicExtent (bounding box) | 0..1 | "spatialCoverage": { "@type": "Place", "geo": { "@type": "GeoShape", "box": "39.3280 120.1633 40.445 123.7878" } } | For bounding box specification of the spatial extent of resource content. See [ESIP SOSO for details](https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#bounding-boxes). Including only one bounding box is recommended; the behavior of harvesting clients when multiple geometries are specified is unpredictable. |
| GeographicExtent (curvilinear trace) | 0..1 | "spatialCoverage": { "@type": "Place", "geo": { "@type": "GeoShape", "line": "39.33 120.77 40.44 123.96 41.00 121.34" } } | For a resource related to a linear trace, such as a ship track or airplane flight line. |
| GeographicExtent (point location) | 0..1 | "spatialCoverage": { "@type": "Place", "geo": { "@type": "GeoCoordinates", "latitude": 39.3280, "longitude": 120.1633 } } | For a point location specification of the spatial extent of resource content. Including only one point is recommended; the behavior of harvesting clients when multiple geometries are specified is unpredictable. |
| GeographicExtent (other serialization) | 0..* | "geosparql:hasGeometry": { "@type": "sf#Point", "geosparql:asWKT": { "@type": "#wktLiteral", "@value": "POINT(-76 -18)" }, "geosparql:crs": { "@id": "CRS84" } } | Optional geographic extent using other, more interoperable geometries. GeoSPARQL is recommended; see Ocean InfoHub. (Note that the URIs in the example are truncated.) Other geometry schemes might be specified in a specific domain profile, e.g. for atmospheric or subsurface data, or for local coordinate systems. |
| Distribution Agent | 0..* | "provider":{Person or Organization} | Contact point for the provider of a distribution. Use this form for a simple digital object with a download URL, or for a resource with multiple distributions that all come from the same provider. |
| | 0..* | "distribution": [ { "@type": "DataDownload", "provider":{Person or Organization} }, ... ] | If there are multiple distributions with different providers, each distribution can have a separate provider. |
| Variable (PropertyValue) | 0..* | "variableMeasured": [ { "@type":"PropertyValue", "@id": "astm:var0011", "propertyID": [ "pato:PATO_0000025", "astm:prop/0405" ], "name": "hostMineral", "description": "...." }, ... ] | Follow the ESIPfed Science on Schema.org recommendation; see also the discussion of representing more complex data structures in ESIPfed Experimental and the Data Integration module of CDIF. A variable must have a name and description, and should have a propertyID with a URI for the represented concept. The URI in the propertyID provides the semantic linkage for the meaning of the variable. |
| Variable (StatisticalVariable) | 0..* | "variableMeasured": [ { "@type": "StatisticalVariable", "@id": "astm:var0011", "measuredProperty": { "@type": "Property", "identifier": "astm:id/305978", "name": "Average age" } } ] | StatisticalVariable offers properties useful for describing social science statistical variables, like populationType and statType. Use of StatisticalVariable is preferred for variables with values calculated from some aggregation process. |
| Keyword | 0..* | "keywords": [ {string}, {"@type":"DefinedTerm", "name": "OCEANS", "inDefinedTermSet": "gcmd:sciencekeywords", "identifier": "gcmd:concept/916b....6167d" }, ... ] | Implement with text for tags, and with schema:DefinedTerm for keywords from a controlled vocabulary. The DefinedTerm approach is used to represent concepts. |
| Temporal coverage | 0..* | Temporal coverage can be expressed in several ways: a calendar or clock date-time or date-time interval using ISO 8601 serialization, a named time ordinal era, an interval bounded by time ordinal eras, or a numeric coordinate in a temporal reference system. | |
| | | "temporalCoverage": "2018-01-22" | Calendar date or clock time instant; use ISO 8601 encoding. |
| | | "temporalCoverage": "2012-09-20/2016-01-22" | Calendar date or clock time interval; use ISO 8601 encoding. |
| | | "temporalCoverage": [{ "@type":"time:ProperInterval", "time:intervalStartedBy": "isc:LowerDevonian", "time:intervalFinishedBy": "isc:LowerPermian" }] | Time ordinal era interval. Use the OWL-Time namespace, time: http://www.w3.org/2006/time#. This example uses the International Chronostratigraphic Chart, isc. See PeriodO for identifiers for many other named time intervals. |
| | | "temporalCoverage": [{ "time:ProperInterval- 345/298 Ma" }] | For a time interval specified using geologic ages, in Ka, Ma, or Ga. The text string is an abbreviated OWL-Time interval (proposal, under discussion). |
| Related agents (contributor role) | 0..* | "contributor": [ {Person or Organization}, ... ] | Recognition for others who have contributed to the production of the resource but are not recognized as authors/creators. |
| Related agent (other role) | | "contributor": { "@type": "Role", "roleName": "Principal Investigator", "contributor": { "@type": "Person", "@id": "https://orcid.org/...", "name": "John Doe", "affiliation": { "@type": "Organization", "@id": "https://ror.org/...", "name": "..." }, "contactPoint": { "@type": "ContactPoint", "email": "john.chodacki@ucop.edu" } } } | To assign roles to contributors, such as editor, maintainer, publisher, point of contact, or copyright holder (e.g. DataCite contributor types), use the rather convoluted Role construction defined by schema.org. |
| Related resources | 0..* | "relatedLink": [ { "@type": "LinkRole", "linkRelationship": "...", "target": { "@type": "EntryPoint", "encodingType": "text/html", "name": "...", "url": "https://example.org/data/stations" } } ] | Use schema.org relatedLink with a LinkRole value, and put the link URL in a 'target' EntryPoint object. These properties expect WebPage and Action as their domain, so the schema.org validator will throw a warning (not an error). Related resource links are useful for evaluation and use of data, but because of the wide variety of possible relationships they are difficult to use in general search scenarios. Use a soft-type implementation, with a link relationship type using a schema:DefinedTerm and a resolvable identifier for the relationship target. |
| Funding | 0..* | "funding" : { "@id": "URI for grant", "@type": "MonetaryGrant", "identifier": "grant id", "name": "grant title", "funder": { "@id": "ror for org", "@type": "Organization", "name": "org name", "identifier": [ "other identifiers" ] } } | Use schema.org encoding and science on schema.org pattern. Other organization properties can be included in the funder/Organization. |
| Policies | 0..* | "publishingPrinciples": [ {"@type": "CreativeWork"}, ... ] | FDOF digitalObjectMutability, RDA digitalObjectPolicy, FDOF PersistencyPolicy. Policies related to maintenance, update, and expected time to live. |
| Checksum | 0..1 | "distribution": [ { "@type": "DataDownload", "spdx:checksum": { "spdx:algorithm": "string", "spdx:checksumValue": "string" }, ... }, ... ] | A string value calculated from the content of the resource representation, used to test whether the content has been modified. There is no schema.org property for this, so follow the DCAT v3 adoption of the [Software Package Data Exchange (SPDX)](https://spdx.org/rdf/terms/) property. The [spdx Checksum object](https://spdx.org/rdf/spdx-terms-v2.1/classes/Checksum___-238837136.html) has two properties: algorithm and checksumValue. The checksum is a property of each distribution/DataDownload. |
| Provenance for discovery is limited to documenting the technology used in the creation of the dataset and documenting other datasets that were inputs to the content of the described resource. | | | |
| Provenance (instruments, software etc.) | 0..* | "prov:wasGeneratedBy": { "@type": "prov:Activity", "prov:used": [ "nerc:collection/L05/current/134", "nerc:collection/B76/current/B7600031" ] } | Identify sensors, instruments, platforms, software, algorithms, etc. used in the creation of the described resource. |
| Provenance (input datasets) | 0..* | "prov:wasDerivedFrom": [ "http://doi.org/10.547/347848", "http://doi.org/10.3578/h5ls", "http://doi.org/10.547/93578" ] | |
| Quality information for discovery: A text statement documenting the quality of the resource should be included in the sdo:description. If there are quality policies or certificates that apply, these should be specified in the sdo:policies. Quality measurement or assessment protocols that have an output result specific to this resource can be specified using dqv:hasQualityMeasurement. | | | |
| Quality measure | 0..* | "dqv:hasQualityMeasurement": [ { "@type": "dqv:QualityMeasurement", "dqv:isMeasurementOf": "nerc:collection/L27/current/ARGO_QC", "dqv:value": "good" }, { "@type": "dqv:QualityMeasurement", "dqv:isMeasurementOf": "imf:dsbb/2003/eng/dqaf.htm", "dqv:value": "http://linkToASpecificQualityReport" } ] | Quality assessment or measurement conducted using the procedure or protocol specified by the dqv:isMeasurementOf property, with the result value specified in the dqv:value property. The result might be numeric, a categorical term, or a link to a document describing the quality assessment. |
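
As a rough illustration of how the required items in the table (those with obligation 1 or 1..*) fit together, the following Python sketch builds a minimal record and checks that each required item is present. It is only a sketch under stated assumptions: the `@context` mapping, all `example.org` URIs, the title, and the `missing_required` helper are hypothetical and not part of CDIF, and the check covers only the "subjectOf" form of the metadata identifier, not the separate graph node alternative.

```python
import json

# Hypothetical record; the @context and every example.org value are placeholders.
record = {
    "@context": {
        "@vocab": "https://schema.org/",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@type": "Dataset",                                   # Resource type
    "@id": "https://example.org/dataset/123",
    "identifier": "https://example.org/dataset/123",     # Resource identifier
    "name": "Example station observations",              # Title
    "url": "https://example.org/data/stations.csv",      # Distribution (single digital object)
    "license": "https://creativecommons.org/licenses/by/4.0/",  # Rights
    "dateModified": "2024-01-15",                         # Modification date
    "subjectOf": {
        "@id": "https://example.org/metadata/123",       # Metadata identifier
        "dcterms:conformsTo": "CDIF_basic_1.0",          # Metadata profile identifier
    },
}


def missing_required(doc):
    """Return the names of required content items that are absent from doc."""
    subject_of = doc.get("subjectOf") or {}
    checks = {
        "Metadata identifier": bool(subject_of.get("@id")),
        "Resource identifier": bool(doc.get("identifier")),
        "Title": bool(doc.get("name")),
        # Either a direct URL or at least one distribution satisfies this item.
        "Distribution": bool(doc.get("url") or doc.get("distribution")),
        # Either a license or access conditions satisfies this item.
        "Rights": bool(doc.get("license") or doc.get("conditionsOfAccess")),
        "Metadata profile identifier": bool(subject_of.get("dcterms:conformsTo")),
        "Resource type": bool(doc.get("@type")),
        "Modification date": bool(doc.get("dateModified")),
    }
    return [name for name, present in checks.items() if not present]


if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print(missing_required(record))
```

A check like this only tests for presence of the keys; it does not validate value formats (for example ISO 8601 dates) or the logical consistency of multiple "@type" values.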