Basic discovery metadata content model


The core of the CDIF profile for resource discovery is a set of implementation-independent content requirements that specify the required information to support a basic level of discovery interoperability for resources of any type. The following list includes the minimum required content for basic resource description, discovery, and access. This recommendation is a synthesis of various metadata schemes, including ISO 19115-1:2014, schema.org conventions from ESIPFed Science on Schema.org and Ocean Data net, DCAT, DCAT-AP, and FDO Kernel Attributes-2.0. A mapping between these various schemas and CDIF content elements is available in TBD. Note that these content requirements are scoped for a broad spectrum of resource types. It is expected that other fields will need to be added in extensions for specific kinds of resources.

Required

If the content of a required element does not provide useful information, the metadata is considered useless for even the most rudimentary discovery use cases. Conformant metadata MUST provide valid values for these elements, namely a meaningful title that identifies the resource, either a URL or a text statement of how to obtain the resource, a statement of any licensing, usage, or access constraints (Rights), and identifiers for the specification of the metadata serialisation and the type of the resource described.

  • Resource identifier (1 entry): A globally unique, resolvable identifier for the resource described by the metadata record.

  • Title (1 entry): Succinct (preferably <250 characters) name of the resource; should be sufficient to uniquely identify the resource for a human user.

  • Distribution: URL, Distribution object, or Access Instructions (1 entry): If the resource is a digital object accessible online, provide a URL that will retrieve the resource. If the resource has multiple representations, provide a Distribution Object documenting the various options with a URL and representation profile for each. Metadata for distributions through an API that allows query, filter, or processing as part of a data access request are described in the Queryable Distribution Interfaces (API) section, below. If the resource is not accessible online, provide a URL to a landing page used to access the resource, or minimally, provide a text description explaining how to access the resource in the metadata (Access Instructions).

  • Rights (1 to many entries): Information about required access permissions, licences, contractual requirements, use constraints, and security constraints. Might be described in text or through links to external documents. (See 6.4. Data Access for providing machine-actionable rights descriptions.)

  • Metadata profile identifier (1 to many): Identifier for the metadata specification (profile) used to create this metadata record. Generally this will be populated automatically if the metadata is created using CDIF-aware tools.

  • Resource type (1 to many): A scoped name (label with classification scheme) that specifies the kind of resource described by the metadata. The resource type might be used to determine validation requirements specific to descriptions for that kind of resource.
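The required elements above can be sketched as a simple record with a presence check. This is a minimal illustration, not part of the CDIF specification: the class name `RequiredContent`, its field names, and the function `check_required` are hypothetical, and Distribution objects and scoped names are simplified to plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequiredContent:
    """Hypothetical container for the required content elements."""
    resource_identifier: str
    title: str
    distribution: str  # simplified: a URL, or free-text access instructions
    rights: List[str]
    metadata_profile_identifiers: List[str]
    resource_types: List[str]  # simplified: scoped names reduced to labels


def check_required(record: RequiredContent) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    single_valued = {
        "Resource identifier": record.resource_identifier,
        "Title": record.title,
        "Distribution": record.distribution,
    }
    for label, value in single_valued.items():
        if not value.strip():
            problems.append(f"{label} is empty")
    multi_valued = {
        "Rights": record.rights,
        "Metadata profile identifier": record.metadata_profile_identifiers,
        "Resource type": record.resource_types,
    }
    for label, values in multi_valued.items():
        # "1 to many": at least one non-blank entry is needed
        if not any(v.strip() for v in values):
            problems.append(f"{label} needs at least one entry")
    if len(record.title) >= 250:
        problems.append("Title is 250 characters or longer (shorter is preferred)")
    return problems


# Example values are invented for illustration only.
record = RequiredContent(
    resource_identifier="https://example.org/id/dataset-1",
    title="Example stream gauge readings",
    distribution="https://example.org/data/dataset-1.csv",
    rights=["https://creativecommons.org/licenses/by/4.0/"],
    metadata_profile_identifiers=["https://example.org/profile/discovery"],
    resource_types=["Dataset"],
)
print(check_required(record))
```

A check like this only tests that values are present; whether a title is actually meaningful or an identifier actually resolves needs human review or a network lookup.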

Required, but nilable

These are content elements for which every resource should have useful information, but for which the information may not be available. A corresponding field should be included in each metadata record, but may have value ‘nil:missing’, ‘nil:unknown’ or similar nil value. Use ‘nil:notapplicable’ for Temporal Coverage, Geographic Extent or Statistical Variable when these are not applicable to the described resource.

  • Description (1 entry): Inform the reader about the resource’s content, context, provenance, and any other information deemed useful for future cross-domain usage.

  • Originators (1 to many entries): One or more parties (person or organisation) that have a role related to the origin of the resource, e.g., author or editor. Each party has a name (label), identifier, and optional contact information.

  • Modified Date (1 entry): Date (not temporal extent) when the most recent changes to the resource were completed. Use a “year” or ISO 8601 date and time format. Alternative date formatting must be machine-readable and consistent across all datasets.

  • Distribution Agent (1 entry): The party (person or organisation) to contact about accessing the resource. Each party has a name (label), identifier, and optional contact information. If there are multiple distribution options with different contact points, the Distribution Agent should be specified as part of the Distribution Object.

  • Statistical Variable (1 to many entries): Only applicable to datasets, otherwise nil:{reason}. A complete description of a dataset should include a list of the fields in the data, with each field mapped to the variable that is represented by the content of that field. Variable definitions should minimally specify the property represented by name. Identification of the property represented with a resolvable URI is strongly recommended. Variable descriptions should include documentation for how units of measure and reference systems for values are specified (see Universals). Details of data structure and schema more closely related to interoperability, data integration, and usage than to data discovery are discussed in tbd. Describing Data to Make it “Integration-Ready”.

  • Temporal Coverage (1 entry): The time interval represented by, or the subject of, the described resource. This could be the time interval when data were collected, or an archaeological or geological time interval that is the subject of the resource. Descriptions need to account for clock time, calendar time (Gregorian, Julian, Hebrew, Islamic, Chinese, Mayan…), cyclical time (summer, first quarter, mating season, new moon, pay day), and named ordinal eras (Jurassic, Younger Dryas, Early Minoan I, Late Stone Age). See OWL Time.

  • Geographic Extent - horizontal (if applicable, 1 entry, minimum bounding rectangle or point): In order to support cross-domain searches based on geospatial location, location coordinates must be given in decimal degrees using the WGS 84 datum. There are various other systems for describing location (see Space); these can be provided as alternate location descriptions, recognizing that they might be meaningful to some metadata harvesting agents. Some resources may not be usefully described by a WGS 84 extent, in which case indicate nil:notapplicable; this would include extraterrestrial resources.

    • Bounding Rectangle: North Bounding Latitude, South Bounding Latitude, East Bounding Longitude, West Bounding Longitude. The minimum rectangle that completely contains the coverage extent for the resource content. Coordinate order and syntax are determined by the serialisation profile.

    • Point: Latitude, Longitude. A centroid point for the coverage extent of the resource, or the location of the resource content if a point location is appropriate. Coordinate order and syntax are determined by the serialisation profile.

    • Named location: Place name referenced to some gazetteer. Use the scoped name pattern {label, authority, optional identifier} (see placename).
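The nil-value convention, the Modified Date format, and the bounding rectangle described above can be checked with a few small helpers. This is a sketch under stated assumptions: the function names and the dictionary keys (`north`, `south`, `east`, `west`) are hypothetical, since coordinate order and syntax are left to the serialisation profile, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions.

```python
import re
from datetime import datetime
from typing import Dict, List, Union

NIL_PREFIX = "nil:"  # covers nil:missing, nil:unknown, nil:notapplicable


def is_nil(value: object) -> bool:
    """True for string values that follow the nil:{reason} pattern."""
    return isinstance(value, str) and value.startswith(NIL_PREFIX)


def is_acceptable_date(value: str) -> bool:
    """Accept a nil value, a four-digit year, or a parseable ISO 8601 string."""
    if is_nil(value) or re.fullmatch(r"\d{4}", value):
        return True
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def check_extent(extent: Union[str, Dict[str, float]]) -> List[str]:
    """Check a WGS 84 bounding rectangle given in decimal degrees."""
    if is_nil(extent):
        return []  # e.g. nil:notapplicable for extraterrestrial resources
    if isinstance(extent, str):
        return ["extent is neither a nil value nor a bounding rectangle"]
    problems = []
    for key in ("north", "south"):
        if not -90.0 <= extent[key] <= 90.0:
            problems.append(f"{key} bounding latitude is out of range")
    for key in ("east", "west"):
        if not -180.0 <= extent[key] <= 180.0:
            problems.append(f"{key} bounding longitude is out of range")
    if extent["south"] > extent["north"]:
        problems.append("south bounding latitude is greater than north")
    # Assumption: west > east is not flagged, because such a rectangle
    # may legitimately cross the antimeridian.
    return problems


print(check_extent({"north": 45.0, "south": 40.0, "east": -70.0, "west": -75.0}))
print(check_extent("nil:notapplicable"))
print(is_acceptable_date("2021-03-04T10:30:00"))
```

A point extent could be checked the same way by treating it as a rectangle whose north and south, and east and west, values coincide.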

Required for metadata management

These elements provide essential information for the operation of a distributed catalogue system with harvesting of metadata between catalogue servers. Values should be populated automatically by metadata creation tools, requiring no user input. Nil values are allowed.

  • Metadata Date (1 entry): Last metadata update/creation date-time stamp in ISO 8601 date and time format. This may be automatically updated on metadata import if a metadata format conversion is necessary.

  • Metadata Contact Agent (1 entry): The party responsible for metadata content and accuracy. The Agent object includes a name (label), identifier, and optional contact information.

  • Metadata Identifier (1 entry): The identifier for the digital object that contains the metadata.
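Because these values should be populated automatically, a metadata creation tool might fill them in along the following lines. This is only a sketch: the function name `management_elements`, the dictionary keys, the use of a `urn:uuid:` identifier, and the choice of `nil:missing` for an absent contact are assumptions made for illustration, not requirements of the content model.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional, Union


def management_elements(
    contact_name: Optional[str] = None,
) -> Dict[str, Union[str, Dict[str, str]]]:
    """Build the metadata-management elements without user input."""
    # Metadata Date: current UTC time as an ISO 8601 date-time stamp
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Metadata Contact Agent: nil values are allowed when no party is known
    contact: Union[str, Dict[str, str]] = (
        {"name": contact_name} if contact_name else "nil:missing"
    )
    return {
        "metadata_date": stamp,
        "metadata_contact_agent": contact,
        # Assumption: a random UUID URN stands in for the record identifier
        "metadata_identifier": f"urn:uuid:{uuid.uuid4()}",
    }


elements = management_elements()
print(elements["metadata_date"])
print(elements["metadata_identifier"])
```

A harvesting catalogue that converts the metadata format on import could call the same routine again to refresh the Metadata Date while keeping the original identifier.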