Data Description Profile

Resources:

This profile specifies metadata for describing quantitative data sets in enough detail to support machine-to-machine exchange of data for processing, including links to the semantic artefacts (i.e., codelists and controlled vocabularies) that scientists need to understand the data. The emphasis is on structural metadata describing a physical dataset instance, so that the data can be parsed and reorganized for use. The profile covers the description of wide (“unit record”) data sets, long (event stream) data sets, and multi-dimensional data sets (“data cubes”). The profile uses Schema.org and DDI-CDI, and relies on the Codelist profile for describing enumerated value domains. Documentation of physical dataset structure that can be reused to describe many dataset instances is specified in the Data Structure profile.

Conformance to this profile entails populating all mandatory content from cdifCore, using the recommended discovery properties, and satisfying the additional data description constraints. The implementation target is an RDF serialization, which is an open-world logical model; users are thus free to add properties that they find useful for dataset documentation in their community, and other users can ignore these without penalty.

See also the graphical presentation of the Data Description Profile.

Artefacts for the Data Description profile are in this GitHub repository (TBD: update link to release tag).

Requirements

This profile imports all requirements from the CDIF Core and CDIF Data Discovery profiles, and adds the following requirements:

Implementation

class--Dataset properties added in Data Description Profile

cdif:hasPrimaryKey

cdif:statistics

class--InstanceVariable

A schema:variableMeasured item at the Data Description level is a CDIF profile of the DDI-CDI InstanceVariable. It composes the basic Discovery variableMeasured shape (PropertyValue-(variableMeasured)) and extends it with properties describing the variable’s data type, role, source, value domain, weighting, and summary statistics. The schema.org base properties on PropertyValue (@id, schema:name, schema:description, schema:alternateName, schema:propertyID, schema:measurementTechnique, schema:unitText, schema:unitCode, schema:minValue, schema:maxValue, schema:url) remain available unchanged; the additions below are CDIF-specific.

@type

cdif:physicalDataType

cdif:role

cdif:simpleUnitOfMeasure

cdif:uses

cdif:isDescribedBy_StatisticsCollection

cdi:function

cdi:platformType

cdi:source

cdi:hasIntendedDataType

cdi:describedUnitOfMeasure

cdi:takesSentinelValuesFrom

cdi:takesSubstantiveValuesFrom

cdi:qualifies
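
As a minimal sketch of how such a node might be assembled, the following Python builds an InstanceVariable as a dictionary and serializes it as JSON-LD. Only the property names come from this profile. The identifiers, the double typing with schema:PropertyValue and cdi:InstanceVariable, and the plain-string values used for cdif:physicalDataType and cdif:role are assumptions made for illustration.

```python
import json

# Hypothetical node: property names are from the profile, values are invented.
instance_variable = {
    "@id": "https://example.org/dataset/1#var-temperature",
    # Assumed typing: the profile describes the item as a PropertyValue
    # extended with DDI-CDI InstanceVariable properties.
    "@type": ["schema:PropertyValue", "cdi:InstanceVariable"],
    "schema:name": "water_temperature",
    "schema:description": "Water temperature at the sampling depth",
    "schema:unitText": "degree Celsius",
    "cdif:physicalDataType": "decimal",   # assumed token
    "cdif:role": "MeasureComponent",      # assumed token
    # Value domains are referenced by @id (assumed encoding).
    "cdi:takesSubstantiveValuesFrom": {
        "@id": "https://example.org/dataset/1#svd-temperature"
    },
    "cdi:takesSentinelValuesFrom": {
        "@id": "https://example.org/dataset/1#sentinel-temperature"
    },
}

# The variable is attached to the dataset through schema:variableMeasured.
dataset = {
    "@id": "https://example.org/dataset/1",
    "@type": "schema:Dataset",
    "schema:variableMeasured": [instance_variable],
}

print(json.dumps(dataset, indent=2))
```

The schema.org base properties stay exactly as they would be at the Discovery level; the CDIF-specific additions are simply further keys on the same node.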

class--cdif:PhysicalMapping

Defines the physical realization of one field in a tabular or structured dataset distribution: the column index (for tabular formats), the locator (for structured or hierarchical formats such as NetCDF or HDF5), the physical type, format pattern, length, null sequence, defaults, and so on. A cdif:formats_InstanceVariable reference links the column or path back to the cdi:InstanceVariable it realises in the parent dataset’s schema:variableMeasured. Each item in a distribution’s cdif:hasPhysicalMapping array is one CdifPhysicalMapping node. When a WebAPI distribution’s schema:potentialAction/schema:result carries cdif:hasPhysicalMapping, the same shape applies to the response columns and the same @ids are referenced. A WebAPI response is another physical realization of the same conceptual variables, so the InstanceVariables themselves should not be redeclared on the result.

cdif:index

cdi:locator

cdif:format

cdi:numberPattern

cdif:physicalDataType

cdif:formats_InstanceVariable

cdi:length

cdi:defaultDecimalSeparator

cdi:defaultDigitGroupSeparator

cdif:displayLabel

cdi:nullSequence

cdi:defaultValue

cdi:scale

cdi:decimalPositions

cdi:minimumLength, cdi:maximumLength

cdi:isRequired
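
To illustrate how a consumer might use physical mappings to parse a tabular distribution, the sketch below reads CSV rows and resolves each column to its InstanceVariable through cdif:formats_InstanceVariable. It assumes that cdif:index is zero-based, that cdi:nullSequence is a single string, and that "decimal" and "string" are valid cdif:physicalDataType tokens; none of these is confirmed by this page. The function name parse_row and all identifiers are hypothetical.

```python
import csv
import io

# Hypothetical InstanceVariables from the dataset's schema:variableMeasured,
# indexed by @id for lookup.
variables = {
    "#var-site": {"@id": "#var-site", "schema:name": "site"},
    "#var-temp": {"@id": "#var-temp", "schema:name": "temperature"},
}

# Hypothetical cdif:hasPhysicalMapping array for a CSV distribution.
physical_mappings = [
    {
        "cdif:index": 0,  # assumed zero-based column position
        "cdif:formats_InstanceVariable": {"@id": "#var-site"},
        "cdif:physicalDataType": "string",
    },
    {
        "cdif:index": 1,
        "cdif:formats_InstanceVariable": {"@id": "#var-temp"},
        "cdif:physicalDataType": "decimal",
        "cdi:nullSequence": "NA",
    },
]


def parse_row(row, mappings, variables_by_id):
    """Convert one list of raw cell strings into a dict keyed by variable name."""
    record = {}
    for mapping in mappings:
        raw = row[mapping["cdif:index"]]
        variable_id = mapping["cdif:formats_InstanceVariable"]["@id"]
        name = variables_by_id[variable_id]["schema:name"]
        if raw == mapping.get("cdi:nullSequence"):
            record[name] = None          # cell holds the declared null marker
        elif mapping.get("cdif:physicalDataType") == "decimal":
            record[name] = float(raw)
        else:
            record[name] = raw
    return record


rows = list(csv.reader(io.StringIO("A1,12.5\nA2,NA\n")))
records = [parse_row(row, physical_mappings, variables) for row in rows]
print(records)
```

Because the mapping only points at the variable by @id, the same variables dictionary could be reused unchanged for a second distribution, or for a WebAPI result, with a different mapping array.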

class--cdif:SubstantiveValueDomain

The set of valid, meaningful values an InstanceVariable can take, as distinct from sentinel (missing or not-applicable) codes, which live on a sibling cdif:SentinelValueDomain. Used as the value of cdi:takesSubstantiveValuesFrom. A single SubstantiveValueDomain node provides cdif:takesValuesFrom (an enumerated list of allowed values), cdif:recommendedDataType (one or more XSD data type tokens), or both.

@type

@id

cdif:takesValuesFrom

cdif:displayLabel

cdif:recommendedDataType

cdi:isDescribedBy
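
The rule that a value domain carries an enumeration, a data type, or both can be sketched as a small check. The helper name describes_values is hypothetical, as are the example nodes; encoding cdif:takesValuesFrom as an @id reference and cdif:recommendedDataType as a list of prefixed XSD names is an assumption.

```python
def describes_values(domain):
    """Return True if the node carries at least one of the two value descriptions."""
    return (
        "cdif:takesValuesFrom" in domain
        or "cdif:recommendedDataType" in domain
    )


# Enumerated domain: allowed values come from a code list (assumed @id reference).
enumerated = {
    "@type": "cdif:SubstantiveValueDomain",
    "@id": "#svd-landcover",
    "cdif:takesValuesFrom": {"@id": "#enum-landcover"},
}

# Typed domain: any value of the given XSD type (assumed token form).
typed = {
    "@type": "cdif:SubstantiveValueDomain",
    "@id": "#svd-temperature",
    "cdif:recommendedDataType": ["xsd:decimal"],
}

# A node with neither property says nothing about its values.
empty = {"@type": "cdif:SubstantiveValueDomain", "@id": "#svd-empty"}

for node in (enumerated, typed, empty):
    print(node["@id"], describes_values(node))
```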

class--cdif:SentinelValueDomain

The set of sentinel (missing / not-applicable / refusal / etc.) codes for an InstanceVariable, distinct from the substantive values the variable takes. Used as the value of cdi:takesSentinelValuesFrom. Same shape as cdif:SubstantiveValueDomain but typed cdif:SentinelValueDomain and intended for the non-substantive value codes (so survey “Don’t know” / “Refused” codes, sensor -9999-style fill values, etc. are represented separately from valid measurements).

@type

@id

cdif:takesValuesFrom

cdif:displayLabel

cdif:recommendedDataType

cdi:isDescribedBy
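
The practical effect of keeping sentinel codes in their own domain is that a consumer can separate them from measurements before computing anything. A minimal sketch follows, assuming the sentinel codes have already been resolved from the enumeration that the SentinelValueDomain points to; here they are hard-coded, and the function name split_values is hypothetical.

```python
# Hypothetical fill values, standing in for the codes listed by the
# enumeration referenced from a cdif:SentinelValueDomain.
SENTINEL_CODES = {"-9999", "-8888"}


def split_values(raw_values, sentinel_codes):
    """Partition raw cell strings into substantive values and sentinel codes."""
    substantive = []
    sentinel = []
    for value in raw_values:
        if value in sentinel_codes:
            sentinel.append(value)
        else:
            substantive.append(value)
    return substantive, sentinel


measurements, missing = split_values(
    ["12.5", "-9999", "13.1", "-8888", "12.9"], SENTINEL_CODES
)
print(measurements)  # values that belong to the substantive domain
print(missing)       # codes that belong to the sentinel domain
```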

class--cdi:ValueAndConceptDescription

A formal description of a set of values — value ranges, format / number patterns, regular expressions, classification level, and logical expressions. Used as the value of cdi:isDescribedBy on a cdif:SubstantiveValueDomain or cdif:SentinelValueDomain to constrain or describe the admissible values beyond (or instead of) an enumerated list.

@type

@id

cdi:classificationLevel

cdi:description

cdi:identifier

cdi:formatPattern

cdi:logicalExpression

cdi:regularExpression

cdi:minimumValueInclusive, cdi:minimumValueExclusive

cdi:maximumValueInclusive, cdi:maximumValueExclusive
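
One way a consumer could apply such a description is to test candidate values against its bounds and pattern. The sketch below assumes that the four bound properties hold plain numbers and that cdi:regularExpression holds a pattern in a syntax Python's re module accepts; the actual value encodings may differ. The function name is_admissible is hypothetical.

```python
import re


def is_admissible(value, description):
    """Check a numeric value against the bounds and pattern of a description node."""
    if "cdi:minimumValueInclusive" in description:
        if value < description["cdi:minimumValueInclusive"]:
            return False
    if "cdi:minimumValueExclusive" in description:
        if value <= description["cdi:minimumValueExclusive"]:
            return False
    if "cdi:maximumValueInclusive" in description:
        if value > description["cdi:maximumValueInclusive"]:
            return False
    if "cdi:maximumValueExclusive" in description:
        if value >= description["cdi:maximumValueExclusive"]:
            return False
    pattern = description.get("cdi:regularExpression")
    if pattern is not None and re.fullmatch(pattern, str(value)) is None:
        return False
    return True


# Hypothetical description: percentages in the half-open range [0, 100).
percentage = {
    "@type": "cdi:ValueAndConceptDescription",
    "@id": "#vcd-percentage",
    "cdi:minimumValueInclusive": 0,
    "cdi:maximumValueExclusive": 100,
}

for candidate in (0, 99.9, 100, -1):
    print(candidate, is_admissible(candidate, percentage))
```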

class--cdif:EnumerationDomain

A codification vocabulary documented as an enumerated value domain — typically a SKOS ConceptScheme listing the allowed values for a cdif:SubstantiveValueDomain or cdif:SentinelValueDomain. Provides a named extension point so that an EnumerationDomain can either declare an external concept scheme via cdif:references or be defined inline.

@type

@id

cdif:identifier

schema:name

cdif:references

cdif:purpose

class--cdif:Key

The CDIF profile of DDI-CDI PrimaryKey: an ordered set of cdi:InstanceVariable references that uniquely identify a data instance. Used as the value of cdif:hasPrimaryKey on the root Dataset. Each variable’s position in the key is recorded with an explicit cdi:ComponentPosition wrapper carrying cdi:indexes (the variable) and cdi:value (the integer position), matching the canonical DDI-CDI PrimaryKey structure defined in ddi-cdif-data-structure.

@type

@id

cdif:isComposedOf

class--cdi:ComponentPosition

Indexes a single component within a cdif:Key (or other ordered DDI-CDI component structure). Used as the items of cdif:isComposedOf on a cdif:Key: each wrapper pairs an InstanceVariable with its position number in the key.

@type

@id

cdi:indexes

cdi:value
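
The Key and ComponentPosition structure can be sketched as follows. The example orders the key variables by cdi:value and uses them to test whether a set of rows is uniquely identified. The identifiers, the helper names key_variable_ids and has_unique_keys, and the representation of rows as dictionaries keyed by variable @id are assumptions for illustration.

```python
# Hypothetical two-part key: site first, then date. The array order is
# deliberately scrambled to show that cdi:value carries the position.
primary_key = {
    "@type": "cdif:Key",
    "@id": "#pk",
    "cdif:isComposedOf": [
        {
            "@type": "cdi:ComponentPosition",
            "cdi:indexes": {"@id": "#var-date"},
            "cdi:value": 2,
        },
        {
            "@type": "cdi:ComponentPosition",
            "cdi:indexes": {"@id": "#var-site"},
            "cdi:value": 1,
        },
    ],
}


def key_variable_ids(key):
    """Return the @ids of the key variables in their declared key order."""
    positions = sorted(key["cdif:isComposedOf"], key=lambda p: p["cdi:value"])
    return [position["cdi:indexes"]["@id"] for position in positions]


def has_unique_keys(rows, key):
    """Return True if no two rows share the same combination of key values."""
    variable_ids = key_variable_ids(key)
    seen = set()
    for row in rows:
        key_values = tuple(row[variable_id] for variable_id in variable_ids)
        if key_values in seen:
            return False
        seen.add(key_values)
    return True


rows = [
    {"#var-site": "A1", "#var-date": "2024-01-01", "#var-temp": 12.5},
    {"#var-site": "A1", "#var-date": "2024-01-02", "#var-temp": 12.9},
    {"#var-site": "A2", "#var-date": "2024-01-01", "#var-temp": 11.8},
]

print(key_variable_ids(primary_key))
print(has_unique_keys(rows, primary_key))
```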

class--cdif:StatisticsCollection

Groups one or more cdi:Statistics nodes. A typical use is a dataset-level collection holding row-count / mean / stddev Statistics for each measured variable. Referenced from a CdifInstanceVariable via cdif:isDescribedBy_StatisticsCollection, or from the root Dataset via cdif:statistics.

@id

@type

cdif:has_Statistics

cdi:hasWeight

cdif:indexedBy

class--cdi:Statistics

A named bundle of one or more Statistic value objects for an instance variable, optionally weighted, optionally broken down by Category.

@id

@type

cdi:statistic

cdi:typeOfStatistic

cdi:hasWeight

cdif:appliesTo

cdif:has_CategoryStatistics

class--cdi:CategoryStatistics

Statistics for a specific Category of an instance variable within a dataset.

@id

@type

cdi:for

cdi:statistic

cdi:typeOfStatistic

cdi:hasWeight
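
As a rough sketch of how the statistics classes fit together, the following computes a count, mean, and sample standard deviation for one variable and wraps them in a StatisticsCollection. The value shapes are assumptions: cdi:statistic is shown as a bare number and cdi:typeOfStatistic as a plain string, whereas the profile may require structured value objects or controlled terms. The function name build_statistics_collection and the identifiers are hypothetical.

```python
import json
import statistics


def build_statistics_collection(variable_id, values):
    """Summarize numeric values as a cdif:StatisticsCollection-style dictionary."""

    def statistics_node(kind, number):
        return {
            "@type": "cdi:Statistics",
            "cdi:typeOfStatistic": kind,             # assumed plain-string token
            "cdi:statistic": number,                 # assumed bare number
            "cdif:appliesTo": {"@id": variable_id},  # the summarized variable
        }

    return {
        "@type": "cdif:StatisticsCollection",
        "@id": variable_id + "-stats",
        "cdif:has_Statistics": [
            statistics_node("count", len(values)),
            statistics_node("mean", statistics.mean(values)),
            statistics_node("stddev", statistics.stdev(values)),
        ],
    }


collection = build_statistics_collection("#var-temp", [10.0, 12.0, 14.0])
print(json.dumps(collection, indent=2))
```

An InstanceVariable would then point at this node through cdif:isDescribedBy_StatisticsCollection, or the root Dataset through cdif:statistics.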

Notes

Shared encoding patterns such as object reference, DefinedTerm, and PropertyValue are defined on the Common data types page.