Resources:
This profile specifies metadata for describing quantitative data sets at a detailed level, sufficient to support the machine-to-machine exchange of data for processing, including links to all needed semantic artefacts (i.e., codelists, controlled vocabularies) for scientists to understand the data. The emphasis is on structural metadata describing a physical dataset instance, to enable parsing and re-organizing data for use. The profile covers the description of wide (“unit record”) data sets, long (event stream) data sets, and multi-dimensional data sets (“data cubes”). The profile uses Schema.org and DDI-CDI, with a reliance on the Codelist profile for describing enumerated value domains. Documentation of physical dataset structure that is reusable for the description of many dataset instances is specified in the Data Structure profile.
Conformance to this profile entails populating all mandatory content from cdifCore, using recommended discovery properties, and providing the additional data description constraints. The implementation target is an RDF serialization, which is an open-world logical model; users are thus free to add properties that they find useful for dataset documentation in their community, but other users can ignore these without penalty.
See also the graphical presentation of the Data Description Profile.
Artefacts for the Data Description profile are in this GitHub repository (TBD--update link to release tag)
Requirements¶
This profile imports all requirements from CDIF Core and CDIF Data Discovery profile. This profile adds additional requirements:
Define the structure of the serialization used to deliver a specific dataset representation. The focus is on columnar data represented in tables (e.g. CSV or any other delimited text format) and multidimensional data represented in structured binary formats (e.g. HDF5, NetCDF).
Required properties
Vocabularies used for enumerated domains
Locators for variable values within the physical data structure (column number, hdf path…).
Datatypes used to represent values
Domain for values, including substantive and sentinel values, or other restrictions on values (string length, regular expressions)
Roles of instance variables in the data structure, e.g. measure, unit identifier, attribute, dimension.
Primary key: the variable(s) that uniquely identify each data instance
Linkage of attribute variable to variable(s) it qualifies.
Statistics on InstanceVariables
Implementation¶
class--Dataset properties added in Data Description Profile¶
cdif:hasPrimaryKey¶
Cardinality: Optional, Repeatable
JSON: cdif:Key
Description: Primary key of the dataset: a cdif:Key whose cdif:isComposedOf is an ordered list of cdi:ComponentPosition wrappers. Each wrapper carries cdi:indexes (the cdi:InstanceVariable at that position, drawn from schema:variableMeasured, inline or @id-reference) and cdi:value (the integer position in the key, 0- or 1-based). Together the wrappers identify each data instance. Matches the canonical DDI-CDI PrimaryKey structure defined in ddi-cdif-data-structure.
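As a minimal sketch of the shape described above, the following Python builds a cdif:hasPrimaryKey value as a plain dictionary ready for JSON-LD serialization. The helper name make_key and the variable identifiers (#site_id, #sample_date) are hypothetical, and 0-based positions are chosen arbitrarily since either base is allowed.

```python
import json


def make_key(variable_ids, base=0):
    """Build a cdif:Key whose components reference variables by @id.

    `variable_ids` is the ordered list of InstanceVariable identifiers;
    `base` is the first position number (0 or 1).
    """
    return {
        "@type": ["cdif:Key"],
        "cdif:isComposedOf": [
            {
                "@type": ["cdi:ComponentPosition"],
                "cdi:indexes": {"@id": var_id},  # @id-reference form
                "cdi:value": position,
            }
            for position, var_id in enumerate(variable_ids, start=base)
        ],
    }


# Hypothetical composite key: one row per site per sampling date.
dataset_fragment = {
    "cdif:hasPrimaryKey": [make_key(["#site_id", "#sample_date"])]
}
print(json.dumps(dataset_fragment, indent=2))
```

The referenced identifiers are expected to match @id values of items in the dataset's schema:variableMeasured.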
cdif:statistics¶
Cardinality: Optional, Repeatable
JSON: cdi:Statistics, cdi:CategoryStatistics, or cdif:StatisticsCollection; inline or @id-reference
Description: Summary statistics describing the dataset’s values. Each entry is a cdi:Statistics bundle (one or more Statistic value objects, optionally weighted by an InstanceVariable, optionally broken down by Category), a cdi:CategoryStatistics (per-category statistics), or a cdif:StatisticsCollection (groups multiple Statistics nodes and records which InstanceVariables they index). Either inline a node here, or use an @id-reference to one declared elsewhere in the document.
class--InstanceVariable¶
A schema:variableMeasured item at the Data Description level is a CDIF profile of the DDI-CDI InstanceVariable. It composes the basic Discovery variableMeasured shape (PropertyValue-(variableMeasured)) and extends it with properties describing the variable’s data type, role, source, value domain, weighting, and summary statistics. The schema.org base properties on PropertyValue (@id, schema:name, schema:description, schema:alternateName, schema:propertyID, schema:measurementTechnique, schema:unitText, schema:unitCode, schema:minValue, schema:maxValue, schema:url) remain available unchanged; the additions below are CDIF-specific.
@type¶
Cardinality: Required, Repeatable
JSON: string.uri
Description: MUST include both schema:PropertyValue and cdi:InstanceVariable. Additional types may be included.
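The @type rule can be checked mechanically. The sketch below is an illustration, not the profile's own validator: the function name has_required_types and the sample node are hypothetical.

```python
REQUIRED_TYPES = {"schema:PropertyValue", "cdi:InstanceVariable"}


def has_required_types(node):
    """Return True when the node's @type lists both required types."""
    types = node.get("@type", [])
    if isinstance(types, str):  # JSON-LD allows a single string value
        types = [types]
    return REQUIRED_TYPES.issubset(types)


# Hypothetical variable; extra types are allowed alongside the required pair.
variable = {
    "@id": "#temperature",
    "@type": ["schema:PropertyValue", "cdi:InstanceVariable"],
    "schema:name": "temperature",
}
print(has_required_types(variable))
```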
cdif:physicalDataType¶
Cardinality: Optional, Repeatable
JSON: DefinedTerm, skos:Concept, or string
Description: Identifier or name for the data type concept describing the physical representation of values for this variable.
cdif:role¶
Cardinality: Optional
JSON: string (controlled-vocabulary entry)
Description: Specifies the role this variable plays in a data structure. Common values: UnitIdentifier (names the unit a row describes), Measure (holds observed/derived values), Attribute (qualifies an observation), Dimension (addresses a position in a multi-dimensional value space).
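A consumer will often want to partition variables by role before reshaping data. The following sketch groups variable identifiers by cdif:role; the function name group_by_role, the "unspecified" bucket for variables without a role, and the sample variables are all assumptions made for illustration.

```python
from collections import defaultdict

# The four common values listed above; the vocabulary may contain others.
COMMON_ROLES = ("UnitIdentifier", "Measure", "Attribute", "Dimension")


def group_by_role(variables):
    """Map each cdif:role value to the @ids of the variables that carry it."""
    groups = defaultdict(list)
    for var in variables:
        groups[var.get("cdif:role", "unspecified")].append(var["@id"])
    return dict(groups)


# Hypothetical variables from a long-format sensor table.
variables = [
    {"@id": "#station", "cdif:role": "UnitIdentifier"},
    {"@id": "#time", "cdif:role": "Dimension"},
    {"@id": "#temperature", "cdif:role": "Measure"},
    {"@id": "#quality_flag", "cdif:role": "Attribute"},
    {"@id": "#comment"},
]
print(group_by_role(variables))
```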
cdif:simpleUnitOfMeasure¶
Cardinality: Optional
JSON: string, DefinedTerm, or skos:Concept
Description: Simple text-based unit of measure for the values of this variable. For a controlled-vocabulary unit entry, use cdi:describedUnitOfMeasure instead.
cdif:uses¶
Cardinality: Optional, Repeatable
JSON: DefinedTerm, skos:Concept, or string
Description: Essentially the same as schema:propertyID. References to concepts that this variable measures or represents. When the dataset’s distribution carries cdi:isStructuredBy (CDIF Data Structure profile), cdif:uses connects the InstanceVariable to a reusable RepresentedVariable concept.
cdif:isDescribedBy_StatisticsCollection¶
Cardinality: Optional
Description: The StatisticsCollection holding summary / category statistics for this InstanceVariable (InstanceVariable.isDescribedBy). The property is namespaced under cdif: and target-suffixed because the DDI-CDI isDescribedBy association is polymorphic.
cdi:function¶
Cardinality: Optional, Repeatable
JSON: DefinedTerm, skos:Concept, or string
Description: Immutable characteristic of the variable such as geographic designator, weight, temporal designation, etc. (InstanceVariable.function).
cdi:platformType¶
Cardinality: Optional
JSON: DefinedTerm, skos:Concept, or string
Description: The application or technical system context in which the variable has been realized -- typically a statistical processing package or processing environment (InstanceVariable.platformType).
cdi:source¶
Cardinality: Optional
JSON: object reference or string
Description: Reference capturing provenance information for this InstanceVariable (InstanceVariable.source).
cdi:hasIntendedDataType¶
Cardinality: Optional
JSON: xsdDataType, DefinedTerm, or skos:Concept
Description: The data type intended to be used by this variable, independent of its physical representation (RepresentedVariable.hasIntendedDataType). Recommended values are XML Schema datatypes; see xsdDataType.
cdi:describedUnitOfMeasure¶
Cardinality: Optional
JSON: DefinedTerm, skos:Concept, or string
Description: The unit in which the data values are measured, expressed as a controlled-vocabulary entry (RepresentedVariable.describedUnitOfMeasure). For a plain-string unit, use cdif:simpleUnitOfMeasure instead.
cdi:takesSentinelValuesFrom¶
Cardinality: Optional, Repeatable
JSON: cdif:SentinelValueDomain inline, or object reference (@id only)
Description: Sentinel (missing / not-applicable) value domain(s) for this variable (RepresentedVariable.takesSentinelValuesFrom). The value MUST be a cdif:SentinelValueDomain node; referencing a cdif:SubstantiveValueDomain here is a schema violation. Added at the Data Description profile level; not present at the Discovery level; disallowed at the Data Structure level (where the property lives on the RepresentedVariable instead).
cdi:takesSubstantiveValuesFrom¶
Cardinality: Optional
JSON: cdif:SubstantiveValueDomain inline, or object reference (@id only)
Description: The substantive value domain for this variable -- the set of valid, meaningful values (RepresentedVariable.takesSubstantiveValuesFrom). The value MUST be a cdif:SubstantiveValueDomain node; referencing a cdif:SentinelValueDomain here is a schema violation. Added at the Data Description profile level; same profile rules as cdi:takesSentinelValuesFrom above.
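The typing rule for the two value-domain properties can be sketched as a small check. This is an illustrative helper, not the profile's schema: the names domain_type_errors and nodes_by_id are hypothetical, and the sketch reports references it cannot resolve because their type cannot be confirmed.

```python
EXPECTED_DOMAIN_TYPE = {
    "cdi:takesSentinelValuesFrom": "cdif:SentinelValueDomain",
    "cdi:takesSubstantiveValuesFrom": "cdif:SubstantiveValueDomain",
}


def as_list(value):
    """Normalise a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def domain_type_errors(variable, nodes_by_id=None):
    """List the value-domain properties whose target has the wrong type.

    `nodes_by_id` maps @id to nodes declared elsewhere in the document,
    so that @id-only references can be resolved.
    """
    nodes_by_id = nodes_by_id or {}
    errors = []
    for prop, expected in EXPECTED_DOMAIN_TYPE.items():
        for domain in as_list(variable.get(prop)):
            # Inline nodes carry @type; references are looked up by @id.
            node = domain if "@type" in domain else nodes_by_id.get(domain.get("@id"), {})
            if expected not in as_list(node.get("@type")):
                errors.append(f"{prop} must point to a {expected} node")
    return errors


# Hypothetical variable that wrongly reuses a substantive domain for sentinels.
bad_variable = {
    "@id": "#temperature",
    "cdi:takesSentinelValuesFrom": [{"@type": ["cdif:SubstantiveValueDomain"]}],
    "cdi:takesSubstantiveValuesFrom": {"@id": "#temperature-domain"},
}
declared = {"#temperature-domain": {"@type": ["cdif:SubstantiveValueDomain"]}}
print(domain_type_errors(bad_variable, declared))
```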
cdi:qualifies¶
Cardinality: Optional
JSON: object reference
Description: Reference to another InstanceVariable in this dataset that this variable qualifies (provides additional context for; e.g. a measurement-channel attribute qualifying a measure variable).
class--cdif:PhysicalMapping¶
Defines the physical realization of one field in a tabular or structured dataset distribution — the column index (for tabular), the locator (for structured/hierarchical formats like NetCDF/HDF5), the physical type, format pattern, length, null sequence, defaults, etc., and a cdif:formats_InstanceVariable reference linking the column or path back to the cdi:InstanceVariable it realises in the parent dataset’s schema:variableMeasured. Each item in a distribution’s cdif:hasPhysicalMapping array is one CdifPhysicalMapping node. When a WebAPI distribution’s schema:potentialAction/schema:result carries cdif:hasPhysicalMapping, the same shape applies to the response columns and the same @ids are referenced (a WebAPI response is another physical realization of the same conceptual variables; do not redeclare the InstanceVariables themselves on the result).
cdif:index¶
Cardinality: Optional (required for tabular text)
JSON: integer (≥ 0)
Description: Non-negative integer that orders the fields in the data structure (column number, 0-based). Required for cdi:TabularTextDataSet; for cdi:StructuredDataSet use cdi:locator instead.
cdi:locator¶
Cardinality: Optional
JSON: string
Description: Path to the field inside a structured (hierarchical) physical container, for example a NetCDF/HDF5 group path like /measurements/intensity, a JSON Pointer, or a Zarr array path. Used in place of cdif:index for cdi:StructuredDataSet distributions where column-order positioning does not apply.
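The choice between cdif:index and cdi:locator depends on the kind of distribution. The sketch below assumes that the distribution node lists cdi:TabularTextDataSet or cdi:StructuredDataSet in its @type, which is an assumption of this example rather than something stated above; the function name positioning_findings is hypothetical. A missing locator on a structured distribution is reported only as a hint because cdi:locator is optional.

```python
def positioning_findings(distribution):
    """Return (mapping position, message) pairs for positioning problems."""
    types = distribution.get("@type", [])
    if isinstance(types, str):
        types = [types]
    findings = []
    for i, mapping in enumerate(distribution.get("cdif:hasPhysicalMapping", [])):
        if "cdi:TabularTextDataSet" in types:
            index = mapping.get("cdif:index")
            # bool is a subclass of int in Python, so exclude it explicitly.
            valid = isinstance(index, int) and not isinstance(index, bool) and index >= 0
            if not valid:
                findings.append((i, "error: tabular text needs a non-negative cdif:index"))
        if "cdi:StructuredDataSet" in types and "cdi:locator" not in mapping:
            findings.append((i, "hint: structured data is usually addressed by cdi:locator"))
    return findings


# Hypothetical CSV distribution whose second mapping lacks a column index.
csv_distribution = {
    "@type": ["cdi:TabularTextDataSet"],
    "cdif:hasPhysicalMapping": [
        {"cdif:index": 0},
        {"cdi:locator": "/measurements/intensity"},
    ],
}
print(positioning_findings(csv_distribution))
```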
cdif:format¶
Cardinality: Optional
JSON: string
Description: Format pattern for the field: for numbers a token like decimal, scientific, or integer; for dates a pattern such as YYYY/MM or YYYY-MM-DDTHH:mm:ssZ; for booleans the literal token(s) used; etc.
cdi:numberPattern¶
Cardinality: Optional
JSON: string
Description: Number format pattern for the field (PhysicalMapping.numberPattern). Text-format properties (column width, decimal/digit-group separators, display label) live on the text-mapping shape below.
cdif:physicalDataType¶
Cardinality: Optional
JSON: string
Description: Name of the physical data type for the field as it appears in the file (e.g., float64, int32, string, dateTime). Distinct from cdi:hasIntendedDataType on the InstanceVariable, which is the conceptual data type.
cdif:formats_InstanceVariable¶
Cardinality: Required (Warning if absent)
JSON: object reference (@id to a schema:variableMeasured item on the parent Dataset)
Description: Links this column / path back to the cdi:InstanceVariable it physically realises. The @id MUST match the @id of an item in the parent dataset’s schema:variableMeasured. SHACL warns if missing (the link is what makes the mapping useful).
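The matching rule between a mapping and schema:variableMeasured can be sketched as follows. This mirrors the described behaviour (a warning when the link is absent, an error when it points at an undeclared variable) but is not the profile's SHACL; the function name unlinked_mappings and the sample identifiers are hypothetical.

```python
def unlinked_mappings(dataset, mappings):
    """Return (mapping position, message) pairs for missing or dangling links."""
    declared = {
        var["@id"]
        for var in dataset.get("schema:variableMeasured", [])
        if "@id" in var
    }
    problems = []
    for i, mapping in enumerate(mappings):
        ref = mapping.get("cdif:formats_InstanceVariable")
        if ref is None:
            problems.append((i, "warning: missing cdif:formats_InstanceVariable"))
        elif ref.get("@id") not in declared:
            problems.append((i, f"error: {ref.get('@id')} is not in schema:variableMeasured"))
    return problems


# Hypothetical dataset with one declared variable and three column mappings.
dataset = {"schema:variableMeasured": [{"@id": "#temperature"}]}
mappings = [
    {"cdif:index": 0, "cdif:formats_InstanceVariable": {"@id": "#temperature"}},
    {"cdif:index": 1},
    {"cdif:index": 2, "cdif:formats_InstanceVariable": {"@id": "#salinity"}},
]
for position, message in unlinked_mappings(dataset, mappings):
    print(position, message)
```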
cdi:length¶
Cardinality: Optional
JSON: integer
Description: Column width for fixed-width tabular text (text-mapping shape).
cdi:defaultDecimalSeparator¶
Cardinality: Optional
JSON: string
Description: Decimal separator used when not otherwise specified (text-mapping shape; TextMapping.defaultDecimalSeparator).
cdi:defaultDigitGroupSeparator¶
Cardinality: Optional
JSON: string
Description: Digit-group (thousands) separator (text-mapping shape; TextMapping.defaultDigitGroupSeparator).
cdif:displayLabel¶
Cardinality: Optional, Repeatable
JSON: string
Description: Human-readable label(s) for display of this field (text-mapping shape; CDIF plain-string simplification of DDI-CDI TextMapping.displayLabel).
cdi:nullSequence¶
Cardinality: Optional
JSON: string
Description: Literal token that represents a null/missing value for this field (e.g., NA, -9999, empty string). Becomes the null annotation for the described column.
cdi:defaultValue¶
Cardinality: Optional
JSON: string
Description: Default value substituted when the field is empty.
cdi:scale¶
Cardinality: Optional
JSON: integer
Description: Scale factor to apply to stored values to recover the conceptual value.
cdi:decimalPositions¶
Cardinality: Optional
JSON: integer
Description: Number of decimal positions (digits after the decimal separator) used to encode the value.
cdi:minimumLength, cdi:maximumLength¶
Cardinality: Optional
JSON: integer
Description: Bounds on the textual length of values for this field.
cdi:isRequired¶
Cardinality: Optional, default false
JSON: boolean
Description: Whether a non-null value MUST be present in each row for this field.
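To illustrate how several of these properties could work together when reading a numeric text field, here is a minimal sketch. It rests on assumptions that the property descriptions do not settle: the null sequence is tested before the default is substituted, cdi:scale is applied by multiplication, and a missing cdi:defaultDecimalSeparator means ".". The function name decode_cell is hypothetical.

```python
from decimal import Decimal


def decode_cell(raw, mapping):
    """Decode one numeric text cell; return None for a null value."""
    if raw == mapping.get("cdi:nullSequence"):
        return None
    if raw == "" and "cdi:defaultValue" in mapping:
        raw = mapping["cdi:defaultValue"]
    group = mapping.get("cdi:defaultDigitGroupSeparator")
    if group:
        raw = raw.replace(group, "")  # drop thousands separators first
    decimal_sep = mapping.get("cdi:defaultDecimalSeparator", ".")
    value = Decimal(raw.replace(decimal_sep, "."))
    # Assumption: the stored value is multiplied by the scale factor.
    return value * mapping.get("cdi:scale", 1)


# Hypothetical mapping for a European-formatted column stored in tenths.
mapping = {
    "cdi:nullSequence": "-9999",
    "cdi:defaultDecimalSeparator": ",",
    "cdi:defaultDigitGroupSeparator": ".",
    "cdi:scale": 10,
}
print(decode_cell("1.234,5", mapping))
print(decode_cell("-9999", mapping))
```

Decimal is used instead of float so that the decoded value keeps the precision of the text representation.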
class--cdif:SubstantiveValueDomain¶
The set of valid, meaningful values an InstanceVariable can take, as distinct from sentinel (missing/not-applicable) codes, which live on a sibling cdif:SentinelValueDomain. Used as the value of cdi:takesSubstantiveValuesFrom. A single SubstantiveValueDomain node provides cdif:takesValuesFrom (an enumerated list of allowed values), cdif:recommendedDataType (one or more XSD data type tokens), or both.
@type¶
Cardinality: Required
JSON: string.uri array, MUST contain cdif:SubstantiveValueDomain
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this SubstantiveValueDomain node, used when the same domain is referenced from multiple InstanceVariables.
cdif:takesValuesFrom¶
Cardinality: Optional
JSON: cdif:EnumerationDomain inline, or object reference
Description: Enumerated list of allowed substantive values. Use when the value set is a closed vocabulary; combine with cdif:recommendedDataType to additionally constrain the data type.
cdif:displayLabel¶
Cardinality: Optional
JSON: string
Description: Human-readable label for the domain (e.g., shown in UI).
cdif:recommendedDataType¶
Cardinality: Optional, Repeatable
JSON: xsdDataType
Description: One or more XSD data type tokens recommended for values from this domain. Required if cdif:takesValuesFrom is not provided; the SubstantiveValueDomain node MUST carry at least one of cdif:takesValuesFrom or cdif:recommendedDataType.
cdi:isDescribedBy¶
Cardinality: Optional
JSON: cdi:ValueAndConceptDescription inline, or object reference
Description: A cdi:ValueAndConceptDescription giving the formal description (value ranges, format/number pattern, regular expression, classification level, logical expression) of the values this domain admits.
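The "at least one of" rule is simple to test in code. In the sketch below the function name domain_is_complete is hypothetical, and the token spelling xsd:decimal is an assumption about how xsdDataType values are written.

```python
def domain_is_complete(domain):
    """True when the domain carries an enumeration, a data type, or both."""
    return any(
        domain.get(prop)
        for prop in ("cdif:takesValuesFrom", "cdif:recommendedDataType")
    )


# Hypothetical open numeric domain: typed, but not enumerated.
numeric_domain = {
    "@type": ["cdif:SubstantiveValueDomain"],
    "@id": "#temperature-domain",
    "cdif:recommendedDataType": ["xsd:decimal"],
}
# A domain with only a label does not satisfy the rule.
label_only = {
    "@type": ["cdif:SubstantiveValueDomain"],
    "cdif:displayLabel": "Temperature values",
}
print(domain_is_complete(numeric_domain), domain_is_complete(label_only))
```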
class--cdif:SentinelValueDomain¶
The set of sentinel (missing / not-applicable / refusal / etc.) codes for an InstanceVariable, distinct from the substantive values the variable takes. Used as the value of cdi:takesSentinelValuesFrom. Same shape as cdif:SubstantiveValueDomain but typed cdif:SentinelValueDomain and intended for the non-substantive value codes (so survey “Don’t know” / “Refused” codes, sensor -9999-style fill values, etc. are represented separately from valid measurements).
@type¶
Cardinality: Required
JSON: string.uri array, MUST contain cdif:SentinelValueDomain
@id¶
Cardinality: Optional
JSON: string.uri
cdif:takesValuesFrom¶
Cardinality: Optional
JSON: cdif:EnumerationDomain inline, or object reference
Description: Enumerated list of sentinel codes (e.g., a SKOS concept scheme of missing-value codes).
cdif:displayLabel¶
Cardinality: Optional
JSON: string
cdif:recommendedDataType¶
Cardinality: Optional, Repeatable
JSON: xsdDataType
Description: Same semantics as on cdif:SubstantiveValueDomain. At least one of cdif:takesValuesFrom or cdif:recommendedDataType MUST be present.
cdi:isDescribedBy¶
Cardinality: Optional
JSON: cdi:ValueAndConceptDescription inline, or object reference
Description: Same semantics as on cdif:SubstantiveValueDomain: a cdi:ValueAndConceptDescription giving the formal description of the sentinel values this domain admits.
class--cdi:ValueAndConceptDescription¶
A formal description of a set of values — value ranges, format / number patterns, regular expressions, classification level, and logical expressions. Used as the value of cdi:isDescribedBy on a cdif:SubstantiveValueDomain or cdif:SentinelValueDomain to constrain or describe the admissible values beyond (or instead of) an enumerated list.
@type¶
Cardinality: Required
JSON: string.uri array, MUST contain cdi:ValueAndConceptDescription
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this ValueAndConceptDescription node.
cdi:classificationLevel¶
Cardinality: Optional
JSON: string (one of Continuous, Interval, Nominal, Ordinal, Ratio)
Description: The measurement/relationship type of the representation: nominal, ordinal, interval, ratio, or continuous.
cdi:description¶
Cardinality: Optional
JSON: string
Description: A formal description of the set of values in human-readable language.
cdi:identifier¶
Cardinality: Optional
JSON: Identifier
Description: Identifier for objects requiring short- or long-lasting referencing and management.
cdi:formatPattern¶
Cardinality: Optional
JSON: skos:Concept
Description: A number/date format pattern as described in Unicode LDML (e.g. #,##0.### for a decimal number, or yyyy.MM.dd G 'at' HH:mm:ss zzz for a datetime).
cdi:logicalExpression¶
Cardinality: Optional
JSON: skos:Concept
Description: A logical expression whose satisfying values are the members of the valid value set (e.g. “all reals x such that x > 0”).
cdi:regularExpression¶
Cardinality: Optional
JSON: string
Description: A regular expression; strings matching it belong to the set of valid values.
cdi:minimumValueInclusive, cdi:minimumValueExclusive¶
Cardinality: Optional
JSON: string
Description: The minimum valid value, inclusive or exclusive respectively (per the W3C Tabular Data Metadata minimum / minExclusive annotations).
cdi:maximumValueInclusive, cdi:maximumValueExclusive¶
Cardinality: Optional
JSON: string
Description: The maximum valid value, inclusive or exclusive respectively (per the W3C Tabular Data Metadata maximum / maxExclusive annotations).
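The bound and pattern properties lend themselves to a direct check of a single value. The following sketch makes two assumptions: bounds are compared numerically (they are carried as strings, so they are parsed with Decimal), and the regular expression must match the whole string. The function name value_is_admitted and the sample description are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation

# Each bound property paired with the comparison a value must satisfy.
BOUND_CHECKS = [
    ("cdi:minimumValueInclusive", lambda value, bound: value >= bound),
    ("cdi:minimumValueExclusive", lambda value, bound: value > bound),
    ("cdi:maximumValueInclusive", lambda value, bound: value <= bound),
    ("cdi:maximumValueExclusive", lambda value, bound: value < bound),
]


def value_is_admitted(text, description):
    """Check one textual value against a ValueAndConceptDescription node."""
    pattern = description.get("cdi:regularExpression")
    if pattern is not None and re.fullmatch(pattern, text) is None:
        return False
    active = [(description[prop], check) for prop, check in BOUND_CHECKS if prop in description]
    if not active:
        return True
    try:
        number = Decimal(text)
    except InvalidOperation:
        return False  # bounds are present but the value is not numeric
    return all(check(number, Decimal(bound)) for bound, check in active)


# Hypothetical description of a percentage in the half-open range [0, 100).
percentage = {
    "@type": ["cdi:ValueAndConceptDescription"],
    "cdi:regularExpression": r"\d+(\.\d+)?",
    "cdi:minimumValueInclusive": "0",
    "cdi:maximumValueExclusive": "100",
}
print(value_is_admitted("99.5", percentage))
print(value_is_admitted("100", percentage))
```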
class--cdif:EnumerationDomain¶
A codification vocabulary documented as an enumerated value domain — typically a SKOS ConceptScheme listing the allowed values for a cdif:SubstantiveValueDomain or cdif:SentinelValueDomain. Provides a named extension point so that an EnumerationDomain can either declare an external concept scheme via cdif:references or be defined inline.
@type¶
Cardinality: Required
JSON: string.uri array, MUST contain cdif:EnumerationDomain
@id¶
Cardinality: Optional
JSON: string.uri
cdif:identifier¶
Cardinality: Optional
JSON: Identifier
Description: Identifier for this enumerated (categorical) domain.
schema:name¶
Cardinality: Optional
JSON: string
Description: Human-understandable name (linguistic signifier, word, phrase, or mnemonic) for the domain.
cdif:references¶
Cardinality: Optional
JSON: SKOS ConceptScheme inline, or object reference
Description: SKOS concept scheme that contains the concepts defining the allowed values of this enumeration domain. Reference an external published vocabulary, or inline one. See skos:Concept for individual concept entries.
cdif:purpose¶
Cardinality: Optional
JSON: string
Description: Intent or reason for the enumeration domain (or for the description of the object).
class--cdif:Key¶
The CDIF profile of DDI-CDI PrimaryKey: an ordered set of cdi:InstanceVariable references that uniquely identify a data instance. Used as the value of cdif:hasPrimaryKey on the root Dataset. Each variable’s position in the key is recorded with an explicit cdi:ComponentPosition wrapper carrying cdi:indexes (the variable) and cdi:value (the integer position), matching the canonical DDI-CDI PrimaryKey structure defined in ddi-cdif-data-structure.
@type¶
Cardinality: Required -- cdif:Key, Repeatable
JSON: string.uri
Description: MUST include cdif:Key.
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this Key node.
cdif:isComposedOf¶
Cardinality: Required, Repeatable
JSON: Array of cdi:ComponentPosition wrappers
Description: Ordered list of cdi:ComponentPosition wrappers, one per key component. Each wrapper holds cdi:indexes (the cdi:InstanceVariable at that position -- inline cdifInstanceVariable or @id-reference) and cdi:value (the integer position, 0- or 1-based).
class--cdi:ComponentPosition¶
Indexes a single component within a cdif:Key (or other ordered DDI-CDI component structure). Used as the items of cdif:isComposedOf on a cdif:Key: each wrapper pairs an InstanceVariable with its position number in the key.
@type¶
Cardinality: Required -- ‘cdi:ComponentPosition’, Repeatable
JSON: string.uri
Description: MUST include cdi:ComponentPosition.
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this ComponentPosition node.
cdi:indexes¶
Cardinality: Required
JSON: CdifInstanceVariable or object reference
Description: Reference to the cdi:InstanceVariable at this position. Either an inline cdifInstanceVariable node or an @id-reference to one declared elsewhere (typically in schema:variableMeasured).
cdi:value¶
Cardinality: Required
JSON: integer
Description: Integer position of this component in the key, incrementing from 0 or 1.
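A consumer reading a key needs the variables in position order. The sketch below sorts the wrappers by cdi:value and rejects keys whose positions do not run consecutively from 0 or 1; the consecutiveness check is this example's reading of "incrementing from 0 or 1", and the function name ordered_key_variables is hypothetical.

```python
def ordered_key_variables(key):
    """Return the @id of each key variable, sorted by cdi:value.

    Raises ValueError when positions do not start at 0 or 1 or are not
    unique and consecutive. Inline variables without @id yield None.
    """
    components = sorted(key["cdif:isComposedOf"], key=lambda c: c["cdi:value"])
    positions = [c["cdi:value"] for c in components]
    if not positions or positions[0] not in (0, 1):
        raise ValueError("positions must start at 0 or 1")
    if positions != list(range(positions[0], positions[0] + len(positions))):
        raise ValueError("positions must be unique and consecutive")
    return [c["cdi:indexes"].get("@id") for c in components]


# Hypothetical 1-based key whose wrappers are listed out of order.
key = {
    "@type": ["cdif:Key"],
    "cdif:isComposedOf": [
        {"@type": ["cdi:ComponentPosition"], "cdi:indexes": {"@id": "#sample_date"}, "cdi:value": 2},
        {"@type": ["cdi:ComponentPosition"], "cdi:indexes": {"@id": "#site_id"}, "cdi:value": 1},
    ],
}
print(ordered_key_variables(key))
```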
class--cdif:StatisticsCollection¶
Groups one or more cdi:Statistics nodes. A typical use is a dataset-level collection holding row-count / mean / stddev Statistics for each measured variable. Referenced from a CdifInstanceVariable via cdif:isDescribedBy_StatisticsCollection, or from the root Dataset via cdif:statistics.
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this StatisticsCollection node.
@type¶
Cardinality: Required -- ‘cdif:StatisticsCollection’, Repeatable
JSON: string.uri
Description: MUST include cdif:StatisticsCollection.
cdif:has_Statistics¶
Cardinality: Required, Repeatable
JSON: cdi:Statistics or object reference
Description: Statistics nodes carried by this collection (inline or @id-ref). The property is namespaced under cdif: and target-suffixed because the DDI-CDI cdi:has association is polymorphic.
cdi:hasWeight¶
Cardinality: Optional
JSON: CdifInstanceVariable or object reference
Description: The InstanceVariable whose values were used as weights when computing the statistics in this collection.
cdif:indexedBy¶
Cardinality: Optional, Repeatable
JSON: CdifInstanceVariable or object reference
Description: CDIF addition (not in canonical DDI-CDI): the InstanceVariable(s) the contained Statistics index -- the collection-level coordinate space.
class--cdi:Statistics¶
A named bundle of one or more Statistic value objects for an instance variable, optionally weighted, optionally broken down by Category.
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this Statistics node.
@type¶
Cardinality: Required -- ‘cdi:Statistics’, Repeatable
JSON: string.uri
Description: MUST include cdi:Statistics.
cdi:statistic¶
Cardinality: Required, Repeatable
JSON: Array of Statistic value objects
Description: Ordered list of Statistic value objects carried by this bundle. Order is significant -- consumers MAY rely on array position.
cdi:typeOfStatistic¶
Cardinality: Optional
JSON: DefinedTerm, skos:Concept, or string
Description: Controlled-vocabulary entry naming the kind of statistic -- e.g. mean, median, count, sum, stdDev.
cdi:hasWeight¶
Cardinality: Optional
JSON: CdifInstanceVariable or object reference
Description: The InstanceVariable whose values were used as weights when computing the Statistic entries.
cdif:appliesTo¶
Cardinality: Optional, Repeatable
JSON: CdifInstanceVariable or object reference
Description: CDIF addition (not in canonical DDI-CDI): the InstanceVariable(s) this Statistics bundle summarizes -- the per-bundle “what these numbers describe” link.
cdif:has_CategoryStatistics¶
Cardinality: Optional, Repeatable
JSON: cdi:CategoryStatistics
Description: CategoryStatistics entries breaking this Statistics bundle down by Category. The property is namespaced under cdif: and target-suffixed because the DDI-CDI cdi:has association is polymorphic.
class--cdi:CategoryStatistics¶
Statistics for a specific Category of an instance variable within a dataset.
@id¶
Cardinality: Optional
JSON: string.uri
Description: Identifier for this CategoryStatistics node.
@type¶
Cardinality: Required -- ‘cdi:CategoryStatistics’, Repeatable
JSON: string.uri
Description: MUST include cdi:CategoryStatistics.
cdi:for¶
Cardinality: Required
JSON: cdi:Category node (a concept-like node typed cdi:Category, carrying cdif:name / cdif:definition / cdif:displayLabel / cdif:descriptiveText), or object reference
Description: The Category this CategoryStatistics is for (inline cdi:Category node or an @id-reference).
cdi:statistic¶
Cardinality: Required, Repeatable
JSON: Array of Statistic value objects
Description: Per-category Statistic value objects.
cdi:typeOfStatistic¶
Cardinality: Optional
JSON: DefinedTerm, skos:Concept, or string
Description: Controlled-vocabulary entry naming the kind of statistic.
cdi:hasWeight¶
Cardinality: Optional
JSON: CdifInstanceVariable or object reference
Description: The InstanceVariable whose values were used as weights.
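The required properties of the three statistics classes can be summarised in a small lookup. The sketch below only checks that required properties are present and does not model the contents of Statistic value objects, whose structure is not described on this page; the function name missing_required and the sample nodes are hypothetical.

```python
REQUIRED_PROPERTIES = {
    "cdif:StatisticsCollection": ("cdif:has_Statistics",),
    "cdi:Statistics": ("cdi:statistic",),
    "cdi:CategoryStatistics": ("cdi:for", "cdi:statistic"),
}


def missing_required(node):
    """List the required properties that are absent or empty on a node."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    missing = []
    for node_type in types:
        for prop in REQUIRED_PROPERTIES.get(node_type, ()):
            if not node.get(prop):
                missing.append(prop)
    return missing


# Hypothetical collection that refers to a Statistics node by @id.
collection = {
    "@type": ["cdif:StatisticsCollection"],
    "cdif:has_Statistics": [{"@id": "#stats-temperature"}],
    "cdif:indexedBy": [{"@id": "#temperature"}],
}
# Hypothetical bundle that forgot its cdi:statistic array.
bundle = {
    "@type": ["cdi:Statistics"],
    "cdif:appliesTo": [{"@id": "#temperature"}],
}
print(missing_required(collection), missing_required(bundle))
```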
Notes¶
Shared encoding patterns such as object reference, DefinedTerm, and PropertyValue are defined on the Common data types page.