Data Catalog Vocabulary (DCAT) vocabulary

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. DCAT enables a publisher to describe datasets and data services in a catalog using a standard model and vocabulary that facilitates the consumption and aggregation of metadata from multiple catalogs. The current DCAT version 3 has been developed by the Dataset eXchange Working Group (DXWG) of the World Wide Web Consortium (W3C). DCAT has become very common in some areas: DCAT-AP and GeoDCAT have gained significant traction in Europe. DCAT-US promises to gain similar traction.

The scope of Schema.org is much broader than that of DCAT, and includes any business activity on the Web. DCAT is designed for describing data and thus has some features that are specifically useful for CDIF’s purposes. Several different mappings between Schema.org and DCAT are available, and some may be more appropriate than others for a particular implementation. CDIF recommends the mapping from the W3C DXWG. The mapping in the context of DCAT version 2 is available at https://ec-jrc.github.io/dcat-ap-to-schema-org/ or https://www.w3.org/TR/vocab-dcat-2/#dcat-sdo.

https://w3c.github.io/dxwg/dcat/rdf/dcat-schema.ttl is a mapping between DCAT 2 and Schema.org (SDO) version 3.4, axiomatized using the predicates rdfs:subClassOf, rdfs:subPropertyOf, owl:equivalentClass, owl:equivalentProperty, and skos:closeMatch, and using the annotation properties sdo:domainIncludes and sdo:rangeIncludes to match Schema.org semantics.

An updated mapping is provided in the DCAT version 3 documentation.

DCAT metadata is typically serialized in the Turtle format, so the implementation guidelines here are based on that format.

Implementation of metadata content items

The following table maps the metadata content items described in the Metadata Content Requirements section to the RDF terms defined by the DCAT specification for use in metadata serialization. The ‘Obl.’ column specifies the cardinality obligation for the property: ‘1’ means one value is required; ‘0..1’ means the property is optional and at most one value can be provided; ‘1..*’ means at least one value is required; ‘0..*’ means the property is optional and more than one value can be provided.
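The obligation codes can be checked mechanically once a validator knows how many values a property has. The Python sketch below is illustrative only: it assumes that a bare ‘1’ means exactly one value, and the function names are hypothetical rather than part of any CDIF tooling.

```python
import re
from typing import Optional, Tuple


def parse_obligation(obl: str) -> Tuple[int, Optional[int]]:
    """Parse an 'Obl.' value such as '1', '0..1', '1..*' or '0..*'.

    Returns (minimum, maximum); maximum is None when the upper bound is '*'.
    """
    match = re.fullmatch(r"(\d+)(?:\.\.(\d+|\*))?", obl.strip())
    if match is None:
        raise ValueError(f"unrecognised obligation: {obl!r}")
    low = int(match.group(1))
    high = match.group(2)
    if high is None:      # a bare number such as '1' is read as exactly that many
        return low, low
    if high == "*":       # unbounded upper limit
        return low, None
    return low, int(high)


def satisfies(obl: str, count: int) -> bool:
    """True if a property with `count` values meets the obligation `obl`."""
    low, high = parse_obligation(obl)
    return count >= low and (high is None or count <= high)
```

For example, `satisfies("1..*", 0)` is false, which would flag a record with no rights property at all.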

CDIF content item

Obl.

DCAT v3 implementation

Scope note

Metadata identifier

1

ex:record-001
a dcat:CatalogRecord ;
foaf:primaryTopic ex:dataset-001 ;

ex:record-001 in the implementation has type ‘dcat:CatalogRecord’ and is a graph node that contains information about the metadata record for the resource indicated by foaf:primaryTopic. The use of dcat:CatalogRecord is considered optional; it is used to capture provenance information about entries in a catalog explicitly.

Resource identifier

1

ex:dataset1URI a dcat:Resource;
dcterms:identifier "string literal";

The dcterms:identifier of the dcat:Resource (or a subclass of dcat:Resource) is the identifier for the thing in the world that is the subject of the DCAT record. The DCAT record is a representation of that thing as a digital object, and the subject of the ‘<uri> a dcat:Resource’ triple typically identifies the same thing, but that URI might dereference to a different representation of the thing in the world. For example, OPEN DATA Wallonia-Brussels uses dcterms:identifier with a string value for the URI that accesses a web page displaying the data. The subject of the ‘<uri> a dcat:Resource’ triple is like the JSON-LD @id value: it identifies a graph node that can be interpreted either as the thing in the world the graph node is about, or as the JSON-LD (or other RDF serialization) object that is the web representation of that thing.

Title

1

a dcat:Resource;
dcterms:title "string literal"@lan;

A set of words that should uniquely identify the described resource for human use, in the scope of the metadata catalog containing this metadata record. Titles should be language localized with @lan tags; only one distinct title per @lan tag is allowed.
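The one-title-per-language-tag rule can be checked as in this Python sketch. It assumes the titles have already been extracted from the graph as (text, language tag) pairs, which is a hypothetical input shape, and it compares tags case-insensitively because BCP 47 language tags are case-insensitive.

```python
from collections import defaultdict
from typing import List, Tuple


def title_problems(titles: List[Tuple[str, str]]) -> List[str]:
    """Check (title text, language tag) pairs against the Title rules."""
    if not titles:
        return ["a title is required"]
    distinct = defaultdict(set)
    for text, lang in titles:
        distinct[lang.lower()].add(text.strip())   # tags compare case-insensitively
    # report every language tag that carries more than one distinct title
    return [f"{len(texts)} distinct titles for @{lang}"
            for lang, texts in sorted(distinct.items()) if len(texts) > 1]
```

Repeating the identical title for the same tag is not reported, since only distinct titles are counted.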

Distribution

1

a dcat:Resource;
dcat:landingPage <landing page URL>;

URL for a web location that provides information about the resource, generally expected to include information about how to get the resource. In general, a resource should have only one landing page.

a dcat:Dataset;
dcat:distribution [
a dcat:Distribution;
dcat:downloadURL <download URL> ];

URL for a web location that returns a representation of the described resource. dcat:downloadURL is required. The Distribution object SHOULD include dcat:mediaType to specify the media type of the actual resource representation; if the distribution content is compressed or packaged, dcat:compressFormat or dcat:packageFormat SHOULD be specified as well. dcterms:conformsTo can be used to identify a specification that defines the full syntax and semantics of the resource content. Other properties in the dcat:Distribution object can be used to provide information about content size, rights, policies, etc. for the particular representation of the resource.

a dcat:Dataset;
dcat:distribution [
a dcat:Distribution;
dcat:accessURL <access URL>];

URL for a web location that provides information about how to get the resource. The accessURL might be the same as the landing page, but different distributions might offer different web applications (with different accessURLs) for different access methods or communities. The target of the URL MUST be a web page that can be displayed in standard web browsers.
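A Distribution node carrying a download URL and media type can be assembled as in the Python sketch below. It writes URLs as IRIs, since DCAT gives dcat:downloadURL the range rdfs:Resource, and it assumes that media types are identified by IANA media-type IRIs; the function name and example URLs are hypothetical.

```python
from typing import Optional

# Assumed base for media-type IRIs
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def distribution_turtle(download_url: str, media_type: str,
                        compress_format: Optional[str] = None) -> str:
    """Turtle fragment for one dcat:Distribution with a download URL."""
    if not download_url:
        raise ValueError("dcat:downloadURL is required")
    lines = [
        "dcat:distribution [",
        "    a dcat:Distribution ;",
        f"    dcat:downloadURL <{download_url}> ;",      # an IRI, not a string literal
        f"    dcat:mediaType <{IANA_MEDIA_TYPES}{media_type}> ;",
    ]
    if compress_format:  # only when the file is compressed, e.g. 'application/gzip'
        lines.append(f"    dcat:compressFormat <{IANA_MEDIA_TYPES}{compress_format}> ;")
    lines.append("] ;")  # a trailing ';' before ']' is permitted in Turtle
    return "\n".join(lines)


print(distribution_turtle("https://example.org/data.csv.gz", "text/csv", "application/gzip"))
```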

Rights

1..*

DCAT uses several properties to convey information about access or usage limitations: dcterms:accessRights, dcterms:license, and dcterms:rights, all of which apply to any dcat:Resource or dcat:Distribution. ‘dcterms:license’ is used to associate a resource with a statement that is explicitly declared as a ‘license’. ‘dcterms:accessRights’ is used when the resource is associated with a statement denoting only access rights. ‘dcterms:rights’ is used for other cases, associating statements that do not concern licensing conditions or access rights (e.g., copyright statements). CDIF requires that at least one of these rights properties be populated, either at the dcat:Resource level, applying to all distributions, or with a rights property on each dcat:Distribution.

a dcat:Resource OR dcat:Distribution;
dcterms:accessRights [
   rdf:type dcterms:RightsStatement ;
   rdfs:comment "literal rights statement"@en];
   OR <Rights statement URI>;

A statement or link to a statement associated with a resource or a specific resource distribution, denoting access rights, applicable to any distribution of the resource. Access rights can also be expressed as code lists / taxonomies. Examples include the access rights code list EUV-AR used in DCAT-AP and the Eprints Access Rights Vocabulary Encoding Scheme.

a dcat:Resource OR dcat:Distribution;
dcterms:license [
   rdf:type dcterms:LicenseDocument ;
   rdfs:comment "literal license statement or identifier"@en];
   OR <License URI>;

A link to a statement associated with a resource that is explicitly declared as a ‘license’. Can be applied at the dcat:Resource or dcat:Distribution level. For interoperability, it is recommended to use canonical IRIs of well-known licenses such as those defined by Creative Commons.

a dcat:Resource OR dcat:Distribution;
dcterms:rights [
   rdf:type dcterms:RightsStatement ;
   rdfs:comment "literal rights statement"@en];
   OR <Rights statement URI>;

A link to a statement associated with a resource for other types of rights statements, i.e. those that are not covered by dcterms:license and dcterms:accessRights, such as copyright statements. NOTE: the odrl:hasPolicy property is available at the dcat:Resource or dcat:Distribution level if a formal ODRL rules statement is available (see example).
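The CDIF rights rule (at least one rights property on the resource, or a rights property on every distribution) can be checked as in the Python sketch below. Resources and distributions are modelled as plain dictionaries keyed by property name; that representation and the function names are assumptions made only for illustration.

```python
from typing import Dict, List

RIGHTS_PROPERTIES = ("dcterms:accessRights", "dcterms:license", "dcterms:rights")


def has_rights(node: Dict[str, object]) -> bool:
    """True if any of the three DCAT rights properties has a non-empty value."""
    return any(node.get(prop) for prop in RIGHTS_PROPERTIES)


def rights_ok(resource: Dict[str, object],
              distributions: List[Dict[str, object]]) -> bool:
    """Rights on the resource cover all distributions; otherwise every distribution needs one."""
    if has_rights(resource):
        return True
    return bool(distributions) and all(has_rights(d) for d in distributions)
```

A resource with no rights of its own and no distributions fails the check, because there is nowhere for a rights statement to apply.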

Metadata profile identifier

1

a dcat:CatalogRecord ;
foaf:primaryTopic ex:dataset1URI ;
dcterms:conformsTo <specification uri>;

The conformsTo property on the CatalogRecord specifies the metadata profile followed by the actual metadata record identified by foaf:primaryTopic. The dcat:CatalogRecord object is a separate node from the dcat metadata node whose subject is the described resource.

Metadata date

0..1

a dcat:CatalogRecord ;
foaf:primaryTopic ex:dataset1URI ;
dcterms:modified "date string"^^xsd:dateTime;

Use ISO 8601 format. The most recent update date for the metadata content is specified in the dcterms:modified property on the CatalogRecord linked to the metadata record identified by foaf:primaryTopic. Harvesters use this to determine whether they have already harvested and processed this record.
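A harvester's use of the CatalogRecord modification date might look like the Python sketch below. It assumes the harvester keeps, for each record, the dcterms:modified value (or harvest time) it last processed; that bookkeeping and the function names are hypothetical, and timestamps without a UTC offset are treated as UTC by assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 date or dateTime; values without an offset are taken as UTC."""
    if value.endswith("Z"):            # Python versions before 3.11 reject the 'Z' suffix
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def needs_harvest(record_modified: str, last_harvested: Optional[str]) -> bool:
    """Re-harvest when the CatalogRecord's dcterms:modified is newer than the last harvest."""
    if last_harvested is None:         # record not seen before
        return True
    return parse_iso8601(record_modified) > parse_iso8601(last_harvested)
```

Comparing timezone-aware values means that the same instant written with different offsets is not mistaken for an update.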

Metadata contact

0..1

a dcat:CatalogRecord ;
foaf:primaryTopic ex:dataset1URI ;
dcat:contactPoint [
   a vcard:Kind;
   vcard:hasEmail <mailto:name@email.org> ;
   vcard:fn "Full Name";
   rdfs:label "Full Name"];

DCAT does not define a contact property for dcat:CatalogRecord. CDIF adds a dcat:contactPoint property on the CatalogRecord, which is permissible under the open-world assumption of RDF. The vcard:Kind must be an individual or an organization. Including rdfs:label is optional, but recommended for interoperability.

Resource type

1..*

rdf:type {DCAT class};
dcterms:type <uri for resource type>;

The nature or genre of the resource. The rdf:type must be a DCAT class; currently dcat:Resource, dcat:Dataset, dcat:DatasetSeries, and dcat:Catalog are applicable for CDIF. A more specific type can be assigned using dcterms:type; the value SHOULD be taken from a well-governed and broadly recognised controlled vocabulary. Use of Schema.org types will promote interoperability. Multiple types can be specified.

Description

0..1

a dcat:Resource;
dcterms:description "free text description of resource";

Free text, with as much detail as is feasible.

Originators

0..*

a dcat:Resource;
dcterms:creator [
   a foaf:Agent;
   foaf:name "name of agent";
   foaf:mbox <mailto:email@email.org>;
   sdo:identifier <agent URI> ]

The value is a foaf:Agent, or a foaf:Person or foaf:Organization (subclasses of foaf:Agent). foaf:name is required. The FOAF specification provides properties for a name and an e-mail address. DCAT examples use adms:identifier, but since CDIF uses Schema.org, the schema.org identifier property is recommended. The value should be a PID; if possible, use an ORCID to identify a person or a ROR identifier to identify an organization.
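Because the Originators item recommends ORCID identifiers, a metadata editor might sanity-check them before writing sdo:identifier. ORCID iDs end in an ISO 7064 MOD 11-2 check character, which the Python sketch below verifies. The function names are hypothetical, and a correct check character only shows that an iD is well formed, not that it is registered.

```python
import re

# Accepts a bare iD or the https://orcid.org/ form, e.g. 0000-0002-1825-0097
ORCID_PATTERN = re.compile(r"(?:https://orcid\.org/)?(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(value: str) -> bool:
    """True if `value` has ORCID syntax and a correct check character."""
    match = ORCID_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    digits = "".join(match.groups())          # 16 characters; the last may be 'X'
    return orcid_check_character(digits[:15]) == digits[15]
```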

Publication Date

0..1

a dcat:Resource;
dcterms:issued "date string"^^xsd:dateTime;

Date on which the resource was made publicly accessible. Use ISO 8601 format.

Modification Date

1

a dcat:Resource;
dcterms:modified "date string"^^xsd:dateTime;

Date of the most recent update to the resource content. If no Publication Date is provided, it defaults to the Modification Date. Use ISO 8601 format.
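The defaulting rule between Publication Date and Modification Date can be expressed as a small function. This Python sketch represents a resource as a plain dictionary keyed by property name; that representation and the function name are assumptions for illustration only.

```python
from typing import Dict


def effective_dates(resource: Dict[str, str]) -> Dict[str, str]:
    """Apply the default: a missing dcterms:issued takes the dcterms:modified value."""
    modified = resource.get("dcterms:modified")
    if not modified:
        raise ValueError("dcterms:modified is required (Obl. 1)")
    issued = resource.get("dcterms:issued") or modified
    return {"dcterms:issued": issued, "dcterms:modified": modified}
```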

GeographicExtent (named place)

0..*

a dcat:Dataset;
dcterms:spatial "string literal place name"
OR
a dcat:Dataset;
dcterms:spatial <location URI>

To specify location with place name as a string or a URI (or IRI) from a gazetteer.

GeographicExtent (bounding box)

0..1

a dcat:Dataset;
dcterms:spatial [
a dcterms:Location ;
dcat:bbox """POLYGON((
103.0 47.9 , 107.2 47.9 ,
107.2 53.5 , 103.0 53.5 ,
103.0 47.9 ))"""^^geosparql:wktLiteral ;
] ;

The range of dcat:bbox is rdfs:Literal. CDIF requires that the box geometry be encoded as a WKT literal (geosparql:wktLiteral) using WGS84. Coordinate pairs are {longitude latitude} decimal number pairs, with a space between the coordinates; commas separate coordinate pairs. The first and last coordinate pairs must be the same to close the box. Coordinates are listed in counterclockwise order around the box perimeter. CDIF recommends including only one bounding box; the behavior of harvesting clients when multiple geometries are specified is unpredictable. See ESIP SOSO for more details.
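The bounding-box rules above (longitude-latitude pairs, counterclockwise order, closed ring) can be applied when generating the literal. This Python sketch takes hypothetical west, south, east, and north arguments in decimal degrees; it does not handle boxes that cross the antimeridian and rejects them instead.

```python
def bbox_wkt(west: float, south: float, east: float, north: float) -> str:
    """WKT POLYGON for a WGS84 box: (lon lat) pairs, counterclockwise, closed."""
    if not (-180 <= west < east <= 180):
        raise ValueError("need -180 <= west < east <= 180 (antimeridian boxes not handled)")
    if not (-90 <= south < north <= 90):
        raise ValueError("need -90 <= south < north <= 90")
    # SW -> SE -> NE -> NW -> back to SW: counterclockwise, first pair == last pair
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = " , ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def bbox_turtle(west: float, south: float, east: float, north: float) -> str:
    """dcterms:spatial fragment carrying the box as a geosparql:wktLiteral."""
    return ("dcterms:spatial [\n"
            "    a dcterms:Location ;\n"
            f'    dcat:bbox "{bbox_wkt(west, south, east, north)}"^^geosparql:wktLiteral ;\n'
            "] ;")


print(bbox_turtle(103.0, 47.9, 107.2, 53.5))
```

Generating the closing pair from the same first corner avoids the kind of mismatch that leaves a ring unclosed.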

GeographicExtent (point location)

0..1

a dcat:Dataset;
dcterms:spatial [
a dcterms:Location ;
locn:geometry "POINT (103.05 47.9)"^^geosparql:wktLiteral ;
];

For a point location specification of the spatial extent of resource content. Note the use of locn:geometry from the Location Vocabulary. Including only one point is recommended; the behavior of harvesting clients when multiple geometries are specified is unpredictable. Note that DCAT reports point locations only as centroids; CDIF is more general: point locations might be centroids or any point within or near (in the case of intentionally spoofed locations) the resource location.

GeographicExtent (other serialization)

0..*

a dcat:Dataset;
dcterms:spatial [
a dcterms:Location ;
locn:geometry {see Location Vocabulary} ];

Optional geographic extent using other serialization for location. Other geometry schemes might be specified in a specific domain profile, e.g. for atmospheric, subsurface data, or local coordinate systems. These will likely not be interoperable across domains.

Variable measured

0..*

a dcat:Dataset;
sdo:variableMeasured [
   a sdo:PropertyValue;
   sdo:propertyID <pato:PATO_0000025>, <astm:prop/0405>;
  sdo:name "hostMineral";
  sdo:description "…."; …. ];

Follow the ESIPfed Science on Schema.org recommendation; see also the discussion of representing more complex data structures in ESIPfed Experimental and the Data Integration module of CDIF. A variable must have a name and description, and should have a propertyID with a URI for the represented concept. The URI in the propertyID provides the semantic linkage for the meaning of the variable. DCAT does not have properties to specify variables/properties quantified in a cataloged resource.

Keyword

0..*

a dcat:Resource;
dcat:keyword "string literal";

Implement as free-text tags: words useful for indexing the resource.

a dcat:Resource;
dcat:theme <uri>
OR
a dcat:Resource;
dcat:theme <concept URI> .
<concept URI> a skos:Concept;
skos:prefLabel "term"@languageCode .

A main category of the resource. A resource can have multiple themes. The expectation is that the set of themes used to categorize resources is organized in a structured vocabulary describing all the categories and their relations in the catalog, e.g. skos:ConceptScheme, skos:Collection, owl:Ontology. Note that dcat:theme in the DCAT OWL is an object property; the type of the object is not specified. In the example, the theme object is typed ‘skos:Concept’, but it could be of another type.

Temporal coverage

0..1

a dcat:Dataset;
dcterms:temporal
[ a dcterms:PeriodOfTime ;
dcat:startDate
  "2016-03-04"^^xsd:date ;
dcat:endDate
  "2018-08-05"^^xsd:date ;
];

Calendar date or clock time interval. Values are rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string DATETIME and typed using the appropriate XML Schema datatype XMLSCHEMA11-2, i.e. xsd:gYear, xsd:gYearMonth, xsd:date, or xsd:dateTime. The range of dcterms:temporal is expected to be dcterms:PeriodOfTime; to specify a time instant, the start and end should be the same. [tbd: add note on other temporal options offered by DCAT]

a dcat:Dataset;
dcterms:temporal [
a dcterms:PeriodOfTime ,
   time:ProperInterval ;
time:intervalStartedBy
   <isc:LowerDevonian>;
time:intervalFinishedBy
  <isc:LowerPermian>];

Time ordinal era interval; use the OWL-Time namespace, time: http://www.w3.org/2006/time#. This example uses the International Chronostratigraphic Chart (ISC). See https://perio.do/en/ for identifiers for many other named time intervals.

a dcat:Dataset;
dcterms:temporal [
a dcterms:PeriodOfTime ,
     time:ProperInterval ;
time:hasBeginning [ a time:Instant ;
time:inTimePosition [ a time:TimePosition ;
time:hasTRS <gsmla:ma> ;
time:numericPosition "541.0"^^xsd:decimal
] ] ;
time:hasEnd [ a time:Instant ;
time:inTimePosition [ a time:TimePosition ;
time:hasTRS <gsmla:ma> ;
time:numericPosition "251.9"^^xsd:decimal
] ] ];

Temporal coverage for a geologic dataset, with interval bounds specified with numericPositions in millions of years before present. Namespace abbreviation: gsmla: http://resource.geosciml.org/classifier/cgi/geologicage/
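For the calendar-date form of Temporal coverage, the XML Schema datatype follows from the shape of the ISO 8601 string. The Python sketch below picks xsd:gYear, xsd:gYearMonth, xsd:date, or xsd:dateTime using simplified regular expressions (an assumption: they check the overall shape, not every lexical rule such as month ranges) and writes an instant as a PeriodOfTime whose start and end are equal. The function names are hypothetical.

```python
import re
from typing import Optional

# Simplified lexical patterns, most specific first
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
_DATATYPES = [
    ("xsd:dateTime", r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ),
    ("xsd:date", r"-?\d{4,}-\d{2}-\d{2}" + _TZ),
    ("xsd:gYearMonth", r"-?\d{4,}-\d{2}" + _TZ),
    ("xsd:gYear", r"-?\d{4,}" + _TZ),
]


def typed_literal(value: str) -> str:
    """Return a Turtle literal typed with the matching XML Schema datatype."""
    for datatype, pattern in _DATATYPES:
        if re.fullmatch(pattern, value):
            return f'"{value}"^^{datatype}'
    raise ValueError(f"not an ISO 8601 date or time: {value!r}")


def period_of_time(start: str, end: Optional[str] = None) -> str:
    """dcterms:temporal PeriodOfTime; an instant is written with start == end."""
    if end is None:
        end = start
    return ("dcterms:temporal [ a dcterms:PeriodOfTime ;\n"
            f"    dcat:startDate {typed_literal(start)} ;\n"
            f"    dcat:endDate {typed_literal(end)} ;\n"
            "] ;")


print(period_of_time("2016-03-04", "2018-08-05"))
```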

Related agent - point of contact

a dcat:Resource;
  dcat:contactPoint [
    a vcard:Kind ;
   vcard:hasEmail <mailto:email@email.org> ;
   vcard:fn "Full Name" ] ;

DCAT defines a property on any dcat:Resource for the ‘point of contact’ role. Use Individual or Organization subclass of vcard:Kind.

Related agent - publisher

a dcat:Resource;
  dcat:publisher [
    a foaf:Agent ;
   foaf:mbox <mailto:email@email.org> ;
   foaf:name "Full Name" ] ;

DCAT defines a property on any dcat:Resource for the ‘publisher’ role. Use Person or Organization subclass of foaf:Agent.

Related agent with role

a dcat:Resource, prov:Entity;
  prov:qualifiedAttribution [
    a prov:Attribution ;
    prov:agent <agent URI> ;
    dcat:hadRole <role URI>
];

To assign roles to contributors such as editor, maintainer, compiler, rightsOwner, etc. Note that PROV-O roles relate to activities, not entities. Therefore, DCAT defines a new property, dcat:hadRole, to attach a role to the association class prov:Attribution between an entity and an agent. MARC relators provide many relationships between resources and agents.

Related agent - distributor

0..*

a dcat:Resource, prov:Entity;
prov:qualifiedAttribution [
   a prov:Attribution ;
   prov:agent <agent URI> ;
   dcat:hadRole <distributor role URI>
];

To assign an agent to a distributor role. Note that PROV-O roles relate to activities, not entities. Therefore, DCAT defines a new property, dcat:hadRole, to attach a role to the association class prov:Attribution between an entity and an agent. Note that in DCAT, prov:qualifiedAttribution can only be used with dcat:Resource or a subclass of dcat:Resource, not with individual distributions. The CDIF recommendation for the role URI is ‘http://id.loc.gov/vocabulary/relators/dst’.

a dcat:Dataset;
 dcat:distribution [
   a dcat:Distribution;
   sdo:provider [
    a sdo:Person OR sdo:Organization]
];

If there are multiple distributions with different providers, each distribution can have a separate provider. dcat:Distribution does not have a point-of-contact property (dcat:contactPoint) or prov:qualifiedAttribution.
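The qualified-attribution pattern shared by the Related agent with role, distributor, and Funding items can be generated from one template. This Python sketch defaults to the CDIF-recommended distributor role URI; the agent URI in the usage line is a hypothetical placeholder, and the output is a Turtle fragment meant to sit inside the description of a node typed dcat:Resource, prov:Entity.

```python
DISTRIBUTOR_ROLE = "http://id.loc.gov/vocabulary/relators/dst"


def qualified_attribution(agent_uri: str, role_uri: str = DISTRIBUTOR_ROLE) -> str:
    """prov:qualifiedAttribution blank node linking an agent and a dcat:hadRole role."""
    return ("prov:qualifiedAttribution [\n"
            "    a prov:Attribution ;\n"
            f"    prov:agent <{agent_uri}> ;\n"
            f"    dcat:hadRole <{role_uri}> ;\n"
            "] ;")


print(qualified_attribution("https://example.org/agents/data-center"))
```

Passing a different role_uri reuses the same template for editor, funder, or other MARC relator roles.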

Related resources

0..*

a dcat:Resource;
<dcterms:relation or subProperty>
    <target resource URI>

dcterms:relation is used if the nature of the relationship between a cataloged resource and related resources is not known. More specific sub-properties of relation (dcterms:hasPart, dcterms:isPartOf, dcterms:conformsTo, dcterms:isFormatOf, dcterms:hasFormat, dcterms:isVersionOf, dcterms:hasVersion (and its sub-property dcat:hasVersion), dcterms:replaces, dcterms:isReplacedBy, dcterms:references, dcterms:isReferencedBy, dcterms:requires, dcterms:isRequiredBy) SHOULD be used if the nature of the relationship is known. These dcterms relation types will have to be mapped to linkRelationship values in Schema.org to map between the schemas. Note that the target of the relation should be a resolvable URI.

0..*

a dcat:Resource;
dcat:qualifiedRelation [
a dcat:Relationship ;
dcterms:relation <target resource URI> ;
dcat:hadRole <relationship type URI>
];

Representation of relationships that are not hard-typed by dcterms or DCAT, e.g. alternate, canonical, original, preview, stereo-mate, working-copy-of. Some of these roles are enumerated in the DS_AssociationTypeCodes values from ISO 19115-1, the IANA Registry of Link Relations IANA-RELATIONS, and the DataCite metadata schema, and are included within the MARC relationships. Ideally a resolvable URI is available for the relationship role.

Funding

0..*

a dcat:Resource, prov:Entity;
  prov:qualifiedAttribution [
    a prov:Attribution ;
    prov:agent <agent URI> ;
    dcat:hadRole <role URI>
];

To assign roles to a funding instrument. Note that PROV-O roles relate to activities, not entities. Therefore, DCAT defines a new property, dcat:hadRole, to attach a role to the association class prov:Attribution between an entity and an agent. In this case, the prov:agent should be a funding instrument (e.g. an identified grant) under the auspices of a funding agency, and the role should indicate that the agent is the provider of funding to create the resource.

Policies

0..*

a dcat:Resource OR dcat:Distribution;
odrl:hasPolicy [a odrl:Policy; ….];

DCAT uses the ODRL property odrl:hasPolicy, which has an odrl:Policy object as its target, for both Resources and individual distributions. More work is necessary to determine how policies like FDOF digitalObjectMutability, RDA digitalObjectPolicy, and FDOF PersistencyPolicy can (or should) be expressed as ODRL policies, and whether there is a better implementation of these.
An ODRL Policy MUST have one uid property value to identify the Policy, AND at least one permission, prohibition, or obligation property values of type Rule. (See the ODRL model Permission, Prohibition, and Obligation sections for more details.)
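The two ODRL constraints quoted above lend themselves to a simple check. The Python sketch below assumes the policy has been loaded as a dictionary using the ODRL JSON-LD key names uid, permission, prohibition, and obligation; it checks only these two constraints, not the rest of the ODRL model, and the function name is hypothetical.

```python
from typing import Any, Dict, List

RULE_PROPERTIES = ("permission", "prohibition", "obligation")


def policy_problems(policy: Dict[str, Any]) -> List[str]:
    """Report violations of the uid and Rule constraints for one ODRL Policy."""
    problems = []
    uid = policy.get("uid")
    # normalise to a list so that a single value and a list can both be counted
    uids = uid if isinstance(uid, list) else ([] if uid is None else [uid])
    if len(uids) != 1:
        problems.append(f"expected exactly one uid, found {len(uids)}")
    if not any(policy.get(prop) for prop in RULE_PROPERTIES):
        problems.append("needs at least one permission, prohibition or obligation")
    return problems
```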

Checksum

0..1

a dcat:Distribution;
spdx:checksum [
a spdx:Checksum;
spdx:algorithm <algorithm URI>;
spdx:checksumValue "nnnn"^^xsd:hexBinary ];

A string value calculated from the content of the resource representation, used to test whether the content has been modified. Use the Software Package Data Exchange (SPDX) property; the spdx:Checksum object has two properties: algorithm and checksumValue. The checksum is a property of each dcat:Distribution.
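The checksum value can be computed with Python's standard hashlib module and written as an spdx:Checksum node. In this sketch the mapping from hashlib names to SPDX algorithm terms (for example spdx:checksumAlgorithm_sha256) is an assumption to verify against the SPDX vocabulary, and the whole representation is held in memory, so very large files would need chunked hashing instead.

```python
import hashlib

# hashlib algorithm name -> SPDX checksum algorithm term (assumed mapping)
SPDX_ALGORITHMS = {
    "sha1": "spdx:checksumAlgorithm_sha1",
    "sha256": "spdx:checksumAlgorithm_sha256",
    "sha512": "spdx:checksumAlgorithm_sha512",
}


def checksum_turtle(content: bytes, algorithm: str = "sha256") -> str:
    """Turtle spdx:checksum node for the bytes of one distribution."""
    if algorithm not in SPDX_ALGORITHMS:
        raise ValueError(f"no SPDX term mapped for {algorithm!r}")
    digest = hashlib.new(algorithm, content).hexdigest()   # lowercase hex string
    return ("spdx:checksum [\n"
            "    a spdx:Checksum ;\n"
            f"    spdx:algorithm {SPDX_ALGORITHMS[algorithm]} ;\n"
            f'    spdx:checksumValue "{digest}"^^xsd:hexBinary ;\n'
            "] ;")


print(checksum_turtle(b"abc"))
```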