Various DCAT tools for harvesting metadata from Belgian open data portals, converting metadata to DCAT-AP files and updating the Belgian data.gov.be portal.
The portal itself is a Drupal 10 website, based on Fedict's / BOSA's Openfed distribution.
Only interested in the result? The N-Triples and XML files (DCAT-AP) used to update data.gov.be can be found in the dcat repository.
The DCAT-AP XML file is used by the European Data Portal.
These tools require a Java runtime (version 17 or newer) and run on a headless machine, i.e. there is no GUI.
An internet connection is required, although a proxy can be used.
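
As an illustration of the proxy remark above, here is a minimal sketch of how a Java 17 HTTP client can be pointed at a proxy. The class name `ProxyExample`, its methods and the host `proxy.example.org:8080` are hypothetical and not taken from these tools; the actual scrapers may configure their proxy differently, for instance through their own settings or the standard JVM `https.proxyHost` / `https.proxyPort` system properties.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.List;

public class ProxyExample {
    /** Builds a selector that routes all HTTP and HTTPS requests through one proxy. */
    static ProxySelector selectorFor(String host, int port) {
        // createUnresolved avoids a DNS lookup at the moment the selector is built
        return ProxySelector.of(InetSocketAddress.createUnresolved(host, port));
    }

    /** Builds an HTTP client that consults the given selector for every request. */
    static HttpClient clientFor(ProxySelector selector) {
        return HttpClient.newBuilder()
                .proxy(selector)
                .connectTimeout(Duration.ofSeconds(30))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical proxy address, replace with the proxy of your own network
        ProxySelector selector = selectorFor("proxy.example.org", 8080);
        HttpClient client = clientFor(selector);

        // Show which proxy would be used for a request to the portal
        List<Proxy> proxies = selector.select(URI.create("https://data.gov.be/"));
        System.out.println("Proxy configured: " + client.proxy().isPresent());
        System.out.println("Proxies for data.gov.be: " + proxies);
    }
}
```

The sketch only builds the client and inspects the selector, so it runs without any network access.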
- Helper classes: for storing scraped pages locally, conversion tools etc.
- Various scrapers: getting metadata from various repositories and websites, and turning the metadata into DCAT files
- The scrapers also include a series of SPARQL scripts to turn DCAT into DCAT-AP: e.g. map site-specific themes, add missing properties and prepare the files for updating data.gov.be
- Missing translations are added via a translation proxy for the European eTranslation service
- Data.gov.be updater: update the data.gov.be (currently Drupal 10) website using the enhanced DCAT files
- Some tools: link checker, EDP converter tool
There is also a separate, stand-alone RDF validator project, which can be used to validate DCAT metadata regardless of whether the metadata is to be published on data.gov.be.
- The various portals (except `all`) should be harvested using the scrapers.
- The enhanced files can be uploaded to the data.gov.be portal using the updater
- Then use the `all` enhancer to merge all the files from the various portals into one file, `datagovbe.nt`
- Convert the merged file using the EDP tool to an XML file called `datagovbe_edp.xml`
- Upload both `datagovbe.nt` and `datagovbe_edp.xml` to GitHub
- This will be used as input for the European Data Portal (scheduled Thursday morning, every week)
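
To make the merge step above more concrete, here is a rough sketch of how several N-Triples files could be combined into one. It is not the actual `all` enhancer, which presumably does more (such as running SPARQL scripts); the class `MergeNTriples` and its command line are hypothetical. It relies only on the fact that N-Triples is line-based, so files can be merged line by line. Duplicates are detected purely textually, and blank node labels that happen to be reused across files are not renamed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeNTriples {
    /**
     * Merges line-based N-Triples files, skipping blank lines and comments
     * and dropping textually identical triples. Insertion order is kept.
     */
    static Set<String> merge(List<Path> inputs) throws IOException {
        Set<String> triples = new LinkedHashSet<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                String trimmed = line.strip();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    triples.add(trimmed);
                }
            }
        }
        return triples;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java MergeNTriples.java datagovbe.nt portal1.nt portal2.nt
        if (args.length < 2) {
            System.err.println("usage: MergeNTriples <output.nt> <input.nt>...");
            return;
        }
        Path output = Path.of(args[0]);
        List<Path> inputs = Arrays.stream(args).skip(1).map(Path::of).toList();
        Files.write(output, merge(inputs), StandardCharsets.UTF_8);
    }
}
```

A real merge would normally go through an RDF library so that syntax is validated and blank nodes are handled safely; the plain-text approach is only meant to show the shape of the step.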
See also the Notes
