In recent years, digital technologies have significantly changed the economic and social environment, affecting all areas of activity and the daily lives of European citizens. Data is at the heart of this transition, and even more significant developments in this direction are expected in the coming years. The volume of data is growing at an extremely rapid pace worldwide. Sharing data in an easy, convenient, yet secure and reliable way is a key prerequisite for realizing the value of data and developing a data-driven economy. Contemporary paradigms for data sharing include a standardized approach to data exchange based on data access and usage policies. These policies set clear guarantees and mechanisms that enable organizations to exchange, manage, and use data in a secure, fair, and efficient manner.
The objective of the thesis is to develop a software system for automated regulatory compliance verification based on modelling data access and usage policies and metadata in data sharing environments. The modelling processes are implemented using semantic technologies with the use of shared dictionaries and ontologies.
The work begins with a theoretical study in the form of an overview of various specifications, standards, technologies and approaches for modelling and validation of data policies and metadata such as W3C Open Digital Rights Language (ODRL), which has established itself as a standard for modelling data policies, and W3C Shapes Constraint Language (SHACL), which is used to validate semantic data. With regard to the applicable legal aspects, the regulatory frameworks of the Artificial Intelligence Act, the Data Act, and the Directive on Copyright in the Digital Single Market are examined. A comparative analysis is made between the General-Purpose AI Code of Practice and the Ethics Guidelines for Trustworthy AI which shows that the two documents are complementary in terms of their objectives – while the Code of Practice has a more practical and applied focus on compliance with specific regulatory requirements, the Guidelines set a broader ethical framework on which specific compliance mechanisms can be built.
The main technologies, standards, and dictionaries used for the implementation of the software system are presented. Attention is paid to the role of RDF as the basic model for representing semantic data, as well as to the use of established ontologies and dictionaries such as DPV, DCAT, PROV-O, etc. Their joint use shows that sustainable solutions for working with semantic data are built by combining specifications rather than by relying on a single model.
The thesis continues with a sequential transition through the stages of analysis, design, implementation, testing, and deployment, which represent the main phases of the software development life cycle. During the analysis phase, the main functional and non-functional requirements for the software system are defined, as well as the main use case scenarios resulting from the regulatory frameworks.
The design phase presents a modular architecture that allows for the independent development and testing of individual functionalities in the modules, while ensuring their integration into a comprehensive system. A specific ODRL profile and related data policies based on regulations, as well as sample metadata, have been modelled. Summaries of selected dictionaries and ontologies for modelling legal requirements are presented. The system includes three modules: for validating ODRL policies, for assessing the compatibility of metadata with the data obligations for providers of general-purpose AI models, and for assessing the interoperability of metadata with the requirements for data providers in data spaces. Two interfaces have been designed for access to the system – a graphical user interface (GUI) and an application programming interface (REST API).
The implementation, testing, and deployment stages are treated as closely interrelated, with a focus on the practical application of the design decisions taken. The implementation is accompanied by unit and system testing to validate the correctness of the logic, and the deployment demonstrates readiness for experimental use. The software system can be used as a standalone tool and, in future development, as an integrated component of a broader data space infrastructure, contributing to greater transparency and reliability in data exchange processes.
The volume of data is growing at an extremely rapid pace worldwide. Sharing data in an easy, convenient, but at the same time secure and reliable way is the main prerequisite for realizing value from data and developing a data-based economy. Modern data sharing paradigms include a standardized approach to data exchange, based on data access and usage policies. These policies set clear guarantees and mechanisms that enable organizations to exchange, manage and use data in a safe, fair and efficient manner. To fully achieve this level of trust and protection, a comprehensive process for modelling the policies is necessary.
A particularly important element of data sharing is semantic interoperability, which ensures that data is understood in the same way by all systems and organizations, regardless of its origin. By relying on standardized vocabularies of concepts, common data models and shared ontologies, policies and metadata are interpreted consistently across participants, which facilitates their automated exchange.
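The role of shared vocabularies can be illustrated with a minimal sketch. It assumes two hypothetical providers that describe the same dataset with different local field names; the provider names, the records, and the `normalize` helper are invented for illustration and are not part of the system described in the thesis. Only the Dublin Core Terms and DCAT namespace IRIs are real.

```python
# Shared vocabulary namespaces (Dublin Core Terms and DCAT).
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical per-provider mappings from local field names to shared IRIs.
MAPPINGS = {
    "provider_a": {
        "name": DCT + "title",
        "licence": DCT + "license",
        "tags": DCAT + "keyword",
    },
    "provider_b": {
        "titel": DCT + "title",
        "lizenz": DCT + "license",
        "schlagwort": DCAT + "keyword",
    },
}


def normalize(provider, record):
    """Re-key a local record by shared vocabulary IRIs; unmapped fields are dropped."""
    mapping = MAPPINGS[provider]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Two local descriptions of the same dataset (illustrative values).
record_a = {"name": "Air quality 2024", "licence": "CC-BY-4.0", "tags": "air", "internal_id": 7}
record_b = {"titel": "Air quality 2024", "lizenz": "CC-BY-4.0", "schlagwort": "air"}

normalized_a = normalize("provider_a", record_a)
normalized_b = normalize("provider_b", record_b)

# Once both records use the shared IRIs, they can be compared directly.
same_meaning = normalized_a == normalized_b
```

After normalization both records carry the same three shared properties, so a consumer can process them without knowing either provider's local schema.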
The main motivation of the thesis is to facilitate and automate the modelling and validation of policies for access to and usage of data and metadata in the context of the diverse regulatory frameworks related to data and their use. Within the thesis, a software application for assessing compliance with regulations and specifications will be developed. It will be divided into several modules: one for checking the compliance of data access and exchange policies with the standardized ODRL Information Model 2.2, and others for checking the compatibility of metadata and the data exchange process with the obligations described in the Artificial Intelligence Act (Regulation (EU) 2024/1689) and in the Data Act (Regulation (EU) 2023/2854).
The main objective of the thesis is to develop a software system for automated compliance checking based on modelling of data access and usage policies and metadata in data sharing environments. The modelling processes will be implemented with semantic technologies using shared vocabularies and ontologies.
- Research and current state:
  - Research of existing approaches and literature – analysis of available methods and solutions for modelling policies and metadata in the context of data spaces
  - Introduction to the building blocks behind data spaces – covering the foundations of data spaces, including basic concepts presented through the draft international standard ISO/IEC DIS 20151 and the Dataspace Protocol 2025-1
  - Research of parts of the existing legal framework in the field – focusing on EU regulatory frameworks such as Regulation (EU) 2024/1689 (Artificial Intelligence Act), Regulation (EU) 2023/2854 (Data Act), and Directive (EU) 2019/790 (Directive on Copyright in the Digital Single Market)
  - Introduction to the W3C Open Digital Rights Language and the methods of modelling policies, including the definition of specific profiles (ODRL Profiles) in the form of ontologies – building on the standardized ODRL Information Model 2.2
- Analysis of the requirements for the software system – defining functional and non-functional requirements and specifying the use cases of the system
- Design of data policies and metadata in a sample data space – using semantic technologies together with established vocabularies and ontologies such as ODRL Core Vocabulary, Data Catalog Vocabulary, Data Privacy Vocabulary, Data Quality Vocabulary, Dublin Core, PROV Ontology, etc.
- Development of a profile including specific concepts that are not covered by the published, generally accepted vocabularies
- Development of a software system for conformity assessment – using technologies such as Python, Streamlit, RDFLib, pySHACL, etc. to implement the individual modules:
  - module for validating ODRL policies for compliance with ODRL Information Model 2.2 and a specifically developed profile
  - module for assessing the compatibility of metadata with the obligations of providers of general-purpose AI models, who have the role of data consumers in the data space (that is, whether given data is suitable for training an AI model), in accordance with:
    - Regulation (EU) 2024/1689 (Artificial Intelligence Act), Art. 53(1)(a) and Annex XI, Section 1(2)(c)
    - Regulation (EU) 2024/1689 (Artificial Intelligence Act), Art. 53(1)(c) and Directive (EU) 2019/790, Art. 3 and Art. 4
  - module for assessing the interoperability of metadata with the requirements for participants in the role of providers in data spaces, in accordance with Regulation (EU) 2023/2854 (Data Act), Art. 33(1)
- Summary of the obtained results and conclusion – formulation of recommendations for improvement and future work in the field
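
To give a flavour of what the first module checks, the sketch below performs a few structural checks from ODRL Information Model 2.2 on a policy serialized as JSON-LD and loaded into a plain Python dictionary. It is a simplified stand-in written with the standard library only, not the SHACL-based validation with RDFLib and pySHACL named above; the `check_policy` and `as_list` helpers and the example URIs are assumptions made for illustration. Only a subset of the model is covered: a policy needs a `uid` and at least one rule, every rule needs an action, and permissions and prohibitions need a target, which may also be stated once at policy level.

```python
# Rule properties a policy may carry in ODRL Information Model 2.2.
RULE_KEYS = ("permission", "prohibition", "obligation")


def as_list(value):
    """JSON-LD allows a single object or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_policy(policy):
    """Return a list of violations; an empty list means none were found."""
    problems = []
    if not policy.get("uid"):
        problems.append("policy has no uid")

    rules = [(key, rule) for key in RULE_KEYS for rule in as_list(policy.get(key))]
    if not rules:
        problems.append("policy has no permission, prohibition or obligation")

    for key, rule in rules:
        # Properties may be shared at policy level instead of repeated per rule.
        if not (rule.get("action") or policy.get("action")):
            problems.append(f"{key} has no action")
        # Permissions and prohibitions need a target asset; duties may omit it.
        if key != "obligation" and not (rule.get("target") or policy.get("target")):
            problems.append(f"{key} has no target")
    return problems


# Illustrative policies; the example.com identifiers are placeholders.
valid_policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy/1",
    "permission": [{"target": "http://example.com/dataset/air-quality", "action": "use"}],
}
broken_policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy/2",
    "permission": [{"action": "use"}],
}

valid_report = check_policy(valid_policy)
broken_report = check_policy(broken_policy)
```

Expressing the same constraints as SHACL shapes, as the thesis does, has the advantage that they are declarative, operate on the RDF graph rather than on one serialization, and can be extended for a custom ODRL profile without changing program code.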





