diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..603cdea
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+.vscode
+
+vendor
+
+.phpunit.result.cache
\ No newline at end of file
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..1b40876
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,52 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+- This **[CHANGELOG.md](https://github.com/gigascience/ro-crate-php/blob/Alex/CHANGELOG.md)** to document the essential changes of this project.
+- Missing thumbnail entity in **[index.php](https://github.com/gigascience/ro-crate-php/blob/Alex/src/index.php)**.
+- Usage guide, particularly for using with GigaDB datasets.
+- GigaDB example of metadata file and preview using the dataset 102736 in assets directory in the repository.
+- PHPCS for coding standard checks (PSR12) through composer.
+- Docker-compose configuration for running tests.
+- Preliminary HTML render with PHP documentation.
+- Additional unit tests for the ROCrate class.
+- RO-Crate exception class.
+- Partial validation of RO-Crate metadata when saving the manipulated crate.
+- Chaining capability for adding and removing entities.
+- Inline PHPDoc documentation for main code, i.e. the ROCrate class.
+- Basic unit test cases for ROCrate class.
+- Contributing documentation.
+- Code of conduct document.
+- CODEOWNERS file with GigaScience developers as owners.
+
+### Fixed
+
+- Multiple typos in **[Guide.md](https://github.com/gigascience/ro-crate-php/blob/Alex/Guide.md)** and **[README.md](https://github.com/gigascience/ro-crate-php/blob/Alex/README.md)**.
+- Minor formatting issues in generated HTML previews.
+- String concatenation and variable naming issues.
+- PHPCS warnings about coding style.
+- PHPDoc inline comment typos.
+- ISO 8601 DateTime validation issues.
+
+### Changed
+
+- Update **[Guide.md](https://github.com/gigascience/ro-crate-php/blob/Alex/Guide.md)** to specify that it is primarily designed for the latest RO-Crate v1.2 but not earlier versions.
+- Enhance generated HTML formatting with type hyperlinks to [schema.org](https://schema.org/) definitions as specified in the **[RO-Crate v1.2 context](https://www.researchobject.org/ro-crate/specification/1.2/context.jsonld)**.
+- Abstract property-add/remove-pair methods to hide formatting details.
+- Replace Person class with a generic class implementation.
+- Update composer.json name and license information.
+- Improve **[README.md](https://github.com/gigascience/ro-crate-php/blob/Alex/README.md)** with installation section updates and downstream task compatibility.
+
+### Removed
+
+- Person class to be replaced with generic implementation.
+- Some unnecessary and development-only comments.
+
+[unreleased]: https://github.com/gigascience/ro-crate-php/tree/Alex
\ No newline at end of file
diff --git a/Guide.md b/Guide.md
new file mode 100644
index 0000000..193b874
--- /dev/null
+++ b/Guide.md
@@ -0,0 +1,84 @@
+
+# Usage Guide for ro-crate-php
+The notes below highlight what developers should watch for when using the tool to manipulate an RO-Crate Metadata File. They are GigaDB-oriented: the aim is to make it easier to create or manipulate the RO-Crate Metadata File of a GigaDB dataset by leaving out unnecessary detail and emphasizing only the relevant technical points of the RO-Crate 1.2 standard. Please note that both the tool and this guide target RO-Crate 1.2, and some constraints of RO-Crate 1.1 are no longer required in 1.2. To work with existing RO-Crate applications based on 1.1, an RO-Crate file has to be built in accordance with the requirements of the 1.1 standard.
+
+---
+
+## Overview
+This is a PHP tool to create and manipulate Research Object Crates (RO-Crates). Please refer to the repository's *[README.md](https://github.com/gigascience/ro-crate-php/tree/main)* for more details. Below are the high-level steps for creating the metadata file of a GigaDB dataset from scratch. The created file may not be perfect but should describe the dataset sufficiently. An example created by following these steps is in the assets directory of the repository above.
+
+**Version**: [1.0]
+**Last Updated**: [2025-08-18]
+
+---
+
+## Note
+The general rule is that we use the @id construct (true flag if using the add/removePropertyPair methods) when referring to another entity; otherwise we use a plain literal (false flag if using the add/removePropertyPair methods). Some specific constructs are exceptions to this rule.
+
+Remember to add each entity to the crate after creating it.
+
+Also, create only one entity for any given ID.
+
+In addition, the name of an entity, if present, should be human-readable.
+
+The metadata file always has ro-crate-metadata.json as the @id. The preview file has ro-crate-preview.html as the @id and filename. In a detached package, i.e. one where the metadata file is not stored within the package, which is the most likely case for GigaDB, the file ro-crate-metadata.json is renamed to xxxx-ro-crate-metadata.json, where xxxx can be, for example, the dataset ID.
+
+---
+
+## Step 1
+- **Initialization of the Crate**: Create the empty crate, then set the profile to specify the context version and the root data entity ID.
+- **Initialization of the Root Data Entity**: Set ID, name, description, datePublished, i.e. the date of first publication, and sdDatePublished, i.e. the date on which the current structured data was generated or published. The dates are in ISO 8601 format, e.g. YYYY-MM-DD. For GigaDB, the dataset is most likely web-based, so the ID has to be an absolute URI, e.g. **[https://gigadb.org/dataset/102736](https://gigadb.org/dataset/102736)**.
+- **Specification of the Components**: Specify the IDs of the files and datasets (such as zip files) using hasPart, possibly using the \# directory construct to collectively describe many files. Refer to **[Step 2](#step-2)** for handling the data entities of these files and datasets and potentially any entities derived from them. Note that the metadata file and the preview file, if it exists, are treated specially and are not included in hasPart.
+- **Specification of the License**: Specify the ID of the license using license, e.g. *[https://creativecommons.org/publicdomain/zero/1.0/](https://creativecommons.org/publicdomain/zero/1.0/)* for the CC0 v1.0 license. Refer to **[Step 3](#step-3)** for handling the contextual entity of the license.
+- **Specification of the Thumbnail**: Specify the ID of the thumbnail using thumbnail. The ID is recommended to be the corresponding downloadable PNG. Refer to **[Step 4](#step-4)** for handling the contextual entity of the thumbnail.
+- **Specification of the Publisher and sdPublisher**: Specify the ID of the publisher and sdPublisher using publisher and sdPublisher, e.g. *[https://gigadb.org/](https://gigadb.org/)* for GigaDB being the publisher and sdPublisher. Refer to **[Step 5](#step-5)** for handling the contextual entity of the publisher and sdPublisher.
+- **Specification of the Identifier and Cite-as**: Specify the identifier of the crate using identifier as an @id. As a special construct, we also include the identifier as a plain string using cite-as. The identifier should be a persistent, resolvable URI, which is usually possible for a GigaDB dataset since it has a DOI. For example, it can be **[https://doi.org/10.4225/59/59672c09f4a4b](https://doi.org/10.4225/59/59672c09f4a4b)**. Refer to **[Step 6](#step-6)** for handling the contextual entity of the identifier.
+- **Specification of Extra/Additional Information**: In case there is metadata that cannot be precisely described using existing properties, there is a special construct for it. Specify an exifData using a local identifier such as \#extraInfo as an @id. Refer to **[Step 7](#step-7)** for handling the contextual entity of the exifData. In a GigaDB dataset, information of the root dataset such as Dataset type, Additional information, Github links, Accessions (data not in GigaDB) and History can be wrapped by this construct, with one exifData entry per value when a field such as Additional information or Github links occurs several times. Note that this construct also works for other entities, e.g. Awardee and Award ID used with the organization entity for the funder, or Extra Information used with different file entities.
+- **Specification of Citation**: If the dataset cites publications such as other datasets or papers, include this information by specifying the ID of the publication using citation. Note that the ID has to be a URL (for example a DOI URL). When citing another dataset/crate, the ID should be the @id value of the identifier property of that crate rather than the actual ID of that crate. Refer to **[Step 8](#step-8)** for handling the contextual entity of the citation of the publication.
+- **Specification of Authors**: Specify the IDs of the author(s) one by one using author. For a GigaDB dataset, ORCID is usually picked as the ID for an author. For example, it may be **[https://orcid.org/0000-0001-9083-6757](https://orcid.org/0000-0001-9083-6757)**. Refer to **[Step 9](#step-9)** for handling the contextual entity of each of the author(s).
+- **Specification of Funders**: Here we assume that no information about an explicitly associated research project is present, which is the case for some GigaDB datasets. Specify the ID of the funder using funder. For a GigaDB dataset, the ID is often a ROR identifier, for instance, **[https://ror.org/011kf5r70](https://ror.org/011kf5r70)**. Refer to **[Step 10](#step-10)** for handling the contextual entity of the funder.
+- **Specification of Keywords**: Specify the keyword(s) of the root dataset using keywords as a plain string that concatenates all keywords with a comma as the delimiter. As a special construct, together with the keywords property, we have to specify the IDs of these keyword(s) one by one using about as @id's. Such an ID is usually a URL that explains the corresponding keyword, for example, **[https://nanoporetech.com/](https://nanoporetech.com/)** for the keyword oxford nanopore technologies. Refer to **[Step 11](#step-11)** for handling the contextual entity of the about property.
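+
+The root data entity described in Step 1 can be sketched as a plain PHP array that serializes to JSON-LD. This is only a minimal illustration that does not call ro-crate-php: the helpers `ref()` and `isIsoDate()` are hypothetical and the name is a placeholder, while the IDs are taken from the GigaDB example in the assets directory.
+
+```php
+<?php
+// Hypothetical helper (not part of ro-crate-php): wrap an ID as a JSON-LD reference.
+function ref(string $id): array
+{
+    return ['@id' => $id];
+}
+
+// Hypothetical helper: accept only real calendar dates written as YYYY-MM-DD.
+function isIsoDate(string $date): bool
+{
+    $parsed = DateTimeImmutable::createFromFormat('!Y-m-d', $date);
+    // Formatting the parsed date back must reproduce the input exactly.
+    return $parsed !== false && $parsed->format('Y-m-d') === $date;
+}
+
+$root = [
+    '@id' => 'https://gigadb.org/dataset/102736',
+    '@type' => ['Dataset'],
+    'name' => 'Example dataset name', // placeholder
+    'datePublished' => '2025-07-29',
+    'sdDatePublished' => '2025-07-29',
+    // References to other entities use the @id construct.
+    'license' => ref('https://creativecommons.org/publicdomain/zero/1.0/'),
+    'publisher' => ref('https://gigadb.org/'),
+    'identifier' => ref('https://doi.org/10.5524/102736'),
+    // Special constructs: cite-as and keywords are plain strings.
+    'cite-as' => 'https://doi.org/10.5524/102736',
+    'keywords' => implode(', ', ['oxford nanopore technologies', 'poly(a) tail']),
+    'about' => ref('https://nanoporetech.com/'),
+];
+
+echo json_encode($root, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```
+
+The round-trip comparison in `isIsoDate()` rejects dates such as 2025-02-30, which PHP would otherwise roll over into March.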
+
+## Step 2
+- **File**: Create a File entity with the respective ID, which has to be an absolute URI for a web-based entity. For a GigaDB dataset, which is most likely web-based, the ID is often the URL that directly downloads the file. Then, we set the name, contentSize and encodingFormat. Note that the contentSize is either in kB or MB. Also, note that the encodingFormat is a plain string xxx/yyy, for instance, text/csv. In some cases, a more informative encodingFormat can be given as xxx/yyy followed by a **[PRONOM](https://www.nationalarchives.gov.uk/PRONOM/Default.aspx)** identifier, for example, `["application/pdf", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/fmt/19"}]`. Additionally, we can include some extra information including data types and file attributes using the exifData construct.
+- **Directory/Dataset/zip file**: Create a Dataset entity with the respective ID, which has to be an absolute URI. Such a URI should resolve to a listing of the content of the directory/dataset/zip file. For a GigaDB dataset, which is most likely a web-based zip file, the ID is often the URL that shows its description, for example, **[https://gigadb.org/dataset/view/id/102736/Files_page/4](https://gigadb.org/dataset/view/id/102736/Files_page/4)**. Then, we set the name, description, distribution and releaseDate. Note that the distribution is the URL that downloads the content, for example, **[https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip](https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip)**. Also, note that the releaseDate should be in the ISO 8601 format. Furthermore, we can include some extra information including data types and file attributes using the exifData construct.
+- **Collective Construct with \#**: If we prefer to describe some files and/or directories collectively, we create a Dataset entity with a local identifier as the ID, for example, \#other-files. Then, we set the name and description.
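+
+The two encodingFormat forms from the File bullet can be sketched as follows. This is an assumption-laden illustration in plain PHP, not the ro-crate-php API: `fileEntity()` is a hypothetical helper, and the PDF entry uses a placeholder URL and size.
+
+```php
+<?php
+// Hypothetical helper (not part of ro-crate-php): build a File data entity as a plain array.
+function fileEntity(string $url, string $name, string $size, string $mime, ?string $pronomUrl = null): array
+{
+    // Use the MIME string alone, or pair it with a PRONOM reference when one is known.
+    $format = $pronomUrl === null ? $mime : [$mime, ['@id' => $pronomUrl]];
+
+    return [
+        '@id' => $url,
+        '@type' => ['File'],
+        'name' => $name,
+        'contentSize' => $size,
+        'encodingFormat' => $format,
+    ];
+}
+
+// A file from the GigaDB example, described with a plain MIME type.
+$csv = fileEntity(
+    'https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv',
+    'boostnano_no_dorado_R1_tails.csv',
+    '317.24 kB',
+    'text/csv'
+);
+
+// A placeholder PDF, described with a MIME type plus a PRONOM identifier.
+$pdf = fileEntity(
+    'https://example.org/files/report.pdf',
+    'report.pdf',
+    '1.20 MB',
+    'application/pdf',
+    'https://www.nationalarchives.gov.uk/PRONOM/fmt/19'
+);
+
+echo json_encode([$csv, $pdf], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```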
+
+## Step 3
+- **License Creation**: Create a contextual entity of type CreativeWork with the respective ID, then set the name and description of the license, where the description may have to be looked up online.
+
+## Step 4
+- **Thumbnail Handling**: When the thumbnail is incidental to the root dataset, which is usually the case, we do not include it in the hasPart of the root data entity and create a File entity with the respective ID.
+
+## Step 5
+- **Publisher and sdPublisher Handling**: Create an Organization entity with the respective ID, then set the name and description of the organization, where the name and the description may have to be looked up online. Also, set the contactPoint, usually to the email address prefixed with *mailto:*, e.g. **[mailto:database@gigasciencejournal.com](mailto:database@gigasciencejournal.com)**. Then, create a ContactPoint entity with this ID, and set the contactType, email and identifier. For the example ID, the email and identifier can share the plain string database@gigasciencejournal.com, while the contactType may be a plain string stating that this is the contact of the publisher.
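+
+As a rough sketch of the publisher and its contact point, the pair of entities can be built together so that the mailto ID is guaranteed to match on both sides. The helper `organizationWithContact()` is hypothetical and works on plain arrays rather than the ro-crate-php classes; the values come from the GigaDB example.
+
+```php
+<?php
+// Hypothetical helper (not part of ro-crate-php): build an Organization and its ContactPoint.
+function organizationWithContact(string $orgId, string $name, string $email): array
+{
+    // The ContactPoint ID is the email address prefixed with "mailto:".
+    $contactId = 'mailto:' . $email;
+
+    $organization = [
+        '@id' => $orgId,
+        '@type' => ['Organization'],
+        'name' => $name,
+        'contactPoint' => ['@id' => $contactId],
+    ];
+
+    $contactPoint = [
+        '@id' => $contactId,
+        '@type' => ['ContactPoint'],
+        'contactType' => 'contact of the publisher',
+        // email and identifier share the same plain string.
+        'email' => $email,
+        'identifier' => $email,
+    ];
+
+    return [$organization, $contactPoint];
+}
+
+[$publisher, $contact] = organizationWithContact(
+    'https://gigadb.org/',
+    'GigaScience DataBase',
+    'database@gigasciencejournal.com'
+);
+
+echo json_encode([$publisher, $contact], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```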
+
+## Step 6
+- **Identifier Handling**: Create a contextual entity of type PropertyValue with the respective ID, then set the propertyID, value and url. For example, the propertyID is **[https://registry.identifiers.org/registry/doi](https://registry.identifiers.org/registry/doi)** when the ID of the identifier is a DOI. For a DOI ID of **[https://doi.org/10.5524/102736](https://doi.org/10.5524/102736)**, the value is set to the plain string doi:10.5524/102736. The url is often chosen to be identical to the ID of the identifier.
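+
+The mapping from a DOI URL to the three properties can be sketched as below. `doiIdentifier()` is a hypothetical helper on plain arrays, and it assumes the ID always starts with https://doi.org/.
+
+```php
+<?php
+// Hypothetical helper (not part of ro-crate-php): derive the identifier entity from a DOI URL.
+function doiIdentifier(string $doiUrl): array
+{
+    $prefix = 'https://doi.org/';
+    if (strncmp($doiUrl, $prefix, strlen($prefix)) !== 0) {
+        throw new InvalidArgumentException('Expected an ID starting with ' . $prefix);
+    }
+
+    return [
+        '@id' => $doiUrl,
+        '@type' => ['PropertyValue'],
+        'propertyID' => 'https://registry.identifiers.org/registry/doi',
+        // The value is the bare DOI with a "doi:" prefix.
+        'value' => 'doi:' . substr($doiUrl, strlen($prefix)),
+        // The url is chosen to be identical to the ID.
+        'url' => $doiUrl,
+    ];
+}
+
+$identifier = doiIdentifier('https://doi.org/10.5524/102736');
+echo json_encode($identifier, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```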
+
+## Step 7
+- **exifData Handling**: Create a contextual entity with the respective ID of type PropertyValue, then set the name and value of the entity, where the name is the property name and the value is the property value as if such property existed in the context.
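+
+A minimal sketch of the exifData construct, using the funder's Awardee and Award ID from the GigaDB example: the owning entity references local IDs, and each local ID is a PropertyValue holding one name/value pair. `extraInfo()` is a hypothetical helper on plain arrays, not a ro-crate-php method.
+
+```php
+<?php
+// Hypothetical helper (not part of ro-crate-php): one PropertyValue per extra piece of metadata.
+function extraInfo(string $localId, string $name, string $value): array
+{
+    return [
+        '@id' => $localId,
+        '@type' => ['PropertyValue'],
+        'name' => $name,   // the property name, as if it existed in the context
+        'value' => $value, // the property value
+    ];
+}
+
+$funder = [
+    '@id' => 'https://ror.org/011kf5r70',
+    '@type' => ['Organization'],
+    'name' => 'National Health and Medical Research Council',
+    // exifData only references the PropertyValue entities by their local IDs.
+    'exifData' => [['@id' => '#awardee'], ['@id' => '#awardId']],
+];
+
+$graph = [
+    $funder,
+    extraInfo('#awardee', 'Awardee', 'L Coin'),
+    extraInfo('#awardId', 'Award ID', 'GNT1195743'),
+];
+
+echo json_encode($graph, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```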
+
+## Step 8
+- **Citation Handling**: We distinguish two cases: the publication is either another dataset or a paper.
+  - **Another Dataset/Crate**: Create a Publication entity of type CreativeWork with the respective ID, and add Dataset as an additional type. Then, set the property conformsTo to be the version-less generic RO-Crate profile **[https://w3id.org/ro/crate](https://w3id.org/ro/crate)**. Note that we do not set hasPart, and usually no other properties, for the entity representing the other crate, since its content and further metadata are available from its own RO-Crate Metadata Document.
+  - **A Paper**: Create a Publication entity of type ScholarlyArticle with the respective ID, then set the name. Also, set the author, identifier, issn, journal, datePublished and creditText, if any. Note that author can have more than one value and datePublished should be in ISO 8601 format.
+
+## Step 9
+- **Author Handling**: Create a Person entity with the respective ID, then set the affiliation and the name. The affiliation should refer to an Organization entity. If such an entity does not exist yet, we create an Organization entity with the respective ID, then set the name, which may have to be looked up online. For a GigaDB dataset, a ROR identifier is often picked as the ID for the organization, for instance, **[https://ror.org/01ej9dk98](https://ror.org/01ej9dk98)**.
+
+## Step 10
+- **Funder Handling**: Create an Organization entity with the respective ID, then set the identifier, name and description. The identifier is always the same as the ID, and the description is Funding Body in this case. Additionally, we can use the exifData construct to include the information regarding the Awardee and the Award ID.
+
+## Step 11
+- **About Handling**: If the respective ID is a URL, we create a contextual entity of type URL with the respective ID and set the name of the entity.
+
+---
+
+## Remark
+There are other ways to create an RO-Crate Metadata Document for a GigaDB dataset. This guide only describes a rather minimal way to construct the document, in which not all possible metadata of all entities are included. For manipulating an existing metadata document, we can similarly refer to these steps to look for missing parts.
+
+---
+
+
+
+
diff --git a/README.md b/README.md
index 6cb7c9f..cd5cc33 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,55 @@
# ro-crate-php
-Tool in PHP for manipulating RO-crate objects
+
+This is a PHP tool to create and manipulate Research Object Crates (RO-Crates).
+
+## Install
+
+Install the tool using composer:
+>composer require gigascience/ro-crate-php
+
+## Docs
+The code is documented with inline PHPDoc comments.
+
+## Usage
+
+Create a new empty crate with the base path set to the resources directory in the parent directory:
+
+> $crate = new ROCrate(\_\_DIR\_\_ . '/../resources', false);
+
+The `ROCrate` constructor enables the creation of a crate using an existing metadata file:
+
+> $crate = new ROCrate(\_\_DIR\_\_ . '/../resources', true);
+
+Add an entity to the crate:
+> // A person
+> $author = $crate->createGenericEntity('#alice', ['Person']);
+> $author->addProperty('name', 'Alice Smith');
+> $author->addProperty('affiliation', 'Institution of Example');
+> // Add the person to the crate
+> \$crate->addEntity($author);
+>
+> // Adds the person as one of the creators of the root data entity, i.e. the dataset being described by the crate
+> $root = $crate->getRootDataset();
+> $root->addPropertyPair('creator', '#alice', true);
+
+Interact with the crate just like normal objects with methods:
+> \$crate->addEntity($author);
+> \$crate->removeEntity($author->getId());
+
+RECOMMENDED: Chain the methods to keep the code compact when adding/removing properties of an entity:
+> $root->addPropertyPair('creator', '#bob', true)
+> ->addPropertyPair('creator', '#cathy')
+> ->removePropertyPair('creator', '#alice')
+> ->addPropertyPair('creator', '#alice');
+
+The addPropertyPair and removePropertyPair methods abstract away the details of the file formatting. The user only has to provide the key of the entity's property and the value to be added to or removed from that property. An optional boolean argument tells whether the value should be treated as a plain literal (false) or as an identifier referencing another entity in the crate (true). By default, the flag follows the form of the previous value of this property, if any.
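+
+To make the effect of the flag concrete, the sketch below mimics the described behavior on plain arrays. It is an illustrative stand-in, not the library's implementation: `addPair()` is a hypothetical function, values are always stored as lists here, and falling back to a plain literal when the property has no previous value is an assumption.
+
+```php
+<?php
+// Illustrative stand-in for the described behavior, NOT the ro-crate-php implementation.
+function addPair(array $entity, string $key, string $value, ?bool $asId = null): array
+{
+    $existing = $entity[$key] ?? [];
+
+    if ($asId === null) {
+        // Follow the form of the first existing value; assume a literal when there is none.
+        $first = $existing[0] ?? null;
+        $asId = is_array($first) && isset($first['@id']);
+    }
+
+    // true: reference another entity via @id; false: store a plain literal.
+    $existing[] = $asId ? ['@id' => $value] : $value;
+    $entity[$key] = $existing;
+
+    return $entity;
+}
+
+$root = ['@id' => './', '@type' => ['Dataset']];
+$root = addPair($root, 'creator', '#bob', true);        // explicit reference
+$root = addPair($root, 'creator', '#cathy');            // inherits the reference form
+$root = addPair($root, 'name', 'Example crate', false); // explicit literal
+
+echo json_encode($root, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
+```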
+
+## HTML Rendering
+Use the following code to generate a human-readable HTML preview from the RO-Crate Metadata File.
+> ROCratePreviewGenerator::generatePreview(\_\_DIR\_\_ . '/../resources');
+
+## Change Log
+Please refer to **CHANGELOG.md** in the repository.
+
+## GigaDB Example
+An example based on GigaDB dataset 102736 is generated using the code in index.php to illustrate how the tool can be used with the general steps in **Guide.md** in the repository. The information used to generate the metadata file comes mainly from the *[Website](https://gigadb.org/dataset/102736)*. The example ro-crate-metadata.json and ro-crate-preview.html are stored in the assets directory. To make the file easier to read, only the first 2 files and a zipped file treated as a directory are included, while the remaining files are described using a collective directory construct. To ensure integrability and compatibility with other/downstream applications, the metadata file of this example was imported into **[RoHub](https://www.rohub.org/3543b082-9077-492e-a4c7-a3b7c8bb39e8?activetab=overview)** for testing, where the about property of the metadata descriptor and the ID of the root data entity were replaced with ./ to be backward compatible with the RO-Crate 1.1 standard adopted by **[RoHub](https://www.rohub.org/about?what_is_rohub)**.
\ No newline at end of file
diff --git a/assets/ro-crate-metadata.json b/assets/ro-crate-metadata.json
new file mode 100644
index 0000000..30f60e2
--- /dev/null
+++ b/assets/ro-crate-metadata.json
@@ -0,0 +1,506 @@
+{
+ "@context": "https://w3id.org/ro/crate/1.2/context",
+ "@graph": [
+ {
+ "@id": "ro-crate-metadata.json",
+ "@type": [
+ "CreativeWork"
+ ],
+ "conformsTo": {
+ "@id": "https://w3id.org/ro/crate/1.2"
+ },
+ "about": {
+ "@id": "https://gigadb.org/dataset/102736"
+ }
+ },
+ {
+ "@id": "https://gigadb.org/dataset/102736",
+ "@type": [
+ "Dataset"
+ ],
+ "identifier": {
+ "@id": "https://doi.org/10.5524/102736"
+ },
+ "cite-as": "https://doi.org/10.5524/102736",
+ "name": "Supporting data for \"Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing.\"",
+ "description": "Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.",
+ "datePublished": "2025-07-29",
+ "sdDatePublished": "2025-07-29",
+ "publisher": {
+ "@id": "https://gigadb.org/"
+ },
+ "sdPublisher": {
+ "@id": "https://gigadb.org/"
+ },
+ "license": {
+ "@id": "https://creativecommons.org/publicdomain/zero/1.0/"
+ },
+ "thumbnail": {
+ "@id": "https://assets.gigadb-cdn.net/live/images/datasets/32d9369e-500d-5347-8842-9fe46cdc3693/102736.png"
+ },
+ "hasPart": [
+ {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/readme_102736.txt"
+ },
+ {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv"
+ },
+ {
+ "@id": "https://gigadb.org/dataset/view/id/102736/Files_page/4"
+ },
+ {
+ "@id": "#other-files"
+ }
+ ],
+ "author": [
+ {
+ "@id": "https://orcid.org/0000-0001-9083-6757"
+ },
+ {
+ "@id": "#Xuan_Yang"
+ },
+ {
+ "@id": "https://orcid.org/0000-0003-0337-8722"
+ },
+ {
+ "@id": "#Benjamin_Reames"
+ },
+ {
+ "@id": "https://orcid.org/0000-0003-1155-0959"
+ },
+ {
+ "@id": "https://orcid.org/0000-0002-4300-455X"
+ }
+ ],
+ "citation": {
+ "@id": "https://doi.org/10.5524/100425"
+ },
+ "funder": {
+ "@id": "https://ror.org/011kf5r70"
+ },
+ "exifData": [
+ {
+ "@id": "#datasetTypes"
+ },
+ {
+ "@id": "#additionalInfo1"
+ },
+ {
+ "@id": "#additionalInfo2"
+ },
+ {
+ "@id": "#additionalInfo3"
+ },
+ {
+ "@id": "#additionalInfo4"
+ },
+ {
+ "@id": "#additionalInfo5"
+ },
+ {
+ "@id": "#additionalInfo6"
+ },
+ {
+ "@id": "#additionalInfo7"
+ },
+ {
+ "@id": "#additionalInfo8"
+ },
+ {
+ "@id": "#githubLink1"
+ },
+ {
+ "@id": "#githubLink2"
+ },
+ {
+ "@id": "#githubLink3"
+ },
+ {
+ "@id": "#githubLink4"
+ },
+ {
+ "@id": "#accessions"
+ },
+ {
+ "@id": "#history"
+ }
+ ],
+ "keywords": "oxford nanopore technologies, poly(a) tail, estimation, segmentation, direct rna sequencing",
+ "about": {
+ "@id": "https://nanoporetech.com/"
+ }
+ },
+ {
+ "@id": "https://assets.gigadb-cdn.net/live/images/datasets/32d9369e-500d-5347-8842-9fe46cdc3693/102736.png",
+ "@type": [
+ "File"
+ ]
+ },
+ {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/readme_102736.txt",
+ "@type": [
+ "File"
+ ],
+ "name": "readme_102736.txt",
+ "contentSize": "9.30 kB",
+ "encodingFormat": "text/txt",
+ "exifData": {
+ "@id": "#oneExtra"
+ }
+ },
+ {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv",
+ "@type": [
+ "File"
+ ],
+ "name": "boostnano_no_dorado_R1_tails.csv",
+ "contentSize": "317.24 kB",
+ "encodingFormat": "text/csv",
+ "description": "PolyA tail lengths as found by Boostnano for R1 sequins which were filtered out by Dorado but kept by Boostnano; underlying data for figure 3",
+ "exifData": {
+ "@id": "#twoExtra"
+ }
+ },
+ {
+ "@id": "https://gigadb.org/dataset/view/id/102736/Files_page/4",
+ "@type": [
+ "Dataset"
+ ],
+ "name": "BoostNano-master",
+ "description": "Archival copy of the GitHub repository https://github.com/haotianteng/BoostNano downloaded 18-July-2025. BoostNano, a tool for preprocessing ONT-Nanopore RNA sequencing reads.This project is licensed under the MPL 2.0 license. Please refer to the GitHub repo for most recent updates.",
+ "distribution": {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip"
+ },
+ "releaseDate": "2025-07-23",
+ "exifData": {
+ "@id": "#zipExtra"
+ }
+ },
+ {
+ "@id": "https://doi.org/10.5524/100425",
+ "@type": [
+ "CreativeWork",
+ "Dataset"
+ ],
+ "conformsTo": {
+ "@id": "https://w3id.org/ro/crate"
+ }
+ },
+ {
+ "@id": "#oneExtra",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Extra Information",
+ "value": "Data Type: Readme, File Attributes: MD5 checksum: 450ef019cf8ba58beb644ef18d1411d0"
+ },
+ {
+ "@id": "#twoExtra",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Extra Information",
+ "value": "Data Type: Tabular data, File Attributes: MD5 checksum: 97ee210d263c783e4ddfe20352831d60 Figure in MS: 3"
+ },
+ {
+ "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip",
+ "@type": [
+ "DataDownload"
+ ],
+ "encodingFormat": [
+ "application/zip",
+ {
+ "@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"
+ }
+ ],
+ "contentSize": "2.44 MB"
+ },
+ {
+ "@id": "#zipExtra",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Extra Information",
+ "value": "Data Type: GitHub archive, File Attributes: MD5 checksum: 4b4d2ce7259e5045d89b731b7bfcf730 SWH: swh:1:snp:ee789638699e0e33ca3b1d09da5bb1f485ea7c70 license: MPL 2.0"
+ },
+ {
+ "@id": "#other-files",
+ "@type": [
+ "Dataset"
+ ],
+ "name": "other files",
+ "description": "This dataset contains too many files that are not individually described"
+ },
+ {
+ "@id": "https://orcid.org/0000-0001-9083-6757",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/01ej9dk98"
+ },
+ "name": "Chang JJ"
+ },
+ {
+ "@id": "https://ror.org/01ej9dk98",
+ "@type": [
+ "Organization"
+ ],
+ "name": "The University of Melbourne"
+ },
+ {
+ "@id": "#Xuan_Yang",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/01ej9dk98"
+ },
+ "name": "Yang X"
+ },
+ {
+ "@id": "https://orcid.org/0000-0003-0337-8722",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/05x2bcf33"
+ },
+ "name": "Teng H"
+ },
+ {
+ "@id": "https://ror.org/05x2bcf33",
+ "@type": [
+ "Organization"
+ ],
+ "name": "Carnegie Mellon University"
+ },
+ {
+ "@id": "#Benjamin_Reames",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/01ej9dk98"
+ },
+ "name": "Reames B"
+ },
+ {
+ "@id": "https://orcid.org/0000-0003-1155-0959",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/01ej9dk98"
+ },
+ "name": "Corbin V"
+ },
+ {
+ "@id": "https://orcid.org/0000-0002-4300-455X",
+ "@type": [
+ "Person"
+ ],
+ "affiliation": {
+ "@id": "https://ror.org/01ej9dk98"
+ },
+ "name": "Coin LJM"
+ },
+ {
+ "@id": "https://ror.org/011kf5r70",
+ "@type": [
+ "Organization"
+ ],
+ "identifier": "https://ror.org/011kf5r70",
+ "name": "National Health and Medical Research Council",
+ "description": "Funding Body",
+ "exifData": [
+ {
+ "@id": "#awardee"
+ },
+ {
+ "@id": "#awardId"
+ }
+ ]
+ },
+ {
+ "@id": "#awardee",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Awardee",
+ "value": "L Coin"
+ },
+ {
+ "@id": "#awardId",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Award ID",
+ "value": "GNT1195743"
+ },
+ {
+ "@id": "#datasetTypes",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Dataset type",
+ "value": "Epigenomic, Bioinformatics, Software, Transcriptomic"
+ },
+ {
+ "@id": "https://nanoporetech.com/",
+ "@type": [
+ "URL"
+ ],
+ "name": "oxford nanopore technologies"
+ },
+ {
+ "@id": "https://doi.org/10.5524/102736",
+ "@type": [
+ "PropertyValue"
+ ],
+ "propertyID": "https://registry.identifiers.org/registry/doi",
+ "value": "doi:10.5524/102736",
+ "url": "https://doi.org/10.5524/102736"
+ },
+ {
+ "@id": "https://creativecommons.org/publicdomain/zero/1.0/",
+ "@type": [
+ "CreativeWork"
+ ],
+ "name": "Creative Commons Zero v1.0 Universal",
+ "description": "The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below."
+ },
+ {
+ "@id": "https://gigadb.org/",
+ "@type": [
+ "Organization"
+ ],
+ "name": "GigaScience DataBase",
+ "description": "GigaDB is a data repository supporting scientific publications in the Life/Biomedical Sciences domain. GigaDB organises and curates data from individually publishable units into datasets, which are provided openly and in as FAIR manner as possible for the global research community.",
+ "contactPoint": {
+ "@id": "mailto:database@gigasciencejournal.com"
+ }
+ },
+ {
+ "@id": "mailto:database@gigasciencejournal.com",
+ "@type": [
+ "ContactPoint"
+ ],
+ "contactType": "contact of the publisher",
+ "email": "database@gigasciencejournal.com",
+ "identifier": "database@gigasciencejournal.com"
+ },
+ {
+ "@id": "#additionalInfo1",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://doi.org/10.26188/c.7767503.v1"
+ },
+ {
+ "@id": "#additionalInfo2",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://registry.dome-ml.org/review/4ctuzhv3y5"
+ },
+ {
+ "@id": "#additionalInfo3",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://archive.softwareheritage.org/swh:1:snp:1d8fdaa469108a2834a854d8249913d267fb9cfc"
+ },
+ {
+ "@id": "#additionalInfo4",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://archive.softwareheritage.org/swh:1:snp:95b1531358ec75027da00fc8b539bce14188d30d"
+ },
+ {
+ "@id": "#additionalInfo5",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://archive.softwareheritage.org/swh:1:snp:98b3a8996ab44283990fe707ffc44d45b2a61695"
+ },
+ {
+ "@id": "#additionalInfo6",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://archive.softwareheritage.org/swh:1:snp:ee789638699e0e33ca3b1d09da5bb1f485ea7c70"
+ },
+ {
+ "@id": "#additionalInfo7",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://scicrunch.org/resolver/RRID:SCR_026467"
+ },
+ {
+ "@id": "#additionalInfo8",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Additional information",
+ "value": "https://bio.tools/boostnano"
+ },
+ {
+ "@id": "#githubLink1",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Github links",
+ "value": "https://github.com/haotianteng/BoostNano"
+ },
+ {
+ "@id": "#githubLink2",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Github links",
+ "value": "https://github.com/adnaniazi/tailfindr"
+ },
+ {
+ "@id": "#githubLink3",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Github links",
+ "value": "https://github.com/haotianteng/chiron"
+ },
+ {
+ "@id": "#githubLink4",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Github links",
+ "value": "https://github.com/jts/nanopolish"
+ },
+ {
+ "@id": "#accessions",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "Accessions (data not in GigaDB)",
+ "value": "BioProject: PRJNA675370"
+ },
+ {
+ "@id": "#history",
+ "@type": [
+ "PropertyValue"
+ ],
+ "name": "History",
+ "value": "Date: July 29, 2025, Action: Dataset publish"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/assets/ro-crate-preview.html b/assets/ro-crate-preview.html
new file mode 100644
index 0000000..df1e43d
--- /dev/null
+++ b/assets/ro-crate-preview.html
@@ -0,0 +1,322 @@
+
+
+
+
+
+ RO-Crate Preview: Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
+
+
+
+
+
+
Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
+
Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
name [?] : Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
description [?] : Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
description [?] : PolyA tail lengths as found by Boostnano for R1 sequins which were filtered out by Dorado but kept by Boostnano; underlying data for figure 3
description [?] : Archival copy of the GitHub repository https://github.com/haotianteng/BoostNano downloaded 18-July-2025. BoostNano, a tool for preprocessing ONT-Nanopore RNA sequencing reads.This project is licensed under the MPL 2.0 license. Please refer to the GitHub repo for most recent updates.
description [?] : The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.
description [?] : GigaDB is a data repository supporting scientific publications in the Life/Biomedical Sciences domain. GigaDB organises and curates data from individually publishable units into datasets, which are provided openly and in as FAIR manner as possible for the global research community.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/resources/ro-crate-preview.html b/resources/ro-crate-preview.html
new file mode 100644
index 0000000..0e1cee9
--- /dev/null
+++ b/resources/ro-crate-preview.html
@@ -0,0 +1,322 @@
+
+
+
+
+
+ RO-Crate Preview: Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
+
+
+
+
+
+
Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
+
Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
name [?] : Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."
description [?] : Polyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
description [?] : PolyA tail lengths as found by Boostnano for R1 sequins which were filtered out by Dorado but kept by Boostnano; underlying data for figure 3
description [?] : Archival copy of the GitHub repository https://github.com/haotianteng/BoostNano downloaded 18-July-2025. BoostNano, a tool for preprocessing ONT-Nanopore RNA sequencing reads.This project is licensed under the MPL 2.0 license. Please refer to the GitHub repo for most recent updates.
description [?] : The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.
description [?] : GigaDB is a data repository supporting scientific publications in the Life/Biomedical Sciences domain. GigaDB organises and curates data from individually publishable units into datasets, which are provided openly and in as FAIR manner as possible for the global research community.
value [?] : Date: July 29, 2025, Action: Dataset publish
+
+
+
+
+
+
\ No newline at end of file
diff --git a/src/Json/FileHandler.php b/src/Json/FileHandler.php
new file mode 100644
index 0000000..73b7fd6
--- /dev/null
+++ b/src/Json/FileHandler.php
@@ -0,0 +1,54 @@
+ $value) {
+ $newKey = $prefix ? $prefix . $separator . $key : $key;
+ if (is_array($value)) {
+ $result = array_merge($result, $this->flatten($value, $newKey, $separator));
+ } else {
+ $result[$newKey] = $value;
+ }
+ }
+
+ return $result;
+ }
+}
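
Reviewer note on the flatten hunk above: the recursion builds each key by joining the parent path and the child key with a separator. Below is a minimal standalone sketch of the same technique, not the library's class. The function name `flattenArray` and its default arguments are hypothetical. It differs from the hunk in two small ways that are assumptions of this sketch: it compares the prefix against the empty string, so a prefix of "0" is not treated as empty, and it merges with the array union operator, so numeric-looking keys are not renumbered.

```php
<?php

/**
 * Hypothetical standalone helper (not the library's flattener):
 * turns a nested array into single-level "path => value" pairs.
 */
function flattenArray(array $data, string $prefix = '', string $separator = '.'): array
{
    $result = [];
    foreach ($data as $key => $value) {
        // Compare against '' so that a prefix of "0" still counts as a prefix.
        $path = $prefix === '' ? (string) $key : $prefix . $separator . $key;
        if (is_array($value) && $value !== []) {
            // Array union keeps existing keys as they are (no renumbering).
            $result += flattenArray($value, $path, $separator);
        } else {
            // Scalars, null and empty arrays are stored as leaves.
            $result[$path] = $value;
        }
    }
    return $result;
}

$flat = flattenArray(['@graph' => [['@id' => './', 'name' => 'Demo']]]);
print_r($flat);
```

Each leaf ends up under a path such as `@graph.0.name`, which is the shape the processing loop in index.php iterates over.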
diff --git a/src/Json/Unflattener.php b/src/Json/Unflattener.php
new file mode 100644
index 0000000..f7b9aac
--- /dev/null
+++ b/src/Json/Unflattener.php
@@ -0,0 +1,46 @@
+ $value) {
+ $this->assignValueByPath($result, (string) $key, $value, $separator);
+ }
+
+ return $result;
+ }
+
+ /**
+ * Assign value to nested array path
+ */
+ private function assignValueByPath(array &$array, string $path, $value, string $separator): void
+ {
+ $keys = explode($separator, $path);
+ $current = &$array;
+ while (count($keys) > 1) {
+ $key = array_shift($keys);
+ if (!isset($current[$key]) || !is_array($current[$key])) {
+ $current[$key] = [];
+ }
+
+ $current = &$current[$key];
+ }
+
+ $current[array_shift($keys)] = $value;
+ }
+}
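
Reviewer note on `assignValueByPath` above: the method walks the path segments by reference and creates intermediate arrays on demand. The following self-contained sketch shows the same idea as a plain function. The name `unflattenArray` and the sample data are hypothetical and are not part of this change.

```php
<?php

/**
 * Hypothetical standalone counterpart to the unflattener shown above:
 * rebuilds a nested array from "path => value" pairs.
 */
function unflattenArray(array $flat, string $separator = '.'): array
{
    $result = [];
    foreach ($flat as $path => $value) {
        $keys = explode($separator, (string) $path);
        $last = array_pop($keys);
        $current = &$result;
        foreach ($keys as $key) {
            // Create (or overwrite a scalar with) an intermediate array.
            if (!isset($current[$key]) || !is_array($current[$key])) {
                $current[$key] = [];
            }
            $current = &$current[$key];
        }
        $current[$last] = $value;
        unset($current); // drop the reference before the next path
    }
    return $result;
}

$nested = unflattenArray([
    'root.name' => 'Demo',
    'root.parts.0' => 'a.txt',
    'root.parts.1' => 'b.txt',
]);
print_r($nested);
```

One limitation of any separator-based scheme: a key that itself contains the separator (for example a URL used as a property name with `.` as separator) cannot be round-tripped, so choosing a separator that never occurs in keys is the caller's responsibility.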
diff --git a/src/exceptions/JsonFileException.php b/src/exceptions/JsonFileException.php
new file mode 100644
index 0000000..1387230
--- /dev/null
+++ b/src/exceptions/JsonFileException.php
@@ -0,0 +1,14 @@
+flatten($data);
+
+ // Process flattened data
+ foreach ($flattened as $key => $value) {
+ // Do something with key-value pairs
+ echo $key . ' -> ' . $value . "\n";
+ }
+
+ // Unflatten and write
+ $nested = $unflattener->unflatten($flattened);
+ FileHandler::writeJsonFile(__DIR__ . '/../resources/output.json', $nested);
+} catch (JsonFileException $e) {
+ die("JSON Error: " . $e->getMessage());
+}
+
+ROCratePreviewGenerator::generatePreview(__DIR__ . '/../resources');
+
+// Create new crate
+//$crate = new ROCrate(__DIR__ . '/../resources', false);
+
+// Add Metadata Descriptor
+//$crate->addProfile();
+
+// Add Root Data Entity
+//$root = $crate->getRootDataset();
+//$root->addProperty('name', 'My Research Project');
+//$root->addProperty('description', 'Example RO-Crate');
+
+/*
+$crate = new ROCrate(__DIR__ . '/../resources', true);
+$root = $crate->getRootDataset();
+
+$root->addPropertyPair("description", "Test Description", false);
+$root->addPropertyPair("license", "Test License", false);
+
+// Add Data Entity (creator)
+// Similar for Contextual Entity
+$author = new Person('#alice');
+$author->addProperty('name', 'Alice Smith');
+$author->addProperty('affiliation', 'University of Example 1');
+$crate->addEntity($author);
+$author = new Person('#bob');
+$author->addProperty('name', 'Bob');
+$author->addProperty('affiliation', 'University of Example 2');
+$crate->addEntity($author);
+$author->addPropertyPair('knows', '#alice', true)->addPropertyPair('knows', '#cathy');
+
+//$root->addProperty('creator', [['@id' => '#alice']]);
+//$root->addProperty('creator', [['@id' => '#alice'], ['@id' => '#bob']]);
+//$root->addPropertyPair('creator', '#alice', true)->addPropertyPair('creator', '#bob')
+// ->addPropertyPair('creator', '#cathy')->removePropertyPair("creator", "#alice")
+// ->addPropertyPair('creator', '#alice')->addPropertyPair('creator', '#bob');
+$root->addProperty('creator', [['@id' => '#cathy'], ['@id' => '#alice']])
+ ->removePropertyPair('creator', '#bob')->removePropertyPair('creator', '#cathy');
+
+$crate->addEntity($crate->createGenericEntity('Test ID', [])->addType("TestType"));
+
+$author->addPropertyPair("encodingFormat", "test/pdf", false)
+ ->addPropertyPair("encodingFormat", "TRY", true);//->removePropertyPair("encodingFormat", "TRY");
+*/
+
+//$crate->getEntity("data.csv")->removePropertyPair("license", "https://creativecommons.org/licenses/by-nc-sa/3.0/au/");
+
+//$crate->removeEntity($author->getId());
+
+// GigaDB testing example using dataset of id 102736
+$crate = new ROCrate(__DIR__ . '/../resources');
+$crate->addProfile();
+$crate->getDescriptor()->removePropertyPair("about", "./")
+ ->addPropertyPair("about", "https://gigadb.org/dataset/102736", true);
+
+$root = $crate->getRootDataset();
+
+$name = 'Supporting data for "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing."';
+$desc = 'Polyadenylation is a dynamic process which is important in cellular physiology. ' .
+'Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule ' .
+'and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) ' .
+'tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more ' .
+'recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of ' .
+'these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic ' .
+'RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the ' .
+'accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% ' .
+'of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run ' .
+'times, low coefficient of variation and ease of use with integration with base-calling.';
+$root->setId("https://gigadb.org/dataset/102736")
+ ->addPropertyPair("identifier", "https://doi.org/10.5524/102736", true)
+ ->addPropertyPair("cite-as", "https://doi.org/10.5524/102736", false)
+ ->addPropertyPair("name", $name, false)
+ ->addPropertyPair("description", $desc, false)
+ ->addPropertyPair("datePublished", "2025-07-29", false)
+ ->addPropertyPair("sdDatePublished", "2025-07-29", false)
+ ->addPropertyPair("publisher", "https://gigadb.org/", true)
+ ->addPropertyPair("sdPublisher", "https://gigadb.org/", true)
+ ->addPropertyPair("license", "https://creativecommons.org/publicdomain/zero/1.0/", true)
+ ->addPropertyPair("thumbnail", "https://assets.gigadb-cdn.net/live/images" .
+ "/datasets/32d9369e-500d-5347-8842-9fe46cdc3693/102736.png", true);
+
+$crate->addEntity(new File("https://assets.gigadb-cdn.net/live/images" .
+ "/datasets/32d9369e-500d-5347-8842-9fe46cdc3693/102736.png"));
+
+$parts = ["https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/" .
+"102001_103000/102736/readme_102736.txt", "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/" .
+"10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv",
+"https://gigadb.org/dataset/view/id/102736/Files_page/4", "#other-files"];
+foreach ($parts as $part) {
+ $root->addPropertyPair("hasPart", $part, true);
+}
+
+$fileOne = new File("https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/" .
+"102001_103000/102736/readme_102736.txt");
+$fileOne->addPropertyPair("name", "readme_102736.txt", false)
+ ->addPropertyPair("contentSize", "9.30 kB", false)
+ ->addPropertyPair("encodingFormat", "text/plain", false);
+$crate->addEntity($fileOne);
+$fileOne->addPropertyPair("exifData", "#oneExtra", true);
+$oneExtra = new ContextualEntity("#oneExtra", ["PropertyValue"]);
+$oneExtra->addPropertyPair("name", "Extra Information", false)
+ ->addPropertyPair("value", "Data Type: Readme, File Attributes: MD5 checksum: " .
+ "450ef019cf8ba58beb644ef18d1411d0", false);
+$crate->addEntity($oneExtra);
+
+$fileTwo = new File("https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/" .
+"10.5524/102001_103000/102736/boostnano_no_dorado_R1_tails.csv");
+$fileTwo->addPropertyPair("name", "boostnano_no_dorado_R1_tails.csv", false)
+ ->addPropertyPair("contentSize", "317.24 kB", false)
+ ->addPropertyPair("encodingFormat", "text/csv", false)
+ ->addPropertyPair("description", "PolyA tail lengths as found by Boostnano " .
+ "for R1 sequins which were filtered out by Dorado but kept by Boostnano; underlying data for figure 3", false);
+$crate->addEntity($fileTwo);
+$fileTwo->addPropertyPair("exifData", "#twoExtra", true);
+$twoExtra = new ContextualEntity("#twoExtra", ["PropertyValue"]);
+$twoExtra->addPropertyPair("name", "Extra Information", false)
+ ->addPropertyPair("value", "Data Type: Tabular data, File Attributes: MD5 checksum: " .
+ "97ee210d263c783e4ddfe20352831d60 Figure in MS: 3", false);
+$crate->addEntity($twoExtra);
+
+$zip = new Dataset("https://gigadb.org/dataset/view/id/102736/Files_page/4");
+$zip->addPropertyPair("name", "BoostNano-master", false)
+ ->addPropertyPair("description", "Archival copy of the GitHub repository " .
+ "https://github.com/haotianteng/BoostNano downloaded 18-July-2025. BoostNano, a tool for " .
+ "preprocessing ONT-Nanopore RNA sequencing reads.This project is licensed under the MPL 2.0 " .
+ "license. Please refer to the GitHub repo for most recent updates.", false)
+ ->addPropertyPair("distribution", "https://s3.ap-northeast-1.wasabisys.com/" .
+ "gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip", true)
+ ->addPropertyPair("releaseDate", "2025-07-23", false);
+$crate->addEntity($zip);
+$zipDist = new ContextualEntity("https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/" .
+"pub/10.5524/102001_103000/102736/BoostNano-master.zip", ["DataDownload"]);
+$zipDist->addPropertyPair("encodingFormat", "application/zip", false)
+ ->addPropertyPair("encodingFormat", "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263", true)
+ ->addPropertyPair("contentSize", "2.44 MB", false);
+$crate->addEntity($zipDist);
+$zip->addPropertyPair("exifData", "#zipExtra", true);
+$zipExtra = new ContextualEntity("#zipExtra", ["PropertyValue"]);
+$zipExtra->addPropertyPair("name", "Extra Information", false)
+ ->addPropertyPair("value", "Data Type: GitHub archive, File Attributes: MD5 checksum: " .
+ "4b4d2ce7259e5045d89b731b7bfcf730 SWH: swh:1:snp:ee789638699e0e33ca3b1d09da5bb1f485ea7c70 license: MPL 2.0", false);
+$crate->addEntity($zipExtra);
+
+$otherFiles = new Dataset("#other-files");
+$otherFiles->addPropertyPair("name", "other files", false)
+ ->addPropertyPair("description", "This dataset contains too many files that are not individually described", false);
+$crate->addEntity($otherFiles);
+
+$authors = ["https://orcid.org/0000-0001-9083-6757", "#Xuan_Yang",
+"https://orcid.org/0000-0003-0337-8722", "#Benjamin_Reames",
+"https://orcid.org/0000-0003-1155-0959", "https://orcid.org/0000-0002-4300-455X"];
+$affiliations = ["https://ror.org/01ej9dk98", "https://ror.org/01ej9dk98",
+"https://ror.org/05x2bcf33", "https://ror.org/01ej9dk98",
+"https://ror.org/01ej9dk98", "https://ror.org/01ej9dk98"];
+$names = ["Chang JJ", "Yang X", "Teng H", "Reames B", "Corbin V", "Coin LJM"];
+$affiliationNames = ["The University of Melbourne", "The University of Melbourne",
+"Carnegie Mellon University", "The University of Melbourne",
+"The University of Melbourne", "The University of Melbourne"];
+$idx = 0;
+$usedAffiliations = [];
+foreach ($authors as $author) {
+ $root->addPropertyPair("author", $author, true);
+
+ $person = new Person($author);
+ $person->addPropertyPair("affiliation", $affiliations[$idx], true)
+ ->addPropertyPair("name", $names[$idx], false);
+ $crate->addEntity($person);
+
+ if (in_array($affiliations[$idx], $usedAffiliations)) {
+ $idx++;
+ continue;
+ }
+ $org = new Organization($affiliations[$idx]);
+ $org->addPropertyPair("name", $affiliationNames[$idx], false);
+ $crate->addEntity($org);
+ $usedAffiliations[] = $affiliations[$idx];
+
+ $idx++;
+}
+
+$root->addPropertyPair("citation", "https://doi.org/10.5524/100425", true);
+$otherCrate = new Publication("https://doi.org/10.5524/100425", "CreativeWork");
+$otherCrate->addType("Dataset")
+ ->addPropertyPair("conformsTo", "https://w3id.org/ro/crate", true);
+$crate->addEntity($otherCrate);
+
+
+$root->addPropertyPair("funder", "https://ror.org/011kf5r70", true);
+$funder = new Organization("https://ror.org/011kf5r70");
+$funder->addPropertyPair("identifier", "https://ror.org/011kf5r70", false)
+ ->addPropertyPair("name", "National Health and Medical Research Council", false)
+ ->addPropertyPair("description", "Funding Body", false);
+$crate->addEntity($funder);
+$funder->addPropertyPair("exifData", "#awardee", true);
+$awardee = new ContextualEntity("#awardee", ["PropertyValue"]);
+$awardee->addPropertyPair("name", "Awardee", false)
+ ->addPropertyPair("value", "L Coin", false);
+$crate->addEntity($awardee);
+$funder->addPropertyPair("exifData", "#awardId", true);
+$awardId = new ContextualEntity("#awardId", ["PropertyValue"]);
+$awardId->addPropertyPair("name", "Award ID", false)
+ ->addPropertyPair("value", "GNT1195743", false);
+$crate->addEntity($awardId);
+
+$root->addPropertyPair("exifData", "#datasetTypes", true);
+$datasetTypes = new ContextualEntity("#datasetTypes", ["PropertyValue"]);
+$datasetTypes->addPropertyPair("name", "Dataset type", false)
+ ->addPropertyPair("value", "Epigenomic, Bioinformatics, Software, Transcriptomic", false);
+$crate->addEntity($datasetTypes);
+
+$root->addPropertyPair("keywords", 'oxford nanopore technologies, poly(a) tail, ' .
+'estimation, segmentation, direct rna sequencing', false)
+ ->addPropertyPair("about", "https://nanoporetech.com/", true);
+
+$keyword = new ContextualEntity("https://nanoporetech.com/", ["URL"]);
+$keyword->addPropertyPair("name", "oxford nanopore technologies", false);
+$crate->addEntity($keyword);
+
+$rootDoi = new ContextualEntity("https://doi.org/10.5524/102736", ["PropertyValue"]);
+$rootDoi->addPropertyPair("propertyID", "https://registry.identifiers.org/registry/doi", false)
+ ->addPropertyPair("value", "doi:10.5524/102736", false)
+ ->addPropertyPair("url", "https://doi.org/10.5524/102736", false);
+$crate->addEntity($rootDoi);
+
+$cc0 = new ContextualEntity("https://creativecommons.org/publicdomain/zero/1.0/", ["CreativeWork"]);
+$cc0->addPropertyPair("name", "Creative Commons Zero v1.0 Universal", false)
+ ->addPropertyPair("description", 'The person who associated a work with this deed has' .
+ ' dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under ' .
+ 'copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, ' .
+ 'modify, distribute and perform the work, even for commercial purposes, all without asking permission. ' .
+ 'See Other Information below.', false);
+$crate->addEntity($cc0);
+
+$gigaDB = new Organization("https://gigadb.org/");
+$gigaDB->addPropertyPair("name", "GigaScience DataBase", false)
+ ->addPropertyPair("description", "GigaDB is a data repository supporting scientific " .
+ "publications in the Life/Biomedical Sciences domain. GigaDB organises and curates data from individually " .
+ "publishable units into datasets, which are provided openly and in as FAIR manner as possible for the global " .
+ "research community.", false)
+ ->addPropertyPair("contactPoint", "mailto:database@gigasciencejournal.com", true);
+$crate->addEntity($gigaDB);
+
+$contact = new ContactPoint("mailto:database@gigasciencejournal.com");
+$contact->addPropertyPair("contactType", "contact of the publisher", false)
+ ->addPropertyPair("email", "database@gigasciencejournal.com", false)
+ ->addPropertyPair("identifier", "database@gigasciencejournal.com", false);
+$crate->addEntity($contact);
+
+$root->addPropertyPair("exifData", "#additionalInfo1", true);
+$addInfo = new ContextualEntity("#additionalInfo1", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://doi.org/10.26188/c.7767503.v1", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo2", true);
+$addInfo = new ContextualEntity("#additionalInfo2", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://registry.dome-ml.org/review/4ctuzhv3y5", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo3", true);
+$addInfo = new ContextualEntity("#additionalInfo3", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://archive.softwareheritage.org/swh:1:snp:" .
+ "1d8fdaa469108a2834a854d8249913d267fb9cfc", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo4", true);
+$addInfo = new ContextualEntity("#additionalInfo4", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://archive.softwareheritage.org/swh:1:snp:" .
+ "95b1531358ec75027da00fc8b539bce14188d30d", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo5", true);
+$addInfo = new ContextualEntity("#additionalInfo5", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://archive.softwareheritage.org/swh:1:snp:" .
+ "98b3a8996ab44283990fe707ffc44d45b2a61695", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo6", true);
+$addInfo = new ContextualEntity("#additionalInfo6", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://archive.softwareheritage.org/swh:1:snp:" .
+ "ee789638699e0e33ca3b1d09da5bb1f485ea7c70", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo7", true);
+$addInfo = new ContextualEntity("#additionalInfo7", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://scicrunch.org/resolver/RRID:SCR_026467", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#additionalInfo8", true);
+$addInfo = new ContextualEntity("#additionalInfo8", ["PropertyValue"]);
+$addInfo->addPropertyPair("name", "Additional information", false)
+ ->addPropertyPair("value", "https://bio.tools/boostnano", false);
+$crate->addEntity($addInfo);
+
+$root->addPropertyPair("exifData", "#githubLink1", true);
+$gitLink = new ContextualEntity("#githubLink1", ["PropertyValue"]);
+$gitLink->addPropertyPair("name", "Github links", false)
+ ->addPropertyPair("value", "https://github.com/haotianteng/BoostNano", false);
+$crate->addEntity($gitLink);
+
+$root->addPropertyPair("exifData", "#githubLink2", true);
+$gitLink = new ContextualEntity("#githubLink2", ["PropertyValue"]);
+$gitLink->addPropertyPair("name", "Github links", false)
+ ->addPropertyPair("value", "https://github.com/adnaniazi/tailfindr", false);
+$crate->addEntity($gitLink);
+
+$root->addPropertyPair("exifData", "#githubLink3", true);
+$gitLink = new ContextualEntity("#githubLink3", ["PropertyValue"]);
+$gitLink->addPropertyPair("name", "Github links", false)
+ ->addPropertyPair("value", "https://github.com/haotianteng/chiron", false);
+$crate->addEntity($gitLink);
+
+$root->addPropertyPair("exifData", "#githubLink4", true);
+$gitLink = new ContextualEntity("#githubLink4", ["PropertyValue"]);
+$gitLink->addPropertyPair("name", "Github links", false)
+ ->addPropertyPair("value", "https://github.com/jts/nanopolish", false);
+$crate->addEntity($gitLink);
+
+$root->addPropertyPair("exifData", "#accessions", true);
+$accessions = new ContextualEntity("#accessions", ["PropertyValue"]);
+$accessions->addPropertyPair("name", "Accessions (data not in GigaDB)", false)
+ ->addPropertyPair("value", "BioProject: PRJNA675370", false);
+$crate->addEntity($accessions);
+
+$root->addPropertyPair("exifData", "#history", true);
+$history = new ContextualEntity("#history", ["PropertyValue"]);
+$history->addPropertyPair("name", "History", false)
+ ->addPropertyPair("value", "Date: July 29, 2025, Action: Dataset publish", false);
+$crate->addEntity($history);
+
+$errMsg = $crate->saveWithErrorMessage();
+if ($errMsg !== []) {
+ foreach ($errMsg as $msg) {
+ echo "\n$msg";
+ }
+}
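
Reviewer note on the example script above: the `#additionalInfoN` and `#githubLinkN` blocks repeat the same three steps (reference from the root dataset, create a `PropertyValue`, add it to the crate). A data-driven loop is one way to shorten them. The sketch below is hedged: it works on plain arrays rather than the library's entity classes, and the helper name `buildPropertyValues` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: builds numbered PropertyValue entities plus the
 * matching {"@id": ...} references that the root dataset would carry.
 */
function buildPropertyValues(string $idPrefix, string $name, array $values): array
{
    $entities = [];
    $references = [];
    foreach (array_values($values) as $index => $value) {
        // IDs are numbered from 1, e.g. "#githubLink1".
        $id = '#' . $idPrefix . ($index + 1);
        $entities[] = [
            '@id' => $id,
            '@type' => ['PropertyValue'],
            'name' => $name,
            'value' => $value,
        ];
        $references[] = ['@id' => $id];
    }
    return ['entities' => $entities, 'references' => $references];
}

$github = buildPropertyValues('githubLink', 'Github links', [
    'https://github.com/haotianteng/BoostNano',
    'https://github.com/adnaniazi/tailfindr',
]);
echo json_encode($github['entities'], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

In the script itself the same loop body could call the existing `addPropertyPair` and `addEntity` methods instead of building arrays.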
diff --git a/src/rocrate/ContactPoint.php b/src/rocrate/ContactPoint.php
new file mode 100644
index 0000000..e3ad6e4
--- /dev/null
+++ b/src/rocrate/ContactPoint.php
@@ -0,0 +1,30 @@
+properties);
+ }
+}
diff --git a/src/rocrate/DataEntity.php b/src/rocrate/DataEntity.php
new file mode 100644
index 0000000..5d83557
--- /dev/null
+++ b/src/rocrate/DataEntity.php
@@ -0,0 +1,33 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Dataset.php b/src/rocrate/Dataset.php
new file mode 100644
index 0000000..eb8636c
--- /dev/null
+++ b/src/rocrate/Dataset.php
@@ -0,0 +1,30 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Descriptor.php b/src/rocrate/Descriptor.php
new file mode 100644
index 0000000..dc18903
--- /dev/null
+++ b/src/rocrate/Descriptor.php
@@ -0,0 +1,29 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Entity.php b/src/rocrate/Entity.php
new file mode 100644
index 0000000..2edecf8
--- /dev/null
+++ b/src/rocrate/Entity.php
@@ -0,0 +1,286 @@
+id = $id;
+ $this->types = $types;
+ }
+
+ /**
+ * Gets the ID of the entity instance
+ * @return string The ID string
+ */
+ public function getId(): string
+ {
+ return $this->id;
+ }
+
+ /**
+ * Sets the ID of the entity instance
+ * @param string $id
+ * @return Entity The entity instance itself
+ */
+ public function setId(string $id): Entity
+ {
+ $this->id = $id;
+ return $this;
+ }
+
+ /**
+ * Gets the type(s) of the entity instance
+ * @return array The type(s) as an array
+ */
+ public function getTypes(): array
+ {
+ return $this->types;
+ }
+
+ /**
+ * Sets the type(s) of the entity instance
+ * @param array $newTypes The new type(s)
+ * @return Entity The entity instance itself
+ */
+ public function setTypes(array $newTypes): Entity
+ {
+ $this->types = $newTypes;
+ return $this;
+ }
+
+ /**
+ * Adds a new type to the existing type(s) of the entity instance, or does nothing when the type already exists
+ * @param string $type The type to add
+ * @return Entity The entity instance itself
+ */
+ public function addType(string $type): Entity
+ {
+ if (!in_array($type, $this->types, true)) {
+ $this->types[] = $type;
+ }
+ return $this;
+ }
+
+ /**
+ * Removes a type from the existing type(s) of the entity instance, or does nothing when the type does not exist
+ * @param string $type The type to be removed
+ * @return Entity The entity instance itself
+ */
+ public function removeType(string $type): Entity
+ {
+ $key = array_search($type, $this->types, true);
+ if ($key !== false) {
+ // array_splice reindexes the list, so it still serializes as a JSON array
+ array_splice($this->types, $key, 1);
+ }
+ return $this;
+ }
+
+ /**
+ * Gets a property value of the entity instance given the key string
+ * @param string $key The key string
+ * @return mixed The value corresponding to the key or null if there is no such key
+ */
+ public function getProperty(string $key): mixed
+ {
+ return $this->properties[$key] ?? null;
+ }
+
+ /**
+ * Gets all the properties of the entity instance
+ * @return array The properties as an array of key-value pair(s)
+ */
+ public function getProperties(): array
+ {
+ return $this->properties;
+ }
+
+ /**
+ * Adds a new property to the entity instance, or overwrites the old property of the same key,
+ * i.e. sets the property
+ * @param string $key The key string of the new property
+ * @param mixed $value The value of the property
+ * @return Entity The entity instance itself
+ */
+ public function addProperty(string $key, $value): Entity
+ {
+ $this->properties[$key] = $value;
+ return $this;
+ }
+
+ /**
+ * Adds a new key-value pair to a property, or does nothing when the pair already exists in the property
+ * @param string $propertyKey The key string of the property
+ * @param mixed $value The value to be added to the property
+ * @param bool|null $flag True if the value is an @id reference, false if it is a literal,
+ * or null to follow the form of the existing values (does nothing if the property does not exist yet)
+ * @return Entity The entity instance itself
+ */
+ public function addPropertyPair(string $propertyKey, $value, ?bool $flag = null): Entity
+ {
+
+ if (array_key_exists($propertyKey, $this->properties)) {
+ if (!is_array($this->properties[$propertyKey])) {
+ return $this;
+ }
+ if ($this->properties[$propertyKey] === []) {
+ return $this;
+ }
+
+ if (!is_null($flag)) {
+ if ($flag) {
+ if (in_array(['@id' => $value], $this->properties[$propertyKey], true)) {
+ return $this;
+ }
+ $this->properties[$propertyKey][] = ['@id' => $value];
+ } else {
+ if (in_array($value, $this->properties[$propertyKey], true)) {
+ return $this;
+ }
+ $this->properties[$propertyKey][] = $value;
+ }
+ return $this;
+ }
+
+ if (!is_array($this->properties[$propertyKey][0])) {
+ if (in_array($value, $this->properties[$propertyKey], true)) {
+ return $this;
+ }
+ $this->properties[$propertyKey][] = $value;
+ } else {
+ if (in_array(['@id' => $value], $this->properties[$propertyKey], true)) {
+ return $this;
+ }
+ $this->properties[$propertyKey][] = ['@id' => $value];
+ }
+ } else {
+ if (is_null($flag)) {
+ return $this;
+ }
+ if ($flag) {
+ $this->addProperty($propertyKey, [['@id' => $value]]);
+ } else {
+ $this->addProperty($propertyKey, [$value]);
+ }
+ }
+
+ return $this;
+ }
+
+ /**
+ * Removes a property from the entity instance, or does nothing when the key does not exist
+ * @param string $key The key string of the property to remove
+ * @return Entity The entity instance itself
+ */
+ public function removeProperty(string $key): Entity
+ {
+ if (array_key_exists($key, $this->properties)) {
+ unset($this->properties[$key]);
+ }
+ return $this;
+ }
+
+ /**
+ * Removes a key-value pair from a property of the entity instance,
+ * or does nothing when either the key does not exist or there is no inner array
+ * @param string $propertyKey The key string of the property to remove
+ * @param mixed $value The value to be deleted of the property
+ * @return Entity The entity instance itself
+ */
+ public function removePropertyPair(string $propertyKey, $value): Entity
+ {
+
+ if (array_key_exists($propertyKey, $this->properties)) {
+ if (!is_array($this->properties[$propertyKey])) {
+ return $this;
+ }
+
+ if (array_search($value, $this->properties[$propertyKey]) !== false) {
+ // is literal
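+ unset($this->properties[$propertyKey][array_search($value, $this->properties[$propertyKey])]);
+ // reindex, as the ["@id" => "..."] branch does, so that index 0 stays valid
+ $this->properties[$propertyKey] = array_values($this->properties[$propertyKey]);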
+ if ($this->properties[$propertyKey] === []) {
+ $this->removeProperty($propertyKey);
+ }
+ } elseif (array_search(["@id" => $value], $this->properties[$propertyKey]) !== false) {
+ // is ["@id" => "..."]
+ $key = $propertyKey;
+ unset($this->properties[$key][array_search(["@id" => $value], $this->properties[$key])]);
+ $this->properties[$propertyKey] = array_values($this->properties[$propertyKey]);
+ if ($this->properties[$propertyKey] === []) {
+ $this->removeProperty($propertyKey);
+ }
+ }
+
+ /*
+ foreach($this->properties[$propertyKey] as $pair) {
+ // pair should be ['@id' => ...] We do not check
+ // or pair is string literal
+ if (!is_array($pair)) {
+ if ($pair == $value) {
+ unset($this->properties[$propertyKey][array_search($pair, $this->properties[$propertyKey])]);
+ if ($this->properties[$propertyKey] === []) $this->removeProperty($propertyKey);
+ break;
+ }
+ }
+ else if (array_key_exists('@id', $pair)) {
+ if ($pair['@id'] == $value) {
+ unset($this->properties[$propertyKey][array_search($pair, $this->properties[$propertyKey])]);
+ $this->properties[$propertyKey] = array_values($this->properties[$propertyKey]);
+ if ($this->properties[$propertyKey] === []) $this->removeProperty($propertyKey);
+ break;
+ }
+ }
+ }*/
+ }
+
+ return $this;
+ }
+
+ /**
+ * Sets the crate object to which the entity instance belongs
+ * @param \ROCrate\ROCrate $crate The crate object
+ * @return void
+ */
+ public function setCrate(ROCrate $crate): void
+ {
+ $this->crate = $crate;
+ }
+
+ /**
+ * Gets the information of the entity as an array for printing, debugging and inheritance; to be overridden by subclasses
+ * @return array The information array
+ */
+ abstract public function toArray(): array;
+
+ /**
+ * Gets the basic information of the entity as an array
+ * @return array{@id: string, @type: array} The array consisting of the ID and type(s) of the entity instance
+ */
+ protected function baseArray(): array
+ {
+ if ($this->types) {
+ // there is at least one type
+ return [
+ '@id' => $this->id,
+ '@type' => $this->types
+ ];
+ }
+ // types array is empty
+ return [
+ '@id' => $this->id
+ ];
+ }
+}
diff --git a/src/rocrate/File.php b/src/rocrate/File.php
new file mode 100644
index 0000000..d7bc8c0
--- /dev/null
+++ b/src/rocrate/File.php
@@ -0,0 +1,20 @@
+merge($data);
+ }
+
+ /**
+ * Merge data into the structure
+ * @param array $data Data to merge
+ */
+ public function merge(array $data): void
+ {
+ foreach ($data as $key => $value) {
+ $this->offsetSet($key, $value);
+ }
+ }
+
+ // ArrayAccess implementation
+ public function offsetSet($offset, $value): void
+ {
+ if (is_array($value)) {
+ $value = new self($value);
+ }
+
+ if ($offset === null) {
+ $this->data[] = $value;
+ } else {
+ $this->data[$offset] = $value;
+ }
+ }
+
+ public function offsetExists($offset): bool
+ {
+ return isset($this->data[$offset]);
+ }
+
+ public function offsetUnset($offset): void
+ {
+ unset($this->data[$offset]);
+ }
+
+ public function offsetGet($offset): mixed
+ {
+ return $this->data[$offset] ?? null;
+ }
+
+ // IteratorAggregate implementation
+ public function getIterator(): \Traversable
+ {
+ return new \ArrayIterator($this->data);
+ }
+
+ // Countable implementation
+ public function count(): int
+ {
+ return count($this->data);
+ }
+
+ // JsonSerializable implementation
+ public function jsonSerialize(): array
+ {
+ return $this->toArray();
+ }
+
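+ /**
+ * Convert to JSON string
+ * @param int $options JSON encoding options
+ * @return string JSON representation
+ * @throws \JsonException on JSON encode error
+ */
+ public function toJson(int $options = 0): string
+ {
+ // json_encode() returns false on failure, which would violate the string return type
+ return json_encode($this->jsonSerialize(), $options | JSON_THROW_ON_ERROR);
+ }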
+
+ /**
+ * Create from JSON string
+ * @param string $json JSON string
+ * @return self JsonData instance
+ * @throws \InvalidArgumentException on JSON decode error
+ */
+ public static function fromJson(string $json): self
+ {
+ $data = json_decode($json, true);
+
+ if (json_last_error() !== JSON_ERROR_NONE) {
+ throw new \InvalidArgumentException(
+ 'JSON decode error: ' . json_last_error_msg()
+ );
+ }
+
+ return new self($data);
+ }
+
+ /**
+ * Convert to plain PHP array
+ * @return array Plain array representation
+ */
+ public function toArray(): array
+ {
+ $result = [];
+
+ foreach ($this->data as $key => $value) {
+ if ($value instanceof self) {
+ $result[$key] = $value->toArray();
+ } else {
+ $result[$key] = $value;
+ }
+ }
+
+ return $result;
+ }
+
+ // Magic methods for object-style access
+ public function __get($name)
+ {
+ return $this->offsetGet($name);
+ }
+
+ public function __set($name, $value)
+ {
+ $this->offsetSet($name, $value);
+ }
+
+ public function __isset($name)
+ {
+ return $this->offsetExists($name);
+ }
+
+ public function __unset($name)
+ {
+ $this->offsetUnset($name);
+ }
+
+ /**
+ * String representation for debugging
+ * @return string JSON representation
+ */
+ public function __toString(): string
+ {
+ return $this->toJson();
+ }
+}
diff --git a/src/rocrate/Organization.php b/src/rocrate/Organization.php
new file mode 100644
index 0000000..5bbb1dd
--- /dev/null
+++ b/src/rocrate/Organization.php
@@ -0,0 +1,29 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Person.php b/src/rocrate/Person.php
new file mode 100644
index 0000000..0305499
--- /dev/null
+++ b/src/rocrate/Person.php
@@ -0,0 +1,29 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Place.php b/src/rocrate/Place.php
new file mode 100644
index 0000000..05ecf06
--- /dev/null
+++ b/src/rocrate/Place.php
@@ -0,0 +1,29 @@
+properties);
+ }
+}
diff --git a/src/rocrate/Publication.php b/src/rocrate/Publication.php
new file mode 100644
index 0000000..021a81a
--- /dev/null
+++ b/src/rocrate/Publication.php
@@ -0,0 +1,38 @@
+properties);
+ }
+}
diff --git a/src/rocrate/ROCrate.php b/src/rocrate/ROCrate.php
new file mode 100644
index 0000000..4d300e7
--- /dev/null
+++ b/src/rocrate/ROCrate.php
@@ -0,0 +1,1099 @@
+attached = $attachedFlag;
+ $this->preview = $previewFlag;
+
+ $this->basePath = realpath($directory) ?: $directory;
+ $this->graph = new Graph();
+ $this->httpClient = new Client();
+
+ RdfNamespace::set('rocrate', 'https://w3id.org/ro/crate/1.2');
+ RdfNamespace::set('schema', 'http://schema.org/');
+
+ if (!file_exists($this->basePath)) {
+ mkdir($this->basePath, 0755, true);
+ }
+
+ if ($loadExisting && file_exists($this->getMetadataPath())) {
+ $this->loadMetadata();
+ } else {
+ $this->initializeNewCrate();
+ }
+ }
+
+ /**
+ * Sets the context of the RO-Crate
+ * @param mixed $newContext The new context
+ * @return ROCrate The crate whose context is updated
+ */
+ public function setContext(mixed $newContext): ROCrate
+ {
+ $this->context = $newContext;
+ return $this;
+ }
+
+ /**
+ * Gets the path to the ro-crate metadata file
+ * @return string The path to the ro-crate metadata file as a string
+ */
+ private function getMetadataPath(): string
+ {
+ return $this->basePath . '/ro-crate-metadata.json';
+ }
+
+ /**
+ * Initializes a ro-crate instance
+ * @return void
+ */
+ private function initializeNewCrate(): void
+ {
+ $this->descriptor = new Descriptor();
+ $this->addEntity($this->descriptor);
+
+ $this->rootDataset = new Dataset();
+ $this->addEntity($this->rootDataset);
+
+ if ($this->preview) {
+ $this->website = new class ("ro-crate-preview.html", ["CreativeWork"]) extends ContextualEntity {
+ public function toArray(): array
+ {
+ return array_merge($this->baseArray(), $this->properties);
+ }
+ };
+ $this->website->addProperty("about", ["@id" => "./"]);
+ $this->addEntity($this->website);
+ }
+
+ // normalize every property value of each entity into a list, i.e. [...]
+ foreach ($this->entities as $entity) {
+ foreach (array_keys($entity->getProperties()) as $key) {
+ if (is_array($entity->getProperties()[$key])) {
+ $property = $entity->getProperties()[$key];
+ if (array_keys($property) !== range(0, count($property) - 1)) {
+ // a single {"@id" : "..."} reference, detected by checking whether $property is an associative array
+ $entity->addProperty($key, [$entity->getProperties()[$key]]);
+ }
+ // else already [...]
+ } else {
+ // literal
+ $entity->addProperty($key, [$entity->getProperties()[$key]]);
+ }
+ }
+ }
+ }
+
+ /**
+ * Reads and loads the existing ro-crate file as an instance
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return void
+ */
+ public function loadMetadata(): void
+ {
+ $path = $this->getMetadataPath();
+
+ if (!file_exists($path)) {
+ throw new ROCrateException("Metadata file not found: $path");
+ }
+
+ try {
+ $json = json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);
+ } catch (\JsonException $e) {
+ throw new ROCrateException("Invalid JSON in metadata: " . $e->getMessage());
+ }
+
+ $this->descriptor = new Descriptor();
+ $this->addEntity($this->descriptor);
+ $this->addProfile();
+
+ // Set context
+ $this->context = $json['@context'] ?? $this->context;
+
+ // Parse entities
+ $rootId = './';
+ foreach ($json['@graph'] ?? [] as $entityData) {
+ $conditionOne = str_contains($entityData['@id'] ?? '', "ro-crate-metadata.json");
+ $conditionTwo = array_key_exists("conformsTo", $entityData);
+ if ($conditionOne && $conditionTwo) {
+ $conformsTo = $entityData["conformsTo"]["@id"];
+ $rootId = $entityData['about']['@id'];
+ $this->addProfile($conformsTo, $rootId);
+ continue;
+ }
+ $this->addEntityFromArray($entityData);
+ }
+
+ // Find root dataset
+ $this->rootDataset = $this->getEntity($rootId);
+ /*
+ foreach ($this->entities as $entity) {
+ if (in_array('Dataset', $entity->getTypes()) && ($entity->getId() === $rootId)) {
+ $this->rootDataset = new Dataset($rootId);
+ foreach ($entity->getTypes() as $type) $this->rootDataset->addType($type);
+ foreach ($entity->getProperties() as $key => $val) $this->rootDataset->addProperty($key, $val);
+ break;
+ }
+ }*/
+
+ // Find preview if it exists
+ $this->website = $this->getEntity("ro-crate-preview.html");
+ /*
+ if ($this->preview) {
+ foreach ($this->entities as $entity) {
+ if (in_array('CreativeWork', $entity->getTypes()) && ($entity->getProperty("about")["@id"] === $rootId)
+ && (!array_key_exists("conformsTo", $entity->getProperties()))) {
+ $this->website = $this->getEntity("ro-crate-preview.html");
+ $this->website = new class("ro-crate-preview.html", ["CreativeWork"]) extends ContextualEntity {
+ public function toArray(): array {
+ return array_merge($this->baseArray(), $this->properties);
+ }
+ };
+ foreach ($entity->getTypes() as $type) $this->website->addType($type);
+ foreach ($entity->getProperties() as $key => $val) $this->website->addProperty($key, $val);
+ break;
+ }
+ }
+ }*/
+
+ if (!$this->descriptor) {
+ throw new ROCrateException("Metadata descriptor not found in crate");
+ }
+
+ if (!$this->rootDataset) {
+ throw new ROCrateException("Root dataset not found in crate");
+ }
+
+ // normalize every property value of each entity into a list, i.e. [...]
+ foreach ($this->entities as $entity) {
+ foreach (array_keys($entity->getProperties()) as $key) {
+ if (is_array($entity->getProperties()[$key])) {
+ $property = $entity->getProperties()[$key];
+ if (array_keys($property) !== range(0, count($property) - 1)) {
+ // a single {"@id" : "..."} reference, detected by checking whether $property is an associative array
+ $entity->addProperty($key, [$entity->getProperties()[$key]]);
+ }
+ // else already [...]
+ } else {
+ // literal
+ $entity->addProperty($key, [$entity->getProperties()[$key]]);
+ }
+ }
+ }
+ }
+
+ /**
+ * Adds entities to the crate given an array
+ * @param array $data The given array
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return ROCrate The crate to which the entity is added
+ */
+ private function addEntityFromArray(array $data): ROCrate
+ {
+ $id = $data['@id'] ?? null;
+ $types = (array)($data['@type'] ?? []);
+
+ if (!$id) {
+ throw new ROCrateException("Entity missing @id property");
+ }
+
+ if (empty($types)) {
+ throw new ROCrateException("Entity missing @type property: $id");
+ }
+
+ $entity = $this->createGenericEntity($id, $types);
+
+ // Set properties
+ foreach ($data as $key => $value) {
+ if (!in_array($key, ['@id', '@type'])) {
+ $entity->addProperty($key, $value);
+ }
+ }
+
+ $this->addEntity($entity);
+
+ return $this;
+ }
+
+ /**
+ * Creates a generic entity
+ * @param string $id The ID of the entity
+ * @param array $types The type(s) of the entity as an array
+ * @return Entity The entity instance
+ */
+ public function createGenericEntity(string $id, array $types): Entity
+ {
+ return new class ($id, $types) extends Entity {
+ public function toArray(): array
+ {
+ return array_merge($this->baseArray(), $this->properties);
+ }
+ };
+ }
+
+ /**
+ * Adds an entity to the crate given an entity instance
+ * @param Entity $entity The given entity instance
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return ROCrate The crate to which the entity is added
+ */
+ public function addEntity(Entity $entity): ROCrate
+ {
+ $id = $entity->getId();
+
+ if (isset($this->entities[$id])) {
+ throw new ROCrateException("Entity with ID $id already exists");
+ }
+
+ $entity->setCrate($this);
+ $this->entities[$id] = $entity;
+ return $this;
+ }
+
+ /**
+ * Gets an entity instance with its ID from the crate
+ * @param string $id The ID of the entity instance to retrieve
+ * @return Entity|null The entity instance or null if the ID is invalid
+ */
+ public function getEntity(string $id): ?Entity
+ {
+ return $this->entities[$id] ?? null;
+ }
+
+ /**
+ * Removes an entity from the crate with its ID
+ * @param string $id The ID of the entity to remove
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return ROCrate The crate from which the entity is deleted
+ */
+ public function removeEntity(string $id): ROCrate
+ {
+ if (!isset($this->entities[$id])) {
+ throw new ROCrateException("Entity not found: $id");
+ }
+ unset($this->entities[$id]);
+ return $this;
+ }
+
+ /**
+ * Validates the crate before saving with minimal checks only
+ * @return string[] The error(s) or issue(s) found during the validation as a string array
+ */
+ public function validate(): array
+ {
+ $errors = [];
+
+ // MUST Checks
+ // RO-Crate Structure
+
+ // 1. Metadata descriptor check
+ if (!$this->descriptor) {
+ $errors[] = "Missing metadata descriptor";
+ }
+
+ // 2. Root dataset check
+ if (!$this->rootDataset) {
+ $errors[] = "Missing root dataset";
+ }
+
+ // Metadata of the RO-Crate
+
+ // 1. @id check
+ foreach ($this->entities as $entity) {
+ if (!is_string($entity->getId())) {
+ $errors[] = "There is an entity without an id.";
+ }
+ }
+
+ // 2. @type check
+ foreach ($this->entities as $entity) {
+ if ($entity->getTypes() === []) {
+ $errors[] = "There is an entity without a type using id: " . $entity->getId() . ".";
+ }
+ }
+
+ // 3. entity property references to other entities using {"@id": "..."} check
+ // newly created property managed using addPropertyPair and removePropertyPair automatically satisfy
+ // the only exceptions that require manual manipulation are encodingFormat
+ // due to mixed use of literal and reference
+ // and context with extra terms
+ // old property imported either:
+ // case 1: reset it using removeProperty and manage using ...Pair
+ // case 2: the developers have to be cautious for any change
+
+ // 4. flat @graph list is enforced in the implementation
+
+ // Root Data Entity
+
+ // 1. @id value of the descriptor has to be "ro-crate-metadata.json"
+ // or "ro-crate-metadata.jsonld" (legacy from v1.0 or before) check
+ // This is needed even if the actual metadata file may be absent or have a prefix in a detached package.
+ $conditionOne = (strcmp($this->descriptor->getId(), "ro-crate-metadata.json") !== 0);
+ $conditionTwo = (strcmp($this->descriptor->getId(), "ro-crate-metadata.jsonld") !== 0);
+ if ($conditionOne && $conditionTwo) {
+ $errors[] = "The descriptor's id is invalid.";
+ }
+
+ // 2. @type of the descriptor has to be CreativeWork check
+ if (count($this->descriptor->getTypes()) == 1) {
+ if (strcmp($this->descriptor->getTypes()[0], "CreativeWork") !== 0) {
+ $errors[] = "The descriptor's type is invalid.";
+ }
+ } else {
+ $errors[] = "The descriptor's type is invalid.";
+ }
+
+ // 3. The descriptor has an about property and it references the Root Data Entity's @id check
+ if (array_key_exists("about", $this->descriptor->getProperties())) {
+ $about = $this->descriptor->getProperty("about");
+ $aboutId = is_array($about) ? ($about['@id'] ?? null) : null;
+ if ($aboutId !== $this->rootDataset->getId()) {
+ $errors[] = "The descriptor's about property is invalid.";
+ }
+ } else {
+ $errors[] = "The descriptor does not have an about property.";
+ }
+
+ // 4. One of the root data entity's type(s) has to be Dataset check
+ if (!in_array("Dataset", $this->rootDataset->getTypes())) {
+ $errors[] = "The root data entity's type is invalid.";
+ }
+
+ // 5. The root data entity has to have the property name check
+ if (!array_key_exists("name", $this->rootDataset->getProperties())) {
+ $errors[] = "The root data entity does not have a name property.";
+ }
+
+ // 6. The root data entity has to have the property description check
+ if (!array_key_exists("description", $this->rootDataset->getProperties())) {
+ $errors[] = "The root data entity does not have a description property.";
+ }
+
+ // 7. The root data entity has to have the property datePublished check, and the property
+ // has to be a string in ISO 8601 date format
+ if (!array_key_exists("datePublished", $this->rootDataset->getProperties())) {
+ $errors[] = "The root data entity does not have a datePublished property.";
+ } elseif (is_string($this->rootDataset->getProperty("datePublished"))) {
+ if (!ROCrate::isValidISO8601Date($this->rootDataset->getProperty("datePublished"))) {
+ $errors[] = "The root data entity's datePublished property is not in ISO 8601 date format.";
+ }
+ } else {
+ $errors[] = "The root data entity's datePublished property is not a string.";
+ }
+
+ // 8. The root data entity has to have the property license check
+ if (!array_key_exists("license", $this->rootDataset->getProperties())) {
+ $errors[] = "The root data entity does not have a license property.";
+ }
+
+ // Data Entities
+
+ // 1. all files and folders as data entities have to be indirectly or directly linked to the root data
+ // entity via hasPart
+ // This is not possible to check for detached package without knowing how to access the details.
+ // For attached package, since contextual entity of type Dataset with # local identifier to collectively
+ // describe a bunch of files is possible and it has no strict criteria for its use, it is also not
+ // possible to check.
+
+ // 2. a file as a data entity has File as one of its type(s)
+ // This is satisfied if it is created using new File
+ // This cannot be strictly enforced for the same reason as above
+
+ // 3. @id have to be valid URI references
+ //foreach($this->entities as $entity) {
+ // if (ROCrate::isValidUri($entity->getId(), false)) {
+ // $errors[] = "The entity's id (" . $entity->getId() . ") is not a valid URI.";
+ // }
+ //}
+
+ // 4. file data entity @id relative or absolute URI
+
+ // 5. Dataset data entity has Dataset as one of its type(s)
+ // This is satisfied if it is created using new Dataset
+ // Difficult to strictly enforce for the same reason as above
+
+ // 6. Dataset data entity @id has to resolve to a directory present in the crate root for an attached package
+ // Difficult to strictly enforce
+
+ // 7.
+
+ // 8.
+
+ // Contextual Entities
+
+ // 1. Contextual data entity as a standalone object
+ // automatically enforced by treating things as entity instances
+
+ // 2. no repeated use of @id
+ $idArray = [];
+ foreach ($this->entities as $entity) {
+ $idArray[] = $entity->getId();
+ }
+ $uniqueIdArray = array_unique($idArray);
+ if (sizeof($idArray) !== sizeof($uniqueIdArray)) {
+ $errors[] = "There are multiple entities using the same @id value.";
+ }
+
+ // 3. The crate metadata file needs a URL as the @id of a publication using citation property if we
+ // want to associate a publication with the dataset
+ // It relies on disciplined use.
+
+ // 4. subjects & keywords
+ // It relies on disciplined use.
+
+ // 5. include thumbnail if have
+ // It relies on disciplined use.
+
+ // 6. put thumbnail in the bagit manifest if it is present and it is a bagged ro-crate
+ // It relies on disciplined use.
+
+ // Provenance of entities
+
+ // 1. A curation action, i.e. type of CreateAction or UpdateAction, has at least one object check
+ foreach ($this->entities as $entity) {
+ if (in_array("CreateAction", $entity->getTypes()) || in_array("UpdateAction", $entity->getTypes())) {
+ if (!array_key_exists("object", $entity->getProperties())) {
+ $errors[] = "There is no object property for a curation action.";
+ }
+ }
+ }
+
+ // 2. An action's endTime has to be in ISO 8601 date format if this property is present, same for startTime
+ foreach ($this->entities as $entity) {
+ if (in_array("CreateAction", $entity->getTypes()) || in_array("UpdateAction", $entity->getTypes())) {
+ // check startTime and endTime independently, so a missing startTime does not skip endTime
+ foreach (["startTime", "endTime"] as $timeKey) {
+ if (!array_key_exists($timeKey, $entity->getProperties())) {
+ continue;
+ }
+ $time = $entity->getProperty($timeKey);
+ if (!is_string($time) || !ROCrate::isValidISO8601Date($time)) {
+ $errors[] = "An action's $timeKey property is not in ISO 8601 date format.";
+ }
+ }
+ }
+ }
+
+ // 3. if an action has an actionStatus property, the property has to be ActiveActionStatus,
+ // CompletedActionStatus, FailedActionStatus or PotentialActionStatus of type ActionStatusType
+ foreach ($this->entities as $entity) {
+ if (in_array("CreateAction", $entity->getTypes()) || in_array("UpdateAction", $entity->getTypes())) {
+ if (!array_key_exists("actionStatus", $entity->getProperties())) {
+ continue;
+ }
+ $actionStatus = $entity->getProperty("actionStatus")["@id"];
+ $validStatuses = [
+ "http://schema.org/ActiveActionStatus",
+ "https://schema.org/ActiveActionStatus",
+ "http://schema.org/CompletedActionStatus",
+ "https://schema.org/CompletedActionStatus",
+ "http://schema.org/FailedActionStatus",
+ "https://schema.org/FailedActionStatus",
+ "http://schema.org/PotentialActionStatus",
+ "https://schema.org/PotentialActionStatus"
+ ];
+
+ if (!in_array($actionStatus, $validStatuses)) {
+ $errors[] = "An action's actionStatus property is invalid.";
+ }
+ }
+ }
+
+ // Profiles
+
+ // 1. The profile URI, i.e. the reference of conformsTo property of the root data entity, resolves
+ // to a human-readable profile description
+ // It relies on disciplined use.
+
+ // 2. If the root data entity conforms to a profile, it has to be a contextual entity having Profile
+ // as one of its type(s), similarly for multiple profiles
+ if (array_key_exists("conformsTo", $this->rootDataset->getProperties())) {
+ $profiles = $this->rootDataset->getProperty("conformsTo");
+ // a single {"@id": "..."} reference (or a literal) is wrapped so that both cases share one loop
+ if (!is_array($profiles) || array_key_exists("@id", $profiles)) {
+ $profiles = [$profiles];
+ }
+ foreach ($profiles as $profile) {
+ $profileId = is_array($profile) ? ($profile["@id"] ?? null) : $profile;
+ $flag = true;
+ foreach ($this->entities as $entity) {
+ if ($entity->getId() === $profileId && in_array("Profile", $entity->getTypes())) {
+ $flag = false;
+ break;
+ }
+ }
+ if ($flag) {
+ $errors[] = "The contextual entity for a profile is missing.";
+ }
+ }
+ }
+
+ // 3. if it is a profile crate, it has Profile as one of its type(s)
+ // It relies on disciplined use.
+
+ // 4. if it is a profile crate, its hasPart references the human-readable profile description as a data entity,
+ // and this data entity has to reference the absolute URI of the root data entity of the profile crate
+ // using the about property
+ // It relies on disciplined use.
+
+ // 5. any terms defined in the profile has to be used as full URIs matching @id
+ // or mapped to these URIs from the conforming crate's
+ // @context in the conforming crate.
+ // It relies on disciplined use.
+
+ // 6. An entity representing a JSON-LD context has to have an encodingFormat of application/ld+json and
+ // has an absolute URI as @id retrievable as JSON-LD directly or indirectly
+ // It relies on disciplined use.
+
+ // Workflows and scripts
+
+ // 1. script and workflow type, id and name
+ // It relies on disciplined use.
+
+ // 2. If a contextual entity has type ComputerLanguage and/or SoftwareApplication,
+ // it has a name, url and version
+ foreach ($this->entities as $entity) {
+ $conditionOne = in_array("ComputerLanguage", $entity->getTypes());
+ $conditionTwo = in_array("SoftwareApplication", $entity->getTypes());
+ if ($conditionOne || $conditionTwo) {
+ if (!array_key_exists("name", $entity->getProperties())) {
+ $errors[] = 'The name property for the contextual entity of type ComputerLanguage ' .
+ 'and/or SoftwareApplication is missing.';
+ }
+ if (!array_key_exists("url", $entity->getProperties())) {
+ $errors[] = 'The url property for the contextual entity of type ComputerLanguage ' .
+ 'and/or SoftwareApplication is missing.';
+ }
+ if (!array_key_exists("version", $entity->getProperties())) {
+ $errors[] = 'The version property for the contextual entity of type ComputerLanguage ' .
+ 'and/or SoftwareApplication is missing.';
+ }
+ }
+ }
+
+ // 3. complying with the Bioschemas computational workflow profile
+ // Difficult to check and less generic to check for a particular profile
+
+ // 4. complying with the Bioschemas formal parameter profile
+ // same as above
+
+ // Changelog
+
+ // 1. The descriptor has conformsTo to indicate RO-Crate version
+ if (!array_key_exists("conformsTo", $this->descriptor->getProperties())) {
+ $errors[] = "The conformsTo property for the descriptor is missing.";
+ }
+
+ // Handling relative URI references
+
+ // 1. When we have to parse as RDF, if ro-crate-metadata.json is not recognised, we rename it to jsonld
+ // it relies on disciplined use
+
+ // Implementation notes
+
+ // 1. Bagit enforcement
+ // It relies on disciplined use.
+
+ // RO-Crate JSON-LD
+
+ // 1. / and escape character care (should: utf-8 encoded, i.e. #, space, ... encoded with %)
+ // It relies on disciplined use.
+
+ // 2. if (present) generate ro-crate website, use sameAs for the term.
+ // It relies on disciplined use.
+
+ // 3. if there is extra / ad-hoc term / vocab, put them in context.
+ // It relies on disciplined use.
+
+ return $errors;
+ }
+
+ /**
+ * Saves the crate object as a ro-crate metadata file
+ * @param string|null $path The path to save the crate object if the default base path is not used
+ * @param string $prefix The prefix of the metadata file, needed for a detached ro-crate package
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return void
+ */
+ public function save(?string $path = null, string $prefix = ""): void
+ {
+ $this->errors = [];
+
+ // unwrap every property value of each entity from its list [...] when it holds
+ // only a single literal or {"@id" : "..."}; hasPart always stays a list
+ foreach ($this->entities as $entity) {
+ foreach (array_keys($entity->getProperties()) as $key) {
+ if (strcmp($key, "hasPart") == 0) {
+ continue;
+ }
+ // safety check that the value is an array
+ if (is_array($entity->getProperty($key))) {
+ if (!array_key_exists('@id', $entity->getProperty($key))) {
+ // safety check that the value is a list rather than a single {"@id" : "..."}
+ if (count($entity->getProperty($key)) === 1) {
+ // there is only a single item
+ $entity->addProperty($key, $entity->getProperty($key)[0]);
+ //$this->printNestedArray($this->descriptor->getProperties());
+ }
+ }
+ }
+ }
+ }
+
+
+ if (!$this->attached) {
+ if (strcmp($prefix, "") == 0) {
+ throw new ROCrateException("The prefix cannot be empty for a detached RO-Crate Package.");
+ }
+ }
+
+ $this->errors = $this->validate();
+ if ($this->errors !== []) {
+ throw new ROCrateException("Validation before saving failed.");
+ }
+
+ $target = $path ? realpath($path) : $this->basePath;
+
+ if (!$target) {
+ throw new ROCrateException("Invalid target directory: $path");
+ }
+
+ // Ensure metadata directory exists
+ if (!is_dir($target) && !mkdir($target, 0755, true)) {
+ throw new ROCrateException("Failed to create directory: $target");
+ }
+
+ // Generate JSON-LD
+ $rootId = "";
+ $first = [];
+ $second = [];
+ $last = [];
+
+ $graph = [];
+ foreach ($this->entities as $entity) {
+ $conditionOne = str_contains($entity->getId(), "ro-crate-metadata.json");
+ $conditionTwo = array_key_exists("conformsTo", $entity->getProperties());
+ if ($conditionOne && $conditionTwo) {
+ $rootId = $entity->getProperty("about")["@id"];
+ $first[] = $entity->toArray();
+ $key = array_search($entity, $this->entities, true);
+ unset($this->entities[$key]);
+ break;
+ }
+ }
+
+ foreach ($this->entities as $entity) {
+ $conditionOne = in_array('CreativeWork', $entity->getTypes());
+ $conditionTwo = (!array_key_exists("conformsTo", $entity->getProperties()));
+ if ($conditionOne && $conditionTwo) {
+ if (!array_key_exists("about", $entity->getProperties())) {
+ continue;
+ }
+ if (!($entity->getProperty("about")["@id"] === $rootId)) {
+ continue;
+ }
+
+ $first[] = $entity->toArray();
+ $key = array_search($entity, $this->entities, true);
+ unset($this->entities[$key]);
+ break;
+ }
+ }
+
+ foreach ($this->entities as $entity) {
+ if (in_array('Dataset', $entity->getTypes()) && ($entity->getId() === $rootId)) {
+ $first[] = $entity->toArray();
+ continue;
+ }
+
+ if (in_array("Dataset", $entity->getTypes()) && (strcmp($entity->getId()[0], '#') !== 0)) {
+ $second[] = $entity->toArray();
+ continue;
+ } elseif (in_array("File", $entity->getTypes()) && (strcmp($entity->getId()[0], '#') !== 0)) {
+ $second[] = $entity->toArray();
+ continue;
+ }
+
+ $last[] = $entity->toArray();
+ }
+
+ $graph = array_merge($graph, $first);
+ $graph = array_merge($graph, $second);
+ $graph = array_merge($graph, $last);
+
+ $metadata = [
+ '@context' => $this->context,
+ '@graph' => $graph
+ ];
+
+ // Save metadata file
+ try {
+ $json = json_encode($metadata, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
+ } catch (JsonException $e) {
+ throw new ROCrateException("JSON encoding failed: " . $e->getMessage());
+ }
+ if (strcmp($prefix, "") == 0) {
+ file_put_contents($target . '/ro-crate-metadata.json', $json);
+ } else {
+ file_put_contents($target . '/' . $prefix . '-ro-crate-metadata.json', $json);
+ }
+ }
+
+ /**
+ * Gets the metadata descriptor instance from the crate
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return Entity The metadata descriptor instance (an exception is thrown if it does not exist)
+ */
+ public function getDescriptor(): Entity
+ {
+ if (!$this->descriptor) {
+ throw new ROCrateException("Metadata descriptor not initialized");
+ }
+ return $this->descriptor;
+ }
+
+ /**
+ * Gets the root data entity instance from the crate
+ * @throws \Exceptions\ROCrateException Exceptions with specific messages to indicate possible errors
+ * @return Entity The root data entity instance (an exception is thrown if it does not exist)
+ */
+ public function getRootDataset(): Entity
+ {
+ if (!$this->rootDataset) {
+ throw new ROCrateException("Root dataset not initialized");
+ }
+ return $this->rootDataset;
+ }
+
+ /**
+ * Adds a metadata descriptor to the crate
+ * @param string $profile The ro-crate standard used with the specific version
+ * @param string $about The dataset to describe
+ * @return void
+ */
+ public function addProfile(string $profile = 'https://w3id.org/ro/crate/1.2', string $about = './'): void
+ {
+ $this->descriptor->addPropertyPair('conformsTo', $profile, true);
+ $this->descriptor->addPropertyPair('about', $about, true);
+ }
+
+ /**
+ * Sets the base path
+ * @param string $basePath The base path as a string
+ * @return void
+ */
+ public function setBasePath(string $basePath): void
+ {
+ $this->basePath = $basePath;
+ }
+
+ /**
+ * Tells if a text is valid according to the ISO 8601 standard; date-only values given up to the year,
+ * up to the month or up to the day are also accepted for more flexibility and compatibility
+ * @param string $dateString The text to be validated
+ * @return bool The flag that indicates the result of validation
+ */
+ public static function isValidISO8601Date(string $dateString): bool
+ {
+
+ $MM = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"];
+
+ if (strcmp(substr($dateString, 0, 1), "-") == 0) {
+ $dateString = substr($dateString, 1);
+ }
+
+ $year = "";
+ $month = "";
+ $day = "";
+
+ // For date-only strings (YYYY, YYYY-MM, YYYY-MM-DD) this does not check whether
+ // day 29, 30 or 31 actually exists in the given month
+ $flag = true;
+ switch (strlen($dateString)) {
+ case 4:
+ $year = $dateString;
+ if (!ctype_digit($year)) {
+ $flag = false;
+ }
+ break;
+ case 7:
+ if (strcmp(substr($dateString, 4, 1), "-") !== 0) {
+ $flag = false;
+ }
+ $year = substr($dateString, 0, 4);
+ $month = substr($dateString, 5, 2);
+ if (!ctype_digit($year)) {
+ $flag = false;
+ } elseif (!in_array($month, $MM)) {
+ $flag = false;
+ }
+ break;
+ case 10:
+ if (strcmp(substr($dateString, 4, 1), "-") !== 0) {
+ $flag = false;
+ } elseif (strcmp(substr($dateString, 7, 1), "-") !== 0) {
+ $flag = false;
+ }
+ $year = substr($dateString, 0, 4);
+ $month = substr($dateString, 5, 2);
+ $day = substr($dateString, 8, 2);
+ if (!ctype_digit($year)) {
+ $flag = false;
+ } elseif (!in_array($month, $MM)) {
+ $flag = false;
+ } elseif (!in_array($day, $MM)) {
+ if (!ctype_digit($day)) {
+ $flag = false;
+ } elseif (((int)$day < 13) || ((int)$day > 31)) {
+ $flag = false;
+ }
+ }
+ break;
+ default:
+ $flag = false;
+ break;
+ }
+ if ($flag) {
+ return true;
+ }
+
+
+ // Regex to match the structure: optional minus, date, time, and optional timezone
+ $pattern = '/^(-)?(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(Z|([+-]\d{2}:\d{2}))?$/';
+ if (!preg_match($pattern, $dateString, $matches)) {
+ return false;
+ }
+
+ // Extract components from matches
+ $hasMinus = ($matches[1] === '-');
+ $yearStr = $matches[2];
+ $month = $matches[3];
+ $day = $matches[4];
+ $hour = $matches[5];
+ $minute = $matches[6];
+ $second = $matches[7];
+ $timezone = $matches[8] ?? '';
+
+ // Convert year to integer (accounting for optional minus)
+ $year = (int)$yearStr;
+ if ($hasMinus) {
+ $year = -$year;
+ }
+
+ // Validate month (01-12)
+ $monthInt = (int)$month;
+ if ($monthInt < 1 || $monthInt > 12) {
+ return false;
+ }
+
+ // Validate day based on month/year
+ $daysInMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
+ // Adjust February days for leap years
+ if ($monthInt === 2) {
+ $isLeap = ($year % 4 === 0) && ($year % 100 !== 0 || $year % 400 === 0);
+ $daysInMonth[1] = $isLeap ? 29 : 28;
+ }
+ $dayInt = (int)$day;
+ $maxDay = $daysInMonth[$monthInt - 1];
+ if ($dayInt < 1 || $dayInt > $maxDay) {
+ return false;
+ }
+
+ // Validate time (00-23 for hour, 00-59 for minute/second)
+ if ($hour < '00' || $hour > '23') {
+ return false;
+ }
+ if ($minute < '00' || $minute > '59') {
+ return false;
+ }
+ if ($second < '00' || $second > '59') {
+ return false;
+ }
+
+ // Validate timezone if present
+ if ($timezone !== '') {
+ if ($timezone === 'Z') {
+ return true; // UTC is valid
+ }
+ // Check offset structure: [+-]hh:mm
+ $offsetParts = explode(':', $timezone);
+ if (count($offsetParts) !== 2) {
+ return false;
+ }
+ $offsetSignPart = $offsetParts[0];
+ $offsetMinute = $offsetParts[1];
+ // Validate sign and hour part (00-23)
+ $sign = $offsetSignPart[0];
+ $offsetHour = substr($offsetSignPart, 1);
+ if (
+ ($sign !== '+' && $sign !== '-') ||
+ $offsetHour < '00' || $offsetHour > '23'
+ ) {
+ return false;
+ }
+ // Validate minute part (00-59)
+ if ($offsetMinute < '00' || $offsetMinute > '59') {
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /**
+ * Tells if a string is a valid uri
+ * @param string $uri The uri string to be examined
+ * @param bool $absoluteOnly The flag to indicate whether only absolute uri is allowed
+ * @return bool The flag that indicates the validation result
+ */
+ public static function isValidUri(string $uri, bool $absoluteOnly = true): bool
+ {
+ // Validate absolute URI (requires a scheme like http, ftp, etc.)
+ if (filter_var($uri, FILTER_VALIDATE_URL) !== false) {
+ return true;
+ }
+
+ // If only absolute URIs are allowed, return false here
+ if ($absoluteOnly) {
+ return false;
+ }
+
+ // Validate relative URI: checks for allowed characters and proper percent-encoding
+ $pattern = '/^([a-zA-Z0-9._~!$&\'()*+,;=:@\/?#\[\]-]|%[0-9a-fA-F]{2})*$/';
+ return (bool) preg_match($pattern, $uri);
+ }
+
+ /**
+ * Tells if a string is a valid url
+ * @param string $url The url string to be examined
+ * @return bool The flag that indicates the checking result
+ */
+ public static function isValidUrl(string $url): bool
+ {
+ // Validate URL structure using filter_var
+ if (filter_var($url, FILTER_VALIDATE_URL) === false) {
+ return false;
+ }
+
+ // Ensure the URL has a valid scheme
+ $scheme = parse_url($url, PHP_URL_SCHEME);
+ return $scheme !== null;
+ }
+
+ /**
+ * Lays out and prints the structure of a nested array for debugging
+ * @param mixed $array The array to be examined
+ * @param mixed $indent The indentation
+ * @return void
+ */
+ public function printNestedArray($array, $indent = 0): void
+ {
+ foreach ($array as $key => $value) {
+ // Add indentation for better readability of nested levels
+ echo str_repeat(" ", $indent);
+
+ if (is_array($value)) {
+ echo "Key: " . $key . " (Array):\n";
+ $this->printNestedArray($value, $indent + 1); // Recursively call for nested arrays
+ } else {
+ echo "Key: " . $key . ", Value: " . $value . "\n";
+ }
+ }
+ }
+
+ /**
+ * Returns the validation error(s)
+ * @return array The array of all the error message(s)
+ */
+ public function showErrors(): array
+ {
+ return $this->errors;
+ }
+
+ /**
+ * Saves with the error message explicitly returned for further examination
+ * @return array The array consisting of error messages
+ */
+ public function saveWithErrorMessage(): array
+ {
+ $errors = [];
+
+ try {
+ $this->save();
+ } catch (Exception $e) {
+ $errors = $this->showErrors();
+ }
+
+ return $errors;
+ }
+}
diff --git a/src/rocrate/ROCratePreviewGenerator.php b/src/rocrate/ROCratePreviewGenerator.php
new file mode 100644
index 0000000..149d4bb
--- /dev/null
+++ b/src/rocrate/ROCratePreviewGenerator.php
@@ -0,0 +1,541 @@
+buildTermUris($data['@context'] ?? []);
+ $entities = $generator->indexEntities($data['@graph'] ?? []);
+ $rootEntity = $generator->findRootEntity($entities);
+
+ // Generate HTML
+ $html = $generator->generateHTML($rootEntity, $entities, $termUris, $basePath);
+ file_put_contents(OUTPUT_HTML, $html);
+
+ echo "Successful Creation of Preview file";
+ }
+
+ /**
+ * Builds context term URIs
+ * @param mixed $context The context extracted from the metadata file
+ * @return array The term URIs
+ */
+ public function buildTermUris($context): array
+ {
+ $termUris = [];
+ if (is_array($context)) {
+ foreach ($context as $key => $value) {
+ if (is_string($value)) {
+ $termUris[$key] = $value;
+ } elseif (is_array($value) && isset($value['@id'])) {
+ $termUris[$key] = $value['@id'];
+ }
+ }
+ }
+ return $termUris;
+ }
+
+ /**
+ * Indexes the entities by their IDs
+ * @param array $graph The entities extracted from the metadata file
+ * @return array The array of indexed entities
+ */
+ public function indexEntities(array $graph): array
+ {
+ $index = [];
+ foreach ($graph as $entity) {
+ $index[$entity['@id']] = $entity;
+ }
+ return $index;
+ }
+
+ /**
+ * Finds the root data entity
+ * @param array $entities The indexed entities
+ */
+ public function findRootEntity(array $entities)
+ {
+ foreach ($entities as $entityData) {
+ $conditionOne = str_contains($entityData['@id'], "ro-crate-metadata.json");
+ $conditionTwo = array_key_exists("conformsTo", $entityData);
+ if ($conditionOne && $conditionTwo) {
+ global $rootId;
+ $rootId = $entityData['about']['@id'];
+ break;
+ }
+ }
+
+ foreach ($entities as $entity) {
+ if (($entity['@id'] ?? '') === $rootId) {
+ return $entity;
+ }
+ }
+ return reset($entities) ?: [];
+ }
+
+ /**
+ * Generates the HTML file for the preview
+ * @param mixed $rootEntity The root data entity
+ * @param mixed $entities The indexed entities array
+ * @param mixed $termUris The term URIs
+ * @param mixed $basePath The base path
+ * @return string The HTML file as a string
+ */
+ public function generateHTML($rootEntity, $entities, $termUris, $basePath): string
+ {
+ ob_start(); ?>
+
+
+
+
+
+ RO-Crate Preview: <?= htmlspecialchars($rootEntity['name'] ?? 'Untitled') ?>
+
+
+
+
+
+
';
+ foreach ($entity as $key => $value) {
+ $keyHtml = ROCratePreviewGenerator::renderKey($key, $termUris);
+ $valStr = ROCratePreviewGenerator::renderValue($value, $entities, $termUris, $basePath, $depth);
+
+ $values = explode(' %%$$%%$$** ', $valStr);
+
+ // if we can resolve the key from the default context, we attach [?] hyperlink
+ $contextData = json_decode(file_get_contents($basePath . "/context.jsonld"), true)['@context'];
+ if (array_key_exists($key, $contextData)) {
+ $resolvedKey = $contextData[$key];
+
+ if (is_array($values)) {
+ $keyFirst = "
$keyHtml [?] :";
+ foreach ($values as $valueHtml) {
+ // if value is id, we make it hyperlink and show name if name exists in the entity
+ $conditionOne = (!is_array($valueHtml)) && (strcmp($key, '@id') !== 0);
+ $conditionTwo = (array_key_exists($valueHtml, $entities));
+ if ($conditionOne && $conditionTwo) {
+ $temp = htmlspecialchars($entities[$valueHtml]['name'] ?? $valueHtml);
+ if (strcmp($temp, "") == 0) {
+ $temp = $valueHtml;
+ }
+ if (strcmp(substr($keyFirst, -1), ':') == 0) {
+ $html .= $keyFirst . " $temp
";
+ } else {
+ $html .= $keyFirst . " $temp ";
+ }
+ } elseif (ROCrate::isValidUri($valueHtml)) {
+ if (strcmp(substr($keyFirst, -1), ':') == 0) {
+ $html .= $keyFirst . " $valueHtml ";
+ } else {
+ $html .= $keyFirst . " $valueHtml ";
+ }
+ } else {
+ if (strcmp(substr($keyFirst, -1), ':') == 0) {
+ $html .= $keyFirst . " $valueHtml";
+ } else {
+ $html .= $keyFirst . " $valueHtml";
+ }
+ }
+
+ $keyFirst = ", ";
+ }
+ }
+ //else {
+ // $valueHtml = $values;
+ // // if value is id, we make it hyperlink and show name if name exists in the entity
+ // if ((!is_array($valueHtml)) && (strcmp($key, '@id') !== 0)
+ // && (array_key_exists($valueHtml, $entities))) {
+ // $temp = htmlspecialchars($entities[$valueHtml]['name'] ?? $valueHtml);
+ // $html .= "
$keyHtml :";
+ foreach ($values as $valueHtml) {
+ // if value is id, we make it hyperlink and show name if name exists in the entity
+ $conditionOne = (!is_array($valueHtml));
+ $conditionTwo = (strcmp($key, '@id') !== 0);
+ $conditionThree = (array_key_exists($valueHtml, $entities));
+ if ($conditionOne && $conditionTwo && $conditionThree) {
+ $temp = htmlspecialchars($entities[$valueHtml]['name'] ?? $valueHtml);
+ if (strcmp($temp, "") == 0) {
+ $temp = $valueHtml;
+ }
+ if (strcmp(substr($keyFirst, -1), ':') == 0) {
+ $html .= $keyFirst . " $temp