41 commits
5e12663
Attempt commit
Jun 16, 2025
cd0cecf
Second attempt commit
Jun 16, 2025
c2a9aea
Fixed the improper path of /var/www/html in launch.json
Jun 17, 2025
093d7b9
updated json data which may better describe the ro-crate as an object…
Jun 20, 2025
8ad50ed
Updated ROCrate.php and composer.json
Jul 7, 2025
b229bcb
Added a few basic test cases on ROCrate class
Jul 7, 2025
f184a2f
Added inline phpdoc documentation for main code, added some new unit …
Jul 14, 2025
6336eab
Remove some unnecessary comments in the ROCrate.php, add a README fil…
Jul 21, 2025
2bd7260
Fixed typo in README file
Jul 21, 2025
a8b1baf
Fixed more typo of README
Jul 21, 2025
bad0594
Fixed one more typo of README
Jul 21, 2025
e92b9c1
added validation
Jul 28, 2025
8c5a763
added class for ro-crate exception
Jul 28, 2025
f81d7db
Added more tests. Enable only uses of propertyadd/removePair methods …
Aug 4, 2025
10dd637
Added the preliminary html render with php doc, added some tests, upd…
Aug 11, 2025
c9e3216
Updated readme typo
Aug 11, 2025
97222e8
updated the .gitignore
Aug 13, 2025
e649168
again updated the .gitignore file
Aug 13, 2025
f6d0c80
commit to test if the .gitignore functions correctly
Aug 13, 2025
0e6b599
updated the name and license of composer.json
Aug 13, 2025
eebe1d6
updated docker-compose.yml file to enable running the test using comm…
Aug 13, 2025
c5f8ed5
updated the composer.json to include the phpcs so that we can run com…
Aug 13, 2025
15f863b
fixed phpcs warning about coding style
Aug 13, 2025
42ff906
fixed typo in phpdoc inline comment
Aug 13, 2025
16dd13d
fixed some of the issues suggested by the bot: e.g. string concatenat…
Aug 13, 2025
026aa39
fixed some potential issues suggested by the coderabbitai
Aug 15, 2025
59253db
updated index.php to create gigaDB example of metadata file and previ…
Aug 15, 2025
21093ed
fixed subtle typo in the GigaDB example and some issues suggested bu …
Aug 18, 2025
788261a
fixed minor formatting issue of the generated html suggested by coder…
Aug 18, 2025
79774c3
removed comment for development only
Aug 18, 2025
e06f07f
updated the missing thumbnail entity in index.php together with the e…
Aug 18, 2025
2f9fa5e
Added the usage guide, particularly using with GigaDB datasets
Aug 20, 2025
6796a72
Fixed a minor typo in Guide.md
Aug 20, 2025
adff89d
further fxied typo in Guide.md
Aug 20, 2025
5f32ebf
Further fixed a typo in Guide.md again.
Aug 20, 2025
023d0a6
updated the Guide.md to notify that the tool implementation and the g…
Aug 28, 2025
86a562a
updated the readme about downstream task's comaptibility
Aug 28, 2025
749ff87
Fixed the Install section of the README.md
Aug 28, 2025
0b0c7e6
fixed a typo in Guide.md
Aug 28, 2025
464119b
Fixed another typo in Guide.md
Aug 28, 2025
5620985
Updated the README.md and added a simple CHANGELOG.md.
Sep 15, 2025
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.vscode

vendor

.phpunit.result.cache
52 changes: 52 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,52 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- This **[CHANGELOG.md](https://github.com/gigascience/ro-crate-php/blob/Alex/CHANGELOG.md)** to document the essential changes of this project.
- Missing thumbnail entity in **[index.php](https://github.com/gigascience/ro-crate-php/blob/Alex/src/index.php)**.
- Usage guide, particularly for use with GigaDB datasets.
- GigaDB example of a metadata file and preview, based on dataset 102736, in the repository's assets directory.
- PHPCS for coding standard checks (PSR12) through composer.
- Docker-compose configuration for running tests.
- Preliminary HTML render with PHP documentation.
- Additional unit tests for the ROCrate class.
- RO-Crate exception class.
- Partial validation of RO-Crate metadata when saving the manipulated crate.
- Chaining capability for adding and removing entities.
- Inline PHPDoc documentation for main code, i.e. the ROCrate class.
- Basic unit test cases for ROCrate class.
- Contributing documentation.
- Code of conduct document.
- CODEOWNERS file with GigaScience developers as owners.

### Fixed

- Multiple typos in **[Guide.md](https://github.com/gigascience/ro-crate-php/blob/Alex/Guide.md)** and **[README.md](https://github.com/gigascience/ro-crate-php/blob/Alex/README.md)**.
- Minor formatting issues in generated HTML previews.
- String concatenation and variable naming issues.
- PHPCS warnings about coding style.
- PHPDoc inline comment typos.
- Issues in ISO 8601 DateTime validation.

### Changed

- Update **[Guide.md](https://github.com/gigascience/ro-crate-php/blob/Alex/Guide.md)** to specify that it is primarily designed for the latest RO-Crate v1.2 but not earlier versions.
- Enhance generated HTML formatting with type hyperlinks to [schema.org](https://schema.org/) definitions as specified in the **[RO-Crate v1.2 context](https://www.researchobject.org/ro-crate/specification/1.2/context.jsonld)**.
- Abstract property-add/remove-pair methods to hide formatting details.
- Replace Person class with a generic class implementation.
- Update composer.json name and license information.
- Improve **[README.md](https://github.com/gigascience/ro-crate-php/blob/Alex/README.md)** with installation section updates and downstream task compatibility.

### Removed

- Person class, replaced by a generic implementation.
- Some unnecessary and development-only comments.

[unreleased]: https://github.com/gigascience/ro-crate-php/tree/Alex
84 changes: 84 additions & 0 deletions Guide.md
@@ -0,0 +1,84 @@

# Usage Guide for ro-crate-php
Below are notes for developers using the tool to create or manipulate the RO-Crate Metadata file for GigaDB datasets. The notes omit unnecessary detail and emphasize only the relevant technical points of the RO-Crate 1.2 standard. Please note that both the tool and this guide target RO-Crate 1.2, and some constraints of the RO-Crate 1.1 standard are no longer required in RO-Crate 1.2. For compatibility with existing RO-Crate applications based on the 1.1 standard, an RO-Crate file has to be built in accordance with the requirements of the 1.1 standard.

---

## Overview
This is a PHP tool to create and manipulate Research Object Crate. Please refer to the repository's *[README.md](https://github.com/gigascience/ro-crate-php/tree/main)* for more details. Below are the high-level steps instructing the creation of the metadata file for a GigaDB dataset from scratch. The created file may not be perfect but ought to be able to provide sufficient description of the dataset. An example created following the flow is in the assets directory of the repository above.

Comment on lines +7 to +9

🛠️ Refactor suggestion

Tighten overview; fix pluralization

Improve clarity and grammar; “Research Object Crate” → “Research Object Crates”.

-## Overview
-This is a PHP tool to create and manipulate Research Object Crate. Please refer to the repository's *[README.md](https://github.com/gigascience/ro-crate-php/tree/main)* for more details. Below are the high-level steps instructing the creation of the metadata file for a GigaDB dataset from scratch. The created file may not be perfect but ought to be able to provide sufficient description of the dataset. An example created following the flow is in the assets directory of the repository above.
+## Overview
+This is a PHP tool to create and manipulate Research Object Crates. See the repository *[README.md](https://github.com/gigascience/ro-crate-php/tree/main)* for details. Below are high‑level steps for creating a GigaDB dataset’s metadata from scratch. The result may not be perfect but should sufficiently describe the dataset. An example following this flow is in the repository’s assets directory.
🧰 Tools
🪛 LanguageTool

[grammar] ~7-~7: There might be a mistake here.
Context: ...n the standard of 1.1. --- ## Overview This is a PHP tool to create and manipul...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ... not be perfect but ought to be able to provide sufficient description of the dataset. ...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In Guide.md around lines 7 to 9, the overview sentence is awkward and uses the
singular “Research Object Crate”; update it to be tighter and pluralized.
Replace the sentence "This is a PHP tool to create and manipulate Research
Object Crate." with a concise version such as "This PHP tool creates and
manipulates Research Object Crates." and ensure surrounding sentences flow
grammatically (adjust articles/verbs if needed).

**Version**: [1.0]
**Last Updated**: [2025-08-18]

---

## Note
The general rule is to use the @id construct (<b>true</b> flag if using the add/removePropertyPair methods) when referring to another entity, and a plain literal otherwise (<b>false</b> flag if using the add/removePropertyPair methods). Some specific constructs are exceptions to this rule.

Remember to add each entity to the crate after creating it.

Also, create only one entity for any given ID.

In addition, the name of an entity, if it has one, should be human-readable.

The metadata file always has ro-crate-metadata.json as the @id. The preview file has ro-crate-preview.html as the @id and filename. In a detached package, i.e. one where the metadata file is not within the package, which is most likely the case for GigaDB, the filename ro-crate-metadata.json is renamed to xxxx-ro-crate-metadata.json, where xxxx can be, e.g., the dataset ID.
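
The reference-versus-literal rule and the detached filename convention can be illustrated at the JSON-LD level. The following is a minimal Python sketch using plain dictionaries; it does not call the ro-crate-php API, and the helper names `ref`, `set_property` and `detached_filename` are hypothetical.

```python
import json


def ref(entity_id):
    """Wrap an ID in the JSON-LD @id construct (a reference to another entity)."""
    return {"@id": entity_id}


def set_property(entity, key, value, is_reference):
    """Store value as an @id reference (True) or as a plain literal (False)."""
    entity[key] = ref(value) if is_reference else value
    return entity


def detached_filename(prefix):
    """Metadata filename for a detached package, e.g. prefixed by the dataset ID."""
    return f"{prefix}-ro-crate-metadata.json"


root = {"@id": "https://gigadb.org/dataset/102736", "@type": "Dataset"}
set_property(root, "publisher", "https://gigadb.org/", True)  # refers to another entity
set_property(root, "name", "Placeholder dataset name", False)  # plain literal

print(json.dumps(root, indent=2))
print(detached_filename("102736"))
```

The boolean argument mirrors the true/false flag described above, although the actual method signatures of the tool may differ.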

---

## Step 1
- **Initialization of the Crate**: Create the empty crate, then set the profile to specify the context version and the root data entity ID.
- **Initialization of the Root Data Entity**: Set the ID, name, description, datePublished (the date of first publication) and sdDatePublished (the date on which the current structured data was generated or published). The dates follow the ISO 8601 standard, e.g. YYYY-MM-DD. A GigaDB dataset is most likely web-based, so the ID has to be an absolute URI, e.g. **[https://gigadb.org/dataset/102736](https://gigadb.org/dataset/102736)**.
- **Specification of the Components**: Specify the IDs of the files and datasets (such as a zip file) using hasPart, possibly using the \# directory construct to describe many files collectively. Refer to **[<b>Step 2</b>](#step-2)** for handling the data entities of these files and datasets and potentially any entities derived from them. Note that the metadata file and the preview file, if it exists, are treated specially and are not included in hasPart.
- **Specification of the License**: Specify the ID of the license using license, e.g. *[https://creativecommons.org/publicdomain/zero/1.0/](https://creativecommons.org/publicdomain/zero/1.0/)* for the CC0 v1.0 license. Refer to **[<b>Step 3</b>](#step-3)** for handling the contextual entity of the license.
- **Specification of the Thumbnail**: Specify the ID of the thumbnail using thumbnail. The ID is recommended to be the corresponding downloadable PNG. Refer to **[<b>Step 4</b>](#step-4)** for handling the contextual entity of the thumbnail.
- **Specification of the Publisher and sdPublisher**: Specify the ID of the publisher and sdPublisher using publisher and sdPublisher, e.g. *[https://gigadb.org/](https://gigadb.org/)* for GigaDB being the publisher and sdPublisher. Refer to **[<b>Step 5</b>](#step-5)** for handling the contextual entity of the publisher and sdPublisher.
- **Specification of the Identifier and Cite-as**: Specify the identifier of the crate using identifier as an @id. As a special construct, we also include the same identifier as a plain string using cite-as. The identifier should be persistent and resolvable as a URI, which is commonly possible for a GigaDB dataset because it has a DOI. For example, it can be **[https://doi.org/10.4225/59/59672c09f4a4b](https://doi.org/10.4225/59/59672c09f4a4b)**. Refer to **[<b>Step 6</b>](#step-6)** for handling the contextual entity of the identifier.
- **Specification of Extra/Additional Information**: In case there is metadata that cannot be precisely described using existing properties, there is a special construct for it. Specify an exifData using a local identifier such as \#extraInfo as an @id. Refer to **[<b>Step 7</b>](#step-7)** for handling the contextual entity of the exifData. In a GigaDB dataset, information about the root dataset, including Dataset type, Additional information, Github links, Accessions (data not in GigaDB) and History, can be wrapped by this construct. Note that this construct also works for other entities, e.g. Awardee and Award ID used with the organization entity for the funder, or Extra Information used with different file entities.
- **Specification of Citation**: In case the dataset cites publications like other datasets or papers, we have to include this information by specifying the ID of the publication using citation. Note that the ID has to be a URL (for example a DOI URL). In case of citing another dataset/crate, the ID should be chosen to be the @id value of the identifier property of that crate instead of the actual ID of that crate. Refer to **[<b>Step 8</b>](#step-8)** for handling the contextual entity of the citation of the publication.
- **Specification of Authors**: Specify the IDs of the author(s) one by one using author. For a GigaDB dataset, ORCID is usually picked as the ID for an author. For example, it may be **[https://orcid.org/0000-0001-9083-6757](https://orcid.org/0000-0001-9083-6757)**. Refer to **[<b>Step 9</b>](#step-9)** for handling the contextual entity of each of the author(s).
- **Specification of Funders**: We assume here that no information about an explicit associated research project is present, which happens to be the case for some of the GigaDB datasets. Specify the ID of the funder using funder. For a GigaDB dataset, the ID is often a ROR ID, for instance, **[https://ror.org/011kf5r70](https://ror.org/011kf5r70)**. Refer to **[<b>Step 10</b>](#step-10)** for handling the contextual entity of the funder.
- **Specification of Keywords**: Specify the keyword(s) of the root dataset using keywords as a plain string that concatenates all keywords with a comma as the delimiter. As a special construct, in addition to the keywords property, we have to specify the IDs of these keyword(s) one by one using about as @id's. Such an ID is usually a URL that explains the corresponding keyword, for example, **[https://nanoporetech.com/](https://nanoporetech.com/)** for the keyword Oxford Nanopore Technologies. Refer to **[<b>Step 11</b>](#step-11)** for handling the contextual entity of the about property.
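
The Step 1 properties can be pictured as the JSON-LD they are meant to produce. The following is a minimal Python sketch with plain dictionaries, not a use of the ro-crate-php API. The names, descriptions and dates are placeholders, the `ref` helper is hypothetical, and the DOI is the example one quoted above rather than the real DOI of dataset 102736. Thumbnail, exifData and citation are left out for brevity.

```python
import json
from datetime import date


def ref(entity_id):
    """Reference another entity with the JSON-LD @id construct."""
    return {"@id": entity_id}


root_id = "https://gigadb.org/dataset/102736"
doi_url = "https://doi.org/10.4225/59/59672c09f4a4b"  # example DOI quoted in the guide
keywords = ["oxford nanopore technologies"]

root = {
    "@id": root_id,
    "@type": "Dataset",
    "name": "Placeholder dataset name",
    "description": "Placeholder dataset description",
    "datePublished": date(2025, 1, 1).isoformat(),  # placeholder, ISO 8601
    "sdDatePublished": date(2025, 8, 18).isoformat(),  # placeholder, ISO 8601
    "hasPart": [ref("#other-files")],
    "license": ref("https://creativecommons.org/publicdomain/zero/1.0/"),
    "publisher": ref("https://gigadb.org/"),
    "sdPublisher": ref("https://gigadb.org/"),
    "identifier": ref(doi_url),  # reference to a PropertyValue entity
    "cite-as": doi_url,  # the same identifier as a plain string
    "author": [ref("https://orcid.org/0000-0001-9083-6757")],
    "funder": [ref("https://ror.org/011kf5r70")],
    "keywords": ", ".join(keywords),  # plain comma-separated string
    "about": [ref("https://nanoporetech.com/")],  # one @id per keyword
}

# Metadata descriptor pointing at the root data entity (RO-Crate 1.2 profile).
descriptor = {
    "@id": "ro-crate-metadata.json",
    "@type": "CreativeWork",
    "conformsTo": ref("https://w3id.org/ro/crate/1.2"),
    "about": ref(root_id),
}

crate = {
    "@context": "https://w3id.org/ro/crate/1.2/context",
    "@graph": [descriptor, root],
}
print(json.dumps(crate, indent=2))
```

Every value wrapped in `ref(...)` still needs its own entity in `@graph`, which is what the later steps describe.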

## Step 2
- **File**: Create a File entity with the respective ID, which has to be an absolute URI for a web-based entity. For a GigaDB dataset, it is most likely web-based, the ID is often selected to be the url that directly downloads the file. Then, we set the name, contentSize and encodingFormat. Note that the contentSize is either in kB or MB. Also, note that the encodingFormat is a plain string xxx/yyy, for instance, text/csv. In some cases that a more informative encodingFormat of the form xxx/yyy followed by a **[PRONOM](https://www.nationalarchives.gov.uk/PRONOM/Default.aspx)** identifier, for example, ["application/pdf", {"@id": **["https://www.nationalarchives.gov.uk/PRONOM/fmt/19"]("https://www.nationalarchives.gov.uk/PRONOM/fmt/19")**}]. Additionally, we can include some extra information including data types and file attributes using the exifData construct.
- **Directory/Dataset/zip file**: Create a Dataset entity with the respective ID, which has to be an absolute URI. Such URI should resolve to a listing of the content of the directory/dataset/zip file. For a GigaDB dataset, it is most likely a web-based zip file, the ID is often selected to be the url that shows its description, for example, **[https://gigadb.org/dataset/view/id/102736/Files_page/4](https://gigadb.org/dataset/view/id/102736/Files_page/4)**. Then, we set the name, description, distribution and releaseDate. Note that the distribution is the url that downloads the content, for example, **[https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip](http://127.0.0.1:5501/assets/ro-crate-preview.html#https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip)**. Also, note that the releaseDate should be in the ISO 8601 format. Furthermore, we can include some extra information including data types and file attributes using the exifData construct..
- **Collective Construct with \#**: In case that we prefer describing some files or/and directories collectively, we create a Dataset entity with a local identifier as the ID, for example, \#other-files. Then, we set the name and description.
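
As a rough illustration of the three kinds of data entity above, here is a minimal Python sketch using plain dictionaries rather than the ro-crate-php API. The file URL, names, sizes and dates are hypothetical placeholders. Writing distribution as an @id reference is an assumption of this sketch.

```python
import json

# A web-based file: the @id is an absolute URI (hypothetical download URL).
file_entity = {
    "@id": "https://example.org/downloads/readme.pdf",
    "@type": "File",
    "name": "readme.pdf",
    "contentSize": "12 kB",  # placeholder size
    # MIME type followed by a PRONOM identifier, as a JSON-LD array.
    "encodingFormat": [
        "application/pdf",
        {"@id": "https://www.nationalarchives.gov.uk/PRONOM/fmt/19"},
    ],
}

# A web-based zip file described as a Dataset.
zip_entity = {
    "@id": "https://gigadb.org/dataset/view/id/102736/Files_page/4",
    "@type": "Dataset",
    "name": "BoostNano-master.zip",
    "description": "Placeholder description of the archive",
    "distribution": {
        "@id": "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip"
    },
    "releaseDate": "2025-01-01",  # placeholder, ISO 8601
}

# Several files described collectively under a local identifier.
other_files = {
    "@id": "#other-files",
    "@type": "Dataset",
    "name": "Other files",
    "description": "Placeholder description of the remaining files",
}

print(json.dumps([file_entity, zip_entity, other_files], indent=2))
```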
Comment on lines +42 to +44

🛠️ Refactor suggestion

⚠️ Potential issue

Fix JSON example and broken link; minor grammar

The PRONOM example formatting is broken and the distribution link points to localhost with a hash fragment.

-- **File**: Create a File entity with the respective ID, which has to be an absolute URI for a web-based entity. For a GigaDB dataset, it is most likely web-based, the ID is often selected to be the url that directly downloads the file. Then, we set the name, contentSize and encodingFormat. Note that the contentSize is either in kB or MB. Also, note that the encodingFormat is a plain string xxx/yyy, for instance, text/csv. In some cases that a more informative encodingFormat of the form xxx/yyy followed by a **[PRONOM](https://www.nationalarchives.gov.uk/PRONOM/Default.aspx)** identifier, for example, ["application/pdf", {"@id": **["https://www.nationalarchives.gov.uk/PRONOM/fmt/19"]("https://www.nationalarchives.gov.uk/PRONOM/fmt/19")**}]. Additionally, we can include some extra information including data types and file attributes using the exifData construct.
+- **File**: Create a File entity with an absolute URI @id (for GigaDB, typically a direct download URL). Then set name, contentSize, and encodingFormat. encodingFormat is a MIME type string (e.g., text/csv). If needed, you can pair it with a PRONOM identifier:
+  - encodingFormat: "application/pdf"
+  - encodingFormatIRI: https://www.nationalarchives.gov.uk/PRONOM/fmt/19
+  Additionally, include extra information (e.g., data types, file attributes) using the exifData construct.
 - **Directory/Dataset/zip file**: Create a Dataset entity with the respective ID, which has to be an absolute URI. Such URI should resolve to a listing of the content of the directory/dataset/zip file. For a GigaDB dataset, it is most likely a web-based zip file, the ID is often selected to be the url that shows its description, for example, **[https://gigadb.org/dataset/view/id/102736/Files_page/4](https://gigadb.org/dataset/view/id/102736/Files_page/4)**. Then, we set the name, description, distribution and releaseDate. Note that the distribution is the url that downloads the content, for example, **[https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip](http://127.0.0.1:5501/assets/ro-crate-preview.html#https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip)**. Also, note that the releaseDate should be in the ISO 8601 format. Furthermore, we can include some extra information including data types and file attributes using the exifData construct..
+- **Directory/Dataset/zip file**: Create a Dataset entity with an absolute URI @id that resolves to a listing of the content. For GigaDB, the @id is often the dataset page (e.g., **https://gigadb.org/dataset/view/id/102736/Files_page/4**). Then set name, description, distribution, and releaseDate. distribution should be the direct download URL (e.g., **https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/102001_103000/102736/BoostNano-master.zip**). releaseDate must be ISO 8601. You may also include extra information using the exifData construct.
 - **Collective Construct with \#**: In case that we prefer describing some files or/and directories collectively, we create a Dataset entity with a local identifier as the ID, for example, \#other-files. Then, we set the name and description.
🧰 Tools
🪛 LanguageTool

[grammar] ~42-~42: There might be a mistake here.
Context: ...attributes using the exifData construct. - Directory/Dataset/zip file: Create a D...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...ve ID, which has to be an absolute URI. Such URI should resolve to a listing of the ...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ...attributes using the exifData construct.. - Collective Construct with #: In case ...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In Guide.md around lines 42 to 44, the PRONOM example JSON/Markdown formatting
is broken and the distribution URL incorrectly points to localhost with a hash
fragment; fix by correcting the PRONOM snippet to show the encodingFormat as a
valid JSON-LD array (e.g. ["application/pdf", {"@id":
"https://www.nationalarchives.gov.uk/PRONOM/fmt/19"}]) so the link is rendered
properly, replace the localhost/disallowed hash-fragment distribution URL with
the actual public download URL (remove the 127.0.0.1 and the "#..." fragment),
and tidy minor grammar/typo issues (remove the duplicated period after exifData)
so the paragraph reads cleanly.


## Step 3
- **License Creation**: Create a contextual entity of type CreativeWork with the respective ID, then set the name and description of the license; the description may have to be looked up online.
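
A license entity of this shape might look as follows. This is a minimal Python sketch with plain dictionaries, not the ro-crate-php API; the `make_license` helper is hypothetical and the description text is a placeholder.

```python
import json


def make_license(license_id, name, description):
    """Build a contextual entity of type CreativeWork for a license."""
    return {
        "@id": license_id,
        "@type": "CreativeWork",
        "name": name,
        "description": description,
    }


cc0 = make_license(
    "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC0 1.0 Universal",
    "Placeholder description of the license",
)
print(json.dumps(cc0, indent=2))
```

The @id is the same one the root data entity references through its license property.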

## Step 4
- **Thumbnail Handling**: When the thumbnail is incidental to the root dataset, which is usually the case, we do not include it in the hasPart of the root data entity, and we create a File entity with the respective ID.

## Step 5
- **Publisher and sdPublisher Handling**: Create an Organization entity with the respective ID, then set the name and description of the organization; both may have to be looked up online. Also, set the contactPoint, usually to the email address prefixed with *mailto:*, e.g. **[mailto:database@gigasciencejournal.com](mailto:database@gigasciencejournal.com)**. Then, create a ContactPoint entity with this ID, and set the contactType, email and identifier. For the example ID, the email and identifier can share the plain string database@gigasciencejournal.com, while the contactType may be a plain string stating that this is the contact of the publisher.
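
The link between the organization and its contact point can be sketched as below. This is a minimal Python illustration with plain dictionaries, not the ro-crate-php API; the description and contactType strings are placeholders.

```python
import json

email = "database@gigasciencejournal.com"
contact_id = f"mailto:{email}"

publisher = {
    "@id": "https://gigadb.org/",
    "@type": "Organization",
    "name": "GigaDB",
    "description": "Placeholder description of the organization",
    "contactPoint": {"@id": contact_id},  # reference to the entity below
}

contact_point = {
    "@id": contact_id,
    "@type": "ContactPoint",
    "contactType": "Contact of the publisher",  # placeholder wording
    "email": email,
    "identifier": email,
}

print(json.dumps([publisher, contact_point], indent=2))
```

Because publisher and sdPublisher share the same ID in the GigaDB case, a single Organization entity serves both.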

## Step 6
- **Identifier Handling**: Create a contextual entity of type PropertyValue with the respective ID, then set the propertyID, value and url. For example, the propertyID is **[https://registry.identifiers.org/registry/doi](https://registry.identifiers.org/registry/doi)** given the ID of the identifier being a doi. In case of a doi's ID of **[https://doi.org/10.4225/59/59672c09f4a4b](https://doi.org/10.4225/59/59672c09f4a4b)**, the value is set to be a plain string of doi:10.5524/102736. The url is often chosen to be identical to the ID of the identifier.
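
One way to keep the value and url in step with the identifier's ID is to derive both from the DOI URL. The following is a minimal Python sketch with plain dictionaries, not the ro-crate-php API; the `doi_identifier` helper is hypothetical.

```python
import json

DOI_PREFIX = "https://doi.org/"


def doi_identifier(doi_url):
    """Build a PropertyValue entity for a DOI, deriving the value from the URL."""
    if not doi_url.startswith(DOI_PREFIX):
        raise ValueError(f"not a DOI URL: {doi_url}")
    return {
        "@id": doi_url,
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "value": "doi:" + doi_url[len(DOI_PREFIX):],
        "url": doi_url,  # identical to the ID of the identifier
    }


identifier = doi_identifier("https://doi.org/10.4225/59/59672c09f4a4b")
print(json.dumps(identifier, indent=2))
```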

Comment on lines +56 to +57

⚠️ Potential issue

Fix DOI example inconsistency and capitalization

The example DOI value mismatches the DOI URL; use the same DOI stem and capitalize “URL”.

-- **Identifier Handling**: Create a contextual entity of type PropertyValue with the respective ID, then set the propertyID, value and url. For example, the propertyID is **[https://registry.identifiers.org/registry/doi](https://registry.identifiers.org/registry/doi)** given the ID of the identifier being a doi. In case of a doi's ID of **[https://doi.org/10.4225/59/59672c09f4a4b](https://doi.org/10.4225/59/59672c09f4a4b)**, the value is set to be a plain string of doi:10.5524/102736. The url is often chosen to be identical to the ID of the identifier.
+- **Identifier Handling**: Create a contextual entity of type PropertyValue with the respective @id, then set propertyID, value, and URL. For example, propertyID can be **https://registry.identifiers.org/registry/doi** when the identifier is a DOI. If the identifier @id is **https://doi.org/10.4225/59/59672c09f4a4b**, set value to the plain string `doi:10.4225/59/59672c09f4a4b`. The URL is typically identical to the identifier’s @id.
🧰 Tools
🪛 LanguageTool

[grammar] ~56-~56: There might be a mistake here.
Context: ...n the ID of the identifier being a doi. In case of a doi's ID of **[https://doi.or...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In Guide.md around lines 56-57, the DOI example has a mismatched DOI value and
uses lowercase "url"; update the example so the identifier URL and the DOI value
stem match (change the value to doi:10.4225/59/59672c09f4a4b to match
https://doi.org/10.4225/59/59672c09f4a4b) and capitalize "url" to "URL" in the
explanatory text.
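
The corrected identifier entity described in the suggestion above can be sketched as a plain JSON-LD dictionary. This is a minimal illustration in Python, not the library's API: the helper name `make_doi_identifier` is hypothetical, and it assumes the DOI is supplied as an `https://doi.org/` URL.

```python
import json

DOI_RESOLVER = "https://doi.org/"
DOI_PROPERTY_ID = "https://registry.identifiers.org/registry/doi"


def make_doi_identifier(doi_url: str) -> dict:
    """Build a PropertyValue entity for a DOI (hypothetical helper)."""
    if not doi_url.startswith(DOI_RESOLVER):
        raise ValueError("expected a DOI of the form https://doi.org/<suffix>")
    return {
        "@id": doi_url,
        "@type": "PropertyValue",
        "propertyID": DOI_PROPERTY_ID,
        # The value is derived from the @id so the two can never disagree.
        "value": "doi:" + doi_url[len(DOI_RESOLVER):],
        "url": doi_url,
    }


identifier = make_doi_identifier("https://doi.org/10.4225/59/59672c09f4a4b")
print(json.dumps(identifier, indent=2))
```

Deriving `value` from the `@id` avoids the kind of mismatch flagged in this comment.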

## Step 7
- **exifData Handling**: Create a contextual entity of type PropertyValue with the respective ID, then set its name and value. The name is the property name and the value is the property value, written as they would be if the property were defined in the context.
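
As a rough sketch of this step, the entity can be written as a plain JSON-LD dictionary in Python. The helper name, the `#example-property` ID, the property name and value, and the choice to attach the reference to the root dataset are all assumptions made for illustration.

```python
import json


def make_exif_property(entity_id: str, name: str, value: str) -> dict:
    """Build a PropertyValue entity carrying a name/value pair (hypothetical helper)."""
    return {
        "@id": entity_id,
        "@type": "PropertyValue",
        "name": name,    # the property name
        "value": value,  # the property value
    }


# Hypothetical property that has no term of its own in the context.
example_property = make_exif_property("#example-property", "Example property", "example value")

# The owning entity points at the PropertyValue by reference through exifData.
dataset = {
    "@id": "./",
    "@type": "Dataset",
    "exifData": [{"@id": example_property["@id"]}],
}

print(json.dumps([dataset, example_property], indent=2))
```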

## Step 8
- **Citation Handling**: There are two cases, depending on whether the cited publication is another dataset or a paper.
-- **Another Dataset/Crate**: Create a Publication entity of type CreativeWork with the respective ID and add Dataset as an additional type. Then set the conformsTo property to the version-less generic RO-Crate profile **[https://w3id.org/ro/crate](https://w3id.org/ro/crate)**. Note that we do not set hasPart, and usually no other properties, on the entity representing the other crate, since its content and further metadata are available from its own RO-Crate Metadata Document.
-- **A Paper**: Create a Publication entity of type ScholarlyArticle with the respective ID, then set the name. Also set the author, identifier, issn, journal, datePublished and creditText, if available. Note that author can have more than one value and datePublished should be in ISO 8601 format.
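
The two citation cases might look as follows when written as plain JSON-LD dictionaries in Python. This is only a sketch: every `example.org` URL, the author IDs, the title, and the date are placeholders, and only some of the optional paper properties are shown.

```python
import json
from datetime import date

# Case 1: the cited publication is another dataset/crate.
# hasPart is deliberately not set; the cited crate describes its own content.
cited_crate = {
    "@id": "https://example.org/another-crate/",  # placeholder ID
    "@type": ["CreativeWork", "Dataset"],
    "conformsTo": {"@id": "https://w3id.org/ro/crate"},
}

# Case 2: the cited publication is a paper.
cited_paper = {
    "@id": "https://example.org/papers/1",  # placeholder ID
    "@type": "ScholarlyArticle",
    "name": "An example article title",
    # author may hold more than one value, so a list of references is used.
    "author": [{"@id": "#author-1"}, {"@id": "#author-2"}],
    # date.isoformat() yields an ISO 8601 calendar date (YYYY-MM-DD).
    "datePublished": date(2024, 1, 15).isoformat(),
}

print(json.dumps([cited_crate, cited_paper], indent=2))
```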

## Step 9
- **Author Handling**: Create a Person entity with the respective ID, then set the affiliation and the name. The affiliation should refer to an Organization entity. If no such entity exists yet, create an Organization entity with the respective ID and set its name, which may have to be looked up online. For a GigaDB dataset, a ROR ID is often used as the ID of the organization, for instance **[https://ror.org/01ej9dk98](https://ror.org/01ej9dk98)**.
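
The "create the Organization only if it does not exist yet" logic can be sketched in Python over a list of JSON-LD dictionaries. The helper `ensure_entity`, the person ID, and both names are hypothetical placeholders; the organization name in particular stands in for whatever the ROR record actually says.

```python
import json


def ensure_entity(graph: list, entity: dict) -> dict:
    """Add entity to graph unless its @id is already present; return a reference to it."""
    if not any(existing["@id"] == entity["@id"] for existing in graph):
        graph.append(entity)
    return {"@id": entity["@id"]}


graph = []

organization = {
    "@id": "https://ror.org/01ej9dk98",
    "@type": "Organization",
    "name": "Organization name as recorded for this ROR ID",  # placeholder
}
org_ref = ensure_entity(graph, organization)

person = {
    "@id": "#author-1",        # placeholder ID
    "@type": "Person",
    "name": "Example Author",  # placeholder name
    "affiliation": org_ref,
}
ensure_entity(graph, person)

# A second author from the same organization reuses the existing entity.
ensure_entity(graph, organization)

print(json.dumps(graph, indent=2))
```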

## Step 10
- **Funder Handling**: Create an Organization entity with the respective ID, then set the identifier, name and description. The identifier is always to be the same as the ID, and the description is Funding Body in this case. Additionally, we can use the exifData construct to include the information regarding the Awardee and the Award ID.
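
A funder entity following the wording of this step could be sketched in Python as below. The funder URL, name, awardee, and award number are placeholders, and attaching the exifData references to the funder entity itself is an assumption. Setting identifier equal to the ID mirrors the sentence above and is a modelling choice rather than a schema.org requirement.

```python
import json

FUNDER_ID = "https://example.org/funders/1"  # placeholder ID

# Awardee and Award ID use the same name/value pattern as exifData entities.
awardee = {
    "@id": "#funder-1-awardee",
    "@type": "PropertyValue",
    "name": "Awardee",
    "value": "Example Awardee",
}
award_id = {
    "@id": "#funder-1-award-id",
    "@type": "PropertyValue",
    "name": "Award ID",
    "value": "EXAMPLE-0001",
}

funder = {
    "@id": FUNDER_ID,
    "@type": "Organization",
    "identifier": FUNDER_ID,  # same as the ID, as this step describes
    "name": "Example Funding Agency",
    "description": "Funding Body",
    "exifData": [{"@id": awardee["@id"]}, {"@id": award_id["@id"]}],
}

print(json.dumps([funder, awardee, award_id], indent=2))
```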
💡 Verification agent

🧩 Analysis chain

Verify “identifier is always the same as ID” guidance

Schema.org/Organization.identifier is typically a PropertyValue or URL; equating it to @id may be misleading. Please confirm intended modeling.


🌐 Web query:

Is it correct per schema.org to set Organization.identifier equal to the entity’s @id URL, or should identifier be a PropertyValue (or URL) that may differ from @id? Provide authoritative references.

💡 Result:

Short answer: Use @id for the canonical URI of the Thing; use identifier when you need to supply an explicit identifier value (text, URL, or a PropertyValue with propertyID/value) that may be different from @id (e.g., company registration numbers, DOIs, SKUs). Schema.org recommends using the built‑in URI mechanisms (itemid/@id) for canonical URIs but supports identifier/PropertyValue for additional or non‑URI identifiers. See schema.org docs. [1][2][3]

References

  • Schema.org Data model: notes on identifier vs built‑in URI (prefer built‑in @id/itemid for canonical URIs; use identifier/PropertyValue for other identifier schemes). [1]
  • schema.org property "identifier": types allowed and examples showing PropertyValue (propertyID + value) and URL/text usage. [2]
  • schema.org type "PropertyValue": fields (propertyID, value) and guidance for representing identifier types. [3]

Sources
[1] https://schema.org/docs/datamodel.html
[2] https://schema.org/identifier
[3] https://schema.org/PropertyValue


Clarify Organization.identifier usage in Guide.md line 70
Replace “The identifier is always to be the same as the ID” with guidance that @id (itemid) serves as the canonical URI, while the identifier property is intended for additional or alternate identifiers (e.g., a PropertyValue, URL, or text) and may differ from @id (see schema.org Data model and identifier docs).

🤖 Prompt for AI Agents
In Guide.md around line 70, the sentence "The identifier is always to be the
same as the ID" is incorrect; update the guidance to state that the entity's @id
(itemid) is the canonical URI, while the identifier property is for additional
or alternate identifiers (e.g., PropertyValue, URL, or plain text) and may
differ from @id; replace the sentence with a concise note referencing
schema.org's data model/identifier semantics and give an example usage: use @id
as the canonical URI and populate identifier when you need external IDs or
alternate identifiers.


## Step 11
- **About Handling**: If the respective ID is a URL, create a contextual entity of type URL with that ID and set the name of the entity.
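
The condition in this step can be sketched in Python with the standard `urllib.parse` module. The helper name `make_about_entity`, the restriction to `http`/`https`, and the example IDs are assumptions; the step does not say what to do when the ID is not a URL, so the sketch simply returns `None` in that case.

```python
from typing import Optional
from urllib.parse import urlparse


def make_about_entity(entity_id: str, name: str) -> Optional[dict]:
    """Return a contextual entity of type URL, or None if entity_id is not an http(s) URL."""
    parsed = urlparse(entity_id)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return {"@id": entity_id, "@type": "URL", "name": name}


about = make_about_entity("https://example.org/topics/genomics", "Genomics")
not_a_url = make_about_entity("#local-topic", "Local topic")
print(about)
print(not_a_url)
```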

---

## Remark
There are other ways to create an RO-Crate Metadata Document for a GigaDB dataset. The steps above describe only a fairly minimal way to construct the document and do not include all possible metadata of every entity. To update an existing metadata document, the same steps can be used as a checklist for missing parts.

---



