From 0378b54b946f7352a4c3b5183abf35be645a96a9 Mon Sep 17 00:00:00 2001 From: Krasen Samardzhiev Date: Wed, 15 Oct 2025 09:49:10 +0100 Subject: [PATCH 1/2] change publishing options --- index.md | 4 +- myst.yml | 2 +- .../publish-pangeo.ipynb | 789 ------------------ pangeo/publishing-to-earthcode/publishing.md | 3 + 4 files changed, 7 insertions(+), 791 deletions(-) delete mode 100644 pangeo/publishing-to-earthcode/publish-pangeo.ipynb create mode 100644 pangeo/publishing-to-earthcode/publishing.md diff --git a/index.md b/index.md index ad092ef..add96a9 100644 --- a/index.md +++ b/index.md @@ -16,4 +16,6 @@ Looking how to upload data to the ESA Project Results Repository (PRR)? Start wi Looking how to contribute to the Open Science Catalog? Start with our [Open Science Catalog Tutorials](OSC/index.md) to learn how to add / change content in the metadata catalog by enriching it with your research outcomes. -Looking for how to use the openEO to create workflows and experiments? Check out our [openEO Tutorials](openeo/index.md). \ No newline at end of file +Looking for how to use the openEO to create workflows and experiments? Check out our [openEO Tutorials](openeo/index.md). + +Looking for how to use the Pangeo deployment on EarthCODE? Check out our [Pangeo Tutorials](pangeo/index.md). 
\ No newline at end of file diff --git a/myst.yml b/myst.yml index f385f47..231cc0f 100644 --- a/myst.yml +++ b/myst.yml @@ -55,7 +55,7 @@ project: - file: pangeo/pangeo101/cloud-native-formats-101.ipynb - file: pangeo/pangeo101/dask101.ipynb - file: pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb - - file: pangeo/publishing-to-earthcode/publish-pangeo.ipynb + - file: pangeo/publishing-to-earthcode/publish.md # plugins: diff --git a/pangeo/publishing-to-earthcode/publish-pangeo.ipynb b/pangeo/publishing-to-earthcode/publish-pangeo.ipynb deleted file mode 100644 index af11852..0000000 --- a/pangeo/publishing-to-earthcode/publish-pangeo.ipynb +++ /dev/null @@ -1,789 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "69ed3139", - "metadata": { - "jp-MarkdownHeadingCollapsed": true - }, - "source": [ - "# Publish to EarthCODE\n", - "\n", - "## You can Publish to EarthCODE in Many Different Ways!\n", - "\n", - "EarthCODE provides a vibrant ecosystem of tools to publish your work automatically - these tools aim to fit the EO communities' preferred ways of working (code, CLI or UI), and data formats and storage options (Zarr, local NetCDF files, etc.). 
\n", - "\n", - "In this workshop we will explore publishing to the EarthCODE catalog via `deep-code` which automates the key steps outlined below.\n", - "\n", - "The code for publishing via deep-code is available at this notebook: [deep-code publishing pangeo notebook](deep-code/deep-code/publish-pangeo.ipynb)\n", - "\n", - "`deep-code` is a lightweight python tool that comprises a command line interface(CLI) \n", - "and Python API providing utilities that aid integration of DeepESDL datasets, \n", - "experiments with EarthCODE.\n", - "\n", - "Find out more at: https://github.com/deepesdl/deep-code/tree/main\n", - "\n", - "You will need to first clone the deep-code repo and install it manually following the guide at https://github.com/deepesdl/deep-code/tree/main\n", - "\n", - "## deep_code usage\n", - "\n", - "`deep_code` provides a command-line tool called deep-code, which has several subcommands \n", - "providing different utility functions.\n", - "Use the --help option with these subcommands to get more details on usage.\n", - "\n", - "The CLI retrieves the Git username and personal access token from a hidden file named \n", - ".gitaccess. 
Ensure this file is located in the same directory where you execute the CLI\n", "command.\n", "\n", "### deep-code generate-config\n", "\n", "Generates starter configuration templates for publishing to the EarthCODE Open Science \n", "Catalog.\n", "\n", "#### Usage\n", "```\n", "deep-code generate-config [OPTIONS]\n", "```\n", "\n", "#### Options\n", " --output-dir, -o : Output directory (default: current)\n", "\n", "#### Examples\n", "```\n", "deep-code generate-config\n", "deep-code generate-config -o ./configs\n", "```\n", "\n", "### deep-code publish\n", "\n", "Publishes the metadata of an experiment, workflow, and dataset to the EarthCODE Open Science \n", "Catalog.\n", "\n", "#### Usage\n", "```\n", "deep-code publish DATASET_CONFIG WORKFLOW_CONFIG [--environment ENVIRONMENT]\n", "```\n", "\n", "#### Arguments\n", " DATASET_CONFIG - Path to the dataset configuration YAML file\n", " (e.g., dataset-config.yaml)\n", "\n", " WORKFLOW_CONFIG - Path to the workflow configuration YAML file\n", " (e.g., workflow-config.yaml)\n", "\n", "#### Options\n", " --environment, -e - Target catalog environment:\n", " production (default) | staging | testing\n", "\n", "---\n", "\n", "\n", "For this tutorial in the EDC environment, we'll directly call the deep-code publish function." ] }, { "cell_type": "markdown", "id": "649acd0f", "metadata": {}, "source": [ "# Import Packages\n", "\n", "For this tutorial we will directly use the deep-code package functions, and not the CLI commands. 
Deep-code is under active development and will soon be available as a downloadable conda package." ] }, { "cell_type": "code", "execution_count": 1, "id": "41bdebb9-a414-45aa-adf4-a20ad8bd44ed", "metadata": {}, "outputs": [], "source": [ "from deep_code.tools.publish import Publisher\n", "from dotenv import load_dotenv\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "id": "08e487a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Current Working Directory: /home/sunnydean/LPS25_Pangeo_x_EarthCODE_Workshop/publishing-to-earthcode/deep-code/deep-code\n" ] } ], "source": [ "# Get the absolute path of the notebook\n", "# Jupyter notebooks don’t have __file__, so usually you set it manually\n", "notebook_path = os.path.abspath(\"publish-pangeo.ipynb\")\n", "notebook_dir = os.path.dirname(notebook_path)\n", "\n", "# Change working directory\n", "os.chdir(notebook_dir)\n", "\n", "# Confirm\n", "print(\"Current Working Directory:\", os.getcwd())" ] }, { "cell_type": "markdown", "id": "5c3cb725", "metadata": {}, "source": [ "## Required Environment Variables\n", "\n", "To use deep-code to publish our data, we will need to define a couple of environment variables and files.\n", "\n" ] }, { "cell_type": "markdown", "id": "29ea2ab2", "metadata": {}, "source": [ "## .gitaccess\n", "First, we need to give deep-code access to our GitHub account.\n", "\n", "\n", "### Creating a `.gitaccess` File for GitHub Authentication\n", "\n", "To enable deep-code to publish your work, you must create a `.gitaccess` file with a GitHub personal access token (PAT) that grants repository access.\n", "\n", "### 1. Generate a GitHub Personal Access Token (PAT)\n", "\n", "1. Navigate to [GitHub → Settings → Developer settings → Personal access tokens](https://github.com/settings/tokens).\n", "2. 
Click **“Generate new token”**.\n", "3. Choose the following scopes to ensure full access:\n", " - `repo` (Full control of repositories — includes fork, pull, push, and read)\n", "4. Generate the token and **copy it immediately** — GitHub won't show it again.\n", "\n", "---\n", "\n", "### 2. Create the `.gitaccess` File\n", "\n", "Create a plain text file named `.gitaccess` in your project directory or home folder:\n", "\n", "```\n", "github-username: your-git-user\n", "github-token: your-personal-access-token\n", "```\n", "\n", "Replace `your-git-user` and `your-personal-access-token` with your actual GitHub username and token.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "ba1c533a", "metadata": {}, "source": [ "# S3 Configuration for Public Data Access\n", "\n", "To use `deep-code`, your data must be publicly accessible. In this example, we use a public S3 bucket hosted at:\n", "\n", "[https://eu-west-2.console.aws.amazon.com/s3/buckets/pangeo-test-fires](https://eu-west-2.console.aws.amazon.com/s3/buckets/pangeo-test-fires), e.g. file: https://pangeo-test-fires.s3.eu-west-2.amazonaws.com/dnbr_dataset.zarr/.zattrs\n", "\n", "If your dataset is hosted in a public cloud location, simply configure the following environment variables to allow `deep-code` to access your data and automatically generate the appropriate EarthCODE options in a .env file. 
This will be loaded by load_dotenv() in the cell below.\n", "\n", "```bash\n", "S3_USER_STORAGE_BUCKET=pangeo-test-fires\n", "AWS_DEFAULT_REGION=eu-west-2\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "id": "d03120d0-22e9-4f2b-b3b9-efc8a9dc8010", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'pangeo-test-fires'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "load_dotenv() # take environment variables\n", "import os\n", "os.environ.get(\"S3_USER_STORAGE_BUCKET\")" ] }, { "cell_type": "markdown", "id": "2bec1f61", "metadata": {}, "source": [ "# Uploading Data to a Public S3 Bucket with `xcube` - For Reference Only, We Recommend Uploading your Data to the ESA PRR!\n", "\n", "The cell below provides a quick walkthrough on how to create a publicly accessible S3 bucket and upload data to it using [`xcube`](https://xcube.readthedocs.io/) and `xarray`.\n", "\n", "---\n", "## Step 1: Create a Public S3 Bucket\n", "\n", "1. Go to the [AWS S3 Console](https://s3.console.aws.amazon.com/s3/home).\n", "2. Click **Create bucket**.\n", "3. Enter a **unique bucket name**, e.g. `pangeo-test-fires`.\n", "4. Choose your **AWS Region**, e.g. `eu-west-2 (London)`.\n", "5. Under **Object Ownership**, choose **Bucket owner enforced (ACLs disabled)** — this is the recommended setting for using bucket policies without conflicting with ACLs.\n", "6. Scroll down to **Block Public Access settings** and **uncheck all options** to allow public access:\n", " - Uncheck:\n", " - Block all public access\n", " - Block public access to buckets and objects granted through new ACLs\n", " - Block public access to buckets and objects granted through any ACLs\n", " - Block public access to buckets and objects granted through new public bucket policies\n", "7. Acknowledge the warning about making the bucket public.\n", "8. 
Click **Create bucket** to finish.\n", "\n", "\n", "### Configure Public Read Access\n", "\n", "To make objects in the bucket publicly accessible, apply the following **bucket policy**:\n", "\n", "#### Bucket Policy\n", "\n", "```json\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"PublicReadGetObject\",\n", " \"Effect\": \"Allow\",\n", " \"Principal\": \"*\",\n", " \"Action\": \"s3:GetObject\",\n", " \"Resource\": \"arn:aws:s3:::pangeo-test-fires/*\" <---- replace with your bucket name\n", " }\n", " ]\n", "}\n", "```\n", "\n", "Your bucket is now ready for public data access; any data you upload here is publicly available.\n", "\n", "---\n", "\n", "## Step 2: Set Environment Variables\n", "\n", "To allow `xcube` to access and write to your S3 bucket in the cell below, define the following environment variables (to be loaded by load_dotenv()):\n", "\n", "```bash\n", "S3_USER_STORAGE_BUCKET=pangeo-test-fires\n", "AWS_DEFAULT_REGION=eu-west-2\n", "S3_USER_STORAGE_KEY=\n", "S3_USER_STORAGE_SECRET=\n", "```\n", "\n", "Replace the placeholders with your actual AWS credentials. These are required for programmatic access and uploading data securely.\n", "\n", "## Step 3: Upload a Zarr Dataset Using xcube\n", "\n", "[`xcube`](https://xcube.readthedocs.io/) is a versatile Python library designed for working with spatiotemporal Earth observation data. It provides a unified interface to access, transform, analyze, and publish multidimensional datasets in cloud-optimized formats like Zarr. 
One of its key strengths is the ability to interact with a wide variety of storage backends — including local filesystems, object stores like Amazon S3, and remote services — using a consistent data store abstraction.\n", - "\n", - "In this context and the cell below, `xcube` is used to publish a local Zarr dataset to an S3 bucket, making it publicly accessible for further use in cloud-native geospatial workflows. This is particularly useful for EarthCODE applications or for distributing large EO datasets in an open and scalable manner.\n", - "\n", - "The code example below demonstrates how to:\n", - "\n", - "- Load a local Zarr dataset using `xarray`\n", - "- Configure an authenticated S3 data store through `xcube`\n", - "- Write the dataset into the specified bucket under a desired object key\n", - "\n", - "By using `xcube.core.store.new_data_store`, the upload process abstracts away the S3 APIs.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "720dfba8-e392-48ce-945c-f9dc479de8e0", - "metadata": {}, - "outputs": [], - "source": [ - "# For reference only \n", - "\n", - "\n", - "# uploaded at: https://pangeo-test-fires.s3.eu-west-2.amazonaws.com/dnbr_dataset.zarr/.zattrs\n", - "\n", - "# import xarray as xr\n", - "# from xcube.core.store import new_data_store\n", - "\n", - "# # store data on s3\n", - "# root=\"pangeo-test-fires\"\n", - "\n", - "# # Path to the local Zarr dataset\n", - "# zarr_path = \"../../../wildfires/dnbr_dataset.zarr\"\n", - "\n", - "# # Open the Zarr dataset\n", - "# ds = xr.open_zarr(zarr_path)\n", - "\n", - "# ds\n", - "\n", - "# store = new_data_store(\n", - "# \"s3\",\n", - "# root=root,\n", - "# storage_options={\n", - "# \"anon\": False,\n", - "# \"key\": os.environ.get(\"S3_USER_STORAGE_KEY\"),\n", - "# \"secret\": os.environ.get(\"S3_USER_STORAGE_SECRET\"),\n", - "# \"client_kwargs\": {\n", - "# \"endpoint_url\": \"https://s3.eu-west-2.amazonaws.com\",\n", - "# \"region_name\": 
os.environ.get(\"AWS_DEFAULT_REGION\")\n", "# }\n", "# },\n", " \n", "# )\n", "# store.write_data(ds, \"dnbr_dataset.zarr\", replace=True)" ] }, { "cell_type": "markdown", "id": "d236af3a", "metadata": {}, "source": [ "## Keeping Your Data Open via ESA Projects Results Repository\n", "\n", "For the above dataset, the storage footprint is small and it will not be used operationally other than for this tutorial - but when hosting bigger datasets, one needs to consider that there are costs involved.\n", "\n", "The **EarthCODE Projects Results Repository** offers a powerful, low-friction solution for sharing and preserving the outputs of your ESA-funded Earth observation projects. Instead of worrying about cloud infrastructure, data storage, or long-term access, you can rely on a professionally maintained, FAIR-aligned repository that ensures your results are accessible, reusable, and citable.\n", "\n", "### Key Benefits\n", "\n", "- **No infrastructure overhead**: You don’t need to host or maintain storage — we take care of it.\n", "- **Long-term accessibility**: Results are stored and served from ESA-managed infrastructure, ensuring persistence and reliability.\n", "- **Open science ready**: Your datasets are made publicly accessible in cloud-native formats (e.g., Zarr, STAC), supporting downstream use in notebooks, APIs, and platforms like `deep-code`.\n", "- **FAIR-compliant**: All submissions are curated to meet Findable, Accessible, Interoperable, and Reusable standards.\n", "- **DOI assignment**: We help you publish your results with globally recognized identifiers to support citation and traceability.\n", "\n", "### How to Contribute\n", "\n", "You can also choose to use the **ESA Projects Results Repository** to store your project outcomes. 
The EarthCODE team will **fully support** you in doing this.\n", - "\n", - "If you would like to store your results and publish them through EarthCODE, simply get in touch with us at:\n", - "\n", - "📧 **[earth-code@esa.int](mailto:earth-code@esa.int)**\n", - "\n", - "We’re here to help make your data discoverable, reusable, and impactful.\n", - "\n", - "---\n", - "\n", - "### Looking Ahead\n", - "\n", - "In the near future, tools such as **`deep-code`** will include built-in support for uploading and registering your results as part of the publishing workflow — making it even easier to share your scientific contributions with the community.\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "fe46db24-b8d6-4570-8938-b1c7083a6c7c", - "metadata": {}, - "outputs": [], - "source": [ - "os.chdir(notebook_dir)" - ] - }, - { - "cell_type": "markdown", - "id": "d9a4863a-4c89-4867-991a-ef941f659551", - "metadata": {}, - "source": [ - "# Using `deep-code`\n", - "\n", - "Great — we’ve uploaded the data and made it publicly accessible. 
Now, to use `deep-code`, the final step is to define a few simple metadata entries in a YAML file — one for your **dataset** (the *product*) and another for your **code** (the *workflow*).\n", - "\n", - "These metadata files allow `deep-code` to automatically generate STAC Items that follow the [EarthCODE Open Science Catalog (OSC) convention](https://github.com/stac-extensions/osc), and submit a pull request to register them in the **Open Science Catalog**.\n", - "\n", - "### For Datasets (Products)\n", - "\n", - "When defining your dataset metadata, you'll provide key fields that describe **what** the dataset contains, **where** it is stored, and **how** it aligns with the Open Science Catalog.\n", - "\n", - "Here’s a breakdown of the required fields:\n", - "\n", - "```\n", - "dataset_id: The name of the dataset object within your S3 bucket\n", - "collection_id: A unique identifier for the dataset collection\n", - "osc_themes: [wildfires] Open Science theme (choose from https://opensciencedata.esa.int/themes/catalog)\n", - "documentation_link: Link to relevant documentation, publication, or handbook\n", - "access_link: Public S3 URL to the dataset\n", - "dataset_status: Status of the dataset: 'ongoing', 'completed', or 'planned'\n", - "osc_region: Geographical coverage, e.g. 'global'\n", - "cf_parameter: The main geophysical variable, ideally matching a CF standard name or OSC variable\n", - "```\n", - "\n", - "#### Notes\n", - "\n", - "- **`osc_themes`** must match one of the themes listed at: \n", - " [https://opensciencedata.esa.int/themes/catalog](https://opensciencedata.esa.int/themes/catalog)\n", - "\n", - "- **`cf_parameter`** should reference a well-established variable name, ideally from the Open Science Catalog or CF conventions. 
\n", - " You can explore examples by searching the EarthCODE metadata repository: \n", - " [Search for \"burned-area\" in EarthCODE metadata](https://github.com/search?q=repo%3AESA-EarthCODE%2Fopen-science-catalog-metadata+burned-area&type=code) or directly on the osc https://opensciencedata.esa.int/variables/catalog\n", - "\n", - "\n", - "\n", - "### For Workflows\n", - "\n", - "```\n", - "workflow_id: A unique identifier for your workflow\n", - "properties:\n", - " title: Human-readable title of the workflow\n", - " description: A concise summary of what the workflow does\n", - " keywords: Relevant scientific or technical keywords\n", - " themes: Thematic area(s) of focus (e.g. land, ocean, atmosphere) - see from above example\n", - " license: License type (e.g. MIT, Apache-2.0, CC-BY-4.0, proprietary)\n", - " jupyter_kernel_info:\n", - " name: Name of the execution environment or notebook kernel\n", - " python_version: Python version used\n", - " env_file: Link to the environment file (YAML) used to create the notebook environment\n", - "jupyter_notebook_url: Link to the source notebook (e.g. 
on GitHub)\n", - "contact:\n", - " name: Contact person's full name\n", - " organization: Affiliated institution or company\n", - " links:\n", - " rel: \"about\"\n", - " type: \"text/html\"\n", - " href: Link to homepage or personal/institutional profile\n", - "```\n", - "\n", - "\n", - "See the examples below:\n", - "\n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "5726acb7-4c9a-4143-89be-d991ede967f1", - "metadata": {}, - "outputs": [], - "source": [ - "dataset_config=\"\"\"\n", - "dataset_id: dnbr_dataset.zarr\n", - "collection_id: pangeo-test\n", - "osc_themes:\n", - "- 'land'\n", - "documentation_link: https://www.sciencedirect.com/science/article/pii/S1470160X22004708#f0035\n", - "access_link: s3://pangeo-test-fires\n", - "dataset_status: completed\n", - "osc_region: global\n", - "cf_parameter:\n", - " - name: burned-area\n", - "\"\"\"\n", - "\n", - "with open(\"dataset_config.yaml\", 'w') as f:\n", - " f.write(dataset_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "560f6af4-4a19-4721-be3b-0dc4c1bec64d", - "metadata": {}, - "outputs": [], - "source": [ - "workflow_config=\"\"\"\n", - "workflow_id: \"dnbr_workflow_example\"\n", - "properties:\n", - " title: \"DNBR Workflow Example\"\n", - " description: \"Demonstrate how to fetch satellite Sentinel-2 data to generate burn severity maps for the assessment of the areas affected by wildfires.\"\n", - " keywords:\n", - " - Earth Science\n", - " themes:\n", - " - land\n", - " license: proprietary\n", - " jupyter_kernel_info:\n", - " name: Pange-Test-Notebook\n", - " python_version: 3.11\n", - " env_file: \"https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml\"\n", - "jupyter_notebook_url: \"https://github.com/pangeo-data/pangeo-openeo-BiDS-2023/blob/main/tutorial/examples/dask/wildfires_daskgateway.ipynb\"\n", - "contact:\n", - " - name: Dean Summers\n", - " organization: Lampata\n", - " 
links:\n", - " - rel: \"about\"\n", - " type: \"text/html\"\n", - " href: \"https://www.lampata.eu/\"\n", - "\"\"\"\n", - "\n", - "with open(\"workflow_config.yaml\", 'w') as f:\n", - " f.write(workflow_config)" - ] - }, - { - "cell_type": "markdown", - "id": "7ea49a62-0ba0-4836-b578-b0d8af624203", - "metadata": {}, - "source": [ - "> **Note**: Before `deep-code` can submit metadata to the Open Science Catalog via Git, you may need to configure your Git identity in the environment where you're running it:\n", - "\n", - "```bash\n", - "git config --global user.email \"your-email@example.com\"\n", - "git config --global user.name \"Your Name\"\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "65a4330c-a71d-4110-83f0-f8370031b44a", - "metadata": {}, - "outputs": [], - "source": [ - "!git config --global user.email \"dean@lampata.co.uk\"\n", - "!git config --global user.name \"Dean S\"" - ] - }, - { - "cell_type": "markdown", - "id": "ad264d6c-3077-430a-a0af-c36fb8973156", - "metadata": {}, - "source": [ - "## Calling deep-code\n", - "\n", - "For this tutorial in the EDC environment we'll directly call the deepcode publish function via the library code to make sure this code is easily reproducible (as deep-code is currently evolving and changing rapidly with more users publishing to EarthCODE!)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "88fb3bbe-293d-4286-8e98-45dc6bdbac5f", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "S3_USER_STORAGE_BUCKET=pangeo-test-fires\n" - ] - } - ], - "source": [ - "!printenv | grep S3_USER_STORAGE_BUCKET" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "a694fcf4-75b8-4def-b9b6-c49ce3d02f5c", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:root:Forking repository...\n", - "INFO:root:Repository forked to sunnydean/open-science-catalog-metadata-staging\n", - 
"INFO:root:Checking local repository...\n", - "INFO:root:Cloning forked repository...\n", - "Cloning into '/home/sunnydean/temp_repo'...\n", - "Updating files: 100% (1776/1776), done.\n", - "INFO:root:Repository cloned to /home/sunnydean/temp_repo\n", - "INFO:deep_code.tools.publish:Generating STAC collection...\n", - "INFO:deep_code.utils.dataset_stac_generator:Attempting to open dataset 'dnbr_dataset.zarr' with configuration: Public store\n", - "INFO:httpx:HTTP Request: GET https://raw.githubusercontent.com/IrishMarineInstitute/awesome-erddap/master/erddaps.json \"HTTP/1.1 200 OK\"\n", - "INFO:deep_code.utils.dataset_stac_generator:Successfully opened dataset 'dnbr_dataset.zarr' with configuration: Public store\n", - "INFO:deep_code.tools.publish:Variable catalog for burned-ha-mask does not exist. Creating...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Enter GCMD keyword URL or a similar url for burned-ha-mask: https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science/436b098d-e4d9-4fbd-9ede-05675e111eee?gtm_keyword=BURNED%20AREA>m_scheme=Earth%20Science\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:deep_code.utils.dataset_stac_generator:Added GCMD link for burned-ha-mask catalog https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science/436b098d-e4d9-4fbd-9ede-05675e111eee?gtm_keyword=BURNED%20AREA>m_scheme=Earth%20Science.\n", - "INFO:deep_code.tools.publish:Variable catalog for delta-nbr does not exist. 
Creating...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Enter GCMD keyword URL or a similar url for delta-nbr: https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science/436b098d-e4d9-4fbd-9ede-05675e111eee?gtm_keyword=BURNED%20AREA>m_scheme=Earth%20Science\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:deep_code.utils.dataset_stac_generator:Added GCMD link for delta-nbr catalog https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science/436b098d-e4d9-4fbd-9ede-05675e111eee?gtm_keyword=BURNED%20AREA>m_scheme=Earth%20Science.\n", - "INFO:deep_code.tools.publish:Generating OGC API Record for the workflow...\n", - "INFO:root:Creating new branch: add-new-collection-pangeo-test-20250622220728...\n", - "Switched to a new branch 'add-new-collection-pangeo-test-20250622220728'\n", - "INFO:deep_code.tools.publish:Adding products/pangeo-test/collection.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: products/pangeo-test/collection.json...\n", - "INFO:deep_code.tools.publish:Adding variables/burned-ha-mask/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: variables/burned-ha-mask/catalog.json...\n", - "INFO:deep_code.tools.publish:Adding variables/delta-nbr/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: variables/delta-nbr/catalog.json...\n", - "INFO:deep_code.tools.publish:Adding /home/sunnydean/temp_repo/variables/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: /home/sunnydean/temp_repo/variables/catalog.json...\n", - "INFO:deep_code.tools.publish:Adding /home/sunnydean/temp_repo/products/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: /home/sunnydean/temp_repo/products/catalog.json...\n", - "INFO:deep_code.tools.publish:Adding 
/home/sunnydean/temp_repo/projects/deep-earth-system-data-lab/collection.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: /home/sunnydean/temp_repo/projects/deep-earth-system-data-lab/collection.json...\n", - "INFO:deep_code.tools.publish:Adding workflows/dnbr_workflow_example/record.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: workflows/dnbr_workflow_example/record.json...\n", - "INFO:deep_code.tools.publish:Adding experiments/dnbr_workflow_example/record.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: experiments/dnbr_workflow_example/record.json...\n", - "INFO:deep_code.tools.publish:Adding experiments/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: experiments/catalog.json...\n", - "INFO:deep_code.tools.publish:Adding workflows/catalog.json to add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Adding new file: workflows/catalog.json...\n", - "INFO:root:Committing and pushing changes...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[add-new-collection-pangeo-test-20250622220728 f397b6df] Add new dataset collection: pangeo-test and workflow/experiment: dnbr_workflow_example\n", - " 10 files changed, 491 insertions(+), 8 deletions(-)\n", - " create mode 100644 experiments/dnbr_workflow_example/record.json\n", - " create mode 100644 products/pangeo-test/collection.json\n", - " create mode 100644 variables/burned-ha-mask/catalog.json\n", - " create mode 100644 variables/delta-nbr/catalog.json\n", - " create mode 100644 workflows/dnbr_workflow_example/record.json\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "remote: \n", - "remote: Create a pull request for 'add-new-collection-pangeo-test-20250622220728' on GitHub by visiting: \n", - "remote: 
https://github.com/sunnydean/open-science-catalog-metadata-staging/pull/new/add-new-collection-pangeo-test-20250622220728 \n", - "remote: \n", - "To https://github.com/sunnydean/open-science-catalog-metadata-staging.git\n", - " * [new branch] add-new-collection-pangeo-test-20250622220728 -> add-new-collection-pangeo-test-20250622220728\n", - "INFO:root:Creating a pull request...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "branch 'add-new-collection-pangeo-test-20250622220728' set up to track 'origin/add-new-collection-pangeo-test-20250622220728'.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:root:Pull request created: https://github.com/ESA-EarthCODE/open-science-catalog-metadata-staging/pull/129\n", - "INFO:deep_code.tools.publish:Pull request created: None\n", - "INFO:root:Cleaning up local repository...\n", - "INFO:deep_code.tools.publish:Pull request created: None\n" - ] - } - ], - "source": [ - "publisher = Publisher(\n", - " dataset_config_path=\"dataset_config.yaml\",\n", - " workflow_config_path=\"workflow_config.yaml\",\n", - " environment=\"staging\",\n", - ")\n", - "publisher.publish_all()\n", - "# gdm variable: \n", - "# https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science/436b098d-e4d9-4fbd-9ede-05675e111eee?gtm_keyword=BURNED%20AREA>m_scheme=Earth%20Science" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "e9b47068-44ad-4ca2-ace7-581772e4a5ad", - "metadata": {}, - "source": [ - "## Reviewing Your Submission\n", - "\n", - "Once `deep-code` completes the submission, it automatically opens a pull request in the EarthCODE Open Science Catalog staging repository. You can:\n", - "\n", - "- **Check the actual pull request generated by `deep-code`**: \n", - " e.g. 
[https://github.com/ESA-EarthCODE/open-science-catalog-metadata-staging/pull/112/files](https://github.com/ESA-EarthCODE/open-science-catalog-metadata-staging/pull/112/files) \n", - " This allows you to inspect exactly what metadata files were created — saving you the time and effort of writing and formatting them manually.\n", - "\n", - "- **Preview and edit your submission in the EarthCODE Staging Dashboard**: \n", - " [https://dashboard.earthcode-staging.earthcode.eox.at/](https://dashboard.earthcode-staging.earthcode.eox.at/) \n", - " This UI provides an intuitive way to browse, validate, and refine your submission before it's merged into the main Open Science Catalog.\n", - "\n", - "![stagingenv.png](../static/stagingenv.png)\n", - " \n", - "\n", - "Together, these tools streamline the publishing workflow and help ensure your data and workflows are cleanly documented and catalogued.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "9f833a7c-7b11-402e-b8a7-d98c3cb62e70", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/home/sunnydean/LPS25_Pangeo_x_EarthCODE_Workshop/publishing-to-earthcode/deep-code/deep-code\n" - ] - } - ], - "source": [ - "# Change back to working directory\n", - "os.chdir(notebook_dir)\n", - "!pwd\n", - "!rm -rf /home/sunnydean/temp_repo/" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b032c552-daf6-4985-98d8-fffca7e462fb", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "earthcode-bids-earthcode-bids-edc_pangeo", - "language": "python", - "name": "conda-env-earthcode-bids-earthcode-bids-edc_pangeo-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.3" - } - }, - "nbformat": 4, - 
"nbformat_minor": 5 -} diff --git a/pangeo/publishing-to-earthcode/publishing.md b/pangeo/publishing-to-earthcode/publishing.md new file mode 100644 index 0000000..12e5f9a --- /dev/null +++ b/pangeo/publishing-to-earthcode/publishing.md @@ -0,0 +1,3 @@ +# Publishing to EarthCODE + +To see all data and Open Science Catalog publishing options, go to the homepage: https://esa-earthcode.github.io/tutorials/. \ No newline at end of file From d33bf86f02b77e3ebb252bb07c521d88c9223e47 Mon Sep 17 00:00:00 2001 From: Krasen Samardzhiev Date: Wed, 15 Oct 2025 10:00:27 +0100 Subject: [PATCH 2/2] changes --- myst.yml | 2 +- pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/myst.yml b/myst.yml index 231cc0f..dbcd6e9 100644 --- a/myst.yml +++ b/myst.yml @@ -55,7 +55,7 @@ project: - file: pangeo/pangeo101/cloud-native-formats-101.ipynb - file: pangeo/pangeo101/dask101.ipynb - file: pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb - - file: pangeo/publishing-to-earthcode/publish.md + - file: pangeo/publishing-to-earthcode/publishing.md # plugins: diff --git a/pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb b/pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb index 2d8fcb7..2ac58b8 100644 --- a/pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb +++ b/pangeo/burnt-area-example/pangeo_on_EarthCODE.ipynb @@ -11081,7 +11081,7 @@ "\n", "Now let's save our work! 
For interoperability, we will save our data as a valid data cube in Zarr.\n", "\n", - "Keep in mind that EarthCODE provides different tooling that makes it easy to publish our data to the wider EO community on the EarthCODE Open Science Catalog (such as deep-code for publishing data cubes, which we will see in the [**publishing guide**](../publishing-to-earthcode/deep-code/deep-code/publish-pangeo.ipynb)). By following common standards and using common file formats, we ensure that there will be a tool to help us!\n", + "Keep in mind that EarthCODE provides different tooling that makes it easy to publish our data to the wider EO community on the EarthCODE Open Science Catalog (such as deep-code for publishing data cubes, which we will see in the [**publishing guide**](https://esa-earthcode.github.io/tutorials/)). By following common standards and using common file formats, we ensure that there will be a tool to help us!\n", "\n", "\n", "## Linting\n",