
Implement GMPL → MUIO conversion pipeline (ClickSAND / UTOPIA to OSeMOSYS UI JSON)#108

Closed
parthdagia05 wants to merge 1 commit into EAPD-DRB:main from parthdagia05:pr-17

Conversation

@parthdagia05

Feature: OSeMOSYS GMPL to MUIO JSON Transformation Pipeline ⚙️

Summary

This PR implements a complete, structured pipeline for transforming standard OSeMOSYS GMPL models (e.g., ClickSAND / UTOPIA format) into MUIO-compatible JSON case structures. It automates the conversion of standard OSeMOSYS .dat files into the structured JSON format required by the OSeMOSYS UI and Cloud platforms.

The implementation follows a modular 3-layer architecture to ensure clean separation between syntax parsing, semantic expansion, and schema transformation.


Motivation

Currently, two primary data presentation formats exist within the ecosystem:

  1. Standard OSeMOSYS GMPL: Used by ClickSAND and starter data kits.
  2. MUIO / OSeMOSYS Cloud: ZIP archives containing structured JSON records.

This PR bridges the gap by introducing a Python-based transformation engine capable of parsing standard GMPL, interpreting wildcard slice semantics, and generating exact MUIO JSON schemas.


Architecture

The pipeline flows through three distinct phases:

GMPL (.dat / .txt) → Phase 1: GMPLParser → Phase 2: SliceInterpreter → Phase 3: MuioTransformer → MUIO JSON

Phase 1 — GMPLParser (Syntax Layer)

Focuses on pure structural extraction of GMPL content without applying transformation logic.

  • Capabilities: Handles set definitions, multi-slice parameters, headerless tables, and mixed whitespace.
  • Robustness: Manages := placement variations, comment stripping (#), and end; terminations.
  • Dialects: Supports both Dense (UTOPIA-style) and Sparse (MUIO-style) formatting.
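To make the syntax layer concrete, here is a minimal sketch of the kind of structural extraction Phase 1 performs, limited to `set` definitions. It strips `#` comments, tolerates `:=` placement variations and the trailing `;` terminator. The function name and simplifications are illustrative, not the actual GMPLParser API, which also handles multi-slice parameters and headerless tables.

```python
import re

def parse_gmpl_sets(text: str) -> dict[str, list[str]]:
    """Extract `set NAME := a b c;` definitions from GMPL source (sketch)."""
    # Strip comments line by line, then re-join so a set may span lines.
    clean = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    sets: dict[str, list[str]] = {}
    # `:?=` tolerates both `:=` and bare `=`; `[^;]+` runs to the terminator.
    for match in re.finditer(r"set\s+(\w+)\s*:?=\s*([^;]+);", clean):
        sets[match.group(1)] = match.group(2).split()
    return sets

sample = """
set REGION := UTOPIA;          # single region
set TECHNOLOGY :=
    E01 E21 E31 ;              # dense UTOPIA-style listing
"""
print(parse_gmpl_sets(sample))
# {'REGION': ['UTOPIA'], 'TECHNOLOGY': ['E01', 'E21', 'E31']}
```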

Phase 2 — SliceInterpreter (Semantic Layer)

Expands GMPL slice notation into normalized tuples.

  • Expansion: Converts wildcard slices (e.g., [REGION,*,FUEL,MODE,*]) into normalized mapping: (region, tech, fuel, mode, year) → value.
  • Features: Numeric coercion to float, default-value filtering, and sparse tuple representation.
  • Registry: Uses an explicit dimension registry to avoid length inference errors.
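The expansion step can be sketched as follows, under the simplifying assumption that a slice has exactly one row wildcard and one column wildcard; the real SliceInterpreter consults its dimension registry instead of hard-coding this shape, and the function signature here is hypothetical.

```python
def expand_slice(slice_spec, row_labels, col_labels, values):
    """Expand one GMPL parameter slice into normalized tuples (sketch).

    `slice_spec` is e.g. ("UTOPIA", "*", "HYD", "1", "*"): the first "*"
    is filled from row labels, the second from column labels, yielding
    a (region, tech, fuel, mode, year) -> value mapping.
    """
    wild = [i for i, d in enumerate(slice_spec) if d == "*"]
    assert len(wild) == 2, "this sketch handles row x column slices only"
    mapping = {}
    for r, row in zip(row_labels, values):
        for c, v in zip(col_labels, row):
            key = list(slice_spec)
            key[wild[0]], key[wild[1]] = r, c
            mapping[tuple(key)] = float(v)  # numeric coercion to float
    return mapping

m = expand_slice(
    ("UTOPIA", "*", "HYD", "1", "*"),
    row_labels=["E01", "E21"],
    col_labels=["1990", "1991"],
    values=[[0.4, 0.4], [1.0, 1.0]],
)
print(m[("UTOPIA", "E01", "HYD", "1", "1990")])  # 0.4
```

Keeping only non-default entries in `mapping` is what gives the sparse tuple representation mentioned above.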

Phase 3 — MuioTransformer (Application Layer)

Transforms normalized tuples into the final MUIO JSON schema.

  • Mapping: Renames FUEL to COMMODITY and injects MUIO-specific sets (STORAGEINTRADAY, UDC, etc.).
  • Determinism: Generates deterministic IDs (e.g., TECHNOLOGY → T_0, COMMODITY → C_0).
  • Metadata: Generates genData.json and creates the scenario envelope (SC_0).
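A minimal sketch of the renaming and ID-assignment steps is below; the helper names and the exact ID scheme are illustrative assumptions, since the actual MuioTransformer schema may differ in detail.

```python
def assign_ids(members, prefix):
    """Deterministic IDs in listing order, e.g. T_0, T_1, ... (sketch)."""
    return {name: f"{prefix}_{i}" for i, name in enumerate(members)}

def to_muio_sets(gmpl_sets):
    """Rename FUEL -> COMMODITY and inject MUIO-only sets (sketch)."""
    sets = dict(gmpl_sets)
    if "FUEL" in sets:
        sets["COMMODITY"] = sets.pop("FUEL")
    for injected in ("STORAGEINTRADAY", "UDC"):  # MUIO-specific sets
        sets.setdefault(injected, [])
    return sets

sets = to_muio_sets({"TECHNOLOGY": ["E01", "E21"], "FUEL": ["HYD"]})
ids = assign_ids(sets["TECHNOLOGY"], "T")
print(ids)  # {'E01': 'T_0', 'E21': 'T_1'}
```

Because IDs are derived purely from listing order, re-running the pipeline on the same input yields byte-identical output, which is what makes the downstream JSON diffable.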

Validation & Testing

Comprehensive validation scripts are included to ensure transformation accuracy:

  • validate_parser.py | validate_interpreter.py | validate_transformer.py

Testing Fixtures:

  • UTOPIA: Verified against the dense canonical OSeMOSYS model.
  • MUIO-Sparse: Verified against sparse GMPL samples.

Assertions Covered: Correct slice expansion, ID determinism, set injection, and JSON record shape completeness. All checks pass.
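As an illustration of the ID-determinism assertion family, a check in the spirit of the validate_*.py scripts might look like this; the function and data layout here are hypothetical, not the scripts' actual API.

```python
def check_ids_deterministic(ids):
    """Assert that IDs are unique and sequentially numbered (sketch)."""
    suffixes = [int(i.rsplit("_", 1)[1]) for i in ids]
    assert suffixes == list(range(len(ids))), "IDs not sequential"
    assert len(set(ids)) == len(ids), "IDs not unique"

check_ids_deterministic(["T_0", "T_1", "T_2"])
print("ID determinism check passed")
```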


Impact & Compatibility

  • Backward Compatible: No changes to existing solver execution or case workflows.
  • Purely Additive: Functions as a standalone engine for interoperability.
  • Integration Ready: Provides the foundation for future CLI wrappers or automated cloud upload tooling.

Future Extensions (Optional)

While the core engine is complete, a small wrapper can be added to:

  1. Write JSON files directly to disk.
  2. Package output into a standardized ZIP case archive.
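Such a wrapper could be as small as the sketch below, which serializes each JSON record and bundles the result into one ZIP archive; the file layout and names are illustrative assumptions, not a finalized case format.

```python
import json
import zipfile
from pathlib import Path

def package_case(case_files: dict[str, dict], archive: Path) -> None:
    """Write each JSON record and bundle them into a ZIP case (sketch)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in case_files.items():
            zf.writestr(name, json.dumps(payload, indent=2))

# Hypothetical payload: a genData.json carrying the SC_0 scenario envelope.
package_case({"genData.json": {"scenarios": ["SC_0"]}}, Path("case.zip"))
with zipfile.ZipFile("case.zip") as zf:
    print(zf.namelist())  # ['genData.json']
```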

Feedback on architecture, mapping completeness, or integration strategy is welcome.

Implements 3-layer architecture:
- GMPLParser (syntax extraction)
- SliceInterpreter (wildcard expansion to tuples)
- MuioTransformer (tuple → MUIO JSON schema)

Includes validation scripts and UTOPIA/MUIO fixtures.
@NamanmeetSingh

@parthdagia05 Incredible work on this pipeline. 3,000 lines is a massive undertaking, and breaking it down into a strict 3-layer architecture (Parser -> Interpreter -> Transformer) is exactly the right move for maintainability.

Parsing GMPL wildcard slices ([REGION,*,FUEL,MODE,*]) is notoriously tricky, so separating the semantic expansion into its own SliceInterpreter phase is a brilliant design choice to prevent dimension length errors.

From the Track 1 (OG-CLEWS integration) perspective, this is exactly the kind of robust ingestion foundation the ecosystem needs. While my focus in PR #24 is on the execution side—translating the solver outputs into macroeconomic time-series vectors and managing the stateful converging loop—we rely heavily on the initial case data being structurally perfect.

Once the OG-Core integration matures, users will likely need to upload custom .dat files that include new macroeconomic parameters (like AnnualFixedOperatingCost or Trade profiles). Having your MuioTransformer layer in place means we have a clean, deterministic place to register those new dimensions into the JSON schema before the solver even spins up.

I will be keeping a close eye on this as it merges so we can ensure the ingestion schemas perfectly align with the convergence pipelines downstream.

@parthdagia05
Author

@NamanmeetSingh Sorry for the late reply, and thanks a lot for the detailed feedback!
This pipeline is exactly trying to solve the “clean ingestion” problem you described: the GMPLParser just does structural parsing of the .dat files, the SliceInterpreter safely expands wildcard slices like [REGION,*,FUEL,MODE,*] into normalized tuples, and the MuioTransformer turns those into deterministic MUIO JSON (including things like FUEL → COMMODITY, injected sets, and stable IDs).
Your description of Track 1 in #24 really helped me see how this fits into the OG-CLEWS loop, especially the need for structurally perfect case data and room for future parameters like AnnualFixedOperatingCost or trade profiles. I’m happy to adjust the transformer/registry if you spot any schema mismatches once your convergence pipeline is further along.

@parthdagia05
Author

@SeaCelo this PR is ready for a review.

@parthdagia05
Author

Closing this to reduce noise. The scope here is broader than what fits the current narrow-fix pattern; happy to break it down into smaller issue-linked PRs if any sub-pieces are still useful.
