Scheduled carbon factor ingestion and parsing reference project with Python and .NET implementation options.
CarbonOps-Parser is a standalone public technical project for scheduled carbon factor source ingestion and parsing. It checks selected public emission factor sources, detects source version or hash changes, archives raw source files, parses source-specific structures, validates parsed records, and stores ingestion metadata and source-specific records in PostgreSQL.
The project is independent from carbonops-assistant. It is not a continuation, module, plugin, or dependency of that project.
CarbonOps-Parser is in early Phase 1. The repository currently emphasizes project documentation, architecture, schema contract notes, source support planning, and public contribution structure before parser implementation begins.
Implementation work is planned for two independent paths:
- Python in src/python
- .NET in src/dotnet
Users who clone or fork the repository should be able to choose either implementation path.
Phase 1 focuses on scheduled ingestion and parsing for:
- GHG Protocol
- DEFRA/DESNZ
- IPCC EFDB
The intended Phase 1 workflow is:
- Read configuration.
- Validate the database provider.
- Connect to PostgreSQL.
- Check whether required tables exist.
- Create missing tables if needed.
- Initialize source schedules.
- Check source version and file hash.
- Download a source document when a new version or hash is detected.
- Archive the raw source file.
- Parse source-specific structures.
- Validate parsed records.
- Persist shared ingestion metadata and source-specific records.
- Store import summaries and validation issues.
source schedule
-> version/hash check
-> download when changed
-> raw file archive
-> source-specific parser
-> validation
-> PostgreSQL persistence
-> import summary and validation issues
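The version/hash check at the start of this flow can be sketched as follows. This is a minimal sketch, not the project's implementation: SourceState, sha256_hex, and needs_download are hypothetical names introduced here for illustration, and how the current version label or hash is obtained from a source is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceState:
    """Last known version label and content hash for one source (hypothetical)."""

    version: Optional[str]
    sha256: Optional[str]


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 hex digest of already-loaded bytes."""
    return hashlib.sha256(content).hexdigest()


def needs_download(previous: SourceState, current: SourceState) -> bool:
    """True when the version label or the content hash differs from the last run."""
    return previous.version != current.version or previous.sha256 != current.sha256


# Example values are artificial and do not describe any real source release.
previous = SourceState(version="v1", sha256=sha256_hex(b"old file"))
unchanged = SourceState(version="v1", sha256=sha256_hex(b"old file"))
changed = SourceState(version="v2", sha256=sha256_hex(b"new file"))

print(needs_download(previous, unchanged))
print(needs_download(previous, changed))
```

Treating either signal as sufficient means a silently replaced file with an unchanged version label still triggers a new download and archive step.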
Phase 1 uses shared ingestion metadata tables plus source-specific master/detail tables. It does not force GHG Protocol, DEFRA/DESNZ, and IPCC EFDB into one canonical factor table. A normalized or search-oriented projection may be considered in a later phase.
The Python implementation is planned first because it is practical for source discovery, spreadsheet inspection, parser mapping, validation, and data engineering workflows.
The initial Python source adapter contracts and in-memory registry live under src/carbonfactor_parser/source_adapters.
See src/python/README.md.
The .NET implementation is planned as an independent Worker Service path that follows the same conceptual workflow with .NET-oriented application structure.
See src/dotnet/README.md.
From a fresh checkout or local working copy:
git clone <REPOSITORY_URL> CarbonOps-Parser
cd CarbonOps-Parser
python -m pip install -e .
Run the test suite if you want a quick local smoke check:
python -m pytest
Run the checked-in DEFRA/DESNZ fixture through the local dry-run CLI:
carbonops-parser local-dry-run \
--local-path examples/fixtures/defra_desnz_minimal.csv \
--source-family defra_desnz \
--source-id defra-desnz-minimal-fixture \
--content-type text/csv \
--format-hint csv
Expected summary:
status=success
parsed_record_count=2
normalization_record_count=2
persistence_input_record_count=2
ddl_preview_present=True
issue_count=0
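If you want to check the text summary from a script, the key=value lines above are easy to parse. The sketch below is illustrative only: parse_summary_lines is a hypothetical helper, not part of the carbonops-parser package, and it assumes the output keeps the one-pair-per-line shape shown here.

```python
def parse_summary_lines(text: str) -> dict[str, str]:
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    summary: dict[str, str] = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition("=")
        if separator and key:
            summary[key] = value
    return summary


# Sample text copied from the expected summary of the fixture dry run.
sample = """status=success
parsed_record_count=2
normalization_record_count=2
persistence_input_record_count=2
ddl_preview_present=True
issue_count=0
"""

summary = parse_summary_lines(sample)
# Values stay strings; convert explicitly where a number is needed.
ok = summary.get("status") == "success" and int(summary.get("issue_count", "1")) == 0
print(ok)
```

For anything beyond a quick smoke check, the --json variant is likely the more robust input for tooling.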
Run the JSON variant:
carbonops-parser local-dry-run \
--local-path examples/fixtures/defra_desnz_minimal.csv \
--source-family defra_desnz \
--source-id defra-desnz-minimal-fixture \
--content-type text/csv \
--format-hint csv \
--json
Key output fields:
- status: dry-run outcome such as success, failed, unsupported, or no_records
- parsed_record_count: records parsed by the minimal local DEFRA/DESNZ fixture parser
- normalization_record_count: records produced by the minimal fixture normalization mapper
- persistence_input_record_count: records prepared as PersistenceInput
- ddl_preview_present: whether review-only PostgreSQL DDL preview text is attached
- issues: structured local loader, parser, normalization, or persistence-input issues
Optionally include PostgreSQL insert preview data in text output:
carbonops-parser local-dry-run \
--local-path examples/fixtures/defra_desnz_minimal.csv \
--source-family defra_desnz \
--source-id defra-desnz-minimal-fixture \
--content-type text/csv \
--format-hint csv \
--include-postgresql-preview
Trimmed expected preview lines:
postgresql_preview_included=True
postgresql_preview_status=ready
postgresql_preview_only=True
postgresql_preview_sql_execution=False
postgresql_preview_database_connection=False
postgresql_preview_target_table=normalized_records
postgresql_preview_record_count=2
postgresql_preview_sql=INSERT INTO normalized_records (source_family, source_id, record_id, record_index, row_number, normalized_fields, source_reference, source_artifact_reference, source_checksum_sha256, parser_metadata, normalization_metadata, created_at, updated_at) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
postgresql_preview_issue_count=0
Run the JSON PostgreSQL preview variant:
carbonops-parser local-dry-run \
--local-path examples/fixtures/defra_desnz_minimal.csv \
--source-family defra_desnz \
--source-id defra-desnz-minimal-fixture \
--content-type text/csv \
--format-hint csv \
--json \
--include-postgresql-preview
Trimmed expected JSON preview section:
{
"postgresql_persistence_preview": {
"included": true,
"preview_only": true,
"sql_execution": false,
"database_connection": false,
"status": "ready",
"target_table": "normalized_records",
"record_count": 2,
"ordered_columns": [
"source_family",
"source_id",
"record_id",
"record_index",
"row_number",
"normalized_fields",
"source_reference",
"source_artifact_reference",
"source_checksum_sha256",
"parser_metadata",
"normalization_metadata",
"created_at",
"updated_at"
],
"idempotency_key_fields": [
"source_family",
"source_id",
"record_id",
"source_artifact_reference",
"source_checksum_sha256"
],
"issues": []
}
}
The postgresql_persistence_preview section is preview-only. It includes the
target table, ordered columns, parameter rows, record count, SQL text with
placeholders, and idempotency metadata, but it does not execute SQL or persist
records. No PostgreSQL server, database configuration, or credentials are
required.
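To make the relationship between the ordered columns and the placeholder SQL text concrete, here is a minimal sketch that derives one from the other. It is not the project's insert SQL builder: build_insert_sql is a hypothetical function written for this example, and it only builds a string without touching a database.

```python
# Column order copied from the preview output shown above.
ORDERED_COLUMNS = [
    "source_family",
    "source_id",
    "record_id",
    "record_index",
    "row_number",
    "normalized_fields",
    "source_reference",
    "source_artifact_reference",
    "source_checksum_sha256",
    "parser_metadata",
    "normalization_metadata",
    "created_at",
    "updated_at",
]


def build_insert_sql(table: str, columns: list[str]) -> str:
    """Build INSERT text with one %s placeholder per column.

    Identifiers are interpolated directly, so this is only suitable for
    trusted, hard-coded table and column names (an assumption of this sketch).
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


sql = build_insert_sql("normalized_records", ORDERED_COLUMNS)
print(sql)
```

Keeping the column order in one list means each parameter row can be produced in the same order, so values and placeholders stay aligned.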
This quickstart is local dry-run only. It does not connect to PostgreSQL, write records, execute SQL, run migrations, perform network calls, trigger source acquisition, load config files, or require credentials. It does not make production DEFRA/DESNZ correctness claims.
For boundary details, see Local Dry-Run CLI Boundary, Local File Normalized Persistence Dry-Run Boundary, PostgreSQL Persistence Preview Boundary, and Local Dry-Run Troubleshooting.
Run the lightweight Python test suite from the repository root:
python -m pytest
Pytest configuration is kept in pyproject.toml, including the src package import path used by the tests.
The carbonfactor_parser.source_adapters package exposes source adapter contracts and lightweight helpers for tests, prototypes, and implementation slices.
Hash source content without reading or downloading files:
from carbonfactor_parser.source_adapters import (
sha256_hex_from_bytes,
sha256_hex_from_text,
)
content_hash = sha256_hex_from_bytes(b"sample source content")
note_hash = sha256_hex_from_text("sample metadata note")
Create and validate metadata for an existing local file:
from pathlib import Path
from carbonfactor_parser.source_adapters import (
SourceFamily,
build_source_document_from_file,
validate_source_document_metadata,
)
document = build_source_document_from_file(
source_family=SourceFamily.DEFRA_DESNZ,
source_name="Example local factor file",
file_path=Path("data/raw/example/source.csv"),
)
metadata_issues = validate_source_document_metadata(document)
Create and validate an ingestion summary contract:
from carbonfactor_parser.source_adapters import (
SourceFamily,
create_ingestion_run_summary,
validate_ingestion_run_summary,
)
summary = create_ingestion_run_summary(
ingestion_id="example-run-001",
source_family=SourceFamily.DEFRA_DESNZ,
source_name="Example local factor file",
)
summary_issues = validate_ingestion_run_summary(summary)
Use the artificial-only source acquisition validation pipeline with in-memory metadata:
from carbonfactor_parser import (
create_artificial_source_acquisition_metadata,
validate_and_summarize_artificial_source_acquisition_metadata,
)
metadata = create_artificial_source_acquisition_metadata(
source_family="artificial_source_acquisition",
logical_source_name="artificial-in-memory-source",
declared_content_type="text/csv",
checksum_sha256="a" * 64,
acquired_at_label="static-artificial-acquisition-label",
)
pipeline_result = validate_and_summarize_artificial_source_acquisition_metadata(
metadata,
)
issue_count = pipeline_result.summary.total_issue_count
This pipeline is limited to artificial metadata shape checks and deterministic summaries. It does not acquire real sources, read files, validate real source URLs, run parsers or normalization, check factor correctness, or make compliance, legal, or carbon accounting correctness claims. See docs/artificial-source-acquisition-validation-pipeline.md, docs/artificial-source-acquisition-module-recap.md, and examples/example_artificial_source_acquisition_validation_pipeline.py.
Use the carbonops-source-acquisition CLI for local source descriptor checks and acquisition flow previews.
- Default run mode is noop and offline.
- HTTP mode is opt-in with --client http.
- validate checks local descriptor metadata only; it does not verify live URLs.
- run --dry-run plans targets only and does not acquire content or write files/manifests.
- Parser execution and database persistence are outside this CLI boundary at this phase.
carbonops-source-acquisition validate
carbonops-source-acquisition list
carbonops-source-acquisition list --source-id defra_desnz
carbonops-source-acquisition run --dry-run --base-directory ./data/source-acquisition
carbonops-source-acquisition run --output-format json
carbonops-source-acquisition run --client http --source-id ghg_protocol
carbonops-source-acquisition run --client http --source-id ghg_protocol --persist-content --base-directory ./data/source-acquisition
For boundary details, see:
- Source Acquisition CLI Boundary
- Source Acquisition Registry
- Source Acquisition HTTP Client Boundary
- Source Acquisition Parser Handoff Contract
See examples/example_acquisition_artifact_parser_input_mapping.py for a deterministic in-memory example of mapping acquisition artifact metadata into a future parser input boundary without executing a parser.
The parser package exposes ParserInputContract, create_parser_input_contract(), validate_parser_input_contract(), ParserFileContentInput, local parser file content loading helpers, parser file content validation helpers, parse_defra_desnz_file_content(), raw parsed record payload contracts, the ParserAdapter protocol, NoopParserAdapter, ArtificialParserAdapter, DefraDesnzParserAdapter, parser adapter registry helpers, parser execution planning and runner helpers, and parser execution result contracts for future parser adapter input handoff.

The normalization package exposes parser execution handoff helpers, normalization input helpers for successful parser results with raw payloads, and a minimal DEFRA/DESNZ fixture normalization mapper.

The persistence package exposes normalized result persistence input contracts, a logical PostgreSQL schema descriptor, a review-only DDL preview helper, a deterministic insert SQL builder, PostgreSQL persistence preview helpers, repository protocol/result contracts, an explicit caller-provided PostgreSQL options contract, a default-disabled PostgreSQL integration test boundary, and a PostgreSQL repository skeleton that returns unsupported results without database runtime behavior.

The pipeline package exposes a local DEFRA/DESNZ fixture dry-run helper that composes those boundaries to produce PersistenceInput plus DDL preview metadata without DB or network behavior.

These contracts keep acquisition metadata, already-loaded content, raw parser output, parser output metadata, normalization input, normalization handoff metadata, persistence input metadata, schema metadata, repository options metadata, integration test metadata, preview metadata, and repository result metadata separate; they do not include database connection behavior or full source-specific correctness claims.
Each Phase 1 source family will have its own schedule, source version/hash check, parser, validation rules, archive layout, and source-specific tables.
| Source family | Phase 1 role | Table group |
|---|---|---|
| GHG Protocol | Source-specific parser and workbook/tool mapping | ghg_* |
| DEFRA/DESNZ | First planned ingestion slice after discovery | defra_* |
| IPCC EFDB | Heterogeneous source discovery and parser mapping | ipcc_* |
See docs/source-support.md and docs/source-discovery.md.
The conceptual configuration model includes:
- Database provider and connection settings.
- Raw archive path.
- Source-specific enabled flags.
- Source-specific schedules with day, week, month, year, time, and timezone support.
Phase 1 implements only postgres as the database provider. mysql and mssql are recognized as conceptual provider names but are not implemented in Phase 1.
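The distinction between implemented and merely recognized providers can be sketched as a small check. This is an illustrative sketch, not the project's validation code: check_database_provider and its return labels are hypothetical, and only the three provider names come from the text above.

```python
IMPLEMENTED_PROVIDERS = frozenset({"postgres"})
RECOGNIZED_PROVIDERS = frozenset({"postgres", "mysql", "mssql"})


def check_database_provider(name: str) -> str:
    """Classify a configured provider name.

    Returns "implemented", "recognized_not_implemented", or "unknown".
    The labels are assumptions made for this sketch.
    """
    normalized = name.strip().lower()
    if normalized in IMPLEMENTED_PROVIDERS:
        return "implemented"
    if normalized in RECOGNIZED_PROVIDERS:
        return "recognized_not_implemented"
    return "unknown"


for provider in ("postgres", "mysql", "sqlite"):
    print(provider, check_database_provider(provider))
```

Separating the two sets lets a startup check report a recognized but unimplemented provider differently from a typo.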
See docs/configuration-model.md.
The shared conceptual example lives at config/carbonops.config.example.yaml.
PostgreSQL is the Phase 1 persistence target. The model includes:
- Shared ingestion metadata tables:
- Shared ingestion metadata tables: carbon_sources, carbon_source_versions, carbon_import_runs, carbon_raw_files, carbon_validation_issues, and carbon_job_locks.
- DEFRA/DESNZ tables: defra_categories, defra_subcategories, defra_factor_sets, and defra_factor_values.
- GHG Protocol tables: ghg_tools, ghg_factor_sheets, ghg_factor_groups, and ghg_factor_values.
- IPCC EFDB tables: ipcc_sectors, ipcc_categories, ipcc_references, ipcc_factor_records, and ipcc_factor_values.
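The startup step that checks whether required tables exist could compare the names in this model against what the database reports. The sketch below covers only the shared metadata tables and runs without a database: find_missing_tables is a hypothetical helper, and the existing set stands in for names that a real check might read from PostgreSQL's information_schema.tables view.

```python
# Shared ingestion metadata tables named in the model above.
REQUIRED_SHARED_TABLES = (
    "carbon_sources",
    "carbon_source_versions",
    "carbon_import_runs",
    "carbon_raw_files",
    "carbon_validation_issues",
    "carbon_job_locks",
)


def find_missing_tables(
    existing: set[str],
    required: tuple[str, ...] = REQUIRED_SHARED_TABLES,
) -> list[str]:
    """Return required table names that are absent, in their declared order."""
    return [name for name in required if name not in existing]


# Artificial example: a database where only two shared tables exist so far.
existing_tables = {"carbon_sources", "carbon_import_runs"}
missing = find_missing_tables(existing_tables)
print(missing)
```

Returning the missing names in declared order keeps any later create-if-missing step deterministic.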
See docs/database-model.md, docs/database-startup.md, and database/postgres/README.md.
- Architecture
- Configuration Model
- Configuration Example
- Background Job Model
- Database Model
- Database Startup
- Ingestion Metadata Model
- Codex-Assisted Runs
- Engineering Standards
- Linux Service Setup
- Source Support
- Source Discovery
- Source Ingestion Boundaries
- Source Acquisition Boundary
- Source Acquisition CLI Boundary
- Source Acquisition Sequencing Checklist
- Local Source Acquisition Contract Boundary
- Local Source Acquisition Examples Boundary
- Local Source Manifest Boundary
- Local Source Manifest Examples Boundary
- Source Manifest Adapter Handoff Boundary
- Source Manifest Adapter Handoff Examples Boundary
- Source Acquisition Validation Boundary
- Source Acquisition Validation Examples Boundary
- Source Acquisition Error Taxonomy Boundary
- Source Acquisition Error Taxonomy Examples Boundary
- Source Acquisition Review Gate Boundary
- Source Acquisition Review Gate Examples Boundary
- Source Acquisition Implementation Readiness Boundary
- Source Acquisition Implementation Readiness Examples Boundary
- Source Acquisition Implementation Sequencing Checklist
- Source Acquisition Implementation Sequencing Examples Boundary
- Source Acquisition Parser Handoff Contract
- Artificial Source Acquisition Validation Pipeline
- Artificial Source Acquisition Module Recap
- Artificial Source Acquisition Phase Closure
- Artificial Manifest Metadata Boundaries
- Artificial Manifest Validation Summary
- Artificial Manifest Metadata Collection
- Artificial Manifest Collection Validation Summary
- Artificial Manifest Metadata Phase Recap
- Artificial Manifest Next Phase Option Matrix
- Artificial In-Memory Manifest Usage Example
- Artificial Manifest Usage Example Phase Recap
- Source Adapter Contract
- Source Adapter Execution Flow
- Source Adapter Error And Warning Handling
- Source Adapter Configuration Boundaries
- Source-Specific Adapter Skeleton Guidance
- DEFRA/DESNZ Adapter Skeleton Boundaries
- Parser Adapter Boundary
- Parser Execution Planning Boundary
- Parser Execution Result Boundary
- Parser Execution Runner Boundary
- Source-Specific Parser Adapter Boundary
- Parser File Content Input Boundary
- Local Parser File Content Loader Boundary
- Parser Execution Normalization Handoff Boundary
- Parsed Raw Record Payload Boundary
- Parser Handoff Boundary
- Parser Contract Boundaries
- Source-Specific Parser Skeleton Boundaries
- DEFRA/DESNZ Parser Skeleton Boundaries
- Real Format Parser Boundary
- Normalization Boundary
- Normalization Input Boundary
- DEFRA/DESNZ Minimal Normalization Mapping Boundary
- Local File Normalized Persistence Dry-Run Boundary
- Local Dry-Run CLI Boundary
- Local Dry-Run Troubleshooting
- Normalized Result Persistence Boundary
- PostgreSQL Persistence Schema Boundary
- PostgreSQL DDL Preview Boundary
- PostgreSQL Insert SQL Builder Boundary
- PostgreSQL Persistence Preview Boundary
- Persistence Repository Boundary
- PostgreSQL Implementation Safety Gate
- PostgreSQL Integration Test Boundary
- PostgreSQL Opt-In Integration Runbook
- PostgreSQL Config Contract Boundary
- PostgreSQL Repository Skeleton Boundary
- PostgreSQL Repository Implementation Planning Boundary
- PostgreSQL Runtime Persistence Implementation Plan
- PostgreSQL Driver Dependency Decision
- PostgreSQL Connection Session Contract Boundary
- PostgreSQL Execution Adapter Boundary
- PostgreSQL Transaction Policy Boundary
- PostgreSQL Idempotency Conflict Strategy Boundary
- PostgreSQL psycopg Session Adapter Boundary
- PostgreSQL Disabled Runtime Execution Adapter Boundary
- PostgreSQL Repository Disabled Execution Preview Boundary
- PostgreSQL Runtime Execution Gate Boundary
- PostgreSQL Runtime Readiness Checklist
- Parser To Normalization Handoff Boundary
- Parser To Normalization Integration Recap
- Source To Normalization Pipeline Recap
- Normalization Execution Boundary
- Normalization Result Summary Boundary
- Normalization Summary Builder Boundary
- Normalization Pipeline Recap
- Normalization Public API Recap
- Normalization Test Coverage Recap
- Normalization Deferred Implementation Roadmap
- Public Roadmap Checkpoint
- Milestone Checkpoint CO-037 To CO-049
- Governance Smoke Test Checkpoint
- Stabilization Checkpoint
- Production Readiness Gap Analysis
- Production Readiness Sequencing Roadmap
- Repository Navigation Guide
- Review Readiness Checklist
- Documentation Map Consistency Checklist
- Source Adapter Package Recap
- Roadmap
- Task Breakdown
- Limitations
- Public Safety
- PostgreSQL Database Notes
Near-term work moves from documentation polish to schema scripts, Python source discovery, PostgreSQL startup checks, raw archive handling, and the first DEFRA/DESNZ ingestion slice. The .NET Worker Service path follows as an independent implementation option.
See docs/roadmap.md and docs/task-breakdown.md.
Issues and pull requests are welcome for documentation, examples, parser mappings, source discovery, database schema notes, and implementation improvements.
CarbonOps-Parser does not:
- Calculate carbon inventories.
- Produce emissions reports.
- Replace source-owner documentation or source files.
- Guarantee source data correctness.
- Provide a deployment platform.
- Normalize all source families into one shared factor table during Phase 1.
CarbonOps-Parser is licensed under the Apache License 2.0.