@ianktc ianktc commented Dec 3, 2025

Not necessarily meant to be merged; this is to continue the discussion on the JSON schema for GTFS validation report diffs. It follows JSON Schema draft 2020-12. I don't think we will actually need a formally defined schema, but it conveys the intended JSON structure (or at least gives us somewhere to start).

Rough structure, where some of the "diff" types are described below:

{
    "summary": {
        "metadata": {
            "validator_version": "diff",
            "service_window":  "diff",
            "counts": {
                "agencies": "diff",
                "blocks": "diff",
                "routes": "diff",
                "shapes": "diff",
                "stops": "diff",
                "trips": "diff"
            },
            "features": "diff"
        },
        "compliance": {
            "totalNotices": "diff",
            "uniqueNotices": "diff",
            "uniqueErrorNotices": "diff",
            "uniqueWarningNotices": "diff",
            "uniqueInfoNotices": "diff"
        }
    },
    "notices": "same format as validator notices"
}

Diff types:

  • Added/removed (strings)
"features": {
    "added": ["fares"],
    "removed": []
}
  • Count (integer)
"counts": {
    "trips": {
        "diff": -10,
        "old_count": 200,
        "new_count": 190
    }
}
"compliance": {
    "totalNotices": {
        "diff": -100,
        "old_count": 200,
        "new_count": 100
    }
}
  • Old/new (strings)
"service_window": {
    "old_service_window": {
        "start_date": "20240101",
        "end_date": "20241231"
    },
    "new_service_window": {
        "start_date": "20250101",
        "end_date": "20251231"
    }
}
"validator_version": {
    "old_validator_version": "7.0",
    "new_validator_version": "7.1"
}
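
A minimal sketch of how the three diff types above could be computed, in Python. The function names (`added_removed`, `count_diff`, `old_new`) are hypothetical and not part of the validator; the only assumption carried over from the examples is that `diff` means `new_count - old_count`.

```python
from typing import Any


def added_removed(old: list[str], new: list[str]) -> dict[str, list[str]]:
    """Added/removed diff for a list of strings such as "features"."""
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


def count_diff(old_count: int, new_count: int) -> dict[str, int]:
    """Count diff; "diff" is assumed to be new_count - old_count."""
    return {
        "diff": new_count - old_count,
        "old_count": old_count,
        "new_count": new_count,
    }


def old_new(name: str, old: Any, new: Any) -> dict[str, Any]:
    """Old/new diff, keyed as old_<name> / new_<name>."""
    return {f"old_{name}": old, f"new_{name}": new}


# Illustrative inputs only; "shapes" as a feature name is a placeholder.
features = added_removed(["shapes"], ["shapes", "fares"])
total_notices = count_diff(200, 100)
validator_version = old_new("validator_version", "7.0", "7.1")
```

Deriving `diff` from the two counts in one place keeps the three fields from drifting apart when reports are generated.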

@ianktc ianktc changed the title from "GTFS Validator JSON Schema" to "GTFS Validator Diff JSON Schema" Dec 3, 2025
@ianktc ianktc requested a review from emmambd December 3, 2025 20:40
@ianktc ianktc self-assigned this Dec 3, 2025

ianktc commented Dec 3, 2025

Dropping this here because I'm unsure where else to put it. Just some thoughts on schema design for the database:

New Table ValidationDiff

Columns:
- ID (PK)
- From_ID (FK - ValidationReport.ID)
- To_ID (FK - ValidationReport.ID)
- Diffed_At
- HTML Report
- JSON Report

Some optional columns to consider:
- Diff Unique Error Count
- Diff Unique Warning Count
- Diff Unique Info Count
- Diff Features

Constraints:
- From_ID != null
- To_ID != null
- From_ID != To_ID
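
As a rough illustration of the ValidationDiff table and its three constraints, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The column types, the underscored names `HTML_Report` and `JSON_Report`, the `CURRENT_TIMESTAMP` default, and the one-column `ValidationReport` stand-in are all assumptions made for the example; the real database may differ.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    -- Stand-in for the existing table; only the key is modeled here.
    CREATE TABLE ValidationReport (ID TEXT PRIMARY KEY);

    CREATE TABLE ValidationDiff (
        ID          INTEGER PRIMARY KEY,
        From_ID     TEXT NOT NULL REFERENCES ValidationReport(ID),
        To_ID       TEXT NOT NULL REFERENCES ValidationReport(ID),
        Diffed_At   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        HTML_Report TEXT,  -- assumed: report body or a URL to it
        JSON_Report TEXT,  -- assumed: report body or a URL to it
        CHECK (From_ID != To_ID)
    );
    """
)
conn.executemany(
    "INSERT INTO ValidationReport (ID) VALUES (?)",
    [("report-a",), ("report-b",)],
)


def try_insert(from_id: Optional[str], to_id: Optional[str]) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO ValidationDiff (From_ID, To_ID) VALUES (?, ?)",
            (from_id, to_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

With this setup, `try_insert("report-a", "report-b")` is accepted, while a self-diff (`"report-a"` to `"report-a"`) or a null `From_ID` is rejected by the CHECK and NOT NULL constraints respectively. The same pattern would apply to DatasetDiff with `GtfsDataset.ID` as the referenced key.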

New Table DatasetDiff

Columns:
- ID (PK)
- From_ID (FK - GtfsDataset.ID)
- To_ID (FK - GtfsDataset.ID)
- Diffed_At
- HTML Report
- JSON Report

Some optional columns to consider:
- % Files changed in the diff
- % fields added or removed
- % fields updated

Constraints:
- From_ID != null
- To_ID != null
- From_ID != To_ID

New Table DiffReport

Columns:
- ID (PK)
- ValidationDiff (FK - ValidationDiff.ID)
- DatasetDiff (FK - DatasetDiff.ID)
- Diffed_At
- HTML Report
- JSON Report

Some optional columns to consider:
- any of the optional columns of the DatasetDiff or ValidationDiff tables if they are omitted

Notes:
- The DatasetDiff and ValidationDiff tables can be omitted if the processing is done in two separate components and the results are merged into a single diff report (stored in the DiffReport table)
- mdb stable id is retrievable from ValidationReport.ID <-> ValidationReportGtfsDataset.DatasetID <-> GtfsDataset.ID
- can group validation diff reports based on mdb stable id
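
To make the last two notes concrete, here is a sketch of the join chain and the grouping, again with in-memory `sqlite3`. Several names are assumptions and not taken from the real schema: the `ValidationReportID` column on the link table, the `Feed_Stable_ID` column on GtfsDataset (standing in for wherever the mdb stable id actually lives), and the sample IDs such as `mdb-1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE GtfsDataset (ID TEXT PRIMARY KEY, Feed_Stable_ID TEXT NOT NULL);
    CREATE TABLE ValidationReport (ID TEXT PRIMARY KEY);
    CREATE TABLE ValidationReportGtfsDataset (
        ValidationReportID TEXT NOT NULL REFERENCES ValidationReport(ID),
        DatasetID          TEXT NOT NULL REFERENCES GtfsDataset(ID)
    );
    CREATE TABLE ValidationDiff (
        ID      INTEGER PRIMARY KEY,
        From_ID TEXT NOT NULL REFERENCES ValidationReport(ID),
        To_ID   TEXT NOT NULL REFERENCES ValidationReport(ID)
    );
    """
)

# Placeholder data: three datasets for one feed, two for another.
datasets = [("d1", "mdb-1"), ("d2", "mdb-1"), ("d3", "mdb-1"),
            ("d4", "mdb-2"), ("d5", "mdb-2")]
links = [("r1", "d1"), ("r2", "d2"), ("r3", "d3"), ("r4", "d4"), ("r5", "d5")]
diffs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]

conn.executemany("INSERT INTO GtfsDataset VALUES (?, ?)", datasets)
conn.executemany("INSERT INTO ValidationReport VALUES (?)", [(r,) for r, _ in links])
conn.executemany("INSERT INTO ValidationReportGtfsDataset VALUES (?, ?)", links)
conn.executemany("INSERT INTO ValidationDiff (From_ID, To_ID) VALUES (?, ?)", diffs)


def diffs_per_feed(db: sqlite3.Connection) -> dict[str, int]:
    """Count validation diffs per stable id, resolved through the To_ID report."""
    rows = db.execute(
        """
        SELECT ds.Feed_Stable_ID, COUNT(*)
        FROM ValidationDiff AS vd
        JOIN ValidationReportGtfsDataset AS link
          ON link.ValidationReportID = vd.To_ID
        JOIN GtfsDataset AS ds
          ON ds.ID = link.DatasetID
        GROUP BY ds.Feed_Stable_ID
        ORDER BY ds.Feed_Stable_ID
        """
    ).fetchall()
    return dict(rows)


grouped = diffs_per_feed(conn)
```

Resolving through `To_ID` is an arbitrary choice here; it assumes both reports in a diff belong to the same feed, which the schema as proposed does not enforce.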
