Add JSON Schema and CI validation for data.json by ksallee · Pull Request #20 · ves-tech/on-set

ksallee · 2026-05-12T19:58:54Z

Treats data.json as a spec — adds a JSON Schema, a validator, and a GitHub Actions workflow that regenerates data from doc_export.html and validates on every PR, so the parser and the committed artifact cannot silently drift.

While wiring this up the schema surfaced 43 empty-titled subsections in data.json caused by convert_doc.py appending a subsection for every <h2> element including styling-only ones. This PR includes a one-line skip for empty <h2>s and regenerates data.json; CI is green after that fix. The source Google Doc likely still contains those empty <h2>s — I don't have edit access there.
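A minimal sketch of the kind of skip described above. The actual convert_doc.py is not shown in this thread, so the `SubsectionParser` class, the `parse_subsections` helper, the `{"title": ...}` shape, and the use of the standard-library `html.parser` are all assumptions for illustration only.

```python
from html.parser import HTMLParser


class SubsectionParser(HTMLParser):
    """Hypothetical stand-in for the <h2> handling in convert_doc.py."""

    def __init__(self):
        super().__init__()
        self.subsections = []  # one dict per non-empty <h2>
        self._in_h2 = False
        self._buffer = []      # text chunks seen inside the current <h2>

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "h2" or not self._in_h2:
            return
        self._in_h2 = False
        title = "".join(self._buffer).strip()
        if not title:
            # The skip: a styling-only <h2> produces no subsection.
            return
        self.subsections.append({"title": title})


def parse_subsections(html):
    """Return the list of subsections found in an HTML string."""
    parser = SubsectionParser()
    parser.feed(html)
    parser.close()
    return parser.subsections
```

With this shape, an `<h2>` that contains only whitespace or empty inline tags never reaches the output, which is the behavior the schema check was flagging as missing.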

The schema is strict by default (additionalProperties: false at the top level and on data entries) so future field additions surface as CI failures and prompt a schema bump alongside the data change. Easy to relax to true if you'd rather have the schema document the shape without constraining it.

Files:

data/schema.json — draft-2020-12 schema
data/validate.py — local + CI validator (run via python data/validate.py)
.github/workflows/validate.yml — round-trip + validate on push to main and on every PR
data/convert_doc.py — skip empty <h2> elements
data/data.json — regenerated (43 phantom subsections removed)
requirements.txt — added jsonschema
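The round-trip step in the workflow can be pictured roughly as follows. This is a sketch under assumptions: the workflow file itself is not reproduced in this thread, `has_drifted` is a hypothetical helper, and the `regenerate` callable stands in for whatever convert_doc.py exposes.

```python
import json
from pathlib import Path


def has_drifted(committed_path, regenerate):
    """Return True when freshly regenerated data differs from the committed file.

    `regenerate` is any callable returning the parsed data structure; in CI it
    would wrap the conversion from doc_export.html (an assumption here).
    """
    committed = json.loads(Path(committed_path).read_text(encoding="utf-8"))
    # Compare parsed values rather than raw bytes, so key order and
    # whitespace differences do not count as drift.
    return committed != regenerate()
```

A CI job could exit non-zero when this returns `True`, which is one way to keep the parser and the committed data.json from drifting apart unnoticed.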

Treats data.json as a spec: data/schema.json describes the shape convert_doc.py produces, data/validate.py runs the check locally, and a GitHub Actions workflow regenerates the data from doc_export.html and validates on every PR, so the parser and the committed artifact cannot silently drift.

While wiring this up the schema surfaced 43 empty-titled subsections in data.json. They were coming from convert_doc.py appending a subsection for every <h2> including styling-only ones; this commit includes a one-line skip for empty <h2>s and regenerates data.json. The source Google Doc likely still contains those empty <h2>s.

Schema is strict by default (additionalProperties: false at the top level and on data entries) so future field additions surface as CI failures and prompt a schema bump. Happy to relax if maintainers prefer the schema to document rather than constrain.

richardssam · 2026-05-25T16:00:37Z

This is a great addition. Does the change you made to convert_doc fix all the cases?
What I'm wondering is whether we should be doing the validation as we convert the doc to data.json (using the convert_doc.py script).

ksallee · 2026-05-26T16:08:59Z

> This is a great addition. Does the change you made to convert_doc fix all the cases? What I'm wondering is whether we should be doing the validation as we convert the doc to data.json (using the convert_doc.py script).

Thanks!

The empty <h2> skip cleared all 43 violations the schema caught against the current doc_export.html. It doesn't guarantee every future export is clean, but that's exactly what the schema is for: next time the doc changes shape, CI fails and we either tighten the parser or bump the schema deliberately.

Happy to move the validation call into convert_doc.py so the check runs as part of the conversion; that way anyone regenerating locally gets the same guardrail without remembering a second command. I'd keep validate.py as a standalone entry point too, so it can be pointed at any data.json independent of regeneration. Want me to add that in this PR?
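One possible shape for that split, as a sketch only: the function names `write_validated` and `validate_file`, the `ValidationFailed` exception, and the injected `find_errors` callable are all hypothetical, and nothing here is taken from the actual convert_doc.py or validate.py.

```python
import json
import sys
from pathlib import Path


class ValidationFailed(Exception):
    """Raised when generated data does not satisfy the schema."""


def write_validated(data, out_path, find_errors):
    """Validate `data` first and write it only when no errors are reported.

    `find_errors` is any callable returning a list of error strings, for
    example a thin wrapper around the jsonschema check in validate.py.
    """
    errors = find_errors(data)
    if errors:
        raise ValidationFailed("\n".join(errors))
    Path(out_path).write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


def validate_file(path, find_errors):
    """Standalone entry point: check any existing data.json and return an exit code."""
    errors = find_errors(json.loads(Path(path).read_text(encoding="utf-8")))
    for line in errors:
        print(line, file=sys.stderr)
    return 1 if errors else 0
```

The conversion script would call `write_validated` as its last step, so an invalid result is never written, while `validate_file` stays usable against any data.json without regenerating it.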

ksallee wants to merge 1 commit into ves-tech:main from ksallee:spike/data-json-spec
