artifact-parser

A small, pluggable framework for turning the JSON artifacts that data tools spit out into typed, validated Python objects. Point it at a blob, get back a pydantic model — no manual key-spelunking, no guessing which schema version you're holding.

The framework is deliberately source-agnostic. Each plugin owns one family of artifacts and registers itself with a shared registry. The first one ships in the box: a full dbt-core parser (catalog, manifest, run-results, sources).

Install

uv add artifact-parser     # or: pip install artifact-parser

Quick start

The headline entry point sniffs any supported artifact and routes it to the right plugin — you don't have to know what you're holding:

import json
from pathlib import Path

from artifact_parser import parse

# parse() takes the decoded JSON, so load the file first.
artifact = json.loads(Path("target/manifest.json").read_text())
model = parse(artifact)          # -> a ManifestV12 (or whatever version it is)
print(model.metadata.dbt_schema_version)

When you do know the artifact family, the dbt plugin's typed helpers are more precise (and give better editor autocomplete):

import json
from pathlib import Path

from artifact_parser.dbt import parse_manifest, parse_catalog

manifest = parse_manifest(json.loads(Path("target/manifest.json").read_text()))
catalog = parse_catalog(json.loads(Path("target/catalog.json").read_text()))

Hand it something it doesn't recognise and it tells you so, loudly, instead of returning a half-populated object:

from artifact_parser import parse, UnknownArtifactError

try:
    parse({"metadata": {"dbt_schema_version": "made-up/v99.json"}})
except UnknownArtifactError as exc:
    print(exc)   # No registered parser recognises this artifact. Tried: dbt.

Supported dbt artifacts

Artifact	Versions	Generic parser	Version-pinned parsers
`catalog`	v1	`parse_catalog`	`parse_catalog_v1`
`manifest`	v1–v12	`parse_manifest`	`parse_manifest_v1` … `_v12`
`run-results`	v1–v6	`parse_run_results`	`parse_run_results_v1` … `_v6`
`sources`	v1–v3	`parse_sources`	`parse_sources_v1` … `_v3`
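
The generic parsers in the table have to work out which version-pinned model applies before they can validate anything. A minimal sketch of that sniffing step, assuming the artifact carries a `metadata.dbt_schema_version` URL of the form `https://schemas.getdbt.com/dbt/<artifact>/v<N>.json`. The function name `sniff_version` and the regex are illustrative and are not taken from the package's `utils.py`:

```python
import re
from typing import Any, Optional

# Illustrative pattern for dbt schema URLs such as
# https://schemas.getdbt.com/dbt/manifest/v12.json
_SCHEMA_URL = re.compile(r"/dbt/(?P<artifact>[a-z-]+)/v(?P<version>\d+)\.json$")


def sniff_version(artifact: dict[str, Any]) -> Optional[tuple[str, int]]:
    """Return (artifact_family, version), or None if the blob is not recognisable."""
    metadata = artifact.get("metadata")
    if not isinstance(metadata, dict):
        return None
    url = metadata.get("dbt_schema_version")
    if not isinstance(url, str):
        return None
    match = _SCHEMA_URL.search(url)
    if match is None:
        return None
    return match.group("artifact"), int(match.group("version"))


blob = {"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v6.json"}}
print(sniff_version(blob))  # ('run-results', 6)
```

A lookup keyed on that pair (or on the URL itself, as `version_map.py` is described as doing) would then select the model class.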

Architecture

src/artifact_parser/
├── core/                 # the framework — no knowledge of any specific tool
│   ├── base.py           #   BaseArtifactModel (shared pydantic root)
│   ├── parser.py         #   ArtifactParser (the plugin contract)
│   ├── registry.py       #   ParserRegistry + the shared `registry` instance
│   └── exceptions.py     #   ArtifactParserError + friends
└── dbt/                  # the first plugin: dbt-core artifacts
    ├── plugin.py         #   DbtArtifactParser (implements ArtifactParser)
    ├── utils.py          #   schema-version sniffing
    ├── resources/        #   committed dbt-core JSON schemas (codegen input)
    └── generated/        #   droppable, rebuilt by `codegen dbt`
        ├── parser.py     #     parse_<artifact>[_vN] public API
        ├── version_map.py#     schema-version URL -> model class
        └── models/       #     typed pydantic models, one module per version

The generated code is walled off in generated/. You can rm -rf that whole directory and rebuild it with codegen dbt (the package still imports while it's gone — the dbt plugin just sits out until you regenerate).
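
A package that keeps importing while part of its code is missing usually relies on a guarded import. The sketch below shows that general pattern only; it is an assumption about how such behaviour can be achieved, not the package's actual code, and `load_optional` and the module name are hypothetical:

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module if it exists; return None instead of raising if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


# Hypothetical module name standing in for a deleted generated/ package.
generated = load_optional("hypothetical_generated_package")
plugin_enabled = generated is not None
print(plugin_enabled)  # False
```

With a guard like this, a plugin can skip registration when its generated code is absent rather than breaking the top-level import.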

The flow: a plugin answers "is this mine?" (can_parse) and "make it typed" (parse). The registry tries plugins in registration order and returns the first match. dbt registers itself on import, so parse(...) works out of the box.
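
The first-match dispatch described above can be sketched in a few lines. This is a self-contained illustration, not the package's `ParserRegistry`: the class names `Plugin`, `Registry`, `NoMatchError` and `EchoPlugin` are stand-ins invented for the example.

```python
from typing import Any, Protocol


class Plugin(Protocol):
    """Stand-in for the plugin contract: a name plus two methods."""

    name: str

    def can_parse(self, artifact: dict[str, Any]) -> bool: ...
    def parse(self, artifact: dict[str, Any]) -> Any: ...


class NoMatchError(Exception):
    """Stand-in for the package's UnknownArtifactError."""


class Registry:
    def __init__(self) -> None:
        self._plugins: list[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)  # registration order is dispatch order

    def parse(self, artifact: dict[str, Any]) -> Any:
        for plugin in self._plugins:
            if plugin.can_parse(artifact):
                return plugin.parse(artifact)  # first match wins
        tried = ", ".join(p.name for p in self._plugins)
        raise NoMatchError(f"No registered parser recognises this artifact. Tried: {tried}.")


class EchoPlugin:
    """Toy plugin that claims any artifact containing a given top-level key."""

    def __init__(self, name: str, key: str) -> None:
        self.name = name
        self._key = key

    def can_parse(self, artifact: dict[str, Any]) -> bool:
        return self._key in artifact

    def parse(self, artifact: dict[str, Any]) -> Any:
        return (self.name, artifact[self._key])


registry = Registry()
registry.register(EchoPlugin("first", "a"))
registry.register(EchoPlugin("second", "a"))
print(registry.parse({"a": 1}))  # ('first', 1): both match, "first" registered earlier
```

Because the first match wins, `can_parse` checks are best kept narrow so that one plugin does not shadow another.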

Adding a new parser

The whole point of the core/ framework is that the second parser is cheap. By hand:

1. Create src/artifact_parser/<tool>/.
2. Define your models on BaseArtifactModel.
3. Implement ArtifactParser (name, can_parse, parse) in plugin.py.
4. Register it in your plugin package's __init__.py: registry.register(MyParser()).
5. Import your plugin from the top-level artifact_parser/__init__.py so that registration runs on import.

That's it — parse() now routes matching artifacts to your plugin.
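
As a rough picture of steps 2 and 3, here is a plugin for a made-up tool. Everything in it is hypothetical: `ArtifactParserStandIn` imitates the contract (name, can_parse, parse) with a plain abstract base class, `AcmeReport` uses a dataclass where the real package would subclass the pydantic-based BaseArtifactModel, and the `acme` artifact shape is invented.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class ArtifactParserStandIn(ABC):
    """Stand-in for the plugin contract: name, can_parse, parse."""

    name: str

    @abstractmethod
    def can_parse(self, artifact: dict[str, Any]) -> bool: ...

    @abstractmethod
    def parse(self, artifact: dict[str, Any]) -> Any: ...


@dataclass(frozen=True)
class AcmeReport:
    """Hypothetical typed model for a hypothetical tool's artifact."""

    tool_version: str
    row_count: int


class AcmeParser(ArtifactParserStandIn):
    name = "acme"

    def can_parse(self, artifact: dict[str, Any]) -> bool:
        # Cheap structural check only; real validation happens in parse().
        return artifact.get("tool") == "acme"

    def parse(self, artifact: dict[str, Any]) -> AcmeReport:
        return AcmeReport(
            tool_version=str(artifact["tool_version"]),
            row_count=int(artifact["row_count"]),
        )


parser = AcmeParser()
blob = {"tool": "acme", "tool_version": "1.4", "row_count": "250"}
if parser.can_parse(blob):
    print(parser.parse(blob))  # AcmeReport(tool_version='1.4', row_count=250)
```

In the real package the remaining steps would be the `registry.register(...)` call and the top-level import, which this sketch leaves out.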

Development

This project uses uv and Task. Common targets:

Goal	Task
Sync the environment	`task install`
Format + autofix	`task format`
Lint (format-check + ruff)	`task lint`
Run tests at 100% coverage	`task test`

task --list shows everything. The test suite enforces 100% coverage of the framework and dbt dispatch code; the generated dbt models are excluded because they are schema, not logic. Beyond the synthetic fixtures, tests/data/ holds real artifacts from a live dbt build. They round-trip through the public parse() in tests/artifact_parser/dbt/test_roundtrip.py, and these are the only tests that exercise populated nodes end to end.

One non-obvious rule the generator enforces: the generated models are relaxed to pydantic extra="ignore" (not the extra="forbid" dbt's schemas imply), because real artifacts carry fields the published schema omits. A strict model would reject a perfectly good manifest.json. See CLAUDE.md for the why.
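
To make the difference concrete, the sketch below imitates the two behaviours with standard-library dataclasses. It is a stand-in for pydantic's `extra` setting, not the generated models themselves; the `build` helper, the `Metadata` class and the `undocumented_field` key are invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Metadata:
    dbt_schema_version: str


def build(cls, data: dict[str, Any], *, extra: str):
    """Build `cls` from `data`; `extra` is "ignore" or "forbid" for unknown keys."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(data) - known)
    if unknown and extra == "forbid":
        raise ValueError(f"unexpected fields: {unknown}")
    return cls(**{k: v for k, v in data.items() if k in known})


raw = {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json",
    "undocumented_field": True,  # present in the file, absent from the schema
}

print(build(Metadata, raw, extra="ignore"))  # unknown key is silently dropped
try:
    build(Metadata, raw, extra="forbid")
except ValueError as exc:
    print("rejected:", exc)  # rejected: unexpected fields: ['undocumented_field']
```

The lenient mode trades strictness for robustness: an artifact with undocumented fields still parses, at the cost of those fields not appearing on the model.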

CI

GitHub Actions back the same gates:

Workflow	What it does
`ci.yml`	Lint + 100%-coverage tests on Python 3.10–3.13, plus a codegen-in-sync job that fails if the committed `generated/` drifts from a fresh regen.
`schema-watch.yml`	Weekly (and on demand): probes dbt's published schemas, regenerates, and opens a PR if a new version appeared.
`release.yml`	Build + coverage gate, then PyPI Trusted Publishing on a published Release (or TestPyPI via manual dispatch).

Action versions and Python deps are kept current by Dependabot.

Agentic setup

This repo is wired for Claude Code: a project CLAUDE.md, a parser-author subagent that owns src/, slash commands (/test, /codegen), secret-blocking and post-edit lint hooks, and the context7 MCP for pulling fresh library docs. See CLAUDE.md for the full tour. It will not write your code for you, but it tries hard to keep you from shipping a failing coverage gate.

Support

If this saved you from hand-spelunking a manifest.json, consider fuelling the next release:

☕ Buy me a coffee
