Parent PRD
#33
What to build
Introduce the ParsedArtifact model as the single parse output per Document (see PRD §Implementation Decisions — ParsedArtifact fields). This model stores the raw Docling output, postprocessed text, a metadata snapshot for audit, and the parser config used. No pipeline rewiring; models, migrations, and minimal admin only.
Acceptance criteria
- ParsedArtifact model exists with:
  - OneToOneField → Document (enforced at DB level)
  - docling_output JSONField (raw Docling JSON)
  - postprocessed_text TextField
  - metadata_extracted JSONField (snapshot of parser-extracted metadata, used for audit and reconciliation)
  - parser_config JSONField (Docling parameters and model versions)
- Schema enforces one ParsedArtifact per Document.
- ParsedArtifact is registered in Django admin (minimal).
- Creating a second ParsedArtifact for the same Document raises an integrity error, and all four fields are writable and retrievable.
Blocked by
- Document model must exist
User stories addressed
Reference by number from the parent PRD:
- User story 12 (one ParsedArtifact per Document, enforced in schema)
- User story 13 (ParsedArtifact stores docling JSON, postprocessed text, metadata_extracted, parser_config)
- User story 14 (metadata reconciliation audit trail via metadata_extracted)
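At the database level, the one-per-Document rule comes down to a unique constraint on the foreign-key column. The sketch below uses Python's built-in sqlite3 module rather than the Django ORM to show the behaviour the acceptance criteria describe: a second row for the same document is rejected, and the four data fields round-trip. Table names, column types, and sample values are hypothetical and do not reflect the actual migration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE parsed_artifact (
        id INTEGER PRIMARY KEY,
        -- UNIQUE is what makes the relation one-to-one at the DB level.
        document_id INTEGER NOT NULL UNIQUE REFERENCES document(id),
        docling_output TEXT NOT NULL,      -- JSON stored as text
        postprocessed_text TEXT NOT NULL,
        metadata_extracted TEXT NOT NULL,  -- JSON stored as text
        parser_config TEXT NOT NULL        -- JSON stored as text
    );
    """
)


def insert_artifact(document_id, docling_output, text, metadata, config):
    """Insert one parsed_artifact row, serialising the JSON fields."""
    conn.execute(
        "INSERT INTO parsed_artifact "
        "(document_id, docling_output, postprocessed_text, "
        "metadata_extracted, parser_config) VALUES (?, ?, ?, ?, ?)",
        (document_id, json.dumps(docling_output), text,
         json.dumps(metadata), json.dumps(config)),
    )


doc_id = conn.execute(
    "INSERT INTO document (title) VALUES (?)", ("example.pdf",)
).lastrowid
insert_artifact(doc_id, {"pages": 1}, "Hello", {"title": "Example"}, {"ocr": False})

# A second artifact for the same document violates the unique constraint.
try:
    insert_artifact(doc_id, {}, "", {}, {})
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Read the single stored row back and decode the JSON fields.
row = conn.execute(
    "SELECT docling_output, postprocessed_text, metadata_extracted, parser_config "
    "FROM parsed_artifact WHERE document_id = ?",
    (doc_id,),
).fetchone()
stored = {
    "docling_output": json.loads(row[0]),
    "postprocessed_text": row[1],
    "metadata_extracted": json.loads(row[2]),
    "parser_config": json.loads(row[3]),
}
artifact_count = conn.execute("SELECT COUNT(*) FROM parsed_artifact").fetchone()[0]
```

A Django test for the real model would check the same two things through the ORM, expecting django.db.IntegrityError on the second create.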