Summary
Add semantic role classification to DAG nodes in the verify module: Source, Input, Processing, Output, Claim.
Current State
The file_hashes table tracks files but doesn't classify them by pipeline role. The three states (VERIFIED, FAILED, UNKNOWN) describe integrity, not function.
Proposed Changes
- Add role column to file_hashes schema: source | input | process | output | claim
- Auto-infer role from file type/location (.py scripts → source/process, .csv/.npy → input/output)
- Surface role in Mermaid DAG visualization (shape/color per class)
- Enable role-based severity analysis in BPV output
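
As a minimal sketch of the auto-inference step, the function below guesses a role from the file extension and, as a tie-breaker, from the roles of the node's DAG parents. The function name `infer_role`, the `parent_roles` argument, and the tie-break rules are assumptions for illustration; the issue only states that .py maps to source/process and .csv/.npy to input/output.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Extensions named in the proposal; anything else is left unclassified.
SCRIPT_EXTS = {".py"}
DATA_EXTS = {".csv", ".npy"}


def infer_role(path: str, parent_roles: Set[str]) -> Optional[str]:
    """Guess a pipeline role for one file (hypothetical helper).

    parent_roles holds the roles of the file's direct parents in the DAG.
    Assumed tie-breaks: a script with no parents is a source, otherwise a
    process; a data file written by a process is an output, otherwise an input.
    """
    ext = PurePosixPath(path).suffix.lower()
    if ext in SCRIPT_EXTS:
        return "process" if parent_roles else "source"
    if ext in DATA_EXTS:
        return "output" if "process" in parent_roles else "input"
    return None  # unknown file type: leave the role unset
```

A caller would walk the DAG in topological order so that parent roles are known before each child is classified. Files the heuristic cannot place, such as config.yaml, would need an explicit role.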
Class Definitions
| Class | Description | Examples |
|---|---|---|
| Source | Data acquisition scripts | 01_source.py |
| Input | Raw data, configuration | source.csv, config.yaml |
| Processing | Transform/analysis scripts | 02_preprocess.py |
| Output | Intermediate/final data | clean_A.csv, results.csv |
| Claim | Paper assertions | p=0.003 (L.42), Figure 1 |
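
One way the classes above could surface in the Mermaid DAG is a fixed node shape and style class per role. The sketch below is illustrative only: the shape and color choices, the `role_` class-name prefix, and the `to_mermaid` helper are assumptions, not part of the existing verify module.

```python
# Hypothetical shape per role: (opening, closing) Mermaid node delimiters.
ROLE_SHAPES = {
    "source": ('(["', '"])'),   # stadium
    "input": ('[("', '")]'),    # cylinder
    "process": ('["', '"]'),    # rectangle
    "output": ('[/"', '"/]'),   # parallelogram
    "claim": ('{{"', '"}}'),    # hexagon
}

# Hypothetical fill color per role.
ROLE_COLORS = {
    "source": "#cde",
    "input": "#dec",
    "process": "#eee",
    "output": "#fec",
    "claim": "#fcc",
}


def to_mermaid(nodes, edges):
    """Render a role-annotated DAG as Mermaid flowchart text.

    nodes maps a node label to its role; edges is a list of
    (parent_label, child_label) pairs.
    """
    lines = ["flowchart LR"]
    for role, color in ROLE_COLORS.items():
        lines.append(f"    classDef role_{role} fill:{color};")
    # Labels may contain spaces or punctuation, so use generated ids.
    ids = {label: f"n{i}" for i, label in enumerate(nodes)}
    for label, role in nodes.items():
        opening, closing = ROLE_SHAPES[role]
        lines.append(f"    {ids[label]}{opening}{label}{closing}:::role_{role}")
    for parent, child in edges:
        lines.append(f"    {ids[parent]} --> {ids[child]}")
    return "\n".join(lines)
```

For example, `to_mermaid({"01_source.py": "source", "source.csv": "input"}, [("01_source.py", "source.csv")])` yields a two-node flowchart with a stadium-shaped source node pointing at a cylinder-shaped input node.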
Motivation
- Enables severity analysis: Source-level tampering → full invalidation; Output-level → specific Claims
- Framework-agnostic vocabulary readers can map to their own pipelines
- Aligns verify module with figrecipe Schematic's node classes (figrecipe#95)
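
The severity idea in the first bullet can be sketched as a downstream traversal: starting from a tampered node, collect every Claim reachable through the DAG. The `affected_claims` helper and the edge-list representation are assumptions for illustration, not the module's actual data model.

```python
from collections import deque


def affected_claims(edges, roles, tampered):
    """Return the set of claim nodes downstream of a tampered node.

    edges is a list of (parent, child) pairs, roles maps node -> role,
    and tampered is the node whose hash check failed.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Breadth-first search over everything reachable from the tampered node.
    seen = {tampered}
    queue = deque([tampered])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return {node for node in seen if roles.get(node) == "claim"}
```

With this shape, tampering at a Source node returns every Claim in the pipeline, while tampering at an Output returns only the Claims that depend on that file.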