Skip to content

[FEATURE]: Add Diagnostic Verification Reporting (HTML, Markdown, and detailed failure diagnostics) #87

@abhranshu

Description

@abhranshu

Feature and its Use Cases

Currently, when a user runs verify_dataset.py, the verification report is limited to:

An ASCII table printed to the terminal showing PASS/FAIL per check
An optional raw JSON file via the --json flag
This works for quick checks, but it falls short in two critical areas:

Problem 1: No shareable, readable reports
The ASCII table only makes sense in a terminal. If a researcher wants to share verification results with a team, attach them to a paper, or post them in a GitHub issue/PR — there's no way to do it. The JSON output is machine-readable but not human-friendly.

Problem 2: No diagnostic details when verification fails
When a check fails (e.g., processed_sha256 shows FAIL), the report only shows the expected vs actual hash. It gives zero guidance on why it failed. The user is left guessing:

Did the environment change? Which packages differ?
Did the processed output change? Where exactly does it diverge?
Is it a Python version issue? A platform issue?

Additional Context

No response

Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions