
docs(validation): harden docs validation pipeline after Docusaurus migration (#1174) #1226

@planetf1


Context

PR #1174 (Docusaurus migration) is a good opportunity to audit the validation pipeline and close several gaps that exist across its tiers: editor, pre-commit, CI scripts, and the Docusaurus build. This issue tracks the follow-on hardening work once #1174 merges.


1. Fix validate_anchor_collisions algorithm (correctness bug)

validate_anchor_collisions in tooling/docs-autogen/validate.py currently imports mintlify_anchor from test_mintlify_anchors.py to generate anchor slugs for collision detection. After the Docusaurus migration this is the wrong algorithm: Docusaurus uses a different slug function, so the check will report false positives and miss real collisions.

Since onBrokenAnchors: 'throw' in docusaurus.config.ts already catches real collisions at build time with the correct algorithm, the validate.py check is now both redundant and incorrect. Options:

  • Update the slug function to match Docusaurus's algorithm, or
  • Retire the check entirely in favour of the build gate
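For the first option, a minimal sketch of a replacement is shown here. It assumes Docusaurus heading IDs follow `github-slugger`-style rules (lowercase, drop punctuation, turn spaces into hyphens); the character filter only approximates `github-slugger`'s Unicode handling, explicit `{#custom-id}` headings are not handled, and `docusaurus_slug` / `find_anchor_collisions` are hypothetical names.

```python
import re
from collections import defaultdict

# Approximation of github-slugger (assumption): keep word characters
# (letters, digits, "_"), hyphens and spaces; drop everything else.
_DISALLOWED = re.compile(r"[^\w\- ]")


def docusaurus_slug(heading: str) -> str:
    """Lowercase the heading, strip punctuation, and map each space to "-"."""
    return _DISALLOWED.sub("", heading.lower()).replace(" ", "-")


def find_anchor_collisions(headings: list[str]) -> dict[str, list[str]]:
    """Group one page's headings by slug; any slug shared by 2+ headings collides."""
    by_slug: dict[str, list[str]] = defaultdict(list)
    for heading in headings:
        by_slug[docusaurus_slug(heading)].append(heading)
    return {slug: group for slug, group in by_slug.items() if len(group) > 1}


if __name__ == "__main__":
    print(find_anchor_collisions(["Foo Bar", "foo-bar", "Usage"]))
    # → {'foo-bar': ['Foo Bar', 'foo-bar']}
```

If the second option is chosen instead, none of this is needed and the build gate becomes the single source of truth.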

2. Extend RST docstring detection and promote to a hard fail

validate_rst_docstrings() currently only detects RST double-backtick notation (``Symbol``) and is warning-only (explicitly excluded from overall_passed). It misses the :param:, :type:, :returns:, :rtype:, .. note::, and .. deprecated:: constructs that are the more common RST style patterns in practice.

Two changes needed:

  • Extend the detection patterns to cover the full set of RST constructs
  • Promote the check to a hard fail by including it in overall_passed
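One way to cover the full set of constructs is a table of patterns, as in this minimal sketch. `RST_PATTERNS`, `find_rst_markup`, and `rst_docstrings_pass` are hypothetical names, not the current contents of validate.py; anchoring the field-list patterns to the start of a line is one possible way to reduce false positives from prose that merely mentions `:param`.

```python
import re

# Hypothetical pattern table for an extended validate_rst_docstrings().
RST_PATTERNS = {
    "``literal``": re.compile(r"``[^`\n]+``"),
    ":param:": re.compile(r"^\s*:param\b", re.MULTILINE),
    ":type:": re.compile(r"^\s*:type\b", re.MULTILINE),
    ":returns:": re.compile(r"^\s*:returns?:", re.MULTILINE),  # also :return:
    ":rtype:": re.compile(r"^\s*:rtype:", re.MULTILINE),
    ".. note::": re.compile(r"^\s*\.\. note::", re.MULTILINE),
    ".. deprecated::": re.compile(r"^\s*\.\. deprecated::", re.MULTILINE),
}


def find_rst_markup(docstring: str) -> list[str]:
    """Return the names of every RST construct found in one docstring."""
    return [name for name, pattern in RST_PATTERNS.items() if pattern.search(docstring)]


def rst_docstrings_pass(docstrings: list[str]) -> bool:
    """True when no docstring contains RST markup; this result would feed overall_passed."""
    return not any(find_rst_markup(doc) for doc in docstrings)
```

Returning a boolean rather than only logging warnings is what lets the result be folded into overall_passed as a hard fail.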

Additionally, add a pre-commit grep hook on mellea/**/*.py so contributors get fast local feedback before CI:

- id: no-rst-docstrings
  name: "No RST-style docstrings"
  entry: bash -c 'grep -rn ":param \|:type \|:returns:\|:rtype:\|^\s*\.\. note::\|^\s*\.\. deprecated::" "$@" && echo "RST docstring syntax found — use Google style (Args:/Returns:)" && exit 1 || exit 0'
  language: system
  types: [python]
  files: ^mellea/

Note: ruff's D rules with convention = "google" enforce section structure (presence/absence of Args:, Returns:) but do not and cannot detect RST content markup — so this genuinely needs a separate check at every tier.
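The grep one-liner relies on GNU grep extensions (`\s` and `\|` in basic regular expressions), which may behave differently on other grep implementations. If that becomes a problem, the same check could be a small Python script that pre-commit invokes with the matching filenames as arguments, for example via an `entry` such as `python tooling/check_rst_docstrings.py` (a hypothetical path). A minimal sketch:

```python
import re
import sys

# Mirrors the grep hook's patterns; matched per line so each hit can be reported.
RST_LINE = re.compile(r":param |:type |:returns:|:rtype:|^\s*\.\. (?:note|deprecated)::")


def find_hits(path: str, lines) -> list[str]:
    """Return "path:lineno: text" for each line containing RST docstring markup."""
    return [
        f"{path}:{lineno}: {line.rstrip()}"
        for lineno, line in enumerate(lines, start=1)
        if RST_LINE.search(line)
    ]


def main(paths: list[str]) -> int:
    """pre-commit appends the matching filenames; a non-zero return fails the hook."""
    hits: list[str] = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            hits.extend(find_hits(path, handle))
    for hit in hits:
        print(hit)
    if hits:
        print("RST docstring syntax found; use Google style (Args:/Returns:)")
    return 1 if hits else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```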


3. Harden CI soft gates in docs-publish.yml

All validate.py steps except the docstring quality gate use continue-on-error: true, meaning failures appear in the job summary but do not block deploy. With the Mintlify orphan-branch deployment complexity removed by #1174, there is less justification for soft gates.

Steps to harden (remove continue-on-error: true):

  • validate_api — source link staleness, import path drift, stale files, and examples catalogue errors all currently reach production silently
  • audit_coverage — below-threshold API coverage currently does not block deploy

markdownlint could also be hardened. This is lower priority because markdown style issues do not break the published site: the Docusaurus build still succeeds with them.


4. Extend markdownlint to .mdx files

The markdownlint gate currently covers only docs/docs/**/*.md. Authored .mdx pages and the generated API docs (all .mdx) receive no markdown linting at any tier below the Docusaurus build compile step.

Pre-commit hook should include .mdx:

files: ^docs/docs/.*\.(md|mdx)$

CI should similarly update the glob:

npx markdownlint-cli "docs/docs/**/*.{md,mdx}" --config docs/docs/.markdownlint.json

Note: some markdownlint rules may need adjusting for .mdx (JSX syntax in inline components). The existing .markdownlint.json already disables MD033 (inline HTML), which covers most JSX cases.
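pre-commit matches the `files` pattern as a Python regular expression (via `re.search`) against each repository-relative path, so the updated pattern can be sanity-checked directly in Python. A quick sketch; the sample paths are invented for illustration:

```python
import re

# The updated pre-commit `files` pattern proposed above.
FILES = re.compile(r"^docs/docs/.*\.(md|mdx)$")

SAMPLE_PATHS = [
    "docs/docs/intro.md",             # authored Markdown
    "docs/docs/guides/setup.mdx",     # authored MDX
    "docs/docs/api/mellea/core.mdx",  # generated API page (hypothetical path)
    "docs/blog/2024-post.md",         # outside docs/docs/
    "docs/docs/notes.mdx.bak",        # does not end in .md or .mdx
]

linted = [path for path in SAMPLE_PATHS if FILES.search(path)]
for path in linted:
    print(path)
```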


6. External link rot detection

Neither onBrokenLinks: 'throw' nor any validate.py check covers external URLs. Dead external links are invisible until users report them.

Add a scheduled workflow (weekly) using a link checker (e.g. lychee) against the built site or the source .md/.mdx files, with results posted as a summary or issue. This is lower priority than items 1–4 and worth a separate implementation discussion.
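lychee is the suggested tool; purely to illustrate what such a check does, here is a minimal Python sketch that extracts external URLs from Markdown/MDX source and probes each with an HTTP HEAD request. The URL regex is deliberately simple and will miss edge cases (e.g. URLs containing parentheses), and some servers reject HEAD requests outright, which a real checker has to handle (for example by retrying with GET).

```python
import re
import urllib.error
import urllib.request

# Simple http(s) URL pattern for .md/.mdx source; stops at whitespace,
# closing brackets, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s)\]\"'<>]+")


def extract_external_links(text: str) -> list[str]:
    """Return unique external URLs in first-seen order, minus trailing punctuation."""
    seen: dict[str, None] = {}
    for match in URL_RE.finditer(text):
        seen.setdefault(match.group().rstrip(".,;:"), None)
    return list(seen)


def is_dead(url: str, timeout: float = 10.0) -> bool:
    """Treat HTTP status >= 400 and network errors (including timeouts) as dead."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except urllib.error.HTTPError as err:
        return err.code >= 400
    except OSError:  # URLError, connection failures, and timeouts
        return True
```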


Sequencing

Items 1–4 are all low-effort and could ship as a single follow-on PR after #1174 merges. Item 6 warrants its own scoped issue once the others are addressed.

Metadata


Assignees

Labels

documentation (Improvements or additions to documentation)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
