
feat: docsfy - AI-powered documentation generator#2

Open
myakove wants to merge 21 commits into main from feature/docsfy-implementation-plan

Conversation

myakove (Contributor) commented Mar 4, 2026

Summary

Implements docsfy, an AI-powered documentation generator that creates polished static HTML documentation from GitHub repositories. It uses Claude, Gemini, or Cursor CLI as AI backends to analyze codebases and produce comprehensive, browseable documentation served via a FastAPI web application.

Key Features

  • Multi-provider AI support - Claude, Gemini, and Cursor CLI backends with pluggable architecture
  • Automated doc generation - Analyzes repository structure, plans documentation, and generates pages concurrently
  • Static HTML rendering - Jinja2-based templates with dark/light theme toggle and full-text search
  • llms.txt generation - Produces LLM-optimized documentation summaries
  • REST API - FastAPI endpoints to generate, serve, browse, and download docs
  • Container-ready - Dockerfile and docker-compose for one-command deployment
  • Local repo support - Generate docs from local repositories in addition to remote GitHub URLs
  • SQLite storage - Persistent project metadata and status tracking

Architecture Overview

docsfy/
  config.py          - Pydantic-settings configuration
  models.py          - Request/response Pydantic models
  ai_cli.py          - AI CLI provider abstraction (Claude, Gemini, Cursor)
  json_parser.py     - Multi-strategy JSON response parser
  repository.py      - Git repository cloning (shallow clone support)
  storage.py         - SQLite storage layer
  generator.py       - Doc planner + concurrent page generator
  renderer.py        - HTML renderer with Jinja2 templates
  app.py             - FastAPI application with all API endpoints
  prompts/           - AI prompt templates (planner, page generation)
  templates/         - HTML/CSS/JS templates with theming and search

Flow: API request --> clone repo --> AI plans doc structure --> concurrent page generation --> render HTML --> serve/download
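
The flow above can be sketched as a small async pipeline. This is an illustrative stub, not docsfy's actual code: the function names, the stubbed clone/plan steps, and the canned AI response are all assumptions; only the step order (clone, plan, concurrent page generation, render) comes from the description.

```python
import asyncio

async def fake_ai_call(prompt: str) -> str:
    """Stand-in for an AI CLI call; returns canned markdown."""
    await asyncio.sleep(0)
    return f"# Page for: {prompt}"

async def generate_docs(repo_url: str) -> dict[str, str]:
    # 1. clone repo (stubbed - real code would shallow-clone here)
    repo_path = f"/tmp/{repo_url.rsplit('/', 1)[-1]}"
    # 2. AI plans the doc structure (stubbed plan)
    plan = {"pages": ["overview", "api", "deployment"]}
    # 3. concurrent page generation - one AI call per planned page
    contents = await asyncio.gather(
        *(fake_ai_call(slug) for slug in plan["pages"])
    )
    # 4. the render step would write HTML; here we just map slug -> markdown
    return dict(zip(plan["pages"], contents))

pages = asyncio.run(generate_docs("https://github.com/org/repo"))
```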

Commits

Documentation & Planning

  • docs: add docsfy implementation plan with 13 TDD tasks

Project Setup

  • feat: project scaffolding with build config, linters, and container setup

Core Modules

  • feat: add configuration module with pydantic-settings
  • feat: add pydantic models for requests, doc plans, and project status
  • feat: add SQLite storage layer for project metadata
  • feat: add AI CLI provider module with claude, gemini, and cursor support
  • feat: add multi-strategy JSON response parser for AI CLI output
  • feat: add repository cloning with shallow clone support
  • feat: add AI prompt templates for planner and page generation
  • feat: add documentation generator with planner and concurrent page generation
  • feat: add HTML renderer with Jinja2 templates, dark/light theme, and search
  • feat: add FastAPI application with all API endpoints
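
A "multi-strategy JSON response parser" for AI CLI output typically tries progressively looser extraction strategies. The sketch below is a hypothetical illustration of that idea - the strategy order and function name are assumptions, not docsfy's actual implementation:

```python
import json
import re

def parse_json_response(text: str):
    """Extract a JSON object from possibly noisy AI output."""
    # Strategy 1: the whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON inside a fenced ```json code block.
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 3: substring from the first '{' to the last '}'.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start : end + 1])
        except json.JSONDecodeError:
            pass
    raise ValueError("no JSON object found in AI response")
```

Each fallback tolerates more surrounding prose than the last, which matters because CLI-based models often wrap JSON in explanations or code fences.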

Tests

  • test: add integration test for full generate-serve-download flow

Fixes & Improvements

  • fix: resolve all pre-commit hook issues
  • fix: remove gcloud and cursor volume mounts from docker-compose
  • fix: add uv.lock and copy it in Dockerfile for frozen sync
  • fix: optimize Dockerfile layer caching - CLI installs cached independently of code changes
  • feat: add llms.txt generation, on-this-page TOC, strip AI preamble, fix HTML rendering
  • feat: add UI improvements, ai-cli-runner migration, local repo support, llms.txt fixes, minimal README
  • chore: remove all Mintlify references
  • fix: address all code review findings - security, bugs, error handling

Stats

  • 45 files changed, 6,778 insertions, 1 deletion

Test Plan

  • Verify docker compose up builds and starts the application
  • Test doc generation via POST /api/generate with a public GitHub repo
  • Verify generated docs are browseable at /docs/{project}/
  • Test download endpoint returns a valid tar.gz archive
  • Verify dark/light theme toggle works correctly
  • Test search functionality across generated pages
  • Verify llms.txt is generated and accessible
  • Run pytest / tox for unit and integration tests
  • Test with each AI provider (Claude, Gemini, Cursor) if credentials available
  • Verify local repository support works with mounted volumes

Summary by CodeRabbit

New Features

  • AI-powered documentation generator – Generates production-quality static HTML documentation sites from repositories using multiple AI providers (Claude, Gemini, Cursor).
  • Web API – HTTP endpoints for generating docs, managing projects, downloading generated documentation, and serving live documentation.
  • Repository support – Handles both local and remote Git repositories with automatic cloning and caching.
  • Interactive documentation – Generated docs include search functionality, theme toggle, syntax highlighting, and auto-generated navigation.
  • Docker deployment – Containerized setup with docker-compose for easy deployment.

Chores

  • Project scaffolding, testing infrastructure, pre-commit hooks, and linting configuration.

myakove added 21 commits March 4, 2026 11:09
myakove-bot (Collaborator) commented

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: Disabled for this repository
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: All label categories are enabled (default configuration)

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /automerge - Enable automatic merging when all requirements are met (maintainers and approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. Status Checks: All required status checks must pass
  3. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  4. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

Reviewers:

Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
  • automerge

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

This pull request introduces docsfy, a complete new AI-powered documentation generator project. It establishes infrastructure (Docker, configuration tools), a FastAPI backend with database storage, AI provider integration, repository handling, markdown-to-HTML rendering with static assets, Jinja templating, and comprehensive test coverage across all modules.

Changes

Cohort / File(s) Summary
Configuration & Development Setup
.env.example, .flake8, .gitleaks.toml, .pre-commit-config.yaml, pyproject.toml, tox.toml
Environment defaults for AI providers, linting rules (flake8, mypy), secret scanning, pre-commit hooks, project metadata, dependencies, build system, and testing configuration.
Containerization & Deployment
Dockerfile, docker-compose.yaml
Multi-stage Docker build with Python 3.12, OpenShift compatibility, non-root user setup, CLI tool installation, and docker-compose service with health checks and volume mounting.
Documentation
README.md, docs/plans/2026-03-04-docsfy-*.md
Project overview and quick-start guide; implementation plan with 13 sequential tasks detailing module development and expected control flow.
Core Application Logic
src/docsfy/main.py, src/docsfy/generator.py, src/docsfy/storage.py
FastAPI endpoints for health, status, generation, project management, and downloads; asynchronous documentation generation with AI-driven planning and parallel page rendering; SQLite-backed project storage with metadata tracking.
Supporting Backend Modules
src/docsfy/config.py, src/docsfy/models.py, src/docsfy/ai_client.py, src/docsfy/json_parser.py, src/docsfy/prompts.py, src/docsfy/repository.py
Settings management with environment-based configuration; Pydantic data models with validation for requests and project metadata; re-exported AI provider interfaces; JSON extraction and parsing utilities; prompt templates for planning and page generation; Git repository cloning and info retrieval.
HTML Rendering & Templating
src/docsfy/renderer.py, src/docsfy/templates/index.html, src/docsfy/templates/page.html
Site generation with Markdown-to-HTML conversion, navigation helpers, search/LLMS indexes, and sitemap generation; Jinja templates for index and documentation pages with sidebar, search, theme toggle, and TOC support.
Frontend Static Assets
src/docsfy/static/style.css, src/docsfy/static/callouts.js, src/docsfy/static/codelabels.js, src/docsfy/static/copy.js, src/docsfy/static/github.js, src/docsfy/static/scrollspy.js, src/docsfy/static/search.js, src/docsfy/static/theme.js
Comprehensive CSS with light/dark theming; JavaScript modules for callout styling, code block labels, copy-to-clipboard, GitHub star fetching, scroll spy TOC tracking, search modal UI, and theme persistence.
Test Suite
tests/test_config.py, tests/test_models.py, tests/test_ai_client.py, tests/test_json_parser.py, tests/test_prompts.py, tests/test_repository.py, tests/test_storage.py, tests/test_generator.py, tests/test_renderer.py, tests/test_main.py, tests/test_integration.py
Unit and integration tests covering configuration, request models, AI provider exports, JSON parsing, prompt generation, repository operations, database CRUD, page generation workflows, HTML rendering, API endpoints, and end-to-end generation-to-download flows.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FastAPI
    participant Repository
    participant Generator
    participant AI_CLI
    participant Renderer
    participant Storage
    participant Database

    Client->>FastAPI: POST /api/generate<br/>(repo_url or repo_path)
    FastAPI->>Storage: save_project(name, status="generating")
    Storage->>Database: INSERT project
    Database-->>Storage: ✓
    Storage-->>FastAPI: ✓
    
    FastAPI->>Repository: clone_repo or get_local_repo_info
    Repository-->>FastAPI: (repo_path, commit_sha)
    
    FastAPI->>Generator: run_planner(repo_path, ...)
    Generator->>AI_CLI: call_ai_cli(planner_prompt)
    AI_CLI-->>Generator: JSON plan response
    Generator->>Generator: parse_json_response
    Generator-->>FastAPI: plan dict
    
    FastAPI->>Generator: generate_all_pages(repo_path, plan, ...)
    Generator->>AI_CLI: call_ai_cli(page_prompt) [concurrent]
    AI_CLI-->>Generator: markdown content
    Generator->>Generator: _strip_ai_preamble
    Generator-->>FastAPI: {slug: markdown}
    
    FastAPI->>Renderer: render_site(plan, pages, output_dir)
    Renderer->>Renderer: render_index, render_page
    Renderer->>Renderer: build_search_index, build_llms_txt
    Renderer-->>FastAPI: ✓ (HTML files, CSS, JS)
    
    FastAPI->>Storage: update_project_status(status="ready", page_count, ...)
    Storage->>Database: UPDATE project
    Database-->>Storage: ✓
    Storage-->>FastAPI: ✓
    
    Client->>FastAPI: GET /docs/project/index.html
    FastAPI-->>Client: rendered HTML with navigation & search

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through code with glee,
Docs built by Claude, Gemini!
Static sites so bright,
With search and themes of night,
Docsfy makes docs go free! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 8.41%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed (check skipped; CodeRabbit’s high-level summary is enabled).
  • Title Check — ✅ Passed: the PR title accurately summarizes the primary purpose of the changeset — implementing docsfy, an AI-powered documentation generator. It is concise, clear, and directly reflects the main feature introduced across the 45 files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



coderabbitai bot left a comment

Actionable comments posted: 18

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/plans/2026-03-04-docsfy-design.md (1)

112-116: ⚠️ Potential issue | 🟡 Minor

Same terminal escape code artifact in AI_MODEL default.

The claude-opus-4-6[1m] value contains what appears to be a terminal escape sequence. This should be corrected to match the intended model name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-03-04-docsfy-design.md` around lines 112 - 116, The AI_MODEL
default contains a stray terminal escape artifact "claude-opus-4-6[1m]"; update
the default value for AI_MODEL to the correct model identifier (e.g.,
"claude-opus-4-6") wherever it's defined in this document so the table entry
reads `AI_MODEL | claude-opus-4-6` and remove the "[1m]" sequence.
🟡 Minor comments (8)
src/docsfy/static/codelabels.js-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor

Language regex is too restrictive for real-world class names.

/language-(\w+)/ misses identifiers containing -, +, or #, so some code blocks won’t get labels.

Suggested fix
-    var match = classes.match(/language-(\w+)/);
+    var match = classes.match(/language-([a-z0-9#+-]+)/i);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/codelabels.js` at line 5, The regex in the classes.match
call is too narrow and misses language identifiers with characters like '-',
'+', or '#'; update the pattern used in the classes.match invocation (the
variable match assignment in src/docsfy/static/codelabels.js) to allow those
characters (e.g., include - + # alongside word chars in the capture group) so
code blocks with names containing hyphens or symbols are correctly detected and
labeled.
README.md-17-25 (1)

17-25: ⚠️ Potential issue | 🟡 Minor

Quick Start is platform-specific and slightly misleading.

docker compose up runs foreground by default, and open is macOS-only. Consider docker compose up -d plus cross-platform browser guidance.

Suggested doc tweak
-# Run
-docker compose up
+# Run
+docker compose up -d
@@
-# Browse docs
-open http://localhost:8000/docs/repo/
+# Browse docs
+# macOS: open, Linux: xdg-open, Windows: start
+http://localhost:8000/docs/repo/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 17 - 25, Update the Quick Start commands to avoid
platform-specific/misleading instructions: change the instructions that
currently use "docker compose up" to recommend "docker compose up -d" (or note
foreground behavior) and replace the macOS-only "open
http://localhost:8000/docs/repo/" with cross-platform guidance (e.g., mention
using the URL directly or platform commands like "xdg-open" on Linux and "start"
on Windows), and keep the existing curl POST example as-is for generating docs.
src/docsfy/static/search.js-74-76 (1)

74-76: ⚠️ Potential issue | 🟡 Minor

Guard search against malformed index entries.

If an entry is missing title or content, toLowerCase() throws and search stops rendering.

💡 Proposed fix
-    var matches = index.filter(function(item) {
-      return item.title.toLowerCase().includes(q) || item.content.toLowerCase().includes(q);
-    }).slice(0, 10);
+    var matches = index.filter(function(item) {
+      var titleText = item && typeof item.title === 'string' ? item.title.toLowerCase() : '';
+      var contentText = item && typeof item.content === 'string' ? item.content.toLowerCase() : '';
+      return titleText.includes(q) || contentText.includes(q);
+    }).slice(0, 10);

@@
-      var contentIdx = m.content.toLowerCase().indexOf(q);
+      var rawContent = typeof m.content === 'string' ? m.content : '';
+      var contentIdx = rawContent.toLowerCase().indexOf(q);
       if (contentIdx >= 0) {
         var start = Math.max(0, contentIdx - 40);
-        var end = Math.min(m.content.length, contentIdx + q.length + 60);
-        var snippet = (start > 0 ? '...' : '') + m.content.substring(start, end) + (end < m.content.length ? '...' : '');
+        var end = Math.min(rawContent.length, contentIdx + q.length + 60);
+        var snippet = (start > 0 ? '...' : '') + rawContent.substring(start, end) + (end < rawContent.length ? '...' : '');
         preview.textContent = snippet;
       }

Also applies to: 91-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/search.js` around lines 74 - 76, The current index.filter
callback uses item.title.toLowerCase() and item.content.toLowerCase() which will
throw if title or content are missing; update the filter in the search logic
(the index.filter callback that produces matches) to safely coerce title/content
to strings (e.g., let title = (item.title || '').toLowerCase(); let content =
(item.content || '').toLowerCase()) and then use title.includes(q) ||
content.includes(q); apply the same defensive check to the other search
occurrence around the code that performs the second filter (the similar block at
lines 91-95) so all searches tolerate malformed entries.
src/docsfy/static/github.js-9-13 (1)

9-13: ⚠️ Potential issue | 🟡 Minor

Regex truncates valid GitHub repo names with dots.

Line 9 captures repo as ([^/.]+), so owner/my.repo becomes my. That breaks the API request for a valid repo name.

💡 Proposed fix
-  var match = repoUrl.match(/github\.com[/:]([^/]+)\/([^/.]+)/);
+  var match = repoUrl.match(/github\.com[/:]([^/]+)\/([^/?#]+?)(?:\.git)?(?:[/?#]|$)/i);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/github.js` around lines 9 - 13, The current regex in the
repoUrl.match call uses ([^/.]+) which stops at a dot and truncates valid repo
names; update the pattern used in the github.com match (the repoUrl.match
invocation and its resulting match handling for owner and repo) to allow dots in
repo names and optionally strip a trailing .git (e.g. match the owner with
([^/]+) and the repo with ([^/]+)(?:\.git)?), then assign owner = match[1] and
repo = match[2] as before so full repo names like owner/my.repo are preserved.
tests/test_config.py-42-42 (1)

42-42: ⚠️ Potential issue | 🟡 Minor

Use a specific exception type in pytest.raises.

Line 42 should assert ValidationError instead of Exception to avoid hiding unrelated failures.

💡 Proposed fix
 import pytest
+from pydantic import ValidationError
@@
-        with pytest.raises(Exception):
+        with pytest.raises(ValidationError):
             Settings()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_config.py` at line 42, Replace the generic exception assertion in
the test (the "with pytest.raises(Exception):" block) with a specific
ValidationError by changing it to "with pytest.raises(ValidationError):" and
ensure the test file imports ValidationError (e.g., "from pydantic import
ValidationError" or the project's ValidationError class) so the test only
catches the intended validation failure.
docs/plans/2026-03-04-docsfy-implementation-plan.md-18-21 (1)

18-21: ⚠️ Potential issue | 🟡 Minor

Local filesystem paths won't work for other developers.

Lines 18-21 and 79-81 reference paths like /home/myakove/git/pr-test-oracle/... which are specific to one developer's machine. Consider either embedding the actual content or referencing a public repository/URL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-03-04-docsfy-implementation-plan.md` around lines 18 - 21,
The plan references local filesystem paths for artifacts
(.pre-commit-config.yaml, .flake8, tox.toml, .gitleaks.toml) which won't resolve
for other devs; replace those local path references in the docs/plans entry with
either the actual content (inline the files or paste their contents into the
repo under the same names) or point to a stable public location (a project repo
URL or gist) where the files can be fetched, and update the bullet lines to
reference the new repository/URLs or the relative paths within this repo instead
of /home/... paths.
docs/plans/2026-03-04-docsfy-implementation-plan.md-131-132 (1)

131-132: ⚠️ Potential issue | 🟡 Minor

Terminal escape code artifact in AI_MODEL default value.

The value claude-opus-4-6[1m] appears to contain a terminal escape code artifact ([1m] is ANSI bold). This appears in multiple places in the plan (lines 131, 265, 322) and will cause the AI CLI to use an invalid model name.

🐛 Proposed fix
-AI_MODEL=claude-opus-4-6[1m]
+AI_MODEL=claude-opus-4-20250514

Or use the intended model name without the escape sequence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-03-04-docsfy-implementation-plan.md` around lines 131 - 132,
The AI_MODEL default contains a terminal escape artifact
("claude-opus-4-6[1m]"); remove the ANSI fragment and replace all occurrences
with the intended model name (e.g., "claude-opus-4-6") wherever AI_MODEL is
defined or referenced in this document (notably the instances matching the shown
diff), ensuring other related variables like AI_CLI_TIMEOUT remain unchanged;
search for "AI_MODEL" and replace any value containing "[1m]" with the clean
model string.
tests/test_models.py-35-36 (1)

35-36: ⚠️ Potential issue | 🟡 Minor

Use ValidationError instead of broad Exception assertions in validation tests.

pytest.raises(Exception) can pass on unrelated errors and hide regressions. The GenerateRequest model uses Pydantic v2 validators that raise ValueError, which Pydantic wraps in ValidationError before propagating to the caller. Use ValidationError specifically for these assertions.

Suggested tightening
+from pydantic import ValidationError
...
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         GenerateRequest(repo_url="not-a-url")
...
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         GenerateRequest()
...
-    with pytest.raises(Exception):
+    with pytest.raises(ValidationError):
         GenerateRequest(
             repo_url="https://github.com/org/repo.git", repo_path="/some/path"
         )

Also applies to: 78-79, 85-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_models.py` around lines 35 - 36, Replace broad Exception
assertions in the tests with Pydantic's ValidationError: change
pytest.raises(Exception) to pytest.raises(ValidationError) when instantiating
GenerateRequest (and the other failing cases noted for the same model), and
import ValidationError from pydantic at the top of tests/test_models.py so the
tests assert the specific validation error raised by GenerateRequest's Pydantic
validators.
🧹 Nitpick comments (8)
tox.toml (1)

13-26: Avoid nesting uv run inside tox unless isolation tradeoff is intentional.

The use of uv run --extra dev at lines 17-24 shifts environment management to uv rather than tox. While pytest-xdist is confirmed present in the dev extra (supporting -n auto), this architecture creates a dependency on project extras rather than explicit tox-managed deps, which can complicate environment reproducibility. Consider a tox-native approach:

Suggested tox-native approach
 [env.unittests]
-deps = ["uv"]
+deps = [".[dev]"]
 commands = [
   [
-    "uv",
-    "run",
-    "--extra",
-    "dev",
     "pytest",
     "-n",
     "auto",
     "tests",
   ],
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tox.toml` around lines 13 - 26, The tox env 'env.unittests' currently shells
out to "uv run --extra dev pytest -n auto" which delegates dependency management
to uv; change it to a tox-native setup by removing the "uv run" invocation and
instead declare explicit deps like "pytest" and "pytest-xdist" in the
env.unittests deps list and run "pytest -n auto tests" directly in the commands
array; update the env.unittests block (look for the deps/commands entries) so
tox installs and controls test runner dependencies rather than relying on the
project's dev extra.
tests/test_repository.py (1)

23-27: Strengthen mock assertions for subprocess call arguments.

Current success test checks return values only. Consider asserting expected subprocess args (including timeout and -- once added) to lock in safety behavior.

💡 Suggested assertion pattern
-from unittest.mock import MagicMock, patch
+from unittest.mock import ANY, MagicMock, call, patch
@@
     with patch("docsfy.repository.subprocess.run") as mock_run:
@@
         repo_path, sha = clone_repo("https://github.com/org/repo.git", tmp_path)
+        assert mock_run.call_args_list[0] == call(
+            ["git", "clone", "--depth", "1", "--", "https://github.com/org/repo.git", str(tmp_path / "repo")],
+            capture_output=True,
+            text=True,
+            timeout=ANY,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_repository.py` around lines 23 - 27, Add assertions that the
patched subprocess.run was called with the exact expected arguments to lock
behavior: after the test exercise of docsfy.repository functions that invokes
subprocess.run, assert mock_run.assert_any_call(...) (or inspect
mock_run.call_args_list) includes the expected argv list containing the command
elements and the '--' separator and that timeout kwarg is present with the
expected value; reference the patched symbol mock_run (from
patch("docsfy.repository.subprocess.run")) and the subprocess invocation in the
repository code to check both positional argv contents and keyword args
(timeout) rather than only return values.
.pre-commit-config.yaml (1)

34-34: Pin the VCS dependency to an immutable ref.

git+https://github.com/RedHatQE/flake8-plugins.git tracks a moving target on the default branch. Pin a commit SHA or tag to ensure reproducible builds and reduce supply-chain risk.

💡 Proposed fix
-          [git+https://github.com/RedHatQE/flake8-plugins.git, flake8-mutable]
+          [git+https://github.com/RedHatQE/flake8-plugins.git@<commit-sha>#egg=flake8-plugins, flake8-mutable]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml at line 34, The listed VCS dependency
git+https://github.com/RedHatQE/flake8-plugins.git for the hook flake8-mutable
is unpinned and should be fixed to an immutable ref; update the entry in
.pre-commit-config.yaml that contains the string
"git+https://github.com/RedHatQE/flake8-plugins.git" (and the hook name
"flake8-mutable") to reference a specific tag or commit SHA (e.g., append
@<tag-or-sha> or set rev: to a SHA) so the pre-commit hook is pinned and
reproducible.
src/docsfy/storage.py (2)

49-78: SQL query construction is safe but pattern could be improved.

Ruff flags potential SQL injection (S608), but this is a false positive since fields only contains hardcoded column names. However, the dynamic SQL pattern could be clearer. Consider adding a comment to document this safety guarantee for future maintainers.

💡 Optional: Add safety comment
 async def update_project_status(
     name: str,
     status: str,
     last_commit_sha: str | None = None,
     page_count: int | None = None,
     error_message: str | None = None,
     plan_json: str | None = None,
 ) -> None:
     async with aiosqlite.connect(DB_PATH) as db:
+        # Fields list contains only hardcoded column names - safe from SQL injection
+        # All user values are parameterized via the values list
         fields = ["status = ?", "updated_at = CURRENT_TIMESTAMP"]
         values: list[str | int | None] = [status]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/storage.py` around lines 49 - 78, The dynamic SQL in
update_project_status builds the fields list from hardcoded column names which
is safe but trigged an S608 false positive; add a brief comment above the
fields/values construction (near the symbols fields, values and DB_PATH in
update_project_status) stating that fields are only populated with predetermined
column names (no user-controlled input) and therefore safe from SQL injection,
or alternatively replace the implicit appends with an explicit
allowed_columns/column-to-placeholder mapping to make the guarantee obvious to
future maintainers and linters.

8-10: Module-level variables may cause issues in tests.

DB_PATH, DATA_DIR, and PROJECTS_DIR are computed at module import time from environment variables. The test fixture directly mutates these module attributes, which works but is fragile. This is acceptable for the current test setup, but be aware of potential issues if tests run in parallel or if the module is reimported.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/storage.py` around lines 8 - 10, DB_PATH, DATA_DIR, and
PROJECTS_DIR are computed at import time which makes tests that mutate module
attributes fragile; change them to be computed lazily by replacing the
module-level constants with accessor functions (e.g., get_db_path(),
get_data_dir(), get_projects_dir()) or properties that read os.getenv() on each
call, update all call sites and tests to use these accessors, and ensure tests
set environment variables (or monkeypatch the accessors) before calling the
accessors so parallel/reimport scenarios no longer rely on mutable module state.
src/docsfy/static/style.css (1)

628-633: Consider using complex :not() pseudo-class notation for Stylelint compliance.

Stylelint flags the chained simple :not() selectors. The complex notation is more readable and future-proof.

♻️ Proposed refactor
-blockquote:not(.callout-note):not(.callout-warning):not(.callout-tip) {
+blockquote:not(.callout-note, .callout-warning, .callout-tip) {
     border-left: 4px solid var(--border-primary);
     padding: 1rem 1.25rem;
     margin: 1.5rem 0;
     color: var(--text-secondary);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/style.css` around lines 628 - 633, The selector
"blockquote:not(.callout-note):not(.callout-warning):not(.callout-tip)" is
flagged by Stylelint for chained :not() usage; replace the chained simple :not()
pseudo-classes with a single complex :not() containing the comma-separated list
of the three callout classes so Stylelint passes and the rule remains
equivalent, updating the block where the selector is defined in style.css (the
blockquote selector) and ensuring spacing and variable usage (border-left,
padding, margin, color) remain unchanged.
tests/test_renderer.py (1)

56-73: Strengthen search-index.json test beyond file existence.

Consider asserting valid JSON structure and at least one expected entry (slug, title, or searchable content) so regressions in index serialization are caught.

Suggested test hardening
+import json
...
 def test_search_index_generated(tmp_path: Path) -> None:
@@
     render_site(plan=plan, pages=pages, output_dir=output_dir)
-    assert (output_dir / "search-index.json").exists()
+    index_path = output_dir / "search-index.json"
+    assert index_path.exists()
+    payload = json.loads(index_path.read_text())
+    assert isinstance(payload, list)
+    assert any(item.get("slug") == "intro" for item in payload)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_renderer.py` around lines 56 - 73, Update the
test_search_index_generated test to not only check existence of
search-index.json but to open and json.load the file produced by render_site,
assert it is valid JSON (list or dict as expected by your renderer), and assert
at least one entry contains the expected page data (e.g., an entry with slug
"intro" and/or title "Intro" or searchable content substring). Use the same
output_dir / "search-index.json" path, call json.loads or json.load on that
file, and add assertions on the structure and presence of the expected
keys/values to catch serialization regressions.
src/docsfy/templates/index.html (1)

43-44: Expose sidebar state to assistive technologies.

The toggle should maintain aria-expanded and reference the controlled element for better accessibility.

Suggested accessibility update
-                <button class="sidebar-toggle" id="sidebar-toggle" aria-label="Toggle sidebar">
+                <button class="sidebar-toggle" id="sidebar-toggle" aria-label="Toggle sidebar" aria-controls="sidebar" aria-expanded="false">
@@
             toggle.addEventListener('click', function() {
-                sidebar.classList.toggle('open');
+                var isOpen = sidebar.classList.toggle('open');
+                toggle.setAttribute('aria-expanded', isOpen ? 'true' : 'false');
                 if (overlay) overlay.classList.toggle('open');
             });

Also applies to: 123-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/templates/index.html` around lines 43 - 44, The sidebar toggle
button with id "sidebar-toggle" must expose its state to ATs: add an
aria-controls attribute pointing to the controlled sidebar element id (e.g.,
"sidebar") and ensure the button maintains an accurate aria-expanded boolean
that is updated when the toggle runs; locate the toggle element (id
"sidebar-toggle") and the sidebar element (class or id "sidebar") and update the
toggle handler (e.g., the click listener or function that shows/hides the
sidebar) to set button.setAttribute('aria-expanded', String(isOpen)) and
button.setAttribute('aria-controls', sidebarId) whenever the visibility changes
so screen readers can detect the relationship and current state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.env.example:
- Line 3: The AI_MODEL value in .env.example is malformed
("claude-opus-4-6[1m]"); replace it with a valid Claude model ID (for example
"claude-opus-4-20250514" or the newer "claude-opus-4-1-20250805") so the
AI_MODEL environment variable uses a correct, complete model identifier without
ANSI sequences.

In @.gitleaks.toml:
- Around line 4-7: The allowlist entry under the [allowlist] section currently
exempts all Python test files via the paths list ('''tests/.*\.py''') which is
too broad; instead, narrow the scope by removing the blanket paths pattern and
migrate exemptions to rule-level allowlists for specific known false positives
(or list explicit filenames rather than a directory-wide regex). Update the
config to remove or tighten the '''tests/.*\.py''' path entry and add
rule-specific allowlist entries referencing the offending rule IDs or exact test
fixture filenames so only known benign files are exempted.

In `@src/docsfy/config.py`:
- Line 17: The default ai_model string is invalid; remove the ANSI suffix and
replace the unsupported model id by setting the ai_model variable (ai_model:
str) to a supported Anthropic model identifier such as
"claude-sonnet-4-20250514" so API calls will succeed (i.e., change ai_model from
"claude-opus-4-6[1m]" to "claude-sonnet-4-20250514").

In `@src/docsfy/generator.py`:
- Around line 67-68: Reject or sanitize unsafe AI-controlled slugs before using
them to construct filesystem paths: validate the `slug` used when creating
`cache_file = cache_dir / f"{slug}.md"` (and the other occurrences around the
`use_cache` checks at the later blocks) to ensure it contains only allowed
characters (e.g., alphanumerics, hyphen, underscore), does not contain path
separators like "..", "/", or "\" and does not start with a dot; alternatively
resolve the resulting path and assert it is inside `cache_dir` (e.g., compare
resolved parents) before any open/write operations, and raise/return an error
for invalid slugs so no file can be written outside `cache_dir`.
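A sketch of that validation, assuming the allowed-character set suggested above (alphanumerics, hyphen, underscore) plus a resolved-path containment check:

```python
import re
from pathlib import Path

SLUG_RE = re.compile(r"[A-Za-z0-9_-]+")

def safe_cache_file(cache_dir: Path, slug: str) -> Path:
    """Reject slugs that could escape cache_dir before any write happens."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"unsafe slug: {slug!r}")
    target = (cache_dir / f"{slug}.md").resolve()
    # Belt and braces: confirm the resolved path stays inside cache_dir.
    if cache_dir.resolve() not in target.parents:
        raise ValueError(f"slug escapes cache dir: {slug!r}")
    return target
```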

In `@src/docsfy/main.py`:
- Around line 275-283: The current code builds the entire tar.gz in memory using
BytesIO and returns a StreamingResponse, which risks high memory usage; instead,
create the archive on disk (e.g., using a temporary Path) in a background thread
(use loop.run_in_executor or FastAPI's async_to_sync pattern) by calling
tarfile.open(out_path, mode="w:gz") and tar.add(site_dir, arcname=name), then
return a FileResponse pointing at that out_path with the same
Content-Disposition header; ensure you delete the temp file after transfer or
use a temp file that is cleaned up automatically.
- Around line 75-108: The generate route is adding project_name into the
_generating set before validating it and not removing it if downstream
save_project or task creation fails; update the generate function to first
validate/normalize project_name (e.g., disallow spaces/special chars or run
existing route-name validation) and reject with HTTPException if invalid, then
add to _generating only after validation, and wrap the save_project call and
asyncio.create_task invocation in try/except/finally so that on any exception
you remove project_name from _generating and re-raise or return a
500/appropriate HTTPException; reference the generate function, the _generating
set, save_project, and _run_generation/asyncio.create_task when implementing
these changes.
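The on-disk archive approach from the first main.py comment can be sketched as below; the function names and route wiring are illustrative, not the project's actual API:

```python
import asyncio
import tarfile
import tempfile
from pathlib import Path

def _build_archive(site_dir: Path, name: str) -> Path:
    # Runs in a worker thread so the event loop is never blocked by gzip work.
    out_path = Path(tempfile.mkdtemp()) / f"{name}.tar.gz"
    with tarfile.open(out_path, mode="w:gz") as tar:
        tar.add(site_dir, arcname=name)
    return out_path

async def archive_site(site_dir: Path, name: str) -> Path:
    # In FastAPI the result would feed FileResponse(path, filename=f"{name}.tar.gz",
    # background=BackgroundTask(path.unlink)) to clean up after transfer.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _build_archive, site_dir, name)
```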

In `@src/docsfy/renderer.py`:
- Around line 170-174: The render_site function currently uses
output_dir.mkdir(..., exist_ok=True) and assets_dir.mkdir(..., exist_ok=True)
which leaves previous build artifacts; modify render_site to remove any existing
output_dir contents before creating directories (e.g., check if
output_dir.exists() and call shutil.rmtree(output_dir) then recreate output_dir
and assets_dir), ensuring you import shutil at the top and keep the same
variable names (render_site, output_dir, assets_dir) so stale .html/.md/assets
are cleaned prior to rendering.
- Around line 188-208: The loop in render_page usage writes files using slug
directly (variables slug and output_dir) which permits path traversal; fix by
validating/sanitizing slug before any write: reject or normalize slugs that are
absolute, contain path separators or ".." (e.g. ensure slug == Path(slug).name
and matches a safe regex like r'^[A-Za-z0-9._-]+$'), then compute target =
(output_dir / safe_slug).resolve() and assert
str(target).startswith(str(output_dir.resolve())) before calling write_text for
both f"{slug}.html" and f"{slug}.md"; update the code around render_page and the
two write_text calls to use safe_slug/target and raise/log an error for invalid
slugs.
- Around line 31-42: The _md_to_html function returns HTML created by
python-markdown which can contain raw unsafe HTML; update _md_to_html to
sanitize both content_html and toc_html with Bleach before returning: add bleach
to dependencies, then call bleach.clean on content_html and toc_html using a
whitelist that preserves expected markdown HTML (allow tags needed for headings,
paragraphs, lists, links, images, code blocks, pre, span with classes for
codehilite) and allow attributes like href/src/alt/class/title and rel on links;
also use bleach.linkify or set rel="noopener noreferrer" on links if desired;
ensure the sanitized strings are what _md_to_html returns so the template's {{
content | safe }} no longer exposes stored XSS.
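For the stale-artifact point above, a stdlib-only sketch of cleaning the output directory before rendering (directory names are illustrative):

```python
import shutil
from pathlib import Path

def prepare_output(output_dir: Path) -> Path:
    # Drop stale .html/.md/assets from previous builds, then recreate fresh dirs.
    if output_dir.exists():
        shutil.rmtree(output_dir)
    assets_dir = output_dir / "assets"
    assets_dir.mkdir(parents=True)
    return assets_dir
```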

In `@src/docsfy/repository.py`:
- Line 23: The log currently prints the raw repo_url which can contain userinfo
(credentials) — update the logging in repository.py so logger.info does not
include sensitive userinfo from repo_url; parse repo_url (e.g., via
urllib.parse.urlparse) and redact or remove the username:password portion before
logging (keep repo_path in the message), then log the sanitized URL instead of
the raw repo_url in the logger.info call.
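A sketch of the userinfo redaction using only the standard library:

```python
from urllib.parse import urlparse, urlunparse

def redact_url(repo_url: str) -> str:
    """Strip user:password@ from a URL before it reaches the logs."""
    parts = urlparse(repo_url)
    if parts.username is None:
        return repo_url
    netloc = parts.hostname or ""
    if parts.port:
        netloc = f"{netloc}:{parts.port}"
    return urlunparse(parts._replace(netloc=netloc))
```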

In `@src/docsfy/static/copy.js`:
- Around line 18-26: The Clipboard API call can throw synchronously when
unavailable; before calling navigator.clipboard.writeText(text) check that
navigator.clipboard exists and window.isSecureContext is true, and if not,
invoke the existing fallback copy routine instead of calling writeText;
otherwise proceed to call navigator.clipboard.writeText(text). Ensure you
reference the same button variable (btn) and preserve the success and error
handling (setting btn.textContent and btn.classList) in the Promise path while
routing to the fallback path immediately when the guard fails.

In `@src/docsfy/static/style.css`:
- Around line 1055-1070: The CSS rules for .copy-btn, pre:hover .copy-btn and
.copy-btn:hover reference undefined custom properties (--border-color and
--accent-color); update those rules to use the existing variables
(--border-primary and --accent) or add matching variable definitions. Locate the
selectors .copy-btn, pre:hover .copy-btn and .copy-btn:hover in
src/docsfy/static/style.css and either replace --border-color with
--border-primary and --accent-color with --accent, or add :root declarations for
--border-color and --accent-color mapping to the existing values so the button
borders, background and color resolve correctly.
- Around line 1075-1117: The CSS uses undefined variables (--border-color,
--accent-color, and --text-secondary) in selectors like .page-nav,
.page-nav-link, .page-nav-link:hover, .page-nav-label, and .page-nav-title; fix
by adding default fallbacks or defining those variables at the root (e.g.,
:root) so the styles render predictably — update the stylesheet to either
declare --border-color, --accent-color, and --text-secondary with appropriate
values or change usages to var(--border-color, <fallback>), var(--accent-color,
<fallback>), and var(--text-secondary, <fallback>) in the .page-nav and related
rules (page-nav, page-nav-link, page-nav-link:hover, page-nav-label,
page-nav-title).

In `@src/docsfy/static/theme.js`:
- Around line 3-15: Wrap all accesses to localStorage in try-catch to avoid
SecurityError/QuotaExceededError: when reading the initial theme, guard the call
to localStorage.getItem('theme') (the variable stored) with try-catch and treat
failures as "no stored theme" so the existing prefers-color-scheme fallback
runs; likewise, inside the toggle click handler, wrap
localStorage.setItem('theme', next) in try-catch so toggling still updates
data-theme even if storage write fails. Update the code around the stored
variable and the toggle.addEventListener callback to catch and ignore storage
exceptions (optionally log) without breaking theme application.

In `@src/docsfy/templates/page.html`:
- Line 77: The template currently renders AI-generated HTML with the Jinja2
|safe filter for variables content and toc, which bypasses autoescaping and
permits XSS; to fix, sanitize the markdown-generated HTML before passing it to
the template by updating the markdown-to-HTML pipeline (e.g., the function
_md_to_html or whichever converter is used in generate_page) to call a sanitizer
like bleach.clean on both the converted content_html and toc_html (specifying
allowed_tags/attributes and strip=True), then either remove |safe from the
template or rename the sanitized values to content_sanitized/toc_sanitized and
use those in page.html to ensure only cleaned HTML is rendered.

In `@tests/test_config.py`:
- Around line 12-13: The tests are nondeterministic because Settings() still
reads .env files even when patch.dict(os.environ, {}, clear=True) is used;
update each test that currently calls Settings() (the instances created
alongside patch.dict(os.environ, ..., clear=True)) to instantiate Settings with
_env_file=None (e.g., Settings(_env_file=None)) so the Settings class won’t load
any .env file and the environment is fully controlled by the patched os.environ;
keep the patch.dict(...) usage but replace bare Settings() calls with
Settings(_env_file=None).

In `@tox.toml`:
- Around line 1-3: Replace the legacy tox keys: change the "skipsdist" setting
to the canonical "no_package" and rename "envlist" to "env_list" so the tox.toml
uses tox 4 standard keys (update the entries for "skipsdist" and "envlist"
accordingly).
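A hedged sketch of the renamed keys in tox 4's TOML format (the env name is illustrative):

```toml
# tox.toml — canonical tox 4 keys replacing the legacy skipsdist/envlist
no_package = true
env_list = ["unittests"]
```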

---

Outside diff comments:
In `@docs/plans/2026-03-04-docsfy-design.md`:
- Around line 112-116: The AI_MODEL default contains a stray terminal escape
artifact "claude-opus-4-6[1m]"; update the default value for AI_MODEL to the
correct model identifier (e.g., "claude-opus-4-6") wherever it's defined in this
document so the table entry reads `AI_MODEL | claude-opus-4-6` and remove the
"[1m]" sequence.

---

Minor comments:
In `@docs/plans/2026-03-04-docsfy-implementation-plan.md`:
- Around line 18-21: The plan references local filesystem paths for artifacts
(.pre-commit-config.yaml, .flake8, tox.toml, .gitleaks.toml) which won't resolve
for other devs; replace those local path references in the docs/plans entry with
either the actual content (inline the files or paste their contents into the
repo under the same names) or point to a stable public location (a project repo
URL or gist) where the files can be fetched, and update the bullet lines to
reference the new repository/URLs or the relative paths within this repo instead
of /home/... paths.
- Around line 131-132: The AI_MODEL default contains a terminal escape artifact
("claude-opus-4-6[1m]"); remove the ANSI fragment and replace all occurrences
with the intended model name (e.g., "claude-opus-4-6") wherever AI_MODEL is
defined or referenced in this document (notably the instances matching the shown
diff), ensuring other related variables like AI_CLI_TIMEOUT remain unchanged;
search for "AI_MODEL" and replace any value containing "[1m]" with the clean
model string.

In `@README.md`:
- Around line 17-25: Update the Quick Start commands to avoid
platform-specific/misleading instructions: change the instructions that
currently use "docker compose up" to recommend "docker compose up -d" (or note
foreground behavior) and replace the macOS-only "open
http://localhost:8000/docs/repo/" with cross-platform guidance (e.g., mention
using the URL directly or platform commands like "xdg-open" on Linux and "start"
on Windows), and keep the existing curl POST example as-is for generating docs.

In `@src/docsfy/static/codelabels.js`:
- Line 5: The regex in the classes.match call is too narrow and misses language
identifiers with characters like '-', '+', or '#'; update the pattern used in
the classes.match invocation (the variable match assignment in
src/docsfy/static/codelabels.js) to allow those characters (e.g., include - + #
alongside word chars in the capture group) so code blocks with names containing
hyphens or symbols are correctly detected and labeled.

In `@src/docsfy/static/github.js`:
- Around line 9-13: The current regex in the repoUrl.match call uses ([^/.]+)
which stops at a dot and truncates valid repo names; update the pattern used in
the github.com match (the repoUrl.match invocation and its resulting match
handling for owner and repo) to allow dots in repo names and optionally strip a
trailing .git (e.g. match the owner with ([^/]+) and the repo with
([^/]+)(?:\.git)?), then assign owner = match[1] and repo = match[2] as before
so full repo names like owner/my.repo are preserved.

In `@src/docsfy/static/search.js`:
- Around line 74-76: The current index.filter callback uses
item.title.toLowerCase() and item.content.toLowerCase() which will throw if
title or content are missing; update the filter in the search logic (the
index.filter callback that produces matches) to safely coerce title/content to
strings (e.g., let title = (item.title || '').toLowerCase(); let content =
(item.content || '').toLowerCase()) and then use title.includes(q) ||
content.includes(q); apply the same defensive check to the other search
occurrence around the code that performs the second filter (the similar block at
lines 91-95) so all searches tolerate malformed entries.

In `@tests/test_config.py`:
- Line 42: Replace the generic exception assertion in the test (the "with
pytest.raises(Exception):" block) with a specific ValidationError by changing it
to "with pytest.raises(ValidationError):" and ensure the test file imports
ValidationError (e.g., "from pydantic import ValidationError" or the project's
ValidationError class) so the test only catches the intended validation failure.

In `@tests/test_models.py`:
- Around line 35-36: Replace broad Exception assertions in the tests with
Pydantic's ValidationError: change pytest.raises(Exception) to
pytest.raises(ValidationError) when instantiating GenerateRequest (and the other
failing cases noted for the same model), and import ValidationError from
pydantic at the top of tests/test_models.py so the tests assert the specific
validation error raised by GenerateRequest's Pydantic validators.

---

Nitpick comments:
In @.pre-commit-config.yaml:
- Line 34: The listed VCS dependency
git+https://github.com/RedHatQE/flake8-plugins.git for the hook flake8-mutable
is unpinned and should be fixed to an immutable ref; update the entry in
.pre-commit-config.yaml that contains the string
"git+https://github.com/RedHatQE/flake8-plugins.git" (and the hook name
"flake8-mutable") to reference a specific tag or commit SHA (e.g., append
@<tag-or-sha> or set rev: to a SHA) so the pre-commit hook is pinned and
reproducible.

In `@src/docsfy/static/style.css`:
- Around line 628-633: The selector
"blockquote:not(.callout-note):not(.callout-warning):not(.callout-tip)" is
flagged by Stylelint for chained :not() usage; replace the chained simple :not()
pseudo-classes with a single complex :not() containing the comma-separated list
of the three callout classes so Stylelint passes and the rule remains
equivalent, updating the block where the selector is defined in style.css (the
blockquote selector) and ensuring spacing and variable usage (border-left,
padding, margin, color) remain unchanged.

In `@src/docsfy/storage.py`:
- Around line 49-78: The dynamic SQL in update_project_status builds the fields
list from hardcoded column names which is safe but triggered an S608 false
positive; add a brief comment above the fields/values construction (near the
symbols fields, values and DB_PATH in update_project_status) stating that fields
are only populated with predetermined column names (no user-controlled input)
and therefore safe from SQL injection, or alternatively replace the implicit
appends with an explicit allowed_columns/column-to-placeholder mapping to make
the guarantee obvious to future maintainers and linters.
- Around line 8-10: DB_PATH, DATA_DIR, and PROJECTS_DIR are computed at import
time which makes tests that mutate module attributes fragile; change them to be
computed lazily by replacing the module-level constants with accessor functions
(e.g., get_db_path(), get_data_dir(), get_projects_dir()) or properties that
read os.getenv() on each call, update all call sites and tests to use these
accessors, and ensure tests set environment variables (or monkeypatch the
accessors) before calling the accessors so parallel/reimport scenarios no longer
rely on mutable module state.

In `@src/docsfy/templates/index.html`:
- Around line 43-44: The sidebar toggle button with id "sidebar-toggle" must
expose its state to ATs: add an aria-controls attribute pointing to the
controlled sidebar element id (e.g., "sidebar") and ensure the button maintains
an accurate aria-expanded boolean that is updated when the toggle runs; locate
the toggle element (id "sidebar-toggle") and the sidebar element (class or id
"sidebar") and update the toggle handler (e.g., the click listener or function
that shows/hides the sidebar) to set button.setAttribute('aria-expanded',
String(isOpen)) and button.setAttribute('aria-controls', sidebarId) whenever the
visibility changes so screen readers can detect the relationship and current
state.

In `@tests/test_renderer.py`:
- Around line 56-73: Update the test_search_index_generated test to not only
check existence of search-index.json but to open and json.load the file produced
by render_site, assert it is valid JSON (list or dict as expected by your
renderer), and assert at least one entry contains the expected page data (e.g.,
an entry with slug "intro" and/or title "Intro" or searchable content
substring). Use the same output_dir / "search-index.json" path, call json.loads
or json.load on that file, and add assertions on the structure and presence of
the expected keys/values to catch serialization regressions.

In `@tests/test_repository.py`:
- Around line 23-27: Add assertions that the patched subprocess.run was called
with the exact expected arguments to lock behavior: after the test exercise of
docsfy.repository functions that invokes subprocess.run, assert
mock_run.assert_any_call(...) (or inspect mock_run.call_args_list) includes the
expected argv list containing the command elements and the '--' separator and
that timeout kwarg is present with the expected value; reference the patched
symbol mock_run (from patch("docsfy.repository.subprocess.run")) and the
subprocess invocation in the repository code to check both positional argv
contents and keyword args (timeout) rather than only return values.
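A self-contained sketch of that pattern; the clone command and timeout value are assumptions for illustration, not the project's actual invocation:

```python
import subprocess
from unittest.mock import patch

def clone(url: str, dest: str) -> None:
    # Stand-in for the repository helper under test.
    subprocess.run(["git", "clone", "--depth", "1", "--", url, dest],
                   timeout=300, check=True)

with patch("subprocess.run") as mock_run:
    clone("https://example.com/repo.git", "/tmp/repo")
    argv = mock_run.call_args.args[0]
    assert "--" in argv                                 # separator locked in
    assert mock_run.call_args.kwargs["timeout"] == 300  # timeout locked in
```

Asserting on `call_args` (both argv contents and keyword args) locks the behavior, so a regression that drops the `--` separator or the timeout fails the test immediately.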

In `@tox.toml`:
- Around line 13-26: The tox env 'env.unittests' currently shells out to "uv run
--extra dev pytest -n auto" which delegates dependency management to uv; change
it to a tox-native setup by removing the "uv run" invocation and instead declare
explicit deps like "pytest" and "pytest-xdist" in the env.unittests deps list
and run "pytest -n auto tests" directly in the commands array; update the
env.unittests block (look for the deps/commands entries) so tox installs and
controls test runner dependencies rather than relying on the project's dev
extra.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a047a6c9-c6ca-44e9-bdf8-c9a414ed8eeb

📥 Commits

Reviewing files that changed from the base of the PR and between c6db255 and a8f6b3c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (44)
  • .env.example
  • .flake8
  • .gitleaks.toml
  • .pre-commit-config.yaml
  • Dockerfile
  • README.md
  • docker-compose.yaml
  • docs/plans/2026-03-04-docsfy-design.md
  • docs/plans/2026-03-04-docsfy-implementation-plan.md
  • pyproject.toml
  • src/docsfy/__init__.py
  • src/docsfy/ai_client.py
  • src/docsfy/config.py
  • src/docsfy/generator.py
  • src/docsfy/json_parser.py
  • src/docsfy/main.py
  • src/docsfy/models.py
  • src/docsfy/prompts.py
  • src/docsfy/renderer.py
  • src/docsfy/repository.py
  • src/docsfy/static/callouts.js
  • src/docsfy/static/codelabels.js
  • src/docsfy/static/copy.js
  • src/docsfy/static/github.js
  • src/docsfy/static/scrollspy.js
  • src/docsfy/static/search.js
  • src/docsfy/static/style.css
  • src/docsfy/static/theme.js
  • src/docsfy/storage.py
  • src/docsfy/templates/index.html
  • src/docsfy/templates/page.html
  • tests/__init__.py
  • tests/test_ai_client.py
  • tests/test_config.py
  • tests/test_generator.py
  • tests/test_integration.py
  • tests/test_json_parser.py
  • tests/test_main.py
  • tests/test_models.py
  • tests/test_prompts.py
  • tests/test_renderer.py
  • tests/test_repository.py
  • tests/test_storage.py
  • tox.toml

@@ -0,0 +1,21 @@
# AI Configuration
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6[1m]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Anthropic Claude model IDs supported environment variable 2025 2026

💡 Result:

Environment variables (Claude Code) and what model IDs you can set (2025–2026)

1) ANTHROPIC_MODEL (directly sets the model)

Claude Code supports setting a full Anthropic model ID via ANTHROPIC_MODEL. The Claude Help Center lists these supported model IDs (at least): claude-opus-4-1-20250805, claude-opus-4-20250514, claude-sonnet-4-20250514, claude-3-7-sonnet-20250219, claude-3-5-haiku-20241022. [1]

2) Alias-mapping env vars (set what sonnet / opus / haiku mean)

Claude Code also supports environment variables that control which full model names the aliases map to: ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL, plus CLAUDE_CODE_SUBAGENT_MODEL for subagents. [2]
(And ANTHROPIC_SMALL_FAST_MODEL is deprecated in favor of ANTHROPIC_DEFAULT_HAIKU_MODEL.) [2]

3) If you mean “supported model IDs” for the Anthropic API (not just Claude Code)

Use the Models API (GET /v1/models) to retrieve the current list of available model IDs; the response returns model id values (example shown: claude-sonnet-4-20250514). [3]

4) Deprecations/retirements that affect whether an ID still works

For example, Claude Sonnet 3.5 model IDs claude-3-5-sonnet-20240620 and claude-3-5-sonnet-20241022 were deprecated and scheduled for retirement (Oct 22, 2025), with a recommendation to migrate to claude-sonnet-4-20250514. [4]

Sources: [1] [2] [3] [4]


Replace the malformed default AI_MODEL value with a valid Claude model ID.

claude-opus-4-6[1m] contains a trailing ANSI escape sequence ([1m) and an incomplete model identifier. Claude model IDs require date suffixes; the correct format is claude-opus-4-20250514 (or claude-opus-4-1-20250805 for the latest version).

Suggested fix
-AI_MODEL=claude-opus-4-6[1m]
+AI_MODEL=claude-opus-4-20250514
🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 3-3: [UnorderedKey] The AI_MODEL key should go before the AI_PROVIDER key

(UnorderedKey)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.example at line 3, The AI_MODEL value in .env.example is malformed
("claude-opus-4-6[1m]"); replace it with a valid Claude model ID (for example
"claude-opus-4-20250514" or the newer "claude-opus-4-1-20250805") so the
AI_MODEL environment variable uses a correct, complete model identifier without
ANSI sequences.

Comment on lines +4 to +7
[allowlist]
paths = [
'''tests/.*\.py''',
]

⚠️ Potential issue | 🟠 Major

Narrow the GitLeaks allowlist scope.

This allowlist exempts all Python test files from secret scanning, which can mask real leaked credentials in test code/fixtures. Prefer rule-level allowlists for specific known false positives instead of path-wide exclusions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitleaks.toml around lines 4 - 7, The allowlist entry under the [allowlist]
section currently exempts all Python test files via the paths list
('''tests/.*\.py''') which is too broad; instead, narrow the scope by removing
the blanket paths pattern and migrate exemptions to rule-level allowlists for
specific known false positives (or list explicit filenames rather than a
directory-wide regex). Update the config to remove or tighten the
'''tests/.*\.py''' path entry and add rule-specific allowlist entries
referencing the offending rule IDs or exact test fixture filenames so only known
benign files are exempted.

)

ai_provider: str = "claude"
ai_model: str = "claude-opus-4-6[1m]"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/docsfy/config.py | head -30

Repository: myk-org/docsfy

Length of output: 800


🌐 Web query:

Anthropic Claude valid model IDs API documentation latest 2024 2025

💡 Result:

Where to get the currently valid Claude model IDs (authoritative)

  • Use the Models API: GET /v1/models to list all models your API key can use; the returned data[].id values are the valid model IDs. Example docs + sample response are in Anthropic’s “List Models” reference. [1]
  • To validate/resolve a specific ID (or alias), use GET /v1/models/{model_id} (“Get a Model”). [2]

2024–2025 model IDs explicitly referenced in Anthropic docs / release notes

  • Claude Sonnet 3.5 (deprecated Aug 13, 2025; retired Oct 22, 2025):
    • claude-3-5-sonnet-20240620
    • claude-3-5-sonnet-20241022 [3]
  • Claude Sonnet 4 (recommended migration target in the same deprecation notice):
    • claude-sonnet-4-20250514 [3]
  • Claude Opus 3 (snapshot ID from 2024; retired Jan 5, 2026—so no longer valid today):
    • claude-3-opus-20240229 [4]

“Latest” note about documentation support in late 2024

  • Anthropic’s release notes state the Models API became generally available on Dec 17, 2024, enabling querying available models, validating model IDs, and resolving aliases. [3]

Sources:
[1] Anthropic docs — List Models (GET /v1/models) https://platform.claude.com/docs/en/api/models/list
[2] Anthropic docs — Get a Model (GET /v1/models/{model_id}) https://platform.claude.com/docs/en/api/models/retrieve
[3] Anthropic docs — API release notes (includes Sonnet 3.5 deprecation + retirement date; migration to Sonnet 4 claude-sonnet-4-20250514; Models API GA Dec 17, 2024) https://platform.claude.com/docs/en/release-notes/overview
[4] Claude Developer Platform release notes — Opus 3 retirement (claude-3-opus-20240229) https://platform.claude.com/docs/en/release-notes/overview


Default ai_model is invalid and will fail API calls.

Line 17 contains "claude-opus-4-6[1m]" which has two problems:

  1. The [1m] suffix is an ANSI escape sequence, not a valid model identifier
  2. Even without it, claude-opus-4-6 is not a supported Anthropic Claude model ID

Current valid model IDs per Anthropic documentation include claude-sonnet-4-20250514. Update to a supported model identifier.

💡 Proposed fix
-    ai_model: str = "claude-opus-4-6[1m]"
+    ai_model: str = "claude-sonnet-4-20250514"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/config.py` at line 17, The default ai_model string is invalid;
remove the ANSI suffix and replace the unsupported model id by setting the
ai_model variable (ai_model: str) to a supported Anthropic model identifier such
as "claude-sonnet-4-20250514" so API calls will succeed (i.e., change ai_model
from "claude-opus-4-6[1m]" to "claude-sonnet-4-20250514").
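The `[1m` fragment looks like terminal bold formatting that leaked into a copied value. Beyond fixing the default, a defensive option (an assumption on our part, not code from this PR) is to strip ANSI escape sequences from any CLI output before treating it as a model identifier:

```python
import re

# CSI escape sequences such as "\x1b[1m" (bold) and "\x1b[0m" (reset); the
# bracketed "[1m" seen in the config is likely such a sequence whose ESC byte
# was lost in a copy-paste.
ANSI_CSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from a string."""
    return ANSI_CSI_RE.sub("", text)

print(strip_ansi("\x1b[1mclaude-sonnet-4-20250514\x1b[0m"))
# → claude-sonnet-4-20250514
```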

Comment on lines +67 to +68
cache_file = cache_dir / f"{slug}.md"
if use_cache and cache_file.exists():

⚠️ Potential issue | 🔴 Critical

Reject unsafe page slugs before filesystem writes.

slug is AI-controlled and used directly in file paths. A value like ../../tmp/pwn can escape cache_dir and write outside intended storage.

🛡️ Proposed fix
+import re as _re
 ...
 MAX_CONCURRENT_PAGES = 5
+_SLUG_RE = _re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")

+def _cache_file_for_slug(cache_dir: Path, slug: str) -> Path:
+    if not _SLUG_RE.fullmatch(slug):
+        msg = f"Invalid page slug: {slug!r}"
+        raise ValueError(msg)
+    cache_root = cache_dir.resolve()
+    cache_file = (cache_root / f"{slug}.md").resolve()
+    cache_file.relative_to(cache_root)
+    return cache_file
 ...
-    cache_file = cache_dir / f"{slug}.md"
+    cache_file = _cache_file_for_slug(cache_dir, slug)

Also applies to: 87-88, 107-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/generator.py` around lines 67 - 68, Reject or sanitize unsafe
AI-controlled slugs before using them to construct filesystem paths: validate
the `slug` used when creating `cache_file = cache_dir / f"{slug}.md"` (and the
other occurrences around the `use_cache` checks at the later blocks) to ensure
it contains only allowed characters (e.g., alphanumerics, hyphen, underscore),
does not contain path separators like "..", "/", or "\" and does not start with
a dot; alternatively resolve the resulting path and assert it is inside
`cache_dir` (e.g., compare resolved parents) before any open/write operations,
and raise/return an error for invalid slugs so no file can be written outside
`cache_dir`.
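The proposed guard can be sketched as a standalone helper (the names are illustrative; the two-layer check — allowlist regex plus a containment assertion after `resolve()` — is the point):

```python
import re
from pathlib import Path

# Slug must start alphanumeric; dots are allowed mid-string but "../" can
# never form because "/" and "\" are excluded entirely.
_SLUG_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")

def cache_file_for_slug(cache_dir: Path, slug: str) -> Path:
    """Return cache_dir/<slug>.md, rejecting slugs that could escape cache_dir."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"Invalid page slug: {slug!r}")
    cache_root = cache_dir.resolve()
    cache_file = (cache_root / f"{slug}.md").resolve()
    cache_file.relative_to(cache_root)  # raises ValueError if outside cache_root
    return cache_file

print(cache_file_for_slug(Path("docs-cache"), "getting-started").name)
# → getting-started.md
```

The `relative_to` call is belt-and-braces: even if the regex were later loosened, a resolved path outside the cache root still raises before any write happens.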

Comment on lines +75 to +108
@app.post("/api/generate", status_code=202)
async def generate(request: GenerateRequest) -> dict[str, str]:
settings = get_settings()
ai_provider = request.ai_provider or settings.ai_provider
ai_model = request.ai_model or settings.ai_model
project_name = request.project_name

if project_name in _generating:
raise HTTPException(
status_code=409,
detail=f"Project '{project_name}' is already being generated",
)

_generating.add(project_name)

await save_project(
name=project_name,
repo_url=request.repo_url or request.repo_path or "",
status="generating",
)

asyncio.create_task(
_run_generation(
repo_url=request.repo_url,
repo_path=request.repo_path,
project_name=project_name,
ai_provider=ai_provider,
ai_model=ai_model,
ai_cli_timeout=request.ai_cli_timeout or settings.ai_cli_timeout,
force=request.force,
)
)

return {"project": project_name, "status": "generating"}

⚠️ Potential issue | 🟠 Major

Harden generation admission: validate project names and unwind _generating on enqueue failure.

Right now enqueue can accept a project name that later fails route validation (e.g., local directory names with spaces), and _generating can get stuck if save_project/create_task fails after insertion.

✅ Proposed fix
 @app.post("/api/generate", status_code=202)
 async def generate(request: GenerateRequest) -> dict[str, str]:
     settings = get_settings()
     ai_provider = request.ai_provider or settings.ai_provider
     ai_model = request.ai_model or settings.ai_model
-    project_name = request.project_name
+    project_name = _validate_project_name(request.project_name)

     if project_name in _generating:
         raise HTTPException(
             status_code=409,
             detail=f"Project '{project_name}' is already being generated",
         )

     _generating.add(project_name)
-
-    await save_project(
-        name=project_name,
-        repo_url=request.repo_url or request.repo_path or "",
-        status="generating",
-    )
-
-    asyncio.create_task(
-        _run_generation(
-            repo_url=request.repo_url,
-            repo_path=request.repo_path,
-            project_name=project_name,
-            ai_provider=ai_provider,
-            ai_model=ai_model,
-            ai_cli_timeout=request.ai_cli_timeout or settings.ai_cli_timeout,
-            force=request.force,
-        )
-    )
+    try:
+        await save_project(
+            name=project_name,
+            repo_url=request.repo_url or request.repo_path or "",
+            status="generating",
+        )
+        asyncio.create_task(
+            _run_generation(
+                repo_url=request.repo_url,
+                repo_path=request.repo_path,
+                project_name=project_name,
+                ai_provider=ai_provider,
+                ai_model=ai_model,
+                ai_cli_timeout=request.ai_cli_timeout or settings.ai_cli_timeout,
+                force=request.force,
+            )
+        )
+    except Exception:
+        _generating.discard(project_name)
+        raise

     return {"project": project_name, "status": "generating"}
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 96-106: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/main.py` around lines 75 - 108, The generate route is adding
project_name into the _generating set before validating it and not removing it
if downstream save_project or task creation fails; update the generate function
to first validate/normalize project_name (e.g., disallow spaces/special chars or
run existing route-name validation) and reject with HTTPException if invalid,
then add to _generating only after validation, and wrap the save_project call
and asyncio.create_task invocation in try/except/finally so that on any
exception you remove project_name from _generating and re-raise or return a
500/appropriate HTTPException; reference the generate function, the _generating
set, save_project, and _run_generation/asyncio.create_task when implementing
these changes.

Comment on lines +1075 to +1117
.page-nav {
display: flex;
justify-content: space-between;
gap: 1rem;
margin-top: 3rem;
padding-top: 2rem;
border-top: 1px solid var(--border-color);
}

.page-nav-link {
display: flex;
flex-direction: column;
padding: 1rem 1.25rem;
border: 1px solid var(--border-color);
border-radius: 8px;
text-decoration: none;
transition: all 0.15s ease;
max-width: 50%;
}

.page-nav-link:hover {
border-color: var(--accent-color);
box-shadow: 0 2px 8px rgba(79, 70, 229, 0.1);
}

.page-nav-next {
text-align: right;
margin-left: auto;
}

.page-nav-label {
font-size: 0.75rem;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--text-secondary);
margin-bottom: 0.25rem;
}

.page-nav-title {
font-size: 0.95rem;
font-weight: 600;
color: var(--accent-color);
}

⚠️ Potential issue | 🟠 Major

Additional undefined CSS variable references in page navigation styles.

The .page-nav and related classes also reference undefined --border-color and --accent-color variables.

🐛 Proposed fix for page navigation styles
 .page-nav {
     display: flex;
     justify-content: space-between;
     gap: 1rem;
     margin-top: 3rem;
     padding-top: 2rem;
-    border-top: 1px solid var(--border-color);
+    border-top: 1px solid var(--border-primary);
 }

 .page-nav-link {
     display: flex;
     flex-direction: column;
     padding: 1rem 1.25rem;
-    border: 1px solid var(--border-color);
+    border: 1px solid var(--border-primary);
     border-radius: 8px;
     text-decoration: none;
     transition: all 0.15s ease;
     max-width: 50%;
 }

 .page-nav-link:hover {
-    border-color: var(--accent-color);
+    border-color: var(--accent);
     box-shadow: 0 2px 8px rgba(79, 70, 229, 0.1);
 }
 .page-nav-title {
     font-size: 0.95rem;
     font-weight: 600;
-    color: var(--accent-color);
+    color: var(--accent);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/style.css` around lines 1075 - 1117, The CSS uses undefined
variables (--border-color, --accent-color, and --text-secondary) in selectors
like .page-nav, .page-nav-link, .page-nav-link:hover, .page-nav-label, and
.page-nav-title; fix by adding default fallbacks or defining those variables at
the root (e.g., :root) so the styles render predictably — update the stylesheet
to either declare --border-color, --accent-color, and --text-secondary with
appropriate values or change usages to var(--border-color, <fallback>),
var(--accent-color, <fallback>), and var(--text-secondary, <fallback>) in the
.page-nav and related rules (page-nav, page-nav-link, page-nav-link:hover,
page-nav-label, page-nav-title).
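Either remedy from the prompt is a one-liner; a sketch of both (the color values are placeholders, not the project's actual palette):

```css
/* Option A: declare the missing variables once at the root. */
:root {
  --border-color: #d0d7de;  /* placeholder value */
  --accent-color: #4f46e5;  /* placeholder value */
}

/* Option B: keep each usage but supply an inline fallback. */
.page-nav {
  border-top: 1px solid var(--border-color, #d0d7de);
}
```

Option A keeps the stylesheet DRY if the variables are used elsewhere; Option B is safer when only a few rules reference them.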

Comment on lines +3 to +15
var stored = localStorage.getItem('theme');
if (stored) {
document.documentElement.setAttribute('data-theme', stored);
} else if (window.matchMedia && window.matchMedia('(prefers-color-scheme: dark)').matches) {
document.documentElement.setAttribute('data-theme', 'dark');
}
if (toggle) {
toggle.addEventListener('click', function() {
var current = document.documentElement.getAttribute('data-theme');
var next = current === 'dark' ? 'light' : 'dark';
document.documentElement.setAttribute('data-theme', next);
localStorage.setItem('theme', next);
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the file mentioned in the review
fd --type f "theme.js" | grep -E "docsfy|static"

Repository: myk-org/docsfy

Length of output: 228


🏁 Script executed:

# If the above doesn't find it, let's search more broadly
find . -type f -name "theme.js" 2>/dev/null | head -20

Repository: myk-org/docsfy

Length of output: 86


🏁 Script executed:

# Also check the repository structure to understand the context
git ls-files | grep -E "(theme|static)" | head -20

Repository: myk-org/docsfy

Length of output: 287


🏁 Script executed:

# Read the theme.js file to verify the current code state
cat -n ./src/docsfy/static/theme.js

Repository: myk-org/docsfy

Length of output: 858


🌐 Web query:

localStorage access restrictions browser security contexts where it throws exceptions

💡 Result:

When localStorage access is restricted and may throw

Per the HTML spec, window.localStorage throws a SecurityError DOMException when either (a) the document’s origin is opaque, or (b) the access violates a user-agent policy decision (for example, storage is disabled / persistence not allowed). [1]

1) Opaque origin → SecurityError

Common ways to end up with an opaque origin (often exposed as location.origin === "null") include:

  • Sandboxed <iframe> without origin retention (i.e., sandboxing that results in a unique/opaque origin) — then storage access is blocked by virtue of being opaque. [1]
  • Navigations/URLs that create opaque origins (commonly discussed as data: and some sandboxed/initial about:* cases), which again triggers the spec’s “opaque origin ⇒ throw” rule. [1]

2) Browser privacy / policy blocks (especially in embedded/3rd-party contexts) → SecurityError

  • Firefox “Storage Access Policy”: if a resource is classified as a tracker and loaded in a third-party context, Window.localStorage read/write attempts throw SecurityError. [2]
  • More broadly, browsers that block “third-party cookies and other site storage” may restrict access in third-party iframes; the Storage Access API exists specifically so embedded cross-site content can request access that would otherwise be denied. [3]

3) Quota exceeded on write → QuotaExceededError

  • localStorage.setItem(...) can throw QuotaExceededError when the origin hits its storage limit. [4][5]

What to do defensively

  • Treat localStorage as “may be unavailable” and wrap both the getter and writes in try/catch (because failure can happen at access-time, not just at setItem). This follows directly from the spec-defined throws (SecurityError) [1] and the quota behavior (QuotaExceededError). [4][5]

Sources: [1] WHATWG HTML Standard (Web Storage) • [2] MDN Storage Access Policy (Firefox) • [3] MDN Storage Access API • [4] MDN Storage.setItem() • [5] MDN Storage quotas & eviction criteria


Guard localStorage access to prevent runtime crashes.

localStorage.getItem() and localStorage.setItem() throw SecurityError in restricted contexts (opaque origins, Firefox tracker blocking, private browsing mode, disabled storage) and QuotaExceededError when storage quota is exceeded. These exceptions will break theme initialization and toggling. Wrap both calls in try-catch blocks:

Suggested fix
-  var stored = localStorage.getItem('theme');
+  var stored = null;
+  try {
+    stored = window.localStorage.getItem('theme');
+  } catch (_) {}
@@
-      localStorage.setItem('theme', next);
+      try {
+        window.localStorage.setItem('theme', next);
+      } catch (_) {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/static/theme.js` around lines 3 - 15, Wrap all accesses to
localStorage in try-catch to avoid SecurityError/QuotaExceededError: when
reading the initial theme, guard the call to localStorage.getItem('theme') (the
variable stored) with try-catch and treat failures as "no stored theme" so the
existing prefers-color-scheme fallback runs; likewise, inside the toggle click
handler, wrap localStorage.setItem('theme', next) in try-catch so toggling still
updates data-theme even if storage write fails. Update the code around the
stored variable and the toggle.addEventListener callback to catch and ignore
storage exceptions (optionally log) without breaking theme application.

<article class="article">
<h1 class="article-title">{{ title }}</h1>
<div class="article-body">
{{ content | safe }}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Step 1: Find all uses of |safe filter in templates
echo "=== Finding |safe usage in templates ==="
rg -n "safe" src/docsfy/templates --type html -B2 -A2

echo ""
echo "=== Searching for sanitization-related code ==="
rg -n "bleach|sanitize|clean|html|escape" src/docsfy -i --type py -A3 -B1

echo ""
echo "=== Looking for renderer.py ==="
find . -name "renderer.py" -type f 2>/dev/null

Repository: myk-org/docsfy

Length of output: 5542


🏁 Script executed:

cat -n src/docsfy/renderer.py | head -100

Repository: myk-org/docsfy

Length of output: 3672


🏁 Script executed:

# Find where render_page is called
rg -n "render_page|render_site" src/docsfy -B3 -A3

# Check for input validation on markdown content
rg -n "markdown_content|md_content" src/docsfy -B2 -A2 | head -60

Repository: myk-org/docsfy

Length of output: 3562


🏁 Script executed:

# Find generate_all_pages definition and usage
rg -n "def generate_all_pages|generate_all_pages" src/docsfy -B2 -A5

# Check generator.py to see where markdown comes from
head -150 src/docsfy/generator.py | tail -100

Repository: myk-org/docsfy

Length of output: 4345


🏁 Script executed:

# Let me check if there's any markdown configuration that might strip HTML
rg -n "markdown|Markdown" src/docsfy/generator.py -B3 -A3

# Also check what markdown library version and extensions are used
grep -r "markdown" setup.py pyproject.toml requirements.txt 2>/dev/null || echo "No dependency files found in expected locations"
find . -name "*.txt" -o -name "pyproject.toml" -o -name "setup.py" | head -10

Repository: myk-org/docsfy

Length of output: 153


🏁 Script executed:

cat pyproject.toml | grep -A 50 "\[project\]" | grep -E "dependencies|markdown|bleach|sanitize"

Repository: myk-org/docsfy

Length of output: 122


Sanitize AI-generated HTML before rendering as raw content.

content and toc are rendered with |safe filter, bypassing Jinja2's autoescaping. Since both originate from AI-generated markdown (via generate_page), malicious or jailbroken AI output can inject stored XSS into served documentation affecting all users. The Python markdown library does not sanitize HTML—it preserves raw tags passed through markdown.

Add sanitization using a library like bleach before rendering:

from bleach import clean

def _md_to_html(md_text: str) -> tuple[str, str]:
    """Convert markdown to HTML. Returns (content_html, toc_html)."""
    md = markdown.Markdown(...)
    content_html = md.convert(md_text)
    # Sanitize the generated HTML
    content_html = clean(content_html, tags=allowed_tags, strip=True)
    toc_html = clean(getattr(md, "toc", ""), tags=allowed_tags, strip=True)
    return content_html, toc_html

Then remove |safe from the template, or rename context variables to content_sanitized and toc_sanitized to clarify the contract.

Also applies to: 102-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/docsfy/templates/page.html` at line 77, The template currently renders
AI-generated HTML with the Jinja2 |safe filter for variables content and toc,
which bypasses autoescaping and permits XSS; to fix, sanitize the
markdown-generated HTML before passing it to the template by updating the
markdown-to-HTML pipeline (e.g., the function _md_to_html or whichever converter
is used in generate_page) to call a sanitizer like bleach.clean on both the
converted content_html and toc_html (specifying allowed_tags/attributes and
strip=True), then either remove |safe from the template or rename the sanitized
values to content_sanitized/toc_sanitized and use those in page.html to ensure
only cleaned HTML is rendered.
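bleach (as the review suggests) is the right tool for production; purely to illustrate the tag-allowlist idea with only the standard library, here is a minimal sketch (the class name and tag set are our own, and this is not a substitute for a vetted sanitizer):

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlist — a real one would cover tables, images, etc.
ALLOWED_TAGS = {"p", "a", "code", "pre", "em", "strong", "ul", "ol", "li",
                "h1", "h2", "h3", "blockquote"}

class AllowlistSanitizer(HTMLParser):
    """Keep only allowlisted tags (attributes dropped); escape all text."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes intentionally dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

print(sanitize('<p onclick="x()">hi</p><script>alert(1)</script>'))
# → <p>hi</p>alert(1)
```

Note that, like bleach with `strip=True`, this keeps the (escaped) text content of disallowed tags; dropping `<script>` bodies entirely would need extra state tracking.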

Comment on lines +12 to +13
with patch.dict(os.environ, {}, clear=True):
settings = Settings()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the relevant files
fd -t f "config.py|test_config.py"

Repository: myk-org/docsfy

Length of output: 99


🏁 Script executed:

# Check the structure
git ls-files | grep -E "(config|test)" | head -20

Repository: myk-org/docsfy

Length of output: 373


🏁 Script executed:

# Examine the Settings class definition
cat -n src/docsfy/config.py

Repository: myk-org/docsfy

Length of output: 800


🏁 Script executed:

# Look at the test file
cat -n tests/test_config.py

Repository: myk-org/docsfy

Length of output: 1533


🏁 Script executed:

# Check if .env file exists
ls -la .env 2>&1 || echo "No .env file found"
fd -t f ".env" --max-results 1 2>&1 || true

Repository: myk-org/docsfy

Length of output: 362


🏁 Script executed:

# Check pydantic-settings version and documentation on env_file parameter
rg -n "pydantic" pyproject.toml setup.py requirements.txt 2>/dev/null | head -10

Repository: myk-org/docsfy

Length of output: 100


🏁 Script executed:

# Search for any env_file configuration overrides or _env_file usage
rg -n "_env_file|env_file" src/ tests/

Repository: myk-org/docsfy

Length of output: 165


🌐 Web query:

pydantic-settings BaseSettings disable env_file loading at instantiation

💡 Result:

To disable .env / env_file loading for a specific instantiation (even if your BaseSettings class has model_config = SettingsConfigDict(env_file=...)), pass _env_file=None:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

s = Settings(_env_file=None)  # disables dotenv file loading for this instance

Pydantic explicitly documents that _env_file passed at instantiation overrides model_config['env_file'], and that passing None means “do not load any env file”. [1][2]


Tests are non-deterministic because .env files bypass patch.dict.

patch.dict(os.environ, ..., clear=True) only clears process env vars. Settings() still loads .env if present, so local files can alter these tests unexpectedly.

Apply to lines 12-13, 30-31, and 41-43:

Fix
-        settings = Settings()
+        settings = Settings(_env_file=None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_config.py` around lines 12 - 13, The tests are nondeterministic
because Settings() still reads .env files even when patch.dict(os.environ, {},
clear=True) is used; update each test that currently calls Settings() (the
instances created alongside patch.dict(os.environ, ..., clear=True)) to
instantiate Settings with _env_file=None (e.g., Settings(_env_file=None)) so the
Settings class won’t load any .env file and the environment is fully controlled
by the patched os.environ; keep the patch.dict(...) usage but replace bare
Settings() calls with Settings(_env_file=None).

Comment on lines +1 to +3
skipsdist = true

envlist = ["unused-code", "unittests"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

tox 4 tox.toml configuration canonical keys skipsdist envlist no_package env_list

💡 Result:

In tox 4 using TOML (tox.toml or pyproject.toml’s tool.tox), the canonical (preferred) keys are the snake_case ones:

  • env_list (canonical) — legacy alias: envlist. Defines the default environments to run. [2]
  • no_package (canonical) — legacy alias: skipsdist. When true, tox skips the packaging/build step (useful for apps vs libraries). [2]
  • Related but different: skip_install (per-env) skips installing your project into that env (it does not control whether tox builds packages globally). [2]

If you want a quick “what’s canonical vs legacy alias” list (including envlist → env_list, skipsdist → no_package, etc.), tox-toml-fmt documents the alias normalization it applies for tox 4 TOML configs. [1]


Use canonical tox TOML keys for consistency with tox 4 standards.

At Line 1 and Line 3, replace legacy key aliases with canonical names. In tox 4 tox.toml, use no_package instead of skipsdist and env_list instead of envlist.

Proposed change
-skipsdist = true
+no_package = true

-envlist = ["unused-code", "unittests"]
+env_list = ["unused-code", "unittests"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tox.toml` around lines 1 - 3, Replace the legacy tox keys: change the
"skipsdist" setting to the canonical "no_package" and rename "envlist" to
"env_list" so the tox.toml uses tox 4 standard keys (update the entries for
"skipsdist" and "envlist" accordingly).
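With the canonical keys applied, the start of tox.toml reads:

```toml
no_package = true

env_list = ["unused-code", "unittests"]
```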
