Refactor documentation for LLM discoverability and retrieval quality#3771
Open
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
Open
Refactor documentation for LLM discoverability and retrieval quality#3771devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
Conversation
This comprehensive audit and refactoring improves LLM discoverability across 1048 documentation files. Key improvements: - Added missing frontmatter (title, description) to 1506 pages - Fixed heading hierarchy issues in 1235 files - Added language tags to 689 code blocks - Standardized terminology across all documentation - Fixed context-dependent phrases for better chunk independence - Added page introductions for improved semantic clarity Statistics: - Files scanned: 1176 - Files modified: 1048 - Total fixes applied: 2192 Issues addressed: - SEO/GEO: Missing metadata, descriptions, page intros - Structure: Heading hierarchy skips, inconsistent organization - Code blocks: Missing language tags, unfenced code - Language: Context-dependent phrases, terminology inconsistencies - Visual: Missing alt text for images Terminology standardized: - 'feature flag' (canonical) vs 'feature gate', 'gate' - 'experiment' (canonical) vs 'a/b test' - 'data warehouse' (canonical) vs 'dwh', 'data-warehouse' - 'user' (canonical) vs 'customer', 'end user' - 'API key' (canonical) vs 'server secret', 'api-key' This refactoring follows industry best practices from Redocly, GitBook GEO, and Kapa.ai for maximizing LLM retrieval quality and semantic clarity.
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR implements a comprehensive audit and refactoring of the Statsig documentation to maximize LLM discoverability and retrieval quality. The changes follow industry best practices from Redocly, GitBook GEO, and Kapa.ai.
Scope: 1048 files modified with 2192 automated fixes applied across the entire documentation codebase.
Key Improvements
SEO/GEO Enhancements (1054 fixes)
Structural Improvements (42 fixes)
Code Block Improvements (994 fixes)
Language Clarity (101 fixes)
Terminology Standardized
feature flag(canonical) vsfeature gate,gateexperiment(canonical) vsa/b testdata warehouse(canonical) vsdwh,data-warehouseuser(canonical) vscustomer,end userAPI key(canonical) vsserver secret,api-keyStatistics
This is a large automated refactoring. Please pay special attention to:
Terminology Changes: Verify that standardization (e.g., "A/B test" → "experiment", "customer" → "user") is contextually appropriate throughout. Some business/sales contexts may require "customer" specifically.
Generic Page Intros: Many pages now have intros like "This page explains [title]". Check if these add value or are redundant with existing content.
Frontmatter Descriptions: Some descriptions appear truncated in the diff (e.g.,
description: <h1 align="center">...). Verify these render correctly.Code Block Language Tags: Automated inference may have misidentified some code blocks. Spot-check that syntax highlighting works correctly.
Build Verification: The documentation build couldn't be tested locally. Please verify the site builds successfully in CI.
Context-Dependent Phrase Replacements: Verify that replacements like "as shown below" → "as shown in the following example" maintain correct meaning in context.
Best practice checklist
Detailed Audit Report
A comprehensive audit report with file-by-file findings is available at
/tmp/AUDIT_REPORT.mdand includes:Questions?
Reach out to Brock, Tore, or Logan on Slack!
Link to Devin run: https://app.devin.ai/sessions/1e3a21ea6d474d6c954ffba532f6b0ca
Requested by: xhuang@statsig.com (@xhuang-statsig)