PR: #9
Branch: fix/open-issues-cache-pypi-docs
Purpose: validate the PR through actual MCP/tool-level behavior, not only unit tests.
Confirm that:
- Existing stdlib docs tools still work.
- `get_docs` returns correct content.
- Persistent cache is written and reused across server restarts.
- Cache identity is correct: version, slug, anchor, `max_chars`, `start_index`, and index fingerprint matter.
- Cache failure is best-effort and does not break retrieval.
- `lookup_package_docs` returns controlled PyPI-declared docs/homepage/source/repository links.
- PyPI error modes return controlled notes rather than internal errors.
- MCP annotations and tool count are coherent.
cd /srv/openclaw/.openclaw/workspace/tmp/python-docs-mcp-review
git checkout fix/open-issues-cache-pypi-docs || git checkout review-pr-9
git pull --ff-only origin fix/open-issues-cache-pypi-docs
uv sync --all-extras
Baseline gates:
uv run ruff check src/ tests/
uv run pyright src/
uv run pytest --tb=short -q
uv build
Expected:
- Ruff passes.
- Pyright passes for `src/`.
- Pytest passes: currently expected `254 passed, 3 skipped`.
- Build succeeds.
Run a small introspection script or use the MCP client harness if available.
Expected tools:
- `search_docs`
- `get_docs`
- `lookup_package_docs`
- `list_versions`
- `detect_python_version`
Expected annotations:
- stdlib tools: `readOnlyHint=True`, `openWorldHint=False`
- `lookup_package_docs`: `readOnlyHint=True`, `openWorldHint=True`
Pass criteria:
- Exactly five tools are exposed.
- `lookup_package_docs` is visibly open-world because it calls PyPI.
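The registration check above can be sketched as a small pure function. This is only an illustration: `check_tools` and the plain-dict shape of `tools` are hypothetical, and it assumes that every tool other than `lookup_package_docs` counts as a stdlib tool. How the tool list is obtained from the MCP client is left out.

```python
# Expected registration, taken from the checklist above.
# Assumption: all four non-PyPI tools are "stdlib tools".
CLOSED_WORLD = {"readOnlyHint": True, "openWorldHint": False}
EXPECTED_TOOLS = {
    "search_docs": CLOSED_WORLD,
    "get_docs": CLOSED_WORLD,
    "list_versions": CLOSED_WORLD,
    "detect_python_version": CLOSED_WORLD,
    "lookup_package_docs": {"readOnlyHint": True, "openWorldHint": True},
}


def check_tools(tools):
    """Return a list of problems; an empty list means the check passed.

    `tools` is assumed to be a list of dicts shaped like
    {"name": str, "annotations": {"readOnlyHint": bool, "openWorldHint": bool}}.
    """
    problems = []
    names = sorted(tool["name"] for tool in tools)
    if names != sorted(EXPECTED_TOOLS):
        problems.append(f"unexpected tool set: {names}")
    for tool in tools:
        expected = EXPECTED_TOOLS.get(tool["name"])
        if expected is None:
            continue  # already reported by the tool-set comparison
        annotations = tool.get("annotations") or {}
        for hint, want in expected.items():
            got = annotations.get(hint)
            if got is not want:
                problems.append(f"{tool['name']}: {hint}={got!r}, expected {want!r}")
    return problems
```

An empty return value corresponds to both pass criteria: exactly five tools, with only `lookup_package_docs` marked open-world.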
Call:
get_docs(slug="library/json.html", version="3.12", max_chars=1000, start_index=0)
Expected:
- Result contains JSON documentation content.
- `slug == "library/json.html"`
- `version == "3.12"`
- `anchor` is null
- `char_count > 0`
First use search_docs to find a valid section anchor for json, then call:
get_docs(slug="library/json.html", version="3.12", anchor=<valid_anchor>, max_chars=1000, start_index=0)
Expected:
- Result is section-scoped.
- `anchor == <valid_anchor>`
- Content is not the full page.
Call:
get_docs(slug="library/json.html", version="3.12", anchor="", max_chars=1000, start_index=0)
Expected:
- Controlled tool error / page-not-found style response.
- It must not return a cached full-page response.
This specifically verifies the anchor=None vs anchor="" cache fix.
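To illustrate the class of bug this test guards against, here is a minimal sketch of a cache key that keeps `anchor=None` and `anchor=""` apart, next to a lossy variant that merges them. Both functions and the `fingerprint` argument are hypothetical; the PR's actual key scheme is not shown in this plan.

```python
import json


def cache_key(version, slug, anchor, max_chars, start_index, fingerprint):
    """Build a deterministic key in which anchor=None and anchor="" differ."""
    # json.dumps renders None as null and "" as "", so the two never collide.
    return json.dumps([version, slug, anchor, max_chars, start_index, fingerprint])


def lossy_key(version, slug, anchor, max_chars, start_index, fingerprint):
    """Pitfall: `anchor or ""` maps None to "", merging two distinct requests."""
    return json.dumps([version, slug, anchor or "", max_chars, start_index, fingerprint])
```

With the lossy variant, a full-page result cached for `anchor=None` would be served for `anchor=""`, which is exactly what the expected behavior above rules out.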
Before running, locate cache path from platform cache dir. Expected filename:
retrieved-docs-cache.sqlite3
Likely under:
~/.cache/mcp-python-docs/retrieved-docs-cache.sqlite3
or the platform cache directory used by the app.
- Delete the cache file if present.
- Start the MCP server/client.
- Call
get_docsforlibrary/json.html. - Stop the server.
Expected:
- Cache file exists.
- SQLite table `retrieved_docs_cache` exists.
- At least one row is present.
Suggested inspection:
sqlite3 <cache-path>/retrieved-docs-cache.sqlite3 \
"SELECT version, slug, anchor, max_chars, start_index, length(result_json) FROM retrieved_docs_cache;"
- Start server again.
- Call the same `get_docs` request.
Expected:
- Same response content.
- No user-visible behavior change.
- If logs expose cache hits/misses, second call should be a hit.
Call:
get_docs(slug="library/json.html", version="3.12", max_chars=500, start_index=0)
get_docs(slug="library/json.html", version="3.12", max_chars=1000, start_index=0)
get_docs(slug="library/json.html", version="3.12", max_chars=500, start_index=100)
Expected:
- Separate cache rows for each identity.
- Results are not cross-contaminated.
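The row-level expectations can also be checked from Python instead of the `sqlite3` CLI. The sketch below relies only on the table and column names that appear in the suggested inspection query; `cache_identities` is a hypothetical helper, and the rest of the schema is not assumed.

```python
import sqlite3


def cache_identities(conn, slug):
    """Return (version, anchor, max_chars, start_index) for each cached row of `slug`."""
    cursor = conn.execute(
        "SELECT version, anchor, max_chars, start_index "
        "FROM retrieved_docs_cache WHERE slug = ? "
        "ORDER BY max_chars, start_index",
        (slug,),
    )
    return cursor.fetchall()
```

After the three calls above, opening the cache file with `sqlite3.connect(...)` and calling `cache_identities(conn, "library/json.html")` should yield three distinct tuples, one per identity.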
- Stop server.
- Replace cache file with invalid bytes:
printf 'not sqlite' > <cache-path>/retrieved-docs-cache.sqlite3
- Start server.
- Call `get_docs(slug="library/json.html", version="3.12")`.
Expected:
- Docs retrieval still succeeds.
- Warning is logged about disabled/skipped persistent cache.
- No internal server error.
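For reference, the best-effort behavior being tested can be sketched as follows. This is not the PR's code: `open_cache_best_effort` and the logger name are hypothetical. It shows why the corrupt file is a good probe: `sqlite3.connect` is lazy, so the invalid bytes only surface as a `sqlite3.DatabaseError` on the first query.

```python
import logging
import sqlite3

logger = logging.getLogger("cache_sketch")  # hypothetical logger name


def open_cache_best_effort(path):
    """Return a usable sqlite3 connection, or None if the file is unusable."""
    conn = None
    try:
        conn = sqlite3.connect(path)
        # connect() does not read the file; this query forces SQLite to
        # validate the header, which fails for a file that is not a database.
        conn.execute("SELECT name FROM sqlite_master LIMIT 1").fetchall()
        return conn
    except sqlite3.Error as exc:
        logger.warning("Persistent cache disabled: %s", type(exc).__name__)
        if conn is not None:
            conn.close()
        return None
```

A caller that receives `None` would skip the cache and fetch the docs directly, which matches the expectation that retrieval still succeeds.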
Call:
lookup_package_docs(package="requests")
Expected:
- `metadata_source == "https://pypi.org/pypi/requests/json"`
- `trust_boundary == "pypi-declared-metadata"`
- `package` is canonical from PyPI if available.
- `version` is non-empty.
- `sources` includes PyPI project URL and likely homepage/source/docs links.
- Every source URL is `http://` or `https://`.
- No web search / unofficial mirror fallback.
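The URL-scheme expectation is easy to check mechanically. The sketch below assumes the returned sources have already been reduced to a list of URL strings; the field that holds each URL is not specified in this plan, and `non_http_urls` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def non_http_urls(urls):
    """Return the URLs whose scheme is neither http nor https."""
    return [url for url in urls if urlsplit(url).scheme not in {"http", "https"}]
```

An empty result means every source URL passed the scheme check.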
Call:
lookup_package_docs(package="Sample_Project")
Expected:
- Metadata source normalizes to:
https://pypi.org/pypi/sample-project/json
- Returned package may be PyPI canonical name.
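The expected URL is consistent with PEP 503 name normalization, which lowercases the name and collapses runs of `-`, `_`, and `.` into a single `-`. The sketch below shows that rule; whether the service uses exactly this rule is an assumption, and both function names are hypothetical.

```python
import re


def normalize_name(name):
    """Normalize a project name following PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def metadata_url(name):
    """Build the PyPI JSON metadata URL for a normalized project name."""
    return f"https://pypi.org/pypi/{normalize_name(name)}/json"
```

For example, `metadata_url("Sample_Project")` gives `https://pypi.org/pypi/sample-project/json`.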
Call:
lookup_package_docs(package="definitely-not-a-real-package-vision-test-xyz")
Expected:
- `sources == []`
- `note` contains a package-not-found / PyPI 404 style message.
- No internal error.
These may require monkeypatching/fake fetcher or temporary network blocking if not practical via live MCP.
Simulate PyPI 429 or 503.
Expected:
- Controlled result:
sources=[]
note="PyPI returned HTTP 429."
or equivalent code.
Simulate:
- `URLError`
- timeout
- invalid JSON body
Expected:
- Controlled result note:
Unable to retrieve PyPI metadata: <ErrorType>.
- No internal server error.
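One way to exercise these failure modes without live network faults is a fake fetcher, as suggested above. The sketch below is a stand-in, not the service's code: `controlled_lookup`, `note_for`, and the zero-argument `fetch` callable are hypothetical, and it assumes `<ErrorType>` in the note is the exception class name.

```python
import json
import urllib.error


def note_for(exc):
    """Map an expected failure to a controlled note string."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"PyPI returned HTTP {exc.code}."
    return f"Unable to retrieve PyPI metadata: {type(exc).__name__}."


def controlled_lookup(fetch):
    """Call fetch() and turn expected failures into a controlled result.

    `fetch` is a hypothetical zero-argument callable returning the raw body.
    """
    try:
        metadata = json.loads(fetch())
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so invalid bodies land here too.
        return {"sources": [], "note": note_for(exc)}
    return {"metadata": metadata, "note": None}
```

A fake `fetch` that raises `urllib.error.HTTPError` with code 429, `urllib.error.URLError`, or `TimeoutError`, or that returns a non-JSON body, covers each case listed above.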
Simulate a response larger than 5 MiB.
Expected:
- The service reads at most `5 MiB + 1 byte`.
- Controlled result:
sources=[]
note="PyPI metadata exceeded size limit."
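Reading one byte past the limit is what lets the reader tell "exactly at the limit" from "over the limit" without buffering the whole body. A minimal sketch of that technique, with a hypothetical `read_capped` helper and a file-like `stream`:

```python
MAX_BYTES = 5 * 1024 * 1024  # 5 MiB, the limit named above


def read_capped(stream, limit=MAX_BYTES):
    """Read at most limit + 1 bytes; return None if the body exceeds the limit."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        # The extra byte proves the body is too large; the rest is never read.
        return None
    return data
```

For the simulation, any file-like object larger than the limit will do, for example an `io.BytesIO` holding more than 5 MiB of data.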
Use a package or fake response with broad project_urls, e.g. labels:
- Documentation
- Homepage
- Source
- Repository
- Issues
- Changelog
- Community mirror
- Tutorial
Expected:
Included:
- Documentation
- Homepage
- Source
- Repository
Excluded/skipped:
- Issues
- Changelog
- Community mirror
- Tutorial
Result note should mention ignored labels outside controlled allowlist.
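The allowlist behavior can be sketched as a filter over `project_urls`. The matching rule here (exact label match, case-insensitive) is an assumption, and `filter_project_urls` is a hypothetical helper; the service may match labels differently.

```python
# Labels from the "Included" list above, lowercased for comparison.
ALLOWED_LABELS = {"documentation", "homepage", "source", "repository"}


def filter_project_urls(project_urls):
    """Split project_urls into allowlisted entries and ignored labels."""
    kept = {}
    ignored = []
    for label, url in project_urls.items():
        if label.strip().lower() in ALLOWED_LABELS:
            kept[label] = url
        else:
            ignored.append(label)
    return kept, ignored
```

The `ignored` list is the kind of information the result note is expected to mention.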
If direct MCP client execution is awkward, use a minimal Python script that imports the services and mimics the tool layer:
from mcp_server_python_docs.services.package_docs import PackageDocsService

# Exercise the service directly with a known, a non-normalized, and a missing package.
for pkg in ["requests", "Sample_Project", "definitely-not-a-real-package-vision-test-xyz"]:
    print(PackageDocsService().lookup(pkg).model_dump())

For `get_docs`, prefer an actual MCP invocation or the server lifespan, because cache path wiring happens there.
Return results in this format:
## PR #9 MCP Test Results
### Environment
- Commit:
- OS:
- Python:
- Cache path:
### Gates
- ruff:
- pyright src:
- pytest:
- uv build:
### MCP Tool Tests
- Tool registration:
- get_docs full page:
- get_docs section:
- empty anchor behavior:
- cache file creation:
- cache survives restart:
- cache key separation:
- corrupt cache fallback:
- lookup_package_docs requests:
- missing package:
- failure simulation:
- scope/trust boundary:
### Verdict
PASS / FAIL
### Notes / Bugs Found
- ...
The PR is considered MCP-smoke-test ready if:
- Local gates pass.
- Live MCP/tool invocation returns correct stdlib docs.
- Cache file is created and reused after restart.
- Corrupt cache does not break docs retrieval.
- PyPI lookup returns only controlled PyPI-declared metadata.
- PyPI expected failures return controlled notes.
- No internal errors are observed for expected failure modes.