Guidance for AI agents (Cursor, Claude Code, Copilot, etc.) working in this repository.
- ALL tests must pass (unit, doctest, lint, type checks) before declaring work complete.
- Do not describe code as "working" if any test fails.
- Fix regressions rather than disabling or skipping tests unless explicitly approved.
gp-libs is the shared tooling stack used across the git-pull ecosystem. This repository, cihai-cli, is a command-line interface built on top of the cihai library to explore the Unihan (CJK) character database. Key abilities:
- Look up CJK characters with `cihai info <char>` and YAML-formatted output.
- Reverse search definitions with `cihai reverse <term>`.
- Bootstraps and queries the Unihan dataset via `cihai`/`unihan-etl`.
- Provides a small, typed argparse-based CLI (`src/cihai_cli/cli.py`) exposed as the `cihai` entry point.
This project uses:
- Python 3.10+
- uv for dependency and task execution
- ruff for linting/formatting
- mypy with strict settings
- pytest (+ doctests) for testing
- gp-libs for shared docs/testing helpers
- Sphinx (Furo) for documentation
# Install dependencies (editable)
uv pip install --editable .
uv pip sync
# Install with dev extras
uv pip install --editable . -G dev
just test # or: uv run pytest
uv run pytest tests/test_cli.py # single file
uv run pytest tests/test_cli.py::test_info_command # single test
just start # run tests then watch with pytest-watcher
uv run ptw . # standalone watcher (doctests enabled by default)
just ruff # uv run ruff check .
just ruff-format # uv run ruff format .
uv run ruff check . --fix --show-fixes
just mypy # strict type checking
just build-docs # build Sphinx HTML in docs/_build
just start-docs # autobuild + livereload
just design-docs # update CSS/JS assets
1. `uv run ruff format .`
2. `uv run pytest`
3. `uv run ruff check . --fix --show-fixes`
4. `uv run mypy`
5. `uv run pytest` (verify clean)
- `src/cihai_cli/cli.py`: argparse entrypoint, implements `info` and `reverse` commands, logging setup.
- `src/cihai_cli/__about__.py`: package metadata (`__version__`).
- Tests: `tests/` (unit) plus doctests in `src/` and `docs/`.
- Docs: `docs/` Sphinx project (Furo theme).
- Pytest with doctests enabled (`addopts` in `pyproject.toml`).
- Prefer real `cihai`/`unihan_etl` integration over heavy mocking; reuse fixtures where present.
- Watch mode: `uv run ptw .` (used in `just start`).
- Coverage via `pytest-cov`; configuration in `pyproject.toml`.
- Prefer fixtures over mocks (`server`, `session`, etc. when available); use `tmp_path` over `tempfile`, `monkeypatch` over `unittest.mock`.
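A minimal sketch of the `tmp_path` and `monkeypatch` preference above. The helper `data_dir` and the `EXAMPLE_DATA_DIR` variable are hypothetical names used only for illustration; they are not part of cihai-cli.

```python
from __future__ import annotations

import os
import pathlib
import typing as t

if t.TYPE_CHECKING:
    import pytest


def data_dir() -> pathlib.Path:
    """Hypothetical helper: resolve a data directory from the environment."""
    raw = os.environ.get("EXAMPLE_DATA_DIR", "~/.example")
    return pathlib.Path(raw).expanduser()


def test_data_dir_honors_env(
    tmp_path: pathlib.Path,
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    # monkeypatch restores the environment after the test;
    # tmp_path gives an isolated directory without touching tempfile.
    monkeypatch.setenv("EXAMPLE_DATA_DIR", str(tmp_path))
    assert data_dir() == tmp_path
```

Both fixtures clean up after themselves, so the test needs no teardown code and no `unittest.mock` patching.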
- `from __future__ import annotations` required; enforced by ruff.
- Namespace imports for stdlib/typing (`import typing as t`); third-party packages may use `from X import Y`.
- Docstrings follow NumPy style (see `tool.ruff.lint.pydocstyle`).
- Python target version: 3.10 (`tool.ruff.target-version`).
- Keep CLI output human-friendly YAML; avoid breaking existing flags/args.
- Doctests: keep concise, narrative Examples blocks; move complex flows to `tests/examples/`.
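The conventions above could look like the following in practice. This is an illustrative sketch; `format_codepoints` is a hypothetical function and does not exist in cihai-cli.

```python
from __future__ import annotations

import typing as t


def format_codepoints(chars: t.Iterable[str]) -> list[str]:
    """Return ``U+XXXX`` notation for each character.

    Parameters
    ----------
    chars : Iterable[str]
        Single Unicode characters, for example a string.

    Returns
    -------
    list[str]
        One code point label per input character.

    Examples
    --------
    >>> format_codepoints("好人")
    ['U+597D', 'U+4EBA']
    """
    # ord() yields the integer code point; 04X pads to four hex digits.
    return [f"U+{ord(char):04X}" for char in chars]
```

The Examples block stays short and narrative, so it doubles as a doctest without needing fixtures.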
These rules guide future logging changes; existing code may not yet conform.
- Use `logging.getLogger(__name__)` in every module
- Add `NullHandler` in library `__init__.py` files
- Never configure handlers, levels, or formatters in library code; that's the application's job
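A minimal sketch of the three rules above, collapsed into one file for brevity. In a real package the `NullHandler` line would live in the package `__init__.py`; `count_records` is a hypothetical helper.

```python
from __future__ import annotations

import logging

# Module-level logger named after the module's import path.
logger = logging.getLogger(__name__)

# Normally placed in the package ``__init__.py``: keeps the library silent
# until the application configures logging. No levels or formatters here.
logger.addHandler(logging.NullHandler())


def count_records(lines: list[str]) -> int:
    """Hypothetical helper: count non-empty lines and log the result."""
    count = sum(1 for line in lines if line.strip())
    logger.debug("records counted", extra={"unihan_record_count": count})
    return count
```

The application, not this module, decides whether the DEBUG record is ever shown.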
Pass structured data on every log call where useful for filtering, searching, or test assertions.
Core keys (stable, scalar, safe at any log level):
| Key | Type | Context |
|---|---|---|
| `unihan_field` | `str` | UNIHAN field name |
| `unihan_source_file` | `str` | source data file path |
| `unihan_record_count` | `int` | records processed |
| `cihai_command` | `str` | CLI command name |
Heavy/optional keys (DEBUG only, potentially large):
| Key | Type | Context |
|---|---|---|
| `unihan_stdout` | `list[str]` | subprocess stdout lines (truncate or cap; `%(unihan_stdout)s` produces repr) |
| `unihan_stderr` | `list[str]` | subprocess stderr lines (same caveats) |
Treat established keys as compatibility-sensitive — downstream users may build dashboards and alerts on them. Change deliberately.
- `snake_case`, not dotted; `unihan_` prefix
- Prefer stable scalars; avoid ad-hoc objects
- Heavy keys (`unihan_stdout`, `unihan_stderr`) are DEBUG-only; consider companion `unihan_stdout_len` fields or hard truncation (e.g. `stdout[:100]`)
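The key conventions above can be sketched as follows. `log_export` and `MAX_STDOUT_LINES` are hypothetical names, and the cap of 100 simply mirrors the `stdout[:100]` example.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

MAX_STDOUT_LINES = 100  # assumed cap, mirroring the stdout[:100] example


def log_export(source_file: str, record_count: int, stdout: list[str]) -> None:
    """Hypothetical helper showing core and heavy keys on log calls."""
    # Core keys: stable scalars, safe at any level.
    logger.info(
        "export finished",
        extra={
            "unihan_source_file": source_file,
            "unihan_record_count": record_count,
        },
    )
    # Heavy keys: DEBUG only, truncated, with a companion length field.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(
            "subprocess output captured",
            extra={
                "unihan_stdout": stdout[:MAX_STDOUT_LINES],
                "unihan_stdout_len": len(stdout),
            },
        )
```

The companion `unihan_stdout_len` field keeps the true size visible even when the list itself is cut off.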
Use `logger.debug("msg %s", val)`, not f-strings. Two rationales:
- Deferred string interpolation: skipped entirely when level is filtered
- Aggregator message template grouping: `"Running %s"` is one signature grouped ×10,000; f-strings make each line unique
When computing `val` itself is expensive, guard with `if logger.isEnabledFor(logging.DEBUG)`.
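A short sketch of lazy formatting plus the guard. `summarize` stands in for any expensive computation and, like `process`, is a hypothetical name.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def summarize(records: list[dict[str, str]]) -> str:
    """Hypothetical expensive summary of the fields seen."""
    return ", ".join(sorted({r["field"] for r in records}))


def process(records: list[dict[str, str]]) -> int:
    """Log with %-style arguments and guard the costly one."""
    # Arguments are interpolated only if the record is actually emitted.
    logger.debug("processing %d records", len(records))
    # summarize() would run even when DEBUG is off, so guard the call.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("fields seen: %s", summarize(records))
    return len(records)
```

The message template stays constant across calls, which is what lets an aggregator group the lines.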
Increment `stacklevel` for each wrapper layer so `%(filename)s:%(lineno)d` and OTel `code.filepath` point to the real caller. Verify whenever call depth changes.
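As an illustration of caller attribution through one wrapper layer, the sketch below uses the standard `stacklevel` keyword of the `logging` methods. `log_event` and `download` are hypothetical functions.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def log_event(msg: str, *args: object) -> None:
    """Hypothetical one-layer wrapper around ``logger.info``."""
    # stacklevel=2 skips this wrapper frame, so the record's funcName
    # and lineno describe whoever called log_event().
    logger.info(msg, *args, stacklevel=2)


def download() -> None:
    """Hypothetical caller whose location should appear in the record."""
    log_event("download completed")
```

A second wrapper around `log_event` would need `stacklevel=3`, which is why the value should be rechecked whenever the call depth changes.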
For objects with stable identity (Dataset, Reader, Exporter), use LoggerAdapter to avoid repeating the same extra on every call. Lead with the portable pattern (override process() to merge); merge_extra=True simplifies this on Python 3.13+.
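A sketch of the portable pattern, assuming a hypothetical `ContextAdapter` class; the source file name is illustrative. It overrides `process()` so that adapter-level and per-call `extra` are merged instead of the per-call value being replaced.

```python
from __future__ import annotations

import logging
import typing as t

logger = logging.getLogger(__name__)


class ContextAdapter(logging.LoggerAdapter):
    """Hypothetical adapter merging its ``extra`` with per-call ``extra``."""

    def process(
        self,
        msg: t.Any,
        kwargs: t.MutableMapping[str, t.Any],
    ) -> tuple[t.Any, t.MutableMapping[str, t.Any]]:
        # Per-call keys win over adapter-level keys on conflict.
        merged = {**(self.extra or {}), **(kwargs.get("extra") or {})}
        kwargs["extra"] = merged
        return msg, kwargs


# Bind the stable context once, then log without repeating it.
reader_log = ContextAdapter(logger, {"unihan_source_file": "Unihan_Readings.txt"})
reader_log.info("parse finished", extra={"unihan_record_count": 10})
```

The default `process()` overwrites a per-call `extra` with the adapter's own, which is why the override is needed on Python versions before 3.13.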
| Level | Use for | Examples |
|---|---|---|
| `DEBUG` | Internal mechanics, data I/O | Field parsing, record transformation steps |
| `INFO` | Data lifecycle, user-visible operations | Download completed, export finished, database bootstrapped |
| `WARNING` | Recoverable issues, deprecation, user-actionable config | Missing optional field, deprecated data format |
| `ERROR` | Failures that stop an operation | Download failed, parse error, database write failed |
Config discovery noise belongs in DEBUG; only surprising/user-actionable config issues → WARNING.
- Lowercase, past tense for events: `"download completed"`, `"parse error"`
- No trailing punctuation
- Keep messages short; put details in `extra`, not the message string
- Use `logger.exception()` only inside `except` blocks when you are not re-raising
- Use `logger.error(..., exc_info=True)` when you need the traceback outside an `except` block
- Avoid `logger.exception()` followed by `raise`; this duplicates the traceback. Either add context via `extra` that would otherwise be lost, or let the exception propagate
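The two acceptable shapes above might look like this. Both functions are hypothetical, and the `kTotalStrokes` value is only an example of a field name.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def parse_count(raw: str) -> int | None:
    """Hypothetical parser that handles the error and does not re-raise."""
    try:
        return int(raw)
    except ValueError:
        # Handled here, so the traceback is recorded exactly once.
        logger.exception("parse error", extra={"unihan_field": "kTotalStrokes"})
        return None


def parse_count_strict(raw: str) -> int:
    """Hypothetical variant that lets the exception propagate unlogged."""
    # No catch-log-reraise: the caller decides whether and how to log.
    return int(raw)
```

Choosing one of the two per call site avoids the duplicated traceback that `logger.exception()` followed by `raise` would produce.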
Assert on caplog.records attributes, not string matching on caplog.text:
- Scope capture: `caplog.at_level(logging.DEBUG, logger="cihai_cli.cli")`
- Filter records rather than index by position: `[r for r in caplog.records if hasattr(r, "unihan_field")]`
- Assert on schema: `record.unihan_record_count == 100` not `"100 records" in caplog.text`
- `caplog.record_tuples` cannot access extra fields; always use `caplog.records`
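Put together, a test following these points might read as below. `bootstrap` is a hypothetical stand-in for the code under test, not an actual cihai-cli function; the logger name matches the one used in the scope-capture example.

```python
from __future__ import annotations

import logging
import typing as t

if t.TYPE_CHECKING:
    import pytest

logger = logging.getLogger("cihai_cli.cli")


def bootstrap(record_count: int) -> None:
    """Hypothetical stand-in for the code under test."""
    logger.info(
        "database bootstrapped",
        extra={"unihan_record_count": record_count},
    )


def test_bootstrap_logs_record_count(caplog: pytest.LogCaptureFixture) -> None:
    # Scope capture to the logger under test.
    with caplog.at_level(logging.DEBUG, logger="cihai_cli.cli"):
        bootstrap(100)

    # Filter by schema rather than indexing by position.
    records = [r for r in caplog.records if hasattr(r, "unihan_record_count")]
    assert len(records) == 1
    assert records[0].unihan_record_count == 100
```

Because the assertion reads a record attribute, rewording the message later does not break the test.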
Avoid:
- f-strings/`.format()` in log calls
- Unguarded logging in hot loops (guard with `isEnabledFor()`)
- Catch-log-reraise without adding new context
- `print()` for diagnostics
- Logging secret env var values (log key names only)
- Non-scalar ad-hoc objects in `extra`
- Requiring custom `extra` fields in format strings without safe defaults (missing keys raise `KeyError`)
When writing documentation (README, CHANGES, docs/), follow these rules for code blocks:
One command per code block. This makes commands individually copyable. For sequential commands, either use separate code blocks or chain them with && or ; and \ continuations (keeping it one logical command).
Put explanations outside the code block, not as comments inside.
Good:
Run the tests:
$ uv run pytest
Run with coverage:
$ uv run pytest --cov
Bad:
# Run the tests
$ uv run pytest
# Run with coverage
$ uv run pytest --cov
These rules apply to shell commands in documentation (README, CHANGES, docs/), not to Python doctests.
Use console language tag with $ prefix. This distinguishes interactive commands from scripts and enables prompt-aware copy in many terminals.
Good:
$ uv run pytest
Bad:
uv run pytest
Split long commands with \ for readability. Each flag or flag+value pair gets its own continuation line, indented. Positional parameters go on the final line.
Good:
$ pipx install \
    --suffix=@next \
    --pip-args '\--pre' \
    --force \
    'cihai-cli'
Bad:
$ pipx install --suffix=@next --pip-args '\--pre' --force 'cihai-cli'
- Lean on `pytest -k <pattern> -vv` for focused failures.
- For CLI behavior, run `uv run cihai info 好` or `uv run cihai reverse library`.
- If Unihan DB is missing, CLI bootstraps automatically; avoid altering that flow unless required.
- Stuck in loops? Pause, reduce the problem to a minimal repro, document exact errors, and restate the hypothesis before another attempt.
Commit subjects: Scope(type[detail]): concise description
Body template:
why: Reason or impact.
what:
- Key technical changes
- Single topic only
Guidelines:
- Subject ≤50 chars; body lines ≤72 chars; imperative mood.
- One topic per commit; separate subject and body with a blank line.
- Mark breaking changes with `BREAKING:` and include related issue refs when relevant.
Common commit types:
- feat: New features or enhancements
- fix: Bug fixes
- refactor: Code restructuring without functional change
- docs: Documentation updates
- chore: Maintenance (dependencies, tooling, config)
- test: Test-related updates
- style: Code style and formatting
- py(deps): Dependencies
- py(deps[dev]): Dev dependencies
- ai(rules[AGENTS]): AI rule updates
- ai(claude[rules]): Claude Code rules (CLAUDE.md)
- ai(claude[command]): Claude Code command changes
- For `notes/**/*.md`, keep content concise and well-structured (headings, bullets, code fences).
- Use clear link text `[Title](mdc:URL)` and avoid redundancy; follow llms.txt style when possible.
- Project docs: https://cihai-cli.git-pull.com
- Library docs: https://cihai.git-pull.com
- Unihan dataset: https://www.unicode.org/charts/unihan.html
- Shared tooling: https://github.com/gp-libs/gp-libs