A small set of local-first Python CLIs that demonstrate production-grade patterns for using LLMs in developer tools:
- Reliable model calls (timeouts, retries/backoff, clear error taxonomy)
- Structured outputs (strict JSON, schema validation, auto-repair when the model returns malformed JSON)
- Hallucination resistance (multi-stage generation → critique → deterministic formatting)
- Privacy-minded defaults (LiteLLM telemetry disabled; designed to work with a local OpenAI-compatible endpoint)
This repo is intentionally recruiter/hiring-manager friendly: the tools are runnable, small, and focused on engineering fundamentals rather than “prompt demos”.
`llm_client.py` is a thin wrapper over `litellm` that standardizes:
- retries/backoff, timeout handling
- a consistent error taxonomy: `LLMTimeoutError` / `LLMServiceError` / `LLMJSONError`
- text + messages APIs (including streaming)
- strict-JSON generation flows (with JSON correction attempts)
Why it matters: most LLM prototypes fail in the boring parts (timeouts, malformed JSON, inconsistent errors). This file is the reliability foundation the other CLIs build on.
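As a rough illustration of the retry/backoff and error-taxonomy idea, here is a minimal sketch. It is not the code in `llm_client.py`: the exception names are borrowed from the list above, while the `LLMError` base class, the `call_with_retries` helper, its parameters, and the mapping from built-in exceptions are all assumptions made for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LLMError(Exception):
    """Hypothetical common base class; the real hierarchy may differ."""


class LLMTimeoutError(LLMError):
    """The call did not finish within the allowed time."""


class LLMServiceError(LLMError):
    """The endpoint failed or was unreachable."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run fn, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError as exc:
            # Assumption: built-in TimeoutError stands in for a provider timeout.
            error: LLMError = LLMTimeoutError(str(exc))
            error.__cause__ = exc
        except ConnectionError as exc:
            # Assumption: connection failures map to a service error.
            error = LLMServiceError(str(exc))
            error.__cause__ = exc
        if attempt == max_attempts:
            raise error
        # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Callers then only need to catch the small, typed set of errors. Injecting `sleep` keeps the helper easy to test without real delays.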
`idea_planner.py` turns a messy idea into an implementation-ready plan:
- an epic (title/goal/priority)
- up to 5 stories, each with concrete steps and dependencies
- a `recommended_next_story`
Key engineering features:
- robust JSON extraction from imperfect model output
- schema validation (`validate_plan`)
- an automatic repair loop when parsing fails (`repair_plan`)
Why it matters: if you want to automate planning, you need structured outputs with guardrails and recovery, not “best effort prose”.
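The extract, validate, and repair flow described above might look roughly like the sketch below. The name `validate_plan` and the field names come from this README, but the function bodies, the `extract_json` and `plan_with_repair` helpers, and the `ask` callable (a stand-in for a model call) are assumptions for illustration, not the actual implementation.

```python
import json
from typing import Callable


def extract_json(text: str) -> dict:
    """Parse the first JSON object found in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found")


def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan passed."""
    problems = []
    epic = plan.get("epic")
    if not isinstance(epic, dict):
        problems.append("epic must be an object")
    else:
        for key in ("title", "goal", "priority"):
            if not epic.get(key):
                problems.append(f"epic.{key} is missing")
    stories = plan.get("stories")
    if not isinstance(stories, list) or len(stories) > 5:
        problems.append("stories must be a list of at most 5 items")
    if not plan.get("recommended_next_story"):
        problems.append("recommended_next_story is missing")
    return problems


def plan_with_repair(ask: Callable[[str], str], prompt: str, max_repairs: int = 2) -> dict:
    """Ask for a plan; on failure, feed the problems back and ask again."""
    message = prompt
    for _ in range(max_repairs + 1):
        raw = ask(message)
        try:
            plan = extract_json(raw)
            problems = validate_plan(plan)
        except ValueError as exc:
            problems = [str(exc)]
        if not problems:
            return plan
        reasons = "; ".join(problems)
        message = f"{prompt}\n\nYour last reply was rejected: {reasons}. Return only valid JSON."
    raise ValueError(f"no valid plan after {max_repairs} repair attempts")
```

Returning a list of problems (rather than raising on the first one) lets the repair prompt tell the model everything that needs fixing in one round.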
`promptsmith.py` converts raw intent into a tool-specific, paste-ready prompt (Gemini/ChatGPT/Codex/Claude formats), with a multi-stage flow:
- Reference context builder (`--file`): embeds full/excerpt/summary/path-only based on size and a context budget.
- Architect (LLM): produces a JSON prompt spec (`role`/`context`/`tasks`/`constraints`/`output_spec`) while preserving scope and uncertainty.
- Judge (LLM): rejects outputs that invent details, weaken constraints, or expand scope.
- Polisher (deterministic): parses and formats the final prompt consistently, dedupes/preserves constraints.
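To make the reference context builder step concrete, here is one way the full/excerpt/summary/path-only decision could be sketched. The thresholds, the function names `choose_embed_mode` and `build_context`, and the placeholder "summary" are hypothetical; the real tool may weigh size and budget differently.

```python
def choose_embed_mode(file_chars: int, remaining_budget: int) -> str:
    """Pick how much of a reference file to embed (thresholds are illustrative)."""
    if file_chars <= remaining_budget:
        return "full"
    if remaining_budget >= 2000:
        return "excerpt"
    if remaining_budget >= 200:
        return "summary"
    return "path-only"


def build_context(files: dict[str, str], max_context_chars: int) -> list[tuple[str, str, str]]:
    """Return (path, mode, embedded_text) per file, spending the budget in order."""
    remaining = max_context_chars
    out = []
    for path, text in files.items():
        mode = choose_embed_mode(len(text), remaining)
        if mode == "full":
            body = text
        elif mode == "excerpt":
            # Take as much of the head of the file as the budget still allows.
            body = text[:remaining]
        elif mode == "summary":
            # A real summary needs a heuristic or model call; this stand-in only reports size.
            line_count = text.count("\n") + 1
            body = f"{len(text)} chars, {line_count} lines"
        else:
            body = ""
        remaining -= len(body)
        out.append((path, mode, body))
    return out
```

Because the budget is spent in file order, files listed first get the richest treatment and later ones degrade gracefully to path-only.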
Why it matters: this demonstrates a practical anti-hallucination pattern (generate → critique → deterministic finalize) and emphasizes faithfulness to the user’s constraints.
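The generate, critique, deterministic-finalize pattern can be sketched as follows. The `architect` and `judge` callables stand in for the two LLM stages, and the verdict shape (`accepted`, `reason`), the `run_pipeline` and `polish` names, and the output layout are assumptions for this example rather than the tool's actual interfaces.

```python
from typing import Callable


def polish(spec: dict) -> str:
    """Deterministically render a prompt spec; no model call is involved."""
    seen = set()
    constraints = []
    for item in spec.get("constraints", []):
        # Normalise case and whitespace so near-identical constraints collapse.
        key = " ".join(item.lower().split())
        if key not in seen:
            seen.add(key)
            constraints.append(item.strip())
    lines = [f"Role: {spec['role']}", "", "Tasks:"]
    lines += [f"{i}. {task}" for i, task in enumerate(spec["tasks"], 1)]
    lines += ["", "Constraints:"] + [f"- {c}" for c in constraints]
    return "\n".join(lines)


def run_pipeline(
    intent: str,
    architect: Callable[[str, str], dict],
    judge: Callable[[str, dict], dict],
    max_rounds: int = 2,
) -> str:
    """Draft a spec, let the judge accept or reject it, then format the accepted one."""
    feedback = ""
    for _ in range(max_rounds):
        spec = architect(intent, feedback)
        verdict = judge(intent, spec)
        if verdict["accepted"]:
            return polish(spec)
        # Pass the rejection reason to the next draft attempt.
        feedback = verdict["reason"]
    raise RuntimeError("judge rejected every draft: " + feedback)
```

Keeping the last stage free of model calls means the final text can only contain what the accepted spec contains, so formatting cannot reintroduce invented details.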
- Python 3.10+
- A local OpenAI-compatible endpoint behind LiteLLM / Ollama (default assumptions are in `LLMClientConfig`)
Install deps:
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
This repo pins a minimal set of dependencies for reproducibility.
python3 idea_planner.py \
--idea "Build a small workbench feature to capture user ideas, break them into stories, and recommend next action." \
--context "Django REST API with a React frontend, early stage." \
--model qwen-coder-14b \
--temperature 0.2 \
--max-tokens 2200
You’ll get strict JSON like:
- `epic.title`, `epic.goal`, `epic.priority`
- `stories[]` with `steps[]` and `dependencies[]`
- `recommended_next_story`
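For readers curious how the flags in the `idea_planner.py` command above might be declared, here is a small `argparse` sketch. The flag names come from that command; which flags are required, the help text, and the defaults (copied from the example values) are assumptions, not the tool's documented behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags used in the example invocation."""
    parser = argparse.ArgumentParser(prog="idea_planner.py")
    parser.add_argument("--idea", required=True, help="free-form description of the idea")
    parser.add_argument("--context", default="", help="optional project background")
    # Defaults below mirror the example command and are illustrative only.
    parser.add_argument("--model", default="qwen-coder-14b")
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--max-tokens", type=int, default=2200)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Typed arguments (`float`, `int`) reject malformed values before any model call is made.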
python3 promptsmith.py \
--intent "Review this codebase and suggest the smallest safe refactor to improve reliability. Ask questions if you need files." \
--tool claude
Optional: provide reference files (be careful—see Safety):
python3 promptsmith.py \
--intent "Generate a debugging prompt for this tool. Preserve constraints and avoid guessing." \
--tool gemini \
--file llm_client.py \
--max-context-chars 30000
- These tools are designed to work with a local OpenAI-compatible endpoint (e.g., LiteLLM proxy pointing at Ollama). Your configuration determines where prompts go.
- `promptsmith.py --file ...` may embed file contents (full or excerpt) into the model prompt depending on size/budget.
- Don’t pass confidential/proprietary files unless your model endpoint and environment are trusted.
- `llm_client.py` disables LiteLLM telemetry by default (`litellm.telemetry = False`) and drops unknown params.
- LLM reliability engineering: timeouts, retries/backoff, error taxonomy
- Structured outputs: strict JSON + validation + auto-repair loops
- Faithfulness / anti-hallucination: generator + judge + deterministic finalization
- Tooling ergonomics: CLIs with explicit args and predictable outputs
- Add `pyproject.toml` (or keep `requirements.txt`) and a `Makefile`/`justfile` for one-command setup + tooling.
- Add `examples/` with synthetic inputs and expected outputs (so reviewers can run deterministic checks).
- Add minimal CI (lint + unit tests + a smoke test that runs without calling a model).
- Document model/provider setup more explicitly (e.g., LiteLLM proxy vs direct Ollama).
This section is a placeholder.
Suggested quick capture on macOS:
- Record a short terminal demo (idea planner + prompt engineer) using `asciinema` or a screen recorder.
- Export as a GIF and add it as `docs/demo.gif`, then embed it here:

MIT (see LICENSE).