
feat: add comprehensive pdd setup prompts for all LLM providers (#480)#544

Merged
gltanaka merged 5 commits into promptdriven:main from niti-go:change/pdd_setup
Feb 22, 2026

Conversation


@niti-go commented Feb 20, 2026

Restructure pdd setup into an auto-configuring flow with expanded provider support and functionality.

  1. Auto-configures Gemini, Claude, and Codex (required for agentic tools)
  2. Adds support for many more LiteLLM-supported providers, even those with unique authorization flows (Vertex AI, AWS Bedrock, Azure, etc.)
  3. Model testing uses real LLM calls so errors surface during setup, not at runtime
  4. .pddrc is automatically configured based on detected project language

Important Notes

  • I could only test Gemini models (I had an API key). I recommend more testing for providers with multi-credential or complex auth, especially GitHub Copilot, Vertex AI, and AWS Bedrock.
  • This completely rewrites data/llm_model.csv to support more providers. The PR also includes the script generate_model_catalog.py, which generates this file; PDD maintainers need to re-run it whenever LiteLLM's registry updates, and periodically refresh the hardcoded ELO scores in the file. Please check whether the CSV has the desired models, and whether we want a different way to update it.

Design choices

  • Deterministic over agentic. After the initial CLI setup, the rest of pdd setup could theoretically be agentic, but in practice that was slower and offered no improvement over deterministic Python code: the tool only needs to scan env vars, create files, save API keys, and so on.
  • ELO threshold of 1400. data/llm_model.csv only includes models with ELO scores above 1400, which are really the only ones useful for coding. Local LLMs (Ollama or LMStudio) are not included because they don’t score that high.
  • Users have to press "enter" multiple times to progress through pdd setup. I want users to see what's being configured and where files live. This keeps things interpretable for users who want to customize later.

New Dev Units

Each of these comes with examples and tests (and prompt files in pdd_cap).

setup_tool.py

  • The main orchestrator. First, it bootstraps agentic CLIs via cli_detector. Then, it auto-discovers API keys from env/dotenv/api-env files, matches them to models from the reference CSV, writes the user's ~/.pdd/llm_model.csv, optionally creates .pddrc, and smoke-tests the first available model.
  • Prints a structured summary (CLIs, keys, models, test result), writes PDD-SETUP-SUMMARY.txt and creates a sample success_python.prompt for immediate first use. Also offers an optional menu to add providers or test models.

cli_detector.py

  • Detects Claude, Codex, and Gemini CLIs via PATH/fallback lookup. Presents a numbered selection table with install/key status, and walks through installation (npm) and API key configuration (saved to ~/.pdd/api-env.{shell}) for each selected CLI.
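The PATH-lookup part of the detection can be sketched with the standard library; the binary names here are assumptions (cli_detector.py may use different names or fall back to well-known install locations):

```python
import shutil

# Assumed binary names for the three agentic CLIs the detector looks for.
AGENTIC_CLIS = {"claude": "Claude", "codex": "Codex", "gemini": "Gemini"}

def detect_clis():
    """Map each CLI's display name to its resolved path (None if not on PATH)."""
    return {label: shutil.which(binary) for binary, label in AGENTIC_CLIS.items()}
```

A `None` entry would drive the "not installed" status in the selection table.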

api_key_scanner.py

  • Reads the local model catalog to determine which API keys are needed (including multi-key providers like AWS Bedrock), and checks .env, the shell environment, and PDD config files. Reports what's found and what's missing without exposing key values.
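The dotenv scan can be sketched as follows; this is illustrative of the approach (report key names, never values, and treat empty values as "not set" per the follow-up fix), not the exact `_load_dotenv_values` implementation:

```python
from pathlib import Path

def scan_dotenv(path):
    """Return the names (never the values) of keys defined in a dotenv file."""
    names = set()
    p = Path(path)
    if not p.exists():
        return names
    for line in p.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, _, value = line.partition("=")
            if value.strip():          # empty values count as "not set"
                names.add(name.strip())
    return names
```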

provider_manager.py

  • Adds and removes LLM providers and models from the local PDD configuration. Supports multi-credential providers like Vertex AI and AWS Bedrock (credential requirements follow LiteLLM's documentation). Saves API keys to a shell-sourced env file so they persist across terminal sessions.
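Persisting a key to a shell-sourced env file can be sketched like this; the function name and file layout are assumptions (the review mentions shell-escaping tests, so quoting matters):

```python
import shlex
from pathlib import Path

def save_api_key(env_file, name, value):
    """Append a shell-safe export line so the key survives new terminal sessions.

    Illustrative sketch; provider_manager.py writes to ~/.pdd/api-env.{shell}.
    """
    path = Path(env_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        # shlex.quote protects against spaces, quotes, and shell metacharacters.
        f.write(f"export {name}={shlex.quote(value)}\n")
```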

model_tester.py

  • Lets users pick a configured model, checks that required credentials are set, then sends a real test call. Shows cost, speed, and a clear error message if something fails. Results persist within a session so users can test multiple models and compare.

pddrc_initializer.py

  • Detects project language from marker files and generates a .pddrc with sensible defaults. Asks for confirmation before writing; skips if one already exists.
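Marker-file detection can be sketched as a first-match lookup; the marker map here is an assumption (pddrc_initializer.py may recognize more markers or resolve ties differently):

```python
from pathlib import Path

# Hypothetical marker-file map; the real initializer's markers may differ.
LANGUAGE_MARKERS = {
    "pyproject.toml": "python",
    "setup.py": "python",
    "package.json": "javascript",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def detect_language(project_dir):
    """Return the detected project language, or None if no marker is found."""
    root = Path(project_dir)
    for marker, language in LANGUAGE_MARKERS.items():
        if (root / marker).exists():
            return language
    return None
```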

New Script (not a full dev unit)

generate_model_catalog.py

  • Script that generates data/llm_model.csv by looking up all models/providers from LiteLLM and filtering to ELO scores above 1400. PDD maintainers should re-run this script whenever LiteLLM's registry updates, and periodically update the hardcoded ELO scores in the file.
  • I decided not to make this a full pdd dev unit because it’s short and relies on a lot of hardcoded values.

Updated files

data/llm_model.csv

  • Now includes all top models with ELO ≥ 1400 across all LiteLLM-supported providers.
  • The api_key column now supports pipe-delimited fields (e.g. VERTEXAI_PROJECT|VERTEXAI_LOCATION|GOOGLE_APPLICATION_CREDENTIALS) for providers whose auth requires multiple credentials.
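Splitting such a field into the env-var names it requires is a small helper; this sketch assumes pipe-delimited parsing with an empty field meaning "no key needed" (the helper name echoes the parse_api_key_vars mentioned in review, but its exact signature is assumed):

```python
def parse_api_key_vars(api_key_field):
    """Split a CSV api_key field into the env var names it requires.

    "OPENAI_API_KEY"  -> ["OPENAI_API_KEY"]
    "VAR1|VAR2|VAR3"  -> ["VAR1", "VAR2", "VAR3"]
    ""                -> []   (device flow or local model)
    """
    return [v.strip() for v in api_key_field.split("|") if v.strip()]
```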

llm_invoke.py

The api_key handling was rewritten to support the new pipe-delimited credentials format and generalized to remove provider-specific logic.

  • _ensure_api_key (pre-call credential check):
    • Single var (e.g. OPENAI_API_KEY) — same as before: check env, prompt if missing, save to .env.
    • Multiple vars (e.g. VAR1|VAR2|VAR3) — checks all are set. If any are missing, directs the user to pdd setup instead of prompting interactively. Includes an ADC fallback for Vertex AI when GOOGLE_APPLICATION_CREDENTIALS isn't set but gcloud auth was used.
    • Empty — no key needed (device flow or local model).
  • LiteLLM call setup (used to have ~80 lines of Vertex-specific logic, now ~20 lines of generic logic):
    • If the model has a single var — read env var, pass as api_key=.
    • If the model has multiple vars — let LiteLLM read them from the environment directly.
    • If the model has no api_key value — pass nothing (authentication may be through device flow, e.g. GitHub Copilot).
  • Retry kwargs were also simplified — removed vertex_credentials, vertex_project, vertex_location since those are now handled through environment variables.
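The three-way branching above can be condensed into a sketch; names are illustrative, and the real _ensure_api_key additionally prompts for a single missing key and handles the Vertex AI ADC fallback:

```python
import os

def ensure_api_key(api_key_field, env=None):
    """Simplified sketch of the pre-call credential check."""
    env = os.environ if env is None else env
    names = [v for v in api_key_field.split("|") if v]
    if not names:
        # Empty field: no key needed (device flow or local model).
        return True
    if len(names) == 1:
        # Single var: the real code prompts and saves to .env if missing.
        return bool(env.get(names[0]))
    # Multiple vars: all must be set; the real code directs the user
    # to `pdd setup` instead of prompting interactively.
    return all(env.get(n) for n in names)
```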

test_llm_invoke.py

  • Updated to support the new model api_key column. The old tests set VERTEX_CREDENTIALS as the mock env var, but the new CSV api_key column for Vertex AI models specifies GOOGLE_APPLICATION_CREDENTIALS|VERTEXAI_PROJECT|VERTEXAI_LOCATION. The _ensure_api_key function checked those exact env vars (previously it only needed to look for one), found them missing, returned False, and skipped the model, causing "All candidate models failed."
  • Fix: Changed the tests to set the correct env vars (GOOGLE_APPLICATION_CREDENTIALS, VERTEXAI_PROJECT, VERTEXAI_LOCATION) matching what the CSV now declares.

test_llm_invoke_vertex_retry.py

  • Updated to support new model api_key column. The old tests verified that llm_invoke explicitly passed Vertex AI credentials on retry. The new tests verify that llm_invoke does NOT pass them, because under the multi-credential convention, LiteLLM reads from os.environ directly — both on initial calls and retries.

These tests all pass now.

README.md and SETUP_WITH_GEMINI.md

  • Updated pdd setup documentation

- No prompt files included
- Adds support for many more LiteLLM-supported providers (Vertex AI, AWS Bedrock, Azure, etc.)
- The api_key column now supports pipe-delimited fields (e.g. VERTEXAI_PROJECT|VERTEXAI_LOCATION|GOOGLE_APPLICATION_CREDENTIALS) for providers whose auth requires multiple credentials
- Updated pdd setup documentation
- Update llm_invoke api_key handling to support the new pipe-delimited credentials format and generalize it to remove provider-specific logic
- Skip GitHub Copilot models in --force/CI mode when no OAuth token exists, preventing litellm from hanging on an interactive device flow login
- Respect litellm's GITHUB_COPILOT_TOKEN_DIR and GITHUB_COPILOT_API_KEY_FILE env vars when checking for the token
- In pdd setup, trigger the actual OAuth device flow when a user adds GitHub Copilot as a provider, instead of just showing an "authenticate later" message
- Update provider_manager_example to show the new GitHub Copilot provider flow
- Tests/test_update_command.py now passes
- Use a Pareto filter to remove models that are strictly dominated (higher cost AND lower ELO) by another model from the same provider
- Remove models with designated regions or 'fast' versions
- Override buggy LiteLLM model costs with data from the internet
- Update ELO scores to Coding Arena ELO scores (from Text ELO scores)
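The Pareto filter in the commit above can be sketched as follows; the field names (`provider`, `avg_cost`, `elo`) are assumptions based on the CSV columns described in this PR:

```python
def pareto_filter(models):
    """Drop models strictly dominated (higher cost AND lower ELO) by
    another model from the same provider."""
    kept = []
    for m in models:
        dominated = any(
            o["provider"] == m["provider"]
            and o["avg_cost"] < m["avg_cost"]
            and o["elo"] > m["elo"]
            for o in models
        )
        if not dominated:
            kept.append(m)
    return kept
```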
Copilot AI left a comment

Pull request overview

This PR restructures pdd setup into a comprehensive auto-configuring flow with expanded LLM provider support. It adds 6 new dev units (setup_tool, cli_detector, api_key_scanner, provider_manager, model_tester, pddrc_initializer) with full test coverage, updates the multi-credential authentication system in llm_invoke.py, and expands data/llm_model.csv to support all LiteLLM providers with ELO ≥ 1400.

Changes:

  • Adds interactive setup wizard with CLI detection, API key scanning, model configuration, and testing
  • Implements multi-credential provider support (Azure, AWS Bedrock, Vertex AI, etc.)
  • Expands llm_model.csv from 20 to 93 models across 30+ providers
  • Updates llm_invoke.py to support pipe-delimited api_key fields
  • Adds comprehensive test coverage for all new modules

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.

Summary per file:

  • tests/test_provider_manager.py: Comprehensive tests for provider management (add/remove, multi-credential auth, shell escaping)
  • tests/test_pddrc_initializer.py: Tests for .pddrc generation with language detection and YAML building
  • tests/test_model_tester.py: Tests for interactive model testing with real LLM calls
  • tests/test_cli_detector.py: Tests for CLI tool detection and bootstrap flow
  • tests/test_api_key_scanner.py: Tests for API key discovery across multiple sources
  • tests/test_llm_invoke_vertex_retry.py: Updated tests for multi-credential retry behavior
  • tests/test_llm_invoke.py: Updated tests for multi-credential provider support
  • pdd/provider_manager.py: Provider management with multi-credential support and shell-safe key storage
  • pdd/pddrc_initializer.py: .pddrc generation with language detection
  • pdd/model_tester.py: Interactive model testing with diagnostics
  • pdd/cli_detector.py: CLI detection and bootstrap with multi-select support
  • pdd/api_key_scanner.py: API key discovery across shell/dotenv/pdd config
  • pdd/llm_invoke.py: Updated for pipe-delimited api_key support
  • pdd/data/llm_model.csv: Expanded from 20 to 93 models with pipe-delimited api_key fields
  • README.md, SETUP_WITH_GEMINI.md, docs/ONBOARDING.md: Updated setup documentation
  • pdd/docs/prompting_guide.md: Added architecture metadata tags documentation
  • context/*.py: Example files for all new modules


@gltanaka left a comment

Review: Cherry-pick + isolation testing on test/pr-544-isolated

Hey @niti-go — really impressive work here! I cherry-picked all 4 commits onto an isolated test branch and put them through a full manual + automated test suite. Here's what I found:


✅ What's working great

The core of this PR is solid. Every critical path I tested passed:

  • All 5 new modules import cleanly: provider_manager, api_key_scanner, model_tester, cli_detector, and pddrc_initializer all load without errors.
  • Pipe-delimited API key format works end-to-end: parse_api_key_vars("AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AWS_REGION_NAME") correctly returns a list, and _ensure_api_key() correctly returns True when all vars are set and False when any one is missing. This is the most important new behavior and it's correct.
  • Backward compatibility is preserved — single-credential providers like Anthropic and OpenAI continue to work exactly as before.
  • Vertex AI retry tests: 4/4 pass — the new env-var-based credential convention doesn't break the retry logic.
  • 217/218 tests in test_llm_invoke.py pass — the LLM calling path is in great shape.
  • The curated 92-model CSV loads correctly and the avg_cost column is properly parsed.

The architectural idea here — generalizing credential handling to support multi-var providers like AWS Bedrock and Vertex AI using a pipe-delimited format — is elegant and the right call. Well done.


🔧 Two things to fix before merging

1. api_key_scanner source attribution bug (tests/test_api_key_scanner.py, 2 failures)

scan_environment() always reports ".env file" as the source even when the key was set via the shell environment or ~/.pdd/api-env.bash. The two failing tests catch this precisely:

  • test_scan_detects_shell_env_key — expects source == "shell environment", gets ".env file"
  • test_scan_detects_api_env_file_key — expects source == "~/.pdd/api-env.bash", gets ".env file"

The priority-order check in scan_environment() (.env → shell → api-env file) seems to be matching .env first even when the key isn't actually in .env. Worth double-checking the _load_dotenv_values() result when a .env file exists in the test fixture's home directory.

2. CSV / test consistency (test_llm_invoke.py::test_deepseek_maas_passes_response_format_for_structured_output, 1 failure)

The CSV curation commit (78cdd40) removed vertex_ai/deepseek-ai/deepseek-v3.2-maas, but this test checks for that exact model. Either add the model back to the CSV or update the test to reflect the curated set. Since the test was specifically written to catch a structured_output bug for that model, it's worth keeping — just needs the CSV entry restored.


Recommendation

Fix those two issues and this is ready to merge. The hard part (the multi-credential architecture) is already correct. These are small, targeted fixes.

Great contribution — this significantly improves the setup experience for multi-var providers and the Pareto-filtered model list is a nice touch. 🙌


- test_llm_invoke.py: Switched the Vertex AI MaaS structured output test from using deepseek-v3.2-maas (which was removed from the model catalog by the Pareto filter) to minimax-m2-maas (which is still in the catalog). The test no longer fails due to a missing CSV row.
- In API key scanner, treat empty API key values as "not set" across all sources and fix test isolation to prevent false failures when a developer or test suite has local .env files
@gltanaka gltanaka merged commit 240ec33 into promptdriven:main Feb 22, 2026
1 check passed