A research copilot for structured company strategy analysis, using the "Playing to Win" strategic framework (originally described by A.G. Lafley & Roger Martin; see Disclaimers).
This is a research assistant, not an automatic strategy generator. It collects public materials, downloads and converts IR documents, runs web searches, and organizes findings into a structured draft — but every output requires human review and validation before use.
Prompt-driven Python orchestrator with editable prompt templates for Claude CLI.
```bash
cd strategy_research_copilot
python3 -m venv .venv
source .venv/bin/activate
# Optional: needed for DuckDuckGo mode and DOCX conversion
pip install -r requirements.txt
# Run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"
```

```
strategy_research_copilot/
├── core_support.py
├── orchestrator.py
├── stages.py
├── pdf_to_md_claude.py        # PDF→MD converter (auto-detected)
├── prompts/
├── docs/
├── tests/
└── Runs/                      # All pipeline outputs
    └── {Company}/
        ├── _processing/       # Resume state, logs
        ├── sources.json       # URL→file manifest with tiers
        ├── Materials/
        │   ├── IR Research/   # IR materials (PDFs + converted MDs)
        │   │   └── {company}/
        │   │       ├── pdf/
        │   │       └── md/
        │   └── Articles/      # T{tier}_{domain}_{title}.md
        ├── Analytics/
        │   ├── Facts and Figures.md
        │   ├── Strategy Confidence {Company}.md
        │   └── Strategy Confidence {Company}.json
        └── Strategy {Company}.md
```
- Detailed stage-level specification: docs/BUSINESS_LOGIC.md
- Stage interface contracts: docs/STAGE_CONTRACTS.md
- Research substage design: docs/research_substage_design.md
- Claude CLI — installed and authenticated (`claude` command available in PATH)
- Active Anthropic subscription (Pro/Max) — all API calls go through your subscription
- Python 3.10+
If you want to use `--search-engine ddg` or Stage 0 DOCX conversion:

```bash
pip install -r requirements.txt
# Installs: duckduckgo-search, trafilatura, requests, pypandoc
```

For DOCX conversion, install `pandoc` separately.
For PDF conversion, provide `pdf_to_md_claude.py` via `--pdf-converter`, or place it next to `orchestrator.py` / on PATH.
```bash
cd strategy_research_copilot
# Optional but recommended
python3 -m venv .venv
source .venv/bin/activate
# For Claude WebSearch mode (default) — no extra Python deps needed
# For DuckDuckGo mode and DOCX conversion — install dependencies:
pip install -r requirements.txt
# For local tests
pip install -e ".[dev]"
```

```bash
# Basic run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"

# Generate report in Russian
python3 orchestrator.py "Apple" --language ru

# Higher quality (Opus for key stages)
python3 orchestrator.py "Apple" --quality high

# Add focus context
python3 orchestrator.py "Samsung" --context "Focus on semiconductor division only"

# Division analysis with parent company
python3 orchestrator.py "AWS" --parent-company "Amazon"

# Skip web search (work only with local/IR documents)
python3 orchestrator.py "Apple" --skip-search

# Custom working directory
python3 orchestrator.py "Apple" --workdir ./custom_dir

# Resume from a specific stage
python3 orchestrator.py "Apple" --from-stage 4

# Enable executive profiles (Stage 4)
python3 orchestrator.py "Apple" --enable-stage-4

# Enable Stage 4 but auto-delete saved profile files after run
python3 orchestrator.py "Apple" --enable-stage-4 --auto-delete-profiles

# Skip IR research (use manually placed materials in Materials/IR Research/)
python3 orchestrator.py "Apple" --skip-research

# Research substage only (download + convert IR materials, no strategy generation)
python3 orchestrator.py "Apple" --research-only

# Clean up intermediate files after completion
python3 orchestrator.py "Apple" --clean
```

```bash
# Run tests
pytest -q

# Validate syntax
python3 -m py_compile orchestrator.py
```

| Option | Default | Description |
|---|---|---|
| `--language` | `en` | Output language: `en` or `ru` (Stage 7 confidence report is always in English) |
| `--workdir` | `Runs/{company}` | Working directory for outputs |
| `--parent-company` | — | Parent company name (for division analysis) |
| `--context` | — | Additional context/scope for the analysis |
| `--skip-search` | off | Skip web search, use existing materials only |
| `--from-stage` | 0 | Resume from stage N (1–6) |
| `--quality` | `optimal` | `optimal` (Sonnet throughout) or `high` (Opus for key stages) |
| `--model` | auto | Override Claude model for all stages |
| `--timeout` | 600 | Timeout per Claude call in seconds |
| `--search-engine` | auto | `claude` (default for `--quality high`) or `ddg` (default for `--quality optimal`) |
| `--enable-stage-4` | off | Enable Stage 4 (executive profiles) |
| `--auto-delete-profiles` | off | Delete saved Stage 4 profile files after pipeline completes; downstream outputs may still contain personal data |
| `--pdf-converter` | auto | Path to `pdf_to_md_claude.py` (auto-detected in project dir) |
| `--research-dir` | `{workdir}/Materials/IR Research` | Path to IR materials directory |
| `--research-companies` | all | Filter companies for Research substage (space-separated) |
| `--skip-research` | off | Skip Research substage (R0–R4); use when IR materials are already in place |
| `--research-only` | off | Run only Research substage (R0–R4), skip main pipeline |
| `--max-articles` | 15 | Maximum saved web research articles in DDG mode |
| `--debug` | off | Show DEBUG logs in console output |
| `--max-context-chars` | 100000 | Switch to two-pass synthesis above this estimated markdown size |
| `--chunk-size` | 160000 | Chunk size for two-pass synthesis |
| `--research-workers` | 3 | Parallel Claude workers in Stage 2 Claude mode |
| `--download-workers` | 5 | Parallel DDG search/download workers |
| `--future-timeout` | 60 | Timeout for parallel futures in search/research stages |
| `--http-max-bytes` | 2000000 | Max bytes read from fetched web pages |
| `--pass1-workers` | 3 | Parallel workers for two-pass extraction (Pass 1 chunks) |
| `--stage4-workers` | 2 | Parallel workers for Stage 4 people profiles |
| `--stage4-min-local-files` | 3 | Min targeted local files per person before web search escalation in Stage 4 |
| `--r0-claude-timeout` | 120 | Timeout in seconds for Claude fallback in R0 IR discovery |
| `--ir-domain` | — | Override R0: search this domain for IR documents |
| `--ir-url` | — | Override R0: parse this URL as an IR page to extract PDF links |
| `--ir-pdf-url` | — | Override R0: directly download this PDF URL |
| `--clean` | off | Delete intermediate files after completion |
| Stage | Description | Prompt Template |
|---|---|---|
| 0 | Convert PDF/DOCX to Markdown (pre-stage) | external converter / pypandoc |
| R0 | Discover & download IR materials (annual reports, presentations, earnings) | prompts/research_discover_ir.md |
| R1–R3 | Classify, detect page ranges, faithful PDF→MD conversion | prompts/research_classify_document.md |
| R4 | Mirror IR materials into working directory (skip if already inside workdir) | — |
| 1 | Scan materials, generate search queries/themes | prompts/stage_1_scan_and_queries.md |
| 2 | Web research + source reliability tier classification (1–4) | prompts/stage_2_web_research_ddg.md or stage_2_web_research_claude.md |
| 3 | Compile organizational structure | prompts/stage_3_org_structure.md |
| 4 | Generate executive profiles (optional) | prompts/stage_4_people_profiles.md |
| 5 | Collect facts and figures (with traceable citations) | prompts/stage_5_facts_figures.md |
| 6 | Generate final strategy document (with [DECLARED/SUPPORTED/INFERRED] citations) | prompts/stage_6_strategy_generation.md |
| 7 | Evidence-based confidence scoring (deterministic, zero Claude calls) | — |
The Research substage automatically finds and processes IR (Investor Relations) materials:
- R0: Claude with WebSearch finds the company's IR page and downloads PDFs (annual reports, investor presentations, earnings press releases)
- R1: Classifies downloaded documents by type
- R2: Detects strategically relevant page ranges (filters out ESG/governance/audit sections in large annual reports)
- R3: Converts relevant pages to faithful Markdown via `pdf_to_md_claude.py` (no compression or summarization — downstream stages receive the original formulations)
- R4: Mirrors IR materials into `Runs/{company}/Materials/IR Research/{company}/pdf/` and `.../md/` when `--research-dir` is external (no-op if IR materials are already inside the workdir)
IR materials are stored inside the working directory by default: `Runs/{company}/Materials/IR Research/{company}/pdf/`. Use `--research-dir` to override. Manually placed PDFs are not re-downloaded. If you have already downloaded and converted IR materials yourself, use `--skip-research` to skip the entire R0–R4 substage.
Design details: docs/research_substage_design.md
All outputs are saved to Runs/{Company}/:
- `Strategy {Company}.md` — main strategy document with traceable `[^N]` citations
- `Org Structure {Company}.md` — organizational structure
- `Analytics/Facts and Figures.md` — metrics and data points
- `Analytics/Strategy Confidence {Company}.md` — evidence-based confidence report (always in English)
- `Analytics/Strategy Confidence {Company}.json` — machine-readable scores
- `sources.json` — URL→file mapping with metadata (tier, domain, document type)
- `Materials/IR Research/{company}/pdf/` — original PDFs
- `Materials/IR Research/{company}/md/` — faithful MD conversions
- `Materials/Articles/` — web research articles (`T{tier}_{domain}_{title}.md`)
- `Materials/People/` — executive profiles (if Stage 4 enabled)
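The README only specifies that `sources.json` maps URLs to files with tier, domain, and document-type metadata. As a rough illustration, one entry might look like the dict below — all field names and values here are assumptions, not the tool's actual schema:

```python
# Hypothetical shape of one sources.json entry. The manifest maps
# URL → file with tier/domain/document-type metadata; the exact
# field names below are invented for illustration.
entry = {
    "url": "https://investor.example.com/annual-report-2024.pdf",
    "file": "Materials/IR Research/example/pdf/annual-report-2024.pdf",
    "tier": 1,                          # 1 = official strategy materials
    "domain": "investor.example.com",
    "doc_type": "annual_report",
}

print(entry["tier"])
```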
Every downloaded source is classified into one of 4 reliability tiers:
| Tier | Weight | Description | Examples |
|---|---|---|---|
| 1 | 1.0 | Official strategy materials | IR presentations, annual reports, strategy updates |
| 2 | 0.85 | Official company communications | Press releases on company website, earnings calls |
| 3 | 0.7 | Authoritative external sources | Rating agencies, analysts, CEO interviews in media |
| 4 | 0.3 | Secondary sources | Blogs, forums, aggregators, opinion pieces |
- IR documents (Stage R0) are auto-classified by type (tier 1–2)
- Web articles (Stage 2) are classified by Claude during relevance check + domain-based fallback
- Tiers are stored in `sources.json` and used by the Stage 7 confidence scorer
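A domain-based fallback could look something like the sketch below. The actual mapping is internal to Stage 2; the tier weights match the table above, but the domain lists here are invented examples:

```python
# Illustrative domain-based tier fallback for when Claude's tier
# classification is unavailable. Weights match the reliability table;
# the domain sets are hypothetical, not the pipeline's real rules.
TIER_WEIGHTS = {1: 1.0, 2: 0.85, 3: 0.7, 4: 0.3}

OFFICIAL_DOMAINS = {"apple.com", "investor.apple.com"}    # assumed examples
AUTHORITATIVE_DOMAINS = {"reuters.com", "spglobal.com"}   # assumed examples

def fallback_tier(domain: str) -> int:
    if domain in OFFICIAL_DOMAINS:
        return 2   # official company communications
    if domain in AUTHORITATIVE_DOMAINS:
        return 3   # authoritative external sources
    return 4       # unknown domains default to secondary sources

print(fallback_tier("reuters.com"))   # → 3
```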
Strategy and Facts documents use traceable footnotes:
```markdown
[^1]: [DECLARED] `Apple_Annual_Report_2024.md`, p. 12 — target revenue growth of 8% YoY
[^2]: [SUPPORTED] `Investor_Presentation_Q4_2024.md`, lines 80–95 — market share data
[^3]: [INFERRED] based on absence of any mention of enterprise segment in public materials
```
Evidence markers: `[DECLARED]` (company explicitly states), `[SUPPORTED]` (data supports), `[INFERRED]` (analyst reconstruction).
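Because footnotes follow a fixed `[^N]: [MARKER] …` shape, they are easy to post-process. A minimal sketch — the regex and helper are ours, not shipped with the tool:

```python
import re

# Parse evidence-marked footnotes of the form shown above.
# Illustrative helper; the pipeline does not provide it.
FOOTNOTE_RE = re.compile(
    r"\[\^(\d+)\]:\s*\[(DECLARED|SUPPORTED|INFERRED)\]\s*(.+)"
)

line = "[^1]: [DECLARED] `Apple_Annual_Report_2024.md`, p. 12 — target revenue growth of 8% YoY"
number, marker, body = FOOTNOTE_RE.match(line).groups()
print(marker)   # → DECLARED
```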
Deterministic, zero Claude calls. Computes per-section scores (0–100) from 5 signals:
```
score = 0.35 × source_quality + 0.25 × directness + 0.20 × corroboration
      + 0.10 × recency + 0.10 × citation_quality
      + strategic_doc_boost (up to +15)
```
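As a sanity check, the formula can be reproduced in a few lines. This is a re-implementation of the weighted sum above, assuming each signal is normalized to 0–100 and the final score is clamped to that range (the normalization and clamp are our assumptions):

```python
def confidence_score(source_quality: float, directness: float,
                     corroboration: float, recency: float,
                     citation_quality: float,
                     strategic_doc_boost: float = 0.0) -> float:
    """Reproduce the Stage 7 weighted sum. The 0-100 normalization of
    the inputs and the final clamp are assumptions, not confirmed."""
    score = (0.35 * source_quality
             + 0.25 * directness
             + 0.20 * corroboration
             + 0.10 * recency
             + 0.10 * citation_quality
             + min(strategic_doc_boost, 15.0))   # boost capped at +15
    return max(0.0, min(100.0, score))
```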
| Stage | `optimal` (default) | `high` |
|---|---|---|
| Stage 1 — queries | low / sonnet | low / sonnet |
| Stage 2 — research | medium / sonnet | high / opus |
| Stage 3 — org structure | medium / sonnet | high / opus |
| Stage 4 — profiles | medium / sonnet | high / opus |
| Stage 5 — facts | medium / sonnet | medium / sonnet |
| Stage 6 — strategy | high / sonnet | high / opus |
`optimal` — faster, Sonnet throughout. Good for drafts and iteration.
`high` — Opus for research, org structure, profiles, and final strategy. Better for final-quality output.
Edit prompt templates in the `prompts/` directory:

```bash
# Example: modify Stage 1 prompt
vim prompts/stage_1_scan_and_queries.md

# Example: add custom section to strategy output
vim prompts/stage_6_strategy_generation.md
```

Available variables in templates:
- `{company}` — Company name
- `{parent_company}` — Parent company (if specified)
- `{context}` — Additional context note
- `{summaries}` — Short previews of local markdown materials (Stage 1)
- `{search_mode_instruction}` — Search-mode contract for Stage 1
- `{theme}` — Research theme (Stage 2)
- `{url}`, `{title}`, `{text}` — Article metadata and extracted page text (Stage 2 DDG mode)
- `{org_structure_path}` — Path to the Stage 3 output (Stage 4 selection step)
- `{name}`, `{position}` — Executive info (Stage 4)
- `{web_search_enabled}`, `{web_search_instruction}` — Stage 4 search policy controls
- `{language_instruction}` — Language suffix ("Write in Russian" if `--language ru`)
- (unknown), `{total_pages}`, `{preview_pages}` — PDF metadata (Research substage R2 classification prompt)
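The placeholders above appear to follow Python `str.format` syntax; a minimal sketch of filling one (the template text here is invented, not a real prompt from `prompts/`):

```python
# Hypothetical template using the documented placeholder names.
template = (
    "Analyze the strategy of {company} (parent: {parent_company}). "
    "{context} {language_instruction}"
)

prompt = template.format(
    company="AWS",
    parent_company="Amazon",
    context="Focus on cloud infrastructure.",
    language_instruction="",
).strip()

print(prompt)
```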
Completed stages are tracked in `_processing/completed_stages.json`. To resume:

```bash
# Resume from Stage 4 (skip 1-3)
python3 orchestrator.py "Apple" --from-stage 4
```

To force a full clean run, delete `_processing/`:

```bash
rm -rf Runs/Apple/_processing
```

Each run makes 20–60+ Claude API calls depending on stages and material volume.
All calls go through your Anthropic subscription (Pro/Max). No additional API billing.
| Scenario | `--quality optimal` | `--quality high` |
|---|---|---|
| `--skip-search`, small company | ~10 calls | ~10 calls |
| Full run, default | ~30 calls | ~30 calls (4-5 with Opus) |
| Large company, many materials | ~50+ calls | ~50+ calls (10+ with Opus) |
- PDF conversion — place `pdf_to_md_claude.py` next to `orchestrator.py` for auto-detection, or specify via `--pdf-converter`
- DOCX conversion requires `pandoc` in addition to `pypandoc`
- IR material download — not all company IR pages provide direct PDF links (some use JavaScript viewers or require authentication); such documents must be placed manually in `{workdir}/Materials/IR Research/{company}/pdf/`
- Claude-mode originals — pages are saved opportunistically; not all URLs in generated articles are guaranteed to be fetchable
- Fetched pages are size-limited — large responses are skipped to keep runs bounded
- Accuracy — AI-generated output requires human verification
This software is provided "as is", at your own risk. It is an independent, open-source tool, not affiliated with A.G. Lafley, Roger Martin, or the publishers of "Playing to Win." Output is AI-generated and is not investment, legal, or business advice.
- Web scraping — you are responsible for compliance with website Terms of Service
- Stage 4 (executive profiles) — disabled by default; collects personal data from public sources; you assume full responsibility for lawful processing under GDPR/CCPA
- Accuracy — outputs may contain inaccuracies or hallucinations; always validate independently
For full legal notices, see LEGAL.md. For privacy and personal data details, see PRIVACY.md.
This tool requires an active Anthropic subscription (Pro or above). All API calls go through your subscription.
MIT License.