biyachuev/strategy-research-copilot

Strategy Research CoPilot

A research copilot for structured company strategy analysis, using the "Playing to Win" strategic framework (originally described by A.G. Lafley & Roger Martin; see Disclaimers).

This is a research assistant, not an automatic strategy generator. It collects public materials, downloads and converts IR documents, runs web searches, and organizes findings into a structured draft — but every output requires human review and validation before use.

The tool is a prompt-driven Python orchestrator that calls the Claude CLI using editable prompt templates.

Quick Start

cd strategy_research_copilot
python3 -m venv .venv
source .venv/bin/activate

# Optional: needed for DuckDuckGo mode and DOCX conversion
pip install -r requirements.txt

# Run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"

Project Layout

strategy_research_copilot/
├── core_support.py
├── orchestrator.py
├── stages.py
├── pdf_to_md_claude.py          # PDF→MD converter (auto-detected)
├── prompts/
├── docs/
├── tests/
└── Runs/                        # All pipeline outputs
    └── {Company}/
        ├── _processing/         # Resume state, logs
        ├── sources.json         # URL→file manifest with tiers
        ├── Materials/
        │   ├── IR Research/     # IR materials (PDFs + converted MDs)
        │   │   └── {company}/
        │   │       ├── pdf/
        │   │       └── md/
        │   └── Articles/        # T{tier}_{domain}_{title}.md
        ├── Analytics/
        │   ├── Facts and Figures.md
        │   ├── Strategy Confidence {Company}.md
        │   └── Strategy Confidence {Company}.json
        └── Strategy {Company}.md

  • Detailed stage-level specification: docs/BUSINESS_LOGIC.md
  • Stage interface contracts: docs/STAGE_CONTRACTS.md
  • Research substage design: docs/research_substage_design.md


Prerequisites

  • Claude CLI — installed and authenticated (claude command available in PATH)
  • Active Anthropic subscription (Pro/Max) — all API calls go through your subscription
  • Python 3.10+

Optional: DuckDuckGo Search and Document Conversion

If you want to use --search-engine ddg or Stage 0 DOCX conversion:

pip install -r requirements.txt
# Installs: duckduckgo-search, trafilatura, requests, pypandoc

For DOCX conversion, install pandoc separately.

For PDF conversion, provide pdf_to_md_claude.py via --pdf-converter or place it next to orchestrator.py / on PATH.


Installation

cd strategy_research_copilot

# Optional but recommended
python3 -m venv .venv
source .venv/bin/activate

# For Claude WebSearch mode (default) — no extra Python deps needed
# For DuckDuckGo mode and DOCX conversion — install dependencies:
pip install -r requirements.txt

# For local tests
pip install -e ".[dev]"

Usage

# Basic run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"

# Generate report in Russian
python3 orchestrator.py "Apple" --language ru

# Higher quality (Opus for key stages)
python3 orchestrator.py "Apple" --quality high

# Add focus context
python3 orchestrator.py "Samsung" --context "Focus on semiconductor division only"

# Division analysis with parent company
python3 orchestrator.py "AWS" --parent-company "Amazon"

# Skip web search (work only with local/IR documents)
python3 orchestrator.py "Apple" --skip-search

# Custom working directory
python3 orchestrator.py "Apple" --workdir ./custom_dir

# Resume from a specific stage
python3 orchestrator.py "Apple" --from-stage 4

# Enable executive profiles (Stage 4)
python3 orchestrator.py "Apple" --enable-stage-4

# Enable Stage 4 but auto-delete saved profile files after run
python3 orchestrator.py "Apple" --enable-stage-4 --auto-delete-profiles

# Skip IR research (use manually placed materials in Materials/IR Research/)
python3 orchestrator.py "Apple" --skip-research

# Research substage only (download + convert IR materials, no strategy generation)
python3 orchestrator.py "Apple" --research-only

# Clean up intermediate files after completion
python3 orchestrator.py "Apple" --clean
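
For running several companies in a row, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_command` and the `batch` list are hypothetical names, and only flags documented in this README are used.

```python
import shlex
from typing import Optional


def build_command(
    company: str,
    *,
    language: str = "en",
    quality: str = "optimal",
    parent_company: Optional[str] = None,
    skip_search: bool = False,
) -> list[str]:
    """Build an argv list for one orchestrator run (documented flags only)."""
    argv = [
        "python3", "orchestrator.py", company,
        "--language", language,
        "--quality", quality,
    ]
    if parent_company:
        argv += ["--parent-company", parent_company]
    if skip_search:
        argv.append("--skip-search")
    return argv


# Hypothetical batch of (company, parent company) pairs.
batch = [("Apple", None), ("AWS", "Amazon")]
commands = [build_command(name, parent_company=parent) for name, parent in batch]

for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the project directory. Building an argv list rather than a single string avoids quoting problems with company names that contain spaces or ampersands.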

Development

# Run tests
pytest -q

# Validate syntax
python3 -m py_compile orchestrator.py

Options

| Option | Default | Description |
|---|---|---|
| --language | en | Output language: en or ru (Stage 7 confidence report is always in English) |
| --workdir | Runs/{company} | Working directory for outputs |
| --parent-company | | Parent company name (for division analysis) |
| --context | | Additional context/scope for the analysis |
| --skip-search | off | Skip web search, use existing materials only |
| --from-stage | 0 | Resume from stage N (1–6) |
| --quality | optimal | optimal (Sonnet throughout) or high (Opus for key stages) |
| --model | auto | Override Claude model for all stages |
| --timeout | 600 | Timeout per Claude call in seconds |
| --search-engine | auto | claude (default for --quality high) or ddg (default for --quality optimal) |
| --enable-stage-4 | off | Enable Stage 4 (executive profiles) |
| --auto-delete-profiles | off | Delete saved Stage 4 profile files after pipeline completes; downstream outputs may still contain personal data |
| --pdf-converter | auto | Path to pdf_to_md_claude.py (auto-detected in project dir) |
| --research-dir | {workdir}/Materials/IR Research | Path to IR materials directory |
| --research-companies | all | Filter companies for Research substage (space-separated) |
| --skip-research | off | Skip Research substage (R0–R4); use when IR materials are already in place |
| --research-only | off | Run only Research substage (R0–R4), skip main pipeline |
| --max-articles | 15 | Maximum saved web research articles in DDG mode |
| --debug | off | Show DEBUG logs in console output |
| --max-context-chars | 100000 | Switch to two-pass synthesis above this estimated markdown size |
| --chunk-size | 160000 | Chunk size for two-pass synthesis |
| --research-workers | 3 | Parallel Claude workers in Stage 2 Claude mode |
| --download-workers | 5 | Parallel DDG search/download workers |
| --future-timeout | 60 | Timeout for parallel futures in search/research stages |
| --http-max-bytes | 2000000 | Max bytes read from fetched web pages |
| --pass1-workers | 3 | Parallel workers for two-pass extraction (Pass 1 chunks) |
| --stage4-workers | 2 | Parallel workers for Stage 4 people profiles |
| --stage4-min-local-files | 3 | Min targeted local files per person before web search escalation in Stage 4 |
| --r0-claude-timeout | 120 | Timeout in seconds for Claude fallback in R0 IR discovery |
| --ir-domain | | Override R0: search this domain for IR documents |
| --ir-url | | Override R0: parse this URL as an IR page to extract PDF links |
| --ir-pdf-url | | Override R0: directly download this PDF URL |
| --clean | off | Delete intermediate files after completion |
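
The interaction between --max-context-chars and --chunk-size can be pictured with a short sketch. This is an illustration only: it assumes the size estimate is a plain character count and that chunks are cut at fixed character offsets, neither of which this README confirms. The function names are hypothetical.

```python
def needs_two_pass(markdown_chars: int, max_context_chars: int = 100_000) -> bool:
    """Return True when the estimated markdown size exceeds the threshold."""
    return markdown_chars > max_context_chars


def split_into_chunks(text: str, chunk_size: int = 160_000) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters.

    Assumption: fixed-offset splitting. The real pipeline may split on file
    or section boundaries instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


materials = "x" * 250_000  # stand-in for concatenated markdown materials
if needs_two_pass(len(materials)):
    chunks = split_into_chunks(materials)
    print(f"two-pass synthesis over {len(chunks)} chunk(s)")
else:
    print("single-pass synthesis")
```

Under these assumptions, Pass 1 would extract findings from each chunk (in parallel, per --pass1-workers) and Pass 2 would synthesize the extracted findings.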

Pipeline

| Stage | Description | Prompt Template |
|---|---|---|
| 0 | Convert PDF/DOCX to Markdown (pre-stage) | external converter / pypandoc |
| R0 | Discover & download IR materials (annual reports, presentations, earnings) | prompts/research_discover_ir.md |
| R1–R3 | Classify, detect page ranges, faithful PDF→MD conversion | prompts/research_classify_document.md |
| R4 | Mirror IR materials into working directory (skip if already inside workdir) | |
| 1 | Scan materials, generate search queries/themes | prompts/stage_1_scan_and_queries.md |
| 2 | Web research + source reliability tier classification (1–4) | prompts/stage_2_web_research_ddg.md or stage_2_web_research_claude.md |
| 3 | Compile organizational structure | prompts/stage_3_org_structure.md |
| 4 | Generate executive profiles (optional) | prompts/stage_4_people_profiles.md |
| 5 | Collect facts and figures (with traceable citations) | prompts/stage_5_facts_figures.md |
| 6 | Generate final strategy document (with [DECLARED/SUPPORTED/INFERRED] citations) | prompts/stage_6_strategy_generation.md |
| 7 | Evidence-based confidence scoring (deterministic, zero Claude calls) | |

Research Substage (R0–R4)

The Research substage automatically finds and processes IR (Investor Relations) materials:

  • R0: Claude with WebSearch finds the company's IR page and downloads PDFs (annual reports, investor presentations, earnings press releases)
  • R1: Classifies downloaded documents by type
  • R2: Detects strategically relevant page ranges (filters out ESG/governance/audit sections in large annual reports)
  • R3: Converts relevant pages to faithful Markdown via pdf_to_md_claude.py (no compression or summarization — downstream stages receive original formulations)
  • R4: Mirrors IR materials into Runs/{company}/Materials/IR Research/{company}/pdf/ and .../md/ when --research-dir is external (no-op if IR materials already inside workdir)

IR materials are stored inside the working directory by default: Runs/{company}/Materials/IR Research/{company}/pdf/. Use --research-dir to override. Manually placed PDFs are not re-downloaded. If you have already downloaded and converted IR materials yourself, use --skip-research to skip the entire R0–R4 substage.

Design details: docs/research_substage_design.md

Output Files

All outputs are saved to Runs/{Company}/:

  • Strategy {Company}.md — main strategy document with traceable [^N] citations
  • Org Structure {Company}.md — organizational structure
  • Analytics/Facts and Figures.md — metrics and data points
  • Analytics/Strategy Confidence {Company}.md — evidence-based confidence report (always in English)
  • Analytics/Strategy Confidence {Company}.json — machine-readable scores
  • sources.json — URL→file mapping with metadata (tier, domain, document type)
  • Materials/IR Research/{company}/pdf/ — original PDFs
  • Materials/IR Research/{company}/md/ — faithful MD conversions
  • Materials/Articles/ — web research articles (T{tier}_{domain}_{title}.md)
  • Materials/People/ — executive profiles (if Stage 4 enabled)

Source Reliability Tiers

Every downloaded source is classified into one of 4 reliability tiers:

| Tier | Weight | Description | Examples |
|---|---|---|---|
| 1 | 1.0 | Official strategy materials | IR presentations, annual reports, strategy updates |
| 2 | 0.85 | Official company communications | Press releases on company website, earnings calls |
| 3 | 0.7 | Authoritative external sources | Rating agencies, analysts, CEO interviews in media |
| 4 | 0.3 | Secondary sources | Blogs, forums, aggregators, opinion pieces |

  • IR documents (Stage R0) are auto-classified by type (tier 1–2)
  • Web articles (Stage 2) are classified by Claude during relevance check + domain-based fallback
  • Tiers are stored in sources.json and used by the Stage 7 confidence scorer
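
As an illustration of how the tier weights and the article naming pattern T{tier}_{domain}_{title}.md fit together, here is a minimal sketch. The weights come from the table above. The helper names are hypothetical, and the parsing rule assumes the domain part contains no underscore, which the README does not guarantee.

```python
import re
from typing import Optional

# Tier weights as listed in the reliability table.
TIER_WEIGHTS = {1: 1.0, 2: 0.85, 3: 0.7, 4: 0.3}

# Assumption: the domain contains no "_", so the first "_" after it
# separates the domain from the title.
ARTICLE_NAME_RE = re.compile(
    r"^T(?P<tier>[1-4])_(?P<domain>[^_]+)_(?P<title>.+)\.md$"
)


def parse_article_name(filename: str) -> Optional[tuple[int, str, str]]:
    """Split an article filename into (tier, domain, title), or return None."""
    match = ARTICLE_NAME_RE.match(filename)
    if match is None:
        return None
    return int(match["tier"]), match["domain"], match["title"]


def tier_weight(tier: int) -> float:
    """Look up the reliability weight for a tier (raises KeyError if unknown)."""
    return TIER_WEIGHTS[tier]


# Hypothetical filename, used only to exercise the helpers.
parsed = parse_article_name("T2_apple.com_q4_results.md")
if parsed is not None:
    tier, domain, title = parsed
    print(tier, domain, title, tier_weight(tier))
```

The authoritative tier for each source remains the value recorded in sources.json; the filename prefix is a convenience for browsing Materials/Articles/.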

Citation Format

Strategy and Facts documents use traceable footnotes:

[^1]: [DECLARED] `Apple_Annual_Report_2024.md`, p. 12 — target revenue growth of 8% YoY
[^2]: [SUPPORTED] `Investor_Presentation_Q4_2024.md`, lines 80–95 — market share data
[^3]: [INFERRED] based on absence of any mention of enterprise segment in public materials

Evidence markers: [DECLARED] (company explicitly states), [SUPPORTED] (data supports), [INFERRED] (analyst reconstruction).
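
For downstream checks, footnotes in this format can be parsed mechanically. The sketch below is an assumption about how one might do that, not the parser used by Stage 7: the `Footnote` type and `parse_footnote` function are hypothetical names, and the regular expression only covers the three markers listed above.

```python
import re
from typing import NamedTuple, Optional

FOOTNOTE_RE = re.compile(
    r"^\[\^(?P<num>\d+)\]:\s*"
    r"\[(?P<marker>DECLARED|SUPPORTED|INFERRED)\]\s*"
    r"(?P<rest>.*)$"
)


class Footnote(NamedTuple):
    number: int
    marker: str
    source_file: Optional[str]  # None when no backticked filename is present
    text: str


def parse_footnote(line: str) -> Optional[Footnote]:
    """Parse one footnote line; return None if it does not match the format."""
    match = FOOTNOTE_RE.match(line.strip())
    if match is None:
        return None
    rest = match["rest"]
    # A source file, when cited, appears first and is wrapped in backticks.
    file_match = re.match(r"`([^`]+)`", rest)
    source_file = file_match.group(1) if file_match else None
    return Footnote(int(match["num"]), match["marker"], source_file, rest)


note = parse_footnote("[^2]: [SUPPORTED] `Investor_Presentation_Q4_2024.md`, lines 80-95")
print(note.marker, note.source_file)
```

An [INFERRED] footnote with no backticked filename yields `source_file=None`, which makes reconstructed claims easy to list for manual review.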

Confidence Scoring (Stage 7)

Stage 7 is deterministic and makes zero Claude calls. It computes a per-section score (0–100) from five weighted signals plus a strategic-document boost:

score = 0.35 × source_quality + 0.25 × directness + 0.20 × corroboration
      + 0.10 × recency + 0.10 × citation_quality
      + strategic_doc_boost (up to +15)
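
The formula can be worked through in a few lines of Python. This is a sketch under stated assumptions rather than the Stage 7 implementation: it assumes each signal is already on a 0–100 scale, that the boost is capped at +15, and that the total is clamped to 0–100. The function name and example values are hypothetical.

```python
# Weights copied from the formula above.
WEIGHTS = {
    "source_quality": 0.35,
    "directness": 0.25,
    "corroboration": 0.20,
    "recency": 0.10,
    "citation_quality": 0.10,
}
MAX_BOOST = 15.0


def section_score(signals: dict[str, float], strategic_doc_boost: float = 0.0) -> float:
    """Combine five 0-100 signals and an optional boost into a 0-100 score."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    base = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    # Assumption: the boost is limited to 0..15 and the result to 0..100.
    boost = min(max(strategic_doc_boost, 0.0), MAX_BOOST)
    return round(min(max(base + boost, 0.0), 100.0), 2)


# Made-up signal values for one section.
example = {
    "source_quality": 85.0,
    "directness": 70.0,
    "corroboration": 60.0,
    "recency": 90.0,
    "citation_quality": 80.0,
}
print(section_score(example))                            # base only
print(section_score(example, strategic_doc_boost=10.0))  # with boost
```

With these example values the weighted base is 29.75 + 17.5 + 12 + 9 + 8 = 76.25, and a boost of 10 raises it to 86.25.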

Quality Modes

| Stage | optimal (default) | high |
|---|---|---|
| Stage 1: queries | low / sonnet | low / sonnet |
| Stage 2: research | medium / sonnet | high / opus |
| Stage 3: org structure | medium / sonnet | high / opus |
| Stage 4: profiles | medium / sonnet | high / opus |
| Stage 5: facts | medium / sonnet | medium / sonnet |
| Stage 6: strategy | high / sonnet | high / opus |

  • optimal: faster, Sonnet throughout. Good for drafts and iteration.
  • high: Opus for research, org structure, profiles, and final strategy. Better for final-quality output.
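
The quality table and the --search-engine and --model defaults from the Options section can be expressed as a lookup. The sketch below is illustrative only: the function names are hypothetical, and the first value in each cell (low, medium, high) is carried as an opaque label because this README does not define it.

```python
from typing import Optional

# (level, model) per stage, transcribed from the quality table.
QUALITY_PROFILES = {
    "optimal": {
        1: ("low", "sonnet"),
        2: ("medium", "sonnet"),
        3: ("medium", "sonnet"),
        4: ("medium", "sonnet"),
        5: ("medium", "sonnet"),
        6: ("high", "sonnet"),
    },
    "high": {
        1: ("low", "sonnet"),
        2: ("high", "opus"),
        3: ("high", "opus"),
        4: ("high", "opus"),
        5: ("medium", "sonnet"),
        6: ("high", "opus"),
    },
}


def resolve_stage(stage: int, quality: str = "optimal",
                  model_override: Optional[str] = None) -> tuple[str, str]:
    """Return (level, model) for a stage; a model override applies to all stages."""
    level, model = QUALITY_PROFILES[quality][stage]
    return level, (model_override or model)


def resolve_search_engine(quality: str, search_engine: str = "auto") -> str:
    """Resolve 'auto' to claude for high quality and ddg for optimal."""
    if search_engine != "auto":
        return search_engine
    return "claude" if quality == "high" else "ddg"


print(resolve_stage(6, "high"), resolve_search_engine("high"))
```

Keeping the table as data makes it easy to see at a glance which stages change between the two modes: only Stages 2, 3, 4, and 6.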


Customizing Prompts

Edit prompt templates in the prompts/ directory:

# Example: modify Stage 1 prompt
vim prompts/stage_1_scan_and_queries.md

# Example: add custom section to strategy output
vim prompts/stage_6_strategy_generation.md

Prompt Template Variables

Available variables in templates:

  • {company} — Company name
  • {parent_company} — Parent company (if specified)
  • {context} — Additional context note
  • {summaries} — Short previews of local markdown materials (Stage 1)
  • {search_mode_instruction} — Search-mode contract for Stage 1
  • {theme} — Research theme (Stage 2)
  • {url}, {title}, {text} — Article metadata and extracted page text (Stage 2 DDG mode)
  • {org_structure_path} — Path to the Stage 3 output (Stage 4 selection step)
  • {name}, {position} — Executive info (Stage 4)
  • {web_search_enabled}, {web_search_instruction} — Stage 4 search policy controls
  • {language_instruction} — Language suffix ("Write in Russian" if --language ru)
  • {filename}, {total_pages}, {preview_pages} — PDF metadata (Research substage R2 classification prompt)
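
To preview how a template reads once variables are filled in, placeholder substitution can be sketched as follows. This is an assumption about one convenient approach, not a description of how the orchestrator renders prompts: `render_prompt` and `_KeepMissing` are hypothetical names, and the sample template text is invented.

```python
class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders in the text unchanged."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_prompt(template: str, **variables: str) -> str:
    """Fill {name} placeholders, keeping any that have no value supplied."""
    return template.format_map(_KeepMissing(variables))


# Invented template text using variables from the list above.
template = "Analyze {company} (part of {parent_company}). {language_instruction}"
print(render_prompt(template, company="AWS", parent_company="Amazon"))
```

Note that `str.format_map` treats every brace as a placeholder delimiter, so a template containing literal braces (for example an embedded JSON sample) would need them doubled as `{{` and `}}` for this sketch to work.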

Resume Support

Completed stages are tracked in _processing/completed_stages.json. To resume:

# Resume from Stage 4 (skip 1-3)
python3 orchestrator.py "Apple" --from-stage 4

To force a full clean run, delete _processing/:

rm -rf Runs/Apple/_processing

Approximate Cost

A full run typically makes 20–60+ Claude calls, depending on the enabled stages and the volume of material; a run with --skip-search on a small company needs about 10.

All calls go through your Anthropic subscription (Pro/Max). No additional API billing.

| Scenario | --quality optimal | --quality high |
|---|---|---|
| --skip-search, small company | ~10 calls | ~10 calls |
| Full run, default | ~30 calls | ~30 calls (4–5 with Opus) |
| Large company, many materials | ~50+ calls | ~50+ calls (10+ with Opus) |

Limitations

  • PDF conversion — place pdf_to_md_claude.py next to orchestrator.py for auto-detection, or specify via --pdf-converter
  • DOCX conversion requires pandoc in addition to pypandoc
  • IR material download — not all company IR pages provide direct PDF links (some use JavaScript viewers or require authentication); such documents must be placed manually in {workdir}/Materials/IR Research/{company}/pdf/
  • Claude-mode originals — pages are saved opportunistically; not all URLs in generated articles are guaranteed to be fetchable
  • Fetched pages are size-limited — large responses are skipped to keep runs bounded
  • Accuracy — AI-generated output requires human verification

Disclaimers

This software is provided "as is", at your own risk. It is an independent, open-source tool, not affiliated with A.G. Lafley, Roger Martin, or the publishers of "Playing to Win." Output is AI-generated and is not investment, legal, or business advice.

  • Web scraping — you are responsible for compliance with website Terms of Service
  • Stage 4 (executive profiles) — disabled by default; collects personal data from public sources; you assume full responsibility for lawful processing under GDPR/CCPA
  • Accuracy — outputs may contain inaccuracies or hallucinations; always validate independently

For full legal notices, see LEGAL.md. For privacy and personal data details, see PRIVACY.md.

This tool requires an active Anthropic subscription (Pro or above). All API calls go through your subscription.


License

MIT License.
