Strategy Research CoPilot

A research copilot for structured company strategy analysis, using the "Playing to Win" strategic framework (originally described by A.G. Lafley & Roger Martin; see Disclaimers).

This is a research assistant, not an automatic strategy generator. It collects public materials, downloads and converts IR documents, runs web searches, and organizes findings into a structured draft — but every output requires human review and validation before use.

The tool is a prompt-driven Python orchestrator for the Claude CLI, with editable prompt templates.

Quick Start

cd strategy_research_copilot
python3 -m venv .venv
source .venv/bin/activate

# Optional: needed for DuckDuckGo mode and DOCX conversion
pip install -r requirements.txt

# Run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"

Project Layout

strategy_research_copilot/
├── core_support.py
├── orchestrator.py
├── stages.py
├── pdf_to_md_claude.py          # PDF→MD converter (auto-detected)
├── prompts/
├── docs/
├── tests/
└── Runs/                        # All pipeline outputs
    └── {Company}/
        ├── _processing/         # Resume state, logs
        ├── sources.json         # URL→file manifest with tiers
        ├── Materials/
        │   ├── IR Research/     # IR materials (PDFs + converted MDs)
        │   │   └── {company}/
        │   │       ├── pdf/
        │   │       └── md/
        │   └── Articles/        # T{tier}_{domain}_{title}.md
        ├── Analytics/
        │   ├── Facts and Figures.md
        │   ├── Strategy Confidence {Company}.md
        │   └── Strategy Confidence {Company}.json
        └── Strategy {Company}.md

Detailed stage-level specification: docs/BUSINESS_LOGIC.md
Stage interface contracts: docs/STAGE_CONTRACTS.md
Research substage design: docs/research_substage_design.md

Prerequisites

Claude CLI — installed and authenticated (claude command available in PATH)
Active Anthropic subscription (Pro/Max) — all API calls go through your subscription
Python 3.10+

Optional: DuckDuckGo Search and Document Conversion

If you want to use --search-engine ddg or Stage 0 DOCX conversion:

pip install -r requirements.txt
# Installs: duckduckgo-search, trafilatura, requests, pypandoc

For DOCX conversion, install pandoc separately.

For PDF conversion, provide pdf_to_md_claude.py via --pdf-converter or place it next to orchestrator.py / on PATH.

Installation

cd strategy_research_copilot

# Optional but recommended
python3 -m venv .venv
source .venv/bin/activate

# For Claude WebSearch mode (default) — no extra Python deps needed
# For DuckDuckGo mode and DOCX conversion — install dependencies:
pip install -r requirements.txt

# For local tests
pip install -e ".[dev]"

Usage

# Basic run — output goes to Runs/Apple/
python3 orchestrator.py "Apple"

# Generate report in Russian
python3 orchestrator.py "Apple" --language ru

# Higher quality (Opus for key stages)
python3 orchestrator.py "Apple" --quality high

# Add focus context
python3 orchestrator.py "Samsung" --context "Focus on semiconductor division only"

# Division analysis with parent company
python3 orchestrator.py "AWS" --parent-company "Amazon"

# Skip web search (work only with local/IR documents)
python3 orchestrator.py "Apple" --skip-search

# Custom working directory
python3 orchestrator.py "Apple" --workdir ./custom_dir

# Resume from a specific stage
python3 orchestrator.py "Apple" --from-stage 4

# Enable executive profiles (Stage 4)
python3 orchestrator.py "Apple" --enable-stage-4

# Enable Stage 4 but auto-delete saved profile files after run
python3 orchestrator.py "Apple" --enable-stage-4 --auto-delete-profiles

# Skip IR research (use manually placed materials in Materials/IR Research/)
python3 orchestrator.py "Apple" --skip-research

# Research substage only (download + convert IR materials, no strategy generation)
python3 orchestrator.py "Apple" --research-only

# Clean up intermediate files after completion
python3 orchestrator.py "Apple" --clean
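
To run several companies back to back, the commands above can be wrapped in a small script. The following is a minimal sketch, assuming it is executed from the project directory. `build_command` and `run_batch` are hypothetical helper names that are not part of the project, and only flags documented in this README are used.

```python
import subprocess
import sys


def build_command(company, language="en", quality="optimal", skip_search=False):
    """Assemble an orchestrator.py command line from documented flags."""
    cmd = [sys.executable, "orchestrator.py", company,
           "--language", language, "--quality", quality]
    if skip_search:
        cmd.append("--skip-search")
    return cmd


def run_batch(companies, **options):
    """Run the pipeline once per company and return {company: exit code}."""
    results = {}
    for company in companies:
        # check=False: one failing company should not abort the whole batch.
        completed = subprocess.run(build_command(company, **options), check=False)
        results[company] = completed.returncode
    return results


# Example (hypothetical): run_batch(["Apple", "Samsung"], quality="high")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with company names that contain spaces.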

Development

# Run tests
pytest -q

# Validate syntax
python3 -m py_compile orchestrator.py

Options

Option	Default	Description
`--language`	`en`	Output language: `en` or `ru` (Stage 7 confidence report is always in English)
`--workdir`	`Runs/{company}`	Working directory for outputs
`--parent-company`	—	Parent company name (for division analysis)
`--context`	—	Additional context/scope for the analysis
`--skip-search`	off	Skip web search, use existing materials only
`--from-stage`	0	Resume from stage N (1–6)
`--quality`	`optimal`	`optimal` (Sonnet throughout) or `high` (Opus for key stages)
`--model`	auto	Override Claude model for all stages
`--timeout`	`600`	Timeout per Claude call in seconds
`--search-engine`	auto	`claude` (default for `--quality high`) or `ddg` (default for `--quality optimal`)
`--enable-stage-4`	off	Enable Stage 4 (executive profiles)
`--auto-delete-profiles`	off	Delete saved Stage 4 profile files after pipeline completes; downstream outputs may still contain personal data
`--pdf-converter`	auto	Path to `pdf_to_md_claude.py` (auto-detected in project dir)
`--research-dir`	`{workdir}/Materials/IR Research`	Path to IR materials directory
`--research-companies`	all	Filter companies for Research substage (space-separated)
`--skip-research`	off	Skip Research substage (R0–R4); use when IR materials are already in place
`--research-only`	off	Run only Research substage (R0–R4), skip main pipeline
`--max-articles`	`15`	Maximum saved web research articles in DDG mode
`--debug`	off	Show DEBUG logs in console output
`--max-context-chars`	`100000`	Switch to two-pass synthesis above this estimated markdown size
`--chunk-size`	`160000`	Chunk size for two-pass synthesis
`--research-workers`	`3`	Parallel Claude workers in Stage 2 Claude mode
`--download-workers`	`5`	Parallel DDG search/download workers
`--future-timeout`	`60`	Timeout for parallel futures in search/research stages
`--http-max-bytes`	`2000000`	Max bytes read from fetched web pages
`--pass1-workers`	`3`	Parallel workers for two-pass extraction (Pass 1 chunks)
`--stage4-workers`	`2`	Parallel workers for Stage 4 people profiles
`--stage4-min-local-files`	`3`	Min targeted local files per person before web search escalation in Stage 4
`--r0-claude-timeout`	`120`	Timeout in seconds for Claude fallback in R0 IR discovery
`--ir-domain`	—	Override R0: search this domain for IR documents
`--ir-url`	—	Override R0: parse this URL as an IR page to extract PDF links
`--ir-pdf-url`	—	Override R0: directly download this PDF URL
`--clean`	off	Delete intermediate files after completion
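
The interaction between `--max-context-chars` and `--chunk-size` can be pictured with a short sketch. It illustrates the idea under assumptions and is not the orchestrator's actual code: `plan_synthesis` is a hypothetical name, and splitting on fixed character offsets is a simplification of whatever chunking the pipeline really performs.

```python
def plan_synthesis(markdown_text, max_context_chars=100_000, chunk_size=160_000):
    """Return the chunks to process; a single chunk means single-pass synthesis."""
    if len(markdown_text) <= max_context_chars:
        return [markdown_text]
    # Above the threshold: split into fixed-size chunks for Pass 1 extraction.
    return [markdown_text[i:i + chunk_size]
            for i in range(0, len(markdown_text), chunk_size)]
```

With the documented defaults, a text between 100,000 and 160,000 characters crosses the two-pass threshold but still fits into one chunk in this sketch.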

Pipeline

Stage	Description	Prompt Template
0	Convert PDF/DOCX to Markdown (pre-stage)	external converter / `pypandoc`
R0	Discover & download IR materials (annual reports, presentations, earnings)	`prompts/research_discover_ir.md`
R1–R3	Classify, detect page ranges, faithful PDF→MD conversion	`prompts/research_classify_document.md`
R4	Mirror IR materials into working directory (skip if already inside workdir)	—
1	Scan materials, generate search queries/themes	`prompts/stage_1_scan_and_queries.md`
2	Web research + source reliability tier classification (1–4)	`prompts/stage_2_web_research_ddg.md` or `stage_2_web_research_claude.md`
3	Compile organizational structure	`prompts/stage_3_org_structure.md`
4	Generate executive profiles (optional)	`prompts/stage_4_people_profiles.md`
5	Collect facts and figures (with traceable citations)	`prompts/stage_5_facts_figures.md`
6	Generate final strategy document (with `[DECLARED/SUPPORTED/INFERRED]` citations)	`prompts/stage_6_strategy_generation.md`
7	Evidence-based confidence scoring (deterministic, zero Claude calls)	—

Research Substage (R0–R4)

The Research substage automatically finds and processes IR (Investor Relations) materials:

R0: Claude with WebSearch finds the company's IR page and downloads PDFs (annual reports, investor presentations, earnings press releases)
R1: Classifies downloaded documents by type
R2: Detects strategically relevant page ranges (filters out ESG/governance/audit sections in large annual reports)
R3: Converts relevant pages to faithful Markdown via pdf_to_md_claude.py (no compression or summarization — downstream stages receive original formulations)
R4: Mirrors IR materials into Runs/{company}/Materials/IR Research/{company}/pdf/ and .../md/ when --research-dir is external (no-op if IR materials already inside workdir)

IR materials are stored inside the working directory by default: Runs/{company}/Materials/IR Research/{company}/pdf/. Use --research-dir to override. Manually placed PDFs are not re-downloaded. If you have already downloaded and converted IR materials yourself, use --skip-research to skip the entire R0–R4 substage.
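
When placing PDFs manually, it helps to compute the expected directories up front. The sketch below mirrors the layout described above. `ir_dirs` is a hypothetical helper, and it assumes the company name is used verbatim as the folder name, which the orchestrator may or may not do.

```python
from pathlib import Path


def ir_dirs(company, workdir=None, research_dir=None):
    """Return the expected pdf/ and md/ directories for a company's IR materials."""
    # Defaults follow the option table: Runs/{company} and {workdir}/Materials/IR Research.
    workdir = Path(workdir) if workdir else Path("Runs") / company
    base = Path(research_dir) if research_dir else workdir / "Materials" / "IR Research"
    return {"pdf": base / company / "pdf", "md": base / company / "md"}
```

For example, `ir_dirs("Apple")["pdf"]` gives the folder in which to drop PDFs before running with `--skip-research`.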

Design details: docs/research_substage_design.md

Output Files

All outputs are saved to Runs/{Company}/:

Strategy {Company}.md — main strategy document with traceable [^N] citations
Org Structure {Company}.md — organizational structure
Analytics/Facts and Figures.md — metrics and data points
Analytics/Strategy Confidence {Company}.md — evidence-based confidence report (always in English)
Analytics/Strategy Confidence {Company}.json — machine-readable scores
sources.json — URL→file mapping with metadata (tier, domain, document type)
Materials/IR Research/{company}/pdf/ — original PDFs
Materials/IR Research/{company}/md/ — faithful MD conversions
Materials/Articles/ — web research articles (T{tier}_{domain}_{title}.md)
Materials/People/ — executive profiles (if Stage 4 enabled)
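
The article naming scheme `T{tier}_{domain}_{title}.md` is simple enough to parse when post-processing a run. The following is a sketch under one assumption stated in the code: that the domain part contains no underscores. `parse_article_name` is a hypothetical helper, and the example filename in the usage note is invented for illustration.

```python
import re
from pathlib import Path

# Assumption: the domain contains no underscores, so the first underscore
# after the tier ends the domain and everything after it is the title.
ARTICLE_NAME = re.compile(r"^T(?P<tier>[1-4])_(?P<domain>[^_]+)_(?P<title>.+)\.md$")


def parse_article_name(path):
    """Split an article filename into tier, domain, and title, or return None."""
    match = ARTICLE_NAME.match(Path(path).name)
    if match is None:
        return None
    return {"tier": int(match["tier"]), "domain": match["domain"], "title": match["title"]}
```

Calling `parse_article_name("Materials/Articles/T2_apple.com_Q4_results.md")` yields tier 2, domain `apple.com`, and title `Q4_results`. Files that do not follow the pattern return `None`.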

Source Reliability Tiers

Every downloaded source is classified into one of 4 reliability tiers:

Tier	Weight	Description	Examples
1	1.0	Official strategy materials	IR presentations, annual reports, strategy updates
2	0.85	Official company communications	Press releases on company website, earnings calls
3	0.7	Authoritative external sources	Rating agencies, analysts, CEO interviews in media
4	0.3	Secondary sources	Blogs, forums, aggregators, opinion pieces

IR documents (Stage R0) are auto-classified by type (tier 1–2)
Web articles (Stage 2) are classified by Claude during relevance check + domain-based fallback
Tiers are stored in sources.json and used by the Stage 7 confidence scorer
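
As a rough illustration of how tier weights can be aggregated, the sketch below averages the weights from the table for a set of sources. This is not the Stage 7 algorithm, whose internals are not described here. `mean_source_weight` is a hypothetical helper operating on a plain list of tier numbers.

```python
# Weights copied from the tier table above.
TIER_WEIGHTS = {1: 1.0, 2: 0.85, 3: 0.7, 4: 0.3}


def mean_source_weight(tiers):
    """Average reliability weight for a list of tier numbers (0.0 if empty)."""
    if not tiers:
        return 0.0
    return sum(TIER_WEIGHTS[tier] for tier in tiers) / len(tiers)
```

A section backed by one annual report (tier 1) and one blog post (tier 4) would average to about 0.65 under this scheme.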

Citation Format

Strategy and Facts documents use traceable footnotes:

[^1]: [DECLARED] `Apple_Annual_Report_2024.md`, p. 12 — target revenue growth of 8% YoY
[^2]: [SUPPORTED] `Investor_Presentation_Q4_2024.md`, lines 80–95 — market share data
[^3]: [INFERRED] based on absence of any mention of enterprise segment in public materials

Evidence markers: [DECLARED] (company explicitly states), [SUPPORTED] (data supports), [INFERRED] (analyst reconstruction).
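
Because the footnotes follow a regular shape, they can be extracted with a regular expression, for example to count how many claims are only inferred. This is a minimal sketch based on the three example footnotes above. `parse_footnotes` is a hypothetical helper, and real output may contain variations it does not handle.

```python
import re

FOOTNOTE = re.compile(
    r"^\[\^(?P<number>\d+)\]:\s*\[(?P<marker>DECLARED|SUPPORTED|INFERRED)\]\s*(?P<body>.*)$"
)


def parse_footnotes(markdown_text):
    """Collect footnote number, evidence marker, and remaining text per line."""
    footnotes = []
    for line in markdown_text.splitlines():
        match = FOOTNOTE.match(line.strip())
        if match:
            footnotes.append(
                {"number": int(match["number"]),
                 "marker": match["marker"],
                 "body": match["body"]}
            )
    return footnotes
```

Lines that are not footnotes, or that carry an unknown marker, are skipped rather than raising an error.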

Confidence Scoring (Stage 7)

Deterministic, zero Claude calls. Computes per-section scores (0–100) from 5 signals:

score = 0.35 × source_quality + 0.25 × directness + 0.20 × corroboration
      + 0.10 × recency + 0.10 × citation_quality
      + strategic_doc_boost (up to +15)
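
The formula can be written out as a few lines of Python. The sketch below uses the weights exactly as given, but two details are assumptions that the formula does not state: each signal is taken to be on a 0 to 100 scale, and the final result is clamped to 0 to 100. `section_score` is a hypothetical name.

```python
WEIGHTS = {
    "source_quality": 0.35,
    "directness": 0.25,
    "corroboration": 0.20,
    "recency": 0.10,
    "citation_quality": 0.10,
}


def section_score(signals, strategic_doc_boost=0.0):
    """Weighted sum of the five signals plus a boost, clamped to 0-100."""
    base = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    # The boost is limited to the documented maximum of +15.
    boost = min(max(strategic_doc_boost, 0.0), 15.0)
    return min(max(base + boost, 0.0), 100.0)
```

Since the five weights sum to 1.0, a section with every signal at 80 scores 80 before any boost and 95 with the full boost applied.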

Quality Modes

Stage	`optimal` (default)	`high`
Stage 1 — queries	low / sonnet	low / sonnet
Stage 2 — research	medium / sonnet	high / opus
Stage 3 — org structure	medium / sonnet	high / opus
Stage 4 — profiles	medium / sonnet	high / opus
Stage 5 — facts	medium / sonnet	medium / sonnet
Stage 6 — strategy	high / sonnet	high / opus

optimal: faster, Sonnet throughout. Good for drafts and iteration.
high: Opus for research, org structure, profiles, and final strategy. Better for final-quality output.
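
For scripting around the two modes, the table can be expressed as a lookup. This is a sketch only: the pairs are copied from the table, the first value in each cell is kept as an opaque label because its meaning is not defined here, and `QUALITY_MODES` and `opus_stages` are hypothetical names rather than identifiers from the codebase.

```python
# (label, model) pairs per stage number, copied from the table above.
QUALITY_MODES = {
    "optimal": {1: ("low", "sonnet"), 2: ("medium", "sonnet"), 3: ("medium", "sonnet"),
                4: ("medium", "sonnet"), 5: ("medium", "sonnet"), 6: ("high", "sonnet")},
    "high": {1: ("low", "sonnet"), 2: ("high", "opus"), 3: ("high", "opus"),
             4: ("high", "opus"), 5: ("medium", "sonnet"), 6: ("high", "opus")},
}


def opus_stages(quality):
    """Return the stage numbers that use Opus in the given quality mode."""
    return [stage for stage, (_, model) in QUALITY_MODES[quality].items()
            if model == "opus"]
```

Here `opus_stages("high")` lists stages 2, 3, 4, and 6, matching the description of `high` mode.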

Customizing Prompts

Edit prompt templates in the prompts/ directory:

# Example: modify Stage 1 prompt
vim prompts/stage_1_scan_and_queries.md

# Example: add custom section to strategy output
vim prompts/stage_6_strategy_generation.md

Prompt Template Variables

Available variables in templates:

{company} — Company name
{parent_company} — Parent company (if specified)
{context} — Additional context note
{summaries} — Short previews of local markdown materials (Stage 1)
{search_mode_instruction} — Search-mode contract for Stage 1
{theme} — Research theme (Stage 2)
{url}, {title}, {text} — Article metadata and extracted page text (Stage 2 DDG mode)
{org_structure_path} — Path to the Stage 3 output (Stage 4 selection step)
{name}, {position} — Executive info (Stage 4)
{web_search_enabled}, {web_search_instruction} — Stage 4 search policy controls
{language_instruction} — Language suffix ("Write in Russian" if --language ru)
{filename}, {total_pages}, {preview_pages} — PDF metadata (Research substage R2 classification prompt)
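
To preview how a template will look once filled in, placeholders can be substituted by hand. The sketch below is one way to do that and makes no claim about how the orchestrator itself performs substitution. `render_prompt` is a hypothetical helper that leaves unknown placeholders untouched so they are easy to spot.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template, variables):
    """Replace {name} placeholders; names missing from variables stay as-is."""
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return PLACEHOLDER.sub(substitute, template)
```

A regular expression is used instead of `str.format` so that a template containing other braces, such as JSON examples, does not raise an error.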

Resume Support

Completed stages are tracked in _processing/completed_stages.json. To resume:

# Resume from Stage 4 (skip 1-3)
python3 orchestrator.py "Apple" --from-stage 4

To force a full clean run, delete _processing/:

rm -rf Runs/Apple/_processing

Approximate Cost

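A full run typically makes 20 to 60+ Claude calls, depending on which stages run and how much material there is; a run with --skip-search on a small company can need only about 10.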

All calls go through your Anthropic subscription (Pro/Max). No additional API billing.

Scenario	`--quality optimal`	`--quality high`
`--skip-search`, small company	~10 calls	~10 calls
Full run, default	~30 calls	~30 calls (4-5 with Opus)
Large company, many materials	~50+ calls	~50+ calls (10+ with Opus)

Limitations

PDF conversion — place pdf_to_md_claude.py next to orchestrator.py for auto-detection, or specify via --pdf-converter
DOCX conversion requires pandoc in addition to pypandoc
IR material download — not all company IR pages provide direct PDF links (some use JavaScript viewers or require authentication); such documents must be placed manually in {workdir}/Materials/IR Research/{company}/pdf/
Claude-mode originals — pages are saved opportunistically; not all URLs in generated articles are guaranteed to be fetchable
Fetched pages are size-limited — large responses are skipped to keep runs bounded
Accuracy — AI-generated output requires human verification

Disclaimers

This software is provided "as is", at your own risk. It is an independent, open-source tool, not affiliated with A.G. Lafley, Roger Martin, or the publishers of "Playing to Win." Output is AI-generated and is not investment, legal, or business advice.

Web scraping — you are responsible for compliance with website Terms of Service
Stage 4 (executive profiles) — disabled by default; collects personal data from public sources; you assume full responsibility for lawful processing under GDPR/CCPA
Accuracy — outputs may contain inaccuracies or hallucinations; always validate independently

For full legal notices, see LEGAL.md. For privacy and personal data details, see PRIVACY.md.

This tool requires an active Anthropic subscription (Pro or above). All API calls go through your subscription.

License

MIT License.
