Extract structured data from SEC EDGAR 10-K filings using LLMs (Claude or GPT-4o).
Downloads filings directly from EDGAR, extracts key sections (Business, Risk Factors, MD&A), and uses an LLM to produce clean, validated JSON with Pydantic models.
$ sec-parser blackstone
┌─────────────────────────────────────────────────────────┐
│ Filing Metadata │
│ Blackstone Inc. (BX) │
│ CIK: 1393818 | Filed: 2026-02-27 | SIC: 6282 │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Business Overview │
│ World's largest alternative asset manager with $1.13T │
│ in AUM across PE, real estate, credit & hedge funds. │
│ │
│ Industry: Alternative Asset Management │
│ Employees: 5,050 │
└─────────────────────────────────────────────────────────┘
┌───────────────────────┬─────────────────┬──────────┐
│ Key Financials │ Value │ Period │
├───────────────────────┼─────────────────┼──────────┤
│ Total AUM │ $1.127 trillion │ FY2025 │
│ Total Revenue │ $7.2 billion │ FY2025 │
│ Distributable Earnings│ $5.3 billion │ FY2025 │
│ Fee-Related Earnings │ $4.1 billion │ FY2025 │
└───────────────────────┴─────────────────┴──────────┘
┌──────────────────┬────────────────────────────┬──────────┐
│ Top Risk Factors │ Summary │ Severity │
├──────────────────┼────────────────────────────┼──────────┤
│ Market Risk │ Geopolitical conditions... │ high │
│ Interest Rate │ Slower rate decreases... │ high │
│ Cybersecurity │ Data loss, interruptions...│ high │
│ AI & Technology │ AI disruption risks... │ medium │
└──────────────────┴────────────────────────────┴──────────┘
# Install
git clone https://github.com/YOUR_USERNAME/sec-filing-parser.git
cd sec-filing-parser
# Set API key (one of these)
export ANTHROPIC_API_KEY="sk-ant-..."
# or
export OPENAI_API_KEY="sk-..."
# Run
uv run python -m sec_parser.cli blackstone
uv run python -m sec_parser.cli apollo --output apollo.json
uv run python -m sec_parser.cli 1318605 # Tesla by CIK
EDGAR API ──→ Download 10-K ──→ Extract Sections ──→ LLM Extraction ──→ Validated JSON
(free) (full text) (Item 1, 1A, 7) (Claude/GPT-4o) (Pydantic models)
- EDGAR Client — Fetches filing metadata and full text from SEC EDGAR (no API key needed)
- Section Extraction — Regex-based extraction of Items 1 (Business), 1A (Risk Factors), 7 (MD&A)
- LLM Extraction — Sends sections to Claude or GPT-4o with a structured prompt
- Validation — Pydantic models enforce schema, types, and required fields
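The regex-based section extraction step can be illustrated with a minimal sketch. This is not the code in edgar.py: the pattern, the `WANTED` mapping, and the `extract_sections` name are assumptions made for illustration, and real filings have messier headings than this handles. It assumes the filing has already been converted to plain text with each Item heading at the start of a line.

```python
import re

# Assumed heading shape: "Item 1.", "Item 1A.", "Item 7:" at the start of a line.
ITEM_HEADING = re.compile(
    r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)[ \t]*[.:]",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical output keys for the three items the pipeline targets.
WANTED = {"1": "business", "1A": "risk_factors", "7": "mdna"}


def extract_sections(text: str) -> dict:
    """Return the text between each wanted Item heading and the next heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        key = WANTED.get(match.group(1).upper())
        if key is None:
            continue
        # A section ends where the next Item heading of any kind begins.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        # A table of contents repeats each heading; keep the longest body.
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections
```

Matching every Item heading, not only the three wanted ones, is what lets Item 1A stop at Item 1B and Item 7 stop at Item 7A instead of running into the following section.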
sec_parser/
├── models.py # Pydantic schemas (FilingMetadata, ExtractedFiling, etc.)
├── edgar.py # EDGAR API client (search, download, section extraction)
├── extractor.py # LLM-based structured extraction (Anthropic + OpenAI)
└── cli.py # CLI with Rich tables output
examples/
├── blackstone_10k.json # Blackstone Inc. 10-K extraction
└── apollo_10k.json # Apollo Asset Management 10-K extraction
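As a rough illustration of what an EDGAR client such as edgar.py has to do, the sketch below resolves a CIK to the public submissions endpoint on data.sec.gov and picks a 10-K document URL from the response. It is an assumption that the project uses this endpoint. The sketch uses the standard library's urllib instead of httpx so it stays self-contained, and the function names and `USER_AGENT` value are hypothetical. No request is actually sent.

```python
import urllib.request
from typing import Optional

# Hypothetical contact string; SEC asks automated clients to identify themselves.
USER_AGENT = "sec-filing-parser-example admin@example.com"


def submissions_url(cik: int) -> str:
    """Return the submissions JSON URL, with the CIK zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def build_request(cik: int) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying a User-Agent header."""
    return urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": USER_AGENT}
    )


def latest_10k_url(cik: int, submissions: dict) -> Optional[str]:
    """Pick the first 10-K from the parallel arrays under filings.recent.

    Assumes the arrays are ordered newest first.
    """
    recent = submissions["filings"]["recent"]
    for form, accession, document in zip(
        recent["form"], recent["accessionNumber"], recent["primaryDocument"]
    ):
        if form == "10-K":
            folder = accession.replace("-", "")  # archive paths drop the dashes
            return (
                "https://www.sec.gov/Archives/edgar/data/"
                f"{int(cik)}/{folder}/{document}"
            )
    return None
```

For the Tesla CIK used in the CLI example, `submissions_url(1318605)` returns `https://data.sec.gov/submissions/CIK0001318605.json`.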
ExtractedFiling(
metadata=FilingMetadata(company_name, cik, ticker, filing_type, filing_date, ...),
business=BusinessDescription(summary, industry, products_services, geographic_presence, ...),
key_financials=[FinancialMetric(name, value, period), ...],
risk_factors=[RiskFactor(category, summary, severity), ...],
executives=[ExecutiveOfficer(name, title, age), ...],
raw_sections={...}
)
- Python 3.11+ — async throughout
- httpx — async HTTP for EDGAR API
- BeautifulSoup + lxml — HTML parsing of filings
- Anthropic SDK / OpenAI SDK — LLM extraction
- Pydantic v2 — schema validation
- Rich — terminal output
See examples/ for ready-made extractions from Blackstone and Apollo 10-K filings — no API key required to inspect the output format.
Each filing extraction costs ~$0.05-0.06 USD (~15K input tokens + 1K output tokens).
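The cost figure can be reproduced with simple arithmetic. The per-token prices below are assumptions (approximate list prices in USD per million tokens for a Claude Sonnet-class model and GPT-4o), not values taken from this project. Provider pricing changes, so treat the numbers as illustrative.

```python
# Assumed prices in USD per million tokens: (input, output). Verify before relying on them.
ASSUMED_PRICES = {
    "claude-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
}


def estimate_cost(model: str, input_tokens: int = 15_000, output_tokens: int = 1_000) -> float:
    """Estimate the USD cost of one extraction from its token counts."""
    price_in, price_out = ASSUMED_PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    for name in ASSUMED_PRICES:
        print(f"{name}: ${estimate_cost(name):.4f}")
```

Under these assumed prices, 15K input tokens plus 1K output tokens comes to about $0.06 for the Claude entry and just under $0.05 for GPT-4o, which is consistent with the range quoted above.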
License: MIT