jomen93/sec-filing-parser

SEC Filing Parser

Extract structured data from SEC EDGAR 10-K filings using LLMs (Claude or GPT-4o).

Downloads filings directly from EDGAR, extracts the key sections (Business, Risk Factors, MD&A), and uses an LLM to produce clean JSON that is validated against Pydantic models.

Output Example

$ sec-parser blackstone

┌─────────────────────────────────────────────────────────┐
│ Filing Metadata                                         │
│ Blackstone Inc. (BX)                                    │
│ CIK: 1393818 | Filed: 2026-02-27 | SIC: 6282            │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│ Business Overview                                       │
│ World's largest alternative asset manager with $1.13T   │
│ in AUM across PE, real estate, credit & hedge funds.    │
│                                                         │
│ Industry: Alternative Asset Management                  │
│ Employees: 5,050                                        │
└─────────────────────────────────────────────────────────┘

┌───────────────────────┬─────────────────┬──────────┐
│ Key Financials        │ Value           │ Period   │
├───────────────────────┼─────────────────┼──────────┤
│ Total AUM             │ $1.127 trillion │ FY2025   │
│ Total Revenue         │ $7.2 billion    │ FY2025   │
│ Distributable Earnings│ $5.3 billion    │ FY2025   │
│ Fee-Related Earnings  │ $4.1 billion    │ FY2025   │
└───────────────────────┴─────────────────┴──────────┘

┌──────────────────┬────────────────────────────┬──────────┐
│ Top Risk Factors │ Summary                    │ Severity │
├──────────────────┼────────────────────────────┼──────────┤
│ Market Risk      │ Geopolitical conditions... │ high     │
│ Interest Rate    │ Slower rate decreases...   │ high     │
│ Cybersecurity    │ Data loss, interruptions...│ high     │
│ AI & Technology  │ AI disruption risks...     │ medium   │
└──────────────────┴────────────────────────────┴──────────┘

Quick Start

# Install
git clone https://github.com/jomen93/sec-filing-parser.git
cd sec-filing-parser

# Set API key (one of these)
export ANTHROPIC_API_KEY="sk-ant-..."
# or
export OPENAI_API_KEY="sk-..."

# Run
uv run python -m sec_parser.cli blackstone
uv run python -m sec_parser.cli apollo --output apollo.json
uv run python -m sec_parser.cli 1318605  # Tesla by CIK
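The last command passes a CIK instead of a company name. As a rough illustration of how a CIK maps onto EDGAR, the sketch below builds a request for EDGAR's public submissions endpoint, which expects the CIK zero-padded to ten digits. The function names `submissions_url` and `build_request` and the contact string are hypothetical and are not taken from `edgar.py`; the sketch uses the standard library rather than httpx and does not send the request.

```python
import urllib.request


def submissions_url(cik: int) -> str:
    """Return the EDGAR submissions URL for a CIK, zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def build_request(cik: int, contact: str) -> urllib.request.Request:
    # SEC asks automated clients to identify themselves in the User-Agent header.
    # `contact` is a placeholder; use your own name and email address.
    return urllib.request.Request(
        submissions_url(cik),
        headers={"User-Agent": contact},
    )


# Build (but do not send) a request for the Tesla CIK used in the command above.
req = build_request(1318605, "Example Name example@example.com")
print(req.full_url)
```

Sending the request, for example with `urllib.request.urlopen(req)`, needs network access and should respect SEC's rate limits.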

How It Works

EDGAR API ──→ Download 10-K ──→ Extract Sections ──→ LLM Extraction ──→ Validated JSON
  (free)       (full text)     (Item 1, 1A, 7)    (Claude/GPT-4o)    (Pydantic models)
  1. EDGAR Client — Fetches filing metadata and full text from SEC EDGAR (no API key needed)
  2. Section Extraction — Regex-based extraction of Items 1 (Business), 1A (Risk Factors), 7 (MD&A)
  3. LLM Extraction — Sends sections to Claude or GPT-4o with a structured prompt
  4. Validation — Pydantic models enforce schema, types, and required fields
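Step 2 can be sketched roughly as follows. This is a minimal illustration of regex-based section splitting, not the code in `edgar.py`: the heading pattern, the `split_items` name, and the "keep the longest match" rule (used here to skip table-of-contents entries) are all assumptions.

```python
import re

# Assumed heading shape: "Item 1.", "ITEM 1A:", "Item 7 -" at the start of a line.
ITEM_HEADING = re.compile(
    r"^[ \t]*item\s+(\d{1,2}[a-c]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text: str, wanted=("1", "1A", "7")) -> dict[str, str]:
    """Return the text of each wanted item, keyed by item number."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        key = match.group(1).upper()
        if key not in wanted:
            continue
        # A section runs from its heading to the next item heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        # An item heading often appears twice (table of contents and body),
        # so keep the longest candidate for each item.
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections
```

Real filings are messier than this pattern assumes (headings split across HTML tags, non-breaking spaces, cross-references such as "see Item 7"), so a production extractor would need more normalization before matching.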

Project Structure

sec_parser/
├── models.py       # Pydantic schemas (FilingMetadata, ExtractedFiling, etc.)
├── edgar.py        # EDGAR API client (search, download, section extraction)
├── extractor.py    # LLM-based structured extraction (Anthropic + OpenAI)
└── cli.py          # CLI with Rich tables output
examples/
├── blackstone_10k.json   # Blackstone Inc. 10-K extraction
└── apollo_10k.json       # Apollo Asset Management 10-K extraction

Extracted Data Schema

ExtractedFiling(
    metadata=FilingMetadata(company_name, cik, ticker, filing_type, filing_date, ...),
    business=BusinessDescription(summary, industry, products_services, geographic_presence, ...),
    key_financials=[FinancialMetric(name, value, period), ...],
    risk_factors=[RiskFactor(category, summary, severity), ...],
    executives=[ExecutiveOfficer(name, title, age), ...],
    raw_sections={...}
)
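To make the validation idea concrete, here is a dependency-free sketch of two of the schema's records. The project itself uses Pydantic v2; this stand-in uses stdlib dataclasses only so it runs without extra packages. The field names come from the schema above, but the field types, the allowed severity values (the sample output shows only "high" and "medium"; "low" is assumed), and the `parse_risk_factors` helper are assumptions.

```python
from dataclasses import dataclass

# Assumption: severities are limited to these three labels.
ALLOWED_SEVERITIES = {"low", "medium", "high"}


@dataclass(frozen=True)
class FinancialMetric:
    name: str
    value: str
    period: str


@dataclass(frozen=True)
class RiskFactor:
    category: str
    summary: str
    severity: str

    def __post_init__(self) -> None:
        # Reject labels outside the assumed set, similar in spirit to an enum field.
        if self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unexpected severity: {self.severity!r}")


def parse_risk_factors(items: list[dict]) -> list[RiskFactor]:
    """Build RiskFactor records from decoded JSON objects.

    RiskFactor(**item) raises TypeError on missing or unknown keys.
    """
    return [RiskFactor(**item) for item in items]
```

Unlike Pydantic, dataclasses do not check or coerce field types, so this sketch only enforces required fields and the severity label.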

Stack

  • Python 3.11+ — async throughout
  • httpx — async HTTP for EDGAR API
  • BeautifulSoup + lxml — HTML parsing of filings
  • Anthropic SDK / OpenAI SDK — LLM extraction
  • Pydantic v2 — schema validation
  • Rich — terminal output

Pre-generated Examples

See examples/ for ready-made extractions from Blackstone and Apollo 10-K filings — no API key required to inspect the output format.
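A short sketch of how such an output file might be consumed. It assumes the JSON keys mirror the schema field names (`risk_factors`, `category`, `severity`); the actual layout of the files in examples/ may differ. The inline sample reuses rows from the Output Example, so the snippet runs without the repository checked out, and `high_severity_categories` is a hypothetical helper.

```python
import json

# Inline stand-in for a file such as examples/blackstone_10k.json (layout assumed).
SAMPLE = """
{
  "risk_factors": [
    {"category": "Market Risk", "summary": "Geopolitical conditions...", "severity": "high"},
    {"category": "Interest Rate", "summary": "Slower rate decreases...", "severity": "high"},
    {"category": "Cybersecurity", "summary": "Data loss, interruptions...", "severity": "high"},
    {"category": "AI & Technology", "summary": "AI disruption risks...", "severity": "medium"}
  ]
}
"""


def high_severity_categories(filing: dict) -> list[str]:
    """Return the categories of all risk factors marked 'high'."""
    return [
        risk["category"]
        for risk in filing.get("risk_factors", [])
        if risk.get("severity") == "high"
    ]


filing = json.loads(SAMPLE)
print(high_severity_categories(filing))
```

To read a real file instead, replace `json.loads(SAMPLE)` with `json.load(open(path))` for the path of interest.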

Cost

Each filing extraction costs roughly $0.05 to $0.06 USD, based on about 15K input tokens and 1K output tokens. The exact figure depends on the model and its current pricing.
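The estimate is plain per-token arithmetic and can be reproduced as below. The per-million-token rates are illustrative assumptions chosen to bracket the quoted range, not prices stated by this project; check the provider's current price list before relying on them. `extraction_cost` is a hypothetical helper.

```python
def extraction_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Return the USD cost of one call given per-million-token rates."""
    return (
        input_tokens / 1_000_000 * usd_per_million_input
        + output_tokens / 1_000_000 * usd_per_million_output
    )


# Token counts from the README; rates are assumed for illustration only.
higher = extraction_cost(15_000, 1_000, 3.00, 15.00)
lower = extraction_cost(15_000, 1_000, 2.50, 10.00)
print(f"${lower:.4f} to ${higher:.4f} per filing")
```

Input tokens dominate the total in both cases, so trimming the extracted sections before sending them is the main lever on cost.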

License

MIT
