Download Filings

Our filing downloader can fetch filings for multiple companies (multiple CIKs) and multiple form types in a single run. The --cik flag accepts a single value, a comma-separated list, or a JSON file containing a list; the --form_type flag accepts a single value or a comma-separated list. The following examples download Form 10-K filings (and, where noted, Form 10-Q filings):

  1. Single CIK + Single Form

    python3 download_filings.py \
        --cik "320193" \
        --form_type "10-K" \
        --start_date "2023-01-01" \
        --end_date "2025-01-01" \
        --exclude_exhibits

  2. Single CIK + Multiple Forms

    python3 download_filings.py \
        --cik "320193" \
        --form_type "10-K,10-Q" \
        --start_date "2023-01-01" \
        --end_date "2025-01-01" \
        --exclude_exhibits

  3. Multiple CIKs (comma-separated) + Single Form

    python3 download_filings.py \
        --cik "320193,789019" \
        --form_type "10-K" \
        --start_date "2023-01-01" \
        --end_date "2025-01-01" \
        --exclude_exhibits

  4. Multiple CIKs (JSON file) + Single Form

    python3 download_filings.py \
        --cik cik_list.json \
        --form_type "10-K" \
        --start_date "2023-01-01" \
        --end_date "2025-01-01" \
        --exclude_exhibits

  5. Multiple CIKs + Multiple Forms

    python3 download_filings.py \
        --cik "320193,789019" \
        --form_type "10-K,10-Q" \
        --start_date "2023-01-01" \
        --end_date "2025-01-01" \
        --exclude_exhibits
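
Example 4 passes a JSON file to --cik. The text above only says the file holds "a list", so the following is a minimal sketch that assumes a flat JSON array of CIK strings; the exact schema the downloader expects is not confirmed here, and the helper name write_cik_list is hypothetical.

```python
import json
from pathlib import Path


def write_cik_list(ciks, path="cik_list.json"):
    """Write CIKs to `path` as a flat JSON list of strings and return the path.

    Assumption: the downloader accepts a plain JSON array such as
    ["320193", "789019"]. Adjust if the real script expects another shape.
    """
    # Normalise to strings and drop duplicates while keeping the input order.
    unique = list(dict.fromkeys(str(c).strip() for c in ciks))
    out = Path(path)
    out.write_text(json.dumps(unique, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    # The two CIKs used in the comma-separated examples above.
    write_cik_list(["320193", "789019"])
```

The resulting file can then be supplied as --cik cik_list.json, as in example 4.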

Tip

Enabling --exclude_exhibits can reduce the raw data size by approximately 80%.

Preprocess Form 10-K Filings

To preprocess Form 10-K filings into JSONL files, use the following template:

python3 preprocess_filings.py \
    --raw_data_path path/to/raw_data_filing/ \
    --company_lookup_file path/to/company_lookup_file/ \
    --output_dir "./filing"
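
Since the preprocessor writes JSONL, a quick way to sanity-check its output is to load each line as JSON and look at what came out. The sketch below assumes the files end in .jsonl and sit directly inside the output directory; the record fields are not documented here, so it only lists whatever keys it finds. The function name iter_jsonl_records is hypothetical.

```python
import json
from pathlib import Path


def iter_jsonl_records(output_dir):
    """Yield (file name, parsed record) for each non-empty line of every .jsonl file.

    Assumption: output files use the .jsonl extension and are not nested
    in subdirectories of `output_dir`.
    """
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    yield path.name, json.loads(line)


if __name__ == "__main__":
    for name, record in iter_jsonl_records("./filing"):
        # Print the top-level keys of dict records; otherwise print the type name.
        keys = sorted(record) if isinstance(record, dict) else type(record).__name__
        print(name, keys)
```

A record that fails to parse raises json.JSONDecodeError, which is a useful early signal that a filing needs the manual handling described in the note below.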

Important

While Form 10-K filings are regulatory documents, their layout can vary across companies. Additionally, some companies may choose to disclose information in supplementary sections that appear after Item 16. In such cases, the preprocessor may not function as intended. It may be necessary to manually process the filing using the original source or the intermediate results saved by the BasePreprocessor.