
dhardestylewis/MnA_Prediction

M&A Prediction Pipeline

Objective: Predict M&A target events using machine learning on fundamental financial data.

📂 Directory Structure

MnA_Prediction/
├── README.md
├── src/                           # Source code
│   ├── mna_colab_pipeline.py      # Main Colab notebook
│   └── feature_engineering.py     # Original reference script
│
├── data/                          # Data files (on GitHub)
│   ├── deals/
│   │   ├── dma_corpus_metadata_with_factset_id.csv  # 2000-2020
│   │   └── factset_xls/                             # 2000-2025
│   │       ├── 2000to05Batch1.xls
│   │       └── ...
│   └── fundamentals/
│       └── compustat_funda_2000on.csv
│
└── archive/                       # Old/unused files

🚀 Quick Start (Google Colab)

  1. Open src/mna_colab_pipeline.py in Google Colab
  2. Ensure your Google Drive contains:
    • fundq_full.parquet (quarterly Compustat, ~564 MB)
    • funda_full.parquet (annual Compustat, ~200 MB)
  3. Run all cells
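Before running all cells, it can help to confirm that both Compustat files are actually present. The following is a minimal sketch, not part of the pipeline: the helper name `missing_inputs` is hypothetical, and the README does not say which Drive folder the files live in, so the directory is passed in by the caller.

```python
from pathlib import Path

# File names taken from step 2 above.
REQUIRED_FILES = ("fundq_full.parquet", "funda_full.parquet")


def missing_inputs(drive_dir):
    """Return the required file names that are not present in drive_dir.

    Hypothetical helper: the pipeline itself may locate its inputs differently.
    """
    root = Path(drive_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

In Colab, after mounting Drive, this could be called as `missing_inputs("/content/drive/MyDrive")`, assuming the files sit at the top level of the Drive; an empty list means both inputs were found.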

Data Sources

| Source | Location | Coverage |
| --- | --- | --- |
| Compustat (fundq/funda) | Google Drive | Through ~2020 |
| DMA Corpus | data/deals/ | 2000-2020 |
| FactSet XLS | data/deals/factset_xls/factset_2000_2025/ | 2000-2025 |

Pipeline Features

  • Multi-horizon labeling (3m-24m targets)
  • Probability calibration (prior correction + isotonic)
  • Event study verification
  • S&P 500 benchmarking
  • Schema-preserving data extension
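
Two of the features above, multi-horizon labeling and prior correction, can be illustrated with a short sketch. This is not the pipeline's code. The horizon set `(3, 6, 12, 24)`, the `target_{h}m` label names, and all function names are assumptions, since the README gives only the 3m-24m range. The prior correction shown is the standard odds adjustment for a classifier trained on a resampled class balance, which may differ from what the pipeline does. The isotonic step is not shown.

```python
from datetime import date

# Assumed horizons in months; the README only states a 3m-24m range.
HORIZONS_MONTHS = (3, 6, 12, 24)


def add_months(d, months):
    """Shift a date forward by whole months, clamping the day to the month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # Number of days in the target month, computed from the next month's first day.
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    last_day = (next_first - date(year, month, 1)).days
    return date(year, month, min(d.day, last_day))


def multi_horizon_labels(obs_date, deal_date, horizons=HORIZONS_MONTHS):
    """Label an observation 1 for each horizon in which a deal is announced.

    deal_date is None for firms that never become a target.
    """
    labels = {}
    for h in horizons:
        hit = deal_date is not None and obs_date < deal_date <= add_months(obs_date, h)
        labels[f"target_{h}m"] = int(hit)
    return labels


def prior_correct(p, train_rate, true_rate):
    """Rescale a predicted probability from the training base rate to the true one."""
    pos = p * true_rate / train_rate
    neg = (1.0 - p) * (1.0 - true_rate) / (1.0 - train_rate)
    return pos / (pos + neg)
```

For example, a fundamentals snapshot dated 2019-12-31 for a firm whose deal is announced on 2020-05-15 gets a 0 label at 3 months and a 1 label at 6, 12, and 24 months. A score of 0.5 from a model trained on a balanced sample maps to 0.02 when the true target rate is 2%.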

License

MIT
