Objective: Predict M&A target events using machine learning on fundamental financial data.
MnA_Prediction/
├── README.md
├── src/                              # Source code
│   ├── mna_colab_pipeline.py         # Main Colab notebook
│   └── feature_engineering.py        # Original reference script
│
├── data/                             # Data files (on GitHub)
│   ├── deals/
│   │   ├── dma_corpus_metadata_with_factset_id.csv   # 2000-2020
│   │   └── factset_xls/              # 2000-2025
│   │       ├── 2000to05Batch1.xls
│   │       └── ...
│   └── fundamentals/
│       └── compustat_funda_2000on.csv
│
└── archive/                          # Old/unused files
- Open `src/mna_colab_pipeline.py` in Google Colab
- Ensure your Google Drive contains:
  - `fundq_full.parquet` (quarterly Compustat, ~564 MB)
  - `funda_full.parquet` (annual Compustat, ~200 MB)
- Run all cells
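
Before running all cells, it can help to confirm that both Compustat files are visible from the notebook. The snippet below is a minimal sketch, not part of the pipeline: the helper name `missing_inputs` is hypothetical, and the Drive folder shown in the usage comment (`/content/drive/MyDrive`) is an assumption, since the expected folder is not stated here.

```python
from pathlib import Path

# File names taken from the setup steps above.
REQUIRED_FILES = ("fundq_full.parquet", "funda_full.parquet")


def missing_inputs(drive_dir):
    """Return the required file names that are not present in drive_dir.

    Hypothetical helper for illustration; the pipeline may locate its
    inputs differently.
    """
    root = Path(drive_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Usage in Colab (the folder path is an assumption):
#   from google.colab import drive
#   drive.mount("/content/drive")
#   print(missing_inputs("/content/drive/MyDrive"))
```

An empty list means both files were found; otherwise the list names what still needs to be uploaded.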
| Source | Location | Coverage |
|---|---|---|
| Compustat (fundq/funda) | Google Drive | Through ~2020 |
| DMA Corpus | data/deals/ | 2000-2020 |
| FactSet XLS | data/deals/factset_xls/factset_2000_2025/ | 2000-2025 |
- Multi-horizon labeling (3m-24m targets)
- Probability calibration (prior correction + isotonic)
- Event study verification
- S&P 500 benchmarking
- Schema-preserving data extension
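
To make the first two items concrete, here is a rough sketch of what multi-horizon labeling and prior correction can look like. It is an illustration under stated assumptions rather than the code in `src/`: the intermediate horizons (6 and 12 months), the column naming `target_{h}m`, and the function names are all hypothetical, and the prior-correction formula is the common odds-rescaling adjustment for a model trained on a sample whose event rate differs from the population rate. The isotonic step is not shown.

```python
import calendar
from datetime import date

# Assumption: only the 3m and 24m endpoints are stated; 6 and 12 are illustrative.
HORIZONS = (3, 6, 12, 24)


def add_months(d, months):
    """Shift a date forward by whole months, clamping to the month's last day."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def horizon_labels(obs_date, deal_date, horizons=HORIZONS):
    """Label 1 for horizon h if a deal falls in (obs_date, obs_date + h months].

    deal_date is None for firms that never become targets.
    """
    return {
        f"target_{h}m": int(
            deal_date is not None
            and obs_date < deal_date <= add_months(obs_date, h)
        )
        for h in horizons
    }


def prior_correct(p, sample_rate, population_rate):
    """Rescale a predicted probability from the training-sample event rate
    to the (assumed known) population event rate."""
    pos = p * population_rate / sample_rate
    neg = (1.0 - p) * (1.0 - population_rate) / (1.0 - sample_rate)
    return pos / (pos + neg)


# A firm observed on 2010-03-31 with a deal announced on 2010-11-15:
labels = horizon_labels(date(2010, 3, 31), date(2010, 11, 15))
# A score of 0.5 from a balanced sample, mapped to a 2% base rate:
adjusted = prior_correct(0.5, sample_rate=0.5, population_rate=0.02)
```

In this example the deal lands about 7.5 months after the observation date, so only the 12m and 24m labels are positive, and the balanced-sample score of 0.5 maps back to roughly the 2% base rate.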
License: MIT