An end-to-end machine learning system extracting alpha signals from S&P 500 earnings call transcripts.
Interactive Dashboard • Streamlit Demo • Technical Report • Results
Every quarter, 500+ S&P 500 companies hold earnings calls where management presents results and analysts ask probing questions. The linguistic content of these calls (management tone, analyst skepticism, guidance specificity) may contain signals that markets don't fully price immediately.
FinSight processes 14,584 earnings transcripts across 601 S&P 500 companies (2018–2024), extracts 34 NLP features using FinBERT and RAG, and trains walk-forward validated ML models to predict 5-day and 20-day post-earnings stock returns.
```
14,584 Earnings Transcripts (2018–2024)
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 1 – Data Ingestion            │
│ HuggingFace datasets + yfinance     │
│ 601 companies · 1M+ price rows      │
└───────────────┬─────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 2 – NLP Feature Extraction    │
│                                     │
│ FinBERT (ProsusAI)                  │
│  · Sentence-level sentiment         │
│  · Mgmt prepared remarks vs Q&A     │
│  · 14 sentiment features            │
│                                     │
│ RAG Pipeline (all-MiniLM-L6-v2)     │
│  · 380,507 embedded chunks          │
│  · 5 structured semantic queries    │
│  · 10 relevance + content features  │
│                                     │
│ Output: 34 features · 13,442 rows   │
└───────────────┬─────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 3 – Prediction Models         │
│                                     │
│  · Baseline (Logistic Regression)   │
│  · FinBERT-only (XGBoost)           │
│  · RAG-only (XGBoost)               │
│  · XGBoost (all 34 features)        │
│  · LightGBM (all 34 features)       │
│  · LSTM (temporal 6-quarter seq)    │
│                                     │
│ Walk-forward CV · Zero leakage      │
└───────────────┬─────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 4 – Backtesting               │
│ Long-short quartile portfolio       │
│ 5-day and 20-day holding periods    │
│ 10 bps round-trip transaction cost  │
└───────────────┬─────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 5 – Sector Analysis           │
│ GICS sector-level walk-forward      │
│ Energy IC = +0.311 (best)           │
│ Technology IC ≈ 0 (efficient)       │
└───────────────┬─────────────────────┘
                │
                ▼
┌─────────────────────────────────────┐
│ Stage 6 – Dashboard + Report        │
│ Next.js · Streamlit · HF Spaces     │
│ 8-page technical report             │
└─────────────────────────────────────┘
```
Train on years T−3 to T−1, test on year T. Zero data leakage.
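The split logic can be sketched in a few lines (the function name and exact fold bookkeeping are illustrative, not the project's actual training code):

```python
# Hypothetical sketch of the walk-forward scheme: for each test year T,
# train only on the three years immediately preceding it.
YEARS = [2018, 2019, 2020, 2021, 2022, 2023, 2024]

def walk_forward_folds(years, train_window=3):
    """Yield (train_years, test_year) pairs with no future leakage."""
    folds = []
    for i in range(train_window, len(years)):
        folds.append((years[i - train_window:i], years[i]))
    return folds

for train_years, test_year in walk_forward_folds(YEARS):
    print(train_years, "->", test_year)
# The first fold trains on 2018-2020 and tests on 2021.
```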
| Model | IC Mean | IC Std | Hit Rate | AUC |
|---|---|---|---|---|
| Baseline | 0.0429 | 0.1141 | 0.5312 | 0.5174 |
| LightGBM ⭐ | 0.0198 | 0.0085 | 0.5329 | 0.5086 |
| LSTM | 0.0153 | 0.0211 | 0.5471 | 0.5060 |
| XGBoost | 0.0141 | 0.0180 | 0.5321 | 0.5099 |
| RAG Only | 0.0000 | 0.0295 | 0.5347 | 0.5086 |
| FinBERT Only | -0.0044 | 0.0117 | 0.5312 | 0.5007 |

IC = Information Coefficient (Pearson correlation of predictions vs actual 5-day returns). LightGBM is ~13× more stable than the baseline (IC std 0.0085 vs 0.1141). LSTM achieves the highest hit rate (54.7%), making it the best model for directional prediction.
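The IC metric used throughout these tables is just a Pearson correlation; a minimal sketch (function name and toy numbers are illustrative, not project data):

```python
import numpy as np

def information_coefficient(predictions, returns):
    """IC: Pearson correlation between model scores and realized returns."""
    p = np.asarray(predictions, dtype=float)
    r = np.asarray(returns, dtype=float)
    return float(np.corrcoef(p, r)[0, 1])

# Toy illustration with made-up numbers:
preds   = [0.2, -0.1, 0.4, 0.0, -0.3]
rets_5d = [0.01, -0.02, 0.03, 0.00, -0.01]
ic = information_coefficient(preds, rets_5d)  # positive = predictive signal
```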
| Rank | Feature | Group | Mean \|SHAP\| | Insight |
|---|---|---|---|---|
| 1 | `qa_neg_ratio` | QA FinBERT | 0.0541 | Analyst pushback > management positivity |
| 2 | `mgmt_sent_vol` | Mgmt FinBERT | 0.0476 | Inconsistent messaging = larger price moves |
| 3 | `qa_n_sentences` | QA FinBERT | 0.0453 | Longer Q&A = more analyst scrutiny |
| 4 | `mgmt_mean_neu` | Mgmt FinBERT | 0.0445 | Deliberate neutrality = hedging signal |
| 5 | `rag_guidance_specificity_relevance` | RAG | 0.0420 | Specific guidance = clearer market reaction |
| Rank | Sector | IC Mean | IC Std | AUC |
|---|---|---|---|---|
| 1 | Energy ⭐ | +0.3111 | 0.2430 | 0.6393 |
| 2 | Real Estate | +0.0779 | 0.2861 | 0.5089 |
| 3 | Industrials | +0.0738 | 0.0359 | 0.5625 |
| 4 | Utilities | +0.0644 | 0.1428 | 0.4703 |
| 5 | Consumer Staples | +0.0613 | 0.1452 | 0.5212 |
| 9 | Technology | +0.0037 | 0.0983 | 0.4874 |
| 11 | Materials | -0.1321 | 0.2903 | 0.4958 |

Key finding: Energy IC = +0.311 is ~84× stronger than Technology IC ≈ +0.004. This is consistent with sector-level market efficiency: Technology is efficiently priced, while Energy carries high information asymmetry from commodity price exposure.
| Metric | 5-Day | 20-Day |
|---|---|---|
| Annualized Return | -0.91% | -0.69% |
| Sharpe Ratio | -0.81 | -0.23 (3.6× better) |
| Max Drawdown | -4.24% | -6.03% |
| Win Rate | 37.5% | 31.3% |

Sharpe improves 3.6× from the 5-day to the 20-day holding period, consistent with PEAD theory (Bernard & Thomas 1989). The signal exists (IC = 0.0198) but is too weak to overcome 10 bps transaction costs at a 5-day horizon; extending to 20 days significantly reduces the cost-to-signal ratio.
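The backtest rules above (long-short quartile portfolio, 10 bps round-trip cost) can be sketched for a single rebalance period; names and structure are illustrative, not the actual `backtest_engine.py` API:

```python
import numpy as np

COST_BPS = 10  # assumed round-trip transaction cost, as in the backtest

def long_short_quartile_return(scores, returns, cost_bps=COST_BPS):
    """One period of the strategy: long the top quartile of model scores,
    short the bottom quartile, net of round-trip transaction costs."""
    scores = np.asarray(scores, dtype=float)
    returns = np.asarray(returns, dtype=float)
    hi, lo = np.quantile(scores, 0.75), np.quantile(scores, 0.25)
    long_leg = returns[scores >= hi].mean()   # avg return of longs
    short_leg = returns[scores <= lo].mean()  # avg return of shorts
    return long_leg - short_leg - cost_bps / 1e4
```

With a perfectly ranked toy universe of 8 stocks, the period return is the top-pair average minus the bottom-pair average, less 0.001 in costs.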
| Component | Technology |
|---|---|
| Language | Python 3.10 |
| NLP Model | FinBERT (ProsusAI/finbert) |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | ChromaDB |
| ML Models | XGBoost, LightGBM, PyTorch LSTM |
| Interpretability | SHAP |
| Dashboard (v2) | Next.js 14, TypeScript, Tailwind, Recharts, Framer Motion |
| Dashboard (v1) | Streamlit + Plotly |
| Deployment | Vercel (Next.js) + Hugging Face Spaces (Streamlit) |
| GPU | NVIDIA RTX 4060 Laptop (CUDA 11.8) |
```
finsight/
├── config.py                  # Central configuration (paths, constants)
├── run_ingestion.py           # Stage 1 runner
├── run_nlp.py                 # Stage 2 runner
├── export_data.py             # Export JSON for Next.js dashboard
│
├── src/
│   ├── ingestion/
│   │   ├── download_transcripts.py   # HuggingFace dataset download
│   │   ├── price_data.py             # yfinance price data
│   │   └── validate_data.py          # Data quality checks
│   │
│   ├── nlp/
│   │   ├── finbert_sentiment.py      # FinBERT pipeline (GPU, checkpointing)
│   │   ├── rag_pipeline.py           # RAG feature extraction (GPU-accelerated)
│   │   └── build_feature_matrix.py   # Merge features + price returns
│   │
│   ├── models/
│   │   ├── train_models.py           # XGBoost + LightGBM walk-forward
│   │   └── lstm_model.py             # LSTM sequence model
│   │
│   ├── backtest/
│   │   ├── backtest_engine.py        # 5-day backtest
│   │   └── backtest_20d.py           # 20-day backtest + comparison
│   │
│   ├── analysis/
│   │   └── sector_analysis.py        # GICS sector-level IC analysis
│   │
│   └── dashboard/
│       └── app.py                    # Streamlit dashboard (v1)
│
├── experiments/               # Model results, SHAP, plots
├── report/
│   └── FinSight_Technical_Report.docx
└── requirements.txt
```
```bash
git clone https://github.com/Rajveer-code/Finsight.git
cd Finsight
python -m venv venv
venv\Scripts\activate              # Windows
pip install -r requirements.txt
```

```bash
python run_ingestion.py            # Output: 14,584 transcripts, 1M+ price rows
python run_nlp.py                  # Output: 34 features × 13,442 rows
                                   # Checkpoints every 100/500 records – safe to interrupt
python src/models/train_models.py
python src/models/lstm_model.py
python src/backtest/backtest_engine.py
python src/backtest/backtest_20d.py
python src/analysis/sector_analysis.py
```

```bash
# Streamlit (v1)
streamlit run src/dashboard/app.py

# Export data for Next.js dashboard
python export_data.py
```

**Why walk-forward validation?** Standard k-fold cross-validation leaks future information in time series. Walk-forward trains on years T−3 to T−1 and tests on year T only. No future data is ever seen during training.
**Why FinBERT + RAG together?** FinBERT captures emotional tone at the sentence level. RAG captures topical specificity: whether management actually discussed numerical guidance, new risks, or cost pressures. RAG features contribute 34.6% of total SHAP importance despite comprising fewer features.
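To illustrate how per-sentence FinBERT labels roll up into the call-level features from the SHAP table (`qa_neg_ratio`, `qa_n_sentences`, `mgmt_sent_vol`) — this is an assumed sketch with made-up inputs; the exact aggregation in `finbert_sentiment.py` may differ:

```python
import statistics

# Assumed inputs: a FinBERT class label per Q&A sentence, and a signed
# per-sentence sentiment score (positive minus negative probability)
# for the management prepared remarks.
qa_labels = ["negative", "neutral", "negative", "positive", "neutral"]
mgmt_scores = [0.6, 0.1, -0.4, 0.5, 0.0, 0.3]

qa_neg_ratio = qa_labels.count("negative") / len(qa_labels)  # analyst pushback
qa_n_sentences = len(qa_labels)                              # Q&A scrutiny proxy
mgmt_sent_vol = statistics.pstdev(mgmt_scores)               # tone inconsistency
```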
**Why LSTM alongside tree models?** Tree models treat each earnings call as independent. The LSTM learns that a company with 6 consecutive quarters of deteriorating sentiment is different from one with a single bad quarter. Its 2022 IC of +0.047, the strongest single fold across all models, validates this temporal signal.
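Building the 6-quarter input sequences the LSTM consumes can be sketched as a sliding window over each company's chronologically ordered calls (names are illustrative, not the actual `lstm_model.py` code):

```python
# Assumed sketch: per company, slide a 6-quarter window over feature
# vectors ordered oldest-first; the prediction target is the call
# immediately after each window, so no future features leak in.
def make_sequences(calls_by_quarter, seq_len=6):
    """Return (window, target_index) pairs for one company."""
    out = []
    for i in range(len(calls_by_quarter) - seq_len):
        window = calls_by_quarter[i:i + seq_len]
        out.append((window, i + seq_len))
    return out

history = [[0.1 * q] for q in range(9)]  # 9 quarters of 1-d features
seqs = make_sequences(history)
# 9 quarters yield 3 windows: quarters 0-5 -> 6, 1-6 -> 7, 2-7 -> 8
```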
**Why both 5-day and 20-day backtests?** Post-earnings announcement drift (PEAD) is documented at 20–60 day horizons. The 3.6× Sharpe improvement from 5-day to 20-day validates that the signal takes time to be fully priced, consistent with Bernard & Thomas (1989).
- Long-only 20-day backtest (eliminates short-selling costs)
- Replace RAG keyword scoring with Llama-3 / Mistral generative scorer
- Sector-stratified model training (separate models per sector)
- Cross-lingual extension using multilingual FinBERT
- Real-time pipeline streaming live earnings calls
- Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063
- Bernard, V. & Thomas, J. (1989). Post-Earnings-Announcement Drift. Journal of Accounting Research
- Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS
- Lundberg & Lee (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS
- Loughran & McDonald (2011). When is a Liability not a Liability? Journal of Finance
- Chan, Jegadeesh & Lakonishok (1996). Momentum Strategies. Journal of Finance, 51(5)
Rajveer Singh Pall
Portfolio project for MSc Data Science application (ETH Zurich 2026).
Interactive Dashboard: finsight-web-rust.vercel.app
Streamlit Demo: huggingface.co/spaces/Rajveer234/finsight