Model and Strategy Comparison

Datasets and Labeling

  • Universe: AAPL, MSFT, GOOG, TSLA, AMZN from market_data_ml.csv filtered via tickers.csv.
  • Features: 1-, 3-, and 5-day returns, 5- and 10-day simple moving averages (SMA), 14-day RSI, and MACD, all configured in features_config.json.
  • Label: direction = 1 if the next-day return is > 0, else 0; the future return itself is also stored for backtests.
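
A minimal pure-Python sketch of the feature and label construction for one ticker's daily closes follows. It is an illustration under stated assumptions, not the project's code. RSI uses simple averages of the last 14 gains and losses (Wilder's smoothing is a common alternative). MACD is taken as EMA(12) minus EMA(26), seeded at the first close. Feature names other than return_3d, rsi_14, and macd, which appear in the importance ranking, are hypothetical.

```python
# Sketch of per-ticker feature and label construction (not the project's code).
# Assumptions: simple-average RSI(14); MACD = EMA(12) - EMA(26) seeded at the
# first close; names other than return_3d, rsi_14, macd are hypothetical.

def pct_return(closes, i, n):
    """n-day return ending at day i, or None without enough history."""
    return closes[i] / closes[i - n] - 1.0 if i >= n else None

def sma(closes, i, n):
    """Simple moving average of the n closes ending at day i."""
    return sum(closes[i - n + 1 : i + 1]) / n if i >= n - 1 else None

def rsi(closes, i, n=14):
    """RSI from simple averages of the last n gains and losses."""
    if i < n:
        return None
    changes = [closes[k] - closes[k - 1] for k in range(i - n + 1, i + 1)]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def ema(closes, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [float(closes[0])]
    for c in closes[1:]:
        out.append(alpha * c + (1 - alpha) * out[-1])
    return out

def build_rows(closes):
    """One row per day that has a next-day close; None marks warm-up gaps."""
    ema12, ema26 = ema(closes, 12), ema(closes, 26)
    rows = []
    for i in range(len(closes) - 1):  # last day has no next-day return
        future_return = closes[i + 1] / closes[i] - 1.0
        rows.append({
            "return_1d": pct_return(closes, i, 1),
            "return_3d": pct_return(closes, i, 3),
            "return_5d": pct_return(closes, i, 5),
            "sma_5": sma(closes, i, 5),
            "sma_10": sma(closes, i, 10),
            "rsi_14": rsi(closes, i, 14),
            "macd": ema12[i] - ema26[i],
            "future_return": future_return,  # kept for the backtest
            "direction": 1 if future_return > 0 else 0,  # classification label
        })
    return rows

# Toy closes (hypothetical data).
closes = [100, 101, 102, 101, 103, 104, 103, 105, 106, 107, 106, 108, 109, 110, 109, 111]
rows = build_rows(closes)
```

Rows with a None feature fall inside the warm-up window and would normally be dropped before training.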

Model Performance (holdout test)

  • RandomForestClassifier – accuracy 0.4462, precision 0.5909, recall 0.0909 (CV accuracy 0.49). Feature importance leaders: return_3d (0.1609), rsi_14 (0.1524), macd (0.1498).
  • LogisticRegression – accuracy 0.4303, precision 0.5000, recall 0.0839 (CV accuracy 0.49).
  • Artifacts: outputs/metrics.json, confusion matrices (outputs/confusion_*.png), and importances (outputs/feature_importance_RandomForestClassifier.png).
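
Because accuracy, precision, and recall are reported together, they determine the share of up days in the holdout set. That share gives a baseline the accuracy figures can be judged against. The sketch below solves for it. It assumes precision and recall refer to the up class (label 1, scikit-learn's default positive label) and ignores rounding in the four-decimal figures. The function name is hypothetical.

```python
def implied_positive_rate(accuracy, precision, recall):
    """Back out the share of positive (up-day) labels from three metrics.

    With positive share p: TP = recall * p, FP = TP * (1 / precision - 1),
    TN = (1 - p) - FP, and accuracy = TP + TN, which rearranges to
    p = (1 - accuracy) / (1 - recall * (2 - 1 / precision)).
    """
    return (1 - accuracy) / (1 - recall * (2 - 1 / precision))

# Holdout metrics as reported above: (accuracy, precision, recall).
reported = {
    "RandomForestClassifier": (0.4462, 0.5909, 0.0909),
    "LogisticRegression": (0.4303, 0.5000, 0.0839),
}
for name, (acc, prec, rec) in reported.items():
    p = implied_positive_rate(acc, prec, rec)
    baseline = max(p, 1 - p)  # accuracy of always predicting the majority class
    print(f"{name}: up-day share {p:.3f}, majority baseline {baseline:.3f}, model {acc:.3f}")
```

Both rows imply an up-day share of about 0.57. A model that always predicted "up" would therefore reach roughly 0.57 accuracy, above both holdout scores of 0.4462 and 0.4303.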

Signals and Backtest

  • Signals: a 0.55 threshold applied to the RandomForest's predicted up-probabilities (outputs/signals_RandomForestClassifier.csv).
  • Strategy (equal-weight across tickers) vs. buy & hold:
    • Strategy total return: -0.34%, CAGR -0.34%, Sharpe -0.17, max drawdown -1.24%.
    • Buy & hold total return: +36.18%, CAGR +36.35%, Sharpe 2.26, max drawdown -8.89%.
  • Equity curves saved to outputs/equity_curve_signals_RandomForestClassifier.{csv,png}; backtest metrics in outputs/backtest_metrics_signals_RandomForestClassifier.json.
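
The sketch below outlines how the backtest bullets could be reproduced. It assumes a long/flat rule: long when the RandomForest probability is at least 0.55, otherwise flat. Returns are an equal-weight average of per-ticker next-day returns, rebalanced daily with no costs. Annualization uses 252 trading days per year, and the Sharpe ratio uses a zero risk-free rate. The project's exact conventions may differ, and all function names and data are hypothetical.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization convention

def signals(probs, threshold=0.55):
    """Long (1) when the up-probability reaches the threshold, else flat (0)."""
    return [1 if p >= threshold else 0 for p in probs]

def equal_weight_returns(positions, future_returns):
    """Mean of per-ticker position * next-day return for each day (no costs)."""
    tickers = list(future_returns)
    n_days = len(future_returns[tickers[0]])
    return [
        statistics.fmean(positions[t][d] * future_returns[t][d] for t in tickers)
        for d in range(n_days)
    ]

def backtest_metrics(daily_returns):
    """Total return, CAGR, annualized Sharpe (zero risk-free rate), max drawdown."""
    equity, peak, max_dd = 1.0, 1.0, 0.0  # peak starts at the initial capital
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1)
    sd = statistics.stdev(daily_returns)
    mean = statistics.fmean(daily_returns)
    return {
        "total_return": equity - 1,
        "cagr": equity ** (TRADING_DAYS / len(daily_returns)) - 1,
        "sharpe": mean / sd * math.sqrt(TRADING_DAYS) if sd > 0 else 0.0,
        "max_drawdown": max_dd,
    }

# Toy two-ticker example (hypothetical data, not the project's results).
probs = {"AAA": [0.60, 0.40, 0.70], "BBB": [0.50, 0.56, 0.90]}
future = {"AAA": [0.01, -0.02, 0.03], "BBB": [0.02, -0.01, -0.01]}
positions = {t: signals(p) for t, p in probs.items()}
metrics = backtest_metrics(equal_weight_returns(positions, future))
```

Under this annualization, a total return and a CAGR that nearly coincide, as in both reported rows, indicate a test window of roughly one trading year.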

Takeaways and Limitations

  • Predictive power is weak. The class imbalance is mild (about 57% up days in the test set, as implied by the reported accuracy, precision, and recall), yet both models trail a naive always-up baseline, and very low recall suggests they often stay flat and miss upside moves.
  • Lagged returns (momentum-style) and oscillators (RSI, MACD) rank highest in importance, but the forest's probabilities remain noisy.
  • There is no hyperparameter search, and the backtest ignores transaction costs and uses a simple equal-weight daily rebalance, so it likely understates realistic frictions.
  • Potential improvements: tune thresholds per ticker, expand features (volatility/volume trends), calibrate probabilities, and add walk-forward validation.
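
Two of the suggested improvements, walk-forward validation and per-ticker thresholds, fit together: choose a threshold using only past data, then apply it to the next unseen window. The minimal single-ticker sketch below assumes model probabilities are already available for every day (in practice the model would also be refit on each training window). It scores candidate thresholds by the summed, cost-free next-day returns of the long/flat rule. The helper names, candidate grid, and data are hypothetical. scikit-learn's TimeSeriesSplit is a ready-made alternative to the splitter.

```python
def walk_forward_splits(n_samples, initial_train, test_size, step=None):
    """Yield (train_idx, test_idx) pairs where training data always precedes test data.

    The training window expands from the first `initial_train` rows; each test
    window is the next `test_size` rows, and the split point advances by `step`.
    """
    step = step or test_size
    split = initial_train
    while split + test_size <= n_samples:
        yield list(range(split)), list(range(split, split + test_size))
        split += step

def best_threshold(probs, future_returns, candidates=(0.50, 0.55, 0.60, 0.65)):
    """Threshold whose long/flat rule earned the largest summed return (no costs)."""
    def summed_return(th):
        return sum(r for p, r in zip(probs, future_returns) if p >= th)
    return max(candidates, key=summed_return)

# Per-ticker use: tune on each training window, then apply to the next test window.
probs = [0.52, 0.58, 0.62, 0.70, 0.57, 0.62, 0.51, 0.68]
future = [-0.01, 0.02, -0.03, 0.01, 0.00, 0.02, -0.01, 0.01]
chosen = []
for train, test in walk_forward_splits(len(probs), initial_train=4, test_size=2):
    th = best_threshold([probs[i] for i in train], [future[i] for i in train])
    chosen.append(th)
    test_positions = [1 if probs[i] >= th else 0 for i in test]
```

Because each threshold is chosen before its test window is seen, the out-of-sample positions avoid the look-ahead bias a single full-period threshold would carry.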