This project investigates how effectively sentiment analysis and machine learning can predict the short-term (24-hour) price direction of major cryptocurrencies. As a test of the Efficient Market Hypothesis (EMH), the study explores whether social media sentiment, news impact scores, and technical indicators contain predictive signals that non-linear algorithms can exploit.
This work was completed as part of the CMP5367: Artificial Intelligence and Machine Learning module.
The project utilizes the Crypto Market Sentiment and Price Dataset (2025) sourced from Kaggle.
- Source: Kaggle Link
- Observations: 2,063 data points.
- Features: Social Sentiment Score, News Impact Score, Fear & Greed Index, RSI, Volatility, Market Cap, and Volume.
- Target: Binary Classification (1 = Bullish/Up, 0 = Bearish/Down).
- Target Engineering: Converted the continuous `% Price Change` column into a binary target.
- Cleaning: Verified data integrity and handled potential leakage features.
- Transformation:
  - One-Hot Encoding for categorical assets (e.g., Bitcoin, Ethereum).
  - Standard Scaling for numerical features to normalize disparate ranges (e.g., Volume vs. Sentiment Score).
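The target engineering and transformation steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's code: the column names (`asset`, `price_change_pct`, and the snake_case feature names), the helper names `make_target` and `build_preprocessor`, and the rule that only a change greater than 0 counts as Up are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# HYPOTHETICAL column names; the real CSV headers may differ.
CATEGORICAL = ["asset"]
NUMERIC = [
    "social_sentiment_score", "news_impact_score", "fear_greed_index",
    "rsi", "volatility", "market_cap", "volume",
]
TARGET_SOURCE = "price_change_pct"  # hypothetical name for "% Price Change"


def make_target(df: pd.DataFrame) -> pd.Series:
    """Binary target: 1 = Bullish/Up, 0 = Bearish/Down.

    ASSUMPTION: a change of exactly 0 is labelled Down.
    """
    return (df[TARGET_SOURCE] > 0).astype(int)


def build_preprocessor() -> ColumnTransformer:
    """One-hot encode the asset column and standard-scale the numeric features."""
    return ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("scale", StandardScaler(), NUMERIC),
    ])


# Toy rows for illustration only (random numbers, not the Kaggle data).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(4, len(NUMERIC))), columns=NUMERIC)
toy["asset"] = ["Bitcoin", "Ethereum", "Bitcoin", "Solana"]
toy[TARGET_SOURCE] = [1.2, -0.4, 0.0, 3.1]

y = make_target(toy)
# Drop the source column so the label cannot leak into the features.
features = toy.drop(columns=[TARGET_SOURCE])
X = build_preprocessor().fit_transform(features)
print(y.tolist())
print(X.shape)  # 3 one-hot columns + 7 scaled columns
```

In a real train/test workflow, the preprocessor would be fitted on the training rows only and then applied to the test rows, so that scaling statistics from the test set do not leak into training.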
Four distinct algorithm families were tested to compare linear vs. non-linear performance:
- Logistic Regression (Linear Baseline)
- Random Forest Classifier (Bagging Ensemble)
- XGBoost (Gradient Boosting Ensemble)
- Support Vector Machine (SVM) (Kernel-based)
To tune hyperparameters systematically, GridSearchCV with 3-fold cross-validation was applied to every candidate model (e.g., tree depth, learning rate, and the regularization parameter C).
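A minimal sketch of such a tuning loop, using scikit-learn's `GridSearchCV` with `cv=3`. The synthetic data from `make_classification` stands in for the preprocessed features, and the parameter grids are illustrative assumptions rather than the project's actual search spaces. XGBoost is left out so the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed feature matrix (NOT the project's data).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# ILLUSTRATIVE grids; the project's real search spaces are not given in this README.
candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"max_depth": [5, 10, None], "n_estimators": [50, 100]},
    ),
    "SVM": (SVC(kernel="rbf"), {"C": [0.1, 1.0, 10.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Exhaustive search over the grid, scored by mean accuracy across 3 folds.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(f"{name}: best CV accuracy {search.best_score_:.3f} with {search.best_params_}")
```

With an integer `cv` and a classifier, `GridSearchCV` uses stratified folds without shuffling. If the rows are in chronological order, passing `cv=TimeSeriesSplit(n_splits=3)` is one way to keep future rows out of the training folds. An `XGBClassifier` from the `xgboost` package follows the same fit/predict interface and could be added to `candidates` in the same way.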
The experiment showed that market sentiment has a very weak linear correlation with price movement. However, the tuned non-linear ensemble methods extracted a marginal predictive edge over random guessing (about 50% accuracy for a binary target).
| Model | Baseline Accuracy | Tuned Accuracy | Improvement (percentage points) |
|---|---|---|---|
| Random Forest | 49.1% | 52.1% 🏆 | +3.0 |
| XGBoost | 46.5% | 51.8% | +5.3 |
| SVM | 45.8% | 50.6% | +4.8 |
| Logistic Regression | 49.6% | 50.1% | +0.5 |
Key Finding: The Random Forest model with `max_depth=10` achieved the highest stability and accuracy, suggesting that regularization (here, limiting tree depth) is key when modeling high-noise financial data.
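Whether 52.1% accuracy is meaningfully above 50% depends on the size of the test set, which this README does not state. The sketch below computes a normal-approximation (Wald) 95% confidence interval for each tuned accuracy. The 20% hold-out split (about 413 rows) is an assumption, and `accuracy_ci` is a hypothetical helper.

```python
import math
from typing import Tuple


def accuracy_ci(accuracy: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for an accuracy on n samples."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se


# ASSUMPTION: a 20% hold-out split of the 2,063 rows (about 413 test samples).
n_test = round(0.2 * 2063)

tuned_accuracy = {
    "Random Forest": 0.521,
    "XGBoost": 0.518,
    "SVM": 0.506,
    "Logistic Regression": 0.501,
}

for model, acc in tuned_accuracy.items():
    low, high = accuracy_ci(acc, n_test)
    verdict = "above" if low > 0.5 else "not distinguishable from"
    print(f"{model}: {acc:.1%} (95% CI {low:.1%} to {high:.1%}), {verdict} 50%")
```

Under that assumed split, even the Random Forest interval (roughly 47% to 57%) includes 50%, which is consistent with describing the edge as marginal.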
- Python 3.8+
- pip
- Clone the repository:

  ```bash
  git clone https://github.com/Amogh-007-Rin/AI-ML-Model-For-CryptoAnalysis
  cd AI-ML-Model-For-CryptoAnalysis
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the analysis. You can run the full analysis pipeline using the provided Python script or Jupyter Notebook:

  ```bash
  python main.py
  ```

  Or launch Jupyter:

  ```bash
  jupyter notebook notebooks/Crypto_Trend_Analysis.ipynb
  ```
```
AI-ML-Model-For-CryptoAnalysis/
├── Academic_Reports/   # Final academic report (D1.b)
├── dataset/            # Dataset files (CSV)
├── Final_Notebook/     # Jupyter notebooks with the complete project and ML model code (main execution script), plus a markdown file
├── Ideas/              # Model training ideas and project discussions proposed by group members
├── images/             # Plots and correlation heatmaps
├── Project-Brief/      # Project brief and sample reports
├── src/                # Source code for each project stage: dataset overview, EDA, pre-processing, model training and evaluation
├── README.md           # Project documentation
└── requirements.txt    # Python dependencies
```