🎓 Student Score Prediction System - End-to-End ML Project

📋 Project Overview

An end-to-end machine learning project that predicts a student's writing score from five demographic and background attributes plus the student's math and reading scores. It covers the complete ML pipeline, from data ingestion through model training and evaluation to deployment behind a Flask web interface.

✨ Key Features

🤖 Multiple ML algorithms (Linear Regression, Random Forest, XGBoost, CatBoost, etc.)
📊 Advanced data preprocessing and feature engineering
🎯 Best model R² score: 0.88 (Linear Regression)
🌐 Modern Flask web interface with real-time predictions
📈 Interactive data visualizations and analysis
⚡ Sub-100ms inference time
🔄 Automated hyperparameter tuning with GridSearchCV

🎨 Project Screenshots & Dashboard

Web Interface Overview

Dashboard with Project Overview and Dataset Information

Interactive Visualizations and Data Analysis

Model Performance and Metrics Display

🏗️ Project Architecture

Student Score Prediction System
│
├── Data Layer
│   ├── Raw Data (Students.csv)
│   ├── Processed Data (train.csv, test.csv)
│   └── Artifacts Storage
│
├── ML Pipeline
│   ├── Data Ingestion
│   ├── Data Transformation & Preprocessing
│   ├── Model Training
│   ├── Model Evaluation
│   └── Best Model Selection
│
└── Web Interface
    ├── Flask Backend
    ├── HTML/CSS/JavaScript Frontend
    └── Real-time Prediction API

🛠️ Technology Stack

Backend Technologies

┌─────────────────────────────────────────────┐
│         Machine Learning Stack              │
├─────────────────────────────────────────────┤
│  Python 3.8+          - Core Language       │
│  Scikit-learn         - ML Algorithms       │
│  XGBoost              - Gradient Boosting   │
│  CatBoost             - Categorical Boost   │
│  Pandas               - Data Processing     │
│  NumPy                - Numerical Computing │
│  Pickle/Joblib        - Model Serialization │
└─────────────────────────────────────────────┘

Web Framework

┌─────────────────────────────────────────────┐
│         Web Development Stack               │
├─────────────────────────────────────────────┤
│  Flask                - Web Framework       │
│  Jinja2               - Template Engine     │
│  HTML5/CSS3           - Frontend Design     │
│  JavaScript (ES6)     - Client Interactivity│
│  Bootstrap            - Responsive UI       │
└─────────────────────────────────────────────┘

Data & Visualization

┌─────────────────────────────────────────────┐
│      Data & Visualization Libraries         │
├─────────────────────────────────────────────┤
│  Matplotlib           - Static Plots        │
│  Seaborn              - Statistical Charts  │
│  Plotly               - Interactive Graphs  │
│  Pandas Profiling     - Data Reports        │
└─────────────────────────────────────────────┘

📊 Project Structure

MlProject/
├── app.py                      # Flask application entry point
├── README.md                   # Project documentation
├── requirements.txt            # Python dependencies
├── setup.py                    # Package setup configuration
│
├── artifacts/                  # Generated model files
│   ├── train.csv              # Training dataset
│   ├── test.csv               # Testing dataset
│   ├── data.csv               # Raw dataset
│   └── model.pkl              # Trained model
│
├── src/                        # Source code
│   ├── __init__.py
│   ├── exception.py            # Custom exceptions
│   ├── logger.py               # Logging configuration
│   ├── utils.py                # Utility functions
│   │
│   ├── components/
│   │   ├── data_ingestion.py   # Load & split data
│   │   ├── data_transformation.py # Preprocessing
│   │   └── model_trainer.py    # Model training
│   │
│   └── pipeline/
│       ├── train_pipeline.py   # Training workflow
│       └── predict_pipeline.py # Prediction workflow
│
├── templates/                  # HTML templates
│   ├── index.html             # Dashboard
│   └── home.html              # Home page
│
├── notebook/                   # Jupyter notebooks
│   ├── Model Training.ipynb    # Model development
│   └── problemstatement.ipynb  # Problem analysis
│
└── logs/                       # Application logs

🔄 ML Pipeline Flow

START
  │
  ├─→ [Data Ingestion]
  │    └─→ Load Students.csv (1000+ records)
  │
  ├─→ [Train/Test Split] (80/20)
  │
  ├─→ [Data Preprocessing]
  │    ├─→ Handle Missing Values
  │    ├─→ Categorical Encoding (One-Hot/Label)
  │    ├─→ Feature Scaling (StandardScaler)
  │    └─→ Outlier Detection
  │
  ├─→ [Feature Engineering]
  │    └─→ Advanced Feature Creation
  │
  ├─→ [Model Training]
  │    ├─→ Random Forest Regressor
  │    ├─→ Gradient Boosting Regressor
  │    ├─→ XGBRegressor
  │    ├─→ CatBoost Regressor
  │    ├─→ Decision Tree Regressor
  │    ├─→ KNN Regressor
  │    ├─→ AdaBoost Regressor
  │    └─→ Linear Regression ⭐ BEST
  │
  ├─→ [Hyperparameter Tuning]
  │    └─→ GridSearchCV (5-Fold CV)
  │
  ├─→ [Model Evaluation]
  │    ├─→ R² Score: 0.88
  │    ├─→ MAE: 2.8 points
  │    ├─→ RMSE: 3.2 points
  │    └─→ Cross-Validation Scores
  │
  ├─→ [Best Model Selection]
  │    └─→ Linear Regression (R² = 0.88)
  │
  └─→ [Deployment]
       └─→ Flask Web Interface
END

📈 Model Performance Metrics

🏆 Best Performing Model: Linear Regression

| Metric | Value |
| --- | --- |
| R² Score | 0.88 (reported as 88% accuracy) |
| Mean Absolute Error (MAE) | 2.8 points |
| Root Mean Squared Error (RMSE) | 3.2 points |
| Training Time | < 1 second |
| Inference Time | < 10 ms per prediction |
| Cross-Validation Score | 0.87 (5-fold) |
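
For reference, the three error metrics above can be computed directly from held-out predictions. The sketch below uses only the standard library; the function name `regression_metrics` and the sample scores are illustrative and are not taken from the project code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², MAE and RMSE for two equal-length sequences of scores."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up writing scores, for illustration only.
metrics = regression_metrics([70, 80, 90, 60], [72, 78, 91, 63])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same predictions, which matches the reported 3.2 versus 2.8 points.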

📊 Dataset Information

Columns (8 total: 7 input features and 1 target)

| Column | Type | Description | Range |
| --- | --- | --- | --- |
| Gender | Categorical | Male/Female | 2 categories |
| Race/Ethnicity | Categorical | Groups A-E | 5 categories |
| Parental Education | Categorical | Education levels | 6 levels |
| Lunch Type | Categorical | Standard/Free-Reduced | 2 categories |
| Test Preparation | Categorical | None/Completed | 2 states |
| Math Score | Numerical | Math test score | 0-100 |
| Reading Score | Numerical | Reading test score | 0-100 |
| Writing Score | Numerical | Target variable | 0-100 |

Dataset Statistics

- Total Records: 1,000+
- Training Set: 80% (800+ records)
- Testing Set: 20% (200+ records)
- Data Completeness: 100% (no missing values)
- Source: Kaggle - Students Performance in Exams

🚀 Installation & Setup

Prerequisites

- Python 3.8 or higher
- pip (Python package manager)
- Virtual environment (recommended)

Step 1: Clone the Repository

git clone https://github.com/yourusername/MlProject.git
cd MlProject

Step 2: Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Run the Application

python app.py

The application will be available at: http://localhost:5000

💻 Usage

Making Predictions via Web Interface

Open http://localhost:5000 in your browser
1. Open http://localhost:5000 in your browser.
2. Fill in the student information form:
   - Select gender and race/ethnicity
   - Choose parental education level
   - Select lunch type and test preparation status
   - Enter math and reading scores
3. Click "Predict Result" to get the writing score prediction.
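
Before form values reach the model, it can help to check them against the categories and ranges listed in the dataset table. The following is a minimal sketch of such a check, not the project's actual validation code: `validate_form`, `ALLOWED_VALUES` and the exact value strings (for example `"free/reduced"`) are assumptions. Parental education is left out because its six levels are not enumerated in this README.

```python
# Allowed values follow the dataset table in this README; the exact strings
# are assumptions about how the form encodes each choice.
ALLOWED_VALUES = {
    "gender": {"male", "female"},
    "race_ethnicity": {f"group {g}" for g in "ABCDE"},
    "lunch": {"standard", "free/reduced"},
    "test_preparation_course": {"none", "completed"},
}
SCORE_FIELDS = ("math_score", "reading_score")


def validate_form(form: dict) -> list:
    """Return a list of problems; an empty list means the input is usable."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = form.get(field)
        if value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    for field in SCORE_FIELDS:
        try:
            score = float(form.get(field))
        except (TypeError, ValueError):
            problems.append(f"{field}: not a number")
            continue
        if not 0 <= score <= 100:
            problems.append(f"{field}: {score} is outside 0-100")
    return problems


good_form = {
    "gender": "male",
    "race_ethnicity": "group A",
    "lunch": "standard",
    "test_preparation_course": "completed",
    "math_score": "85",
    "reading_score": "90",
}
print(validate_form(good_form))                            # no problems
print(validate_form({**good_form, "math_score": "120"}))   # score out of range
```

Returning a list of messages rather than raising on the first error lets the web page show every problem with the form at once.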

API Usage

from src.pipeline.predict_pipeline import PredictionPipeline

# Create predictor
predictor = PredictionPipeline()

# Make prediction
input_data = {
    'gender': 'male',
    'race_ethnicity': 'group A',
    'parental_education': "bachelor's degree",
    'lunch': 'standard',
    'test_preparation_course': 'completed',
    'math_score': 85,
    'reading_score': 90
}

prediction = predictor.predict(input_data)
print(f"Predicted Writing Score: {prediction}")

🔧 Training the Model

Retraining with New Data

python -c "from src.pipeline.train_pipeline import TrainPipeline; pipeline = TrainPipeline(); pipeline.main()"

Evaluating Model Performance

python -c "from src.pipeline.train_pipeline import TrainPipeline; pipeline = TrainPipeline(); pipeline.evaluate_model()"

📚 Project Components

1. Data Ingestion (`src/components/data_ingestion.py`)

- Loads raw student data
- Splits into training and testing sets
- Handles data validation
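
The three ingestion steps can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of `data_ingestion.py`: the helper names, the column names in `REQUIRED_COLUMNS` and the in-memory sample are all hypothetical, and the real component may well use pandas and scikit-learn's `train_test_split` instead.

```python
import csv
import io
import random

# Assumed column names; the real CSV header may differ.
REQUIRED_COLUMNS = {"gender", "math_score", "reading_score", "writing_score"}


def load_rows(handle):
    """Read CSV rows as dicts and fail early if an expected column is missing."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle reproducibly, then hold out `test_size` of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Tiny in-memory stand-in for the raw CSV file.
sample = io.StringIO(
    "gender,math_score,reading_score,writing_score\n"
    + "\n".join(f"female,{60 + i},{65 + i},{64 + i}" for i in range(10))
)
train_rows, test_rows = split_rows(load_rows(sample))
print(len(train_rows), len(test_rows))  # 8 2
```

Fixing the seed keeps the 80/20 split reproducible between runs, so metrics from different training runs stay comparable.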

2. Data Transformation (`src/components/data_transformation.py`)

- Categorical encoding
- Feature scaling
- Missing value handling
- Outlier detection
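
One common way to combine encoding, scaling and missing-value handling in scikit-learn is a `ColumnTransformer` with one sub-pipeline per column type. The sketch below assumes that approach; the column names, the imputation strategies and the four sample rows are illustrative rather than taken from `data_transformation.py`, and outlier detection is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_cols = ["gender", "lunch"]           # assumed column names
numeric_cols = ["math_score", "reading_score"]   # assumed column names

categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# sparse_threshold=0 forces a dense array, which is easier to inspect.
preprocessor = ColumnTransformer(
    [("cat", categorical_steps, categorical_cols),
     ("num", numeric_steps, numeric_cols)],
    sparse_threshold=0,
)

# Made-up rows, including one missing math score.
frame = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "math_score": [85, 60, np.nan, 72],
    "reading_score": [90, 66, 75, 70],
})
features = preprocessor.fit_transform(frame)
print(features.shape)  # two one-hot columns per categorical field + two scaled scores
```

Fitting the preprocessor on the training split only, then reusing it for test data and live predictions, avoids leaking test-set statistics into the scaler.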

3. Model Training (`src/components/model_trainer.py`)

- Trains multiple ML algorithms
- Performs hyperparameter tuning
- Selects best performing model
- Saves model artifacts
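
A rough sketch of the train, tune and select loop follows, assuming scikit-learn's `GridSearchCV` with 5-fold cross-validation as mentioned in the pipeline flow. The two candidate models, the parameter grid and the synthetic data are placeholders chosen to keep the example fast; they are not the project's actual configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: the target is a noisy linear mix of two scores.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = 0.4 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(0, 3, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each candidate pairs an estimator with its (possibly empty) parameter grid.
candidates = {
    "Linear Regression": (LinearRegression(), {}),
    "Decision Tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [2, 4, 6]}),
}

test_scores = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    # Score the tuned model on the held-out split (R² for regressors).
    test_scores[name] = search.best_estimator_.score(X_test, y_test)

best_name = max(test_scores, key=test_scores.get)
print(best_name, round(test_scores[best_name], 3))
```

When the target depends almost linearly on the inputs, as writing scores plausibly do on reading and math scores, a plain linear model can match or beat tree ensembles, which is consistent with Linear Regression being reported as the best model here.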

4. Prediction Pipeline (`src/pipeline/predict_pipeline.py`)

- Loads trained model
- Processes input data
- Generates predictions
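
The load, process and predict sequence might look like the sketch below. It assumes the model is stored with `pickle` (as `artifacts/model.pkl` suggests) and uses a tiny model fitted on synthetic rows as a stand-in for the trained artifact; the two-feature input is a simplification, since the real pipeline also handles the categorical fields.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for training: synthetic rows where writing = (math + reading) / 2.
X_train = np.array([[0, 0], [100, 100], [50, 70], [80, 40]], dtype=float)
y_train = X_train.mean(axis=1)
model = LinearRegression().fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"  # plays the role of artifacts/model.pkl

    # Save the fitted model, as the training step would.
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)

    # Load it back, as the prediction pipeline would at request time.
    with open(model_path, "rb") as fh:
        loaded = pickle.load(fh)

# Turn one input record into the 2-D array shape that predict() expects.
record = {"math_score": 85, "reading_score": 90}
features = np.array([[record["math_score"], record["reading_score"]]], dtype=float)
prediction = float(loaded.predict(features)[0])
print(f"Predicted Writing Score: {prediction:.1f}")
```

Only unpickle model files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.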

📊 Visualizations Available

The dashboard includes 9 interactive visualizations:

Visualization Gallery

1. Student Demographics Distribution
2. Score Distributions and Correlations
3. Model Performance Metrics
4. Feature Importance Analysis
5. Prediction Accuracy Charts
6. Data Quality Reports
7. Score Distribution Analysis
8. Feature Correlations Heatmap
9. Model Comparison Dashboard

🔐 Error Handling & Logging

The project includes:

✅ Custom exception handling
✅ Detailed logging system
✅ Data validation
✅ Model validation
✅ Error recovery mechanisms
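
As an illustration of how custom exceptions and logging can work together, the sketch below wraps a low-level error in a project-specific exception that records where the failure happened, and logs it before re-raising. `PipelineError`, `safe_divide` and the logger name are hypothetical; the actual classes in `src/exception.py` and `src/logger.py` may be organized differently.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("student_score")  # hypothetical logger name


class PipelineError(Exception):
    """Wraps an underlying error with the file and line where it was raised."""

    def __init__(self, error: Exception):
        tb = error.__traceback__
        # Walk to the innermost frame, where the error actually occurred.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        if tb is not None:
            location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        else:
            location = "unknown location"
        super().__init__(f"{type(error).__name__} at {location}: {error}")
        self.original = error


def safe_divide(a, b):
    """Toy pipeline step: log the failure, then re-raise it as PipelineError."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        logger.error("Division failed: %s", exc)
        raise PipelineError(exc) from exc


try:
    safe_divide(1, 0)
except PipelineError as err:
    print(err)
```

Using `raise ... from exc` keeps the original traceback chained to the new exception, so the log and the stack trace point at the same root cause.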

🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Your Name - End-to-End ML Project Developer

🙋 Support & Contact

For questions or issues, please:

- Open an issue on GitHub
- Contact: your.email@example.com
- Check existing documentation

📞 Additional Resources

Made with ❤️ for the ML Community
