| 🎓 Pure Implementation | 🧮 Multiple Algorithms | 📈 Advanced Features | 📝 Detailed Logs |
|---|---|---|---|
| Built from scratch using only NumPy | Batch, SGD & Mini-Batch GD | Polynomial features & L1 reg | Complete failure-to-success journey |
```mermaid
graph LR
    A[📊 Load Data] --> B[🔧 Feature Engineering]
    B --> C[📏 Normalization]
    C --> D[🎯 Train Model]
    D --> E{Choose Method}
    E -->|Batch GD| F[📊 R²: 95.84%]
    E -->|Stochastic GD| G[📊 R²: 98.50%]
    E -->|Mini-Batch GD| H[🏆 R²: 98.74%]
    F --> I[📈 Evaluate]
    G --> I
    H --> I
    I --> J[✨ Predictions]
    style A fill:#e1f5ff
    style H fill:#90EE90
    style J fill:#FFD700
```
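For readers who want the gist of the pipeline before diving into the source, here is a minimal pure-NumPy sketch of the batch gradient descent training loop (simplified, with hypothetical names — see `linear_regression.py` for the real implementation):

```python
import numpy as np

class TinyLinearRegression:
    """Minimal batch gradient descent sketch (not the repo's full API)."""
    def __init__(self, learn_rate=0.01, iters=1000):
        self.learn_rate, self.iters = learn_rate, iters

    def fit(self, X, y):
        m, n = X.shape
        self.w, self.b = np.zeros(n), 0.0
        for _ in range(self.iters):
            err = X @ self.w + self.b - y              # residuals (y_hat - y)
            self.w -= self.learn_rate * (X.T @ err) / m  # gradient of MSE w.r.t. w
            self.b -= self.learn_rate * err.mean()       # gradient w.r.t. bias
        return self

    def predict(self, X):
        return X @ self.w + self.b
```

The real model adds polynomial features, normalization, L1 regularization, and the stochastic/mini-batch variants on top of this core loop.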
- ✨ Features
- 🚀 Quick Start
- 📦 Installation
- 💡 Usage Examples
- 📁 Project Structure
- 🧪 The Journey
- 📊 Performance Metrics
- 🔬 Mathematical Foundation
- 📈 Visualizations
- 🧰 Tech Stack
- 🤝 Contributing
- 📝 License
```bash
# 1️⃣ Clone the repository
git clone https://github.com/willow788/Linear-Regression-model-from-scratch.git
cd Linear-Regression-model-from-scratch

# 2️⃣ Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3️⃣ Install dependencies
pip install -r requirements.txt

# 4️⃣ Run the model
python main.py

# 🎉 That's it! Your model is training!
```

🐳 Docker Quick Start (Click to expand)
```bash
# Build the image
docker build -t linear-regression .

# Run the container
docker run -it -p 8888:8888 linear-regression

# Or use docker-compose
docker-compose up
```

```python
from linear_regression import LinearRegression
from data_preprocessing import load_and_preprocess_data

# Load your data
X_train, X_test, y_train, y_test = load_and_preprocess_data('Advertising.csv')

# Create and train model
model = LinearRegression(
    learn_rate=0.02,
    iter=50000,
    method='batch',
    l1_reg=0.1
)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
print(f"✨ Model R² Score: {model.evaluate(y_test, predictions):.4f}")
```

```python
methods = {
    '📊 Batch GD': {'method': 'batch', 'iter': 50000},
    '⚡ Stochastic GD': {'method': 'stochastic', 'iter': 50},
    '🔄 Mini-Batch GD': {'method': 'mini-batch', 'iter': 1000, 'batch_size': 16}
}

for name, params in methods.items():
    model = LinearRegression(learn_rate=0.01, **params)
    model.fit(X_train, y_train)
    score = calculate_r2(y_test, model.predict(X_test))
    print(f"{name}: R² = {score:.4f}")
```

```python
from model_evaluation import cross_validation_score

# Perform 5-fold cross-validation
cv_score = cross_validation_score(X, y, k=5)
print(f"🎯 Cross-Validated R² Score: {cv_score:.4f}")
```

```python
from visualization import (
    plot_loss_convergence,
    plot_residuals,
    plot_actual_vs_predicted
)

# Plot loss over iterations
plot_loss_convergence(model.loss_history)

# Analyze residuals
plot_residuals(y_test, predictions)

# Compare actual vs predicted
plot_actual_vs_predicted(y_test, predictions)
```

```
📦 Linear-Regression-model-from-scratch/
│
├── 📂 Version- 1/                     # 🔴 Initial experiments
│   ├── 📓 experiment_log.txt          # The negative R² saga
│   └── 📊 Raw jupyter Notebook/
│
├── 📂 Version- 2/                     # 🟡 Feature engineering
│   ├── 📓 experiment_log.txt
│   └── 📊 Raw jupyter Notebook/
│
├── 📂 Version- 3/                     # 🟠 Normalization fixes
│   ├── 📓 experiment_log.txt
│   └── 📊 Raw jupyter Notebook/
│
├── 📂 Version- 9/                     # 🟢 Production ready!
│   ├── 📊 Raw jupyter Notebook/
│   │   └── 📓 sales.ipynb             # Complete analysis
│   └── 🐍 Python Files/
│       ├── 📄 data_preprocessing.py   # Data pipeline
│       ├── 📄 linear_regression.py    # Core model
│       ├── 📄 model_evaluation.py     # Metrics & CV
│       ├── 📄 visualization.py        # Plotting utils
│       ├── 📄 main.py                 # Main script
│       └── 📄 config.py               # Configuration
│
├── 🧪 tests/                          # Test suite
│   ├── 📄 test_linear_regression.py
│   ├── 📄 test_data_preprocessing.py
│   ├── 📄 test_model_evaluation.py
│   ├── 📄 test_visualization.py
│   ├── 📄 test_integration.py
│   └── 📄 conftest.py
│
├── 📊 outputs/                        # Generated visualizations
│   ├── 🖼️ loss_convergence.png
│   ├── 🖼️ residual_plot.png
│   ├── 🖼️ correlation_matrix.png
│   ├── 🖼️ actual_vs_predicted.png
│   └── 🖼️ feature_importance.png
│
├── 📊 Advertising.csv                 # Dataset
├── 📋 requirements.txt                # Dependencies
├── 📋 requirements-dev.txt            # Dev dependencies
├── 🐳 Dockerfile                      # Container config
├── 🐳 docker-compose.yml              # Orchestration
├── ⚙️ Makefile                        # Utility commands
├── 📖 README.md                       # You are here!
├── 📖 INSTALL.md                      # Installation guide
└── 📜 LICENSE                         # MIT License
```
| Version | R² Score | Key Learnings |
|---|---|---|
| 🔴 Version 1: The Crisis | -18.77 😱 | Problems discovered. Breakthrough: "Failure teaches more than success ever could" |
| 🟡 Version 2: Engineering | ~0.60 📈 | Improvements made |
| 🟠 Version 3: Refinement | ~0.85 📊 | Progress |
| 🟢 Version 9: Production | 0.9874 🏆 | Final optimizations |
R² Score Evolution

```
  1.0 ┤                              ████ 🏆
  0.9 ┤                      ████████
  0.8 ┤                 █████████
  0.7 ┤            █████████
  0.6 ┤        ████████
  0.5 ┤    ████████
  0.0 ┼──────────────────────────────────────────────────────────►
 -1.0 ┤███                                              Iterations
-10.0 ┤███ 😱
-18.0 ┤███
       V1      V2      V3      V4-V8      V9
```
| Method | Test R² | Train R² | RMSE | MAE | Training Time |
|---|---|---|---|---|---|
| 📊 Batch GD | 0.9584 | 0.9509 | 0.2249 | 0.1533 | ~45s |
| ⚡ Stochastic GD | 0.9850 | 0.9848 | 0.1352 | 0.1118 | ~5s |
| 🔄 Mini-Batch GD | 0.9874 🏆 | 0.9860 | 0.1238 | 0.1011 | ~12s |
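The three metrics in the table follow directly from their textbook definitions. A NumPy sketch (the helper names are illustrative; the repo's `model_evaluation.py` may differ):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))
```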
| Fold | R² Score | Status |
|:----:|:--------:|:------:|
| 1 | 0.9870 | ✅ |
| 2 | 0.9860 | ✅ |
| 3 | 0.9925 | ✅ 🏆 |
| 4 | 0.9867 | ✅ |
| 5 | 0.9690 | ✅ |
| Mean | 0.9842 | ✨ |
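The fold scores above come from the repo's `cross_validation_score`. As a sketch of how generic k-fold R² scoring works (function and parameter names here are hypothetical, not the repo's API):

```python
import numpy as np

def kfold_r2(X, y, fit_predict, k=5, seed=0):
    """k-fold CV sketch; fit_predict(X_tr, y_tr, X_te) returns predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))  # shuffle once
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1 - ss_res / ss_tot)                 # R² on held-out fold
    return float(np.mean(scores))
```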
📐 Linear Regression Equation

$$\hat{y} = \mathbf{X}\mathbf{w} + b$$

Where:
- $\hat{y}$ = predicted values
- $\mathbf{X}$ = feature matrix
- $\mathbf{w}$ = weight vector
- $b$ = bias term

🎯 Loss Function (with L1 Regularization)

$$J(\mathbf{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 + \lambda\sum_{j}|w_j|$$

Where:
- $m$ = number of training samples
- $\lambda$ = L1 regularization parameter

📊 Gradient Descent Update Rules (Click to expand)

Weight Update:

$$\mathbf{w} \leftarrow \mathbf{w} - \alpha\left(\frac{1}{m}\mathbf{X}^{\top}(\hat{\mathbf{y}} - \mathbf{y}) + \lambda\,\text{sign}(\mathbf{w})\right)$$

Bias Update:

$$b \leftarrow b - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)$$

Parameters:
- $\alpha$ = learning rate
- $\lambda$ = L1 regularization parameter
- $\text{sign}(\mathbf{w})$ = sign function for L1 penalty
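A single update step, translated literally into NumPy (a sketch with names chosen for illustration, not taken from the repo):

```python
import numpy as np

def l1_gd_step(w, b, X, y, alpha=0.01, lam=0.1):
    """One gradient descent step for MSE + L1 penalty (sketch)."""
    m = len(y)
    err = X @ w + b - y                                    # (y_hat - y)
    w_new = w - alpha * ((X.T @ err) / m + lam * np.sign(w))
    b_new = b - alpha * err.mean()                         # bias is not penalized
    return w_new, b_new
```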
🔢 Polynomial Feature Expansion (Click to expand)

Original Features: $x_1$ (TV), $x_2$ (Radio), $x_3$ (Newspaper)

Expanded to 9 features:

| Feature # | Expression | Description |
|---|---|---|
| 1 | $x_1$ | Original TV budget |
| 2 | $x_2$ | Original Radio budget |
| 3 | $x_3$ | Original Newspaper budget |
| 4 | $x_1^2$ | Quadratic TV effect |
| 5 | $x_2^2$ | Quadratic Radio effect |
| 6 | $x_3^2$ | Quadratic Newspaper effect |
| 7 | $x_1 x_2$ | Interaction effect (TV × Radio) |
| 8 | $x_1 x_3$ | Interaction effect (TV × Newspaper) |
| 9 | $x_2 x_3$ | Interaction effect (Radio × Newspaper) |
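The 3 → 9 expansion can be sketched in a few lines (the helper name is hypothetical; see `data_preprocessing.py` for the actual pipeline):

```python
import numpy as np

def expand_features(X):
    """Expand [TV, Radio, News] columns into 9: originals, squares, pairwise products."""
    tv, radio, news = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([
        tv, radio, news,                 # 1-3: original budgets
        tv**2, radio**2, news**2,        # 4-6: quadratic effects
        tv*radio, tv*news, radio*news,   # 7-9: interaction effects
    ])
```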
- Loss convergence: smooth convergence to the global minimum
- Residual plot: random scatter indicates a good fit
- Actual vs. predicted: points close to the diagonal line
- Correlation matrix: feature relationships visualized
| Attribute | Details |
|---|---|
| 📁 Source | Kaggle / UCI ML Repository |
| 📊 Samples | 200 observations |
| 🔢 Features | TV, Radio, Newspaper (advertising budgets in $1000s) |
| 🎯 Target | Sales (in $1000s of units) |
| ✅ Quality | No missing values |
| 📈 Correlation | TV (0.78), Radio (0.58), Newspaper (0.23) with Sales |
📊 Sample Data Preview (Click to expand)

```
      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3    9.3
3  151.5   41.3       58.5   18.5
4  180.8   10.8       58.4   12.9
```
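The feature/Sales correlations quoted in the table can be reproduced with `np.corrcoef` once the CSV columns are loaded; a sketch (the function name is illustrative):

```python
import numpy as np

def feature_correlations(X, y, names=('TV', 'Radio', 'Newspaper')):
    """Pearson correlation of each feature column with the target (sketch)."""
    return {name: float(np.corrcoef(X[:, i], y)[0, 1])
            for i, name in enumerate(names)}
```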
- 🔄 L2 Regularization (Ridge)
  - Compare with L1
  - Implement Elastic Net (L1 + L2)
- 🎯 Adaptive Learning Rates
  - Adam optimizer
  - RMSprop
  - Learning rate scheduling
- 🔍 Automated Hyperparameter Tuning
  - Grid Search
  - Random Search
  - Bayesian Optimization
- 📊 Extended Dataset Support
  - Boston Housing
  - California Housing
  - Custom datasets
- 🌐 Web Interface
  - Interactive predictions
  - Real-time visualization
  - Model playground
- 📱 API Development
  - REST API with FastAPI
  - Model serving
  - Deployment pipeline
- 📚 Educational Content
  - Step-by-step tutorials
  - Video explanations
  - Blog posts
```bash
# 📦 Installation
make install          # Install production dependencies
make install-dev      # Install dev dependencies

# 🧪 Testing
make test             # Run all tests
make test-cov         # Run tests with coverage report

# 🎨 Code Quality
make lint             # Run linters
make format           # Format code with black

# 🚀 Running
make run              # Run main script
make jupyter          # Start Jupyter notebook

# 🐳 Docker
make docker-build     # Build Docker image
make docker-run       # Run Docker container

# 🧹 Cleanup
make clean            # Remove generated files
```
Found a bug? Have an idea? Want to contribute?
```bash
# 1. Fork the repository

# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/Linear-Regression-model-from-scratch.git

# 3. Create a feature branch
git checkout -b feature/AmazingFeature

# 4. Make your changes and commit
git commit -m '✨ Add some AmazingFeature'

# 5. Push to your branch
git push origin feature/AmazingFeature

# 6. Open a Pull Request
```

Please ensure:

- ✅ Code passes all tests (`pytest`)
- ✅ Code is formatted (`make format`)
- ✅ Documentation is updated
- ✅ Commit messages are descriptive
- 📊 Dataset
- 🎓 Inspiration
- 🛠️ Tools
- 📚 Community
```
███████╗████████╗ █████╗ ██████╗ ████████╗██╗  ██╗██╗███████╗
██╔════╝╚══██╔══╝██╔══██╗██╔══██╗╚══██╔══╝██║  ██║██║██╔════╝
███████╗   ██║   ███████║██████╔╝   ██║   ███████║██║███████╗
╚════██║   ██║   ██╔══██║██╔══██╗   ██║   ██╔══██║██║╚════██║
███████║   ██║   ██║  ██║██║  ██║   ██║   ██║  ██║██║███████║
╚══════╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝╚══════╝
```
💙 Built with passion and ☕ by willow788
Learning by doing, one gradient descent at a time 🚀




