
KumarRaju1313/wine-quality-prediction

🍷 Wine Quality Analysis and Prediction

This project performs exploratory data analysis (EDA) on wine characteristics, then trains classification models to identify wine type and regression models to predict wine quality, comparing several machine learning algorithms for each task.




📂 Dataset

The dataset contains physicochemical properties and sensory quality ratings of red and white Portuguese "Vinho Verde" wines.

Each record includes attributes like:

  • Fixed acidity, volatile acidity, citric acid
  • Residual sugar, chlorides, free sulfur dioxide
  • Density, pH, alcohol content
  • Quality score (target)
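As a minimal sketch of how the red and white records might be combined into one table, the snippet below stacks two small frames and tags each row with its wine type. The toy values, the `combine_wines` helper, and the `type` column name are assumptions for illustration; the notebook may load and label the data differently.

```python
import pandas as pd

# Toy rows with made-up values, standing in for the real red and white tables.
red = pd.DataFrame({"alcohol": [9.4, 9.8], "pH": [3.51, 3.20], "quality": [5, 6]})
white = pd.DataFrame({"alcohol": [8.8, 10.1, 9.5], "pH": [3.00, 3.26, 3.30], "quality": [6, 6, 5]})


def combine_wines(red_df, white_df):
    """Stack both tables and record each row's origin in a 'type' column (hypothetical name)."""
    red_df = red_df.assign(type="red")
    white_df = white_df.assign(type="white")
    return pd.concat([red_df, white_df], ignore_index=True)


wines = combine_wines(red, white)
print(wines["type"].value_counts())
```

Keeping the wine type as an explicit column makes the later red vs white comparison and the per-type outlier handling straightforward group-by operations.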

📊 Exploratory Data Analysis (EDA)

EDA includes:

  • Distribution plots of numerical features
  • Correlation heatmaps
  • Outlier detection
  • Wine type comparison (red vs white)
  • Quality class distribution
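The EDA steps above can be sketched with pandas alone. This example uses randomly generated stand-in data, and the 1.5 × IQR rule is an assumed outlier criterion (the README does not say which one the notebook uses); the helper name `iqr_outlier_mask` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; not the real wine measurements.
wines = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, n),
    "pH": rng.normal(3.2, 0.15, n),
    "type": rng.choice(["red", "white"], n),
    "quality": rng.integers(3, 9, n),
})

# Pairwise correlations between numeric columns (the input to a heatmap).
corr = wines.select_dtypes("number").corr()

# How many wines fall into each quality score.
quality_counts = wines["quality"].value_counts().sort_index()

# Red vs white comparison: mean of each feature per wine type.
type_means = wines.groupby("type")[["alcohol", "pH"]].mean()


def iqr_outlier_mask(series, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


outliers = iqr_outlier_mask(wines["alcohol"])
print(corr.round(2))
print(quality_counts)
print(type_means.round(2))
print("alcohol outliers:", int(outliers.sum()))
```

The `corr` frame can be passed to `seaborn.heatmap` to draw the correlation heatmap, and `quality_counts.plot(kind="bar")` gives a quick view of the class distribution.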

🧹 Data Preprocessing

  • Imputation: Fill missing values using median
  • Outlier Handling: Clip extreme values per wine type
  • Feature Scaling: Normalize for distance-based models
  • Encoding: One-hot encode wine type (if needed)
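The four preprocessing steps can be sketched as follows. The toy values, the `type` column name, and the 5th/95th percentile clipping bounds are assumptions for illustration; the README does not state the thresholds the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value and one extreme value (made-up numbers).
wines = pd.DataFrame({
    "alcohol": [9.4, np.nan, 10.2, 9.9, 30.0, 8.8, 10.1, 9.5],
    "type": ["red", "red", "red", "red", "red", "white", "white", "white"],
})
features = ["alcohol"]

# 1. Imputation: fill missing values with the column median.
wines[features] = wines[features].fillna(wines[features].median())

# 2. Outlier handling: clip each feature to percentile bounds computed per wine type.
for col in features:
    grouped = wines.groupby("type")[col]
    lower = grouped.transform(lambda s: s.quantile(0.05))  # assumed bound
    upper = grouped.transform(lambda s: s.quantile(0.95))  # assumed bound
    wines[col] = wines[col].clip(lower=lower, upper=upper)

# 3. Feature scaling: zero mean, unit variance.
#    (With a train/test split, fit the scaler on the training rows only.)
wines[features] = StandardScaler().fit_transform(wines[features])

# 4. Encoding: one-hot encode the wine type.
prepared = pd.get_dummies(wines, columns=["type"])
print(prepared)
```

Clipping per wine type rather than globally keeps values that are normal for one type from being treated as extreme just because the other type has a different range.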

🧠 Model Training and Evaluation

🔸 Classification Models

  • Logistic Regression
  • Support Vector Machine (SVM)
  • Decision Tree
  • Random Forest
  • K-Nearest Neighbors (KNN)
  • Gaussian Naive Bayes

Evaluation Metrics: Accuracy, Precision, Recall, F1-score
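A minimal sketch of training the listed classifiers and computing the four metrics with scikit-learn is shown below. It uses a synthetic binary dataset from `make_classification` in place of the wine data, and the hyperparameters, split ratio, and scaling pipelines are assumptions rather than the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine features and a binary label.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale inputs for the models that are sensitive to feature magnitude.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gaussian Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Precision, recall, and F1 here are computed for the positive class of a binary label, which matches a two-class setup such as red vs white.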

🔸 Regression Models

  • Linear Regression
  • Huber Regressor
  • RANSAC Regressor
  • Theil-Sen Regressor
  • Decision Tree Regressor
  • Random Forest Regressor
  • Support Vector Regressor (SVR)
  • KNN Regressor

Evaluation Metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE)
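The regression comparison can be sketched the same way. This example fits the listed regressors on synthetic data from `make_regression` and reports MSE and RMSE, where RMSE is the square root of MSE. The dataset, hyperparameters, and split are assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import (
    HuberRegressor,
    LinearRegression,
    RANSACRegressor,
    TheilSenRegressor,
)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the wine features and a numeric target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Huber Regressor": make_pipeline(StandardScaler(), HuberRegressor(max_iter=1000)),
    "RANSAC Regressor": RANSACRegressor(random_state=0),
    "Theil-Sen Regressor": TheilSenRegressor(random_state=0),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "KNN Regressor": make_pipeline(StandardScaler(), KNeighborsRegressor()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {"mse": mse, "rmse": float(np.sqrt(mse))}

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.4f} RMSE={scores['rmse']:.4f}")
```

Because RMSE is in the same units as the target, it is the easier of the two to read against the quality scale: an RMSE of 0.6, for instance, means a typical error of a bit more than half a quality point.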


📈 Results

✅ Classification Results

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic Regression | 97.69% | 97.84% | 99.06% | 98.45% |
| SVM | 92.62% | 92.26% | 98.23% | 95.15% |
| Decision Tree | 98.38% | 98.96% | 98.85% | 98.90% |
| ✅ Random Forest | 99.62% | 99.58% | 99.90% | 99.74% |
| KNN | 95.62% | 96.69% | 97.39% | 97.04% |
| Gaussian Naive Bayes | 97.15% | 98.94% | 97.18% | 98.05% |

✅ Regression Results

| Model | MSE | RMSE |
|---|---|---|
| Linear Regression | 0.5300 | 0.7280 |
| Huber Regressor | 0.5373 | 0.7330 |
| RANSAC Regressor | 0.7293 | 0.8540 |
| Theil-Sen Regressor | 0.5428 | 0.7368 |
| Decision Tree Regressor | 0.7069 | 0.8408 |
| ✅ Random Forest Regressor | 0.3704 | 0.6086 |
| SVR | 0.6099 | 0.7809 |
| KNN Regressor | 0.6318 | 0.7948 |

🏁 Conclusion

  • Random Forest Classifier scored highest on every classification metric (99.62% accuracy, 99.74% F1-score).
  • Random Forest Regressor had the lowest error of all regressors when predicting quality ratings (MSE 0.3704, RMSE 0.6086).
  • Preprocessing and EDA significantly improved performance and interpretability.

▶️ How to Run

  1. Clone the repository
  2. Install dependencies
  3. Run the notebook: Wine_prediction.ipynb

📦 Dependencies

Install required packages:

pip install pandas numpy matplotlib seaborn scikit-learn

About

Classify wine type and predict quality using ML models like Random Forest and SVM on Vinho Verde dataset.
