Machine Learning Project: Diabetes Prediction

This repository contains a Jupyter Notebook implementation of a supervised machine learning pipeline that predicts whether a woman has diabetes. The project includes data loading, cleaning, exploratory data analysis (EDA), model training, evaluation, and model export steps.

Contents

  • notebook.ipynb - Main Jupyter Notebook with the full experiment (data processing, modeling, evaluation).
  • data/ (optional) - Place dataset files here if you keep them in the repo.

Project Overview

The goal of this project is to build a classification model to predict diabetes (positive/negative) for female patients using clinical features. Typical datasets used for this task include the Pima Indians Diabetes dataset (if not included, you can download it from public sources).

Key steps implemented in the notebook:

  1. Data loading and basic validation
  2. Exploratory data analysis (visualizations & summary statistics)
  3. Data preprocessing (imputation, scaling, encoding if needed)
  4. Feature selection / engineering
  5. Model training (e.g., Logistic Regression, Random Forest, XGBoost)
  6. Evaluation using metrics such as accuracy, precision, recall, F1-score, and ROC AUC
  7. Model serialization/export (joblib/pickle)
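The preprocessing, training, and evaluation steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the notebook's actual code: the synthetic data, the seed value, the median imputation strategy, and the choice of Logistic Regression are all assumptions made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # arbitrary seed, chosen only for illustration

# Synthetic stand-in data: 200 rows, 8 numeric features, binary target.
rng = np.random.default_rng(RANDOM_STATE)
X = rng.normal(size=(200, 8))
y = (X[:, 1] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing measurements

# Stratified split keeps the class balance similar in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=RANDOM_STATE
)

# Imputation and scaling are fitted on the training data only,
# then applied unchanged to the test data by the pipeline.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Wrapping the preprocessing in the pipeline avoids leaking test-set statistics into the imputer and scaler, and the whole object can later be exported as one artifact.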

How to run

Prerequisites: Python 3.8+ and the packages listed in the notebook (commonly: pandas, numpy, scikit-learn, matplotlib, seaborn, xgboost, joblib).

  1. Clone the repository:

    git clone https://github.com/joaogcfa/Machine-Learnig-Project.git
    cd Machine-Learnig-Project

  2. (Optional) Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate   # Linux / macOS
    venv\Scripts\activate      # Windows

  3. Install dependencies (example):

    pip install -r requirements.txt

    If a requirements.txt is not present, install commonly used packages:

    pip install pandas numpy scikit-learn matplotlib seaborn xgboost joblib notebook

  4. Open the notebook:

    jupyter notebook notebook.ipynb

Dataset

If the dataset is not included in this repository, you can download the Pima Indians Diabetes dataset from the UCI Machine Learning Repository or Kaggle. Place the file in a data/ folder, or update the paths in the notebook to match its location.

Typical feature columns include:

  • Pregnancies
  • Glucose
  • BloodPressure
  • SkinThickness
  • Insulin
  • BMI
  • DiabetesPedigreeFunction
  • Age
  • Outcome (target: 0 = no diabetes, 1 = diabetes)
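A basic validation pass over these columns might look like the sketch below. It rests on two assumptions that are not confirmed by this repository: that zeros in Glucose, BloodPressure, SkinThickness, Insulin, and BMI stand for missing measurements (a common convention for this dataset), and that the file lives at the hypothetical path data/diabetes.csv. The helper name validate_and_clean and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
TARGET = "Outcome"
# Assumption: a zero in these columns encodes a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def validate_and_clean(df):
    """Check the expected schema and mark implausible zeros as NaN."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df[TARGET].isin([0, 1]).all():
        raise ValueError("Outcome must contain only 0 and 1")
    cleaned = df.copy()
    cleaned[ZERO_AS_MISSING] = cleaned[ZERO_AS_MISSING].replace(0, np.nan)
    return cleaned


# Tiny made-up sample; with a real file this would be something like
# df = pd.read_csv("data/diabetes.csv")  # hypothetical path
sample = pd.DataFrame({
    "Pregnancies": [2, 0, 5],
    "Glucose": [120, 0, 95],
    "BloodPressure": [70, 64, 0],
    "SkinThickness": [25, 0, 30],
    "Insulin": [0, 0, 88],
    "BMI": [28.1, 31.4, 0.0],
    "DiabetesPedigreeFunction": [0.35, 0.62, 0.20],
    "Age": [33, 45, 29],
    "Outcome": [0, 1, 0],
})

cleaned = validate_and_clean(sample)
print(cleaned.isna().sum())  # per-column count of values treated as missing
```

Pregnancies is deliberately left out of the zero-as-missing list, since zero is a valid value there.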

Results and Metrics

See the notebook for model comparisons and evaluation metrics. Commonly reported metrics:

  • Accuracy
  • Precision / Recall / F1-score
  • ROC AUC
  • Confusion matrix visualization
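As a rough illustration of how these metrics can be computed with scikit-learn, the sketch below uses ten hand-written labels and scores rather than results from the notebook. The 0.5 decision threshold is an assumption; ROC AUC is computed from the raw scores, while the other metrics use the thresholded predictions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted probabilities for the positive class.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.8, 0.7, 0.35, 0.9, 0.55])

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = (y_score >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_score),  # uses scores, not labels
}

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
print(cm)
```

For a screening task like diabetes prediction, recall on the positive class is often worth reporting alongside accuracy, because a missed positive case usually costs more than a false alarm.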

Reproducibility

  • Set a random seed in the notebook to make experiments reproducible.
  • Save trained model artifacts (e.g., model.joblib) and preprocessing pipelines.
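Both points can be illustrated together. The sketch below trains the same pipeline twice with one fixed seed, then saves and reloads it with joblib. The seed value, the synthetic data, the Random Forest settings, and the temporary output location are assumptions for the example; only the model.joblib file name comes from the text above.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # arbitrary; any fixed integer works


def train(seed):
    """Generate synthetic data and fit a pipeline, both driven by one seed."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(120, 8))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ])
    model.fit(X, y)
    return model, X


# Two runs with the same seed should produce matching predictions.
model_a, X = train(SEED)
model_b, _ = train(SEED)
same_seed_agrees = np.allclose(model_a.predict_proba(X), model_b.predict_proba(X))

# Persist the whole pipeline (preprocessing + model) as a single artifact.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"
    joblib.dump(model_a, path)
    restored = joblib.load(path)

round_trip_agrees = np.array_equal(model_a.predict(X), restored.predict(X))
print(same_seed_agrees, round_trip_agrees)
```

Saving the fitted pipeline rather than the bare classifier keeps the scaler's learned parameters with the model, so new data is transformed the same way at prediction time.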

Contact

Repository owner: joaogcfa (https://github.com/joaogcfa)
