Machine Learning Project: Diabetes Prediction

This repository contains a Jupyter Notebook implementation of a supervised machine learning pipeline that predicts whether a woman has diabetes. The project includes data loading, cleaning, exploratory data analysis (EDA), model training, evaluation, and model export steps.

Project Overview

The goal of this project is to build a classification model to predict diabetes (positive/negative) for female patients using clinical features. Typical datasets used for this task include the Pima Indians Diabetes dataset (if not included, you can download it from public sources).

Key steps implemented in the notebook:

Data loading and basic validation
Exploratory data analysis (visualizations & summary statistics)
Data preprocessing (imputation, scaling, encoding if needed)
Feature selection / engineering
Model training (e.g., Logistic Regression, Random Forest, XGBoost)
Evaluation using metrics such as accuracy, precision, recall, F1-score, and ROC AUC
Model serialization/export (joblib/pickle)
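
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the notebook's actual code: it uses a synthetic dataset from scikit-learn's make_classification as a stand-in for the real data, and the choice of median imputation, standard scaling, and Logistic Regression is an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 8 numeric features, binary target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)

# Hold out 20% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing and model in one object, so the same transformations
# are applied at training time and at prediction time.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping preprocessing and the estimator in a single Pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into training.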

How to run

Prerequisites: Python 3.8+ and the packages listed in the notebook (commonly: pandas, numpy, scikit-learn, matplotlib, seaborn, xgboost, joblib).

Clone the repository:

git clone https://github.com/joaogcfa/Machine-Learnig-Project.git
cd Machine-Learnig-Project
(Optional) Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate # Linux / macOS
venv\Scripts\activate # Windows
Install dependencies (example):

pip install -r requirements.txt

If a requirements.txt is not present, install commonly used packages:

pip install pandas numpy scikit-learn matplotlib seaborn xgboost joblib notebook
Open the notebook:

jupyter notebook "Projeto .ipynb"

Dataset

If the dataset is not included in this repository, you can download the Pima Indians Diabetes dataset from the UCI Machine Learning Repository or Kaggle. Place the file in a data/ folder, or update the paths in the notebook to match its location.

Typical feature columns include:

Pregnancies
Glucose
BloodPressure
SkinThickness
Insulin
BMI
DiabetesPedigreeFunction
Age
Outcome (target: 0 = no diabetes, 1 = diabetes)
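
A basic validation pass over these columns might look like the sketch below. It assumes the commonly distributed version of the Pima dataset, in which a value of 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI stands for a missing measurement; whether the notebook handles zeros this way is not confirmed. The function name validate_and_clean and the sample rows are hypothetical and made up for illustration.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Assumption: a zero in these columns is a placeholder for "not measured".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def validate_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Check the schema and target values, then mark placeholder zeros as NaN."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if not df["Outcome"].isin([0, 1]).all():
        raise ValueError("Outcome must contain only 0 and 1")

    cleaned = df.copy()
    for col in ZERO_AS_MISSING:
        # mask() replaces entries where the condition holds with NaN.
        cleaned[col] = cleaned[col].mask(cleaned[col] == 0)
    return cleaned


# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Pregnancies": [2, 0, 5],
    "Glucose": [120, 0, 140],
    "BloodPressure": [70, 64, 0],
    "SkinThickness": [25, 0, 30],
    "Insulin": [0, 90, 130],
    "BMI": [28.5, 31.2, 0.0],
    "DiabetesPedigreeFunction": [0.35, 0.62, 0.21],
    "Age": [33, 27, 51],
    "Outcome": [0, 1, 1],
})

cleaned = validate_and_clean(sample)
print(cleaned.isna().sum())
```

With the real file, the sample frame would be replaced by something like pd.read_csv("data/diabetes.csv"), where the path is an assumption that depends on where the dataset is stored. Pregnancies is left untouched because zero is a legitimate value there.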

Results and Metrics

See the notebook for model comparisons and evaluation metrics. Commonly reported metrics:

Accuracy
Precision / Recall / F1-score
ROC AUC
Confusion matrix visualization
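
As a rough illustration of how these metrics are computed with scikit-learn, the sketch below scores a small set of hand-written labels and predicted probabilities. The values are invented for the example and are not results from the notebook; the 0.5 decision threshold is also an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented ground truth and predicted probabilities for the positive class.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.30, 0.70])

# Turn probabilities into hard labels with a 0.5 threshold.
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC AUC is computed from the probabilities, not the thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_prob),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

For a medical screening task, recall on the positive class is often worth reading alongside accuracy, since a missed case (false negative) and a false alarm do not carry the same cost.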

Reproducibility

Set a random seed in the notebook to make experiments reproducible.
Save trained model artifacts (e.g., model.joblib) and preprocessing pipelines.
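
Both points can be sketched together, assuming a scikit-learn estimator and joblib for serialization. The data is synthetic, the seed value 42 and the Random Forest settings are arbitrary choices for the example, and the model is written to a temporary directory rather than to the repository.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

SEED = 42  # arbitrary; any fixed integer works

# Synthetic stand-in data, generated with the same seed every run.
X, y = make_classification(n_samples=200, n_features=8, random_state=SEED)


def train(seed: int) -> RandomForestClassifier:
    """Fit a forest whose randomness is fully controlled by `seed`."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X, y)


# Two runs with the same seed should produce identical predictions.
model_a = train(SEED)
model_b = train(SEED)
same_predictions = np.array_equal(
    model_a.predict_proba(X), model_b.predict_proba(X)
)

# Save the fitted model and load it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model_a, path)
    restored = joblib.load(path)

round_trip_ok = np.array_equal(restored.predict(X), model_a.predict(X))

print("Same seed, same predictions:", same_predictions)
print("Restored model matches:", round_trip_ok)
```

If preprocessing is part of a scikit-learn Pipeline, dumping the whole pipeline object keeps the fitted imputer and scaler together with the model. Files written by joblib are generally tied to the library versions that produced them, so recording those versions alongside the artifact is a sensible habit.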

Contact

Repository owner: joaogcfa (https://github.com/joaogcfa)
