Machine Learning Diabetes Prediction

Algorithms Used:

Logistic Regression
Support Vector Classifier
K-Nearest Neighbors Classifier
Random Forest Classifier
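
As a rough illustration of how the four algorithms above can be compared side by side, the sketch below trains each one with scikit-learn. The data is synthetic (generated with make_classification as a stand-in, since the actual dataset is not included here), and the sample count, feature count, class ratio, and default hyperparameters are all assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: 8 features, roughly 35% positive class).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline so that the scaler
# is fit on the training data only.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbors Classifier": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # plain accuracy on the held-out split
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy alone can be misleading on an imbalanced dataset, so recall or F1 for the positive class is worth checking as well.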

Python Libraries Used:

NumPy
Pandas
Matplotlib
Seaborn
Scikit-learn

Exploratory Data Analysis and Visualization Summary:

There are no missing values in the dataset
The dataset is imbalanced
The correlation matrix shows no negative correlations and no correlations close to 1.0
The features are skewed and have outliers
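
The checks behind a summary like this can be sketched with pandas as follows. The DataFrame is synthetic, and the column names (feature_a, feature_b, target) are hypothetical placeholders, not the columns of the real dataset; the 1.5 * IQR rule is one common outlier heuristic and is an assumption about how outliers might be counted.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "feature_a": rng.lognormal(mean=0.0, sigma=0.8, size=n),  # right-skewed
    "feature_b": rng.gamma(shape=2.0, scale=1.5, size=n),     # right-skewed
    "target": rng.choice([0, 1], size=n, p=[0.65, 0.35]),     # imbalanced labels
})

# Missing values per column.
missing_per_column = df.isnull().sum()

# Class balance as a share of all rows.
class_share = df["target"].value_counts(normalize=True)

# Pairwise Pearson correlations.
corr = df.corr()

# Skewness of each feature (0 would indicate a symmetric distribution).
features = df.drop(columns="target")
skewness = features.skew()

# Outlier count per feature using the 1.5 * IQR rule.
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
outlier_mask = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outliers_per_column = outlier_mask.sum()

print(missing_per_column, class_share, corr, skewness, outliers_per_column, sep="\n\n")
```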

Conclusion (Comments):

It seems that the SVC model has reached its maximum potential on an imbalanced dataset
Another option is to penalize the selected models, or to apply regularization and gradient descent techniques to them
Another option is to collect more data to address the class imbalance, or to use sampling techniques
After that, select a few more classification models based on the defined problem, type of data, and the expected outcome
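
Two of the ideas above, penalizing the model and using sampling techniques, can be sketched with scikit-learn as follows. This is only one possible reading: "penalize" is interpreted here as class weighting (class_weight="balanced"), and "sampling" as random oversampling of the minority class. The data is synthetic, and every parameter value is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample

# Synthetic imbalanced stand-in data (assumption: class 1 is the minority).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Option 1: penalize errors on the minority class more heavily.
weighted_svc = SVC(class_weight="balanced").fit(X_train, y_train)

# Option 2: randomly oversample the minority class, in the training set only.
minority = y_train == 1
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=int((~minority).sum()), random_state=0,
)
X_balanced = np.vstack([X_train[~minority], X_min_up])
y_balanced = np.concatenate([y_train[~minority], y_min_up])
resampled_svc = SVC().fit(X_balanced, y_balanced)

# Recall on the positive class is usually more informative than accuracy here.
for name, model in [("class_weight", weighted_svc), ("oversampled", resampled_svc)]:
    print(name, recall_score(y_test, model.predict(X_test)))
```

Resampling is applied after the train/test split so that copies of the same row cannot end up in both sets.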

Noticed Mistakes:

Scaled all the features too early
The StandardScaler should be fit on x_train only; x_test should only be transformed with the already fitted scaler
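
A minimal sketch of the corrected order, assuming the variable names x_train and x_test from the notes above and using random numbers in place of the real features: split first, fit the scaler on the training split, then reuse it to transform the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in features and labels (assumption: not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split BEFORE scaling so that test-set statistics never reach the scaler.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learns mean and std from x_train only
x_test_scaled = scaler.transform(x_test)        # reuses the training mean and std
```

Scaling the full dataset before splitting lets information from the test rows leak into the training features, which can make evaluation scores look better than they are.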
