Machine Learning Diabetes Prediction
Logistic Regression
Support Vector Classifier
K-Nearest Neighbors Classifier
Random Forest Classifier
NumPy
Pandas
Matplotlib
Seaborn
Scikit-learn
Exploratory Data Analysis and Visualization Summary:
There are no missing values in the dataset
The dataset is imbalanced
The correlation matrix shows no negative correlations and none close to 1.0
The features are skewed and have outliers
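The checks behind this summary can be sketched roughly as follows. This is a minimal illustration, not the repository's actual notebook: the data is synthetic, and the column names (Glucose, BMI, Insulin, Outcome) and the 1.5 * IQR outlier rule are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the diabetes data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 7, n),
    "Insulin": rng.exponential(80, n),  # right-skewed on purpose
    "Outcome": rng.choice([0, 1], size=n, p=[0.65, 0.35]),
})

# Missing values: count of nulls per column.
missing_per_column = df.isnull().sum()

# Class balance: share of each target class.
class_share = df["Outcome"].value_counts(normalize=True)

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# Skewness of each feature (0 means symmetric).
features = df.drop(columns="Outcome")
skewness = features.skew()

# Outliers per feature using the 1.5 * IQR rule (an assumed convention).
q1 = features.quantile(0.25)
q3 = features.quantile(0.75)
iqr = q3 - q1
is_outlier = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outlier_counts = is_outlier.sum()

print(missing_per_column, class_share, corr.round(2), skewness, outlier_counts, sep="\n\n")
```

On the real dataset, `df` would instead be loaded with `pd.read_csv`, and `corr` could be passed to a Seaborn heatmap for the visual version of the correlation matrix.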
Conclusion (Comments):
It seems that the SVC model has reached its maximum potential on an imbalanced dataset
Another option is to penalize misclassification or apply regularization and gradient descent techniques in the selected models
Another option is to collect more data to address the class imbalance, or to use sampling techniques
After that, select a few more classification models based on the defined problem, type of data, and the expected outcome
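Two of the ideas above, penalizing errors on the minority class and resampling, could look roughly like this in scikit-learn. This is a sketch under assumptions: the data comes from `make_classification` rather than the diabetes dataset, the 80/20 class split is invented, and random oversampling with `sklearn.utils.resample` stands in for whichever sampling technique is eventually chosen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 80% class 0, 20% class 1); an assumption.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Option 1: penalize mistakes on the minority class more heavily.
# C controls the strength of regularization (smaller C = stronger).
weighted_svc = SVC(class_weight="balanced", C=1.0).fit(x_train, y_train)

# Option 2: randomly oversample the minority class, on the training set only.
majority = x_train[y_train == 0]
minority = x_train[y_train == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
x_balanced = np.vstack([majority, minority_upsampled])
y_balanced = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_upsampled), dtype=int),
])
resampled_svc = SVC().fit(x_balanced, y_balanced)

# Recall on the minority class is usually more informative than accuracy here.
recall_weighted = recall_score(y_test, weighted_svc.predict(x_test))
recall_resampled = recall_score(y_test, resampled_svc.predict(x_test))
print(recall_weighted, recall_resampled)
```

The test set is left untouched in both options, so the reported recall reflects the original class distribution.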
Scaled all the features too early, before splitting the data into training and test sets
Fit the StandardScaler on x_train only, and only transform x_test with it
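The corrected scaling order can be sketched as below. The data is random and the split parameters are assumptions; only the names x_train and x_test and the use of StandardScaler come from the notes above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random placeholder data standing in for the real features and target.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split first, so the test set plays no part in computing scaling statistics.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
# Learn mean and standard deviation from the training data, then scale it.
x_train_scaled = scaler.fit_transform(x_train)
# Reuse the training statistics on the test data; do not fit again.
x_test_scaled = scaler.transform(x_test)
```

Scaling before the split lets information from the test rows leak into the mean and standard deviation, which tends to make evaluation scores look better than they would on unseen data.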
About
Machine Learning Diabetes Prediction using 4 Classifier Algorithms for Fitting the Data.