- Introduction
- Project Objective
- Executive Summary
- Technologies and Concepts
- Data Sources
- Getting Started
- Results
- Future Improvements
- Contributing
- License
- Acknowledgments
SafeConnect is an innovative platform that leverages real-world data and advanced AI techniques to predict disease spread on a global scale. This project was developed as part of the Data Science Post-Graduate Program at Descomplica College, specifically for Module 03, which covers Regression and Prediction, Deep Learning, Network Science, Perceptron, and Adaline.
- Predict disease spread using real-world data and AI models
- Support health authorities in strategic decision-making and resource allocation
- Demonstrate practical application of Module 03 concepts in an integrated project
The rapid spread of diseases like COVID-19 highlights the critical need for effective predictive tools. SafeConnect combines reliable real-world data with advanced AI techniques to forecast disease spread and support strategic decision-making. Our platform empowers governments and communities to take effective action, ultimately saving lives and resources.
- Regression and Prediction
- Random Forest Regressor model for predicting the logarithm of new cases based on population and confirmed case data
- Deep Learning
- Implementation of Artificial Neural Networks (MLPClassifier) with hyperparameter tuning and stratified cross-validation for high-risk country classification
- Perceptron and Adaline
- Not used in the final version, where better-performing models replaced them, but these concepts guided early development
- Network Science
- Country centrality analysis in a simplified global network to understand disease spread influence
- Cross-Validation and Hyperparameter Tuning
- Utilizing Stratified K-Fold Cross-Validation and GridSearchCV to ensure model generalization and prevent overfitting
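The tuning setup described above (stratified K-fold cross-validation combined with GridSearchCV for the MLPClassifier) could look roughly like the sketch below. It is not the project's actual code: the data is synthetic, and the parameter grid, fold count, and scaling step are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country-level feature table (hypothetical data).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.7, 0.3], random_state=0,
)

# Hold out a test set; stratify so both splits keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold,
# which avoids leaking validation-fold statistics into training.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=500, random_state=0),
)

# Assumed search space; the real grid is not given in this README.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(8,), (16, 8)],
    "mlpclassifier__alpha": [1e-4, 1e-2],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Comparing the mean cross-validation accuracy with the held-out accuracy is a quick check: a large gap between the two would point to overfitting during tuning.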
- COVID-19 Confirmed Cases
- Source: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
- URL: https://github.com/CSSEGISandData/COVID-19
- Demographic Data
- Source: World Bank Open Data
- URL: https://data.worldbank.org/indicator/SP.POP.TOTL
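As a rough sketch of how the confirmed-cases source could be prepared for modeling: the code below assumes the JHU CSSE global time series keeps its wide layout (one row per region, one column per date, cumulative counts) and that the raw file lives at the URL shown. Both are assumptions about the upstream repository, and `country_totals` and `daily_new_cases` are hypothetical helper names, not functions from this project.

```python
import pandas as pd

# Assumed raw-file location inside the JHU CSSE repository.
JHU_CONFIRMED_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

# Assumed non-date columns of the wide table.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_totals(wide: pd.DataFrame) -> pd.DataFrame:
    """Sum provinces so there is one row of cumulative counts per country."""
    date_cols = [c for c in wide.columns if c not in ID_COLUMNS]
    return wide.groupby("Country/Region")[date_cols].sum()


def daily_new_cases(totals: pd.DataFrame) -> pd.DataFrame:
    """Difference cumulative counts along dates; clip negative corrections to 0."""
    return totals.diff(axis=1).iloc[:, 1:].clip(lower=0)


# With network access, the real table could be loaded like this:
# wide = pd.read_csv(JHU_CONFIRMED_URL)

# Tiny hand-made sample in the same layout, so the sketch runs offline.
sample = pd.DataFrame({
    "Province/State": [None, "X", "Y"],
    "Country/Region": ["A", "B", "B"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 2, 3],
    "1/23/20": [4, 5, 5],
    "1/24/20": [3, 9, 9],
})
new_cases = daily_new_cases(country_totals(sample))
print(new_cases)
```

Clipping at zero is a design choice for this sketch: cumulative counts are occasionally revised downward, which would otherwise produce negative "new cases" and break a later log transform.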
- Python 3.7+
- Required Python libraries: numpy, pandas, networkx, matplotlib, requests, scikit-learn, seaborn
- Clone the repository
git clone https://github.com/your-username/safeconnect.git
- Navigate to project directory
cd safeconnect
- Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
- Install dependencies
pip install -r requirements.txt
- Run the main script
python safeconnect.py
- View results
- Graphs and results will display on screen
- Logs and metrics will appear in console
- Random Forest Regressor
- Mean Squared Error: 6.56
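- Interpretation: The error is measured on the logarithm of new cases, so an MSE of 6.56 corresponds to an RMSE of about 2.56 log units; predicted counts can differ from observed counts by a large multiplicative factor, and the model is best read as a baseline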
- Random Forest Classifier
- Test Set Accuracy: 100%
- Stratified Cross-Validation Mean Accuracy: 100%
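- Interpretation: Separates high-risk countries perfectly on this dataset; a perfect score on both the test set and every fold can also mean the label is closely tied to the input features, so generalization still needs to be confirmed on new data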
- MLPClassifier (Neural Network)
- Test Set Accuracy: 100%
- Stratified Cross-Validation Mean Accuracy: 100%
- Interpretation: Reaches the same perfect scores as the Random Forest Classifier after hyperparameter tuning; as with that model, the result should be confirmed on data not used during tuning
- Network Science Analysis
- Country centrality as a predictor variable improved model performance
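To put the regressor's reported error in context, the short sketch below converts an MSE measured on a log target into an RMSE in log units and into a typical multiplicative error on the original scale. The README does not state which logarithm base was used, so the natural log is an assumption here, and `log_error_summary` is a hypothetical helper.

```python
import math


def log_error_summary(mse: float, base: float = math.e) -> dict:
    """Translate an MSE on a log-transformed target into more readable terms.

    rmse_log_units: typical prediction error, in units of log(new cases).
    typical_factor: base ** rmse, i.e. roughly how many times too high or
                    too low a prediction is on the original case scale.
    """
    rmse = math.sqrt(mse)
    return {"rmse_log_units": rmse, "typical_factor": base ** rmse}


# Reported MSE from the Results list; the natural-log base is an assumption.
summary = log_error_summary(6.56)
print(f"RMSE: {summary['rmse_log_units']:.2f} log units")
print(f"Typical multiplicative error: x{summary['typical_factor']:.1f}")
```

If the target had been computed with a base-10 logarithm instead, calling `log_error_summary(6.56, base=10)` would give the corresponding factor.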
- Overfitting Analysis
- Re-evaluate the models on newly collected data to check whether the high accuracy reflects real generalization rather than overfitting
- Additional Data
- Incorporate mobility indices, government measures, and vaccination rates
- Advanced Models
- Explore time series models or LSTM networks for temporal dependencies
- Network Refinement
- Use actual country connection data (travel flows, borders) to enhance network science analysis
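For the network refinement idea, a centrality analysis over real connection data could start from something like the sketch below. The edge list is entirely hypothetical (it stands in for borders or travel routes and is not the network used in this project), and degree and betweenness centrality are just two of the measures networkx offers.

```python
import networkx as nx

# Hypothetical undirected links between countries (e.g. borders or air routes).
edges = [
    ("Brazil", "Argentina"),
    ("Brazil", "Portugal"),
    ("Brazil", "United States"),
    ("United States", "Portugal"),
    ("Argentina", "Chile"),
]

G = nx.Graph()
G.add_edges_from(edges)

# Degree centrality: share of other countries a country is directly linked to.
degree = nx.degree_centrality(G)
# Betweenness centrality: how often a country lies on shortest paths between others.
betweenness = nx.betweenness_centrality(G)

# List countries from most to least connected.
for country in sorted(G.nodes, key=degree.get, reverse=True):
    print(f"{country:15s} degree={degree[country]:.2f} "
          f"betweenness={betweenness[country]:.2f}")

most_connected = max(degree, key=degree.get)
```

Either score can then be joined to the country-level feature table as an extra predictor column, which is how centrality is described as being used in the results.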
Contributions are welcome! Feel free to open issues and submit pull requests.
- Fork the repository
- Create a feature branch
git checkout -b feature/new-feature
- Commit changes
git commit -m "Feature description"
- Push to remote
git push origin feature/new-feature
- Open a Pull Request
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). See the LICENSE file for details.
- Attribution required — You must give appropriate credit, provide a link to the license, and indicate if changes were made
- NonCommercial — You may not use the material for commercial purposes
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits
For the complete CC BY-NC-ND 4.0 license terms, please refer to the LICENSE file in this repository or visit Creative Commons.
- Descomplica College
- For the opportunity to apply knowledge gained in the Data Science Post-Graduate Program
- Open Source Community
- For providing essential libraries and datasets
Contact:
For questions or suggestions about this repository, please contact me through GitHub.
This project fulfills requirements for Module 03 of the Data Science Post-Graduate Program at Descomplica College.