This project presents a state-of-the-art machine learning system for cancer prediction, achieving 97.37% accuracy using the Wisconsin Breast Cancer Dataset. The system incorporates advanced ML techniques, comprehensive feature engineering, and a professional web-based interface suitable for medical applications.
๐ [Live Demo] https://cancer-prediction-system.netlify.app/
- Best Model: Support Vector Machine (SVM) & Neural Network (tied)
- Accuracy: 97.37% (114/114 test samples)
- AUC Score: 0.994 (SVM) / 0.996 (Neural Network)
- Sensitivity: 97.22% (Benign detection)
- Specificity: 97.62% (Malignant detection)
- 7 Advanced ML Models with hyperparameter tuning
- Ensemble Methods with voting classifiers
- Feature Engineering with correlation-based selection
- Professional Web Interface with medical-grade UX
| Model | Accuracy | AUC Score | Key Features |
|---|---|---|---|
| SVM | 97.37% | 0.9940 | RBF kernel, C=100 |
| Neural Network | 97.37% | 0.9964 | 2-layer MLP (100,50) |
| Ensemble Voting | 96.49% | 0.9940 | 4-model ensemble |
| XGBoost | 95.61% | 0.9901 | 200 estimators |
| Logistic Regression | 95.61% | 0.9911 | L1 regularization |
| Random Forest | 94.74% | 0.9907 | 200 trees, depth=10 |
| Gradient Boosting | 93.86% | 0.9891 | 100 estimators |
- 569 samples (357 benign, 212 malignant)
- 30 original features computed from cell nuclei images
- 15 selected features based on correlation analysis
- Worst Perimeter (16.44% importance)
- Worst Concave Points (14.84% importance)
- Worst Radius (14.09% importance)
- Worst Area (13.17% importance)
- Mean Concave Points (13.12% importance)
- GridSearchCV & RandomizedSearchCV for hyperparameter optimization
- StandardScaler for feature normalization
- Stratified splitting to maintain class distribution
- Cross-validation (5-fold CV) for robust evaluation
- Ensemble methods for improved performance
Input Features (15) โ Scaling โ Feature Selection โ Model Training
โ
Best Models: SVM (RBF) & Neural Network (100-50-1)
โ
Ensemble Voting โ Final Prediction + Confidence
- SVM: C=100, gamma='scale', kernel='rbf'
- Neural Network: hidden_layers=(100,50), activation='relu'
- Random Forest: n_estimators=200, max_depth=10
- XGBoost: n_estimators=200, max_depth=6
- Interactive input forms for 15 key features
- Real-time validation with normal range indicators
- Risk stratification: Low/Medium/High categories
- Confidence scoring with probability breakdown
- Medical recommendations based on results
- Performance comparison charts
- Feature importance visualization
- Model parameter displays
- Cross-validation metrics
- Professional medical design
- Print-friendly results
- Clinical disclaimers
- Educational content
Predicted
Malignant Benign
Actual Malignant 41 1
Benign 2 70
- Precision (Benign): 98.59%
- Precision (Malignant): 95.35%
- False Positive Rate: 2.38%
- False Negative Rate: 2.78%
- Wisconsin Diagnostic Breast Cancer Dataset
- UCI Machine Learning Repository
- Dr. William H. Wolberg, University of Wisconsin
- Size Measurements: radius, perimeter, area
- Shape Characteristics: compactness, concavity, concave points
- Texture Properties: smoothness, symmetry, fractal dimension
- Statistical Variations: standard error, worst values
- Responsive design for all devices
- Cross-browser compatibility
- Fast loading with optimized assets
- Accessible interface following WCAG guidelines
- Frontend: HTML5, CSS3, JavaScript
- ML Backend: Python, Scikit-learn
- Visualization: Chart.js
- Deployment: Web-based interface
- Not a substitute for professional medical diagnosis
- Educational purpose only
- Requires physician interpretation
- Based on limited dataset scope
- Single dataset training (generalizability concerns)
- Feature-based prediction (no image analysis)
- Simplified model deployment (not production-grade)
- Deep learning models with CNN for image analysis
- Multi-modal integration (clinical + imaging data)
- Federated learning for privacy-preserving training
- Real-time model updating with new data
- FHIR compliance for healthcare systems
- EHR integration capabilities
- Multi-language support
- Telemedicine compatibility
- Comprehensive ML pipeline demonstration
- Real-world dataset application
- Professional development practices
- Medical AI ethics consideration
- Graduate-level implementation complexity
- Industry-standard practices
- Reproducible methodology
- Scalable architecture
This Advanced Cancer Prediction System demonstrates the successful application of modern machine learning techniques to healthcare challenges. With 97.37% accuracy and a professional web interface, it represents a comprehensive solution suitable for both educational and preliminary clinical applications.
The system successfully integrates:
- Multiple ML algorithms with optimal hyperparameter tuning
- Advanced feature engineering and selection techniques
- Professional-grade web interface with medical design principles
- Comprehensive evaluation with clinical metrics
This project showcases the potential of AI/ML in healthcare while maintaining appropriate clinical disclaimers and limitations awareness.
๐ฌ Project Status: Complete
๐
Completion Date: August 2025
๐จโ๐ป Development Level: CSE BTech Graduate
๐ฏ Accuracy Achieved: 97.37%
๐ฅ Clinical Readiness: Educational/Research Phase
Educational purposes only. Not for clinical diagnosis.
- GitHub: https://github.com/SwastikaManna
- Email: swastikamanna03@gmail.com