Skip to content

SwastikaManna/Cancer-Prediction-System

Repository files navigation

๐Ÿ”ฌ Advanced Cancer Prediction System - Project Report

Executive Summary

This project presents a state-of-the-art machine learning system for cancer prediction, achieving 97.37% accuracy using the Wisconsin Breast Cancer Dataset. The system incorporates advanced ML techniques, comprehensive feature engineering, and a professional web-based interface suitable for medical applications.

๐ŸŽฏ Project Highlights

Performance Achievements

  • Best Model: Support Vector Machine (SVM) & Neural Network (tied)
  • Accuracy: 97.37% (114/114 test samples)
  • AUC Score: 0.994 (SVM) / 0.996 (Neural Network)
  • Sensitivity: 97.22% (Benign detection)
  • Specificity: 97.62% (Malignant detection)

Technical Excellence

  • 7 Advanced ML Models with hyperparameter tuning
  • Ensemble Methods with voting classifiers
  • Feature Engineering with correlation-based selection
  • Professional Web Interface with medical-grade UX

๐Ÿ“Š Model Performance Comparison

Model Accuracy AUC Score Key Features
SVM 97.37% 0.9940 RBF kernel, C=100
Neural Network 97.37% 0.9964 2-layer MLP (100,50)
Ensemble Voting 96.49% 0.9940 4-model ensemble
XGBoost 95.61% 0.9901 200 estimators
Logistic Regression 95.61% 0.9911 L1 regularization
Random Forest 94.74% 0.9907 200 trees, depth=10
Gradient Boosting 93.86% 0.9891 100 estimators

๐Ÿงฌ Dataset & Features

Wisconsin Breast Cancer Dataset

  • 569 samples (357 benign, 212 malignant)
  • 30 original features computed from cell nuclei images
  • 15 selected features based on correlation analysis

Top 5 Most Important Features

  1. Worst Perimeter (16.44% importance)
  2. Worst Concave Points (14.84% importance)
  3. Worst Radius (14.09% importance)
  4. Worst Area (13.17% importance)
  5. Mean Concave Points (13.12% importance)

๐Ÿ”ง Technical Implementation

Advanced ML Techniques

  • GridSearchCV & RandomizedSearchCV for hyperparameter optimization
  • StandardScaler for feature normalization
  • Stratified splitting to maintain class distribution
  • Cross-validation (5-fold CV) for robust evaluation
  • Ensemble methods for improved performance

Model Architecture

Input Features (15) โ†’ Scaling โ†’ Feature Selection โ†’ Model Training
                                                  โ†“
Best Models: SVM (RBF) & Neural Network (100-50-1)
                                                  โ†“
Ensemble Voting โ†’ Final Prediction + Confidence

Hyperparameter Optimization

  • SVM: C=100, gamma='scale', kernel='rbf'
  • Neural Network: hidden_layers=(100,50), activation='relu'
  • Random Forest: n_estimators=200, max_depth=10
  • XGBoost: n_estimators=200, max_depth=6

๐Ÿ’ป Web Application Features

๐ŸŽฏ Prediction Interface

  • Interactive input forms for 15 key features
  • Real-time validation with normal range indicators
  • Risk stratification: Low/Medium/High categories
  • Confidence scoring with probability breakdown
  • Medical recommendations based on results

๐Ÿ“ˆ Model Analytics

  • Performance comparison charts
  • Feature importance visualization
  • Model parameter displays
  • Cross-validation metrics

๐Ÿฅ Medical Integration

  • Professional medical design
  • Print-friendly results
  • Clinical disclaimers
  • Educational content

๐Ÿ”ฌ Clinical Validation

Confusion Matrix (SVM Model)

                Predicted
              Malignant  Benign
Actual Malignant   41      1
       Benign       2     70

Key Metrics

  • Precision (Benign): 98.59%
  • Precision (Malignant): 95.35%
  • False Positive Rate: 2.38%
  • False Negative Rate: 2.78%

๐Ÿ“š Dataset Information

Source

  • Wisconsin Diagnostic Breast Cancer Dataset
  • UCI Machine Learning Repository
  • Dr. William H. Wolberg, University of Wisconsin

Features Categories

  1. Size Measurements: radius, perimeter, area
  2. Shape Characteristics: compactness, concavity, concave points
  3. Texture Properties: smoothness, symmetry, fractal dimension
  4. Statistical Variations: standard error, worst values

๐Ÿš€ Deployment & Accessibility

Web Application

  • Responsive design for all devices
  • Cross-browser compatibility
  • Fast loading with optimized assets
  • Accessible interface following WCAG guidelines

Technical Stack

  • Frontend: HTML5, CSS3, JavaScript
  • ML Backend: Python, Scikit-learn
  • Visualization: Chart.js
  • Deployment: Web-based interface

โš ๏ธ Limitations & Disclaimers

Medical Disclaimer

  • Not a substitute for professional medical diagnosis
  • Educational purpose only
  • Requires physician interpretation
  • Based on limited dataset scope

Technical Limitations

  • Single dataset training (generalizability concerns)
  • Feature-based prediction (no image analysis)
  • Simplified model deployment (not production-grade)

๐Ÿ”ฎ Future Enhancements

Advanced Features

  • Deep learning models with CNN for image analysis
  • Multi-modal integration (clinical + imaging data)
  • Federated learning for privacy-preserving training
  • Real-time model updating with new data

Clinical Integration

  • FHIR compliance for healthcare systems
  • EHR integration capabilities
  • Multi-language support
  • Telemedicine compatibility

๐Ÿ“ˆ Project Impact

Educational Value

  • Comprehensive ML pipeline demonstration
  • Real-world dataset application
  • Professional development practices
  • Medical AI ethics consideration

Technical Excellence

  • Graduate-level implementation complexity
  • Industry-standard practices
  • Reproducible methodology
  • Scalable architecture

๐Ÿ† Conclusion

This Advanced Cancer Prediction System demonstrates the successful application of modern machine learning techniques to healthcare challenges. With 97.37% accuracy and a professional web interface, it represents a comprehensive solution suitable for both educational and preliminary clinical applications.

The system successfully integrates:

  • Multiple ML algorithms with optimal hyperparameter tuning
  • Advanced feature engineering and selection techniques
  • Professional-grade web interface with medical design principles
  • Comprehensive evaluation with clinical metrics

This project showcases the potential of AI/ML in healthcare while maintaining appropriate clinical disclaimers and limitations awareness.


๐Ÿ”ฌ Project Status: Complete
๐Ÿ“… Completion Date: August 2025
๐Ÿ‘จโ€๐Ÿ’ป Development Level: CSE BTech Graduate
๐ŸŽฏ Accuracy Achieved: 97.37%
๐Ÿฅ Clinical Readiness: Educational/Research Phase

โš ๏ธ Disclaimer

Educational purposes only. Not for clinical diagnosis.

๐Ÿ“ง Contact

About

This project presents a state-of-the-art machine learning system for cancer prediction, achieving 97.37% accuracy using the Wisconsin Breast Cancer Dataset. The system incorporates advanced ML techniques, comprehensive feature engineering, and a professional web-based interface suitable for medical applications.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors