
<div align="center">
# 🫁 Chest Cancer Classification using Deep Learning

[Python](https://www.python.org/)
[TensorFlow](https://www.tensorflow.org/)
[FastAPI](https://fastapi.tiangolo.com/)
[Docker](https://www.docker.com/)
[AWS ECS](https://aws.amazon.com/ecs/)
[MLflow](https://mlflow.org/)

</div>
---

## 📋 Overview

An end-to-end deep learning solution for detecting **adenocarcinoma** in chest CT scan images. Built with production-grade MLOps practices, the project implements a complete ML pipeline, from data ingestion through deployment, with automated CI/CD workflows.

---

## ✨ Key Features

### Machine Learning Pipeline
- **Transfer learning** with a pre-trained EfficientNetB0 base model
- **Automated training pipeline** with modular component architecture
- **MLflow integration** for experiment tracking and model versioning
- **DVC (Data Version Control)** for reproducible data pipelines

### Production-Ready Application
- **FastAPI REST API** with clean, async endpoints
- **Interactive web interface** with drag-and-drop image upload
- **Model caching** for sub-second inference after the initial load
- **Health check endpoints** for monitoring

### MLOps & DevOps
- **CI/CD Pipeline** with GitHub Actions
- **Docker containerization** with optimized image size
- **AWS ECS-ready deployment** with automated workflows
- **Environment-based configuration** for secure credential management
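
Environment-based configuration might look like the sketch below. It assumes credentials arrive as environment variables (for example, after python-dotenv has loaded a local `.env` file) and that the project needs MLflow tracking credentials. The class `TrackingSettings`, the function `load_tracking_settings`, and the choice of variables are illustrative assumptions, not taken from the repository. The sketch fails fast when a variable is missing and keeps the password out of the object's `repr`.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrackingSettings:
    """Hypothetical container for experiment-tracking credentials."""
    tracking_uri: str
    username: str
    password: str = field(repr=False)  # keep the secret out of logs and reprs


def load_tracking_settings(env: Optional[Mapping[str, str]] = None) -> TrackingSettings:
    """Read credentials from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    required = (
        "MLFLOW_TRACKING_URI",
        "MLFLOW_TRACKING_USERNAME",
        "MLFLOW_TRACKING_PASSWORD",
    )
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return TrackingSettings(
        tracking_uri=env["MLFLOW_TRACKING_URI"],
        username=env["MLFLOW_TRACKING_USERNAME"],
        password=env["MLFLOW_TRACKING_PASSWORD"],
    )
```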

---

## 🛠️ Tech Stack

### Core ML/DL
- **TensorFlow/Keras** - Deep learning framework
- **EfficientNetB0** - Pre-trained CNN model
- **NumPy, Pandas** - Data manipulation

### MLOps Tools
- **MLflow** - Experiment tracking and model registry
- **DVC** - Data and model versioning
- **DagHub** - Remote experiment tracking

### Backend & API
- **FastAPI** - Modern web framework for building APIs
- **Uvicorn** - ASGI server
- **Python-multipart** - File upload handling

### Frontend
- **TailwindCSS** - Responsive UI design
- **Vanilla JavaScript** - Interactive web interface

### DevOps & Cloud
- **Docker** - Containerization
- **GitHub Actions** - CI/CD automation
- **AWS ECS** - Container orchestration
- **AWS ECR** - Container registry

### Development Tools
- **Python-dotenv** - Environment variable management
- **PyYAML** - Configuration file parsing
- **Python-box** - Dict to object conversion

---

## 📁 Project Structure

```
├── .github/
│   └── workflows/
│       └── main.yaml                   # CI/CD pipeline configuration
├── artifacts/
│   ├── data_ingestion/                 # Downloaded and processed data
│   ├── prepare_base_model/             # Base and updated models
│   └── training/                       # Trained models and logs
├── config/
│   └── config.yaml                     # Project configuration
├── research/
│   ├── 01_data_ingestion.ipynb
│   ├── 02_prepare_base_model.ipynb
│   ├── 03_model_trainer.ipynb
│   └── 04_model_evaluation_with_mlflow.ipynb
├── src/cnnClassifier/
│   ├── components/                     # Core ML components
│   │   ├── data_ingestion.py
│   │   ├── prepare_base_model.py
│   │   ├── model_trainer.py
│   │   └── model_evaluation_mlflow.py
│   ├── config/
│   │   └── configuration.py            # Configuration manager
│   ├── entity/
│   │   └── config_entity.py            # Configuration dataclasses
│   ├── pipeline/                       # Training and prediction pipelines
│   │   ├── stage_01_data_ingestion.py
│   │   ├── stage_02_prepare_base_model.py
│   │   ├── stage_03_model_trainer.py
│   │   ├── stage_04_model_evaluation.py
│   │   └── prediction.py
│   ├── utils/
│   │   └── common.py                   # Utility functions
│   └── constants/
│       └── __init__.py                 # Project constants
├── templates/
│   └── index.html                      # Web interface
├── app.py                              # FastAPI application
├── main.py                             # Training pipeline entry point
├── dvc.yaml                            # DVC pipeline configuration
├── params.yaml                         # Model hyperparameters
├── requirements.txt                    # Python dependencies
├── Dockerfile                          # Container configuration
├── .dockerignore                       # Docker build exclusions
└── README.md
```
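
To make the role of `entity/config_entity.py` and `config/configuration.py` more concrete, here is a minimal sketch of a configuration dataclass built from a parsed YAML section. The class name `DataIngestionConfig`, its fields, the helper `build_data_ingestion_config`, and the example URL are all assumptions for illustration; the actual entities in the repository may differ. A plain dictionary stands in for the parsed contents of `config.yaml`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    """Hypothetical typed view of the data-ingestion section of config.yaml."""
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path


def build_data_ingestion_config(raw: dict) -> DataIngestionConfig:
    """Convert the raw mapping (as a YAML parser would return it) into a typed, immutable object."""
    section = raw["data_ingestion"]
    return DataIngestionConfig(
        root_dir=Path(section["root_dir"]),
        source_url=section["source_url"],
        local_data_file=Path(section["local_data_file"]),
        unzip_dir=Path(section["unzip_dir"]),
    )


# Example input with made-up values, shaped like a parsed config file.
example_raw = {
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "source_url": "https://example.com/data.zip",
        "local_data_file": "artifacts/data_ingestion/data.zip",
        "unzip_dir": "artifacts/data_ingestion",
    }
}
example_config = build_data_ingestion_config(example_raw)
```

Freezing the dataclass means a pipeline stage cannot accidentally change a path after the configuration has been built, and a missing key surfaces as a `KeyError` when the config is constructed rather than midway through training.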

<div align="center">
**⭐ Star this repo if you find it useful**
</div>