A microservices-based computer vision application powered by YOLO11 models, featuring a Streamlit frontend, orchestration layer, and inference backend. This project demonstrates multiple YOLO tasks including object detection, classification, segmentation, pose estimation, and oriented bounding boxes.
The project follows a microservices architecture with three main services:
```
┌──────────────┐      ┌───────────────┐      ┌──────────────┐
│   Frontend   │─────▶│ Orchestrator  │─────▶│    Vision    │
│ (Streamlit)  │      │    Service    │      │   Service    │
│  Port 9700   │      │   Port 9600   │      │  Port 9500   │
└──────────────┘      └───────────────┘      └──────────────┘
```
- Frontend - Streamlit-based web UI for user interaction
- Orchestrator - Middleware service that coordinates requests and annotates results
- Vision - Core inference service running YOLO11 models
- common/ - Shared schemas, enums, and utilities
  - `schemas/` - Pydantic models for requests, responses, and results
  - `utils/` - Image conversion utilities (OpenCV ↔ bytes)
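The shared schemas give every service the same vocabulary for results. The real definitions are Pydantic models in `common/schemas/`; the sketch below uses stdlib dataclasses only to illustrate the kind of shapes involved, and the field names here are assumptions, not the project's actual definitions:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative shapes only -- the project defines these as Pydantic
# models in common/schemas/; field names here are assumptions.

@dataclass
class BoundingBox:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    confidence: float

@dataclass
class DetectionResult:
    boxes: List[BoundingBox] = field(default_factory=list)

# A detection result is just a list of labeled, scored boxes
box = BoundingBox(10.0, 20.0, 110.0, 220.0, "person", 0.91)
result = DetectionResult(boxes=[box])
```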
| Task | Status | Description |
|---|---|---|
| Object Detection | ✅ Implemented | Detect and classify objects with bounding boxes |
| Classification | 🚧 Pending | Image classification into predefined categories |
| Segmentation | 🚧 Pending | Instance segmentation with pixel-level masks |
| Pose Estimation | 🚧 Pending | Detect human keypoints and poses |
| Oriented Bounding Boxes (OBB) | 🚧 Pending | Rotated bounding boxes for aerial/satellite imagery |
| Mode | Status | Description |
|---|---|---|
| Image | ✅ Implemented | Upload and process single images |
| Video | 🚧 Pending | Process video files frame-by-frame |
| Webcam (Live) | 🚧 Pending | Real-time inference from webcam feed |
- Docker & Docker Compose
- Python 3.13+ (for local development)
- 2GB+ RAM recommended
- Clone the repository

  ```bash
  git clone <repository-url>
  cd yolo-playground
  ```

- Build and start all services

  ```bash
  docker-compose up --build
  ```

- Access the application

  - Frontend UI: http://localhost:9700
  - Orchestrator API: http://localhost:9600/docs
  - Vision API: http://localhost:9500/docs
Each service can also be run independently; see the individual service READMEs for details.
- Open the frontend at http://localhost:9700
- Select inference mode (Image/Video/Webcam)
- Select task type (Detect/Classify/Segment/Pose/OBB)
- Upload an image or start webcam
- View annotated results in real-time
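The orchestrator's REST API can also be called directly from code. A minimal sketch using `requests` (which is in the tech stack), with the endpoint path and port taken from the curl example; the helper names are illustrative assumptions:

```python
import requests

ORCHESTRATOR = "http://localhost:9600"

def endpoint_url(task: str, base: str = ORCHESTRATOR) -> str:
    """Build the task endpoint path used by the orchestrator API."""
    return f"{base}/api/v1/tasks/{task}"

def detect(image_path: str, out_path: str = "result.jpg") -> None:
    """POST an image to the detect endpoint and save the annotated result."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            endpoint_url("detect"),
            files={"file": ("image.jpg", f, "image/jpeg")},
            timeout=60,
        )
    resp.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(resp.content)

# Usage (requires the services to be running):
# detect("photo.jpg")  # writes the annotated image to result.jpg
```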
```bash
curl -X POST "http://localhost:9600/api/v1/tasks/detect" \
  -F "file=@image.jpg" \
  -o result.jpg
```

Vision Service
- `MODEL_VERSION`: YOLO model version (default: `yolo11n`)

Orchestrator Service

- `VISION_SERVICE_HOST`: Vision service hostname (default: `vision`)
- `VISION_SERVICE_PORT`: Vision service port (default: `8000`)

Frontend Service

- `ORCHESTRATOR_SERVICE_HOST`: Orchestrator hostname (default: `orchestrator`)
- `ORCHESTRATOR_SERVICE_PORT`: Orchestrator port (default: `8000`)
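These variables can be overridden per service in `docker-compose.yml`. A sketch of what that might look like, assuming the compose service names match the default hostnames above:

```yaml
# Illustrative override sketch -- service names are assumptions
services:
  vision:
    environment:
      MODEL_VERSION: yolo11s        # switch from the default yolo11n
  orchestrator:
    environment:
      VISION_SERVICE_HOST: vision
      VISION_SERVICE_PORT: "8000"
  frontend:
    environment:
      ORCHESTRATOR_SERVICE_HOST: orchestrator
      ORCHESTRATOR_SERVICE_PORT: "8000"
```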
- `yolo11n` - Nano (fastest, smallest)
- `yolo11s` - Small
- `yolo11m` - Medium
- `yolo11l` - Large
- `yolo11x` - Extra Large (slowest, most accurate)
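`common/schemas/enums.py` models these choices (along with tasks and modes) as enums. A plausible sketch of the shape; member names are assumptions, not the project's actual definitions:

```python
from enum import Enum

# Illustrative sketch of common/schemas/enums.py -- member names
# are assumptions; only the string values come from the README.

class ModelVersion(str, Enum):
    NANO = "yolo11n"
    SMALL = "yolo11s"
    MEDIUM = "yolo11m"
    LARGE = "yolo11l"
    XLARGE = "yolo11x"

class Task(str, Enum):
    DETECT = "detect"
    CLASSIFY = "classify"
    SEGMENT = "segment"
    POSE = "pose"
    OBB = "obb"
```

Subclassing `str` lets the values serialize directly into URLs and Pydantic request/response models.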
```
yolo-playground/
├── common/                # Shared code across services
│   ├── schemas/           # Pydantic models
│   │   ├── enums.py       # Enums (Task, Mode, ModelVersion)
│   │   ├── requests.py    # Request schemas
│   │   ├── responses.py   # Response schemas
│   │   └── results.py     # Result models (boxes, masks, etc.)
│   └── utils/
│       └── convert.py     # Image conversion utilities
├── services/
│   ├── frontend/          # Streamlit UI
│   ├── orchestrator/      # Middleware service
│   └── vision/            # YOLO inference engine
└── docker-compose.yml     # Multi-service orchestration
```
```bash
# Test vision service health
curl http://localhost:9500/health

# Test orchestrator service health
curl http://localhost:9600/health
```

Interactive API documentation is available via Swagger UI:

- Vision Service: http://localhost:9500/docs
- Orchestrator Service: http://localhost:9600/docs
- Complete classification task implementation
- Complete segmentation task implementation
- Complete pose estimation task implementation
- Complete OBB (Oriented Bounding Box) task implementation
- Implement video file processing
- Implement real-time webcam inference
- Add batch processing support
- Implement model caching and optimization
- Add confidence threshold configuration
- Add NMS (Non-Maximum Suppression) threshold tuning
- Support for custom trained models
- Add result export functionality (JSON, CSV)
- Performance metrics dashboard
- Model comparison feature
- Add unit tests and integration tests
- CI/CD pipeline setup
- Video mode UI placeholder implemented but not functional
- Webcam mode commented out (requires streamlit-webrtc)
- Classification endpoint returns empty results
- Segmentation endpoint returns empty results
- Pose endpoint returns empty results
- OBB endpoint returns empty results
- Backend: FastAPI, Uvicorn
- Frontend: Streamlit
- ML Framework: Ultralytics YOLO11, ONNX Runtime
- Image Processing: OpenCV, NumPy
- HTTP Client: httpx, requests
- Containerization: Docker, Docker Compose
[Add license information here]
[Add contribution guidelines here]
[Add support/contact information here]