A production-grade ML-based anomaly detection system for diesel engine air leaks using deep learning autoencoders, statistical methods, and real-time WebSocket inference.
- Overview
- Tech Stack
- Installation & Setup
- How to Run
- Project Structure
- Main Workflow & Pipeline
- Database Design
- API Endpoints
- Features
Diesel Engine Air Leak Detection is a comprehensive anomaly detection platform that identifies air leaks in diesel engine systems using a multi-model fusion approach. The system processes real-time sensor data from 12 engine parameters through multiple ML pipelines:
- Deep autoencoders (reconstruction-based anomaly scoring)
- One-Class SVM (distribution-based outlier detection)
- Mahalanobis distance (statistical distance metrics)
- Kalman filtering (signal smoothing across 12 sensor channels)
Real-time detection happens via WebSocket with rolling-window confirmation (2-window validation) to minimize false positives.
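The rolling-window confirmation can be sketched as follows. This is a minimal illustration, not the project's `consumers.py` logic: it assumes the 7-sample window, 4-of-7 majority vote, and 3.5 threshold listed in the pipeline description, and it further assumes that windows are consecutive and non-overlapping. The class name `RollingWindowDetector` and the choice to return `None` while a window is filling are hypothetical.

```python
from collections import deque
from typing import Optional


class RollingWindowDetector:
    """Majority vote per window plus N-window confirmation (illustrative sketch)."""

    def __init__(self, window_size: int = 7, min_votes: int = 4,
                 confirm_windows: int = 2, threshold: float = 3.5):
        self.window_size = window_size
        self.min_votes = min_votes
        self.confirm_windows = confirm_windows
        self.threshold = threshold
        self._flags = []                               # flags for the window being filled
        self._recent = deque(maxlen=confirm_windows)   # verdicts of the latest windows

    def update(self, fused_score: float) -> Optional[str]:
        """Feed one fused anomaly score; return a status when a window completes."""
        self._flags.append(fused_score > self.threshold)
        if len(self._flags) < self.window_size:
            return None  # window still filling (assumed behavior)
        # Majority vote: the window counts as leaky if enough samples are anomalous.
        window_leaky = sum(self._flags) >= self.min_votes
        self._flags = []
        self._recent.append(window_leaky)
        # Confirm a leak only when the required number of consecutive windows agree.
        confirmed = len(self._recent) == self.confirm_windows and all(self._recent)
        return "LEAK_CONFIRMED" if confirmed else "HEALTHY"
```

Under these assumptions a single anomalous window never raises an alert by itself; the second consecutive anomalous window does, which is the false-positive guard the paragraph above refers to.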
- Predictive maintenance for diesel engine fleets
- Early detection of air intake system failures
- Automated health monitoring dashboards
- Real-time alerting for critical anomalies
- Django 6.0 - Web framework & ORM
- Django REST Framework - API development
- Django Channels - WebSocket/ASGI for real-time inference
- TensorFlow/Keras - Deep learning (autoencoders)
- scikit-learn - Classical ML (SVM, Mahalanobis)
- NumPy - Numerical computing
- Pandas - Data manipulation
- Kalman Filters (custom implementation) - Signal filtering
- Streamlit - Real-time dashboard
- Plotly - Interactive charts
- joblib - Model serialization
- python-dotenv - Environment management
- dj-database-url - Database URL parsing
- gunicorn - Production WSGI server
- daphne - WebSocket ASGI server
- Python 3.12+ (required by Django 6.0)
- pip or conda
- Virtual environment (recommended)
git clone https://github.com/[YOUR_USERNAME]/DieselEngineLeakDetection.git
cd DieselEngineLeakDetection
# Using venv
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Or using conda
conda create -n diesel-leak python=3.12
conda activate diesel-leak
pip install -r requirements.txt
Create a .env file in the backend/diesel_engine_predictor/ directory:
SECRET_KEY=your-secret-key-here-change-in-production
DEBUG=True
DATABASE_URL=sqlite:///db.sqlite3
ALLOWED_HOSTS=localhost,127.0.0.1
For production, use a secure secret key and an external database.
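As a rough illustration of how these four variables might be turned into typed settings, the sketch below parses them from any mapping, such as `os.environ` after the `.env` file has been loaded. The helper name `read_env_settings` and its defaults are assumptions made for this example; the project's actual `settings.py` presumably relies on python-dotenv and dj-database-url instead.

```python
import os
from typing import Mapping


def read_env_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the .env keys shown above into typed values (hypothetical helper)."""
    secret_key = env.get("SECRET_KEY", "")
    if not secret_key:
        raise ValueError("SECRET_KEY must be set")
    # Only explicit truthy strings enable DEBUG; anything else means False.
    debug = env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    # ALLOWED_HOSTS is a comma-separated list; empty entries are dropped.
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "").split(",") if h.strip()]
    return {
        "SECRET_KEY": secret_key,
        "DEBUG": debug,
        "ALLOWED_HOSTS": hosts,
        # Fall back to the development SQLite URL from the example above.
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///db.sqlite3"),
    }
```

Defaulting `DEBUG` to `False` when the variable is missing is a deliberate choice here, so that a forgotten entry in production fails safe.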
cd backend/diesel_engine_predictor
python manage.py migrate
python manage.py createsuperuser # Create admin user
# Generate 20K healthy engine samples (run from the project root, where ml_model/ lives)
python -m ml_model.data_gen.healthy_data_gen
# Generate 20K leaky engine samples (with 3 leak types)
python -m ml_model.data_gen.leaky_data_gen
# Train One-Class SVM on 12 sensor channels
python -m ml_model.models.svm.train.train_svm
# Train 4 subsystem-specific autoencoders
python -m ml_model.models.autoencoders.train.nn_model_boost
python -m ml_model.models.autoencoders.train.nn_model_dpf
python -m ml_model.models.autoencoders.train.nn_model_maf
python -m ml_model.models.autoencoders.train.nn_model_exhaust
# Train Mahalanobis distance model
python -m ml_model.models.mahalanobis.train_mahal
# Compute z-scores on test datasets
python -m ml_model.models.svm.z_score.svm_z_scores
cd backend/diesel_engine_predictor
# Development (HTTP only)
python manage.py runserver
# Production (with WebSocket support via ASGI)
daphne -b 0.0.0.0 -p 8000 diesel_engine_predictor.asgi:application
API Server runs at: http://localhost:8000
# From project root
streamlit run engine_simulator/app.py
Dashboard opens at: http://localhost:8501
DieselEngineLeakDetection/
│
├── backend/  # Django REST API backend
│   └── diesel_engine_predictor/
│       ├── manage.py  # Django CLI
│       ├── db.sqlite3  # Development database
│       ├── diesel_engine_predictor/  # Project settings
│       │   ├── settings.py  # Django configuration
│       │   ├── urls.py  # URL routing
│       │   ├── asgi.py  # ASGI config (WebSocket)
│       │   └── wsgi.py  # WSGI config
│       ├── predict/  # Core prediction app
│       │   ├── views.py  # REST endpoints
│       │   ├── consumers.py  # WebSocket consumer
│       │   ├── services/
│       │   │   ├── pipeline.py  # Inference pipeline (ensemble)
│       │   │   └── kalman_service.py  # Kalman filtering service
│       │   ├── routing.py  # WebSocket routing
│       │   ├── models.py  # DB models
│       │   └── urls.py  # App URL patterns
│       └── user_auth/  # Authentication & authorization
│           ├── views.py  # Login, signup, logout
│           ├── models.py  # User, Engine, Sensor models
│           ├── serializers.py  # DRF serializers
│           └── urls.py  # Auth URL patterns
│
├── ml_model/  # ML training & evaluation
│   ├── data_gen/  # Data generation
│   │   ├── engine_simulator_core.py  # Physics-based engine simulator
│   │   ├── healthy_data_gen.py  # Generate 20K healthy samples
│   │   ├── leaky_data_gen.py  # Generate 20K leaky samples
│   │   └── physics.py  # 10 physics equations
│   ├── data_store/  # Datasets (CSV)
│   │   ├── healthy_dataset.csv  # 20K × 12 features
│   │   └── leaky_dataset.csv  # 20K × 12 features
│   ├── kalman/  # Kalman filter implementations
│   │   ├── kalman_filter.py  # 2D Kalman filter class
│   │   ├── kalman_layer.py  # 12-channel Kalman layer
│   │   └── kalman_tuning.py  # Parameter estimation
│   ├── models/  # Trained ML models
│   │   ├── residual/  # Legacy residual models (not used in main pipeline)
│   │   │   ├── train/
│   │   │   │   └── train_residual.py  # Train 4 residual models
│   │   │   └── encoded/  # Saved .joblib models
│   │   ├── svm/  # One-Class SVM
│   │   │   ├── train/
│   │   │   │   └── train_svm.py  # Train SVM
│   │   │   ├── encoded/
│   │   │   │   └── svm_model.joblib  # Trained SVM
│   │   │   └── z_score/
│   │   │       └── svm_z_scores.py  # Compute anomaly scores
│   │   ├── autoencoders/  # Deep autoencoders (reconstruction)
│   │   │   ├── train/
│   │   │   │   ├── nn_model_boost.py  # Boost subsystem AE (5 input features)
│   │   │   │   ├── nn_model_dpf.py  # DPF subsystem AE (7 input features)
│   │   │   │   ├── nn_model_maf.py  # MAF subsystem AE (5 input features)
│   │   │   │   └── nn_model_exhaust.py  # Exhaust subsystem AE (6 input features)
│   │   │   ├── residual_score/
│   │   │   └── encoded_model/  # Saved .keras models
│   │   ├── mahalanobis/  # Mahalanobis distance
│   │   │   ├── train_mahal.py  # Train Mahalanobis model
│   │   │   ├── distance.py  # Mahalanobis distance computation
│   │   │   └── encoded/  # Saved model
│   │   └── __init__.py
│   └── analysis/  # Visualizations & analysis
│       └── visualize_results.py  # Generate analysis plots
│
├── engine_simulator/  # Real-time monitoring dashboard
│   ├── app.py  # Streamlit dashboard (~280 lines)
│   └── digital_twin.py  # Backward compatibility
│
├── requirements.txt  # Python dependencies
└── README.md  # This file
Diesel Engine Sensors (12 channels)
        ↓
[Real-time Sensor Input: 12 parameters]
        ↓
Kalman Filtering Layer (12 independent 2D filters)
        ↓
[Smoothed sensor signals]
        ↓
Ensemble Inference Pipeline:
├── 4 Autoencoders (subsystem-specific)
│   └── 4 reconstruction error z-scores
├── One-Class SVM (12 channels)
│   └── 1 SVM anomaly z-score
└── Mahalanobis Distance (12D covariance matrix)
    └── 1 Mahalanobis z-score
        ↓
Weighted L2 Fusion Scoring:
z_cumulative = √(z_ae_boost² + z_ae_dpf² + z_ae_maf² + z_ae_exhaust² + 0.3·z_mahal² + z_svm²)
        ↓
[Fused anomaly score: 0 = healthy, >3.5 = anomaly]
        ↓
Rolling Window Detector (7-sample window):
- Majority voting (≥4/7 anomalous)
- 2-window confirmation (prevent false positives)
        ↓
[Alert Decision: LEAK_CONFIRMED or HEALTHY]
        ↓
WebSocket Real-time Response to Client
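The weighted L2 fusion step in the diagram above can be written out directly. The sketch below is an illustration rather than the code in `pipeline.py`: the dictionary keys and the names `FUSION_WEIGHTS`, `fuse_z_scores`, and `is_anomalous` are assumptions, while the weights and the 3.5 threshold come from the formula and diagram.

```python
import math

# Weights from the fusion formula: only the Mahalanobis term is
# down-weighted (0.3); every other squared z-score has weight 1.0.
FUSION_WEIGHTS = {
    "ae_boost": 1.0,
    "ae_dpf": 1.0,
    "ae_maf": 1.0,
    "ae_exhaust": 1.0,
    "mahal": 0.3,
    "svm": 1.0,
}
ANOMALY_THRESHOLD = 3.5  # fused scores above this count as anomalous


def fuse_z_scores(z: dict) -> float:
    """Return sqrt(sum_i w_i * z_i^2) over the six model z-scores."""
    return math.sqrt(sum(w * z[name] ** 2 for name, w in FUSION_WEIGHTS.items()))


def is_anomalous(z: dict, threshold: float = ANOMALY_THRESHOLD) -> bool:
    """Flag a single sample whose fused score exceeds the threshold."""
    return fuse_z_scores(z) > threshold
```

Because the terms are squared before summing, one strongly deviating model can push the fused score past the threshold by itself, while several mildly elevated scores also add up.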
Uses a physics-driven digital twin with:
- Persistent engine state (RPM gradually evolves, not random)
- Turbo lag (first-order dynamical response)
- 3 leak types: precompressor, charge_air, exhaust
- Leak severity escalation (grows gradually over time)
- Healthy Data: 20,000 sequential samples from simulator
- Leaky Data: 20,000 samples with leak injections
- Autoencoders: 50 epochs, batch=128, 99th-percentile threshold
- SVM: 5% contamination assumption (nu=0.05)
- Mahalanobis: Covariance-based distance model on healthy distribution
- Kalman Parameters: Estimated from healthy data variance (auto-tuned)
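To make the Kalman entry concrete, here is a dependency-free sketch of a 2D (value plus rate of change) filter for a single sensor channel. It is not the code in `kalman_filter.py` or `kalman_tuning.py`: the constant-velocity model, the diagonal process noise, and the tuning heuristic (R set to the variance of healthy data, Q set to a small fraction of it) are all assumptions chosen for illustration.

```python
from statistics import pvariance


class KalmanFilter2D:
    """Constant-velocity Kalman filter for one channel (illustrative sketch).

    The state is [value, rate of change]; only the value is measured.
    """

    def __init__(self, q: float, r: float, x0: float = 0.0, dt: float = 1.0):
        self.dt, self.q, self.r = dt, q, r
        self.x = [x0, 0.0]                  # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # estimate covariance

    def step(self, z: float) -> float:
        """Process one measurement and return the smoothed value."""
        dt, (x, v), P = self.dt, self.x, self.P
        # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]], Q = q*I.
        x_p = x + dt * v
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        # Update with H = [1, 0]: innovation y, its variance s, and the gain K.
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        y = z - x_p
        self.x = [x_p + k0 * y, v + k1 * y]
        # P = (I - K H) P'
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]


def tune_from_healthy(samples, q_scale: float = 0.01):
    """Assumed heuristic: R = variance of healthy data, Q = q_scale * R."""
    r = pvariance(samples)
    return q_scale * r, r
```

A 12-channel layer would then hold one such filter per sensor, each tuned from that channel's healthy samples.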
User
- id (auto-generated)
- username (unique)
- email
- password (hashed)
- role (Viewer, Tester, Admin)
- history (JSON field - stores test history)
- last_login_time (DateTime)

Engine
- EID (primary key)
- model_no (unique string)
- type (engine type/variant)
- created_at (auto timestamp)
- photo (optional image)

Sensor_Leaky_Data
- SID (primary key)
- rolling_window_data (JSON - anomalous sensor readings)
- next_steps (recommendations as text)

Engine_Test
- id (primary key)
- engine (ForeignKey → Engine)
- user (ForeignKey → User)
- sensor (ForeignKey → Sensor_Leaky_Data)
- test_check (Pass/Fail)
- checked_at (auto timestamp)

User ─1:N→ Engine_Test
Engine ─1:N→ Engine_Test
Sensor_Leaky_Data ─1:N→ Engine_Test
| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /user_auth/signup/ | POST | Register new user | No |
| /user_auth/login/ | POST | Generate auth token | No |
| /user_auth/logout/ | POST | Invalidate token | Yes |
| /user_auth/delete_account/ | DELETE | Delete user account | Yes |
| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /api/predict | POST | One-shot inference (single sensor reading) | Yes |
| ws/engine/ | WebSocket | Real-time streaming inference | Yes |
Client → Server:
{
"rpm": 1714.94,
"fuel_rate": 92.65,
"turbo_speed": 62902.97,
"boost_pressure": 1.51,
"MAP": 2.49,
"IAT": 306.01,
"MAF": 1000.0,
"EGT": 819.44,
"exhaust_pressure": 3.5,
"VGT": 41.78,
"DPF_delta": 50203.11,
"ambient_pressure": 0.99
}
Server → Client (Health Status):
{
"status": "HEALTHY" | "WINDOW_EVALUATED" | "LEAK_CONFIRMED",
"window_leak": true/false,
"leaky_samples_last_window": [...],
"z_scores": {...}
}
- 4 Subsystem Autoencoders (5-layer architecture, 2-neuron bottleneck)
- One-Class SVM with Kalman-filtered 12-channel input
- Mahalanobis Distance with data-driven covariance matrix
- Weighted L2 Fusion combining 6 independent z-scores
- 12 Independent Kalman Filters (data-driven parameters)
- Auto-tuned from healthy data (variance-based Q & R estimation)
- 2D state-space (position + velocity per channel)
- Real-time WebSocket inference (ASGI/Channels)
- 7-sample rolling window with majority voting (4/7 threshold)
- 2-window confirmation to reduce false positives
- Adjustable threshold (default: 3.5 z-score)
- Token-based authentication (Django REST Framework)
- Role-based permissions (Viewer, Tester, Admin)
- User history tracking (JSON field)
- Real-time Streamlit dashboard
- 4D radar chart (subsystem anomaly z-scores: Boost, MAF, Exhaust, DPF)
- Anomaly score gauge (0–15 scale)
- Trend line visualization (50-point history)
- Dynamic system status (HEALTHY / SUBTLE LEAK / CRITICAL LEAK)
- 40,000 total samples (20K healthy + 20K leaky)
- 12 sensor channels (physics-validated)
- 3 leak types (precompressor, charge_air, exhaust)
- 10 physics equations for realistic simulation
# 1. Setup
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# 2. Generate Data
python -m ml_model.data_gen.healthy_data_gen
python -m ml_model.data_gen.leaky_data_gen
# 3. Train Models
python -m ml_model.models.svm.train.train_svm
python -m ml_model.models.autoencoders.train.nn_model_boost
python -m ml_model.models.autoencoders.train.nn_model_dpf
python -m ml_model.models.autoencoders.train.nn_model_maf
python -m ml_model.models.autoencoders.train.nn_model_exhaust
python -m ml_model.models.mahalanobis.train_mahal
# 4. Setup Database
cd backend/diesel_engine_predictor
python manage.py migrate
python manage.py createsuperuser
# 5. Run Backend API
python manage.py runserver
# 6. Run Dashboard (new terminal, from project root)
streamlit run engine_simulator/app.py
Built with ❤️ for predictive maintenance and real-time anomaly detection.