🔧 Diesel Engine Air Leak Detection

A production-grade ML-based anomaly detection system for diesel engine air leaks using deep learning autoencoders, statistical methods, and real-time WebSocket inference.


🎯 Overview

Diesel Engine Air Leak Detection is a comprehensive anomaly detection platform that identifies air leaks in diesel engine systems using a multi-model fusion approach. The system processes real-time sensor data from 12 engine parameters through multiple ML pipelines:

  • Deep autoencoders (reconstruction-based anomaly scoring)
  • One-Class SVM (distribution-based outlier detection)
  • Mahalanobis distance (statistical distance metrics)
  • Kalman filtering (signal smoothing across 12 sensor channels)

Real-time detection runs over a WebSocket connection. A leak is reported only after it is confirmed across two rolling windows (2-window validation), which reduces false positives.

Key Use Cases

  • ✅ Predictive maintenance for diesel engine fleets
  • ✅ Early detection of air intake system failures
  • ✅ Automated health monitoring dashboards
  • ✅ Real-time alerting for critical anomalies

🛠 Tech Stack

Backend & ML Framework

  • Django 6.0 - Web framework & ORM
  • Django REST Framework - API development
  • Django Channels - WebSocket/ASGI for real-time inference
  • TensorFlow/Keras - Deep learning (autoencoders)
  • scikit-learn - Classical ML (SVM, Mahalanobis)

Data & Processing

  • NumPy - Numerical computing
  • Pandas - Data manipulation
  • Kalman Filters (custom implementation) - Signal filtering

Visualization & Monitoring

  • Streamlit - Real-time dashboard
  • Plotly - Interactive charts

Utilities

  • joblib - Model serialization
  • python-dotenv - Environment management
  • dj-database-url - Database URL parsing
  • gunicorn - Production WSGI server
  • daphne - ASGI server (HTTP and WebSocket)

📦 Installation & Setup

Prerequisites

  • Python 3.12+ (Django 6.0 does not support older Python versions)
  • pip or conda
  • Virtual environment (recommended)

Step 1: Clone the Repository

git clone https://github.com/[YOUR_USERNAME]/DieselEngineLeakDetection.git
cd DieselEngineLeakDetection

Step 2: Create a Virtual Environment

# Using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using conda
conda create -n diesel-leak python=3.12
conda activate diesel-leak

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Environment Configuration

Create a .env file in the backend/diesel_engine_predictor/ directory:

SECRET_KEY=your-secret-key-here-change-in-production
DEBUG=True
DATABASE_URL=sqlite:///db.sqlite3
ALLOWED_HOSTS=localhost,127.0.0.1

For production, set DEBUG=False, use a strong, randomly generated SECRET_KEY, and point DATABASE_URL at an external database.
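
Values read from a .env file are always strings, so `DEBUG=False` is still truthy unless it is parsed explicitly. The sketch below shows one way the two non-trivial values could be interpreted. The helper names `env_bool` and `env_list` are hypothetical and are not taken from the project's settings.py.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean flag."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default="", environ=os.environ):
    """Split a comma-separated environment variable into a clean list."""
    raw = environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Example using the values from the .env file shown above.
example_env = {
    "DEBUG": "True",
    "ALLOWED_HOSTS": "localhost,127.0.0.1",
}
DEBUG = env_bool("DEBUG", environ=example_env)
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS", environ=example_env)
print(DEBUG, ALLOWED_HOSTS)
```

In a real settings module the `environ` argument would be left at its default so the values come from the process environment after python-dotenv has loaded the file.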

Step 5: Database Migrations

cd backend/diesel_engine_predictor
python manage.py migrate
python manage.py createsuperuser  # Create admin user

▢️ How to Run

Generate Training Data

# Run the data generation and training commands from the project root
# Generate 20K healthy engine samples
python -m ml_model.data_gen.healthy_data_gen

# Generate 20K leaky engine samples (with 3 leak types)
python -m ml_model.data_gen.leaky_data_gen

Train ML Models

# Train One-Class SVM on 12 sensor channels
python -m ml_model.models.svm.train.train_svm

# Train 4 subsystem-specific autoencoders
python -m ml_model.models.autoencoders.train.nn_model_boost
python -m ml_model.models.autoencoders.train.nn_model_dpf
python -m ml_model.models.autoencoders.train.nn_model_maf
python -m ml_model.models.autoencoders.train.nn_model_exhaust

# Train Mahalanobis distance model
python -m ml_model.models.mahalanobis.train_mahal

# Compute z-scores on test datasets
python -m ml_model.models.svm.z_score.svm_z_scores

Run the Backend API Server

cd backend/diesel_engine_predictor

# Development (HTTP only)
python manage.py runserver

# Production (with WebSocket support via ASGI)
daphne -b 0.0.0.0 -p 8000 diesel_engine_predictor.asgi:application

API Server runs at: http://localhost:8000

Run the Real-Time Monitoring Dashboard

# From project root
streamlit run engine_simulator/app.py

Dashboard opens at: http://localhost:8501


📁 Project Structure

DieselEngineLeakDetection/
│
├── backend/                              # Django REST API backend
│   └── diesel_engine_predictor/
│       ├── manage.py                     # Django CLI
│       ├── db.sqlite3                    # Development database
│       ├── diesel_engine_predictor/      # Project settings
│       │   ├── settings.py              # Django configuration
│       │   ├── urls.py                  # URL routing
│       │   ├── asgi.py                  # ASGI config (WebSocket)
│       │   └── wsgi.py                  # WSGI config
│       ├── predict/                      # Core prediction app
│       │   ├── views.py                 # REST endpoints
│       │   ├── consumers.py             # WebSocket consumer
│       │   ├── services/
│       │   │   ├── pipeline.py          # Inference pipeline (ensemble)
│       │   │   └── kalman_service.py    # Kalman filtering service
│       │   ├── routing.py               # WebSocket routing
│       │   ├── models.py                # DB models
│       │   └── urls.py                  # App URL patterns
│       └── user_auth/                    # Authentication & authorization
│           ├── views.py                 # Login, signup, logout
│           ├── models.py                # User, Engine, Sensor models
│           ├── serializers.py           # DRF serializers
│           └── urls.py                  # Auth URL patterns
│
├── ml_model/                             # ML training & evaluation
│   ├── data_gen/                         # Data generation
│   │   ├── engine_simulator_core.py     # Physics-based engine simulator
│   │   ├── healthy_data_gen.py          # Generate 20K healthy samples
│   │   ├── leaky_data_gen.py            # Generate 20K leaky samples
│   │   └── physics.py                   # 10 physics equations
│   ├── data_store/                       # Datasets (CSV)
│   │   ├── healthy_dataset.csv          # 20K × 12 features
│   │   └── leaky_dataset.csv            # 20K × 12 features
│   ├── kalman/                           # Kalman filter implementations
│   │   ├── kalman_filter.py             # 2D Kalman filter class
│   │   ├── kalman_layer.py              # 12-channel Kalman layer
│   │   └── kalman_tuning.py             # Parameter estimation
│   ├── models/                           # Trained ML models
│   │   ├── residual/                     # Legacy residual models (not used in main pipeline)
│   │   │   ├── train/
│   │   │   │   └── train_residual.py    # Train 4 residual models
│   │   │   └── encoded/                 # Saved .joblib models
│   │   ├── svm/                          # One-Class SVM
│   │   │   ├── train/
│   │   │   │   └── train_svm.py         # Train SVM
│   │   │   ├── encoded/
│   │   │   │   └── svm_model.joblib     # Trained SVM
│   │   │   └── z_score/
│   │   │       └── svm_z_scores.py      # Compute anomaly scores
│   │   ├── autoencoders/                 # Deep autoencoders (reconstruction)
│   │   │   ├── train/
│   │   │   │   ├── nn_model_boost.py    # Boost subsystem AE (5 input features)
│   │   │   │   ├── nn_model_dpf.py      # DPF subsystem AE (7 input features)
│   │   │   │   ├── nn_model_maf.py      # MAF subsystem AE (5 input features)
│   │   │   │   └── nn_model_exhaust.py  # Exhaust subsystem AE (6 input features)
│   │   │   └── residual_score/
│   │   │       └── encoded_model/       # Saved .keras models
│   │   ├── mahalanobis/                  # Mahalanobis distance
│   │   │   ├── train_mahal.py           # Train Mahalanobis model
│   │   │   ├── distance.py              # Mahalanobis distance computation
│   │   │   └── encoded/                 # Saved model
│   │   └── __init__.py
│   └── analysis/                         # Visualizations & analysis
│       └── visualize_results.py         # Generate analysis plots
│
├── engine_simulator/                     # Real-time monitoring dashboard
│   ├── app.py                            # Streamlit dashboard (~280 lines)
│   └── digital_twin.py                   # Backward compatibility
│
├── requirements.txt                      # Python dependencies
└── README.md                             # This file

🔄 Main Workflow & Pipeline

End-to-End Data Flow

Diesel Engine Sensors (12 channels)
        ↓
[Real-time Sensor Input: 12 parameters]
        ↓
Kalman Filtering Layer (12 independent 2D filters)
        ↓
[Smoothed sensor signals]
        ↓
Ensemble Inference Pipeline:
    ├─→ 4 Autoencoders (subsystem-specific)
    │   └─→ 4 reconstruction error z-scores
    ├─→ One-Class SVM (12 channels)
    │   └─→ 1 SVM anomaly z-score
    └─→ Mahalanobis Distance (12D covariance matrix)
        └─→ 1 Mahalanobis z-score
        ↓
Weighted L2 Fusion Scoring:
z_cumulative = √(z_ae_boost² + z_ae_dpf² + z_ae_maf² + z_ae_exhaust² + 0.3·z_mahal² + z_svm²)
        ↓
[Fused anomaly score: 0 = healthy, >3.5 = anomaly]
        ↓
Rolling Window Detector (7-sample window):
    - Majority voting (≥4/7 anomalous)
    - 2-window confirmation (prevent false positives)
        ↓
[Alert Decision: LEAK_CONFIRMED or HEALTHY]
        ↓
WebSocket Real-time Response to Client
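
The fusion step in the flow above can be written out directly. This is a minimal sketch of the stated formula and the 3.5 threshold. The function names `fuse_z_scores` and `is_anomalous` are hypothetical and are not taken from pipeline.py.

```python
import math

# Weight on the squared Mahalanobis z-score, as given in the fusion formula.
MAHAL_WEIGHT = 0.3
# Default anomaly threshold on the fused score.
ANOMALY_THRESHOLD = 3.5


def fuse_z_scores(z_ae_boost, z_ae_dpf, z_ae_maf, z_ae_exhaust, z_mahal, z_svm):
    """Weighted L2 norm of the six per-model z-scores."""
    return math.sqrt(
        z_ae_boost ** 2
        + z_ae_dpf ** 2
        + z_ae_maf ** 2
        + z_ae_exhaust ** 2
        + MAHAL_WEIGHT * z_mahal ** 2
        + z_svm ** 2
    )


def is_anomalous(z_cumulative, threshold=ANOMALY_THRESHOLD):
    """A sample counts as anomalous when the fused score exceeds the threshold."""
    return z_cumulative > threshold


score = fuse_z_scores(1.0, 0.5, 2.0, 0.2, 4.0, 1.5)
print(round(score, 3), is_anomalous(score))
```

Because the terms are squared before summing, one strongly deviating model can push the fused score over the threshold even when the other five stay near zero, while the 0.3 weight damps the Mahalanobis contribution.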

Data Generation Pipeline

Uses a physics-driven digital twin with:

  • Persistent engine state (RPM gradually evolves, not random)
  • Turbo lag (first-order dynamical response)
  • 3 leak types: precompressor, charge_air, exhaust
  • Leak severity escalation (grows gradually over time)

Model Training Workflow

  1. Healthy Data: 20,000 sequential samples from simulator
  2. Leaky Data: 20,000 samples with leak injections
  3. Autoencoders: 50 epochs, batch=128, 99th-percentile threshold
  4. SVM: 5% contamination assumption (nu=0.05)
  5. Mahalanobis: Covariance-based distance model on healthy distribution
  6. Kalman Parameters: Estimated from healthy data variance (auto-tuned)
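
Step 3 mentions a 99th-percentile threshold on the autoencoder output. The sketch below shows one plausible reading: the threshold and the z-score statistics are both fitted on reconstruction errors from healthy data. How the project actually derives its z-scores is not stated here, so the mean/standard-deviation normalisation and the function names are assumptions.

```python
import statistics


def fit_error_stats(healthy_errors):
    """Summarise reconstruction errors measured on healthy data."""
    mean = statistics.fmean(healthy_errors)
    std = statistics.stdev(healthy_errors)
    # quantiles(n=100) returns the 99 percentile cut points;
    # index 98 is the 99th percentile.
    threshold = statistics.quantiles(healthy_errors, n=100)[98]
    return {"mean": mean, "std": std, "threshold": threshold}


def error_z_score(error, stats):
    """Express a reconstruction error in healthy-data standard deviations."""
    return (error - stats["mean"]) / stats["std"]


# Synthetic stand-in for per-sample reconstruction errors on healthy data.
healthy_errors = [i / 1000 for i in range(1, 1001)]
stats = fit_error_stats(healthy_errors)
print(stats["threshold"], error_z_score(2.0, stats))
```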

🗄️ Database Design

Models (Django ORM)

1. User (extends Django AbstractUser)

- id (auto-generated)
- username (unique)
- email
- password (hashed)
- role (Viewer, Tester, Admin)
- history (JSON field - stores test history)
- last_login_time (DateTime)

2. Engine

- EID (primary key)
- model_no (unique string)
- type (engine type/variant)
- created_at (auto timestamp)
- photo (optional image)

3. Sensor_Leaky_Data

- SID (primary key)
- rolling_window_data (JSON - anomalous sensor readings)
- next_steps (recommendations as text)

4. Engine_Test

- id (primary key)
- engine (ForeignKey → Engine)
- user (ForeignKey → User)
- sensor (ForeignKey → Sensor_Leaky_Data)
- test_check (Pass/Fail)
- checked_at (auto timestamp)

Relationships

User ←1:N→ Engine_Test
Engine ←1:N→ Engine_Test
Sensor_Leaky_Data ←1:N→ Engine_Test

📡 API Endpoints

Authentication Endpoints

| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /user_auth/signup/ | POST | Register new user | ❌ No |
| /user_auth/login/ | POST | Generate auth token | ❌ No |
| /user_auth/logout/ | POST | Invalidate token | ✅ Yes |
| /user_auth/delete_account/ | DELETE | Delete user account | ✅ Yes |

Prediction Endpoints

| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /api/predict | POST | One-shot inference (single sensor reading) | ✅ Yes |
| ws/engine/ | WebSocket | Real-time streaming inference | ✅ Yes |
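
As a rough illustration of calling the one-shot endpoint, the sketch below builds an authenticated POST request with the standard library. It assumes the backend uses Django REST Framework's default `Authorization: Token <key>` header scheme and that `/api/predict` accepts a JSON body of sensor readings. Neither detail is confirmed here, and `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_predict_request(token, reading, base_url=BASE_URL):
    """Build (but do not send) a POST request for a single sensor reading."""
    body = json.dumps(reading).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=body,
        method="POST",
        headers={
            # Assumed header scheme: DRF TokenAuthentication default.
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


request = build_predict_request("your-token-here", {"rpm": 1714.94, "MAF": 1000.0})
print(request.get_method(), request.full_url)
```

With the server running, the request could be sent with `urllib.request.urlopen(request)`. The token would come from the login endpoint.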

WebSocket Message Format

Client → Server:

{
  "rpm": 1714.94,
  "fuel_rate": 92.65,
  "turbo_speed": 62902.97,
  "boost_pressure": 1.51,
  "MAP": 2.49,
  "IAT": 306.01,
  "MAF": 1000.0,
  "EGT": 819.44,
  "exhaust_pressure": 3.5,
  "VGT": 41.78,
  "DPF_delta": 50203.11,
  "ambient_pressure": 0.99
}

Server → Client (Health Status):

{
  "status": "HEALTHY" | "WINDOW_EVALUATED" | "LEAK_CONFIRMED",
  "window_leak": true/false,
  "leaky_samples_last_window": [...],
  "z_scores": {...}
}
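
Since every client message must carry all 12 channels, a consumer would typically validate the payload before passing it to the models. The following is a hedged sketch of such a check. The field names come from the example message above, while `parse_sensor_message` and its error handling are assumptions, not the code in consumers.py.

```python
import json

# The 12 sensor channels, in the order used by the example message.
SENSOR_FIELDS = (
    "rpm", "fuel_rate", "turbo_speed", "boost_pressure", "MAP", "IAT",
    "MAF", "EGT", "exhaust_pressure", "VGT", "DPF_delta", "ambient_pressure",
)


def parse_sensor_message(text):
    """Decode a JSON message and return the 12 readings as floats, in order."""
    payload = json.loads(text)
    if not isinstance(payload, dict):
        raise ValueError("sensor message must be a JSON object")
    missing = [name for name in SENSOR_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing sensor fields: {', '.join(missing)}")
    return [float(payload[name]) for name in SENSOR_FIELDS]


example_message = json.dumps({
    "rpm": 1714.94, "fuel_rate": 92.65, "turbo_speed": 62902.97,
    "boost_pressure": 1.51, "MAP": 2.49, "IAT": 306.01, "MAF": 1000.0,
    "EGT": 819.44, "exhaust_pressure": 3.5, "VGT": 41.78,
    "DPF_delta": 50203.11, "ambient_pressure": 0.99,
})
print(parse_sensor_message(example_message))
```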

✨ Features

🤖 Model Ensemble

  • ✅ 4 Subsystem Autoencoders (5-layer architecture, 2-neuron bottleneck)
  • ✅ One-Class SVM with Kalman-filtered 12-channel input
  • ✅ Mahalanobis Distance with data-driven covariance matrix
  • ✅ Weighted L2 Fusion combining 6 independent z-scores
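
To make the Mahalanobis component concrete, here is a minimal sketch restricted to two features so that the covariance inverse has a closed form. The project works on 12 channels and its own implementation lives in distance.py. The function names and sample data below are illustrative assumptions only.

```python
import math
import statistics


def fit_gaussian_2d(samples):
    """Estimate the mean and covariance terms from healthy (x, y) samples."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mean = (statistics.fmean(xs), statistics.fmean(ys))
    n = len(samples)
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    cov_xy = sum((x - mean[0]) * (y - mean[1]) for x, y in samples) / (n - 1)
    return mean, (var_x, cov_xy, var_y)


def mahalanobis_2d(point, mean, cov):
    """Distance of a point from the mean, scaled by the covariance."""
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, expanded for the 2x2 case.
    d_squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(max(d_squared, 0.0))


healthy = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
mean, cov = fit_gaussian_2d(healthy)
print(mahalanobis_2d((4.0, 8.0), mean, cov), mahalanobis_2d((4.0, 4.0), mean, cov))
```

The second point is the same Euclidean distance from the mean as the first, but it breaks the correlation between the two features and therefore scores far higher. That sensitivity to broken correlations is what makes the metric useful for leak detection.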

🔄 Signal Processing

  • ✅ 12 Independent Kalman Filters (data-driven parameters)
  • ✅ Auto-tuned from healthy data (variance-based Q & R estimation)
  • ✅ 2D state-space (position + velocity per channel)
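
A 2D (position + velocity) Kalman filter for a single channel can be sketched in plain Python as below. It assumes a constant-velocity model, a diagonal process noise `q`, and a scalar measurement noise `r`. The class name and parameter values are illustrative and are not those of kalman_filter.py or its tuning script.

```python
class KalmanFilter2D:
    """Constant-velocity Kalman filter tracking [position, velocity]."""

    def __init__(self, q, r, dt=1.0, initial_position=0.0):
        self.q = q            # process noise added to both state variances
        self.r = r            # measurement noise variance
        self.dt = dt
        self.pos = initial_position
        self.vel = 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance

    def step(self, measurement):
        dt, q, r = self.dt, self.q, self.r
        (p00, p01), (p10, p11) = self.p

        # Predict: x = F x, P = F P F^T + Q with F = [[1, dt], [0, 1]].
        pos = self.pos + dt * self.vel
        vel = self.vel
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q

        # Update with H = [1, 0]: only the position is measured.
        innovation = measurement - pos
        s = n00 + r
        k0 = n00 / s
        k1 = n10 / s
        self.pos = pos + k0 * innovation
        self.vel = vel + k1 * innovation
        self.p = [
            [(1 - k0) * n00, (1 - k0) * n01],
            [n10 - k1 * n00, n11 - k1 * n01],
        ]
        return self.pos


# One independent filter per sensor channel, as in the 12-channel layer.
filters = [KalmanFilter2D(q=1e-3, r=0.1) for _ in range(12)]
smoothed = [f.step(5.0) for f in filters]
print(smoothed[0])
```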

🚨 Anomaly Detection

  • ✅ Real-time WebSocket inference (ASGI/Channels)
  • ✅ 7-sample rolling window with majority voting (4/7 threshold)
  • ✅ 2-window confirmation to reduce false positives
  • ✅ Adjustable threshold (default: 3.5 z-score)
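
The windowing rules above could be implemented along the following lines. This sketch assumes non-overlapping 7-sample windows and requires two consecutive leaky windows before confirming. Whether the project slides or resets its window is not stated, and the `"SUSPECT"` status is a hypothetical label for an unconfirmed leaky window.

```python
from collections import deque


class RollingWindowDetector:
    """Majority vote per window, then confirmation across windows."""

    def __init__(self, window_size=7, min_votes=4, confirm_windows=2, threshold=3.5):
        self.window_size = window_size
        self.min_votes = min_votes
        self.confirm_windows = confirm_windows
        self.threshold = threshold
        self._window = []
        self._recent_verdicts = deque(maxlen=confirm_windows)

    def update(self, z_cumulative):
        """Add one fused score; return a status once a window completes."""
        self._window.append(z_cumulative > self.threshold)
        if len(self._window) < self.window_size:
            return None  # window still filling

        window_leaky = sum(self._window) >= self.min_votes
        self._window = []
        self._recent_verdicts.append(window_leaky)

        confirmed = (
            len(self._recent_verdicts) == self.confirm_windows
            and all(self._recent_verdicts)
        )
        if confirmed:
            return "LEAK_CONFIRMED"
        return "SUSPECT" if window_leaky else "HEALTHY"


detector = RollingWindowDetector()
scores = [5.0, 5.0, 5.0, 5.0, 0.5, 0.5, 0.5] * 2
statuses = [detector.update(s) for s in scores]
print([s for s in statuses if s is not None])
```

A single noisy window only raises the intermediate status, so isolated spikes do not trigger an alert.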

🔐 Authentication & Access Control

  • ✅ Token-based authentication (Django REST Framework)
  • ✅ Role-based permissions (Viewer, Tester, Admin)
  • ✅ User history tracking (JSON field)

📊 Monitoring & Visualization

  • ✅ Real-time Streamlit dashboard
  • ✅ 4D radar chart (subsystem anomaly z-scores: Boost, MAF, Exhaust, DPF)
  • ✅ Anomaly score gauge (0–15 scale)
  • ✅ Trend line visualization (50-point history)
  • ✅ Dynamic system status (HEALTHY / SUBTLE LEAK / CRITICAL LEAK)

📈 Dataset & Training

  • ✅ 40,000 total samples (20K healthy + 20K leaky)
  • ✅ 12 sensor channels (physics-validated)
  • ✅ 3 leak types (precompressor, charge_air, exhaust)
  • ✅ 10 physics equations for realistic simulation

📚 Quick Start Commands

# 1. Setup
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# 2. Generate Data
python -m ml_model.data_gen.healthy_data_gen
python -m ml_model.data_gen.leaky_data_gen

# 3. Train Models
python -m ml_model.models.svm.train.train_svm
python -m ml_model.models.autoencoders.train.nn_model_boost
python -m ml_model.models.autoencoders.train.nn_model_dpf
python -m ml_model.models.autoencoders.train.nn_model_maf
python -m ml_model.models.autoencoders.train.nn_model_exhaust
python -m ml_model.models.mahalanobis.train_mahal

# 4. Setup Database
cd backend/diesel_engine_predictor
python manage.py migrate
python manage.py createsuperuser

# 5. Run Backend API
python manage.py runserver

# 6. Run Dashboard (new terminal, from the project root)
streamlit run engine_simulator/app.py

Built with ❤️ for predictive maintenance and real-time anomaly detection.