🛡️ Droidit – ML-Powered Android Security Platform

Enterprise-grade ML-powered Android APK security analysis platform with comprehensive static analysis, ensemble machine learning models, explainable AI, production deployment infrastructure, and advanced security features.

🌟 Key Features

🎯 Multi-Layer Security Analysis

Rich Static Analysis: 40+ sophisticated features (permissions, components, certificates, reflection, obfuscation, entropy)
Dynamic Analysis Framework: Optional runtime behavior monitoring and API call tracking (currently a stub that returns synthetic features)
Rule-Based Heuristics: Interpretable security rules with confidence scoring and performance tracking
Feature Fusion: Intelligent combination of static and dynamic analysis results with auto-alignment
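As an illustration of the entropy feature listed above, here is a minimal sketch of Shannon entropy over raw bytes. The function name shannon_entropy is hypothetical and is not taken from the Droidit codebase; the project may compute entropy differently (for example per DEX section or per resource).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of -p * log2(p) over every byte value that occurs in the data.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

Values near 8.0 suggest compressed or encrypted content, which is why entropy is commonly used as a packing or obfuscation signal.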

🤖 Advanced Machine Learning Pipeline

Ensemble Models: Random Forest, XGBoost, LightGBM with stacking meta-learner
Auto-Feature Alignment: Handles model schema evolution automatically
Incremental Learning: Continuous model improvement from user feedback with importance weighting
Model Versioning: Complete lifecycle management with JSON manifests and performance tracking
SHAP Explainability: Model interpretability with intelligent caching for performance optimization

🚀 Production-Ready Architecture

Scalable FastAPI Service: Async processing, horizontal scaling, OpenAPI documentation
Security Hardened: Encrypted secret management, API key rotation, comprehensive audit trails
Comprehensive Monitoring: Prometheus metrics, Grafana dashboards, health monitoring
High Availability: Docker orchestration with Redis caching and PostgreSQL persistence
Risk Intelligence: Automated LOW/MEDIUM/HIGH/CRITICAL risk tier classification

🔧 Developer & Operations Experience

Plugin Architecture: Extensible system with auto-discovery and namespaced features
Complete CLI Tools: Training, inference, management, incremental learning commands
CI/CD Ready: Automated testing, security scanning, deployment pipelines
Rich Reporting: Beautiful HTML reports and structured JSON exports with analytics
Load Testing: Performance benchmarking and bottleneck identification

Key Features

🎯 Core Analysis

Rich Static Analysis: 40+ features including permissions, exported components, certificates, reflection usage, native libraries, entropy analysis, obfuscation detection
Rule-Based Heuristics: Interpretable rules for weak signatures, excessive permissions, obfuscation patterns, dynamic loading
Risk Tier Classification: Automatic low/medium/high risk categorization
Stubbed Dynamic Analysis: Framework for future API monitoring, network analysis
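The tier classification can be pictured as a threshold mapping over a normalized score. The sketch below is illustrative only: the function name risk_tier, the 0 to 1 score range, and the cut-off values 0.4 and 0.7 are assumptions, not values taken from Droidit.

```python
def risk_tier(score: float, medium_at: float = 0.4, high_at: float = 0.7) -> str:
    """Map a score in [0, 1] to a tier name.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    if score >= high_at:
        return "high"
    if score >= medium_at:
        return "medium"
    return "low"
```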

🤖 Machine Learning

Ensemble Stacking: RF + XGBoost + LightGBM → per-label LogisticRegression meta-learners
Auto Feature Alignment: Handles missing columns gracefully with defaults
Model Versioning: JSON manifests with performance metrics and metadata
SHAP Explainability: Cached feature importance explanations
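Auto feature alignment, as described above, amounts to projecting each feature row onto the column order the model was trained with and filling gaps with a default. A minimal sketch, assuming features arrive as a dict and that 0.0 is an acceptable default; align_features and the feature names in the usage line are hypothetical:

```python
from typing import Dict, List


def align_features(row: Dict[str, float], expected: List[str], default: float = 0.0) -> List[float]:
    """Return values in the trained column order.

    Missing columns receive the default; columns the model does not know are dropped.
    """
    return [float(row.get(name, default)) for name in expected]
```

For example, align_features({"num_permissions": 12, "new_feature": 1}, ["num_permissions", "uses_reflection"]) yields [12.0, 0.0].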

🚀 Production Ready

FastAPI Service: RESTful API with OpenAPI docs, auth, rate limiting, CORS
Prometheus Metrics: Scan counts, durations, risk tiers, rule triggers, cache stats
Feedback Loop: CSV/SQLAlchemy storage for active learning
Plugin System: Auto-discovery for custom feature extractors
Enhanced Reporting: HTML + JSON exports with tier distributions and rule usage

Quick Start (Windows)

Recommended: use WSL2 with Ubuntu, where androguard installs without the build issues common on native Windows.

# (Optional) In PowerShell – install WSL if not present
wsl --install

1. Environment Setup (WSL bash)

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

2. Generate Synthetic Data & Train Enhanced Model

# Generate synthetic data with all new features
python scripts/synth_generate.py

# Train with expanded feature set and versioning
python -m droidit.cli retrain --regenerate-synth

# Or train with static+dynamic fusion (synthetic dynamic features)
python -m droidit.cli train-fused --include-dynamic --regenerate-synth

3. Run Enhanced Inference

# Includes rule evaluation, risk tiers, JSON export
python scripts/run_infer.py

4. Run API Server with Full Feature Set

uvicorn droidit.api.server:app --reload --port 8000

Open: http://localhost:8000/docs

Configuration:

API key: set DROIDIT_API_KEY=yourkey, then send the header X-API-Key: yourkey with each request
Rate limit: DROIDIT_RATE_LIMIT_PER_MINUTE=60
Metrics: DROIDIT_ENABLE_METRICS=1 → scrape /metrics
Feedback: DROIDIT_FEEDBACK_ENABLED=1, DROIDIT_DB_URL=sqlite:///feedback.db
SHAP explanations: DROIDIT_ENABLE_SHAP=1
Plugins: place .py files in plugins/ directory
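The environment variables above could be collected into a single settings object along these lines. This is a sketch, not the server's actual configuration code: the Settings class, load_settings, the default of 60 requests per minute, and the rule that only the literal value 1 enables a flag are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    rate_limit_per_minute: int
    enable_metrics: bool
    feedback_enabled: bool
    db_url: Optional[str]
    enable_shap: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the DROIDIT_* variables; flags are on only when set to "1" (assumed)."""
    return Settings(
        api_key=env.get("DROIDIT_API_KEY"),
        rate_limit_per_minute=int(env.get("DROIDIT_RATE_LIMIT_PER_MINUTE", "60")),
        enable_metrics=env.get("DROIDIT_ENABLE_METRICS") == "1",
        feedback_enabled=env.get("DROIDIT_FEEDBACK_ENABLED") == "1",
        db_url=env.get("DROIDIT_DB_URL"),
        enable_shap=env.get("DROIDIT_ENABLE_SHAP") == "1",
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.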

5. Scan Real APKs

# Extract features from APKs
python scripts/extract_features.py data/apks data/features/features.csv --append

# Run inference with rule evaluation and risk tiers
python scripts/run_infer.py

# Or scan single APK via CLI
python -m droidit.cli scan path/to/app.apk --mode full

POST the APK directly:

curl -X POST -F "file=@/path/app.apk" http://localhost:8000/v1/scan/apk
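The same upload can be issued from Python using only the standard library. The sketch below builds the multipart request by hand; build_scan_request is a hypothetical helper, and the response schema is not assumed here. The form field name file and the X-API-Key header come from this README.

```python
import urllib.request
import uuid
from typing import Optional


def build_scan_request(apk_bytes: bytes, filename: str,
                       base_url: str = "http://localhost:8000",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/scan/apk with the APK in the `file` field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/vnd.android.package-archive\r\n\r\n",
        apk_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["X-API-Key"] = api_key  # only needed when DROIDIT_API_KEY is set
    return urllib.request.Request(f"{base_url}/v1/scan/apk", data=body,
                                  headers=headers, method="POST")
```

To send it, read the APK bytes from disk, pass the request to urllib.request.urlopen, and decode the JSON body of the response.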

Enhanced API Endpoints

🔍 Scanning & Analysis

POST /v1/scan/apk - Scan APK with static analysis and optional dynamic features
POST /v1/scan/batch - Batch scan multiple APKs
POST /v1/explain/apk - SHAP-based feature explanations (cached)

📊 Rules & Risk Assessment

GET /v1/rules - List all registered heuristic rules
Rules automatically evaluate: weak signatures, reflection usage, obfuscation, dynamic loading, excessive permissions

📈 Feedback & Analytics

POST /v1/feedback/label - Submit ground truth labels for active learning
GET /v1/feedback/stats - Aggregated feedback statistics
Supports CSV and SQLAlchemy database backends

🔧 Dynamic Analysis (Stubbed)

POST /v1/dynamic/logs - Ingest JSONL behavioral logs
POST /v1/dynamic/run - Execute dynamic analysis pipeline (synthetic)

📏 Metrics & Monitoring

GET /metrics - Prometheus metrics (scans, durations, risk tiers, rule triggers, SHAP cache)
Tracks: scan counts, failure rates, risk tier distribution, rule usage frequency

⚡ Performance Features

SHAP result caching with hit/miss metrics
Auto feature alignment for missing columns
Rule-based risk tier classification (low/medium/high)
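SHAP result caching with hit/miss counters can be sketched as a dictionary keyed by the APK's content hash. The class below is an illustration under assumptions: ExplanationCache is a hypothetical name, SHA-256 is an assumed key choice, and the compute callable stands in for whatever actually produces the explanation.

```python
import hashlib
from typing import Callable, Dict


class ExplanationCache:
    """In-memory cache keyed by SHA-256 of the APK bytes, with hit/miss counters."""

    def __init__(self, compute: Callable[[bytes], dict]) -> None:
        self._compute = compute          # stand-in for the expensive SHAP computation
        self._store: Dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def get(self, apk_bytes: bytes) -> dict:
        key = hashlib.sha256(apk_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._compute(apk_bytes)
        return self._store[key]
```

The hits and misses attributes are the kind of values that would feed the cache statistics exposed on /metrics.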

Project Layout

droidit/
  features/            # Feature extraction modules
  ml/                  # Training, inference, registry
  api/                 # FastAPI server
  reporting/           # HTML/PDF report generation
  utils/               # Helpers (logging, hashing, paths)
  dynamic/frida/       # Frida hook starters
scripts/               # CLI helpers
data/                  # APKs, features, labels, models, reports
tests/                 # Pytest tests (basic)

Multi-Label Risk Categories (initial)

label_data_leak
label_insecure_crypto
label_exported_component

Extend by editing droidit/ml/dataset.py:BASE_LABELS.

Ensemble Overview

Base models: RandomForest, XGBoost, LightGBM (One-vs-Rest per label)
Meta layer: Logistic Regression stacking (prob inputs per label)
Persistence: joblib artifact in data/models/model.joblib
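To make the meta layer concrete: for each label, the three base-model probabilities become the inputs of a logistic regression. The sketch below shows only that final combination step in plain Python. The function name meta_probability and the weights in the usage line are illustrative assumptions; in the project, the coefficients would be learned during training rather than supplied by hand.

```python
import math
from typing import Sequence


def meta_probability(base_probs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Combine per-label base-model probabilities with a logistic function.

    base_probs holds one probability per base model (for example RF, XGBoost, LightGBM).
    """
    if len(base_probs) != len(weights):
        raise ValueError("base_probs and weights must have the same length")
    z = bias + sum(w * p for w, p in zip(weights, base_probs))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, meta_probability([0.2, 0.6, 0.9], [1.5, 2.0, 2.5], bias=-3.0) blends the three scores into one probability for a single label such as label_data_leak.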

Dynamic Analysis (Stub + Future)

Current build offers:

POST /v1/dynamic/logs → ingest JSONL (Frida-style) logs, aggregate API usage.
POST /v1/dynamic/run → runs a simulated dynamic pipeline (placeholder) and returns synthetic behavioral features.
POST /v1/scan/apk?dynamic=true → combines static scan with stub dynamic feature block under dynamic field.

Planned full pipeline: Instrumentation (Frida), network interception (mitmproxy), behavior timeline → feature fusion.

Reporting

report_generator.py renders an HTML report with model scores, rule-based signals, risk_tier, and top features. SHAP explanations are included when SHAP is installed; PDF export is a possible extension.

Development

pytest -q

Security & Ethics

Only analyze APKs you are authorized to inspect. This tool is for defensive/security review purposes.

Next Steps

Dynamic feature ingestion (Frida logs → features)
Active learning feedback endpoint
SHAP caching & on-demand generation queue
Advanced permission/intent graph features
Grafana/Prom metrics endpoint

Docker

docker build -t droidit:latest .
docker run -p 8000:8000 -e DROIDIT_ENABLE_SHAP=1 droidit:latest

Key API Endpoints

Purpose	Method	Path	Notes
Health	GET	`/v1/health`	Status & model info
Labels	GET	`/v1/labels`	Active risk labels
Single Scan	POST	`/v1/scan/apk`	`file` APK multipart; `dynamic` & `explain` query flags
Batch Scan	POST	`/v1/scan/zip`	ZIP of APKs
Explain	POST	`/v1/explain/apk`	Requires `DROIDIT_ENABLE_SHAP=1`
Dynamic Logs	POST	`/v1/dynamic/logs`	JSONL ingestion
Dynamic Run (stub)	POST	`/v1/dynamic/run`	Simulated dynamic features
Feedback	POST	`/v1/feedback/label`	`{apk_hash,label,value}` (0/1)
Feedback Stats	GET	`/v1/feedback/stats`	Aggregated counts & positive ratios
Rules	GET	`/v1/rules`	List heuristic rules
Metrics	GET	`/metrics`	Prometheus exposition

CLI

droidit train                 # Train ensemble
droidit infer                 # Inference using existing features
droidit scan <apk> [--mode static|dynamic|full]
droidit feedback <apk_hash> <label> <value>

Install the CLI in editable mode with pip install -e . (requires pyproject.toml), or invoke the module directly with python -m droidit.cli.

Metrics

Exposed Prometheus counters / histograms:

droidit_scans_total
droidit_scan_failures_total
droidit_scan_duration_seconds
droidit_feedback_total
droidit_plugins_loaded
droidit_dynamic_analyses_total
droidit_risk_tier_total{tier="low|medium|high"}
droidit_rule_score

Feedback Persistence

CSV fallback: data/models/feedback.csv
SQLAlchemy (if installed & DROIDIT_DB_URL set) → creates feedback table.
Future: add retrieval endpoints & active learning retraining loop.
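A CSV fallback of this kind can be sketched with the standard csv module. The column names follow the {apk_hash,label,value} payload of the feedback endpoint; the header row, the helper names append_feedback and positive_ratio, and the file layout are assumptions rather than the project's actual format.

```python
import csv
import os

FIELDS = ["apk_hash", "label", "value"]


def append_feedback(path: str, apk_hash: str, label: str, value: int) -> None:
    """Append one feedback row, writing a header first if the file is new or empty."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"apk_hash": apk_hash, "label": label, "value": value})


def positive_ratio(path: str, label: str) -> float:
    """Return the share of rows with value 1 for a label, or 0.0 if there are none."""
    with open(path, newline="", encoding="utf-8") as fh:
        values = [int(row["value"]) for row in csv.DictReader(fh) if row["label"] == label]
    return sum(values) / len(values) if values else 0.0
```

positive_ratio mirrors the kind of aggregate that GET /v1/feedback/stats is described as returning.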

Plugin System

Implement a plugin:

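from droidit.plugins import base

class MyPlugin:
    name = "sigscan"    # used as the feature namespace
    version = "0.1"

    def extract(self, apk_path: str) -> dict:
        # Return a flat mapping of feature name to value for this APK.
        return {"matches": 3}

# Register an instance so its features are included in extraction.
base.register(MyPlugin())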

Features appear as plugin_sigscan_matches.
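The naming rule above (plugin name plus feature key, prefixed with plugin_) can be expressed as a small helper. namespace_features is a hypothetical function shown only to illustrate the convention; the actual merging logic in Droidit is not shown in this README.

```python
from typing import Any, Dict


def namespace_features(plugin_name: str, features: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix each feature key with plugin_<name>_ so plugins cannot collide."""
    return {f"plugin_{plugin_name}_{key}": value for key, value in features.items()}
```

With the example plugin, namespace_features("sigscan", {"matches": 3}) returns {"plugin_sigscan_matches": 3}.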

Enjoy hacking – iterate fast and expand feature coverage.
