Enterprise-grade ML-powered Android APK security analysis platform with comprehensive static analysis, ensemble machine learning models, explainable AI, production deployment infrastructure, and advanced security features.
- Rich Static Analysis: 40+ sophisticated features (permissions, components, certificates, reflection, obfuscation, entropy)
- Dynamic Analysis Framework: Optional runtime behavior monitoring and API call tracking
- Rule-Based Heuristics: Interpretable security rules with confidence scoring and performance tracking
- Feature Fusion: Intelligent combination of static and dynamic analysis results with auto-alignment
- Ensemble Models: Random Forest, XGBoost, LightGBM with stacking meta-learner
- Auto-Feature Alignment: Handles model schema evolution automatically
- Incremental Learning: Continuous model improvement from user feedback with importance weighting
- Model Versioning: Complete lifecycle management with JSON manifests and performance tracking
- SHAP Explainability: Model interpretability with intelligent caching for performance optimization
- Scalable FastAPI Service: Async processing, horizontal scaling, OpenAPI documentation
- Security Hardened: Encrypted secret management, API key rotation, comprehensive audit trails
- Comprehensive Monitoring: Prometheus metrics, Grafana dashboards, health monitoring
- High Availability: Docker orchestration with Redis caching and PostgreSQL persistence
- Risk Intelligence: Automated LOW/MEDIUM/HIGH/CRITICAL risk tier classification
- Plugin Architecture: Extensible system with auto-discovery and namespaced features
- Complete CLI Tools: Training, inference, management, incremental learning commands
- CI/CD Ready: Automated testing, security scanning, deployment pipelines
- Rich Reporting: Beautiful HTML reports and structured JSON exports with analytics
- Load Testing: Performance benchmarking and bottleneck identification
- Rich Static Analysis: 40+ features including permissions, exported components, certificates, reflection usage, native libraries, entropy analysis, obfuscation detection
- Rule-Based Heuristics: Interpretable rules for weak signatures, excessive permissions, obfuscation patterns, dynamic loading
- Risk Tier Classification: Automatic low/medium/high risk categorization
- Stubbed Dynamic Analysis: Framework for future API monitoring, network analysis
- Ensemble Stacking: RF + XGBoost + LightGBM → per-label LogisticRegression meta-learners
- Auto Feature Alignment: Handles missing columns gracefully with defaults
- Model Versioning: JSON manifests with performance metrics and metadata
- SHAP Explainability: Cached feature importance explanations
- FastAPI Service: RESTful API with OpenAPI docs, auth, rate limiting, CORS
- Prometheus Metrics: Scan counts, durations, risk tiers, rule triggers, cache stats
- Feedback Loop: CSV/SQLAlchemy storage for active learning
- Plugin System: Auto-discovery for custom feature extractors
- Enhanced Reporting: HTML + JSON exports with tier distributions and rule usage
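As a rough illustration of how interpretable rules could feed a risk tier, the sketch below scores a feature dictionary against a few weighted rules and maps the total to low/medium/high. The feature names (`weak_signature`, `num_permissions`, `uses_reflection`, `uses_dex_loader`), the point weights, and the thresholds are all assumptions made for this example, not the project's actual rule set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A named, weighted predicate over extracted features (hypothetical shape)."""
    name: str
    points: int
    predicate: Callable[[Features], bool]


# Illustrative rules only; names, points, and cut-offs are assumptions.
RULES: List[Rule] = [
    Rule("weak_signature", 40, lambda f: f.get("weak_signature", 0) == 1),
    Rule("excessive_permissions", 30, lambda f: f.get("num_permissions", 0) > 20),
    Rule("reflection_usage", 20, lambda f: f.get("uses_reflection", 0) == 1),
    Rule("dynamic_loading", 30, lambda f: f.get("uses_dex_loader", 0) == 1),
]


def evaluate_rules(features: Features) -> Tuple[int, List[str]]:
    """Return the summed points of triggered rules and their names."""
    triggered = [rule for rule in RULES if rule.predicate(features)]
    return sum(rule.points for rule in triggered), [rule.name for rule in triggered]


def risk_tier(score: int) -> str:
    """Map a rule score to a tier using assumed thresholds."""
    if score >= 70:
        return "high"
    if score >= 30:
        return "medium"
    return "low"


score, fired = evaluate_rules({"weak_signature": 1, "num_permissions": 35})
print(risk_tier(score), fired)
```

Keeping each rule as a small named predicate means a report can list exactly which rules fired, which is what makes the tier explainable.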
Recommended: Use WSL2 Ubuntu for smooth androguard install.
# (Optional) In PowerShell: install WSL if not present
wsl --install
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# Generate synthetic data with all new features
python scripts/synth_generate.py
# Train with expanded feature set and versioning
python -m droidit.cli retrain --regenerate-synth
# Or train with static+dynamic fusion (synthetic dynamic features)
python -m droidit.cli train-fused --include-dynamic --regenerate-synth
# Includes rule evaluation, risk tiers, JSON export
python scripts/run_infer.py
uvicorn droidit.api.server:app --reload --port 8000
Open: http://localhost:8000/docs
Configuration:
- API key: `DROIDIT_API_KEY=yourkey` → header `X-API-Key: yourkey`
- Rate limit: `DROIDIT_RATE_LIMIT_PER_MINUTE=60`
- Metrics: `DROIDIT_ENABLE_METRICS=1` → scrape `/metrics`
- Feedback: `DROIDIT_FEEDBACK_ENABLED=1`, `DROIDIT_DB_URL=sqlite:///feedback.db`
- SHAP explanations: `DROIDIT_ENABLE_SHAP=1`
- Plugins: place `.py` files in `plugins/` directory
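To show how the API key setting is used from the client side, here is a minimal sketch that builds a request carrying the `X-API-Key` header. It only constructs the request; the helper name `build_request` is made up for this example, and it assumes the client reads the key from the same `DROIDIT_API_KEY` variable the server was started with.

```python
import os
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the API key in the X-API-Key header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-API-Key": api_key},
        method="GET",
    )


# Assumption: the client shares the server's key via the environment.
api_key = os.environ.get("DROIDIT_API_KEY", "yourkey")
req = build_request("/v1/health", api_key)

# Sending is left commented out so the sketch runs without a live server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode())
```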
# Extract features from APKs
python scripts/extract_features.py data/apks data/features/features.csv --append
# Run inference with rule evaluation and risk tiers
python scripts/run_infer.py
# Or scan single APK via CLI
python -m droidit.cli scan path/to/app.apk --mode full
POST the APK directly:
curl -X POST -F "file=@/path/app.apk" http://localhost:8000/v1/scan/apk
- `POST /v1/scan/apk` - Scan APK with static analysis and optional dynamic features
- `POST /v1/scan/batch` - Batch scan multiple APKs
- `POST /v1/explain/apk` - SHAP-based feature explanations (cached)
- `GET /v1/rules` - List all registered heuristic rules
- Rules automatically evaluate: weak signatures, reflection usage, obfuscation, dynamic loading, excessive permissions
- `POST /v1/feedback/label` - Submit ground truth labels for active learning
- `GET /v1/feedback/stats` - Aggregated feedback statistics
- Supports CSV and SQLAlchemy database backends
- `POST /v1/dynamic/logs` - Ingest JSONL behavioral logs
- `POST /v1/dynamic/run` - Execute dynamic analysis pipeline (synthetic)
- `GET /metrics` - Prometheus metrics (scans, durations, risk tiers, rule triggers, SHAP cache)
- Tracks: scan counts, failure rates, risk tier distribution, rule usage frequency
- SHAP result caching with hit/miss metrics
- Auto feature alignment for missing columns
- Rule-based risk tier classification (low/medium/high)
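Auto feature alignment can be pictured as reshaping each incoming feature row to the column list the model was trained on. The sketch below is one possible way to do that; the column names, the `0.0` default, and the function name `align_features` are assumptions for illustration rather than the project's actual schema or code.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical schema: the column order a trained model expects.
MODEL_COLUMNS: Sequence[str] = ("num_permissions", "uses_reflection", "entropy_mean")
DEFAULT_VALUE = 0.0  # assumed fill value for columns missing at inference time


def align_features(
    row: Mapping[str, float],
    columns: Sequence[str] = MODEL_COLUMNS,
    default: float = DEFAULT_VALUE,
) -> Dict[str, float]:
    """Return a row with exactly `columns`, in that order.

    Missing columns are filled with `default`; columns the model does not
    know about are dropped.
    """
    return {name: float(row.get(name, default)) for name in columns}


# A row that lacks two expected columns and carries one unknown column.
aligned = align_features({"uses_reflection": 1, "new_plugin_feature": 7})
print(aligned)
```

Filling with a fixed default keeps older models usable after new extractors are added, at the cost of treating "missing" and "zero" as the same signal.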
droidit/
features/ # Feature extraction modules
ml/ # Training, inference, registry
api/ # FastAPI server
reporting/ # HTML/PDF report generation
utils/ # Helpers (logging, hashing, paths)
dynamic/frida/ # Frida hook starters
scripts/ # CLI helpers
data/ # APKs, features, labels, models, reports
tests/ # Pytest tests (basic)
- `label_data_leak`
- `label_insecure_crypto`
- `label_exported_component`
Extend by editing droidit/ml/dataset.py:BASE_LABELS.
- Base models: RandomForest, XGBoost, LightGBM (One-vs-Rest per label)
- Meta layer: Logistic Regression stacking (prob inputs per label)
- Persistence: `joblib` artifact in `data/models/model.joblib`
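The meta layer's arithmetic at inference time can be sketched without any ML library: for each label, the three base-model probabilities are combined by a logistic regression. The coefficients and intercepts below are invented for illustration (a real run would obtain them by fitting on held-out base predictions), and `meta_predict` is a hypothetical helper, not the project's API.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical per-label meta-learner parameters: one coefficient per base
# model (RandomForest, XGBoost, LightGBM) plus an intercept.
META_PARAMS: Dict[str, Tuple[Tuple[float, float, float], float]] = {
    "label_data_leak": ((1.5, 2.0, 2.0), -2.5),
    "label_insecure_crypto": ((2.0, 1.0, 1.5), -2.0),
}


def sigmoid(z: float) -> float:
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def meta_predict(label: str, base_probs: Sequence[float]) -> float:
    """Combine base-model probabilities for one label into a stacked probability."""
    coefs, intercept = META_PARAMS[label]
    if len(base_probs) != len(coefs):
        raise ValueError("expected one probability per base model")
    z = intercept + sum(c * p for c, p in zip(coefs, base_probs))
    return sigmoid(z)


# Probabilities from the three base models for a single APK and label.
stacked = meta_predict("label_data_leak", (0.9, 0.8, 0.85))
print(round(stacked, 3))
```

Using one meta-learner per label lets each label weight the base models differently, which matches the per-label stacking described above.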
Current build offers:
- `POST /v1/dynamic/logs` - ingest JSONL (Frida-style) logs, aggregate API usage.
- `POST /v1/dynamic/run` - runs a simulated dynamic pipeline (placeholder) and returns synthetic behavioral features.
- `POST /v1/scan/apk?dynamic=true` - combines static scan with stub dynamic feature block under `dynamic` field.

Planned full pipeline: Instrumentation (Frida), network interception (mitmproxy), behavior timeline → feature fusion.
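Aggregating API usage from JSONL logs might look like the sketch below. The log schema (one JSON object per line with an `api` field), the sample events, and the `dyn_` feature prefix are assumptions; the actual Frida log format is not specified here.

```python
import json
from collections import Counter
from typing import Iterable


def aggregate_api_usage(lines: Iterable[str]) -> Counter:
    """Count calls per API name from JSONL lines.

    Each line is assumed to be a JSON object with a string "api" field.
    Blank lines, malformed JSON, and objects without that field are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and isinstance(event.get("api"), str):
            counts[event["api"]] += 1
    return counts


# Made-up sample events, including one malformed line.
sample = [
    '{"ts": 1, "api": "javax.crypto.Cipher.getInstance"}',
    '{"ts": 2, "api": "java.net.URL.openConnection"}',
    '{"ts": 3, "api": "javax.crypto.Cipher.getInstance"}',
    "not json",
]
usage = aggregate_api_usage(sample)

# Turn counts into flat feature names (the "dyn_" prefix is an assumption).
features = {f"dyn_{api}": count for api, count in usage.items()}
print(features)
```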
report_generator.py renders an HTML report with model scores, rule-based signals, risk_tier, and top features. Extend with SHAP (already integrated if installed) and PDF export.
pytest -q
Only analyze APKs you are authorized to inspect. This tool is for defensive/security review purposes.
- Dynamic feature ingestion (Frida logs → features)
- Active learning feedback endpoint
- SHAP caching & on-demand generation queue
- Advanced permission/intent graph features
- Grafana/Prom metrics endpoint
docker build -t droidit:latest .
docker run -p 8000:8000 -e DROIDIT_ENABLE_SHAP=1 droidit:latest

| Purpose | Method | Path | Notes |
|---|---|---|---|
| Health | GET | /v1/health | Status & model info |
| Labels | GET | /v1/labels | Active risk labels |
| Single Scan | POST | /v1/scan/apk | file APK multipart; dynamic & explain query flags |
| Batch Scan | POST | /v1/scan/zip | ZIP of APKs |
| Explain | POST | /v1/explain/apk | Requires DROIDIT_ENABLE_SHAP=1 |
| Dynamic Logs | POST | /v1/dynamic/logs | JSONL ingestion |
| Dynamic Run (stub) | POST | /v1/dynamic/run | Simulated dynamic features |
| Feedback | POST | /v1/feedback/label | {apk_hash,label,value} (0/1) |
| Feedback Stats | GET | /v1/feedback/stats | Aggregated counts & positive ratios |
| Rules | GET | /v1/rules | List heuristic rules |
| Metrics | GET | /metrics | Prometheus exposition |

droidit train # Train ensemble
droidit infer # Inference using existing features
droidit scan <apk> [--mode static|dynamic|full]
droidit feedback <apk_hash> <label> <value>
Install CLI (editable): pip install -e . (if pyproject present) or invoke module scripts.
Exposed Prometheus counters / histograms:
- `droidit_scans_total`
- `droidit_scan_failures_total`
- `droidit_scan_duration_seconds`
- `droidit_feedback_total`
- `droidit_plugins_loaded`
- `droidit_dynamic_analyses_total`
- `droidit_risk_tier_total{tier="low|medium|high"}`
- `droidit_rule_score`
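For a quick check of the risk tier distribution without a full Prometheus setup, the text exposition can be parsed directly. The sketch below does that for `droidit_risk_tier_total`; the sample text, including its `# HELP` wording and the numbers, is made up for illustration and is not real output from the service.

```python
import re
from typing import Dict

# Matches lines such as: droidit_risk_tier_total{tier="low"} 12.0
TIER_LINE = re.compile(
    r'^droidit_risk_tier_total\{tier="(?P<tier>[^"]+)"\}\s+(?P<value>\S+)$'
)


def risk_tier_counts(exposition: str) -> Dict[str, float]:
    """Extract per-tier counter values from Prometheus text exposition."""
    counts: Dict[str, float] = {}
    for line in exposition.splitlines():
        match = TIER_LINE.match(line.strip())
        if match:
            counts[match.group("tier")] = float(match.group("value"))
    return counts


# Invented sample of what GET /metrics might return.
sample_metrics = """\
# HELP droidit_risk_tier_total Scans per risk tier
# TYPE droidit_risk_tier_total counter
droidit_risk_tier_total{tier="low"} 12.0
droidit_risk_tier_total{tier="medium"} 5.0
droidit_risk_tier_total{tier="high"} 2.0
droidit_scans_total 19.0
"""

tier_counts = risk_tier_counts(sample_metrics)
print(tier_counts)
```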
- CSV fallback: `data/models/feedback.csv`
- SQLAlchemy (if installed & `DROIDIT_DB_URL` set) → creates `feedback` table.
- Future: add retrieval endpoints & active learning retraining loop.
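The CSV fallback can be pictured as appending one row per feedback submission and computing positive ratios per label from those rows. This is a minimal sketch under assumptions: the column layout mirrors the `{apk_hash,label,value}` payload, the hashes are placeholders, and an in-memory buffer stands in for `data/models/feedback.csv`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

FIELDS = ["apk_hash", "label", "value"]  # assumed column layout


def append_feedback(fh: TextIO, apk_hash: str, label: str, value: int) -> None:
    """Append one feedback row; value must be 0 or 1."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    csv.writer(fh).writerow([apk_hash, label, value])


def positive_ratios(fh: TextIO) -> Dict[str, float]:
    """Return the share of positive (value == 1) rows for each label."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(fh, fieldnames=FIELDS):
        totals[row["label"]] += 1
        positives[row["label"]] += int(row["value"])
    return {label: positives[label] / totals[label] for label in totals}


buf = io.StringIO()  # stands in for the CSV file on disk
append_feedback(buf, "abc123", "label_data_leak", 1)
append_feedback(buf, "def456", "label_data_leak", 0)
buf.seek(0)
ratios = positive_ratios(buf)
print(ratios)  # {'label_data_leak': 0.5}
```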
Implement a plugin:
from droidit.plugins import base

class MyPlugin:
    name = "sigscan"
    version = "0.1"

    def extract(self, apk_path: str):
        # Return a flat dict of feature name -> value for this APK
        return {"matches": 3}

base.register(MyPlugin())
Features appear as `plugin_sigscan_matches`.
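The naming convention implies that each plugin's output keys are prefixed with `plugin_<name>_` before they join the feature row. A minimal sketch of that step follows; `namespace_features` is a hypothetical helper written for this example, and the real registry may do this differently.

```python
from typing import Any, Dict


def namespace_features(plugin_name: str, features: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix each feature key with plugin_<name>_ to avoid column clashes."""
    return {f"plugin_{plugin_name}_{key}": value for key, value in features.items()}


class MyPlugin:
    name = "sigscan"
    version = "0.1"

    def extract(self, apk_path: str) -> Dict[str, int]:
        # Placeholder result; a real plugin would inspect the APK at apk_path.
        return {"matches": 3}


plugin = MyPlugin()
namespaced = namespace_features(plugin.name, plugin.extract("path/to/app.apk"))
print(namespaced)  # {'plugin_sigscan_matches': 3}
```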
Enjoy hacking: iterate fast and expand feature coverage.