
Av7danger/DroidIT


🛡️ Droidit – ML-Powered Android Security Platform

Requires Python 3.9+.

Enterprise-grade ML-powered Android APK security analysis platform with comprehensive static analysis, ensemble machine learning models, explainable AI, production deployment infrastructure, and advanced security features.

🌟 Key Features

🎯 Multi-Layer Security Analysis

  • Rich Static Analysis: 40+ sophisticated features (permissions, components, certificates, reflection, obfuscation, entropy)
  • Dynamic Analysis Framework: Optional runtime behavior monitoring and API call tracking (currently a stub that returns synthetic features)
  • Rule-Based Heuristics: Interpretable security rules with confidence scoring and performance tracking
  • Feature Fusion: Intelligent combination of static and dynamic analysis results with auto-alignment

🤖 Advanced Machine Learning Pipeline

  • Ensemble Models: Random Forest, XGBoost, LightGBM with stacking meta-learner
  • Auto-Feature Alignment: Handles model schema evolution automatically
  • Incremental Learning: Continuous model improvement from user feedback with importance weighting
  • Model Versioning: Complete lifecycle management with JSON manifests and performance tracking
  • SHAP Explainability: Model interpretability with intelligent caching for performance optimization

🚀 Production-Ready Architecture

  • Scalable FastAPI Service: Async processing, horizontal scaling, OpenAPI documentation
  • Security Hardened: Encrypted secret management, API key rotation, comprehensive audit trails
  • Comprehensive Monitoring: Prometheus metrics, Grafana dashboards, health monitoring
  • High Availability: Docker orchestration with Redis caching and PostgreSQL persistence
  • Risk Intelligence: Automated LOW/MEDIUM/HIGH/CRITICAL risk tier classification

🔧 Developer & Operations Experience

  • Plugin Architecture: Extensible system with auto-discovery and namespaced features
  • Complete CLI Tools: Training, inference, management, incremental learning commands
  • CI/CD Ready: Automated testing, security scanning, deployment pipelines
  • Rich Reporting: Beautiful HTML reports and structured JSON exports with analytics
  • Load Testing: Performance benchmarking and bottleneck identification

Feature Details

🎯 Core Analysis

  • Rich Static Analysis: 40+ features including permissions, exported components, certificates, reflection usage, native libraries, entropy analysis, obfuscation detection
  • Rule-Based Heuristics: Interpretable rules for weak signatures, excessive permissions, obfuscation patterns, dynamic loading
  • Risk Tier Classification: Automatic low/medium/high risk categorization
  • Stubbed Dynamic Analysis: Framework for future API monitoring, network analysis
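The low/medium/high tiering can be pictured as bucketing a score. A minimal sketch, assuming the tier is driven by the highest per-label probability; the names `risk_tier` and `tier_from_labels` and the 0.3/0.6 thresholds are hypothetical, not Droidit's documented cut-offs:

```python
from typing import Mapping


def risk_tier(score: float, medium: float = 0.3, high: float = 0.6) -> str:
    """Map a risk score in [0, 1] to low/medium/high (hypothetical thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def tier_from_labels(label_probs: Mapping[str, float]) -> str:
    # Assumption: the riskiest label decides the overall tier.
    return risk_tier(max(label_probs.values(), default=0.0))


print(tier_from_labels({"label_data_leak": 0.72, "label_insecure_crypto": 0.15}))  # high
```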

🤖 Machine Learning

  • Ensemble Stacking: RF + XGBoost + LightGBM → per-label LogisticRegression meta-learners
  • Auto Feature Alignment: Handles missing columns gracefully with defaults
  • Model Versioning: JSON manifests with performance metrics and metadata
  • SHAP Explainability: Cached feature importance explanations
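Auto feature alignment amounts to reindexing each extracted row to the column order the model was trained on. A minimal sketch, assuming missing columns default to 0.0 and unknown columns are dropped; `align_features` and the column names are hypothetical:

```python
from typing import Mapping, Sequence


def align_features(
    row: Mapping[str, float],
    expected: Sequence[str],
    default: float = 0.0,
) -> tuple[list[float], list[str], list[str]]:
    """Align one feature row to the column order a trained model expects.

    Returns (vector, missing, extra): missing columns are filled with
    `default`, and columns the model does not know about are dropped.
    """
    missing = [col for col in expected if col not in row]
    extra = sorted(set(row) - set(expected))
    vector = [float(row.get(col, default)) for col in expected]
    return vector, missing, extra


# Hypothetical column names, for illustration only.
expected_cols = ["perm_count", "exported_components", "entropy_mean"]
row = {"perm_count": 12, "entropy_mean": 7.1, "plugin_sigscan_matches": 3}
vector, missing, extra = align_features(row, expected_cols)
print(vector)   # [12.0, 0.0, 7.1]
print(missing)  # ['exported_components']
print(extra)    # ['plugin_sigscan_matches']
```

Returning the missing and extra column lists makes schema drift visible in logs instead of silently hidden by the defaults.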

🚀 Production Ready

  • FastAPI Service: RESTful API with OpenAPI docs, auth, rate limiting, CORS
  • Prometheus Metrics: Scan counts, durations, risk tiers, rule triggers, cache stats
  • Feedback Loop: CSV/SQLAlchemy storage for active learning
  • Plugin System: Auto-discovery for custom feature extractors
  • Enhanced Reporting: HTML + JSON exports with tier distributions and rule usage

Quick Start (Windows)

Recommended: use WSL2 with Ubuntu so that androguard installs smoothly.

# (Optional) In PowerShell – install WSL if not present
wsl --install

1. Environment Setup (WSL bash)

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

2. Generate Synthetic Data & Train Enhanced Model

# Generate synthetic data with all new features
python scripts/synth_generate.py

# Train with expanded feature set and versioning
python -m droidit.cli retrain --regenerate-synth

# Or train with static+dynamic fusion (synthetic dynamic features)
python -m droidit.cli train-fused --include-dynamic --regenerate-synth

3. Run Enhanced Inference

# Includes rule evaluation, risk tiers, JSON export
python scripts/run_infer.py

4. Run API Server with Full Feature Set

uvicorn droidit.api.server:app --reload --port 8000

Open: http://localhost:8000/docs

Configuration:

  • API key: DROIDIT_API_KEY=yourkey → header X-API-Key: yourkey
  • Rate limit: DROIDIT_RATE_LIMIT_PER_MINUTE=60
  • Metrics: DROIDIT_ENABLE_METRICS=1 → scrape /metrics
  • Feedback: DROIDIT_FEEDBACK_ENABLED=1, DROIDIT_DB_URL=sqlite:///feedback.db
  • SHAP explanations: DROIDIT_ENABLE_SHAP=1
  • Plugins: place .py files in plugins/ directory
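The environment variables above could be loaded into a single settings object at startup. A minimal sketch: the variable names come from the list above, while `Settings`, `load_settings`, `is_authorized`, the rate-limit default of 60 and the rule that authentication is off when no key is set are assumptions:

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Treat "1", "true" or "yes" (any case) as on; anything else as off.
    return env.get(name, "0").strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    rate_limit_per_minute: int
    enable_metrics: bool
    feedback_enabled: bool
    db_url: Optional[str]
    enable_shap: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        api_key=env.get("DROIDIT_API_KEY") or None,
        # 60 is an assumed default, taken from the example value above.
        rate_limit_per_minute=int(env.get("DROIDIT_RATE_LIMIT_PER_MINUTE", "60")),
        enable_metrics=_flag(env, "DROIDIT_ENABLE_METRICS"),
        feedback_enabled=_flag(env, "DROIDIT_FEEDBACK_ENABLED"),
        db_url=env.get("DROIDIT_DB_URL") or None,
        enable_shap=_flag(env, "DROIDIT_ENABLE_SHAP"),
    )


def is_authorized(settings: Settings, headers: Mapping[str, str]) -> bool:
    """Check the X-API-Key header; with no key configured, auth is off (assumption)."""
    if settings.api_key is None:
        return True
    provided = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking the key through response timing.
    return hmac.compare_digest(provided.encode(), settings.api_key.encode())
```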

5. Scan Real APKs

# Extract features from APKs
python scripts/extract_features.py data/apks data/features/features.csv --append

# Run inference with rule evaluation and risk tiers
python scripts/run_infer.py

# Or scan single APK via CLI
python -m droidit.cli scan path/to/app.apk --mode full

POST the APK directly:

curl -X POST -F "file=@/path/app.apk" http://localhost:8000/v1/scan/apk

Enhanced API Endpoints

🔍 Scanning & Analysis

  • POST /v1/scan/apk - Scan APK with static analysis and optional dynamic features
  • POST /v1/scan/batch - Batch scan multiple APKs
  • POST /v1/explain/apk - SHAP-based feature explanations (cached)

📊 Rules & Risk Assessment

  • GET /v1/rules - List all registered heuristic rules
  • Rules automatically evaluate: weak signatures, reflection usage, obfuscation, dynamic loading, excessive permissions
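Interpretable rules with confidence scores can be modelled as named predicates over the extracted features. A minimal sketch of a registry covering the five rule families listed above; every feature name, threshold and confidence value here is hypothetical, and `list_rules` only suggests what a GET /v1/rules response could contain:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping

Features = Mapping[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    description: str
    confidence: float                      # weight of a hit, in [0, 1]
    predicate: Callable[[Features], bool]  # True when the rule fires


# Hypothetical feature names and thresholds, for illustration only.
RULES: List[Rule] = [
    Rule("weak_signature", "Signed with a debug or weak certificate", 0.8,
         lambda f: f.get("cert_is_debug", 0) > 0),
    Rule("reflection_usage", "Heavy use of Java reflection", 0.5,
         lambda f: f.get("reflection_calls", 0) >= 10),
    Rule("obfuscation", "Identifiers look machine-generated", 0.4,
         lambda f: f.get("obfuscation_ratio", 0.0) > 0.6),
    Rule("dynamic_loading", "Loads code at runtime", 0.7,
         lambda f: f.get("dynamic_code_loading", 0) > 0),
    Rule("excessive_permissions", "Requests many dangerous permissions", 0.6,
         lambda f: f.get("dangerous_permission_count", 0) >= 8),
]


def list_rules(rules: List[Rule] = RULES) -> List[Dict[str, object]]:
    return [{"name": r.name, "description": r.description,
             "confidence": r.confidence} for r in rules]


def evaluate_rules(features: Features, rules: List[Rule] = RULES) -> Dict[str, float]:
    """Return {rule_name: confidence} for every rule that fires."""
    return {r.name: r.confidence for r in rules if r.predicate(features)}


hits = evaluate_rules({"cert_is_debug": 1, "reflection_calls": 3,
                       "dangerous_permission_count": 9})
print(hits)  # {'weak_signature': 0.8, 'excessive_permissions': 0.6}
```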

📈 Feedback & Analytics

  • POST /v1/feedback/label - Submit ground truth labels for active learning
  • GET /v1/feedback/stats - Aggregated feedback statistics
  • Supports CSV and SQLAlchemy database backends

🔧 Dynamic Analysis (Stubbed)

  • POST /v1/dynamic/logs - Ingest JSONL behavioral logs
  • POST /v1/dynamic/run - Execute dynamic analysis pipeline (synthetic)

Metrics & Monitoring

  • GET /metrics - Prometheus metrics (scans, durations, risk tiers, rule triggers, SHAP cache)
  • Tracks: scan counts, failure rates, risk tier distribution, rule usage frequency

⚑ Performance Features

  • SHAP result caching with hit/miss metrics
  • Auto feature alignment for missing columns
  • Rule-based risk tier classification (low/medium/high)
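SHAP values are expensive to compute, so caching them per APK pays off when the same file is explained more than once. A minimal sketch, assuming entries are keyed by the SHA-256 of the APK bytes and evicted least-recently-used; `ExplanationCache` is hypothetical and the `compute` callback stands in for the real SHAP call. The `hits`/`misses` counters are what a cache hit/miss metric would report:

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict


class ExplanationCache:
    """LRU cache of per-APK explanations, keyed by the APK's SHA-256."""

    def __init__(self, max_entries: int = 256) -> None:
        self._entries: OrderedDict = OrderedDict()
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0

    def get_or_compute(
        self, apk_bytes: bytes, compute: Callable[[], Dict[str, float]]
    ) -> Dict[str, float]:
        key = hashlib.sha256(apk_bytes).hexdigest()
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        result = compute()  # the expensive SHAP explanation would run here
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```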

Project Layout

droidit/
  features/            # Feature extraction modules
  ml/                  # Training, inference, registry
  api/                 # FastAPI server
  reporting/           # HTML/PDF report generation
  utils/               # Helpers (logging, hashing, paths)
  dynamic/frida/       # Frida hook starters
scripts/               # CLI helpers
data/                  # APKs, features, labels, models, reports
tests/                 # Pytest tests (basic)

Multi-Label Risk Categories (initial)

  • label_data_leak
  • label_insecure_crypto
  • label_exported_component

Extend by editing droidit/ml/dataset.py:BASE_LABELS.

Ensemble Overview

  1. Base models: RandomForest, XGBoost, LightGBM (One-vs-Rest per label)
  2. Meta layer: per-label Logistic Regression stacking that takes the base models' predicted probabilities as inputs
  3. Persistence: joblib artifact in data/models/model.joblib
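At inference time the meta layer turns each label's base-model probabilities into one probability through logistic regression. A minimal pure-Python sketch of that step; `meta_predict` is a hypothetical name and the coefficients are made-up values standing in for a fitted meta-learner:

```python
import math
from typing import Dict, Mapping, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def meta_predict(
    base_probs: Mapping[str, Sequence[float]],
    coefs: Mapping[str, Sequence[float]],
    intercepts: Mapping[str, float],
) -> Dict[str, float]:
    """Combine base-model probabilities into one probability per label.

    base_probs[label] holds one probability per base model (RF, XGBoost,
    LightGBM, in that order); coefs and intercepts are the parameters of
    that label's logistic-regression meta-learner.
    """
    out = {}
    for label, probs in base_probs.items():
        z = intercepts[label] + sum(w * p for w, p in zip(coefs[label], probs))
        out[label] = sigmoid(z)
    return out


# Made-up meta-learner parameters, for illustration only.
probs = meta_predict(
    {"label_data_leak": [0.9, 0.8, 0.85]},
    {"label_data_leak": [2.0, 1.5, 1.5]},
    {"label_data_leak": -2.5},
)
print(round(probs["label_data_leak"], 3))  # 0.855
```

The meta-learner should be fitted on out-of-fold base-model predictions; otherwise it learns from probabilities the base models produced on their own training data. scikit-learn's `StackingClassifier` handles this through its `cv` parameter.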

Dynamic Analysis (Stub + Future)

Current build offers:

  • POST /v1/dynamic/logs → ingest JSONL (Frida-style) logs, aggregate API usage.
  • POST /v1/dynamic/run → runs a simulated dynamic pipeline (placeholder) and returns synthetic behavioral features.
  • POST /v1/scan/apk?dynamic=true → combines the static scan with the stub dynamic feature block, returned under the dynamic field.
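Aggregating API usage from JSONL logs can be as simple as counting one field per line. A minimal sketch, assuming each line is a JSON object with an `api` field (the actual Frida log schema is not documented here); `aggregate_api_usage` is hypothetical:

```python
import json
from collections import Counter
from typing import Dict, Iterable


def aggregate_api_usage(lines: Iterable[str]) -> Dict[str, int]:
    """Count API calls in Frida-style JSONL logs.

    Blank lines, malformed JSON and objects without a string "api" field
    are skipped rather than failing the whole upload.
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and isinstance(event.get("api"), str):
            counts[event["api"]] += 1
    return dict(counts)


log = [
    '{"api": "javax.crypto.Cipher.getInstance"}',
    '{"api": "javax.crypto.Cipher.getInstance"}',
    '{"api": "java.lang.reflect.Method.invoke", "ts": 1}',
    "not json",
    "",
]
print(aggregate_api_usage(log))
# {'javax.crypto.Cipher.getInstance': 2, 'java.lang.reflect.Method.invoke': 1}
```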

Planned full pipeline: instrumentation (Frida), network interception (mitmproxy), and a behavior timeline → feature fusion.

Reporting

report_generator.py renders an HTML report with model scores, rule-based signals, risk_tier, and top features. SHAP explanations are already included when shap is installed; PDF export can be added as an extension.

Development

pytest -q

Security & Ethics

Only analyze APKs you are authorized to inspect. This tool is for defensive/security review purposes.

Next Steps

  • Dynamic feature ingestion (Frida logs β†’ features)
  • Active learning feedback endpoint
  • SHAP caching & on-demand generation queue
  • Advanced permission/intent graph features
  • Grafana/Prom metrics endpoint

Docker

docker build -t droidit:latest .
docker run -p 8000:8000 -e DROIDIT_ENABLE_SHAP=1 droidit:latest

Key API Endpoints

Purpose             Method  Path                Notes
Health              GET     /v1/health          Status & model info
Labels              GET     /v1/labels          Active risk labels
Single Scan         POST    /v1/scan/apk        APK in multipart field file; optional dynamic & explain query flags
Batch Scan          POST    /v1/scan/zip        ZIP of APKs
Explain             POST    /v1/explain/apk     Requires DROIDIT_ENABLE_SHAP=1
Dynamic Logs        POST    /v1/dynamic/logs    JSONL ingestion
Dynamic Run (stub)  POST    /v1/dynamic/run     Simulated dynamic features
Feedback            POST    /v1/feedback/label  {apk_hash, label, value}, value 0 or 1
Feedback Stats      GET     /v1/feedback/stats  Aggregated counts & positive ratios
Rules               GET     /v1/rules           List heuristic rules
Metrics             GET     /metrics            Prometheus exposition format
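A server-side check of the feedback payload might look like this. A minimal sketch, assuming `apk_hash` is a hex SHA-256 digest and `label` must be one of the initial labels listed under Multi-Label Risk Categories; `validate_feedback` is hypothetical:

```python
import re
from typing import Any, Dict, Mapping

# Initial labels from droidit/ml/dataset.py:BASE_LABELS, as listed in this README.
KNOWN_LABELS = {"label_data_leak", "label_insecure_crypto", "label_exported_component"}
SHA256_HEX = re.compile(r"[0-9a-f]{64}")  # assumed hash format


def validate_feedback(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Validate a {apk_hash, label, value} body and return a normalized copy."""
    apk_hash = str(payload.get("apk_hash", "")).lower()
    if not SHA256_HEX.fullmatch(apk_hash):
        raise ValueError("apk_hash must be a 64-character hex SHA-256 digest")
    label = payload.get("label")
    if not isinstance(label, str) or label not in KNOWN_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    value = payload.get("value")
    # bool is a subclass of int, so reject True/False explicitly.
    if isinstance(value, bool) or value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return {"apk_hash": apk_hash, "label": label, "value": int(value)}
```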

CLI

droidit train                 # Train ensemble
droidit infer                 # Inference using existing features
droidit scan <apk> [--mode static|dynamic|full]
droidit feedback <apk_hash> <label> <value>

Install the CLI in editable mode with pip install -e . (requires a pyproject.toml); otherwise invoke the modules directly, e.g. python -m droidit.cli scan path/to/app.apk.
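The four subcommands above map naturally onto argparse subparsers. A sketch of how such a parser could be declared; it is not the actual `droidit.cli` module (which also offers `retrain` and `train-fused`), and `build_parser` and the `static` default for `--mode` are assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="droidit")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("train", help="Train ensemble")
    sub.add_parser("infer", help="Inference using existing features")

    scan = sub.add_parser("scan", help="Scan a single APK")
    scan.add_argument("apk", help="Path to the APK file")
    scan.add_argument("--mode", choices=["static", "dynamic", "full"], default="static")

    feedback = sub.add_parser("feedback", help="Record a ground-truth label")
    feedback.add_argument("apk_hash")
    feedback.add_argument("label")
    feedback.add_argument("value", type=int, choices=[0, 1])
    return parser
```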

Metrics

Exposed Prometheus counters / histograms:

  • droidit_scans_total
  • droidit_scan_failures_total
  • droidit_scan_duration_seconds
  • droidit_feedback_total
  • droidit_plugins_loaded
  • droidit_dynamic_analyses_total
  • droidit_risk_tier_total{tier="low|medium|high"}
  • droidit_rule_score

Feedback Persistence

  • CSV fallback: data/models/feedback.csv
  • SQLAlchemy (if installed & DROIDIT_DB_URL set) → creates a feedback table.
  • Future: add retrieval endpoints & active learning retraining loop.
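The CSV fallback can be an append-only file plus a small aggregation for the stats endpoint. A minimal sketch, assuming a timestamp, apk_hash, label, value column layout (the real file's columns are not documented); `append_feedback` and `feedback_stats` are hypothetical:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "apk_hash", "label", "value"]  # assumed column layout


def append_feedback(path: Path, apk_hash: str, label: str, value: int) -> None:
    """Append one label to the CSV, writing the header on first use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "apk_hash": apk_hash,
            "label": label,
            "value": value,
        })


def feedback_stats(path: Path) -> dict[str, dict[str, float]]:
    """Per-label count and positive ratio, similar to GET /v1/feedback/stats."""
    totals: dict[str, list[int]] = {}
    with path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            entry = totals.setdefault(row["label"], [0, 0])
            entry[0] += 1                 # total labels
            entry[1] += int(row["value"])  # positive labels
    return {label: {"count": n, "positive_ratio": pos / n}
            for label, (n, pos) in totals.items()}
```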

Plugin System

Implement a plugin:

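from droidit.plugins import base

class MyPlugin:
    """Example feature-extractor plugin; place the file in plugins/."""
    name = "sigscan"   # namespace: features become plugin_sigscan_<key>
    version = "0.1"

    def extract(self, apk_path: str) -> dict:
        # Return a flat mapping of feature name -> value for this APK.
        return {"matches": 3}

base.register(MyPlugin())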

Returned keys are namespaced by plugin name, so this example produces the feature plugin_sigscan_matches.
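The namespacing rule can be sketched as a tiny registry that prefixes every returned key with plugin_<name>_. This is a standalone illustration, not the code in droidit.plugins.base; its `register`, `run_plugins` and duplicate-name check are assumptions:

```python
from typing import Any, Dict, List, Protocol


class FeaturePlugin(Protocol):
    name: str
    version: str

    def extract(self, apk_path: str) -> Dict[str, Any]: ...


_REGISTRY: List[FeaturePlugin] = []


def register(plugin: FeaturePlugin) -> None:
    # Two plugins with the same name would overwrite each other's features.
    if any(p.name == plugin.name for p in _REGISTRY):
        raise ValueError(f"plugin {plugin.name!r} already registered")
    _REGISTRY.append(plugin)


def run_plugins(apk_path: str) -> Dict[str, Any]:
    """Run every registered plugin and namespace its output keys."""
    features: Dict[str, Any] = {}
    for plugin in _REGISTRY:
        for key, value in plugin.extract(apk_path).items():
            features[f"plugin_{plugin.name}_{key}"] = value
    return features


class MyPlugin:
    name = "sigscan"
    version = "0.1"

    def extract(self, apk_path: str) -> Dict[str, Any]:
        return {"matches": 3}


register(MyPlugin())
print(run_plugins("app.apk"))  # {'plugin_sigscan_matches': 3}
```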


Enjoy hacking – iterate fast and expand feature coverage.

About

Droidit is an ML-powered Android security platform offering rich static analysis, heuristic rules, and ensemble ML models with SHAP explainability. It provides risk tiering, CI/CD-ready FastAPI APIs, plugin support, and reporting for enterprise-grade APK security with scalable, production-ready deployment.
