VulnScanner is a modular, extensible vulnerability detection system for scanning source code repositories. It is built with Python 3.11+ and uses a plugin-based architecture, so new detectors can be added without changing the core scanner.
```
┌─────────────────────────────────────────────────────────────────┐
│                          CLI Interface                          │
│                    (scanner command + flags)                    │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                      Orchestration Engine                       │
│    (Task scheduling, plugin management, results aggregation)    │
└─────┬──────────┬──────────┬──────────┬──────────┬──────────┬────┘
      │          │          │          │          │          │
  ┌───▼───┐  ┌───▼───┐  ┌───▼───┐  ┌───▼───┐  ┌───▼───┐  ┌───▼───┐
  │ Repo  │  │ Tech  │  │ SBOM  │  │  CVE  │  │ SAST  │  │Secrets│
  │ Clone │  │Detect │  │  Gen  │  │ Match │  │Engine │  │Scanner│
  └───┬───┘  └───┬───┘  └───┬───┘  └───┬───┘  └───┬───┘  └───┬───┘
      │          │          │          │          │          │
      └──────────┴──────────┴────┬─────┴──────────┴──────────┘
                                 │
                       ┌─────────▼─────────┐
                       │   Plugin System   │
                       │ (Dynamic loader)  │
                       └─────────┬─────────┘
                                 │
               ┌─────────────────┼─────────────────┐
               │                 │                 │
        ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
        │   Custom    │   │  Community  │   │ Third-party │
        │   Plugins   │   │   Plugins   │   │Integrations │
        └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
               │                 │                 │
               └─────────────────┼─────────────────┘
                                 │
                      ┌──────────▼──────────┐
                      │  Data Persistence   │
                      │ (SQLite/PostgreSQL) │
                      └──────────┬──────────┘
                                 │
               ┌─────────────────┼─────────────────┐
               │                 │                 │
        ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
        │    JSON     │   │    SARIF    │   │    HTML     │
        │   Report    │   │   Output    │   │   Report    │
        └─────────────┘   └─────────────┘   └─────────────┘
```
CLI Interface
- Purpose: Entry point for user interaction
- Technologies: Click framework for command parsing
- Commands:
  - scan: Main scanning command with configurable flags
  - update-advisories: Update CVE/advisory databases
  - init-db: Initialize the local database
  - list-plugins: Show available plugins
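A minimal sketch of how the four commands above might be declared with Click follows. The command names come from the list above; the TARGET argument and the --format and --exclude options are hypothetical illustrations of "configurable flags", not the project's documented interface, and the command bodies only echo what a real implementation would do.

```python
import click


@click.group()
def cli() -> None:
    """VulnScanner command-line interface (illustrative sketch)."""


@cli.command()
@click.argument("target")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "sarif", "html", "markdown"]),
    default="json",
    show_default=True,
    help="Report format (hypothetical flag).",
)
@click.option("--exclude", multiple=True, help="Glob to skip; repeatable (hypothetical flag).")
def scan(target: str, output_format: str, exclude: tuple[str, ...]) -> None:
    """Scan a local path, Git URL, or archive."""
    click.echo(f"Scanning {target} as {output_format}, excluding {list(exclude)}")


@cli.command("update-advisories")
def update_advisories() -> None:
    """Refresh the local CVE/advisory cache."""
    click.echo("Updating advisory databases...")


@cli.command("init-db")
def init_db() -> None:
    """Create the local results database."""
    click.echo("Initializing local database...")


@cli.command("list-plugins")
def list_plugins() -> None:
    """Show the plugins the loader can find."""
    click.echo("No plugins installed.")
```

Registered as the scanner console script (for example through a [project.scripts] entry in pyproject.toml), an invocation would look like scanner scan ./my-repo --format sarif --exclude "tests/*".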
Orchestration Engine
- Purpose: Coordinates the scanning workflow
- Responsibilities:
- Task scheduling and parallelization
- Plugin lifecycle management
- Result aggregation and deduplication
- Progress tracking and logging
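The scheduling and aggregation responsibilities can be sketched with the standard library's ThreadPoolExecutor. This assumes each scanning module is a callable that takes a repository path and returns findings, and that two findings with the same rule, file, and line are duplicates; the Finding fields and the run_modules name are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("vulnscanner.orchestrator")


@dataclass
class Finding:
    # Hypothetical minimal finding record.
    rule_id: str
    path: str
    line: int
    message: str


ScanModule = Callable[[str], list[Finding]]


def run_modules(repo_path: str, modules: dict[str, ScanModule], max_workers: int = 4) -> list[Finding]:
    """Run every scanning module in parallel, then merge and deduplicate results."""
    seen: set[tuple[str, str, int]] = set()
    merged: list[Finding] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, repo_path): name for name, fn in modules.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results = future.result()
            except Exception:
                # One failing module should not abort the whole scan.
                log.exception("module %s failed", name)
                continue
            log.info("module %s returned %d findings", name, len(results))
            for finding in results:
                key = (finding.rule_id, finding.path, finding.line)
                if key not in seen:  # same rule at the same location counts once
                    seen.add(key)
                    merged.append(finding)
    # as_completed yields in completion order, so sort for deterministic output.
    return sorted(merged, key=lambda f: (f.path, f.line, f.rule_id))
```

Threads suit modules that mostly wait on I/O (cloning, advisory lookups); CPU-bound analysis such as AST parsing may benefit from ProcessPoolExecutor, which offers the same submit/result interface.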
Repository Intake
- Purpose: Repository intake and validation
- Features:
- Git clone support (HTTPS/SSH)
- Archive extraction (.zip, .tar.gz)
- Path sanitization
- File filtering and exclusion
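Path sanitization matters most for archives, where a member named ../../etc/cron.d/job could be written outside the extraction directory (the "zip slip" problem). The standard-library sketch below validates every member before writing anything, then applies exclusion globs; the default patterns and function names are hypothetical. ZipFile.extract already strips absolute prefixes and .. components, but rejecting such archives outright makes the tampering visible. For .tar.gz input, tarfile's extraction filters (filter="data", PEP 706) serve the same purpose on Python versions that provide them.

```python
import fnmatch
import zipfile
from pathlib import Path

DEFAULT_EXCLUDES = ("node_modules/*", ".git/*", "*.min.js")  # hypothetical defaults


def is_excluded(rel_path: str, patterns: tuple[str, ...] = DEFAULT_EXCLUDES) -> bool:
    """True if a repository-relative path matches any exclusion glob."""
    return any(fnmatch.fnmatch(rel_path, pattern) for pattern in patterns)


def safe_extract_zip(archive: Path, dest: Path,
                     excludes: tuple[str, ...] = DEFAULT_EXCLUDES) -> list[Path]:
    """Extract a .zip into dest, refusing archives with members that escape dest."""
    dest = dest.resolve()
    with zipfile.ZipFile(archive) as zf:
        members = zf.infolist()
        # Pass 1: validate every name before writing anything to disk.
        for member in members:
            target = (dest / member.filename).resolve()
            if not target.is_relative_to(dest):  # absolute paths or "../" escapes
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        # Pass 2: write regular files that are not excluded.
        extracted: list[Path] = []
        for member in members:
            if member.is_dir() or is_excluded(member.filename, excludes):
                continue
            target = (dest / member.filename).resolve()
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(member) as src, open(target, "wb") as out:
                out.write(src.read())
            extracted.append(target)
    return extracted
```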
Technology Detection
- Purpose: Identify the languages, frameworks, and other technologies a repository uses
- Methods:
- File extension analysis
- Package manifest detection
- Framework fingerprinting
- Build tool identification
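Manifest detection and extension analysis can share a single walk over the tree. A minimal sketch, assuming small, hypothetical lookup tables; framework fingerprinting (for example, reading framework names out of package.json dependencies) would layer on top of the same walk.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, partial lookup tables.
MANIFESTS = {
    "package.json": "Node.js",
    "requirements.txt": "Python",
    "Pipfile": "Python",
    "go.mod": "Go",
    "pom.xml": "Java (Maven)",
    "Gemfile": "Ruby",
    "composer.json": "PHP",
}
EXTENSIONS = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".go": "Go",
              ".java": "Java", ".rb": "Ruby", ".php": "PHP"}


def detect_technologies(repo: Path) -> dict[str, object]:
    """Report ecosystems (from manifests) and file counts per language (from extensions)."""
    ecosystems: set[str] = set()
    languages: Counter[str] = Counter()
    for path in repo.rglob("*"):
        if not path.is_file():
            continue
        if path.name in MANIFESTS:
            ecosystems.add(MANIFESTS[path.name])
        language = EXTENSIONS.get(path.suffix.lower())
        if language:
            languages[language] += 1
    return {"ecosystems": sorted(ecosystems), "languages": dict(languages.most_common())}
```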
SBOM Generation
- Purpose: Generate a Software Bill of Materials (SBOM)
- Supported manifests:
- package.json (Node.js)
- requirements.txt/Pipfile (Python)
- go.mod (Go)
- pom.xml (Java)
- Gemfile (Ruby)
- composer.json (PHP)
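As an illustration of the SBOM step, the sketch below turns two of the supported manifests into component entries loosely shaped like CycloneDX components, each carrying a Package URL (purl). It handles only exact pins in requirements.txt and strips range prefixes in package.json, so it reports the declared floor rather than the installed version; a production generator would prefer lock files such as package-lock.json or a resolved environment. Function names are hypothetical, and scoped npm names (@scope/pkg) would need percent-encoding in a real purl.

```python
import json
import re
from pathlib import Path

# Exact "name==version" pins only; ranges, extras, markers and -r includes are skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def component(ecosystem: str, name: str, version: str) -> dict:
    """One SBOM component entry with a purl such as pkg:pypi/flask@2.3.2."""
    return {"type": "library", "name": name, "version": version,
            "purl": f"pkg:{ecosystem}/{name}@{version}"}


def from_requirements(path: Path) -> list[dict]:
    components = []
    for line in path.read_text().splitlines():
        m = PIN.match(line)
        if m:
            # purl normalizes PyPI names to lowercase with "-" instead of "_".
            name = m.group(1).lower().replace("_", "-")
            components.append(component("pypi", name, m.group(2)))
    return components


def from_package_json(path: Path) -> list[dict]:
    data = json.loads(path.read_text())
    components = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in data.get(section, {}).items():
            # Crude: "^4.17.21" -> "4.17.21" (declared floor, not the installed version).
            components.append(component("npm", name, spec.lstrip("^~=v")))
    return components
```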
CVE Matching
- Purpose: Map dependencies to known vulnerabilities
- Data Sources:
- NVD API
- GitHub Advisory Database
- OSV.dev
- Local cache with periodic updates
SAST Engine
- Purpose: Static application security testing (SAST)
- Approach:
- AST-based analysis for supported languages
- Pattern matching for generic detection
- Rule engine for custom checks
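For Python sources, the standard ast module is enough to express simple AST-based rules. This sketch flags eval()/exec() calls and subprocess calls that pass shell=True; the rule set and names are hypothetical. Because it matches on syntax alone, it misses aliased imports such as from subprocess import run, which a fuller rule engine would need to resolve.

```python
import ast

DANGEROUS_BUILTINS = {"eval", "exec"}  # hypothetical starter rule set


class InsecureCallVisitor(ast.NodeVisitor):
    """Collect (line, message) pairs for risky call patterns."""

    def __init__(self) -> None:
        self.findings: list[tuple[int, str]] = []

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_BUILTINS:
            self.findings.append((node.lineno, f"use of {func.id}()"))
        # subprocess.<anything>(..., shell=True) written as an attribute access.
        if (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)
                and func.value.id == "subprocess"):
            for kw in node.keywords:
                if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                    self.findings.append((node.lineno, f"subprocess.{func.attr} with shell=True"))
        self.generic_visit(node)  # keep walking into nested calls


def scan_source(source: str) -> list[tuple[int, str]]:
    visitor = InsecureCallVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings
```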
Secrets Scanner
- Purpose: Detect hardcoded secrets and personally identifiable information (PII)
- Methods:
- Entropy analysis
- Regular expression patterns
- Contextual validation
- Confidence scoring
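The four methods above can be combined per line of text. In this sketch, one regular expression catches a well-known credential shape (long-term AWS access key IDs are "AKIA" followed by 16 uppercase letters or digits), a second supplies context by looking for quoted values assigned to suspiciously named variables, and Shannon entropy, H = -Σ p·log2(p) over character frequencies, raises confidence when the value looks random. The 3.5 bits-per-character threshold, the confidence values, and the keep-four-characters redaction are hypothetical tuning choices.

```python
import math
import re
from collections import Counter

# Long-term AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Context: a quoted value of 8+ chars assigned to a suspiciously named variable.
SUSPICIOUS_ASSIGNMENT = re.compile(
    r"""(?i)\b(\w*(?:secret|token|password|api_key)\w*)\s*[:=]\s*["']([^"']{8,})["']"""
)
ENTROPY_THRESHOLD = 3.5  # bits per character; hypothetical tuning value


def shannon_entropy(s: str) -> float:
    """H = -sum(p * log2(p)) over the character frequencies of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def redact(value: str) -> str:
    """Keep a short prefix for correlation and mask the rest."""
    return value[:4] + "*" * (len(value) - 4)


def scan_line(line: str) -> list[dict]:
    findings = []
    for m in AWS_KEY_ID.finditer(line):
        findings.append({"rule": "aws-access-key-id", "match": redact(m.group()), "confidence": 0.9})
    for m in SUSPICIOUS_ASSIGNMENT.finditer(line):
        name, value = m.group(1), m.group(2)
        high_entropy = shannon_entropy(value) >= ENTROPY_THRESHOLD
        findings.append({"rule": f"suspicious-assignment:{name}", "match": redact(value),
                         "confidence": 0.8 if high_entropy else 0.4})
    return findings
```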
Plugin System
- Purpose: Enable extensibility
- Features:
- Dynamic plugin loading
- Standard plugin interface
- Dependency injection
- Result schema validation
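Dynamic loading can be built on importlib. The sketch below follows the standard library's recipe for importing a source file directly and treats any class defined in a plugin file that has callable scan and get_info methods as a plugin, mirroring the PluginInterface contract. The directory layout, module naming scheme, and function names are hypothetical. Note that exec_module runs plugin code with the scanner's full privileges; the sandboxing described for plugins would need an additional isolation boundary, such as a separate, resource-limited process.

```python
import importlib.util
import inspect
import sys
from pathlib import Path
from types import ModuleType

REQUIRED_METHODS = ("scan", "get_info")  # mirrors PluginInterface


def load_module(path: Path) -> ModuleType:
    """Import one .py file as a module (stdlib "import a source file directly" recipe)."""
    name = f"vulnscanner_plugin_{path.stem}"  # hypothetical naming scheme
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load plugin from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # runs the plugin's top-level code
    return module


def discover_plugins(plugin_dir: Path) -> list[object]:
    """Instantiate every class in plugin_dir/*.py that exposes scan() and get_info()."""
    plugins = []
    for path in sorted(plugin_dir.glob("*.py")):
        module = load_module(path)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if cls.__module__ != module.__name__:
                continue  # skip classes merely imported into the plugin file
            if all(callable(getattr(cls, m, None)) for m in REQUIRED_METHODS):
                plugins.append(cls())
    return plugins
```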
Data Persistence
- Purpose: Persist scan results and cached data
- Backends:
- SQLite (default, local)
- PostgreSQL (optional, scalable)
- Schema: Normalized tables for findings, scans, plugins
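A normalized layout for the three entity types named above might look like the following. It uses the standard sqlite3 module rather than SQLAlchemy to stay self-contained, and every table and column name is a hypothetical illustration. Foreign-key enforcement is switched on explicitly because SQLite leaves it off by default on each new connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS plugins (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    version TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS scans (
    id         INTEGER PRIMARY KEY,
    target     TEXT NOT NULL,
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS findings (
    id        INTEGER PRIMARY KEY,
    scan_id   INTEGER NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
    plugin_id INTEGER REFERENCES plugins(id),
    rule_id   TEXT NOT NULL,
    severity  TEXT NOT NULL,
    path      TEXT NOT NULL,
    line      INTEGER
);
CREATE INDEX IF NOT EXISTS idx_findings_scan ON findings(scan_id);
"""


def open_db(path: str = "vulnscanner.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn
```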
Reporting
- Purpose: Generate reports in multiple output formats
- Formats:
- JSON (machine-readable)
- SARIF (IDE integration)
- HTML (human-readable)
- Markdown (documentation)
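SARIF is an OASIS standard for exchanging static-analysis results, so its minimal shape is worth showing. The sketch below emits a SARIF 2.1.0 log with one run, one rule entry per distinct rule ID, and one result per finding. The input dictionary keys (rule_id, level, message, path, line) are a hypothetical internal format, and level must already be one of SARIF's values: error, warning, note, or none.

```python
import json


def to_sarif(findings: list[dict], tool_name: str = "VulnScanner") -> dict:
    """Build a minimal SARIF 2.1.0 log from internal finding dicts."""
    rule_ids = sorted({f["rule_id"] for f in findings})
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}},
            "results": [
                {
                    "ruleId": f["rule_id"],
                    "level": f["level"],
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["path"]},
                            "region": {"startLine": f["line"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


def write_sarif(findings: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(to_sarif(findings), fh, indent=2)
```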
Data Flow
- Input Stage: The user provides a repository as a local path, Git URL, or archive
- Validation: The repository is validated, then cloned or extracted
- Discovery: Technology detection and SBOM generation
- Analysis: Scanning modules run in parallel
- Plugin Execution: Custom plugins run in a sandbox
- Aggregation: Results are collected and deduplicated
- Prioritization: Findings are scored and ranked
- Persistence: Results are saved to the database
- Reporting: Output is generated in the requested formats
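The prioritization stage needs a scoring function, which this document does not specify. One simple, hypothetical choice is to weight each finding's severity by its detection confidence and sort in descending order, using the rule ID as a tie-breaker so reports stay stable between runs. The weights below are placeholders.

```python
from dataclasses import dataclass

# Hypothetical weights; the scoring formula is not specified by the source.
SEVERITY_WEIGHT = {"critical": 10.0, "high": 7.0, "medium": 4.0, "low": 1.0}


@dataclass
class ScoredFinding:
    rule_id: str
    severity: str
    confidence: float  # 0.0 to 1.0, as produced by the detecting module


def score(f: ScoredFinding) -> float:
    return SEVERITY_WEIGHT.get(f.severity, 0.0) * f.confidence


def prioritize(findings: list[ScoredFinding]) -> list[ScoredFinding]:
    # Highest score first; rule_id breaks ties so ordering is stable across runs.
    return sorted(findings, key=lambda f: (-score(f), f.rule_id))
```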
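Plugin Interface
Plugins implement the following abstract base class:

```python
# Postponed annotations: Finding and PluginInfo are defined elsewhere in the project.
from __future__ import annotations

from abc import ABC, abstractmethod


class PluginInterface(ABC):
    """Contract that every scanner plugin implements."""

    @abstractmethod
    def scan(self, repo_path: str, metadata: dict) -> list[Finding]:
        """Execute plugin scan logic and return its findings."""

    @abstractmethod
    def get_info(self) -> PluginInfo:
        """Return plugin metadata."""
```

Security
- Sandboxing: Plugins run in a restricted environment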
- No Exfiltration: All network calls are opt-in
- Secret Redaction: Sensitive values masked in reports
- Rate Limiting: Advisory API calls throttled
- Input Validation: All inputs sanitized
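Advisory API throttling is commonly implemented as a token bucket. This sketch refills rate tokens per second up to capacity, lets parallel worker threads share one bucket through a lock, and accepts an injectable clock so it can be tested without sleeping. The class name and parameters are hypothetical; actual limits should come from each advisory provider's published quotas.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Permit `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()  # shared by parallel scanning threads

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available (polls at the refill interval)."""
        while not self.try_acquire():
            time.sleep(1 / self.rate)
```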
Performance
- Parallel Processing: Multi-threaded scanning
- Caching: Advisory data and intermediate results
- Incremental Scanning: Delta analysis for repeat scans
- Resource Limits: Configurable timeouts and memory limits
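Incremental scanning needs a way to tell which files changed since the last run. One approach, sketched below under that assumption, stores a content hash per file and diffs two such manifests: new or changed files are re-scanned and findings for deleted files are dropped. Where the previous manifest is stored (for example, alongside the scan record in the database) is left open, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(repo: Path) -> dict[str, str]:
    """Map each file's repo-relative POSIX path to the SHA-256 of its contents."""
    return {
        p.relative_to(repo).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in repo.rglob("*")
        if p.is_file()
    }


def diff_manifests(current: dict[str, str], previous: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to re-scan, files whose old findings should be dropped)."""
    changed = sorted(path for path, digest in current.items() if previous.get(path) != digest)
    deleted = sorted(previous.keys() - current.keys())
    return changed, deleted
```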
Configuration
- Config File: YAML-based configuration (config.yaml)
- Environment Variables: Override config values
- CLI Flags: Runtime configuration
- Plugin Config: Per-plugin settings
Deployment
- Standalone: Direct Python execution
- Docker: Containerized deployment
- CI/CD Integration: GitHub Actions, GitLab CI
- Cloud: Compatible with AWS Lambda and Google Cloud Run
Technology Stack
- Language: Python 3.11+
- CLI Framework: Click
- AST Parsing: ast (Python), esprima (JavaScript)
- Database: SQLAlchemy ORM
- Testing: pytest, coverage
- Linting: ruff, mypy
- Containerization: Docker