VulnScanner Architecture Documentation

Overview

VulnScanner is a modular, extensible vulnerability detection system for scanning source code repositories. Built with Python 3.11+, it uses a plugin-based architecture so that new detection logic can be added as plugins without modifying the core engine.

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         CLI Interface                           │
│                    (scanner command + flags)                    │
└────────────────┬────────────────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────────────────┐
│                      Orchestration Engine                       │
│    (Task scheduling, plugin management, results aggregation)    │
└─────┬──────────┬──────────┬──────────┬──────────┬──────────┬────┘
      │          │          │          │          │          │
 ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐
 │  Repo  │ │  Tech  │ │  SBOM  │ │  CVE   │ │  SAST  │ │ Secrets│
 │ Clone  │ │ Detect │ │  Gen   │ │ Match  │ │ Engine │ │ Scanner│
 └────┬───┘ └────┬───┘ └────┬───┘ └────┬───┘ └────┬───┘ └────┬───┘
      │          │          │          │          │          │
      └──────────┴──────────┴────┬─────┴──────────┴──────────┘
                                 │
                        ┌────────▼─────────┐
                        │  Plugin System   │
                        │ (Dynamic loader) │
                        └────────┬─────────┘
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
        ┌───────▼──────┐ ┌───────▼──────┐ ┌───────▼──────┐
        │    Custom    │ │  Community   │ │ Third-party  │
        │   Plugins    │ │   Plugins    │ │ Integrations │
        └───────┬──────┘ └───────┬──────┘ └───────┬──────┘
                │                │                │
                └────────────────┼────────────────┘
                                 │
                      ┌──────────▼──────────┐
                      │  Data Persistence   │
                      │ (SQLite/PostgreSQL) │
                      └──────────┬──────────┘
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
        ┌───────▼──────┐ ┌───────▼──────┐ ┌───────▼──────┐
        │     JSON     │ │    SARIF     │ │     HTML     │
        │    Report    │ │    Output    │ │    Report    │
        └──────────────┘ └──────────────┘ └──────────────┘

Core Components

1. CLI Interface (cli/)

  • Purpose: Entry point for user interaction
  • Technologies: Click framework for command parsing
  • Commands:
    • scan: Main scanning command with configurable flags
    • update-advisories: Update CVE/advisory databases
    • init-db: Initialize local database
    • list-plugins: Show available plugins
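A minimal sketch of how the four commands could be wired up with Click. The command names come from the list above; the `--format` option and the echoed messages are hypothetical placeholders, and real command bodies would delegate to the orchestration engine rather than print.

```python
import click


@click.group()
def cli() -> None:
    """VulnScanner command-line interface (illustrative sketch)."""


@cli.command()
@click.argument("target")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "sarif", "html", "markdown"]),
    default="json",
    show_default=True,
    help="Report format (hypothetical option).",
)
def scan(target: str, output_format: str) -> None:
    """Scan a repository path, URL, or archive."""
    # A real implementation would hand TARGET to the orchestration engine.
    click.echo(f"Scanning {target} -> {output_format}")


@cli.command("update-advisories")
def update_advisories() -> None:
    """Refresh the local CVE/advisory cache."""
    click.echo("Updating advisories")


@cli.command("init-db")
def init_db() -> None:
    """Create the local database schema."""
    click.echo("Initializing database")


@cli.command("list-plugins")
def list_plugins() -> None:
    """Print the plugins the plugin manager can load."""
    click.echo("No plugins registered")


if __name__ == "__main__":
    cli()
```

Because `click.Choice` validates the value, an unsupported format is rejected with a usage error before any scanning starts.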

2. Orchestration Engine (core/engine.py)

  • Purpose: Coordinates scanning workflow
  • Responsibilities:
    • Task scheduling and parallelization
    • Plugin lifecycle management
    • Result aggregation and deduplication
    • Progress tracking and logging
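One way the engine could run independent modules in parallel while isolating failures, so that one crashing module does not abort the whole scan, is sketched below with `concurrent.futures.ThreadPoolExecutor`. The `run_modules` name, the "module is a callable" convention, and the dict-shaped findings are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Hypothetical convention: a scanning module is any callable that takes the
# repository path and returns a list of finding dicts.
Module = Callable[[str], list[dict]]


def run_modules(
    repo_path: str, modules: dict[str, Module], max_workers: int = 4
) -> tuple[list[dict], dict[str, str]]:
    """Run every module concurrently; collect findings and per-module errors."""
    findings: list[dict] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, repo_path): name for name, fn in modules.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                for finding in future.result():
                    # Tag each finding with the module that produced it.
                    findings.append({**finding, "module": name})
            except Exception as exc:  # one failing module must not stop the rest
                errors[name] = repr(exc)
    # Sort so output order does not depend on thread completion order.
    findings.sort(key=lambda f: (f["module"], f.get("file", ""), f.get("line", 0)))
    return findings, errors
```

Returning errors alongside findings lets the progress log report which modules failed without losing the results of the ones that succeeded.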

3. Repository Handler (core/repo.py)

  • Purpose: Repository intake and validation
  • Features:
    • Git clone support (HTTPS/SSH)
    • Archive extraction (.zip, .tar.gz)
    • Path sanitization
    • File filtering and exclusion
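Path sanitization matters most during archive extraction, where a member named like `../../etc/cron.d/x` can escape the target directory (the "Zip Slip" class of bug). The following is a minimal standard-library sketch of a guarded `.zip` extractor; the name `safe_extract_zip` is hypothetical. `.tar.gz` archives would need an equivalent check, or the `filter="data"` option of `tarfile.extractall` on Python versions that provide it.

```python
import os
import zipfile


def safe_extract_zip(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a zip archive, refusing any member that would land outside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    extracted: list[str] = []
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        # Validate every member before writing anything to disk.
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Blocked path traversal in archive: {member.filename!r}")
        for member in members:
            extracted.append(zf.extract(member, dest_root))
    return extracted
```

`ZipFile.extract` already strips some dangerous path components on its own; the explicit check makes the policy visible and fails loudly instead of silently rewriting paths.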

4. Technology Detection (modules/tech_detect.py)

  • Purpose: Identify technologies, languages, frameworks
  • Methods:
    • File extension analysis
    • Package manifest detection
    • Framework fingerprinting
    • Build tool identification
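File-extension analysis and package-manifest detection can share a single directory walk. This is a minimal sketch: the extension and manifest tables are small illustrative subsets, not the module's actual rule set, and framework fingerprinting is left out.

```python
import os
from collections import Counter

# Illustrative subsets only; a real detector would carry much larger tables.
EXTENSION_LANGUAGES = {
    ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
    ".go": "Go", ".java": "Java", ".rb": "Ruby", ".php": "PHP",
}
MANIFEST_ECOSYSTEMS = {
    "package.json": "npm", "requirements.txt": "PyPI", "Pipfile": "PyPI",
    "go.mod": "Go", "pom.xml": "Maven", "Gemfile": "RubyGems",
    "composer.json": "Packagist",
}
SKIP_DIRS = {".git", "node_modules", "vendor", "__pycache__"}


def detect_technologies(repo_path: str) -> dict:
    """Count source files per language and record where each manifest lives."""
    languages: Counter[str] = Counter()
    manifests: dict[str, list[str]] = {}
    for root, dirs, files in os.walk(repo_path):
        # Prune vendored and VCS directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for name in files:
            lang = EXTENSION_LANGUAGES.get(os.path.splitext(name)[1].lower())
            if lang:
                languages[lang] += 1
            if name in MANIFEST_ECOSYSTEMS:
                rel = os.path.relpath(os.path.join(root, name), repo_path)
                manifests.setdefault(MANIFEST_ECOSYSTEMS[name], []).append(rel)
    return {"languages": dict(languages.most_common()), "ecosystems": manifests}
```

The detected ecosystems tell the SBOM generator which manifest parsers to run.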

5. SBOM Generator (modules/sbom.py)

  • Purpose: Generate Software Bill of Materials
  • Supports:
    • package.json (Node.js)
    • requirements.txt/Pipfile (Python)
    • go.mod (Go)
    • pom.xml (Java)
    • Gemfile (Ruby)
    • composer.json (PHP)
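For the Python case, the sketch below turns pinned `requirements.txt` entries into SBOM component records. It handles only exact `name==version` pins and skips comments, pip options, and unpinned lines. A production parser would follow the full requirement-specifier grammar (for example via the `packaging` library). The component fields loosely follow the package-URL (purl) convention and are an assumption, not the module's documented output.

```python
import re

# Exact pins only, e.g. "requests==2.31.0" or "Flask[async]==3.0.0  # comment".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def parse_requirements(text: str) -> list[dict]:
    """Return one SBOM component per exactly pinned requirement line."""
    components = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, comments, and pip options such as "-r other.txt".
        if not stripped or stripped.startswith(("#", "-")):
            continue
        match = PIN_RE.match(stripped)
        if not match:
            continue  # unpinned or unsupported specifier
        name, version = match.groups()
        # purl normalization for PyPI: lowercase, underscores become dashes.
        purl_name = name.lower().replace("_", "-")
        components.append({
            "name": name,
            "version": version,
            "ecosystem": "PyPI",
            "purl": f"pkg:pypi/{purl_name}@{version}",
        })
    return components
```

Unpinned lines such as `Django>=4.2` are skipped here because they cannot be matched to a single version without resolving the environment.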

6. CVE Matcher (modules/cve_matcher.py)

  • Purpose: Map dependencies to known vulnerabilities
  • Data Sources:
    • NVD API
    • GitHub Advisory Database
    • OSV.dev
    • Local cache with periodic updates

7. SAST Engine (modules/sast/)

  • Purpose: Static application security testing
  • Approach:
    • AST-based analysis for supported languages
    • Pattern matching for generic detection
    • Rule engine for custom checks
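For Python sources, the AST-based approach can be illustrated with the standard `ast` module: walk the tree and flag call sites that match a rule. The two rules here (direct `eval`/`exec` calls, and any call passing a literal `shell=True`) are examples, not the engine's actual rule set. The rule IDs are hypothetical.

```python
import ast


class DangerousCallVisitor(ast.NodeVisitor):
    """Collect (rule_id, line, message) tuples for risky call patterns."""

    def __init__(self) -> None:
        self.findings: list[tuple[str, int, str]] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Rule 1: direct calls to the eval() or exec() builtins.
        if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
            self.findings.append(("PY-EVAL", node.lineno, f"use of {node.func.id}()"))
        # Rule 2: any call that passes a literal shell=True (e.g. subprocess.run).
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                self.findings.append(("PY-SHELL", node.lineno, "call with shell=True"))
        self.generic_visit(node)  # keep walking into nested calls


def scan_python_source(source: str, filename: str = "<string>") -> list[tuple[str, int, str]]:
    visitor = DangerousCallVisitor()
    visitor.visit(ast.parse(source, filename=filename))
    return visitor.findings
```

Unlike a regex, the visitor ignores matching text inside strings and comments and sees `shell=False` as a distinct value.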

8. Secrets Scanner (modules/secrets.py)

  • Purpose: Detect hardcoded secrets and PII
  • Methods:
    • Entropy analysis
    • Regular expression patterns
    • Contextual validation
    • Confidence scoring
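Entropy analysis and regular-expression patterns complement each other. A pattern catches well-known key formats, while Shannon entropy flags random-looking strings that match no pattern. In this minimal sketch, the 4.0 bits-per-character threshold, the 20-character minimum token length, and the confidence labels are illustrative assumptions to be tuned. Only one real-world pattern is shown: the `AKIA` prefix of long-term AWS access key IDs.

```python
import math
import re
from collections import Counter

AWS_ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Candidate tokens: long runs of base64/hex-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; illustrative, tune on real data


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string's character distribution, in bits per char."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_line(line: str) -> list[dict]:
    findings = []
    for m in AWS_ACCESS_KEY_RE.finditer(line):
        findings.append({"kind": "aws-access-key-id", "match": m.group(), "confidence": "high"})
    for m in TOKEN_RE.finditer(line):
        token = m.group()
        if AWS_ACCESS_KEY_RE.fullmatch(token):
            continue  # already reported by the pattern rule
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            findings.append({"kind": "high-entropy-string", "match": token, "confidence": "medium"})
    return findings
```

Pattern hits get higher confidence than entropy-only hits because the format itself is evidence; contextual validation (variable names, file type) would further adjust these scores.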

9. Plugin System (core/plugin_manager.py)

  • Purpose: Enable extensibility
  • Features:
    • Dynamic plugin loading
    • Standard plugin interface
    • Dependency injection
    • Result schema validation
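Dynamic loading can be done with `importlib` from the standard library. The sketch loads every `*.py` file in a plugin directory and keeps classes that structurally provide `scan` and `get_info`. The directory layout, the module namespace, and the duck-typed check are assumptions; a stricter loader would require subclasses of `PluginInterface` or discover plugins through packaging entry points. Loading a module executes its code, so this is the step where the sandboxing described under Security Considerations has to apply.

```python
import importlib.util
import inspect
from pathlib import Path


def load_plugins(plugin_dir: str) -> list[object]:
    """Import each *.py file in plugin_dir and instantiate plugin-like classes."""
    plugins = []
    for path in sorted(Path(plugin_dir).glob("*.py")):
        module_name = f"vulnscanner_plugin_{path.stem}"  # hypothetical namespace
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin file's top-level code
        for _, cls in inspect.getmembers(module, inspect.isclass):
            # Only consider classes defined in this file, not ones it imported.
            if cls.__module__ != module_name:
                continue
            if callable(getattr(cls, "scan", None)) and callable(getattr(cls, "get_info", None)):
                if not inspect.isabstract(cls):
                    plugins.append(cls())
    return plugins
```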

10. Data Layer (core/database.py)

  • Purpose: Persist scan results and cache
  • Backends:
    • SQLite (default, local)
    • PostgreSQL (optional, scalable)
  • Schema: Normalized tables for findings, scans, plugins
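The normalized layout can be illustrated with plain `sqlite3` DDL. The project itself uses SQLAlchemy, so this is only a sketch of how scans, plugins, and findings might relate. All table and column names, and the default database filename, are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS scans (
    id          INTEGER PRIMARY KEY,
    target      TEXT NOT NULL,
    started_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS plugins (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    version  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS findings (
    id         INTEGER PRIMARY KEY,
    scan_id    INTEGER NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
    plugin_id  INTEGER NOT NULL REFERENCES plugins(id),
    rule_id    TEXT NOT NULL,
    severity   TEXT NOT NULL,
    file_path  TEXT NOT NULL,
    line       INTEGER NOT NULL DEFAULT 0,  -- 0 when not tied to a line
    -- The same issue reported twice in one scan is stored once.
    UNIQUE (scan_id, rule_id, file_path, line)
);
"""


def init_db(path: str = "vulnscanner.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

The `UNIQUE` constraint lets the database back up the engine's deduplication: `INSERT OR IGNORE` silently drops a repeated finding.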

11. Report Generator (core/reporters/)

  • Purpose: Generate various output formats
  • Formats:
    • JSON (machine-readable)
    • SARIF (IDE integration)
    • HTML (human-readable)
    • Markdown (documentation)
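Of these formats, SARIF is the most structured. Below is a minimal sketch of a SARIF 2.1.0 log built from simple finding dicts. The input dict shape and the severity-to-level mapping are assumptions. The `version`, `runs`, `tool.driver`, `results`, `ruleId`, `level`, `message`, and `locations[].physicalLocation` fields come from the SARIF 2.1.0 specification.

```python
import json

# Hypothetical mapping from scanner severities to SARIF result levels.
SEVERITY_TO_LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: list[dict], tool_version: str = "0.0.0") -> str:
    """Serialize findings (rule_id, severity, message, file, line) as SARIF 2.1.0 JSON."""
    rule_ids = sorted({f["rule_id"] for f in findings})
    results = [
        {
            "ruleId": f["rule_id"],
            "level": SEVERITY_TO_LEVEL.get(f["severity"], "warning"),
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        }
        for f in findings
    ]
    log = {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "VulnScanner",
                "version": tool_version,
                "rules": [{"id": rid} for rid in rule_ids],
            }},
            "results": results,
        }],
    }
    return json.dumps(log, indent=2)
```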

Data Flow

  1. Input Stage: User provides repository (path/URL/archive)
  2. Validation: Repository validated and cloned/extracted
  3. Discovery: Technology detection and SBOM generation
  4. Analysis: Parallel execution of scanning modules
  5. Plugin Execution: Custom plugins run with sandboxing
  6. Aggregation: Results collected and deduplicated
  7. Prioritization: Findings scored and ranked
  8. Persistence: Results saved to database
  9. Reporting: Output generated in requested formats
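Steps 6 and 7 can be sketched together: deduplicate findings by a fingerprint, then rank them. `FindingRecord` is a hypothetical record type, not the `Finding` type used by the plugin interface. The fingerprint fields, the severity weights, and the severity × confidence score are illustrative assumptions, since the actual scoring model is not specified here.

```python
from dataclasses import dataclass

SEVERITY_WEIGHT = {"critical": 10.0, "high": 7.0, "medium": 4.0, "low": 1.0}  # illustrative


@dataclass(frozen=True)
class FindingRecord:  # hypothetical shape
    rule_id: str
    file_path: str
    line: int
    severity: str
    confidence: float  # 0.0 to 1.0
    source: str        # module or plugin that reported it


def dedupe_and_rank(findings: list[FindingRecord]) -> list[tuple[float, FindingRecord]]:
    """Keep the most confident report per (rule, location), then sort by score."""
    best: dict[tuple[str, str, int], FindingRecord] = {}
    for f in findings:
        key = (f.rule_id, f.file_path, f.line)  # fingerprint ignores the reporter
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    scored = [(SEVERITY_WEIGHT.get(f.severity, 0.0) * f.confidence, f) for f in best.values()]
    # Highest score first; ties broken by location for stable output.
    scored.sort(key=lambda item: (-item[0], item[1].file_path, item[1].line))
    return scored
```

Leaving `source` out of the fingerprint is what lets a built-in module and a plugin that report the same issue collapse into one finding.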

Plugin Interface

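from __future__ import annotations

from abc import ABC, abstractmethod

# Finding and PluginInfo are the project's result and metadata types,
# defined elsewhere in the codebase (not shown here).


class PluginInterface(ABC):
    @abstractmethod
    def scan(self, repo_path: str, metadata: dict) -> list[Finding]:
        """Execute the plugin's scan logic and return its findings."""

    @abstractmethod
    def get_info(self) -> PluginInfo:
        """Return plugin metadata."""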

Security Considerations

  • Sandboxing: Plugins run in a restricted environment
  • No Exfiltration: All network calls are opt-in
  • Secret Redaction: Sensitive values masked in reports
  • Rate Limiting: Advisory API calls throttled
  • Input Validation: All inputs sanitized
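Throttling advisory API calls is commonly done with a token bucket: each request spends a token, and tokens refill at a fixed rate up to a burst capacity. This is a minimal, thread-safe sketch. The rate and capacity are placeholders, since actual limits depend on each advisory service's published quotas. The clock is injectable so the behavior can be tested without sleeping.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` calls in a burst, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()
        self.lock = threading.Lock()  # modules may call the API from several threads

    def try_acquire(self) -> bool:
        """Spend one token if available; return False instead of blocking."""
        with self.lock:
            now = self.clock()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available."""
        while not self.try_acquire():
            time.sleep(1.0 / self.rate)
```

One bucket per advisory source (NVD, GitHub Advisory Database, OSV.dev) keeps a slow or strict service from throttling the others.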

Performance Optimizations

  • Parallel Processing: Multi-threaded scanning
  • Caching: Advisory data and intermediate results
  • Incremental Scanning: Delta analysis for repeat scans
  • Resource Limits: Configurable timeouts and memory limits
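Delta analysis for repeat scans can rest on content hashes: store a path-to-digest manifest after each scan, then rescan only files that are new or whose digest changed. The following is a minimal sketch using SHA-256 from `hashlib`; the manifest format is an assumption. Cross-file checks still need care, since a dependency upgrade can change CVE findings for code that did not change.

```python
import hashlib
import os


def hash_tree(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip VCS metadata
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (files to rescan, files whose old findings should be dropped)."""
    changed = {p for p, h in new.items() if old.get(p) != h}
    removed = set(old) - set(new)
    return changed, removed
```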

Configuration

  • Config File: YAML-based configuration (config.yaml)
  • Environment Variables: Override config values
  • CLI Flags: Runtime configuration
  • Plugin Config: Per-plugin settings
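These configuration sources imply a precedence order. The sketch assumes later sources override earlier ones: defaults, then `config.yaml` (already parsed into a dict), then environment variables, then CLI flags. The list above states only that environment variables override config values; placing CLI flags last is an assumption. The `VULNSCANNER_` prefix and the double-underscore nesting syntax are also hypothetical, since the variable names are not documented here.

```python
import copy
import os
from collections.abc import Mapping
from typing import Any

ENV_PREFIX = "VULNSCANNER_"  # hypothetical prefix


def deep_merge(base: dict, override: Mapping) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def env_overrides(environ: Mapping[str, str] = os.environ) -> dict:
    """Turn VULNSCANNER_SCAN__TIMEOUT=30 into {"scan": {"timeout": "30"}}."""
    result: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(ENV_PREFIX):
            continue
        path = name[len(ENV_PREFIX):].lower().split("__")
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value  # values stay strings; type coercion is left out
    return result


def resolve_config(defaults: dict, file_cfg: dict, cli_flags: dict,
                   environ: Mapping[str, str] = os.environ) -> dict:
    config = deep_merge(defaults, file_cfg)
    config = deep_merge(config, env_overrides(environ))
    return deep_merge(config, cli_flags)
```

Per-plugin settings fit the same scheme as a nested `plugins` mapping keyed by plugin name.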

Deployment Options

  1. Standalone: Direct Python execution
  2. Docker: Containerized deployment
  3. CI/CD Integration: GitHub Actions, GitLab CI
  4. Cloud: Compatible with AWS Lambda and Google Cloud Run

Technology Stack

  • Language: Python 3.11+
  • CLI Framework: Click
  • AST Parsing: ast (Python), esprima (JavaScript)
  • Database: SQLAlchemy ORM
  • Testing: pytest, coverage
  • Linting: ruff, mypy
  • Containerization: Docker