Skip to content

Latest commit

 

History

History
176 lines (138 loc) · 4.73 KB

File metadata and controls

176 lines (138 loc) · 4.73 KB

ClamAV Signature Generator - Project Status

Created: 2025-11-14 Status: Phase 1 Complete - Ready for Testing

What's Been Implemented

Core Functionality (Phase 1)

  • Hash Signature Generation

    • MD5 and SHA256 hashing
    • ClamAV .hdb format (MD5-based)
    • ClamAV .hsb format (SHA256-based)
  • String Extraction

    • Shannon entropy calculation
    • Configurable entropy threshold (default: 4.0)
    • Minimum/maximum length filtering
    • Sorted by entropy (highest first)
  • Body Signatures

    • .ndb format generation
    • Hex pattern conversion
    • Multiple signature generation from extracted strings
  • PE File Support

    • Section hash extraction
    • .mdb format signatures
    • pefile library integration
  • CLI Interface

    • hash - Generate hash signatures
    • extract - Extract and analyze strings
    • generate - Generate all signature types
    • from-capa - Placeholder for capa integration
    • validate - Placeholder for validation

Testing & Quality

  • Unit test suite with pytest (7 tests, all passing)
  • Test coverage for core functionality
  • Example scripts demonstrating usage
  • Full project documentation

Project Infrastructure

  • Modern Python packaging (pyproject.toml)
  • Development dependencies (pytest, mypy, pylint, black)
  • Makefile for common tasks
  • Documentation structure
  • Git repository ready

Installation Verification

# Package installed successfully
pip install -e .
# Successfully installed clamav-siggen-0.1.0

# All tests passing
pytest tests/ -v
# 7 passed in 0.04s

# CLI working
clamav-siggen --help
# All commands available

Functional Testing

Hash Generation

$ clamav-siggen hash /tmp/test_sample.bin --name "Test.Malware"
# MD5 Hash
7a3cf2a1badf2011845df945bee64d2f

# SHA256 Hash
a1304402131e0c8d428e2bfb96e4188e90bdbff714a7232b9b7c961652117c2d

# ClamAV .hdb format (MD5)
7a3cf2a1badf2011845df945bee64d2f:55:Test.Malware

# ClamAV .hsb format (SHA256)
a1304402131e0c8d428e2bfb96e4188e90bdbff714a7232b9b7c961652117c2d:55:Test.Malware

String Extraction

$ clamav-siggen extract /tmp/test_sample.bin --min-entropy 2.0 --format text
String                                             Entropy  Length
----------------------------------------------------------------------
This is a test malware sample for signature gen...   3.84       54

Total: 1 strings extracted

Python Library

from clamav_siggen import ClamAVSigGen

siggen = ClamAVSigGen(file_path="malware.exe")
hash_sigs = siggen.generate_hash()
# Works correctly

What's NOT Yet Implemented

Validation Framework

  • Clean file corpus scanning
  • False positive detection
  • Signature quality scoring
  • Needed for: Production use

capa Integration (Phase 2)

  • capa JSON result parser
  • Capability-based signature generation
  • Integration with capa-server API
  • Needed for: capa-server integration

YARA Conversion (Phase 3)

  • YARA rule parser (plyara)
  • YARA to ClamAV conversion
  • Logical signature building
  • Needed for: Advanced features

Next Steps

For Testing

  1. Test with real malware samples

    • Place samples in data/samples/
    • Run: clamav-siggen generate sample.exe --name "Malware.Family"
    • Verify signatures are generated correctly
  2. Test with PE files

    • Use Windows executables
    • Verify PE section hashes are extracted
    • Check .mdb signature format
  3. Validate entropy filtering

    • Experiment with different --min-entropy values
    • Typical range: 3.0 (low quality) to 5.0 (high quality)
    • Default 4.0 is a good starting point

For Development

  1. Implement validation framework

    • Download clean file corpus (e.g., Windows System32 files)
    • Place in data/clean/
    • Implement signature testing against corpus
    • Report false positive rate
  2. Add capa integration

    • Parse capa JSON results
    • Extract capability indicators
    • Generate targeted signatures from capabilities
    • Create quality scoring based on capability uniqueness
  3. Future integration with capa-server

    • Import clamav_siggen as library
    • Add /api/analyses/{id}/generate-signature endpoint
    • Pass capa results to generator
    • Return signatures via API

Known Limitations

  1. String extraction currently only handles ASCII printable characters

    • Unicode strings are not extracted
    • Future: Add Unicode support
  2. Entropy calculation is byte-based Shannon entropy

    • May not catch all interesting patterns
    • Consider adding n-gram analysis
  3. No signature optimization

    • Generates many signatures without prioritization
    • Future: Add quality scoring and ranking
  4. No logical signatures

    • Can't combine multiple patterns with AND/OR logic
    • Future: Implement .ldb format support