Contributing to AnonShield

Thank you for your interest in contributing. This document covers everything you need to get started: setting up the development environment, coding conventions, how to add new features, and the pull-request process.

Table of Contents

1. Development Setup
2. Project Structure
3. Coding Conventions
4. Running Tests
5. Adding New Features
6. Commit Message Convention
7. Pull Request Process
8. Reporting Bugs
9. License

1. Development Setup

Prerequisites: Python 3.12+, uv, Git.

# Clone the repository
git clone https://github.com/AnonShield/tool.git
cd tool

# Install all dependencies using uv
uv sync

# Verify the CLI works
uv run anon.py --help

For GPU support, ensure CUDA 12.x drivers are installed. The cupy-cuda12x and torch packages are declared in pyproject.toml and installed automatically via uv sync.

Alternatively, use Docker (see Makefile):

make build       # CPU image
make build-gpu   # GPU image
make shell       # interactive shell inside container

2. Project Structure

anon.py                   # CLI entry point (composition root)
src/anon/                 # Core library
  core/protocols.py       # Protocol interfaces (EntityStorage, CacheStrategy, …)
  engine.py               # AnonymizationOrchestrator
  strategies.py           # Built-in anonymization strategies
  processors.py           # File-format processors
  entity_detector.py      # NER (spaCy + Transformers)
  slm/                    # Small Language Model integration
scripts/                  # Utility/analysis scripts
tests/                    # Test suite (unittest)
benchmark/                # Benchmarking suite
docs/developers/          # Developer documentation
examples/                 # Sample configs and documents
docker/                   # Dockerfile and docker-compose

See docs/developers/ARCHITECTURE.md for the full module map and data-flow diagram.

3. Coding Conventions

Style

Naming conventions: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for module-level constants.
Line length: keep lines under 100 characters.
Type annotations: annotate all public functions and methods (parameters and return type).
Docstrings on all new public classes and methods; use the Google style (Args:, Returns:, Raises:) when the method has non-trivial parameters or return values.
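
As an illustration of the style rules above, here is a minimal sketch. The function `mask_email` and the constant `MASK_CHAR` are hypothetical names chosen for this example and are not part of the AnonShield codebase.

```python
"""Illustrative module; the names here are hypothetical, not AnonShield APIs."""

MASK_CHAR = "*"  # UPPER_CASE for module-level constants


def mask_email(address: str, keep_chars: int = 1) -> str:
    """Mask the local part of an email address.

    Args:
        address: Email address of the form ``local@domain``.
        keep_chars: Number of leading characters of the local part to keep.

    Returns:
        The address with the rest of the local part replaced by ``MASK_CHAR``.

    Raises:
        ValueError: If ``address`` does not contain exactly one ``@``.
    """
    if address.count("@") != 1:
        raise ValueError(f"not a valid email address: {address!r}")
    local_part, domain = address.split("@")
    kept = local_part[:keep_chars]
    masked = MASK_CHAR * max(len(local_part) - keep_chars, 0)
    return f"{kept}{masked}@{domain}"
```

The function name is snake_case, both parameters and the return value are annotated, and the docstring uses the Google sections because the parameters and the error case are non-trivial.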

Patterns

Respect the established patterns — do not work around them:

Pattern	Where
Strategy	`AnonymizationStrategy` ABC in `strategies.py`
Template Method	`FileProcessor` ABC in `processors.py`
Repository	`EntityRepository` in `repository.py`
Dependency Injection	`AnonymizationOrchestrator.__init__()` parameters
Protocol-based inversion	`src/anon/core/protocols.py`

New extension points must follow these patterns. See docs/developers/EXTENSIBILITY.md for worked examples for every extension point.
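
To make the Strategy and Dependency Injection rows concrete, the sketch below shows the general shape of both patterns. It is not the project's code: `StrategySketch`, `RedactStrategy`, `OrchestratorSketch`, and the `anonymize` and `process` method names are assumptions made for illustration. The real interfaces are defined in `strategies.py` and `engine.py` and may differ.

```python
from abc import ABC, abstractmethod


class StrategySketch(ABC):
    """Hypothetical stand-in for an anonymization strategy base class."""

    @abstractmethod
    def anonymize(self, text: str, entity_type: str) -> str:
        """Return a replacement for ``text``."""


class RedactStrategy(StrategySketch):
    """Replace every entity with a fixed, type-tagged placeholder."""

    def anonymize(self, text: str, entity_type: str) -> str:
        return f"[{entity_type}]"


class OrchestratorSketch:
    """Receives its strategy through the constructor (dependency injection)."""

    def __init__(self, strategy: StrategySketch) -> None:
        self._strategy = strategy

    def process(self, entities: list[tuple[str, str]]) -> list[str]:
        """Apply the injected strategy to ``(text, entity_type)`` pairs."""
        return [self._strategy.anonymize(text, kind) for text, kind in entities]


# The orchestrator never names a concrete strategy; the caller chooses one.
print(OrchestratorSketch(RedactStrategy()).process([("Alice", "PERSON")]))
```

Working around the pattern would mean, for example, branching on a strategy name inside the orchestrator instead of adding a new subclass.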

Security

Never hard-code secrets or PII in tests or examples.
Supply the HMAC key via ANON_SECRET_KEY (value) or ANON_SECRET_KEY_FILE (path to a file containing the key); key loading is handled by src/anon/security.py.
Avoid eval, exec, and subprocess calls built from user-controlled strings; never pass untrusted input to a shell.
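
The following sketch shows one way a key could be read from the two environment variables named above and then used for keyed hashing. It is illustrative only: the actual loader is `src/anon/security.py` and may behave differently. The function names `load_secret_key` and `pseudonymize`, the precedence of `ANON_SECRET_KEY` over `ANON_SECRET_KEY_FILE`, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import hmac
import os
from collections.abc import Mapping
from pathlib import Path


def load_secret_key(env: Mapping[str, str] = os.environ) -> bytes:
    """Read the HMAC key from the environment (illustrative only).

    Assumption: ANON_SECRET_KEY takes precedence over ANON_SECRET_KEY_FILE.
    """
    value = env.get("ANON_SECRET_KEY")
    if value:
        return value.encode("utf-8")
    key_file = env.get("ANON_SECRET_KEY_FILE")
    if key_file:
        # Strip surrounding whitespace so a trailing newline is not part of the key.
        return Path(key_file).read_text(encoding="utf-8").strip().encode("utf-8")
    raise RuntimeError("set ANON_SECRET_KEY or ANON_SECRET_KEY_FILE")


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of ``value`` as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

In tests, pass a throwaway key through a plain dictionary, for example `load_secret_key({"ANON_SECRET_KEY": "test-only-key"})`, rather than committing a real key.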

4. Running Tests

Tests use Python's standard unittest library. Run them with:

# Run all tests
uv run python -m unittest discover -s tests/

# Run a specific test file
uv run python -m unittest tests.test_security

Inside Docker:

make test

Requirements for new code:

Tests for all new public functions and classes.
Tests for new file processors and strategies should use the sample files in examples/.
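
A new test module could look like the minimal sketch below. The function under test, `redact_digits`, is a hypothetical example defined inline so the snippet is self-contained; in a real contribution the test would import the code from `src/anon/` instead.

```python
import unittest


def redact_digits(text: str) -> str:
    """Hypothetical function under test: replace every digit with '#'."""
    return "".join("#" if ch.isdigit() else ch for ch in text)


class TestRedactDigits(unittest.TestCase):
    """One behaviour per test method, including the empty-input edge case."""

    def test_replaces_digits(self) -> None:
        self.assertEqual(redact_digits("call 555-0100"), "call ###-####")

    def test_leaves_other_text_untouched(self) -> None:
        self.assertEqual(redact_digits("no digits"), "no digits")

    def test_empty_input(self) -> None:
        self.assertEqual(redact_digits(""), "")
```

Assuming the file were saved as `tests/test_redact_digits.py` (a hypothetical name), it would be picked up by the discovery command above and could also be run on its own with `uv run python -m unittest tests.test_redact_digits`.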

5. Adding New Features

The most common extension points are:

What you want to add	Where to look
New anonymization strategy	Section 2 of EXTENSIBILITY.md
New file format processor	Section 3 of EXTENSIBILITY.md
New entity type / regex	Section 4 of EXTENSIBILITY.md
New transformer model	Section 5 of EXTENSIBILITY.md
Custom cache / hash / storage	Sections 6–8 of EXTENSIBILITY.md
New SLM backend	Section 11 of EXTENSIBILITY.md

For larger changes (new strategies, new processors), open an issue to discuss the approach before writing code.

6. Commit Message Convention

Use a short type prefix followed by a description:

<type>: <short description>

Types:

Type	Use for
`feat`	New feature or extension point
`fix`	Bug fix
`perf`	Performance improvement
`refactor`	Code restructuring with no behaviour change
`test`	Adding or improving tests
`docs`	Documentation only
`chore`	Build, CI, dependency updates

Examples:

feat: add ODS file processor

fix: handle empty text chunks in fallback path

docs: add worked example for custom SLM client

Keep the subject line under 72 characters. Use the commit body for motivation and context when needed.
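
The convention is simple enough to check mechanically. The helper below is a sketch of such a check; it is not part of the repository, and `check_subject` is a hypothetical name. It encodes only the rules stated in this section: one of the listed type prefixes, a colon and a space, a non-empty description, and a subject under 72 characters.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "perf", "refactor", "test", "docs", "chore")
MAX_SUBJECT_LENGTH = 72
# "<type>: <short description>" with a non-blank description.
_SUBJECT_RE = re.compile(rf"^({'|'.join(ALLOWED_TYPES)}): \S.*$")


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject line."""
    problems: list[str] = []
    if not _SUBJECT_RE.match(subject):
        problems.append("subject must look like '<type>: <short description>'")
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(f"subject must be under {MAX_SUBJECT_LENGTH} characters")
    return problems
```

An empty list means the subject passes, so `check_subject("feat: add ODS file processor")` returns `[]`.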

7. Pull Request Process

Branch off main with a descriptive name: feat/xml-streaming, fix/csv-empty-header, docs/slm-guide-update.
Write tests that cover the change. All existing tests must continue to pass.
Update documentation — if you change a public interface or add an extension point, update the relevant file in docs/developers/.
Open the PR against main. Fill in the PR description:
- What problem does this solve?
- How was it tested?
- Any breaking changes?

8. Reporting Bugs

Please include:

AnonShield version / git commit hash.
Python version and OS.
Minimal command that reproduces the issue (redact any real PII from inputs).
Full traceback or error output.
Expected vs actual behaviour.
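
To gather the Python and OS details for a report, a short standard-library snippet such as the one below may help. It is a convenience sketch, not a tool shipped with AnonShield, and `environment_report` is a hypothetical name.

```python
import platform
import sys


def environment_report() -> str:
    """Return Python and OS details suitable for pasting into a bug report."""
    lines = [
        f"Python: {sys.version.split()[0]} ({platform.python_implementation()})",
        f"OS: {platform.system()} {platform.release()} ({platform.machine()})",
    ]
    return "\n".join(lines)


print(environment_report())
```

The commit hash can be added separately with `git rev-parse HEAD`.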

9. License

By contributing, you agree that your contributions will be licensed under the GNU General Public License v3.0, which covers this project.
