Thank you for your interest in contributing. This document covers everything you need to get started: setting up the development environment, coding conventions, how to add new features, and the pull-request process.
- Development Setup
- Project Structure
- Coding Conventions
- Running Tests
- Adding New Features
- Commit Message Convention
- Pull Request Process
- Reporting Bugs
- License
Prerequisites: Python 3.12+, uv, Git.
# Clone the repository
git clone https://github.com/AnonShield/tool.git
cd tool
# Install all dependencies using uv
uv sync
# Verify the CLI works
uv run anon.py --help
For GPU support, ensure CUDA 12.x drivers are installed. The cupy-cuda12x and torch packages are declared in pyproject.toml and installed automatically via uv sync.
Alternatively, use Docker (see Makefile):
make build # CPU image
make build-gpu # GPU image
make shell # interactive shell inside container
anon.py # CLI entry point (composition root)
src/anon/ # Core library
core/protocols.py # Protocol interfaces (EntityStorage, CacheStrategy, …)
engine.py # AnonymizationOrchestrator
strategies.py # Built-in anonymization strategies
processors.py # File-format processors
entity_detector.py # NER (spaCy + Transformers)
slm/ # Small Language Model integration
scripts/ # Utility/analysis scripts
tests/ # Test suite (unittest)
benchmark/ # Benchmarking suite
docs/developers/ # Developer documentation
examples/ # Sample configs and documents
docker/ # Dockerfile and docker-compose
See docs/developers/ARCHITECTURE.md for the full module map and data-flow diagram.
- Naming conventions: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for module-level constants.
- Line length: keep lines under 100 characters.
- Type annotations on all public functions and methods (params and return type).
- Docstrings on all new public classes and methods; use the Google style (Args:, Returns:, Raises:) when the method has non-trivial parameters or return values.
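The conventions above can be illustrated with a minimal sketch. The names `MAX_LABEL_LENGTH`, `LabelFormatter`, and `format_label` are hypothetical and exist only for this example; they are not part of the code base.

```python
MAX_LABEL_LENGTH = 32  # module-level constant: UPPER_CASE


class LabelFormatter:  # class: PascalCase (hypothetical example class)
    """Formats entity labels for display."""

    def format_label(self, entity_type: str, index: int) -> str:
        """Build a bracketed placeholder label.

        Args:
            entity_type: Category name such as "PERSON".
            index: Position of the entity within its category.

        Returns:
            The label, truncated to MAX_LABEL_LENGTH characters.

        Raises:
            ValueError: If index is negative.
        """
        if index < 0:
            raise ValueError("index must be non-negative")
        label = f"[{entity_type.upper()}_{index}]"
        return label[:MAX_LABEL_LENGTH]
```

The public method carries annotations for every parameter and the return type, and the docstring uses the Google sections because the parameters and the error case are not self-evident.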
Respect the established patterns — do not work around them:
| Pattern | Where |
|---|---|
| Strategy | AnonymizationStrategy ABC in strategies.py |
| Template Method | FileProcessor ABC in processors.py |
| Repository | EntityRepository in repository.py |
| Dependency Injection | AnonymizationOrchestrator.__init__() parameters |
| Protocol-based inversion | src/anon/core/protocols.py |
New extension points must follow these patterns. See docs/developers/EXTENSIBILITY.md for worked examples for every extension point.
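As a rough illustration of how the Strategy and Dependency Injection rows fit together, here is a self-contained sketch. Every class below (`ExampleStrategy`, `MaskStrategy`, `TagStrategy`, `ExampleOrchestrator`) and the `anonymize` signature are assumptions made for this example; the real interfaces live in strategies.py and engine.py and may look different.

```python
from abc import ABC, abstractmethod


class ExampleStrategy(ABC):
    """Hypothetical stand-in for a strategy base class (not the real ABC)."""

    @abstractmethod
    def anonymize(self, text: str, entity_type: str) -> str:
        """Return a replacement for one detected entity."""


class MaskStrategy(ExampleStrategy):
    """Replaces every character of the entity with an asterisk."""

    def anonymize(self, text: str, entity_type: str) -> str:
        return "*" * len(text)


class TagStrategy(ExampleStrategy):
    """Replaces the entity with its type name in angle brackets."""

    def anonymize(self, text: str, entity_type: str) -> str:
        return f"<{entity_type}>"


class ExampleOrchestrator:
    """Receives its strategy through the constructor (dependency injection)."""

    def __init__(self, strategy: ExampleStrategy) -> None:
        self._strategy = strategy

    def run(self, entities: list[tuple[str, str]]) -> list[str]:
        # Each entity is a (text, entity_type) pair.
        return [self._strategy.anonymize(text, kind) for text, kind in entities]
```

Because the orchestrator depends only on the abstract base class, swapping `MaskStrategy` for `TagStrategy` requires no change to the orchestrator itself. That property is what working around the pattern would break.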
- Never hard-code secrets or PII in tests or examples.
- Supply the HMAC key via ANON_SECRET_KEY (value) or ANON_SECRET_KEY_FILE (path to a file containing the key); key loading is handled by src/anon/security.py.
- Avoid eval, exec, subprocess with user-controlled strings, and any form of shell injection.
Tests use Python's standard unittest library. Run them with:
# Run all tests
uv run python -m unittest discover -s tests/
# Run a specific test file
uv run python -m unittest tests.test_security
Inside Docker:
make test
Requirements for new code:
- Tests for all new public functions and classes.
- Tests for new file processors and strategies should use the sample files in examples/.
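A new test module might be shaped like the sketch below. The function under test, `mask_email`, is hypothetical and defined inline so the example stays self-contained; a real test would import the code under test from src/anon/ and, for processors and strategies, read its inputs from examples/.

```python
import re
import unittest

# Deliberately simple pattern for illustration; not the project's detector.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_email(text: str) -> str:
    """Replace every e-mail address in text with a fixed placeholder."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)


class MaskEmailTests(unittest.TestCase):
    def test_replaces_address(self) -> None:
        # example.com is a reserved domain, so no real PII is used.
        self.assertEqual(
            mask_email("Contact bob@example.com now"),
            "Contact [EMAIL] now",
        )

    def test_leaves_plain_text_untouched(self) -> None:
        self.assertEqual(mask_email("no address here"), "no address here")
```

Saved under tests/ with a file name starting with test, a module like this is picked up by the discover command shown earlier, assuming the default file pattern is in use.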
The most common extension points are:
| What you want to add | Where to look |
|---|---|
| New anonymization strategy | Section 2 of EXTENSIBILITY.md |
| New file format processor | Section 3 of EXTENSIBILITY.md |
| New entity type / regex | Section 4 of EXTENSIBILITY.md |
| New transformer model | Section 5 of EXTENSIBILITY.md |
| Custom cache / hash / storage | Sections 6–8 of EXTENSIBILITY.md |
| New SLM backend | Section 11 of EXTENSIBILITY.md |
For larger changes (new strategies, new processors), open an issue first to discuss the approach before writing code.
Use a short type prefix followed by a description:
<type>: <short description>
Types:
| Type | Use for |
|---|---|
| feat | New feature or extension point |
| fix | Bug fix |
| perf | Performance improvement |
| refactor | Code restructuring with no behaviour change |
| test | Adding or improving tests |
| docs | Documentation only |
| chore | Build, CI, dependency updates |
Examples:
feat: add ODS file processor
fix: handle empty text chunks in fallback path
docs: add worked example for custom SLM client
Keep the subject line under 72 characters. Use the commit body for motivation and context when needed.
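The convention is simple enough to check mechanically. The helper below is only a sketch of such a check (for example, for a local commit-msg hook); `check_subject` is a hypothetical name and nothing here is shipped with the project.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "perf", "refactor", "test", "docs", "chore")
MAX_SUBJECT_LENGTH = 72  # subjects must stay under this many characters

# "<type>: <short description>" with exactly one space after the colon.
_TYPE_ALTERNATIVES = "|".join(ALLOWED_TYPES)
SUBJECT_PATTERN = re.compile(rf"^(?:{_TYPE_ALTERNATIVES}): \S.*$")


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject line."""
    problems: list[str] = []
    if not SUBJECT_PATTERN.match(subject):
        problems.append("subject must look like '<type>: <short description>'")
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(f"subject must be under {MAX_SUBJECT_LENGTH} characters")
    return problems
```

An empty list means the subject passes both rules; each string in a non-empty list names one violated rule.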
- Branch off main with a descriptive name: feat/xml-streaming, fix/csv-empty-header, docs/slm-guide-update.
- Write tests that cover the change. All existing tests must continue to pass.
- Update documentation: if you change a public interface or add an extension point, update the relevant file in docs/developers/.
- Open the PR against main. Fill in the PR description:
  - What problem does this solve?
  - How was it tested?
  - Any breaking changes?
Please include:
- AnonShield version / git commit hash.
- Python version and OS.
- Minimal command that reproduces the issue (redact any real PII from inputs).
- Full traceback or error output.
- Expected vs actual behaviour.
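The version details requested above can be collected with a short script. This is an optional convenience sketch, not a tool provided by the project; the function names are hypothetical, and it assumes it is run from inside the repository checkout.

```python
import platform
import subprocess


def git_commit_hash() -> str:
    """Return the short hash of HEAD, or "unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],  # argument list, no shell
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:  # git is not installed or not on PATH
        return "unknown"
    return result.stdout.strip() if result.returncode == 0 else "unknown"


def environment_summary() -> str:
    """Collect the commit hash, Python version, and OS description."""
    lines = [
        f"Commit: {git_commit_hash()}",
        f"Python: {platform.python_version()}",
        f"OS: {platform.platform()}",
    ]
    return "\n".join(lines)


print(environment_summary())
```

The output can be pasted at the top of a bug report, followed by the reproducing command and the traceback.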
By contributing you agree that your contributions will be licensed under the GNU General Public License v3.0 that covers this project.