Contributing to Contextifier

Thank you for your interest in contributing to Contextifier! This document provides guidelines and instructions for contributing.

Development Setup

1. Clone & Create Environment

git clone https://github.com/your-org/contextifier.git
cd contextifier

python -m venv .venv
source .venv/bin/activate    # Linux/Mac
.venv\Scripts\activate       # Windows

pip install -e ".[dev]"
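Two setup mistakes are easy to miss: running an interpreter older than 3.12, and forgetting to activate .venv. The small script below checks for both. It is a hypothetical helper, not a file shipped with the repository, and check_environment and in_virtualenv are illustrative names. It relies on the standard-library behavior that sys.prefix differs from sys.base_prefix inside a venv.

```python
"""Sanity-check the development environment (hypothetical helper script)."""
from __future__ import annotations

import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


def check_environment(min_version: tuple[int, int] = (3, 12)) -> list[str]:
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems: list[str] = []
    if sys.version_info[:2] < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if not in_virtualenv():
        problems.append("No virtual environment is active; activate .venv first")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(f"WARNING: {issue}")
    if not issues:
        print("Environment looks OK")
```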

2. Project Structure

contextifier_new/           # v2 main package
├── document_processor.py   # Facade (public API)
├── config.py               # ProcessingConfig
├── types.py                # Shared types
├── errors.py               # Exception hierarchy
├── handlers/               # 14 format handlers
├── pipeline/               # 5-stage pipeline ABCs
├── services/               # Shared services
├── chunking/               # Chunking subsystem
└── ocr/                    # OCR subsystem

Coding Conventions

General Rules

- Use Python 3.12+ syntax.
- Add type hints to all public APIs.
- Write docstrings for all public APIs in Google style.
- Put from __future__ import annotations at the top of every module.
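Here is a minimal module that follows these general rules: a future import first, typed signatures, and a Google-style docstring with Args and Returns sections. normalize_whitespace is a hypothetical helper used only to show the layout. It is not an existing Contextifier function.

```python
"""Text helpers (illustrative module following the General Rules)."""
from __future__ import annotations


def normalize_whitespace(text: str, *, keep_newlines: bool = True) -> str:
    """Collapse runs of whitespace in extracted text.

    Args:
        text: Raw text extracted from a document.
        keep_newlines: If True, line breaks are preserved and only whitespace
            within each line is collapsed. If False, the whole text becomes
            a single line.

    Returns:
        The normalized text, with leading and trailing whitespace removed
        from each line (or from the whole text when keep_newlines is False).
    """
    if not keep_newlines:
        # str.split() with no argument splits on any run of whitespace.
        return " ".join(text.split())
    return "\n".join(" ".join(line.split()) for line in text.splitlines())
```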

Architecture Rules

All handlers must follow the 5-stage pipeline:
- Converter → Preprocessor → MetadataExtractor → ContentExtractor → Postprocessor
- BaseHandler.process() enforces the execution order; implement only the individual stages.
Do not create services directly:
- TagService, ImageService, etc. are created by DocumentProcessor and injected.
- Handlers access them via self._services["tag_service"], etc.
Pass all settings through ProcessingConfig:
- No hardcoded magic numbers.
- If you need a new setting, add a field to the appropriate *Config class.
Respect the Facade pattern:
- The only user-facing API is DocumentProcessor.
- Do not instruct users to import internal modules directly (OCR engines excepted).
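These architecture rules combine three patterns. The base class fixes the stage order (a template method), shared services are injected through the constructor, and tunable values come from a config object instead of literals. The self-contained toy model below illustrates all three. ToyBaseHandler, ToyConfig, ToyTagService and the method names are hypothetical stand-ins. They are not the actual signatures of Contextifier's BaseHandler, ProcessingConfig or TagService.

```python
"""Toy model of the handler architecture rules (not Contextifier's real classes)."""
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyConfig:
    """Stand-in for a *Config class: tunable values live here, not in handlers."""

    max_title_length: int = 80


class ToyTagService:
    """Stand-in for a shared service such as TagService."""

    def wrap(self, tag: str, body: str) -> str:
        """Surround body with simple [tag]...[/tag] markers."""
        return f"[{tag}]{body}[/{tag}]"


class ToyBaseHandler(ABC):
    """Template method: process() fixes the order, subclasses fill in stages."""

    def __init__(self, config: ToyConfig, services: dict[str, Any]) -> None:
        # Services are created by the facade and injected, never built here.
        self._config = config
        self._services = services

    def process(self, data: bytes) -> dict[str, Any]:
        """Run Converter -> Preprocessor -> Metadata -> Content -> Postprocessor."""
        obj = self.convert(data)
        obj = self.preprocess(obj)
        metadata = self.extract_metadata(obj)
        content = self.extract_content(obj)
        return self.postprocess(metadata, content)

    @abstractmethod
    def convert(self, data: bytes) -> Any: ...

    @abstractmethod
    def preprocess(self, obj: Any) -> Any: ...

    @abstractmethod
    def extract_metadata(self, obj: Any) -> dict[str, str]: ...

    @abstractmethod
    def extract_content(self, obj: Any) -> str: ...

    @abstractmethod
    def postprocess(self, metadata: dict[str, str], content: str) -> dict[str, Any]: ...


class ToyTextHandler(ToyBaseHandler):
    """Trivial format: the first non-empty line is the title, the rest is body."""

    def convert(self, data: bytes) -> list[str]:
        return data.decode("utf-8").splitlines()

    def preprocess(self, obj: list[str]) -> list[str]:
        # Drop blank lines and trim surrounding whitespace.
        return [line.strip() for line in obj if line.strip()]

    def extract_metadata(self, obj: list[str]) -> dict[str, str]:
        title = obj[0] if obj else ""
        # The limit comes from config, not from a hardcoded magic number.
        return {"title": title[: self._config.max_title_length]}

    def extract_content(self, obj: list[str]) -> str:
        return "\n".join(obj[1:])

    def postprocess(self, metadata: dict[str, str], content: str) -> dict[str, Any]:
        tags = self._services["tag_service"]  # injected, not instantiated here
        return {"metadata": metadata, "text": tags.wrap("body", content)}
```

Only the base class defines process(), so a subclass cannot reorder or skip stages. In tests, you can pass a fake tag service in the services dict without changing the handler.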

Adding a New Handler

1. Create Directory

contextifier_new/handlers/myformat/
├── __init__.py
├── converter.py
├── preprocessor.py
├── metadata_extractor.py
├── content_extractor.py
└── postprocessor.py
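This layout can be created by hand. A throwaway helper like the one below keeps the file names consistent. scaffold_handler is hypothetical and not part of the repository. It only creates empty files that match the tree above and leaves any existing files untouched.

```python
"""Scaffold a new handler package (hypothetical helper, not part of the repo)."""
from __future__ import annotations

from pathlib import Path

STAGE_MODULES = (
    "__init__.py",
    "converter.py",
    "preprocessor.py",
    "metadata_extractor.py",
    "content_extractor.py",
    "postprocessor.py",
)


def scaffold_handler(root: Path, name: str) -> list[Path]:
    """Create contextifier_new/handlers/<name>/ with one empty module per stage.

    Args:
        root: Repository root directory.
        name: Handler package name, e.g. "myformat".

    Returns:
        Paths of the files that were created; files that already exist are skipped.
    """
    package = root / "contextifier_new" / "handlers" / name
    package.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []
    for module in STAGE_MODULES:
        path = package / module
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

From the repository root, scaffold_handler(Path.cwd(), "myformat") would produce the myformat/ directory shown above.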

2. Implement Each Pipeline Stage

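# converter.py
from __future__ import annotations

from contextifier_new.pipeline.converter import BaseConverter


class MyFormatConverter(BaseConverter):
    """Converts raw MyFormat input into a format-specific object."""

    def convert(self, file_context, **kwargs):
        # Binary → format-specific object: parse the raw input carried by
        # file_context into the structure the later stages expect.
        parsed_object = ...  # placeholder: replace with your format's parsing logic
        return parsed_object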

3. Register the Handler

Add to contextifier_new/handlers/registry.py in register_defaults():

from contextifier_new.handlers.myformat import MyFormatHandler
self.register(MyFormatHandler, extensions=["myf", "myformat"])
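Registration maps file extensions to a handler class. The toy registry below illustrates that lookup. ToyHandlerRegistry and handler_for are hypothetical names. The duplicate check and the case-insensitive lookup are design choices of this sketch, not confirmed behavior of registry.py.

```python
"""Minimal extension-to-handler registry (hypothetical sketch)."""
from __future__ import annotations

from collections.abc import Iterable


class ToyHandlerRegistry:
    """Maps lowercase file extensions (without the dot) to handler classes."""

    def __init__(self) -> None:
        self._by_extension: dict[str, type] = {}

    def register(self, handler_cls: type, extensions: Iterable[str]) -> None:
        """Associate each extension with handler_cls."""
        for ext in extensions:
            key = ext.lower().lstrip(".")
            if key in self._by_extension:
                # Fail loudly instead of silently replacing another handler.
                raise ValueError(f"extension {key!r} already registered")
            self._by_extension[key] = handler_cls

    def handler_for(self, filename: str) -> type | None:
        """Return the handler class for filename's extension, or None."""
        _, dot, ext = filename.rpartition(".")
        if not dot:
            return None  # no extension at all
        return self._by_extension.get(ext.lower())
```

With this sketch, report.MYF and report.myf resolve to the same handler, and a file without an extension resolves to None.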

Commit Convention

feat: add new feature
fix: bug fix
docs: documentation changes
refactor: refactoring (no behavior change)
test: add/modify tests
chore: build/config changes

Examples:

feat(handler): add EPUB handler with full pipeline
fix(chunking): preserve table structure in protected strategy
docs: update QUICKSTART with batch processing example
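A local commit-msg hook can catch malformed subjects before they reach review. Git passes the path of the message file as the hook's first argument. The sketch below assumes the subject form type(scope): description. Its stricter details (lowercase scope, exactly one space after the colon) are inferred from the examples above and are not stated project rules. is_valid_subject and main are hypothetical names.

```python
"""Check a commit subject against the commit convention (illustrative sketch)."""
from __future__ import annotations

import re
import sys

ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional "(scope)", a colon, one space, then a non-empty description.
_SUBJECT_RE = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})(?:\((?P<scope>[a-z0-9_-]+)\))?: (?P<desc>\S.*)$"
)


def is_valid_subject(subject: str) -> bool:
    """Return True if subject looks like 'type(scope): description'."""
    return _SUBJECT_RE.match(subject) is not None


def main(argv: list[str]) -> int:
    """Hook entry point; argv[1] is the path of the commit message file."""
    with open(argv[1], encoding="utf-8") as fh:
        subject = fh.readline().rstrip("\n")
    if is_valid_subject(subject):
        return 0
    print(f"Invalid commit subject: {subject!r}", file=sys.stderr)
    return 1
```

To use it, call sys.exit(main(sys.argv)) from an executable .git/hooks/commit-msg script.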

Pull Request Guide

Create a feature branch from main
Implement changes and test
Include rationale and test results in PR description
Squash merge after review

Reporting Issues

When reporting a bug, please include:

Python version
OS and version
Input file format and size
Full error message
Reproduction code (if possible)
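The first items on this list can be collected automatically with the standard library. bug_report_info is a hypothetical helper. It reports the Python version, the OS details from the platform module, and, if given a file, that file's extension and size in bytes. Paste its output into the issue together with the full error message and your reproduction code.

```python
"""Collect environment details for a bug report (standard library only)."""
from __future__ import annotations

import platform
from pathlib import Path


def bug_report_info(input_file: str | None = None) -> dict[str, str]:
    """Return Python version, OS details and, optionally, input file facts."""
    info = {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()} ({platform.version()})",
    }
    if input_file is not None:
        path = Path(input_file)
        # Use the extension as a rough stand-in for the input format.
        info["input_format"] = path.suffix.lstrip(".").lower() or "unknown"
        info["input_size_bytes"] = str(path.stat().st_size)
    return info


if __name__ == "__main__":
    for key, value in bug_report_info().items():
        print(f"{key}: {value}")
```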

License

All contributions are released under the project's Apache License 2.0.
