Contributing to Mellea

Thank you for your interest in contributing to Mellea! This guide will help you get started with developing and contributing to the project.

Contribution Pathways

There are several ways to contribute to Mellea:

1. Contributing to This Repository

Contribute to the Mellea core, standard library, or fix bugs. This includes:

Core features and bug fixes
Standard library components (Requirements, Components, Sampling Strategies)
Backend improvements and integrations
Documentation and examples
Tests and CI/CD improvements

Process: See the Pull Request Process section below for detailed steps.

2. Applications & Libraries

Build tools and applications using Mellea. These can be hosted in your own repository. For observability, use a mellea- prefix.

Examples:

github.com/my-company/mellea-legal-utils
github.com/my-username/mellea-swe-agent

3. Community Components

Contribute experimental or specialized components to mellea-contribs.

Note: For general-purpose Components, Requirements, or Sampling Strategies, please open an issue first to discuss whether they should go in the standard library (this repository) or mellea-contribs.

Code of Conduct

This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to melleaadmin@ibm.com.

Getting Started

Prerequisites

Python 3.10 or higher (3.13+ requires Rust compiler for outlines)
uv (recommended) or conda/mamba
Ollama with required models (for local testing)

Installation with `uv` (Recommended)

Fork and clone the repository:

git clone ssh://git@github.com/<your-username>/mellea.git
cd mellea/

Setup virtual environment:

uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

# Install all dependencies (recommended for development)
uv sync --all-extras --all-groups

# Or install just the backend dependencies
uv sync --extra backends --all-groups

Install pre-commit hooks (Required):
```
pre-commit install
```
Note: Some hooks require tools in dev dependency groups to be on your PATH. Activate the virtual environment before committing to ensure they are available:
```
source .venv/bin/activate
```

Installation with `conda`/`mamba`

Fork and clone the repository:

git clone ssh://git@github.com/<your-username>/mellea.git
cd mellea/

Run the installation script:
```
conda/install.sh
```

This script handles environment setup, dependency installation, and pre-commit hook installation.

Verify Installation

# Start Ollama (required for most tests)
ollama serve

# Run fast tests (skip qualitative tests, ~2 min)
uv run pytest -m "not qualitative"

Directory Structure

Path	Contents
`mellea/core`	Core abstractions: Backend, Base, Formatter, Requirement, Sampling
`mellea/stdlib`	Standard library: Session, Context, Components, Requirements, Sampling, Intrinsics, Tools
`mellea/backends`	Backend providers: HF, OpenAI, Ollama, Watsonx, LiteLLM
`mellea/formatters`	Output formatters and parsers
`mellea/helpers`	Utilities, logging, model ID tables
`mellea/templates`	Jinja2 templates for prompts
`cli/`	CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`)
`test/`	All tests (run from repo root)
`docs/`	Documentation, examples, tutorials

Coding Standards

Type Annotations

Required on all core functions:

def process_text(text: str, max_length: int = 100) -> str:
    """Process text with maximum length."""
    return text[:max_length]

Docstrings

Docstrings are prompts - the LLM reads them, so be specific.

Use Google-style docstrings:

def extract_entities(text: str, entity_types: list[str]) -> dict[str, list[str]]:
    """Extract named entities from text.

    Args:
        text: The input text to analyze.
        entity_types: List of entity types to extract (e.g., ["PERSON", "ORG"]).

    Returns:
        Dictionary mapping entity types to lists of extracted entities.

    Example:
        >>> extract_entities("Alice works at IBM", ["PERSON", "ORG"])
        {"PERSON": ["Alice"], "ORG": ["IBM"]}
    """
    ...

Class and `init` docstrings

Place Args: on the class docstring only. The __init__ docstring should be a single summary sentence with no Args: section. This keeps hover docs clean in IDEs and ensures the docs pipeline (which skips __init__) publishes the full parameter list.

class MyComponent(Component[str]):
    """A component that does something useful.

    Args:
        name (str): Human-readable label for this component.
        max_tokens (int): Upper bound on generated tokens.
    """

    def __init__(self, name: str, max_tokens: int = 256) -> None:
        """Initialize MyComponent with a name and token budget."""
        self.name = name
        self.max_tokens = max_tokens

Add an Attributes: section on the class docstring only when a stored attribute differs in type or behaviour from the constructor input — for example, when a str argument is wrapped into a CBlock, or when a class-level constant is relevant to callers. Pure-echo entries that repeat Args: verbatim should be omitted.

TypedDict classes are a special case. Their fields are the entire public contract, so when an Attributes: section is present it must exactly match the declared fields. The audit will flag:

typeddict_phantom — Attributes: documents a field that is not declared in the TypedDict
typeddict_undocumented — a declared field is absent from the Attributes: section

class ConstraintResult(TypedDict):
    """Result of a constraint check.

    Attributes:
        passed: Whether the constraint was satisfied.
        reason: Human-readable explanation.
    """
    passed: bool
    reason: str

Validating docstrings

Run the coverage and quality audit to check your changes before committing:

# Build fresh API docs then audit quality (documented symbols only)
uv run python tooling/docs-autogen/generate-ast.py
uv run python tooling/docs-autogen/audit_coverage.py \
    --quality --no-methods --docs-dir docs/docs/api

Key checks the audit enforces:

Check	Meaning
`no_class_args`	Class has typed `__init__` params but no `Args:` on the class docstring
`duplicate_init_args`	`Args:` appears in both the class and `__init__` docstrings (Option C violation)
`no_args`	Standalone function has params but no `Args:` section
`no_returns`	Function has a non-trivial return annotation but no `Returns:` section
`param_mismatch`	`Args:` documents names not present in the actual signature
`typeddict_phantom`	`TypedDict` `Attributes:` documents a field not declared in the class
`typeddict_undocumented`	`TypedDict` has a declared field absent from its `Attributes:` section

IDE hover verification — open any of these existing classes in VS Code and hover over the class name or a constructor call to confirm the hover card shows Args: once with no duplication:

ReactInitiator (mellea/stdlib/components/react.py) — Args: + Attributes: (goal: str → CBlock transform)
BaseSamplingStrategy (mellea/stdlib/sampling/base.py) — Args: only, no Attributes: (pure-echo removed)
TokenToFloat (mellea/formatters/granite/intrinsics/output.py) — Attributes: for YAML_NAME class constant

Code Style

Ruff for linting and formatting
Use ... in @generative function bodies
Prefer primitives over classes for simplicity
Keep functions focused and single-purpose
Avoid over-engineering

Formatting and Linting

# Format code
uv run ruff format .

# Lint code
uv run ruff check .

# Fix auto-fixable issues
uv run ruff check --fix .

# Type check
uv run mypy .

Development Workflow

Commit Messages

Follow Angular commit format:

<type>: <subject>

<body>

<footer>

Types: feat, fix, docs, test, refactor, release

Example:

feat: add support for streaming responses

Implements streaming for all backend types with proper
error handling and timeout management.

Closes #123

Important: Always sign off commits using -s or --signoff:

git commit -s -m "feat: your commit message"

AI Coding Assistants

AI-assisted development is welcome. You are responsible for reviewing and understanding every change before submitting.

AI coding assistants following project guidelines add an Assisted-by: trailer to commit messages by default, identifying which tool was used:

Assisted-by: Claude Code
Assisted-by: IBM Bob

Add one line per tool used, using its common name (GitHub Copilot, Cursor, etc.).

Pre-commit Hooks

Pre-commit hooks run automatically before each commit and check:

Ruff - Linting and formatting
mypy - Type checking
uv-lock - Dependency lock file sync
codespell - Spell checking

Bypass hooks (for intermediate commits):

git commit -n -m "wip: intermediate work"

Run hooks manually:

pre-commit run --all-files

⚠️ Warning: pre-commit --all-files may take several minutes. Don't cancel mid-run as it can corrupt state.

Pull Request Process

Create an issue describing your change (if not already exists)
Fork the repository (if you haven't already)
Create a branch in your fork using appropriate naming
Make your changes following coding standards
Add tests for new functionality
Run the test suite to ensure everything passes
Update documentation as needed
Push to your fork and create a pull request to the main repository
Follow the automated PR workflow instructions

Testing

Quick Reference

# Install all dependencies (required for tests)
uv sync --all-extras --all-groups

# Start Ollama (required for most tests)
ollama serve

# Default: qualitative tests, skip slow tests
uv run pytest

# Fast tests only (no qualitative, ~2 min)
uv run pytest -m "not qualitative"

# Unit tests only (self-contained, no services)
uv run pytest -m unit

# Run only slow tests (>1 min)
uv run pytest -m slow

# Run specific backend tests
uv run pytest -m "ollama"
uv run pytest -m "openai"

# CI/CD mode (skips qualitative tests)
CICD=1 uv run pytest

# Lint and format
uv run ruff format .
uv run ruff check .

Required Models

Ollama

HuggingFace and cloud backends download or host models automatically. Ollama models must be pulled locally before running the tests that need them.

CI (unit + integration tests):

granite4:micro — default model for start_session() and most examples
granite4:micro-h — hybrid variant used by conftest fixtures

Examples (docs/examples/):

deepseek-r1:8b — safety / guardian examples
granite3-guardian:2b — mini-researcher guardian backend
granite3.2-vision — vision (Ollama chat) example
granite3.3:8b — m_decompose example
granite4:latest — melp examples
llama3.2 — repair-with-guardian example
llama3.2:3b — tutorial / mify examples (via META_LLAMA_3_2_3B)
qwen2.5vl:7b — vision (OpenAI-via-Ollama) example

Additional test models (test/):

granite4:small-h — hybrid-small tests
llama3.2:1b — lightweight inference tests
llama3:8b — legacy Llama 3 tests
llava — multimodal tests
mistral:7b — Mistral backend tests
smollm2:1.7b — SmolLM tests

Pull everything:

for m in granite4:micro granite4:micro-h deepseek-r1:8b \
  granite3-guardian:2b granite3.2-vision granite3.3:8b granite4:latest \
  llama3.2 llama3.2:3b \
  qwen2.5vl:7b granite4:small-h llama3.2:1b llama3:8b llava mistral:7b \
  smollm2:1.7b; do ollama pull "$m"; done

Test Markers

Tests use a four-tier granularity system (unit, integration, e2e, qualitative) plus backend and resource markers. See test/MARKERS_GUIDE.md for the full marker reference, including tier definitions, backend markers, resource gates, and auto-skip logic.

CI/CD Tests

CI runs the following checks on every pull request:

Pre-commit hooks (pre-commit run --all-files) - Ruff, mypy, uv-lock, codespell
Test suite (CICD=1 uv run pytest) - Skips qualitative tests for speed

To replicate CI locally:

# Run pre-commit checks (same as CI)
pre-commit run --all-files

# Run tests with CICD flag (same as CI, skips qualitative tests)
CICD=1 uv run pytest

Timing Expectations

Fast tests (-m "not qualitative"): ~2 minutes
Default tests (qualitative, no slow): Several minutes
Slow tests (-m slow): >1 minute each
Pre-commit hooks: 1-5 minutes

⚠️ Don't cancel mid-run - canceling pytest or pre-commit can corrupt state.

Common Issues & Troubleshooting

Problem	Fix
`ComponentParseError`	LLM output didn't match expected type. Add examples to docstring.
`uv.lock` out of sync	Run `uv sync` to update lock file.
`Ollama refused connection`	Run `ollama serve` to start Ollama server.
`ConnectionRefusedError` (port 11434)	Ollama not running. Start with `ollama serve`.
`TypeError: missing positional argument`	First argument to `@generative` function must be session `m`.
Output is wrong/None	Model too small or needs better prompt. Try larger model or add `reasoning` field.
`error: can't find Rust compiler`	Python 3.13+ requires Rust for outlines. Install Rust or use Python 3.12.
Tests fail on Intel Mac	Use conda: `conda install 'torchvision>=0.22.0'` then `uv pip install mellea`.
Pre-commit hooks fail	Run `pre-commit run --all-files` to see specific issues. Fix or use `git commit -n` to bypass. If a tool reports `command not found`, activate the virtual environment before committing: `source .venv/bin/activate`.

Debugging Tips

# Enable debug logging
from mellea.core import MelleaLogger
MelleaLogger.get_logger().setLevel("DEBUG")

# See exact prompt sent to LLM
print(m.last_prompt())

Getting Help

Check this guide and test/MARKERS_GUIDE.md
Search existing issues
Check out Github Discussions
Open a new issue with the appropriate label

Additional Resources

Documentation

Docs writing guide - Conventions, PR checklist, and review process for documentation contributions
API Documentation - Published documentation site
Test Markers Guide - Detailed pytest marker documentation
AGENTS.md - Guidelines for AI assistants working on Mellea internals
AGENTS_TEMPLATE.md - Template for projects using Mellea

Community

GitHub Issues - Report bugs or request features
GitHub Discussions - Ask questions and share ideas

Related Repositories

mellea-contribs - Community contributions

Feedback Loop

Found a bug, workaround, or pattern while contributing?

Issue/workaround? → Add to Common Issues section
Usage pattern? → Add to docs/AGENTS_TEMPLATE.md
New pitfall? → Add warning to relevant section

Help us improve this guide by opening a PR with your additions!

Thank you for contributing to Mellea! 🎉

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History