Contributing to JustHTML

Thanks for considering contributing to JustHTML! This document explains how to set up your development environment and the standards we follow.

Development Setup

Clone the repository:

git clone https://github.com/emilstenstrom/justhtml.git
cd justhtml

Create a virtual environment and install dev dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Install pre-commit hooks:
```
pre-commit install
```

Running Tests

The test suite uses the html5lib test cases plus additional tests for selector functionality.

If you want to run the full html5lib test suite locally, clone html5lib-tests next to this repository and create the symlinks described in tests/README.md (tokenizer, tree-construction, and serializer).

# Run all tests
python run_tests.py

# Run one suite (faster iteration)
python run_tests.py --suite tree
python run_tests.py --suite justhtml
python run_tests.py --suite tokenizer

# Run with coverage report
coverage run run_tests.py && coverage report

# Run specific tests from one test file
python run_tests.py --test-specs test2.test:5,10 -v

# Quick iteration - test a snippet
python -c 'from justhtml import JustHTML, to_test_format; print(to_test_format(JustHTML("<html>").root))'

Coverage must stay at 100%. All new code must be fully tested.
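
For scripted runs, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_test_command` is hypothetical, and it uses only the flags shown above (`--suite`, `--test-specs`, `-v`).

```python
import sys


def build_test_command(suite=None, test_specs=None, verbose=False):
    """Return an argv list for run_tests.py (hypothetical helper, not in the repo)."""
    cmd = [sys.executable, "run_tests.py"]
    if suite is not None:
        cmd += ["--suite", suite]
    if test_specs is not None:
        cmd += ["--test-specs", test_specs]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # Print the commands for the three suites named above instead of running them,
    # so the sketch also works outside a checkout.
    for name in ("tree", "justhtml", "tokenizer"):
        print(" ".join(build_test_command(suite=name)))
```

To actually run a suite, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.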

Pre-commit Hooks

Pre-commit runs automatically on every commit and checks:

Trailing whitespace and end-of-file formatting
YAML and TOML validity
Ruff check - linting with auto-fix
Ruff format - code formatting
Tests & Coverage - full test suite with 100% coverage requirement

Run manually:

pre-commit run --all-files

Code Style

We use Ruff for linting and formatting:

Line length: 119 characters
Target: Python 3.10+
Rules: Nearly all Ruff rules enabled (see pyproject.toml for exceptions)

Key style points:

Use plain assert for tests, not self.assertEqual etc.
Comments explain why, not what
No type annotations
Cite spec sections when relevant (e.g., "Per §13.2.5.72")
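
As an illustration of these points, here is a minimal sketch of a helper and its test. The function `is_ascii_whitespace` and the test name are hypothetical and not taken from the JustHTML codebase; the character set follows the definition of ASCII whitespace in the WHATWG Infra Standard.

```python
# Per the WHATWG Infra Standard ("ASCII whitespace"): TAB, LF, FF, CR, SPACE.
# str.isspace() is avoided because it also accepts characters the spec does
# not treat as whitespace (for example U+000B, vertical tab).
_ASCII_WHITESPACE = frozenset("\t\n\f\r ")


def is_ascii_whitespace(ch):
    return ch in _ASCII_WHITESPACE


def test_is_ascii_whitespace():
    assert is_ascii_whitespace(" ")
    assert is_ascii_whitespace("\f")
    # Vertical tab is whitespace for str.isspace() but not for the spec.
    assert not is_ascii_whitespace("\x0b")
    assert not is_ascii_whitespace("a")
```

The comments record the reason for the design (why a set, why not `str.isspace()`), the test uses plain `assert`, and no type annotations appear.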

Benchmarking

After making changes, verify performance impact:

# Quick benchmark
python benchmarks/performance.py --iterations 1 --parser justhtml --no-mem

# Profile hotspots
python benchmarks/profile.py
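
For a quick check of a single function outside the benchmark scripts, the standard-library `timeit` module is often enough. This is a generic sketch, not a replacement for benchmarks/performance.py; `parse_stub` and `best_time` are hypothetical names, and `parse_stub` is a stand-in for whatever code path you changed.

```python
import timeit


def parse_stub(text):
    # Stand-in workload; replace with the function under test.
    return text.count("<")


def best_time(func, arg, repeat=5, number=1000):
    """Return the lowest total time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    sample = "<p>hello</p>" * 100
    print(f"best of 5: {best_time(parse_stub, sample):.6f}s per 1000 calls")
```

Taking the minimum over several repeats reduces noise from other processes; compare the figure before and after your change on the same machine.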

Architecture Notes

Tokenizer (tokenizer.py): HTML5 spec state machine
Tree builder (treebuilder.py): Constructs DOM tree following HTML5 rules
Node tree (node.py): DOM-like structure, use append_child() / insert_before()
Selector (selector.py): CSS selector matching
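
To make the node-tree methods concrete, here is a toy sketch of how `append_child()` and `insert_before()` typically behave in a DOM-like tree. The class below is illustrative only: it is not the code in node.py, and the attribute names and method signatures are assumptions.

```python
class Node:
    """Toy DOM-like node; NOT the real JustHTML node (names are assumptions)."""

    __slots__ = ("children", "name", "parent")

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, node):
        node.parent = self
        self.children.append(node)

    def insert_before(self, node, reference):
        # Insert `node` directly before `reference`, which must already be a child.
        index = self.children.index(reference)
        node.parent = self
        self.children.insert(index, node)


html = Node("html")
body = Node("body")
head = Node("head")
html.append_child(body)
html.insert_before(head, body)
print([child.name for child in html.children])  # ['head', 'body']
```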

Golden rules:

Follow WHATWG HTML5 spec exactly
No exceptions in hot paths
Minimal allocations in tokenizer
No hasattr/getattr/delattr - all structures are deterministic
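
Two of these rules can be illustrated with a small sketch. The `Token` class and `lookup_entity` function are hypothetical and exist only for this example. Every attribute is assigned in `__init__`, so no `hasattr` check is ever needed, and an unknown key is handled with a `dict.get` default rather than a raised exception.

```python
# Hypothetical entity table with just enough entries for the example.
_ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


class Token:
    """Illustrative token; every field always exists, so callers never probe with hasattr."""

    __slots__ = ("attrs", "name", "self_closing")

    def __init__(self, name):
        self.name = name
        self.attrs = {}
        self.self_closing = False


def lookup_entity(name):
    # dict.get returns None for unknown names; no KeyError is raised and caught.
    return _ENTITIES.get(name)
```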

Submitting Changes

Fork the repository
Create a feature branch
Make your changes with tests
Ensure pre-commit passes
Submit a pull request

Questions? Open an issue on GitHub. For security vulnerabilities, please see our Security Policy.
