Thanks for considering contributing to JustHTML! This document explains how to set up your development environment and the standards we follow.
-
Clone the repository:
git clone https://github.com/emilstenstrom/justhtml.git cd justhtml -
Create a virtual environment and install dev dependencies:
python -m venv .venv source .venv/bin/activate pip install -e ".[dev]"
-
Install pre-commit hooks:
pre-commit install
The test suite uses the html5lib test cases plus additional tests for selector functionality.
If you want to run the full html5lib test suite locally, clone html5lib-tests next to this repository and create the symlinks described in tests/README.md (tokenizer, tree-construction, and serializer).
# Run all tests
python run_tests.py
# Run one suite (faster iteration)
python run_tests.py --suite tree
python run_tests.py --suite justhtml
python run_tests.py --suite tokenizer
# Run with coverage report
coverage run run_tests.py && coverage report
# Run specific test file
python run_tests.py --test-specs test2.test:5,10 -v
# Quick iteration - test a snippet
python -c 'from justhtml import JustHTML, to_test_format; print(to_test_format(JustHTML("<html>").root))'Coverage is required to be 100%. All new code must be fully tested.
Pre-commit runs automatically on every commit and checks:
- Trailing whitespace and end-of-file formatting
- YAML and TOML validity
- Ruff check - linting with auto-fix
- Ruff format - code formatting
- Tests & Coverage - full test suite with 100% coverage requirement
Run manually:
pre-commit run --all-filesWe use Ruff for linting and formatting:
- Line length: 119 characters
- Target: Python 3.10+
- Rules: Nearly all Ruff rules enabled (see
pyproject.tomlfor exceptions)
Key style points:
- Use plain
assertfor tests, notself.assertEqualetc. - Comments explain why, not what
- No typing annotations
- Cite spec sections when relevant (e.g., "Per §13.2.5.72")
After making changes, verify performance impact:
# Quick benchmark
python benchmarks/performance.py --iterations 1 --parser justhtml --no-mem
# Profile hotspots
python benchmarks/profile.py- Tokenizer (
tokenizer.py): HTML5 spec state machine - Tree builder (
treebuilder.py): Constructs DOM tree following HTML5 rules - Node tree (
node.py): DOM-like structure, useappend_child()/insert_before() - Selector (
selector.py): CSS selector matching
Golden rules:
- Follow WHATWG HTML5 spec exactly
- No exceptions in hot paths
- Minimal allocations in tokenizer
- No
hasattr/getattr/delattr- all structures are deterministic
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure pre-commit passes
- Submit a pull request
Questions? Open an issue on GitHub. For security vulnerabilities, please see our Security Policy.