HTML Tokenizer

An HTML5 tokenizer written in Rust, fully compliant with the WHATWG HTML specification.

Features

  • 100% WHATWG spec-compliant HTML tokenization
  • Passes all html5lib tokenizer test cases
  • Tokenizes HTML input into tag, comment, doctype and character tokens
  • Fast: runs the full html5lib tokenizer test suite in 0.15 seconds
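
As a rough illustration of the token kinds listed above, here is a minimal sketch of how they could be modelled in Rust. The names `Token`, `Item` and `coalesce_characters` are hypothetical and are not taken from this crate; the fields loosely follow the token descriptions in the WHATWG tokenization section. The helper merges runs of adjacent character tokens into one string, which is how html5lib tokenizer tests write their expected output.

```rust
/// Hypothetical token type; this crate's real types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Doctype {
        name: Option<String>,
        public_id: Option<String>,
        system_id: Option<String>,
        force_quirks: bool,
    },
    StartTag {
        name: String,
        attributes: Vec<(String, String)>,
        self_closing: bool,
    },
    EndTag { name: String },
    Comment(String),
    Character(char),
    EndOfFile,
}

/// Output of the coalescing step: either merged text or any other token.
#[derive(Debug, PartialEq)]
pub enum Item {
    Text(String),
    Other(Token),
}

/// Merge each run of adjacent character tokens into a single `Item::Text`.
/// Every non-character token is passed through unchanged.
pub fn coalesce_characters(tokens: Vec<Token>) -> Vec<Item> {
    let mut out: Vec<Item> = Vec::new();
    for token in tokens {
        match token {
            Token::Character(c) => match out.last_mut() {
                // Extend the text run that is already in progress.
                Some(Item::Text(text)) => text.push(c),
                // Otherwise start a new text run.
                _ => out.push(Item::Text(c.to_string())),
            },
            other => out.push(Item::Other(other)),
        }
    }
    out
}
```

With this sketch, a start tag followed by the character tokens `h` and `i` and an end tag would come out as three items: the start tag, the text "hi", and the end tag.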

Usage

  1. Build the project
cargo build
  2. Run all test cases
cargo test

Future Improvements

The tokenizer is complete. The next step is to add tree construction (parsing). Work on this has started in the tree-construction branch.

Requirements

  • Cargo
  • Rust