Implements a hierarchical transformer for language modeling that operates directly on raw text without a tokenizer.
-
Updated
Jun 3, 2025 - Python
Implements a hierarchical transformer for language modeling that operates directly on raw text without a tokenizer.
Engram without a tokenizer
A tokenizer-free NLP library with T-FREE, CANINE, and byte-level approaches
Non-learned byte-level signal encoder for PyTorch - one modality-agnostic 27-D exact base (anchor rule, Delta = Gray code), losslessly invertible. pip install hsl-embedding
Byte Latent Transformer (BLT) LLM built from scratch in PyTorch — tokenizer-free, byte-level, trained end-to-end on TinyStories to 0.71 BPB. Clone & run.
Add a description, image, and links to the tokenizer-free topic page so that developers can more easily learn about it.
To associate your repository with the tokenizer-free topic, visit your repo's landing page and select "manage topics."