⚠️ Beta Notice: This project is in beta and the API may change before reaching stability. I will, of course, try to minimize breaking changes and follow semver. But until there's a 1.0.0, I will break things at my own discretion. If you want to be notified beforehand, let me know. Otherwise, I will assume no one is using this in anything mission critical.
A fast, flexible, error-tolerant USFM 3.x parser written in Rust, with CLI, Python, and WebAssembly bindings.
The public API is staged:

```text
tokenize -> parse_cst -> parse_ast / lower_cst -> serialize
```
This lets editor-style integrations stay on the cheap token/CST path, while AST-dependent work (USJ, USX, and USFM output, vref, and diagnostics) is only paid for when requested.
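
Staging like this means each stage is built only when something asks for it, and is then reused by the stages after it. The sketch below illustrates that pattern with `std::cell::OnceCell`; it is not the `usfm3` implementation. `LazyDocument`, `Token`, and `chapters()` are hypothetical names, and the whitespace tokenizer is a toy that ignores real USFM rules.

```rust
use std::cell::{Cell, OnceCell};

/// Toy token type (hypothetical): a backslash marker or a run of text.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Marker(String),
    Text(String),
}

/// Hypothetical lazily staged document: each stage is built on first use and cached.
pub struct LazyDocument {
    source: String,
    tokens: OnceCell<Vec<Token>>,
    chapters: OnceCell<Vec<String>>,
    tokenize_runs: Cell<usize>, // counts how often the tokenizer actually ran
}

impl LazyDocument {
    pub fn new(source: &str) -> Self {
        Self {
            source: source.to_string(),
            tokens: OnceCell::new(),
            chapters: OnceCell::new(),
            tokenize_runs: Cell::new(0),
        }
    }

    /// Cheap stage: a toy whitespace tokenizer, run at most once.
    pub fn tokens(&self) -> &[Token] {
        self.tokens.get_or_init(|| {
            self.tokenize_runs.set(self.tokenize_runs.get() + 1);
            self.source
                .split_whitespace()
                .map(|word| match word.strip_prefix('\\') {
                    Some(name) => Token::Marker(name.to_string()),
                    None => Token::Text(word.to_string()),
                })
                .collect()
        })
    }

    /// Later stage: built only when requested, on top of the cached tokens.
    pub fn chapters(&self) -> &[String] {
        self.chapters.get_or_init(|| {
            self.tokens()
                .windows(2)
                .filter_map(|pair| match pair {
                    [Token::Marker(m), Token::Text(n)] if m.as_str() == "c" => Some(n.clone()),
                    _ => None,
                })
                .collect()
        })
    }

    pub fn tokenize_runs(&self) -> usize {
        self.tokenize_runs.get()
    }
}
```

Asking for `chapters()` first runs the tokenizer once as a side effect; a later `tokens()` call reuses that result instead of tokenizing again.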
| Package | Description |
|---|---|
| `usfm3` | Core Rust crate |
| `usfm3-cli` | CLI |
| `usfm3-python` | PyPI package |
| `usfm3-wasm` | npm package / WASM bindings |
From the command line:

```sh
usfm3 tokens path/to/file.usfm
usfm3 cst path/to/file.usfm
usfm3 ast path/to/file.usfm
usfm3 diagnostics path/to/file.usfm
usfm3 usj path/to/file.usfm
usfm3 usj path/to/file.usfm --inline-spans
usfm3 usx path/to/file.usfm
usfm3 usfm path/to/file.usfm
usfm3 vref path/to/file.usfm
```

If no input path is given, the CLI reads from stdin.
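
Falling back to stdin when no path is given is a common CLI pattern. A minimal sketch of it, assuming the whole input is read into a `String` before parsing; `read_input` is a hypothetical helper, not part of `usfm3-cli`.

```rust
use std::fs;
use std::io::{self, Read};

/// Hypothetical helper: read USFM from `path` if one was given, else from stdin.
pub fn read_input(path: Option<&str>) -> io::Result<String> {
    match path {
        // A path was given: read the file.
        Some(path) => fs::read_to_string(path),
        // No path: read everything from stdin until EOF.
        None => {
            let mut buffer = String::new();
            io::stdin().read_to_string(&mut buffer)?;
            Ok(buffer)
        }
    }
}
```

With this shape, `usfm3 usj < path/to/file.usfm` should behave like passing the path directly. A `main` could call it as `read_input(std::env::args().nth(2).as_deref())`, taking the argument after the subcommand (a hypothetical argument layout).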
From Rust:

```rust
let parsed = usfm3::parse(
    r#"\id GEN
\c 1
\p
\v 1 In the beginning God created the heavens and the earth.
"#,
    usfm3::ParseOptions::default(),
);

// `parse()` is lazy: each stage below is computed on request.
let tokens = parsed.tokens();
let cst = parsed.cst();
let ast = parsed.ast();
let source_map = parsed.source_map();
let diagnostics = parsed.diagnostics();

// Serializers.
let usj = parsed
    .to_usj(usfm3::usj::UsjOptions { include_spans: false })
    .unwrap();
let usx = parsed.to_usx().unwrap();
let usfm = parsed.to_usfm();
let vref = parsed.to_vref();
```

For eager AST work:
```rust
// `text` holds the USFM source.
let ast_document = usfm3::parse_ast(
    text,
    usfm3::ParseOptions {
        diagnostics: true,
    },
);
let ast = ast_document.ast;
let source_map = ast_document.source_map;
let diagnostics = ast_document.diagnostics;
```

For CST-first lowering:
```rust
let cst = usfm3::parse_cst(text);
let ast_document = usfm3::lower_cst(
    &cst,
    usfm3::ParseOptions {
        diagnostics: true,
    },
);
```

From Python:

```python
import usfm3

# `text` holds the USFM source as a str.
parsed = usfm3.parse(text)
tokens = usfm3.tokenize(text)
cst = usfm3.parse_cst(text)
ast_document = usfm3.parse_ast(text, diagnostics=True)

ast = parsed.ast()
source_map = parsed.source_map()
diagnostics = parsed.diagnostics
usj = parsed.to_usj()
usj_with_spans = parsed.to_usj(spans=True)
usx = parsed.to_usx()
usfm = parsed.to_usfm()
vref = parsed.to_vref()
```

From JavaScript, via the WASM bindings:

```javascript
import { parse, parseAst, parseCst, tokenize } from "usfm3";

const parsed = parse(usfmText);
const tokens = tokenize(usfmText);
const cst = parseCst(usfmText);
const astDocument = parseAst(usfmText, { diagnostics: true });

const ast = parsed.ast();
const sourceMap = parsed.sourceMap();
const diagnostics = parsed.diagnostics();
const usj = parsed.toUsj();
const usjWithSpans = parsed.toUsj({ spans: true });
const usx = parsed.toUsx();
const usfm = parsed.toUsfm();
const vref = parsed.toVref();

// Release the WASM-side memory backing `parsed` when you are done with it.
parsed.free();
```

- AST nodes do not carry spans.
- Source locations live in a parallel `source_map` tree.
- Diagnostics are a single flat list, sorted in document order.
- Diagnostics are only computed when `diagnostics: true` is requested.
- Inline USJ spans are derived from the source map.
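
The first points above separate structure from location: the AST stays span-free, and a second tree of the same shape holds byte ranges. A serializer that wants inline spans, such as USJ with spans enabled, can then walk both trees together. The sketch below assumes exactly that shape; `Node`, `SpanNode`, and `render_with_spans` are hypothetical, and the crate's real source map may be organized differently.

```rust
/// Hypothetical AST: no span fields, by design.
#[derive(Debug)]
pub enum Node {
    Para { marker: String, children: Vec<Node> },
    Text(String),
}

/// Hypothetical source-map node: mirrors the AST shape and holds byte ranges.
#[derive(Debug)]
pub struct SpanNode {
    pub start: usize,
    pub end: usize,
    pub children: Vec<SpanNode>,
}

/// Render a node with its span inlined, walking the AST and source map in lockstep.
pub fn render_with_spans(node: &Node, span: &SpanNode) -> String {
    match node {
        Node::Text(text) => format!("{:?}@{}..{}", text, span.start, span.end),
        Node::Para { marker, children } => {
            // Both trees must have the same shape for the zip to pair nodes correctly.
            debug_assert_eq!(children.len(), span.children.len());
            let inner: Vec<String> = children
                .iter()
                .zip(&span.children)
                .map(|(child, child_span)| render_with_spans(child, child_span))
                .collect();
            format!("{}@{}..{}[{}]", marker, span.start, span.end, inner.join(", "))
        }
    }
}
```

The `debug_assert_eq!` guards the one invariant this design depends on: `zip` silently stops at the shorter list, so mismatched trees would otherwise pair nodes with the wrong spans.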

Choosing an entry point:

- `tokenize()` is the lightest editor-facing path.
- `parse_cst()` preserves the full source-backed structure and is the preferred path for CST/LSP work.
- `parse()` is lazy and has diagnostics off by default.
- `parse_ast(..., diagnostics: true)` or `lower_cst(..., diagnostics: true)` opts into the full semantic and diagnostic pass.
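
Opt-in diagnostics, together with the flat, document-ordered list described earlier, can be sketched as follows. The sketch assumes diagnostics come from several independent checks that are skipped entirely unless requested, then merged and sorted once by byte offset. `scan`, `Options`, `Diagnostic`, `KNOWN`, and both checks are hypothetical, and the marker scan is a toy, not real USFM tokenization.

```rust
/// Hypothetical diagnostic: a byte offset into the source plus a message.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub offset: usize,
    pub message: String,
}

/// Hypothetical options mirroring the `diagnostics: true` switch.
#[derive(Debug, Default, Clone, Copy)]
pub struct Options {
    pub diagnostics: bool,
}

/// Toy list of accepted marker names (hypothetical, far from the full USFM set).
const KNOWN: &[&str] = &["id", "c", "p", "v"];

/// Collect marker names; run the diagnostic checks only when `options.diagnostics` is set.
pub fn scan(source: &str, options: Options) -> (Vec<String>, Vec<Diagnostic>) {
    // (byte offset, marker name) for every backslash marker.
    let markers: Vec<(usize, String)> = source
        .match_indices('\\')
        .map(|(offset, _)| {
            let name: String = source[offset + 1..]
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric())
                .collect();
            (offset, name)
        })
        .collect();

    let mut diagnostics = Vec::new();
    if options.diagnostics {
        // Check 1: unknown markers.
        for (offset, name) in &markers {
            if !KNOWN.contains(&name.as_str()) {
                let message = format!("unknown marker \\{}", name);
                diagnostics.push(Diagnostic { offset: *offset, message });
            }
        }
        // Check 2: a verse marker before the first chapter marker.
        let first_chapter = markers.iter().position(|(_, name)| name.as_str() == "c");
        for (index, (offset, name)) in markers.iter().enumerate() {
            if name.as_str() == "v" && first_chapter.map_or(true, |c| index < c) {
                let message = "verse before first chapter".to_string();
                diagnostics.push(Diagnostic { offset: *offset, message });
            }
        }
        // Merge both checks into one flat list in document order.
        diagnostics.sort_by_key(|d| d.offset);
    }

    let names = markers.into_iter().map(|(_, name)| name).collect();
    (names, diagnostics)
}
```

Sorting once at the end lets each check push findings in whatever order is natural for it, while callers still see a single list in document order.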
License: MIT