Tree-sitter grammar for OML (Output Mapping Language) — a domain-specific language for data extraction, transformation, and mapping.
OML defines declarative rules that map source data to target fields. It supports piping, pattern matching, SQL lookups, object/array aggregation, type casting, and privacy masking, making it suitable for log processing, ETL pipelines, and data normalization tasks.
An OML file is divided into sections separated by ---:
<header>
---
<static blocks>
<mapping rules>
---
<privacy configuration>
- Header — declares the mapping name, matching rule, and enable flag
- Static blocks — compile-time constants
- Mapping rules — the main data transformation logic
- Privacy section — optional field-level privacy masking
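The section layout above can be illustrated with a small standalone sketch. It is not how the tree-sitter grammar parses OML; it only shows the `---` separator convention, assuming a separator is a line containing nothing but `---`. The helper name `split_sections` is hypothetical.

```rust
/// Split OML source text into sections on lines that consist solely of `---`.
/// Assumption: surrounding whitespace on a separator line is ignored.
fn split_sections(source: &str) -> Vec<String> {
    let mut sections = vec![String::new()];
    for line in source.lines() {
        if line.trim() == "---" {
            // A separator line starts a new, initially empty section.
            sections.push(String::new());
        } else {
            let current = sections.last_mut().expect("at least one section");
            current.push_str(line);
            current.push('\n');
        }
    }
    sections
}

fn main() {
    let source = "name : example\n---\nhost = read(hostname) ;\n---\nsrc_ip : privacy_ip\n";
    for (index, section) in split_sections(source).iter().enumerate() {
        println!("section {index}:\n{section}");
    }
}
```

With this input the sketch yields three sections: header, mapping rules, and privacy configuration.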
name : csv_example
rule : /csv/data
enable : true
# read from source field
host = read(hostname) ;
# read with fallback default
pos_sn = read() { _ : chars(FALLBACK) } ;
# take from parsed output
* = take() ;
# json path
deep = read(/user/info/name) ;
# option: try multiple keys
value = read(option:[id, uid, user_id]) ;
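As a rough mental model of `read(option:[...])`, the sketch below looks up several candidate keys in a flat record and returns the first one that is present. This is an assumption about the semantics (keys tried in listed order, first hit wins), not a description of the OML engine; `read_option` and the `HashMap` record are hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// Return the value of the first key in `keys` that exists in `record`.
/// Assumption: keys are tried in the order listed, and the first hit wins.
fn read_option<'a>(record: &'a HashMap<String, String>, keys: &[&str]) -> Option<&'a str> {
    keys.iter()
        .find_map(|key| record.get(*key).map(String::as_str))
}

fn main() {
    let mut record = HashMap::new();
    record.insert("uid".to_string(), "42".to_string());
    record.insert("user_id".to_string(), "alice".to_string());

    // "id" is absent, so the lookup falls through to "uid".
    let value = read_option(&record, &["id", "uid", "user_id"]);
    println!("{value:?}");
}
```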
Chain transformations with the | operator:
url_host = read(http_url) | url(host) ;
ts = read(time) | Time::to_ts_zone(0, ms) ;
result = pipe @data | to_json | base64_encode ;
is_http = read(url) | starts_with('http://') ;
first_port = pipe read(ports) | nth(0) ;
Available pipe functions: nth(), get(), url(), path(), base64_decode(), base64_encode, starts_with(), map_to(), Time::to_ts, Time::to_ts_ms, Time::to_ts_us, Time::to_ts_zone(), to_json, to_str, skip_empty, ip4_to_int, html_escape, html_unescape, str_escape, json_escape, json_unescape, extract_main_word, extract_subject_object.
message = fmt("{}-{}", @user, read(city)) ;
id = fmt("{}:{}", read(host), read(port)) ;
values : obj = object {
cpu_free, memory_free : digit = read() ;
} ;
ports : array = collect read(keys:[sport, dport]) ;
Single source:
quarter : chars = match read(month) {
in (digit(1), digit(3)) => chars(Q1) ;
in (digit(4), digit(6)) => chars(Q2) ;
_ => chars(QX) ;
} ;
Multi source:
zone : chars = match (read(city), read(region), read(country)) {
(chars(bj), chars(north), chars(cn)) => chars(zone1) ;
(chars(sh), chars(east), chars(cn)) => chars(zone2) ;
_ => chars(unknown) ;
} ;
OR conditions:
tier : chars = match read(city) {
chars(bj) | chars(sh) | chars(gz) => chars(tier1) ;
chars(cd) | chars(wh) => chars(tier2) ;
_ => chars(other) ;
} ;
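For readers more familiar with Rust, the OR-condition rule above behaves much like a Rust `match` with alternative patterns. The sketch assumes that `chars(x)` patterns compare by exact string equality and that arms are tried top to bottom; the function name `tier` is only illustrative.

```rust
/// Rust analogue of the `tier` rule: alternatives joined with `|`,
/// and `_` as the catch-all arm.
fn tier(city: &str) -> &'static str {
    match city {
        "bj" | "sh" | "gz" => "tier1",
        "cd" | "wh" => "tier2",
        _ => "other",
    }
}

fn main() {
    for city in ["bj", "wh", "xa"] {
        println!("{city} -> {}", tier(city));
    }
}
```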
Match functions: starts_with(), ends_with(), contains(), regex_match(), iequals(), is_empty(), gt(), lt(), eq(), in_range(), in().
name, pinying = select name, pinying from example where pinying = read(py) ;
Supported types: auto, chars, digit, float, bool, ip, ip_net, domain, url, time, time_iso, time_3339, time_2822, time_timestamp, time_clf, time/<format>, array, array/<type>, obj, json, kv, base64.
---
src_ip : privacy_ip
pos_sn : privacy_keymsg
Privacy types: privacy_ip, privacy_specify_ip, privacy_id_card, privacy_mobile, privacy_mail, privacy_domain, privacy_specify_domain, privacy_specify_name, privacy_specify_address, privacy_specify_company, privacy_keymsg.
Add to your Cargo.toml:
[dependencies]
tree-sitter = ">=0.22.6"
tree-sitter-oml = "0.2.0"

let language = tree_sitter_oml::language();
let mut parser = tree_sitter::Parser::new();
parser.set_language(&language).unwrap();
let source = "name : example\n---\nhost = read(hostname) ;";
let tree = parser.parse(source, None).unwrap();
println!("{}", tree.root_node().to_sexp());

const Parser = require("tree-sitter");
const OML = require("tree-sitter-oml");
const parser = new Parser();
parser.setLanguage(OML);
const tree = parser.parse("name : example\n---\nhost = read(hostname) ;");
console.log(tree.rootNode.toString());

- Node.js (for tree-sitter-cli)
- Rust toolchain (for building the Rust binding)
# Install dependencies
npm install
# Generate the parser from grammar.js
npx tree-sitter generate
# Run tests
npx tree-sitter test
# Build the Rust binding
cargo build
# Run Rust tests
cargo test

tree-sitter-oml/
├── grammar.js # Grammar definition
├── queries/
│ └── highlights.scm # Syntax highlighting queries
├── bindings/
│ └── rust/ # Rust language binding
├── src/
│ ├── parser.c # Generated parser
│ ├── grammar.json # Generated grammar schema
│ └── node-types.json # AST node type definitions
├── examples/
│ ├── test.oml # Example: comprehensive features
│ └── test2.oml # Example: format and pipe operations
├── Cargo.toml # Rust package manifest
├── package.json # Node.js package manifest
└── tree-sitter.json # Tree-sitter configuration
The queries/highlights.scm file provides syntax highlighting for the Zed editor. See the companion Zed extension for integration.
Apache License 2.0 — see LICENSE for details.