Version: 1.3.0
Date: 2025-12-01
Status: Stable Release
Authors: ZON Format Contributors
License: MIT
Zero Overhead Notation (ZON) is a compact, line-oriented text format that encodes the JSON data model with minimal redundancy optimized for large language model token efficiency. ZON achieves up to 23.8% token reduction compared to JSON through single-character primitives (T, F), null as null, explicit table markers (@), colon-less nested structures, and intelligent quoting rules. Arrays of uniform objects use tabular encoding with column headers declared once; metadata uses flat key-value pairs. This specification defines ZON's concrete syntax, canonical value formatting, encoding/decoding behavior, conformance requirements, and strict validation rules. ZON provides deterministic, lossless representation achieving 100% LLM retrieval accuracy in benchmarks.
This document is a Stable Release v1.1.0 and defines normative behavior for ZON encoders, decoders, and validators. Implementation feedback should be reported at https://github.com/ZON-Format/zon-TS.
Backward compatibility is maintained across v1.0.x releases. Major versions (v2.x) may introduce breaking changes.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
https://www.rfc-editor.org/rfc/rfc2119
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
https://www.rfc-editor.org/rfc/rfc8174
[RFC8259] Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, December 2017.
https://www.rfc-editor.org/rfc/rfc8259
[RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma-Separated Values (CSV) Files", RFC 4180, October 2005.
https://www.rfc-editor.org/rfc/rfc4180
[ISO8601] ISO 8601:2019, "Date and time — Representations for information interchange".
[UNICODE] The Unicode Consortium, "The Unicode Standard", Version 15.1, September 2023.
- Introduction
- Terminology and Conventions
- Data Model
- Encoding Normalization
- Decoding Interpretation
- Concrete Syntax
- Primitives
- Strings and Keys
- Objects
- Arrays
- Table Format
- Quoting and Escaping
- Whitespace
- Conformance
- Strict Mode Errors
- Security
- Internationalization
- Interoperability
- Media Type
- Error Handling
- Appendices
ZON addresses token bloat in JSON while maintaining structural fidelity. By declaring column headers once, using single-character tokens, and eliminating redundant punctuation, ZON achieves optimal compression for LLM contexts.
- Minimize tokens - Every character counts in LLM context windows
- Preserve structure - 100% lossless round-trip conversion
- Human readable - Debuggable, understandable format
- LLM friendly - Explicit markers aid comprehension
- Deterministic - Same input → same output
- Deep Nesting - Efficiently handles complex, recursive structures
✅ Use ZON for:
- LLM prompt contexts (RAG, few-shot examples)
- Log storage and analysis
- Configuration files
- Browser storage (localStorage)
- Tabular data interchange
- Complex nested data structures (ZON excels here)
❌ Don't use ZON for:
- Public REST APIs (use JSON for compatibility)
- Real-time streaming protocols (not yet supported)
- Files requiring comments (use YAML/JSONC)
JSON (118 chars):
{"users":[{"id":1,"name":"Alice","active":true},{"id":2,"name":"Bob","active":false}]}ZON (64 chars, 46% reduction):
users:@(2):active,id,name
T,1,Alice
F,2,Bob
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL are interpreted per [RFC2119] and [RFC8174].
ZON document - UTF-8 text conforming to this specification
Line - Character sequence terminated by LF (\n)
Key-value pair - Line pattern: key:value
Table - Array of uniform objects with header + data rows
Table header - Pattern: key:@(N):columns or @(N):columns
Meta separator - Colon (:) separating keys/values
Table marker - At-sign (@) indicating table structure
Primitive - Boolean, null, number, or string (not object/array)
Uniform array - All elements are objects with identical keys
Strict mode - Validation enforcing row/column counts
ZON encodes the JSON data model:
- Primitives:
string | number | boolean | null - Objects:
{ [string]: JsonValue } - Arrays:
JsonValue[]
- Arrays: Order MUST be preserved exactly
- Objects: Key order MUST be preserved
- Encoders SHOULD sort keys alphabetically
- Decoders MUST preserve document order
Requirements for ENCODER:
- No leading zeros:
007→ invalid - No trailing zeros:
3.14000→3.14 - No unnecessary decimals: Integer
5stays5, not5.0 - No scientific notation:
1e6→1000000,1e-3→0.001 - Special values map to null:
NaN→nullInfinity→null-Infinity→null
Implementation:
- Integers: Use standard string representation
- Floats: Ensure decimal point present, convert exponents to fixed-point
- Special values: Normalized to
nullbefore encoding
Examples:
1000000 ✓ (not 1e6 or 1e+6)
0.001 ✓ (not 1e-3)
3.14 ✓ (not 3.140000)
42 ✓ (integer, no decimal)
null ✓ (was NaN or Infinity)
Scientific notation:
1e6 ⚠️ Decoders MUST accept, encoders SHOULD avoid (prefer 1000000)
2.5E-3 ⚠️ Decoders MUST accept, encoders SHOULD avoid (prefer 0.0025)
Requirements:
- Encoders MUST ensure
decode(encode(x)) === x(round-trip fidelity) - No trailing zeros in fractional part (except
.0for float clarity) - No leading zeros (except standalone
0) -0normalizes to0
NaN→nullInfinity→null-Infinity→null
Encoders MUST normalize non-JSON types before encoding:
JavaScript/TypeScript:
| Input | ZON Output | Notes |
|---|---|---|
undefined |
null |
Null |
Symbol() |
null |
Not serializable |
function() {} |
null |
Not serializable |
new Date() |
"2025-11-28T10:00:00Z" |
ISO 8601 string |
new Set([1,2]) |
"[1,2]" |
Convert to array |
new Map([[k,v]]) |
"{k:v}" |
Convert to object |
BigInt(999) |
"999" |
String if outside safe range |
Python:
| Input | ZON Output | Notes |
|---|---|---|
None |
null |
Null |
datetime.now() |
"2025-11-28T10:00:00Z" |
ISO 8601 |
set([1,2]) |
"[1,2]" |
Convert to list |
Decimal('3.14') |
3.14 or "3.14" |
Number if no precision loss |
bytes(b'\x00') |
"<base64>" |
Base64 encode |
Implementations MUST document their normalization policy.
Unquoted tokens:
T → true (boolean)
F → false (boolean)
null → null
42 → 42 (integer)
3.14 → 3.14 (float)
1e6 → 1000000 (number)
05 → "05" (string, leading zero)
hello → "hello" (string)
Quoted tokens:
"T" → "T" (string, not boolean)
"123" → "123" (string, not number)
"hello" → "hello" (string)
"" → "" (empty string)
Only these escapes are valid:
\\→\\"→"\n→ newline\r→ carriage return\t→ tab
Invalid escapes MUST error:
"\x41" ❌ Invalid
"\u0041" ❌ Invalid (use literal UTF-8)
"\b" ❌ Invalid
Numbers with leading zeros are strings:
05 → "05" (string)
007 → "007" (string)
0 → 0 (number)
ZON documents are line-oriented:
- Lines end with LF (
\n) - Empty lines are whitespace-only
- Blank lines separate metadata from tables
Determined by first non-empty line:
Root table:
@(2):id,name
1,Alice
2,Bob
Root object:
name:Alice
age:30
Root primitive:
42
document = object-form / table-form / primitive-form
object-form = *(key-value / table-section)
table-form = table-header 1*data-row
primitive-form = value
key-value = key ":" value LF
table-header = [key ":"] "@" "(" count ")" ":" column-list LF
table-section = table-header 1*data-row
data-row = value *("," value) LF
key = unquoted-string / quoted-string
value = primitive / quoted-compound
primitive = "T" / "F" / "null" / number / unquoted-string
quoted-compound = quoted-string ; Contains JSON-like notation
column-list = column *("," column)
column = key
count = 1*DIGIT
number = ["-"] 1*DIGIT ["." 1*DIGIT] [("e"/"E") ["+"/"-"] 1*DIGIT]Encoding:
true→Tfalse→F
Decoding:
T(case-sensitive) →trueF(case-sensitive) →false
Rationale: 75% character reduction
Encoding:
null→null(4-character literal)
Decoding:
null→null- Also accepts (case-insensitive):
none,nil
Rationale: Clarity and readability over minimal compression
Examples:
age:30
price:19.99
score:-42
temp:98.6
large:1000000
Rules:
- Integers without decimal:
42 - Floats with decimal:
3.14 - Negatives with
-prefix:-17 - No thousands separators
- Decimal separator is
.(period)
Pattern: ^[a-zA-Z0-9_\-\.]+$
Examples:
name:Alice
full_name:Alice Smith
user_id:u123
version:v1.1.0
api-key:sk_test_key
Quote strings if they:
- Contain structural chars:
,,:,[,],{,}," - Match literal keywords:
T,F,true,false,null,none,nil - Look like PURE numbers:
123,3.14,1e6(Complex patterns like192.168.1.1orv1.1.0do NOT need quoting) - Have whitespace: Leading/trailing spaces (MUST quote to preserve). Internal spaces are allowed in unquoted strings.
- Are empty:
""(MUST quote to distinguish fromnull) - Contain escapes: Newlines, tabs, quotes (MUST quote to prevent structure breakage)
Examples:
message:"Hello, world"
path:"C:\Users\file"
empty:""
quoted:"true"
number:"123"
spaces:" padded "
ISO 8601 dates MAY be unquoted:
created:2025-11-28
timestamp:2025-11-28T10:00:00Z
time:10:30:00
Decoders interpret these as strings (not parsed as Date objects unless application logic does so).
active:T
age:30
name:Alice
Decodes to:
{"active": true, "age": 30, "name": "Alice"}Grouped object notation (preferred):
config{database{host:localhost,port:5432},cache{ttl:3600}}
Quoted compound notation (legacy/alternative):
config:"{database:{host:localhost,port:5432},cache:{ttl:3600}}"
Alternatively using JSON string:
config:"{"database":{"host":"localhost","port":5432}}"
metadata:"{}"
Decision algorithm:
- All elements are objects with same keys? → Table format
- Otherwise → Inline quoted format
Primitive arrays:
tags:"[nodejs,typescript,llm]"
numbers:"[1,2,3,4,5]"
flags:"[T,F,T]"
mixed:"[hello,123,T,null]"
Empty:
items:"[]"
Uniform detection:
Calculate irregularity score:
For each pair of objects (i, j):
similarity = shared_keys / (keys_i + keys_j - shared_keys) # Jaccard
Avg_similarity = mean(all_similarities)
Irregularity = 1 - avg_similarity
Threshold:
- If irregularity > 0.6 → Use inline format
- If irregularity ≤ 0.6 → Use table format
With key:
users:@(2):active,id,name
Root array:
@(2):active,id,name
Components:
users- Array key (optional for root)@- Table marker (REQUIRED)(2)- Row count (REQUIRED for strict mode):- Separator (REQUIRED)active,id,name- Columns, comma-separated (REQUIRED)
Columns SHOULD be sorted alphabetically:
users:@(2):active,id,name,role
T,1,Alice,admin
F,2,Bob,user
Each row is comma-separated values:
T,1,Alice,admin
Rules:
- One row per line
- Values encoded as primitives (§6-7)
- Field count MUST equal column count (strict mode)
- Missing values encode as
null
Optional fields append as key:value:
users:@(3):id,name
1,Alice
2,Bob,role:admin,score:98
3,Carol
Row 2 decodes to:
{"id": 2, "name": "Bob", "role": "admin", "score": 98}Nested objects in tables are flattened using dot notation for sparse columns.
users:@(2):id
1,meta.role:admin
2,meta.role:user,meta.active:T
For table values containing commas:
messages:@(1):id,text
1,"He said ""hello"" to me"
Rules:
- Wrap in double quotes:
"value" - Escape internal quotes by doubling:
"→""
multiline:"Line 1\nLine 2"
tab:"Col1\tCol2"
quote:"She said \"Hi\""
backslash:"C:\\path\\file"
Valid escapes:
\\→\\"→"\n→ newline\r→ CR\t→ tab
Use literal UTF-8 (no \uXXXX escapes):
chinese:王小明
emoji:✅
arabic:مرحبا
Encoders MUST:
- Use LF (
\n) line endings - NOT emit trailing whitespace on lines
- NOT emit trailing newline at EOF (RECOMMENDED)
- MAY emit one blank line between metadata and table
Decoders SHOULD:
- Accept LF or CRLF (normalize to LF)
- Ignore trailing whitespace per line
- Treat multiple blank lines as single separator
✅ A conforming encoder MUST:
- Emit UTF-8 with LF line endings
- Encode booleans as
T/F - Encode null as
null - Emit canonical numbers (§2.3)
- Normalize NaN/Infinity to
null - Detect uniform arrays → table format
- Emit table headers:
key:@(N):columns - Sort columns alphabetically
- Sort object keys alphabetically
- Quote strings per §7.2-7.3
- Use only valid escapes (§11.2)
- Preserve array order
- Preserve key order
- Ensure round-trip:
decode(encode(x)) === x
✅ A conforming decoder MUST:
- Accept UTF-8 (LF or CRLF)
- Decode
T→ true,F→ false,null→ null - Parse decimal and exponent numbers
- Treat leading-zero numbers as strings
- Unescape quoted strings
- Error on invalid escapes
- Parse table headers:
key:@(N):columns - Split rows by comma (CSV-aware)
- Preserve array order
- Preserve key order
- Error Codes:
E001: Row count mismatch (strict mode)E002: Field count mismatch (strict mode)E301: Document size > 100MBE302: Line length > 1MBE303: Array length > 1M itemsE304: Object key count > 100K
- Enforce row count (strict mode)
- Enforce field count (strict mode)
Enabled by default in reference implementation.
Enforces:
- Table row count = declared
(N) - Each row field count = column count
- No malformed headers
- No invalid escapes
- No unterminated strings
Non-strict mode MAY tolerate count mismatches.
ZON includes a runtime schema validation library designed for LLM guardrails. It allows defining expected structures and validating LLM outputs against them.
import { zon } from 'zon-format';
const UserSchema = zon.object({
name: zon.string().describe("Full name"),
age: zon.number(),
role: zon.enum(['admin', 'user']),
tags: zon.array(zon.string()).optional()
});Schemas can generate system prompts to guide LLMs:
const prompt = UserSchema.toPrompt();
// Output:
// object:
// - name: string - Full name
// - age: number
// - role: enum(admin, user)
// - tags: array of [string] (optional)import { validate } from 'zon-format';
const result = validate(llmOutputString, UserSchema);
if (result.success) {
console.log(result.data); // Typed data
} else {
console.error(result.error); // "Expected number at age, got string"
}| Code | Error | Example |
|---|---|---|
| E001 | Row count mismatch | @(2) but 3 rows |
| E002 | Field count mismatch | 3 columns, row has 2 values |
| E003 | Malformed header | Missing @, (N), or : |
| E004 | Invalid column name | Unescaped special chars |
| Code | Error | Example |
|---|---|---|
| E101 | Invalid escape | "\x41" instead of "A" |
| E102 | Unterminated string | "hello (no closing quote) |
| E103 | Missing colon | name Alice → name:Alice |
| E104 | Empty key | :value |
| Code | Error | Example |
|---|---|---|
| E201 | Trailing whitespace | Line ends with spaces |
| E202 | CRLF line ending | \r\n instead of \n |
| E203 | Multiple blank lines | More than one consecutive |
| E204 | Trailing newline | Document ends with \n |
Implementations SHOULD limit:
- Document size: 100 MB
- Line length: 1 MB
- Nesting depth: 100 levels
- Array length: 1,000,000
- Object keys: 100,000
Prevents denial-of-service attacks.
- Validate UTF-8 strictly
- Error on invalid escapes
- Reject malformed numbers
- Limit recursion depth
ZON does not execute code. Applications MUST sanitize before:
- SQL queries
- Shell commands
- HTML rendering
REQUIRED: UTF-8 without BOM
Decoders MUST:
- Reject invalid UTF-8
- Reject BOM (U+FEFF) at start
Full Unicode support:
- Emoji:
✅,🚀 - CJK:
王小明,日本語 - RTL:
مرحبا,שלום
- Decimal separator:
.(period) - No thousands separators
- ISO 8601 dates for internationalization
ZON → JSON: Lossless
JSON → ZON: Lossless, with 35-50% compression for tabular data
Example:
{"users": [{"id": 1, "name": "Alice"}]}↓ ZON (42% smaller)
users:@(1):id,name
1,Alice
CSV → ZON: Add type awareness ZON → CSV: Table rows export cleanly
Advantages over CSV:
- Type preservation
- Metadata support
- Nesting capability
Comparison:
- ZON: Flat,
@(N),T/F/null→ Better compression - TOON: Indented,
[N]{fields}:,true/false→ Better readability Both are LLM-optimized; choose based on data shape.
Extension: .zonf
ZON files use the .zonf extension (ZON Format) for all file operations.
Examples:
data.zonf
users.zonf
config.zonf
Status: Provisional (not yet registered with IANA)
Media type: text/zonf
Charset: UTF-8 (always)
ZON documents are always UTF-8 encoded. The charset=utf-8 parameter may be specified but defaults to UTF-8 when omitted.
Content-Type: text/zonf
Content-Type: text/zonf; charset=utf-8 # Explicit (optional)types {
text/zonf zonf;
}
server {
default_type text/zonf;
}AddType text/zonf .zonfFiles containing ZON data should use the .zonf extension.
If the Content-Type header is missing, clients may sniff the content. ZON files typically start with a header row or a ZON node.
HTTP/1.1 200 OK
Content-Type: text/zonf; charset=utf-8
Content-Length: 1234
users:@(2):id,name
1,Alice
2,BobNormative requirement: ZON files MUST be UTF-8 encoded.
Rationale:
- Universal support across programming languages
- Compatible with JSON (RFC 8259)
- No byte-order mark (BOM) required
- Supports full Unicode character set
Encoding declaration: Not required (always UTF-8)
Current status: Not registered
Future work: Formal registration with IANA is planned for v2.0.
Template for registration:
Type name: text
Subtype name: zon
Required parameters: None
Optional parameters: charset (default: utf-8)
Encoding considerations: Always UTF-8
Security considerations: See §15
Interoperability considerations: None known
Published specification: This document
Applications that use this media type: Data serialization for LLMs
Fragment identifier considerations: N/A
Additional information:
File extension: .zonf
Macintosh file type code: TEXT
Uniform Type Identifier: public.zon
Person & email address: See repository
Intended usage: COMMON
Restrictions on usage: None
Provisional (not yet IANA-registered). May pursue formal registration at v2.0.
A.1 Simple Object
active:T
age:30
name:Alice
A.2 Table
users:@(2):active,id,name
T,1,Alice
F,2,Bob
A.3 Mixed
tags:"[api,auth]"
version:1.0
users:@(1):id,name
1,Alice
A.4 Root Array
@(2):id,name
1,Alice
2,Bob
Coverage:
- ✅ 28/28 unit tests
- ✅ 27/27 roundtrip tests
- ✅ 100% data integrity
Test categories:
- Primitives (T, F, null, numbers, strings)
- Tables (uniform arrays)
- Quoting, escaping
- Round-trip fidelity
- Edge cases, errors
v1.1.0 (2025-12-01)
- Sparse Tables (
key:valuein rows) - Hierarchical Sparse Encoding (deep flattening)
- Metadata Optimization (grouped objects)
- Type Coercion (opt-in)
- LLM Optimizations (
encodeLLM)
v1.0.4 (2025-11-29)
- Disabled sequential column omission
- 100% LLM accuracy achieved
- All columns explicit
v1.0.2 (2025-11-27)
- Irregularity threshold tuning
- ISO date detection
- Sparse table encoding v1.0.1 (2025-11-26)
- License: MIT
- Documentation updates
v1.0.0 (2025-11-26)
- Initial stable release
- Single-character primitives
- Table format
- Lossless round-trip
Decoder flow:
1. Split by lines (LF/CRLF)
2. Detect root form (table / object / primitive)
3. If table:
a. Parse header: @(N):columns
b. Read N rows
c. Split by comma (CSV-aware)
d. Map to objects
4. If object:
a. Parse key:value pairs
b. Build object
5. Return decoded value
CSV-aware row splitting:
function parseRow(line, columns) {
const values = [];
let current = '';
let inQuotes = false;
for (let i = 0; i < line.length; i++) {
const char = line[i];
if (char === '"' && !inQuotes) {
inQuotes = true;
} else if (char === '"' && inQuotes) {
if (line[i+1] === '"') { // Escaped quote
current += '"';
i++;
} else {
inQuotes = false;
}
} else if (char === ',' && !inQuotes) {
values.push(parseValue(current.trim()));
current = '';
} else {
current += char;
}
}
values.push(parseValue(current.trim()));
return values;
}MIT License
Copyright (c) 2025 ZON-FORMAT (Roni Bhakta)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
End of Specification