A JSON formatting tool optimized for LLM consumption, balancing readability and token efficiency.
When feeding JSON data to LLMs, you face a trade-off:
- Pretty format: High readability but wastes tokens on whitespace
- Compact format: Token efficient but hard to read
jf solves this by intelligently detecting "entity objects" (like user records, order items) and keeping them on single lines while expanding the overall structure. This achieves ~40% token reduction compared to pretty format while maintaining excellent readability.
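The layout idea can be illustrated with a small std-only sketch. This is not jf's implementation: `inline_object` and `render_smart` are hypothetical helpers, and values are assumed to be already-serialized JSON fragments. The sketch only shows the shape of the output, with the outer structure expanded and each record kept on one line.

```rust
/// Hypothetical helper (not part of jf): serialize one record as a
/// single-line JSON object. Values are assumed to be already-serialized
/// JSON fragments such as `1` or `"Alice"`.
fn inline_object(fields: &[(&str, &str)]) -> String {
    let body: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Expand the outer structure, but keep every array item on its own line.
fn render_smart(key: &str, records: &[Vec<(&str, &str)>], indent: usize) -> String {
    let pad1 = " ".repeat(indent);
    let pad2 = " ".repeat(indent * 2);
    let items: Vec<String> = records
        .iter()
        .map(|r| format!("{}{}", pad2, inline_object(r)))
        .collect();
    format!("{{\n{}\"{}\": [\n{}\n{}]\n}}", pad1, key, items.join(",\n"), pad1)
}
```

Calling `render_smart("users", &users, 2)` with two records yields a six-line document: one line per record plus four lines of structure. A fully expanded layout would spend four lines on each record alone.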
- Entity-Aware Formatting: Auto-detect "entity objects" in arrays and keep them on single lines
- Schema Statistics: Infer entities based on P90 length analysis
- LLM-Assisted Labeling: Generate prompts for an LLM to identify entities, with manual override support
- Smart Key Sorting: Alphabetic or weighted sorting (id/name first)
- Multiple Format Modes: Smart / Compact / Pretty
- JSON5 Support: Accept JSON5 input and optionally emit JSON5 output
- Schema Extraction: Generate compact type schemas from JSON data
brew tap luw2007/tap
brew install jf
Download the matching archive from GitHub Releases, then extract and move jf into your PATH.
tar -xzf jf-vX.Y.Z-<target>.tar.gz
sudo install -m 0755 jf /usr/local/bin/jf
jf --help
git clone https://github.com/luw2007/llm_json_formatter.git
cd llm_json_formatter
cargo build --release
# Binary: target/release/jf
cargo install llm_json_formatter
# Quick format - shortcut (automatically uses format command)
jf data.json
# Format multiple files
jf file1.json file2.json file3.json
# Default formatting (auto-detect entities) - explicit command
jf format data.json
# Output:
# {
#   "users": [
#     {"id":1,"name":"Alice"},
#     {"id":2,"name":"Bob"}
#   ]
# }
jf format <INPUT> [OPTIONS] [-o <OUTPUT>]
Options:
| Option | Default | Description |
|---|---|---|
| -m, --mode | smart | Format mode: smart / compact / pretty |
| --sort | alphabetic | Key sorting: alphabetic / smart |
| --indent | 2 | Indentation spaces |
| --inline-limit | 80 | Max line length for inline objects in Smart mode |
| --array-item-inline-limit | 2048 | Max line length for array items (entities) |
| --entity-threshold | 2000 | Length threshold for auto-detected entities |
| --entities | - | Comma-separated or JSON array of entity paths |
| --output-syntax | auto | Output syntax: auto / json / json5 |
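Since `--entities` accepts either a comma-separated list or a JSON array, it may help to see how both spellings can reduce to the same set of paths. The sketch below is an assumption about the behavior, not jf's parser: `parse_entities` is a hypothetical helper, and it is deliberately naive (no escape handling, no commas inside quoted paths).

```rust
use std::collections::HashSet;

/// Hypothetical helper: accept either `a[*],b[*]` or `["a[*]","b[*]"]`
/// and return the set of entity paths.
fn parse_entities(input: &str) -> HashSet<String> {
    let trimmed = input.trim();
    // Treat the input as a JSON array only if it looks like `["...` ... `]`;
    // a plain path such as `[*]` must not lose its brackets.
    let inner = match trimmed.strip_prefix('[').and_then(|s| s.strip_suffix(']')) {
        Some(body) if body.trim_start().starts_with('"') => body,
        _ => trimmed,
    };
    inner
        .split(',')
        .map(|p| p.trim().trim_matches('"').to_string())
        .filter(|p| !p.is_empty())
        .collect()
}
```

The result is a `HashSet<String>`, which matches the type of the `entities` field in the library's `Config`.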
Examples:
# Quick format (shortcut, defaults to smart mode)
jf data.json
# Compact mode (minimum tokens)
jf format data.json --mode compact
# Pretty mode (maximum readability)
jf format data.json --mode pretty
# Manually specify entities
jf format data.json --entities "users[*],orders[*]"
# JSON array format for entities
jf format data.json --entities '["users[*]","orders[*]"]'
# Disable auto entity detection
jf format data.json --entity-threshold 0
# Smart key sorting (id/name first)
jf format data.json --sort smart
# Auto output syntax (JSON in -> JSON out, JSON5 in -> JSON5 out)
jf format data.json5 --mode compact --output-syntax auto
# Force JSON5 output
jf format data.json --mode compact --output-syntax json5
Generate a prompt containing the schema structure and samples so that an LLM can identify "business entities".
jf prompt <INPUT>
Example:
jf prompt data.json
Output:
Analyze the JSON schema below and identify 'Business Entities'...
Schema:
{
"users": [{...}]
}
Array paths and samples:
Path: users[*]
Samples:
- {"id":1,"name":"Alice"...
- {"id":2,"name":"Bob"...
Output ONLY a JSON array of entity paths...
Extract a compact type schema from JSON data.
jf schema <INPUT>
Example:
jf schema data.json
Output:
{
"config": {
"debug": boolean
}
"users": [
{
"id": number
"name": string
}
]
}
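To make the schema format concrete, here is a minimal sketch of how such a schema could be derived. It is not the library's `generate_schema` (which takes a `serde_json::Value`): the `Json` enum and `schema` function are illustrative stand-ins, and describing an array by its first element only is an assumption.

```rust
/// Minimal JSON value type for illustration only.
#[allow(dead_code)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Render a compact type schema: scalars become type names, arrays are
/// described by their first element, objects list one `"key": type` per line.
fn schema(value: &Json, depth: usize) -> String {
    let pad = "  ".repeat(depth + 1); // indentation for nested lines
    let close = "  ".repeat(depth); // indentation for the closing bracket
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(_) => "boolean".to_string(),
        Json::Num(_) => "number".to_string(),
        Json::Str(_) => "string".to_string(),
        Json::Arr(items) => match items.first() {
            Some(first) => format!("[\n{}{}\n{}]", pad, schema(first, depth + 1), close),
            None => "[]".to_string(),
        },
        Json::Obj(fields) => {
            let lines: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}\"{}\": {}", pad, k, schema(v, depth + 1)))
                .collect();
            format!("{{\n{}\n{}}}", lines.join("\n"), close)
        }
    }
}
```

Because values are replaced by type names and an array contributes only one representative element, the schema stays small even when the data is large.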
jf analyze <INPUT>
Output:
JSON Analysis:
Byte Size: 1234 bytes
Max Depth: 4
Object Count: 15
Total Keys: 42
Array Count: 3
Max Array Length: 100
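The structural counters in this report can be gathered in a single recursive walk. The sketch below is illustrative rather than jf's code: the `Json` enum, `Stats` struct, and `analyze` function are hypothetical, and the depth convention (each nested object or array adds one level, scalars add none) is an assumption.

```rust
/// Minimal JSON value type for illustration only.
#[allow(dead_code)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

#[derive(Debug, Default, PartialEq)]
struct Stats {
    max_depth: usize,
    object_count: usize,
    total_keys: usize,
    array_count: usize,
    max_array_len: usize,
}

/// Visit every node once, updating the counters for containers.
fn walk(value: &Json, depth: usize, stats: &mut Stats) {
    match value {
        Json::Obj(fields) => {
            stats.max_depth = stats.max_depth.max(depth + 1);
            stats.object_count += 1;
            stats.total_keys += fields.len();
            for (_, child) in fields {
                walk(child, depth + 1, stats);
            }
        }
        Json::Arr(items) => {
            stats.max_depth = stats.max_depth.max(depth + 1);
            stats.array_count += 1;
            stats.max_array_len = stats.max_array_len.max(items.len());
            for child in items {
                walk(child, depth + 1, stats);
            }
        }
        // Scalars do not change any counter.
        _ => {}
    }
}

fn analyze(value: &Json) -> Stats {
    let mut stats = Stats::default();
    walk(value, 0, &mut stats);
    stats
}
```

Byte size is left out because it comes from the raw input text, not from the parsed tree.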
jf search <INPUT> -p <PATH>
Example:
jf search data.json -p "users[0].name"
jf paths <INPUT>
Based on schema analysis, jf calculates the P90 (90th-percentile) length for each array path. If P90 ≤ entity-threshold, the path is marked as an entity.
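The P90 rule can be sketched in a few lines. The percentile method (nearest-rank) and the helper names `p90` and `is_entity` are assumptions made for illustration; the lengths are taken to be the serialized lengths of the items under one array path.

```rust
/// Nearest-rank 90th percentile of the item lengths.
/// Returns None for an empty array. The percentile method is an assumption.
fn p90(lengths: &[usize]) -> Option<usize> {
    if lengths.is_empty() {
        return None;
    }
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable();
    // rank = ceil(0.9 * n) in integer arithmetic; ranks are 1-based.
    let rank = (9 * sorted.len() + 9) / 10;
    Some(sorted[rank - 1])
}

/// An array path counts as an entity when P90 <= threshold.
/// A threshold of 0 disables auto-detection, as with `--entity-threshold 0`.
fn is_entity(lengths: &[usize], threshold: usize) -> bool {
    threshold > 0 && p90(lengths).map_or(false, |p| p <= threshold)
}
```

Using a percentile instead of the maximum means a few unusually long items do not disqualify an array whose items are otherwise short.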
# Default threshold 2000
jf format data.json
# Stricter (only short objects count as entities)
jf format data.json --entity-threshold 100
# 1. Generate prompt
jf prompt data.json > prompt.txt
# 2. Send to LLM, get entity list
# LLM returns: ["users[*]", "orders[*]"]
# 3. Format with labeled entities
jf format data.json --entities '["users[*]","orders[*]"]'
| Mode | Description | Token Efficiency | Readability |
|---|---|---|---|
| Smart | Entities inline, structure expanded | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Compact | Fully compressed | ⭐⭐⭐⭐⭐ | ⭐ |
| Pretty | Fully expanded | ⭐⭐ | ⭐⭐⭐⭐⭐ |
Add to your Cargo.toml:
[dependencies]
llm_json_formatter = "0.1"
use llm_json_formatter::{LlmJsonFormatter, Config, FormatMode};
fn main() {
    // Smart mode; every other field keeps its default
    let config = Config {
        mode: FormatMode::Smart,
        ..Default::default()
    };
    let mut formatter = LlmJsonFormatter::new(config);
    let json = r#"{"users":[{"id":1,"name":"Alice"}]}"#;
    let result = formatter.format(json).unwrap();
    println!("{}", result);
}
use llm_json_formatter::{LlmJsonFormatter, Config, FormatMode};
use std::collections::HashSet;
fn main() {
    // Entity paths to keep on single lines
    let mut entities = HashSet::new();
    entities.insert("users[*]".to_string());
    entities.insert("orders[*]".to_string());
    let config = Config {
        mode: FormatMode::Smart,
        entities,
        entity_threshold: 0, // Disable auto detection
        ..Default::default()
    };
    let mut formatter = LlmJsonFormatter::new(config);
    // Sample input for illustration
    let json = r#"{"users":[{"id":1,"name":"Alice"}],"orders":[{"id":10,"total":25}]}"#;
    let result = formatter.format(json).unwrap();
    println!("{}", result);
}
use llm_json_formatter::generate_schema;
use serde_json::Value;
fn main() {
    let json: Value = serde_json::from_str(r#"{"id":1,"name":"Alice"}"#).unwrap();
    let schema = generate_schema(&json, 0);
    println!("{}", schema);
    // Output:
    // {
    // "id": number
    // "name": string
    // }
}
use llm_json_formatter::{LlmJsonFormatter, Config};
fn main() {
    let formatter = LlmJsonFormatter::new(Config::default());
    // Sample input for illustration
    let json = r#"{"users":[{"id":1,"name":"Alice"}]}"#;
    let metadata = formatter.get_metadata(json).unwrap();
    println!("Size: {} bytes", metadata.byte_size);
    println!("Depth: {}", metadata.depth);
    println!("Objects: {}", metadata.object_count);
}
pub struct Config {
    pub mode: FormatMode,               // Smart | Compact | Pretty
    pub output_syntax: OutputSyntax,    // Auto | Json | Json5
    pub sort_strategy: SortStrategy,    // Alphabetic | Smart
    pub indent: usize,                  // Default: 2
    pub inline_limit: usize,            // Default: 80
    pub array_item_inline_limit: usize, // Default: 2048
    pub entity_threshold: usize,        // Default: 2000
    pub entities: HashSet<String>,      // User-specified entity paths
}
FormatMode:
- Smart: Intelligent formatting with entity detection
- Compact: Minimized single-line output
- Pretty: Standard indented output
SortStrategy:
- Alphabetic: Sort keys alphabetically
- Smart: Sort by importance (id/name/type first, _internal last)
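A weighted sort of this kind can be sketched as a comparator that orders by weight first and alphabetically second. The exact weights below are hypothetical; only the ordering "id/name/type first, underscore-prefixed keys last" comes from the description above.

```rust
/// Hypothetical weights: lower sorts earlier.
fn key_weight(key: &str) -> u8 {
    match key {
        "id" => 0,
        "name" => 1,
        "type" => 2,
        k if k.starts_with('_') => 4, // internal keys go last
        _ => 3,                       // everything else sits in the middle
    }
}

/// Sort by weight, breaking ties alphabetically.
fn smart_sort(keys: &mut [String]) {
    keys.sort_by(|a, b| key_weight(a).cmp(&key_weight(b)).then_with(|| a.cmp(b)));
}
```

With this ordering, the keys `email, _internal, name, type, id, age` come out as `id, name, type, age, email, _internal`, so identifying fields lead each inline record.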
OutputSyntax:
- Auto: Follow detected input syntax (JSON or JSON5)
- Json: Always emit JSON
- Json5: Always emit JSON5
Run all tests:
# Run all tests
cargo test
# Run integration tests only
cargo test --test cli_tests
# Run with output
cargo test -- --nocapture
The test suite includes 17+ integration tests covering:
- ✅ Shortcut commands (single file, multiple files)
- ✅ Explicit format commands with various modes
- ✅ All CLI subcommands (analyze, schema, paths, search)
- ✅ Error handling (invalid JSON, nonexistent files)
- ✅ Help and version flags
Run benchmarks:
cargo bench
License: MIT