jf - LLM-Optimized JSON Formatter

A JSON formatting tool optimized for LLM consumption, balancing readability and token efficiency.

Why jf?

When feeding JSON data to LLMs, you face a trade-off:

Pretty format: High readability but wastes tokens on whitespace
Compact format: Token efficient but hard to read

jf solves this by intelligently detecting "entity objects" (like user records, order items) and keeping them on single lines while expanding the overall structure. This achieves ~40% token reduction compared to pretty format while maintaining excellent readability.

Features

Entity-Aware Formatting: Auto-detect "entity objects" in arrays and keep them on single lines
Schema Statistics: Infer entities based on P90 length analysis
LLM-Assisted Labeling: Generate prompts for LLM to identify entities, with manual override support
Smart Key Sorting: Alphabetic or weighted sorting (id/name first)
Multiple Format Modes: Smart / Compact / Pretty
JSON5 Support: Accept JSON5 input and optionally emit JSON5 output
Schema Extraction: Generate compact type schemas from JSON data

Installation

Homebrew (macOS / Linux)

brew tap luw2007/tap
brew install jf

Prebuilt Binaries (GitHub Releases)

Download the matching archive from GitHub Releases, then extract and move jf into your PATH.

tar -xzf jf-vX.Y.Z-<target>.tar.gz
sudo install -m 0755 jf /usr/local/bin/jf
jf --help

From Source

git clone https://github.com/luw2007/llm_json_formatter.git
cd llm_json_formatter
cargo build --release
# Binary: target/release/jf

From Crates.io

cargo install llm_json_formatter

Quick Start

# Quick format - shortcut (automatically uses format command)
jf data.json

# Format multiple files
jf file1.json file2.json file3.json

# Default formatting (auto-detect entities) - explicit command
jf format data.json

# Output:
# {
#   "users": [
#     {"id":1,"name":"Alice"},
#     {"id":2,"name":"Bob"}
#   ]
# }

Commands

format - Format JSON

jf format <INPUT> [OPTIONS] [-o <OUTPUT>]

Options:

Option	Default	Description
`-m, --mode`	smart	Format mode: `smart` / `compact` / `pretty`
`--sort`	alphabetic	Key sorting: `alphabetic` / `smart`
`--indent`	2	Indentation spaces
`--inline-limit`	80	Max line length for inline objects in Smart mode
`--array-item-inline-limit`	2048	Max line length for array items (entities)
`--entity-threshold`	2000	Length threshold for auto-detected entities
`--entities`	-	Comma-separated or JSON array of entity paths
`--output-syntax`	auto	Output syntax: `auto` / `json` / `json5`

Examples:

# Quick format (shortcut, defaults to smart mode)
jf data.json

# Compact mode (minimum tokens)
jf format data.json --mode compact

# Pretty mode (maximum readability)
jf format data.json --mode pretty

# Manually specify entities
jf format data.json --entities "users[*],orders[*]"

# JSON array format for entities
jf format data.json --entities '["users[*]","orders[*]"]'

# Disable auto entity detection
jf format data.json --entity-threshold 0

# Smart key sorting (id/name first)
jf format data.json --sort smart

# Auto output syntax (JSON in -> JSON out, JSON5 in -> JSON5 out)
jf format data.json5 --mode compact --output-syntax auto

# Force JSON5 output
jf format data.json --mode compact --output-syntax json5

prompt - Generate LLM Prompt

Generate a prompt containing schema structure and samples for LLM to identify "business entities".

jf prompt <INPUT>

Example:

jf prompt data.json

Output:

Analyze the JSON schema below and identify 'Business Entities'...

Schema:
{
  "users": [{...}]
}

Array paths and samples:

Path: users[*]
Samples:
  - {"id":1,"name":"Alice"...
  - {"id":2,"name":"Bob"...

Output ONLY a JSON array of entity paths...

schema - Extract Schema

Extract a compact type schema from JSON data.

jf schema <INPUT>

Example:

jf schema data.json

Output:

{
  "config": {
    "debug": boolean
  }
  "users": [
    {
      "id": number
      "name": string
    }
  ]
}

analyze - Analyze JSON Structure

jf analyze <INPUT>

Output:

JSON Analysis:
  Byte Size: 1234 bytes
  Max Depth: 4
  Object Count: 15
  Total Keys: 42
  Array Count: 3
  Max Array Length: 100

search - Path Query

jf search <INPUT> -p <PATH>

Example:

jf search data.json -p "users[0].name"

paths - List All Paths

jf paths <INPUT>

Entity Detection

Option 1: Automatic Statistical Inference

Based on schema analysis, calculate P90 length for each array path. If P90 ≤ entity-threshold, mark as entity.

# Default threshold 2000
jf format data.json

# Stricter (only short objects count as entities)
jf format data.json --entity-threshold 100

Option 2: LLM-Assisted Labeling

# 1. Generate prompt
jf prompt data.json > prompt.txt

# 2. Send to LLM, get entity list
# LLM returns: ["users[*]", "orders[*]"]

# 3. Format with labeled entities
jf format data.json --entities '["users[*]","orders[*]"]'

Format Mode Comparison

Mode	Description	Token Efficiency	Readability
Smart	Entities inline, structure expanded	⭐⭐⭐⭐	⭐⭐⭐⭐
Compact	Fully compressed	⭐⭐⭐⭐⭐	⭐
Pretty	Fully expanded	⭐⭐	⭐⭐⭐⭐⭐

Library Usage

Add to your Cargo.toml:

[dependencies]
llm_json_formatter = "0.1"

Basic Usage

use llm_json_formatter::{LlmJsonFormatter, Config, FormatMode};

fn main() {
    let config = Config {
        mode: FormatMode::Smart,
        ..Default::default()
    };

    let mut formatter = LlmJsonFormatter::new(config);
    let json = r#"{"users":[{"id":1,"name":"Alice"}]}"#;

    let result = formatter.format(json).unwrap();
    println!("{}", result);
}

Custom Entity Detection

use llm_json_formatter::{LlmJsonFormatter, Config, FormatMode};
use std::collections::HashSet;

fn main() {
    let mut entities = HashSet::new();
    entities.insert("users[*]".to_string());
    entities.insert("orders[*]".to_string());

    let config = Config {
        mode: FormatMode::Smart,
        entities,
        entity_threshold: 0, // Disable auto detection
        ..Default::default()
    };

    let mut formatter = LlmJsonFormatter::new(config);
    let result = formatter.format(json).unwrap();
}

Generate Schema

use llm_json_formatter::generate_schema;
use serde_json::Value;

fn main() {
    let json: Value = serde_json::from_str(r#"{"id":1,"name":"Alice"}"#).unwrap();
    let schema = generate_schema(&json, 0);
    println!("{}", schema);
    // Output:
    // {
    //   "id": number
    //   "name": string
    // }
}

Analyze JSON Metadata

use llm_json_formatter::{LlmJsonFormatter, Config};

fn main() {
    let formatter = LlmJsonFormatter::new(Config::default());
    let metadata = formatter.get_metadata(json).unwrap();

    println!("Size: {} bytes", metadata.byte_size);
    println!("Depth: {}", metadata.depth);
    println!("Objects: {}", metadata.object_count);
}

API Reference

Config

pub struct Config {
    pub mode: FormatMode,           // Smart | Compact | Pretty
    pub output_syntax: OutputSyntax, // Auto | Json | Json5
    pub sort_strategy: SortStrategy, // Alphabetic | Smart
    pub indent: usize,              // Default: 2
    pub inline_limit: usize,        // Default: 80
    pub array_item_inline_limit: usize, // Default: 2048
    pub entity_threshold: usize,    // Default: 2000
    pub entities: HashSet<String>,  // User-specified entity paths
}

FormatMode

Smart: Intelligent formatting with entity detection
Compact: Minimized single-line output
Pretty: Standard indented output

SortStrategy

Alphabetic: Sort keys alphabetically
Smart: Sort by importance (id/name/type first, _internal last)

OutputSyntax

Auto: Follow detected input syntax (JSON or JSON5)
Json: Always emit JSON
Json5: Always emit JSON5

Testing

Run all tests:

# Run all tests
cargo test

# Run integration tests only
cargo test --test cli_tests

# Run with output
cargo test -- --nocapture

The test suite includes 17+ integration tests covering:

✅ Shortcut commands (single file, multiple files)
✅ Explicit format commands with various modes
✅ All CLI subcommands (analyze, schema, paths, search)
✅ Error handling (invalid JSON, nonexistent files)
✅ Help and version flags

Benchmarks

Run benchmarks:

cargo bench

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
benches		benches
docs		docs
examples		examples
packaging/homebrew		packaging/homebrew
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Cargo.toml		Cargo.toml
LICENSE		LICENSE
QUICKSTART_RELEASE.md		QUICKSTART_RELEASE.md
README.md		README.md
SKILL.md		SKILL.md
test_data.json		test_data.json
test_map.json		test_map.json

Folders and files

Latest commit

History

Repository files navigation

jf - LLM-Optimized JSON Formatter

Why jf?

Features

Installation

Homebrew (macOS / Linux)

Prebuilt Binaries (GitHub Releases)

From Source

From Crates.io

Quick Start

Commands

format - Format JSON

prompt - Generate LLM Prompt

schema - Extract Schema

analyze - Analyze JSON Structure

search - Path Query

paths - List All Paths

Entity Detection

Option 1: Automatic Statistical Inference

Option 2: LLM-Assisted Labeling

Format Mode Comparison

Library Usage

Basic Usage

Custom Entity Detection

Generate Schema

Analyze JSON Metadata

API Reference

Config

FormatMode

SortStrategy

OutputSyntax

Testing

Benchmarks

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages