graviton

Zero-Dependency Document Intelligence Substrate — Local semantic embeddings, multi-topic clustering, revision diffing, and structural signal extraction running offline at the edge in under 1 millisecond.

The semantic shape of a document exists in the geometry of its text, not in the remote weights of a billion-parameter model.

For years, we have done magnificent work with vector databases and API-based LLM embedding endpoints. We have mastered the art of shipping chunks to the cloud to retrieve context.

And yet, a ghost haunts the runtime. The developer pays a steep tax in API latency, cold-start delays, dependency bloat, and serverless compute costs just to understand if two paragraphs in the same process address the same topic.

graviton is a zero-dependency document intelligence substrate. It runs locally, offline, and at the edge in under 1 millisecond. It packs quantized semantic geometry directly into your runtime to chunk, cluster, extract, compare, and compress document signals with zero network requests and zero configuration.

Try the Live Web Sandbox & Visualizer →


The Vector Tax

| Dimension | graviton | transformers.js | OpenAI / API |
|---|---|---|---|
| Cold Start | ~80ms (disk read) | 2,000–8,000ms | 50–200ms (network) |
| In-Process Latency | <0.5ms | 50–500ms (CPU) | 100–300ms (HTTP) |
| Bundle Footprint | ~13MB (vectors included) | ~80MB+ | 0 (but requires client library) |
| Edge Compatibility | Out-of-the-box | Requires WASM hacks | Network dependent |
| Financial Cost | $0.00 | $0.00 | Metered billing |

Install

npm install @zero-intelligence/graviton

Quickstart

import { extractSignal } from '@zero-intelligence/graviton'

// Bundled semantic geometries are loaded automatically.
const packet = await extractSignal(documentText, 'deep')

console.log(packet.intrinsicRank)      // Semantic complexity: number of independent axes
console.log(packet.spectralEntropy)    // Trajectory focus: 0 (laser-focused) to 1 (scattered)
console.log(packet.documentShape)       // Structural categorization: 'focused' | 'structured' | 'fragmented'
console.log(packet.contextBlock)       // High-density content payload ready to inject into your LLM prompt

To avoid cold-start latency on the first lazy-load in production, pre-warm the memory store at startup:

import { preload } from '@zero-intelligence/graviton/signal'
await preload() // Reads and decompresses the 13MB binary into memory (~80ms)

Two Primitives

1. Embed Layer (@zero-intelligence/graviton/embed)

Low-level operations for vector math, search, clustering, and deduplication.

import { embed, nearestVectors, kmeans, simhash, ngramSimhash } from '@zero-intelligence/graviton/embed'

// Embed in hash mode (zero-data footprint, Johnson-Lindenstrauss distance preservation guarantee):
const vector = embed('zero-dependency local embedding', null)

// Scan a corpus in-memory (O(N) scan, resolves in <1ms for up to 50,000 vectors):
const nearest = nearestVectors(queryVector, corpusVectors, 5)

// Cluster semantic vectors:
const { assignments, centroids } = kmeans(corpusVectors, 3)

// Deduplicate documents using locality-sensitive hashing:
const hash = simhash(text)
const robustHash = ngramSimhash(text) // Robust against vocabulary edits
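
For intuition, the linear scan behind a call like nearestVectors can be sketched in a few lines. This is a minimal illustration that assumes cosine similarity as the ranking metric; the helpers cosine and topK are hypothetical names, not part of graviton's API, and the library's actual implementation may differ.

```typescript
// Hypothetical sketch of an O(N) nearest-neighbor scan; not graviton's actual code.
type Vec = number[]

// Cosine similarity between two equal-length vectors (0 if either has zero norm).
function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb)
}

// Score every corpus vector against the query and return the k best matches.
function topK(query: Vec, corpus: Vec[], k: number): { index: number; score: number }[] {
  return corpus
    .map((v, index) => ({ index, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

const corpus: Vec[] = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0, 1]]
const hits = topK([1, 0, 0], corpus, 2)
console.log(hits.map(h => h.index)) // [0, 1]
```

Sorting the full score list keeps the sketch short; a real scan over tens of thousands of vectors would more likely keep a small running top-k instead.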

2. Signal Pipeline (@zero-intelligence/graviton/signal)

High-level document processing that chunks, projects, decomposes, and synthesizes raw text into structured signals.

import { extractSignal } from '@zero-intelligence/graviton/signal'

const packet = await extractSignal(documentText, 'deep')

packet.contextBlock             // Formatted text block optimized for LLM prompting
packet.contextLayers            // Multi-resolution semantic wavelet layers
packet.topics                   // Cluster themes and their proportional dominance
packet.keySentences             // Central sentences selected via Maximal Marginal Relevance
packet.keyTerms                 // Key vocabulary terms ranked by TF-IDF and TextRank
packet.facts                    // Extracted structured entities (SemVer, IPs, URLs, dates, numbers)
packet.sectionBreaks            // Indices representing semantic discontinuity / drift
packet.intrinsicRank            // SVD-derived count of independent document concepts
packet.spectralEntropy          // DFT-derived structural noise index [0.0 - 1.0]
packet.documentShape            // Coherence classification: 'focused' | 'structured' | 'fragmented'
packet.activationProfile        // Top GloVe dimensions driving the primary document topic
packet.phaseSpectrum            // DFT phase representation for structural alignment
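
To make the keySentences selection concrete, here is a minimal sketch of Maximal Marginal Relevance over precomputed scores. The function name mmr, its signature, and the default lambda of 0.7 are assumptions made for illustration; graviton's internal scoring is not documented here.

```typescript
// Hypothetical MMR sketch: greedily pick items that are relevant to the document
// yet dissimilar to what has already been picked. Not graviton's actual code.
function mmr(
  relevance: number[],     // relevance[i]: similarity of sentence i to the whole document
  similarity: number[][],  // similarity[i][j]: pairwise sentence similarity
  k: number,
  lambda = 0.7             // assumed trade-off: 1 = pure relevance, 0 = pure diversity
): number[] {
  const selected: number[] = []
  const remaining = relevance.map((_, i) => i)
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0
    let bestScore = -Infinity
    remaining.forEach((i, pos) => {
      // Redundancy is the highest similarity to any already-selected sentence.
      const redundancy = selected.reduce((m, j) => Math.max(m, similarity[i][j]), 0)
      const score = lambda * relevance[i] - (1 - lambda) * redundancy
      if (score > bestScore) { bestScore = score; bestPos = pos }
    })
    selected.push(remaining.splice(bestPos, 1)[0])
  }
  return selected
}

// Sentences 0 and 1 are near-duplicates; sentence 2 is less relevant but distinct.
const relevance = [0.9, 0.85, 0.6]
const similarity = [
  [1.0, 0.95, 0.1],
  [0.95, 1.0, 0.1],
  [0.1, 0.1, 1.0],
]
const picked = mmr(relevance, similarity, 2)
console.log(picked) // [0, 2]
```

The second pick skips sentence 1 even though it is more relevant than sentence 2, because it adds almost nothing beyond sentence 0.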

SignalPacket Architecture

Structural Descriptors

| Property | Type | Meaning |
|---|---|---|
| intrinsicRank | number | The minimum number of singular axes required to capture ≥95% of document variance. Represents conceptual density. |
| spectralEntropy | number | The normalized entropy of the power spectrum of the semantic trajectory. Measures narrative linearity. |
| documentShape | string | 'focused' (single tight topic), 'structured' (well-partitioned multi-topic), or 'fragmented' (topic drift). |
| reconstructionConfidence | number | Score indicating the descriptive coverage of extracted key sentences over the raw document sections. |
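
The two numeric descriptors can be approximated in a few lines. The sketch below is illustrative only: it assumes singular values arrive sorted in descending order, that variance is measured by squared singular values, and that the DC bin is excluded from the spectrum. graviton's exact definitions may differ, and both function bodies are hypothetical.

```typescript
// Hypothetical sketches of the two descriptors; graviton's exact definitions may differ.

// Smallest number of singular values (sorted descending) whose squared sum
// reaches `coverage` of the total.
function intrinsicRank(singularValues: number[], coverage = 0.95): number {
  const energy = singularValues.map(s => s * s)
  const total = energy.reduce((a, b) => a + b, 0)
  let acc = 0
  for (let r = 0; r < energy.length; r++) {
    acc += energy[r]
    if (acc >= coverage * total) return r + 1
  }
  return energy.length
}

// Normalized Shannon entropy of the power spectrum of a real-valued series.
// Assumption: the DC bin is skipped and only bins 1..floor(n/2) are used.
function spectralEntropy(series: number[]): number {
  const n = series.length
  const power: number[] = []
  for (let k = 1; k <= Math.floor(n / 2); k++) {
    let re = 0, im = 0
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n
      re += series[t] * Math.cos(angle)
      im += series[t] * Math.sin(angle)
    }
    power.push(re * re + im * im)
  }
  const total = power.reduce((a, b) => a + b, 0)
  if (total === 0 || power.length < 2) return 0
  const h = power.reduce((acc, p) => (p > 0 ? acc - (p / total) * Math.log(p / total) : acc), 0)
  return h / Math.log(power.length)
}

const rank = intrinsicRank([10, 3, 1, 0.5])
const pureTone = Array.from({ length: 32 }, (_, t) => Math.sin((2 * Math.PI * 2 * t) / 32))
const impulse = Array.from({ length: 32 }, (_, t) => (t === 0 ? 1 : 0))
console.log(rank)                      // 2
console.log(spectralEntropy(pureTone)) // close to 0: one dominant frequency
console.log(spectralEntropy(impulse))  // close to 1: energy spread over all bins
```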

Architectural Discoveries

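  • TextRank ≈ SVD U[0]: An undamped centrality ranking over the sentence-similarity graph $A A^\top$ converges to the dominant eigenvector of $A A^\top$, which is the first left singular vector $U[0]$ of the embedding matrix $A$. graviton uses this relation to replace iterative $O(N^2)$ graph ranking with a fast $O(N \cdot K)$ projection. Full TextRank adds damping and edge normalization, so the match is close rather than exact.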
  • DFT Phase Alignment: The phase spectrum of the semantic trajectory encodes the narrative structure of a text. compareSignals(a, b) uses the phase delta to classify relations (e.g., Q&A pairs sit in complementary phase, shifted by $\pi$).
  • Multi-Resolution Wavelets: Documents carry information at different scales. contextLayers models this as a semantic wavelet, exposing layers from global summaries down to raw fact ledgers.
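
The link between graph centrality and the first left singular vector can be checked numerically. The sketch below is an assumption-laden illustration, not graviton's code: it runs undamped power iteration on the similarity graph $A A^\top$ but evaluates each step as $A (A^\top x)$, so the $N \times N$ graph is never built and each step costs $O(N \cdot K)$. The function name dominantLeftSingularVector is hypothetical.

```typescript
// Hypothetical sketch: the dominant eigenvector of the similarity graph S = A * A^T,
// computed as A * (A^T * x) so each step costs O(N*K) and S is never materialized.
function dominantLeftSingularVector(A: number[][], iterations = 100): number[] {
  const n = A.length
  const k = A[0].length
  let x: number[] = new Array(n).fill(1 / Math.sqrt(n))
  for (let it = 0; it < iterations; it++) {
    // y = A^T x (length K)
    const y: number[] = new Array(k).fill(0)
    for (let i = 0; i < n; i++) for (let j = 0; j < k; j++) y[j] += A[i][j] * x[i]
    // z = A y (length N), then renormalize to unit length
    const z = A.map(row => row.reduce((s, a, j) => s + a * y[j], 0))
    const norm = Math.sqrt(z.reduce((s, v) => s + v * v, 0))
    x = z.map(v => v / norm)
  }
  return x
}

// Three sentence embeddings: the first two are similar, the third is an outlier.
const A = [
  [1.0, 0.1],
  [0.9, 0.2],
  [0.1, 1.0],
]
const u0 = dominantLeftSingularVector(A)
console.log(u0) // the two similar sentences receive the largest centrality scores
```

Damping and row normalization, which full TextRank applies, are deliberately left out here, so this demonstrates the undamped case only.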

Multi-Resolution Prompting

SLMs and LLMs lose accuracy when prompts are flooded with irrelevant text. Select context layers to match your token budget:

const packet = await extractSignal(contractDraft, 'deep')

// 1. High-level conceptual summary only (Minimal tokens)
const summaryContext = packet.contextLayers
  .filter(layer => layer.resolution <= 1)
  .map(layer => `[${layer.label}]\n${layer.content}`)
  .join('\n\n')

// 2. Full semantic profile (Optimal density)
const fullContext = packet.contextLayers
  .map(layer => `[${layer.label}]\n${layer.content}`)
  .join('\n\n')
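
When the limit is a token count rather than a fixed resolution, the same layers can be packed greedily. The sketch below is a hypothetical helper, not part of graviton: it assumes lower resolution values denote coarser layers (consistent with the filter above) and uses a rough 4-characters-per-token estimate. The names fitLayers, estimateTokens, and the ContextLayer interface are illustrative.

```typescript
// Hypothetical helper: keep the coarsest layers first and stop once the budget is spent.
// The 4-characters-per-token estimate is a rough assumption, not a real tokenizer.
interface ContextLayer { resolution: number; label: string; content: string }

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

function fitLayers(layers: ContextLayer[], maxTokens: number): string {
  const blocks: string[] = []
  let used = 0
  const ordered = [...layers].sort((a, b) => a.resolution - b.resolution)
  for (const layer of ordered) {
    const block = `[${layer.label}]\n${layer.content}`
    const cost = estimateTokens(block)
    if (used + cost > maxTokens) break
    blocks.push(block)
    used += cost
  }
  return blocks.join('\n\n')
}

// Mock layers standing in for packet.contextLayers.
const layers: ContextLayer[] = [
  { resolution: 0, label: 'SUMMARY', content: 'Supply agreement between two parties.' },
  { resolution: 1, label: 'TOPICS', content: 'Pricing, delivery, termination.' },
  { resolution: 2, label: 'FACTS', content: 'x'.repeat(400) },
]
const promptText = fitLayers(layers, 40)
console.log(promptText) // SUMMARY and TOPICS fit; FACTS is dropped
```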

Version Tracking & Revision Audits

Track documents over time, detect drift, and compare revisions structurally.

1. Document Relationships (compareSignals)

import { extractSignal, compareSignals } from '@zero-intelligence/graviton/signal'

const qPacket = await extractSignal(questionText)
const aPacket = await extractSignal(answerText)

const relation = compareSignals(qPacket, aPacket)
// relation.phaseRelationship -> 'complementary' (Q&A align on opposite phases)
// relation.phaseRelationship -> 'aligned' (same content/stance)
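
One way to picture the classification is to average the cosine of the per-bin phase difference: values near +1 mean the spectra are in phase, values near -1 mean they are shifted by $\pi$. The sketch below is purely illustrative. The function classifyPhase, the 0.8 tolerance, and the 'unrelated' label are assumptions and do not describe what compareSignals actually returns.

```typescript
// Hypothetical sketch of phase-delta classification; the threshold and the
// 'unrelated' label are assumptions, not graviton's documented behavior.
type PhaseRelationship = 'aligned' | 'complementary' | 'unrelated'

function classifyPhase(a: number[], b: number[], tolerance = 0.8): PhaseRelationship {
  const n = Math.min(a.length, b.length)
  if (n === 0) return 'unrelated'
  // Mean cosine of the per-bin phase difference: +1 when in phase, -1 when shifted by pi.
  let meanCos = 0
  for (let i = 0; i < n; i++) meanCos += Math.cos(a[i] - b[i])
  meanCos /= n
  if (meanCos >= tolerance) return 'aligned'
  if (meanCos <= -tolerance) return 'complementary'
  return 'unrelated'
}

const phases = [0.2, 1.1, -0.7, 2.4]
const shifted = phases.map(p => p + Math.PI)
console.log(classifyPhase(phases, phases))  // 'aligned'
console.log(classifyPhase(phases, shifted)) // 'complementary'
```

Using the cosine of the difference avoids having to wrap angles into a fixed range before comparing them.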

2. Semantic and Fact Differentials (diffSignals)

import { extractSignal, diffSignals } from '@zero-intelligence/graviton/signal'

const v1 = await extractSignal(draftV1)
const v2 = await extractSignal(draftV2)

const diff = diffSignals(v1, v2)
console.log(diff.cosineSimilarity)  // Semantic similarity of content trajectories
console.log(diff.addedKeyTerms)     // Vocabulary introduced in V2
console.log(diff.addedFacts)        // Added structured elements (monetary values, IP addresses, dates)
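
Conceptually, the added and removed lists are set differences between the two packets. A minimal sketch, assuming terms and facts can be compared as plain strings; the helper addedItems is a hypothetical name and not part of graviton's API:

```typescript
// Hypothetical sketch of the added/removed bookkeeping behind a revision diff.
// Returns the items present in `after` that do not appear in `before`.
function addedItems(before: string[], after: string[]): string[] {
  const seen = new Set(before)
  return after.filter(item => !seen.has(item))
}

// Mock fact lists standing in for the facts of two extracted packets.
const factsV1 = ['$10,000', '2024-01-15']
const factsV2 = ['$12,500', '2024-01-15', '10.0.0.1']

const added = addedItems(factsV1, factsV2)
const removed = addedItems(factsV2, factsV1) // swap the arguments to get removals
console.log(added)   // ['$12,500', '10.0.0.1']
console.log(removed) // ['$10,000']
```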

3. Structural Fingerprinting (temporalFingerprint)

Compliance history checks do not require retaining raw text or large JSON packets. Hash the document's phase spectrum into a 64-bit signature instead:

import { extractSignal, temporalFingerprint } from '@zero-intelligence/graviton/signal'
import { hammingDistance } from '@zero-intelligence/graviton/embed'

const v1 = await extractSignal(draftV1)
const v2 = await extractSignal(draftV2)

const fpV1 = temporalFingerprint(v1.phaseSpectrum) // "ffff000001ffffff"
const fpV2 = temporalFingerprint(v2.phaseSpectrum) // "ffff00000fffffff"

const edits = hammingDistance(fpV1, fpV2) // Distance <= 3 indicates structural preservation
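
For reference, the Hamming distance between two hex fingerprints is the number of bit positions where they differ. The sketch below is a hypothetical stand-alone version (hexHamming is not graviton's hammingDistance), applied to the two example fingerprints shown above:

```typescript
// Hypothetical sketch: count differing bits between two equal-length hex strings.
function hexHamming(a: string, b: string): number {
  if (a.length !== b.length) throw new Error('fingerprints must have equal length')
  let distance = 0
  for (let i = 0; i < a.length; i++) {
    // XOR one hex digit (4 bits) at a time, then count the set bits.
    let x = parseInt(a[i], 16) ^ parseInt(b[i], 16)
    while (x) { distance += x & 1; x >>= 1 }
  }
  return distance
}

const bitsChanged = hexHamming('ffff000001ffffff', 'ffff00000fffffff')
console.log(bitsChanged) // 3
```

The two fingerprints differ only in one hex digit (1 versus f), which accounts for the three flipped bits and lands exactly on the preservation threshold.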

Command Line Interface (CLI)

graviton compiles to a zero-dependency CLI binary. It includes an in-process regex-driven PDF text extractor to analyze files directly from the shell.

# Get a styled console summary of any text or PDF file:
npx graviton analyze legal_contract.pdf

# Dump raw JSON metadata with specialized presets:
npx graviton analyze source_code.py --format json --presets dev --dynamic-threshold 0.4

Noise Controls & Fact Presets

Isolate the signals that matter by controlling noise dynamically.

  • DEV_PRESETS: Matches SemVer, Docker images, SQL statements, IP:port combinations, and CSS colors.
  • FINANCIAL_PRESETS: Matches IBANs, credit card numbers, SWIFT codes, and Tax IDs.
  • ACADEMIC_PRESETS: Matches DOIs, arXiv IDs, LaTeX equations, and measurement metrics.

import { extractSignal, DEV_PRESETS } from '@zero-intelligence/graviton/signal'

const packet = await extractSignal(codeText, 'deep', {
  customFactPatterns: [...DEV_PRESETS],
  customStopwords: ['import', 'require', 'const'],
  dynamicStopwordThreshold: 0.4 // Drop any term appearing in >40% of code chunks
})
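
The idea behind a dynamic stopword threshold can be sketched as a chunk-frequency filter: any term that appears in more than the given fraction of chunks is treated as noise. The function dynamicStopwords and its simple tokenizer are assumptions made for illustration; graviton's own tokenization and counting may differ.

```typescript
// Hypothetical sketch of a chunk-frequency stopword filter; not graviton's actual code.
function dynamicStopwords(chunks: string[], threshold: number): string[] {
  const chunkCount = new Map<string, number>()
  for (const chunk of chunks) {
    // Count each term at most once per chunk (assumed tokenizer: lowercase letters and underscores).
    const terms = new Set(chunk.toLowerCase().match(/[a-z_]+/g) ?? [])
    terms.forEach(term => chunkCount.set(term, (chunkCount.get(term) ?? 0) + 1))
  }
  const stopwords: string[] = []
  chunkCount.forEach((count, term) => {
    if (count / chunks.length > threshold) stopwords.push(term)
  })
  return stopwords.sort()
}

const chunks = [
  'const fs = require("fs")',
  'const path = require("path")',
  'const data = fs.readFileSync(file)',
  'console.log(path.basename(file))',
  'export default data',
]
console.log(dynamicStopwords(chunks, 0.4)) // ['const']
```

Here const appears in 3 of 5 chunks (60%) and is dropped, while terms that appear in exactly 40% of chunks stay because the comparison is strict.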

Framework Integrations

Native adapters to compress prompt sizes dynamically before invocation:

// 1. Vercel AI SDK
import { gravitonAiMiddleware } from '@zero-intelligence/graviton/integrations'
import { wrapLanguageModel } from 'ai'
import { openai } from '@ai-sdk/openai'

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: gravitonAiMiddleware({ minLength: 500, format: 'latent' })
})

// 2. LangChain JS
import { GravitonDocumentCompressor } from '@zero-intelligence/graviton/integrations'
const compressor = new GravitonDocumentCompressor({ preset: 'standard' })
const denseDocs = await compressor.transformDocuments(docs)

// 3. Mastra JS
import { gravitonMastraExecute } from '@zero-intelligence/graviton/integrations'
import { createTool } from '@mastra/core/tools'
import { z } from 'zod'

const compressTool = createTool({
  id: 'compress-document',
  description: 'Mathematically compress a text document',
  inputSchema: z.object({ text: z.string() }),
  execute: gravitonMastraExecute({ preset: 'deep' })
})

License

MIT © zero-intelligence
