emremy/ColQL


ColQL is a zero-dependency, process-local, in-memory columnar query engine for TypeScript apps that need compact storage, typed schemas, explicit indexes, and safe mutations.

It is not a SQL database, persistence layer, distributed system, or durable storage format. ColQL is for data you already want to keep inside a Node.js process.

Why ColQL?

  • Compact columnar storage backed by typed arrays, dictionaries, and bit-packed booleans
  • Lazy queries with filtering, projection, aggregation, streaming, limit, and offset
  • Object predicates plus tuple-style where(column, operator, value)
  • Explicit equality indexes and sorted numeric indexes for hot predicates
  • Internal background-indexing architecture for large dirty-index rebuilds
  • Unique indexes for stable ID lookups and duplicate-key protection
  • Public query.explain() diagnostics for planner visibility without executing queries
  • JS Array migration helpers such as fromRows, firstWhere, countWhere, and exists
  • Mutable tables with updateMany and deleteMany
  • Runtime validation with structured ColQLError codes
  • Binary serialization for table data
  • TypeScript inference for rows, predicates, projections, and mutation payloads
  • Zero runtime dependencies
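The first bullet's storage claims can be sketched without ColQL itself. The following is an illustration of the two underlying ideas — dictionary encoding and bit-packed booleans — not ColQL's actual internals: a dictionary column stores each distinct string once and keeps only small integer codes per row, and a boolean column packs eight flags per byte instead of holding heap booleans.

```typescript
// Sketch only: the ideas behind compact columnar storage, not ColQL internals.

// Dictionary column: distinct strings stored once, rows hold Uint8 codes.
function encodeDictionary(values: string[]): { dict: string[]; codes: Uint8Array } {
  const dict: string[] = [];
  const lookup = new Map<string, number>();
  const codes = new Uint8Array(values.length);
  values.forEach((v, i) => {
    let code = lookup.get(v);
    if (code === undefined) {
      code = dict.length;
      dict.push(v);
      lookup.set(v, code);
    }
    codes[i] = code;
  });
  return { dict, codes };
}

// Bit-packed booleans: 8 flags per byte.
function packBooleans(flags: boolean[]): Uint8Array {
  const packed = new Uint8Array(Math.ceil(flags.length / 8));
  flags.forEach((flag, i) => {
    if (flag) packed[i >> 3] |= 1 << (i & 7);
  });
  return packed;
}

function getBit(packed: Uint8Array, i: number): boolean {
  return (packed[i >> 3] & (1 << (i & 7))) !== 0;
}
```

A million-row status column with three distinct values costs roughly one byte per row plus three strings under this layout, which is where the memory savings over arrays of objects come from.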

Install

npm install @colql/colql

Quick Example

import { column, table } from "@colql/colql";

const users = table({
  id: column.uint32(),
  age: column.uint8(),
  status: column.dictionary(["active", "passive", "archived"] as const),
  score: column.float64(),
  verified: column.boolean(),
});

users.insertMany([
  { id: 1, age: 29, status: "active", score: 91.5, verified: true },
  { id: 2, age: 17, status: "passive", score: 72.0, verified: false },
  { id: 3, age: 44, status: "active", score: 88.2, verified: true },
]);

users.createIndex("status");
users.createSortedIndex("age");
users.createUniqueIndex("id");

const activeAdults = users
  .where({
    status: "active",
    age: { gte: 18 },
  })
  .select(["id", "age", "score"])
  .toArray();

console.log(
  users.where({ status: "active", age: { gte: 18 } }).select(["id"]).explain(),
);

const result = users.updateMany(
  { status: "passive", age: { lt: 18 } },
  { status: "archived" },
);

console.log(activeAdults);
console.log(result.affectedRows);

console.log(users.findBy("id", 1));

Index Lifecycle

Users define indexes; ColQL manages index lifecycle internally. Equality and sorted indexes are derived performance structures, and query results must match full-scan results whether an index is fresh, dirty, queued, rebuilding, failed, or absent.

v0.6.0 adds the internal building blocks needed for background equality and sorted rebuilds: storage descriptors, lifecycle states, generation checks, bounded worker infrastructure, a real Node worker_threads executor, atomic apply guards, diagnostics, and benchmarks. The normal public query APIs remain synchronous, and the default execution path still preserves the v0.5-style synchronous dirty-index rebuild fallback until automatic background scheduling is exposed.

Background worker rebuilds are not a universal speedup. Small and medium datasets may be faster with synchronous rebuild because worker startup, message passing, and merge overhead can dominate. The worker path is intended to isolate user-facing latency for large dirty index rebuilds, especially sorted indexes.

Queued, rebuilding, and failed indexes are not used for query results. ColQL falls back to a scan or another fresh index to preserve correctness. query.explain() remains non-executing: it does not scan rows, materialize rows, call onQuery, rebuild dirty indexes, or schedule background jobs.
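The correctness contract above can be illustrated independently of ColQL's internals. The sketch below (an assumption for illustration, not ColQL's actual index structures) builds a sorted numeric index as (value, rowIndex) pairs, answers a gte range query via binary search, and checks that the result matches a plain full scan — the same invariant ColQL promises whether an index is fresh or absent.

```typescript
// Sketch only: a minimal sorted numeric index and its full-scan equivalent.
type Entry = { value: number; row: number };

// Build the index: pair every value with its physical row, sorted by value.
function buildSortedIndex(column: number[]): Entry[] {
  return column
    .map((value, row) => ({ value, row }))
    .sort((a, b) => a.value - b.value || a.row - b.row);
}

// Binary search: first position whose value is >= target.
function lowerBound(index: Entry[], target: number): number {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].value < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Indexed range query: everything from lowerBound onward matches gte.
function rowsGte(index: Entry[], target: number): number[] {
  return index
    .slice(lowerBound(index, target))
    .map((e) => e.row)
    .sort((a, b) => a - b);
}

// Full-scan fallback: must return the same row set as the indexed path.
function rowsGteScan(column: number[], target: number): number[] {
  return column.flatMap((v, row) => (v >= target ? [row] : []));
}
```

For a column `[29, 17, 44]`, both `rowsGte(buildSortedIndex(column), 18)` and `rowsGteScan(column, 18)` yield `[0, 2]`; a dirty or missing index only changes which path runs, never the answer.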

Serialization stores table data only. It does not serialize trusted indexes, index lifecycle state, or worker jobs; recreate indexes after restore when indexed performance or unique-index enforcement is needed.
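The data-only shape of serialization can be sketched with plain typed arrays. This is not ColQL's wire format — it is a hypothetical length-prefixed layout for a single float64 column, showing why restoring data is cheap while indexes must be rebuilt afterward (they are derived structures, not part of the bytes).

```typescript
// Sketch only: length-prefixed binary round trip for one float64 column.
function serializeColumn(column: Float64Array): ArrayBuffer {
  const buffer = new ArrayBuffer(4 + column.length * 8);
  const view = new DataView(buffer);
  view.setUint32(0, column.length, true); // row count, little-endian
  column.forEach((v, i) => view.setFloat64(4 + i * 8, v, true));
  return buffer;
}

function deserializeColumn(buffer: ArrayBuffer): Float64Array {
  const view = new DataView(buffer);
  const length = view.getUint32(0, true);
  const column = new Float64Array(length);
  for (let i = 0; i < length; i++) {
    column[i] = view.getFloat64(4 + i * 8, true);
  }
  return column;
}
```

Nothing about indexes survives a round trip like this, which is why the README tells you to call createIndex / createUniqueIndex again after table.deserialize.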

Performance Snapshot

ColQL includes a Fastify example that can boot with 1M deterministic rows and exercise indexed, range, scan, callback-filter, mutation, stress, and memory paths.

These are local reference numbers from examples/fastify-api after the example's mutation validation path, not guarantees. Actual numbers vary by Node.js version, CPU, memory pressure, data distribution, query selectivity, projection size, and mutation frequency.

| 1M-row workload | Local avg | Expected shape |
| --- | --- | --- |
| Selective equality query with createIndex() | 2.06 ms | Fastest path when candidate sets are small |
| Numeric range query with createSortedIndex() | 27.69 ms | Helps selective ranges; broad ranges may resemble scans |
| Broad structured predicate | 25.31 ms | May intentionally scan when an index is not selective |
| filter(fn) callback predicate | 218.93 ms | Full-scan escape hatch; not index-aware |

ColQL also includes a JS Array comparison benchmark. It is meant to show tradeoffs, not to prove that ColQL is always faster. In local 1M-row runs, JS arrays are often faster for simple broad scans and callback predicates, while ColQL is strongest when compact storage, projection with limits, structured predicates, or explicit indexed lookups matter.

Run the example locally:

cd examples/fastify-api
npm install
npm run test:large

For benchmark scripts and interpretation notes, see Performance and Benchmarks.

  • For PR-level regression checks, run npm run bench:codspeed. CodSpeed uses smaller deterministic datasets as regression signals, not absolute throughput claims. Setup is excluded where possible, while destructive mutation benchmarks that need fresh tables use setup-inclusive names. Larger 1M-row comparisons remain covered by the local/manual benchmark scripts.
  • For JS Array comparisons, run npm run benchmark:array-comparison. Results are local guidance, not universal promises or CI requirements.
  • For a scenario-style local workload, run npm run benchmark:session-analytics.
  • For background-indexing regression visibility, run npm run benchmark:background-indexing -- --json and npm run benchmark:worker-runtime -- --json.

When To Use ColQL

Use ColQL when:

  • you need to keep thousands to millions of records in memory
  • JavaScript object arrays use too much memory
  • filters and aggregations should avoid intermediate arrays
  • a TypeScript schema can describe your columns
  • explicit indexes are acceptable for hot equality or range predicates
  • stable identity can be modeled with an explicit ID column and unique index
  • runtime validation matters because data may come from untyped sources

Avoid ColQL when:

  • you need durable storage, transactions, joins, or SQL
  • data must be shared across pods, workers, processes, or machines
  • writes dominate the workload and broad indexes are frequently dirtied
  • row indexes must be stable external identifiers
  • a small/simple JavaScript array is already clear and fast enough
  • every query requires arbitrary sorting or grouping
  • you need concurrent writers or multi-process coordination
  • you want automatic indexes, compound indexes, or query planning across tables
  • you want analytical SQL over files or large columnar datasets, where DuckDB may be a better fit

Row indexes are physical positions and can change after deletes. Use an explicit id column for stable identity.
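The instability of physical positions is easy to demonstrate with a plain array (a sketch of the failure mode, not ColQL code): a position captured before a delete goes stale, while a lookup by an explicit id — what a unique index provides — keeps working.

```typescript
// Sketch only: why stored row positions break after deletes.
type Row = { id: number; score: number };

const rows: Row[] = [
  { id: 1, score: 91.5 },
  { id: 2, score: 72.0 },
  { id: 3, score: 88.2 },
];

const positionOfId3 = 2; // physical position captured before a delete
rows.splice(0, 1);       // delete the first row; every later row shifts up

// The stale position now points past the end of the data:
const byPosition = rows[positionOfId3]; // undefined

// An id lookup (what findBy with a unique index gives you) still works:
const byId = rows.find((r) => r.id === 3); // { id: 3, score: 88.2 }
```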

Decision Guide

| Tool | Good fit | Not a good fit |
| --- | --- | --- |
| ColQL | Process-local TypeScript data that benefits from compact in-memory columns, explicit indexes, validation, and inspectable query plans | Persistence, SQL, joins, transactions, shared state, or distributed coordination |
| JavaScript arrays | Small or simple datasets, ad hoc transforms, or write-heavy logic where object arrays are already clear and fast enough | Memory-sensitive data, repeated projections, structured indexed lookups, or runtime schema validation |
| SQLite | Durable embedded relational storage with SQL, transactions, and indexes | Pure process-local ephemeral caches where a database file and SQL layer are unnecessary |
| DuckDB | Analytical SQL, file-based analytics, large columnar datasets, and ad hoc aggregations | TypeScript-first mutable process-local tables with explicit in-memory indexes |

Examples

The Fastify example demonstrates HTTP query params mapped to object predicates, range queries, filter(fn), updateMany, deleteMany, query diagnostics, index stats, and memory counters.
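The query-param mapping the example demonstrates can be sketched as a small helper. This is a hypothetical function, not part of ColQL or the Fastify example: it assumes a flat convention where minX/maxX params become gte/lte range predicates and everything else is an equality match, producing the object-predicate shape that where() accepts.

```typescript
// Sketch only: map flat HTTP query params (?status=active&minAge=18)
// into an object predicate like { status: "active", age: { gte: 18 } }.
type Predicate = Record<string, string | { gte?: number; lte?: number }>;

function toPredicate(query: Record<string, string>): Predicate {
  const predicate: Predicate = {};
  for (const [key, raw] of Object.entries(query)) {
    if (key.startsWith("min")) {
      const column = key.slice(3).toLowerCase(); // minAge -> age
      predicate[column] = { ...(predicate[column] as object), gte: Number(raw) };
    } else if (key.startsWith("max")) {
      const column = key.slice(3).toLowerCase(); // maxAge -> age
      predicate[column] = { ...(predicate[column] as object), lte: Number(raw) };
    } else {
      predicate[key] = raw; // plain equality match
    }
  }
  return predicate;
}
```

A route handler would then pass the result straight through, e.g. users.where(toPredicate(request.query)).toArray(), keeping the HTTP layer free of query-building logic.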

Documentation

Detailed documentation is available under docs/doc.

Recommended reading:

Common APIs

users.insert(row);
users.insertMany(rows);

users.where({ status: "active", age: { gte: 18 } }).toArray();
users.where("age", ">=", 18).select(["id"]).toArray();
users.whereIn("status", ["active", "passive"]);
users.filter((row) => row.score > 90);
users.where({ status: "active" }).explain();

users.count();
users.avg("age");
users.top(10, "score");

users.update(0, { status: "active" });
users.updateMany({ status: "passive" }, { status: "active" });
users.deleteMany({ status: "archived" });

users.createIndex("id");
users.createSortedIndex("age");
users.createUniqueIndex("id");
users.findBy("id", 123);

const buffer = users.serialize();
const restored = table.deserialize(buffer);
restored.createIndex("status");

filter(fn) is intentionally a full-scan escape hatch. Prefer structured predicates when you want index planning. query.explain() returns structured diagnostics without executing the query, scanning rows, materializing rows, calling onQuery, or rebuilding dirty indexes.

Error Handling

ColQL validates schemas, inserted rows, query predicates, mutation payloads, indexes, and serialized input at runtime. Failures throw ColQLError with a stable code, a message, and optional details.

import { ColQLError } from "@colql/colql";

try {
  users.insert({ id: 4, age: 300, status: "active", score: 1, verified: true });
} catch (error) {
  if (error instanceof ColQLError) {
    console.log(error.code); // COLQL_OUT_OF_RANGE
  }
}

Development

npm install
npm run check
npm test
npm run test:types
npm run build
npm run test:worker-runtime
npm run bench:codspeed
npm run benchmark:memory
npm run benchmark:query
npm run benchmark:indexed
npm run benchmark:range
npm run benchmark:optimizer
npm run benchmark:serialization
npm run benchmark:delete
npm run benchmark:array-comparison
npm run benchmark:session-analytics
npm run benchmark:background-indexing -- --json
npm run benchmark:worker-runtime -- --json

Status

ColQL v0.6.0 focuses on zero-copy background-indexing architecture while preserving synchronous public query APIs and v0.5-style synchronous dirty-index fallback behavior. Breaking changes may still happen before 1.0.0, but the project is moving toward API stabilization.

Limitations

ColQL intentionally does not include SQL parsing, joins, transactions, concurrency control, automatic index creation, compound indexes, or durable storage. Equality and sorted indexes are derived performance structures; query results must be the same whether ColQL uses an index or a full scan. Unique indexes are derived too, but they also enforce uniqueness while present and are not serialized.