Merged
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -28,5 +28,8 @@ jobs:
- name: Build
run: npm run build

- name: Type tests
run: npm run test:types

- name: Test
run: npm test
4 changes: 3 additions & 1 deletion README.md
@@ -222,7 +222,9 @@ try {

```sh
npm install
npm run check
npm test
npm run test:types
npm run build
npm run bench:codspeed
npm run benchmark:memory
@@ -238,7 +240,7 @@ npm run benchmark:session-analytics

## Status

ColQL v0.4.x introduces public query diagnostics and continues moving toward API stabilization, but breaking changes may still happen before 1.0.0. The API is not fully frozen.
ColQL v0.5.x focuses on trust and stability hardening: narrower index invalidation after updates, stricter snapshot deserialization, a type-test gate for public TypeScript behavior, and clearer diagnostics. Breaking changes may still happen before 1.0.0, but the project is moving toward API stabilization.

## Limitations

11 changes: 11 additions & 0 deletions benchmarks/codspeed/mutation.bench.ts
@@ -8,6 +8,7 @@ import {

let lazyRebuildTenantId = 62;
let batchScore = 25;
let unrelatedDuration = 30_000;

describe("mutation", () => {
bench("mutation/updateBy/single/10k", () => {
@@ -46,4 +47,14 @@ describe("mutation", () => {
mediumSessions.indexed.updateBy("id", 7_501, { tenantId: lazyRebuildTenantId });
mediumSessions.indexed.where("tenantId", "=", lazyRebuildTenantId).count();
});

bench("index/no-rebuild/after-unindexed-column-mutation/10k", () => {
unrelatedDuration = unrelatedDuration === 30_000 ? 45_000 : 30_000;
mediumSessions.indexed.updateBy("id", 7_502, { durationMs: unrelatedDuration });
mediumSessions.indexed.where("tenantId", "=", dashboardTenantId).count();
});

bench("index/requery/after-lazy-rebuild/10k", () => {
mediumSessions.indexed.where("tenantId", "=", lazyRebuildTenantId).count();
});
});
2 changes: 2 additions & 0 deletions docs/doc/01-installation.md
@@ -55,7 +55,9 @@ When contributing locally:

```sh
npm install
npm run check
npm test
npm run test:types
npm run build
```

6 changes: 5 additions & 1 deletion docs/doc/06-indexing.md
@@ -62,7 +62,11 @@ users.where("status", "in", ["active", "passive"]).count();

## Dirty and Lazy Rebuilds

Inserts, deletes, and updates can change internal row positions or indexed values. Row positions are not stable IDs and should not be used as external identifiers. If stable identity is required, define and index an ID column. ColQL marks existing indexes dirty after nonzero mutations. When an indexed query requires a dirty index, ColQL rebuilds it before use. The first indexed query after a mutation may be slower than later queries.
Inserts, deletes, and updates can change internal row positions or indexed values. Row positions are not stable IDs and should not be used as external identifiers. If stable identity is required, define and index an ID column.

An update dirties only the indexes whose indexed columns changed. Updating an unrelated column does not dirty equality or sorted indexes on other columns. Deletes still dirty equality and sorted indexes broadly because physical row positions shift. Inserts update clean equality indexes incrementally and mark sorted indexes dirty.

When an indexed query requires a dirty index, ColQL rebuilds it before use. The first indexed query after a relevant indexed-column mutation may be slower than later queries.
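
The dirtying rules above can be pictured with a small toy model. This is only a sketch under stated assumptions, not ColQL's internal code: the `DirtyTracker` class and its method names are hypothetical, it treats every column named in a patch as changed, and it leaves out the incremental insert path.

```ts
// Toy model of column-scoped index invalidation.
// Hypothetical names throughout; this is NOT ColQL's implementation.
type PatchValue = number | string | boolean;

class DirtyTracker {
  private readonly indexedColumns = new Set<string>();
  private readonly dirtyColumns = new Set<string>();

  addIndex(column: string): void {
    this.indexedColumns.add(column);
  }

  // An update dirties only indexes whose column appears in the patch.
  recordUpdate(patch: Record<string, PatchValue>): void {
    Object.keys(patch).forEach((column) => {
      if (this.indexedColumns.has(column)) this.dirtyColumns.add(column);
    });
  }

  // A delete shifts physical row positions, so every index is dirtied.
  recordDelete(): void {
    this.indexedColumns.forEach((column) => this.dirtyColumns.add(column));
  }

  // Lazy rebuild: clears the dirty flag and reports whether a rebuild
  // had to be paid before the index could be used.
  useIndex(column: string): boolean {
    return this.dirtyColumns.delete(column);
  }
}

const tracker = new DirtyTracker();
tracker.addIndex("tenantId");

tracker.recordUpdate({ durationMs: 45_000 }); // unindexed column
const paidAfterUnrelated = tracker.useIndex("tenantId"); // false

tracker.recordUpdate({ tenantId: 62 }); // indexed column
const paidAfterIndexed = tracker.useIndex("tenantId"); // true
```

In this model only the second query pays a rebuild, which mirrors the behaviour the paragraph describes for updates to unrelated columns.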

You can rebuild explicitly:

2 changes: 1 addition & 1 deletion docs/doc/07-sorted-indexes.md
@@ -53,7 +53,7 @@ Candidate row positions are returned in scan order so query output preserves log

## Dirty and Lazy Rebuilds

Sorted indexes are marked dirty after inserts, deletes, and updates. When an indexed query requires a dirty sorted index, ColQL rebuilds it before use. Dirty sorted indexes are not used to return stale results. You can also rebuild eagerly with:
A sorted index is marked dirty after inserts, after deletes, and after updates to its sorted column. Updates to unrelated columns do not dirty the sorted index. Deletes still dirty sorted indexes because physical row positions shift. When an indexed query requires a dirty sorted index, ColQL rebuilds it before use. Dirty sorted indexes are not used to return stale results. You can also rebuild eagerly with:

```ts
users.rebuildSortedIndex("age");
6 changes: 3 additions & 3 deletions docs/doc/08-mutations.md
@@ -117,10 +117,10 @@ ColQL applies mutation safety rules internally:
- unique-index violations are checked before writing and keep bulk updates all-or-nothing
- predicate deletes delete matched row indexes from highest to lowest
- no-match predicate update/delete returns `{ affectedRows: 0 }`
- nonzero update/delete mutations mark existing indexes dirty
- incremental index maintenance is not attempted
- nonzero updates dirty only indexes for columns changed by the patch
- deletes dirty equality and sorted indexes broadly because row positions shift

Dirty indexes are rebuilt before an indexed query uses them, so index dirtiness affects rebuild cost, not query correctness.
Dirty indexes are rebuilt before an indexed query uses them, so index dirtiness affects rebuild cost, not query correctness. Updating an unindexed column should not make later indexed reads pay a lazy rebuild.

Unique indexes are stricter than equality and sorted indexes. They enforce uniqueness for indexed columns and reject duplicate-producing inserts or updates with `COLQL_DUPLICATE_KEY`. Deletes free unique keys for reuse.

14 changes: 10 additions & 4 deletions docs/doc/11-serialization.md
@@ -1,6 +1,6 @@
# Serialization

ColQL can serialize a table to an `ArrayBuffer`:
ColQL can serialize a process-local table snapshot to an `ArrayBuffer`:

```ts
const buffer = users.serialize();
@@ -9,7 +9,7 @@ const restored = table.deserialize(buffer);

## What Is Serialized

Serialization stores:
Snapshot serialization stores:

- schema metadata
- row count
@@ -18,7 +18,7 @@ Serialization stores:
- dictionary column codes and dictionary values
- boolean bit storage

Serialization does not materialize row objects.
Serialization does not materialize row objects. It is not durable storage, a database file format, or a cross-process coordination mechanism.

## What Is Not Serialized

@@ -79,6 +79,12 @@ console.log(restored.toArray());

## Validation

Deserialization validates the input buffer shape, magic header, version, metadata, and column payload sizes. Invalid input throws `ColQLError` with `COLQL_INVALID_SERIALIZED_DATA`.
Deserialization validates the input buffer shape, magic header, wire-format version, metadata, column names, row-count/capacity relationship, payload offsets, alignment, payload lengths, dictionary values, and dictionary codes. Invalid input throws `ColQLError` with `COLQL_INVALID_SERIALIZED_DATA`.

## Wire Format Policy

The serialized wire-format version is independent from the npm package version. Patch and minor releases should preserve the current wire format when possible, but ColQL is still pre-1.0 and unsupported snapshot versions fail loudly with `COLQL_INVALID_SERIALIZED_DATA`. Snapshots produced by v0.4.x are not guaranteed to be compatible with v0.5.0.

Indexes are never trusted from serialized input. If future metadata contains serialized index state, current ColQL versions reject it rather than loading stale derived row positions.
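
As a rough illustration of the header and wire-format version checks described above, the sketch below validates a made-up 8-byte header. The layout, the `SKETCH_MAGIC` and `SKETCH_WIRE_VERSION` constants, and the `readSketchHeader` helper are all assumptions for this example; ColQL's real snapshot layout is not shown in this document. Only the `COLQL_INVALID_SERIALIZED_DATA` code is taken from the text.

```ts
// Hypothetical header layout, for illustration only:
//   bytes 0..3  magic (uint32, little-endian)
//   bytes 4..7  wire-format version (uint32, little-endian)
// These constants are invented and are NOT ColQL's real values.
const SKETCH_MAGIC = 0x434f4c51;
const SKETCH_WIRE_VERSION = 1;
const SKETCH_HEADER_BYTES = 8;

// Builds a plain Error carrying the error code named in the docs.
function invalidSnapshot(message: string): Error & { code: string } {
  return Object.assign(new Error(message), {
    code: "COLQL_INVALID_SERIALIZED_DATA",
  });
}

// Rejects short buffers, wrong magic, and unsupported versions up front,
// before any column payload would be read.
function readSketchHeader(buffer: ArrayBuffer): { version: number } {
  if (buffer.byteLength < SKETCH_HEADER_BYTES) {
    throw invalidSnapshot("buffer is too short to contain a header");
  }
  const view = new DataView(buffer);
  if (view.getUint32(0, true) !== SKETCH_MAGIC) {
    throw invalidSnapshot("magic header mismatch");
  }
  const version = view.getUint32(4, true);
  if (version !== SKETCH_WIRE_VERSION) {
    throw invalidSnapshot("unsupported wire-format version " + version);
  }
  return { version };
}
```

Checking the version separately from the package version is what lets an unsupported snapshot fail loudly instead of being partially decoded.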

See [Error Handling](./10-error-handling.md) and [Indexing](./06-indexing.md).
2 changes: 1 addition & 1 deletion docs/doc/12-memory-model.md
@@ -91,7 +91,7 @@ For very broad changes, expect temporary row-index snapshot memory.
- Use indexes for selective hot queries.
- Avoid indexes for columns with low selectivity unless queries prove useful.
- Drop indexes to recover derived-memory overhead.
- Expect the first indexed query after mutation to include lazy rebuild cost if the needed index is dirty.
- Expect the first indexed query after a relevant indexed-column mutation to include lazy rebuild cost if the needed index is dirty. Updates to unrelated columns should not dirty unrelated indexes.
- Avoid `toArray()` for huge result sets when counting or streaming is enough.
- Remember that `heapUsed` alone can under-report typed-array storage; inspect `arrayBuffers` too.

30 changes: 30 additions & 0 deletions docs/doc/13-performance-and-benchmarks.md
@@ -58,6 +58,8 @@ These benchmarks live in `benchmarks/codspeed/` and intentionally use smaller de

CodSpeed covers representative equality-index, `in`, sorted-range, compound filter, projection pushdown, larger filtered materialization, aggregation, scan fallback, mutation, lazy index rebuild, serialization, and backend-style dashboard query scenarios.

For v0.5.x, CodSpeed also tracks the difference between updates that touch indexed columns and updates that touch unrelated columns. Updating an unrelated column should not force a later indexed query to pay lazy rebuild cost.

Treat CodSpeed results as PR-level regression signals, not absolute production throughput claims. The existing manual benchmarks remain the source for larger local runs, memory analysis, 1M-row comparisons, and workload-specific investigation.

Setup is excluded from measured hot paths where doing so keeps the benchmark valid. Some destructive mutation benchmarks cannot safely reuse a table after each iteration, so they include fresh table setup by design; those benchmarks include `setup-inclusive` in their names. High-RME benchmarks should be treated carefully and improved before they are used as release claims.
@@ -154,6 +156,34 @@ tracked total = heapUsed + arrayBuffers

Use tracked total when comparing ColQL storage with object arrays.

## Release Benchmark Checklist

Before a release, run the correctness gates first:

```sh
npm run build
npm run test:types
npm test
npm run bench:codspeed
```

Then run the local/manual benchmark suite as release evidence, not hard pass/fail thresholds:

```sh
npm run benchmark:memory
npm run benchmark:query
npm run benchmark:indexed
npm run benchmark:range
npm run benchmark:optimizer
npm run benchmark:serialization
npm run benchmark:delete
npm run benchmark:physical-delete
npm run benchmark:array-comparison
ROWS=100000 npm run benchmark:session-analytics
```

Use `COLQL_BENCH_LARGE=1` for indexed and range benchmarks when checking larger local datasets. Memory-sensitive release notes should include `heapUsed`, `rss`, `external`, `arrayBuffers`, and tracked total when those metrics are available.

In a local stabilization run on 2026-04-29, `benchmark:memory` reported for 100,000 rows:

| Storage | heapUsed | arrayBuffers | tracked total |
8 changes: 7 additions & 1 deletion docs/doc/14-typescript-type-safety.md
@@ -158,4 +158,10 @@ Structured helper predicates use `where(...)`; callback predicates use `filter(f

## Type Tests

The repository includes `tests/type-inference.test-d.ts` with `@ts-expect-error` examples. These are useful references for the intended type surface.
The repository includes `tests/type-inference.test-d.ts` with positive inference checks and `@ts-expect-error` examples. These are part of the release gate:

```sh
npm run test:types
```

This protects the public TypeScript surface for predicates, projections, mutations, unique indexes, `query.explain()`, and `onQuery`.
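
To show what a positive inference check and an `@ts-expect-error` check look like in practice, here is a minimal sketch. The `SketchUser` type and `whereEq` helper are hypothetical stand-ins, not ColQL's actual types or the contents of `tests/type-inference.test-d.ts`.

```ts
// Hypothetical miniature of a typed predicate helper; not ColQL's API.
type SketchUser = { id: number; status: "active" | "passive"; age: number };

function whereEq<K extends keyof SketchUser>(
  column: K,
  value: SketchUser[K],
): { column: K; value: SketchUser[K] } {
  return { column, value };
}

// Positive inference check: the value type follows the column name.
const byStatus = whereEq("status", "active");
const statusValue: "active" | "passive" = byStatus.value;

// Negative checks: each call below must fail to compile. If one starts
// compiling, tsc reports an unused @ts-expect-error directive instead.
// @ts-expect-error "name" is not a column of SketchUser
whereEq("name", "x");
// @ts-expect-error the age column expects a number, not a string
whereEq("age", "forty");
```

Because a run such as `tsc --noEmit` fails both when a positive check stops compiling and when a negative check starts compiling, a file in this style can guard the type surface in both directions.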
2 changes: 1 addition & 1 deletion docs/doc/15-limitations-and-design-decisions.md
@@ -2,7 +2,7 @@

ColQL intentionally keeps a narrow, explicit feature set.

ColQL aims to keep the public API reasonably stable, and v0.4.x adds a public diagnostics API with `query.explain()`. Breaking changes may still happen before 1.0.0; the API is not fully frozen.
ColQL aims to keep the public API reasonably stable, and v0.5.x continues hardening public diagnostics, serialization validation, and type/API gates. Breaking changes may still happen before 1.0.0; the API is not fully frozen.

## Not Included

14 changes: 13 additions & 1 deletion docs/doc/16-api-reference.md
@@ -40,8 +40,18 @@ const restored = table.deserialize(buffer);
```ts
type QueryInfo = {
duration: number;
durationMs?: number;
rowsScanned: number;
indexUsed: boolean;
scanType?: "index" | "full";
selectedIndex?: string;
reasonCode?: QueryExplainReasonCode;
candidateRows?: number;
materializedRows?: number;
resultCount?: number;
projectionPushdown?: boolean;
dirtyIndexRebuildPaid?: boolean;
dirtyIndexReason?: "equality" | "sorted" | "unique";
};

type QueryHook = (info: QueryInfo) => void;
@@ -234,6 +244,7 @@ type QueryExplainReasonCode =
type QueryExplainPlan = {
scanType: "index" | "full";
indexesUsed: readonly string[];
selectedIndex?: string;
predicates: number;
predicateOrder: readonly string[];
projectionPushdown: boolean;
@@ -248,6 +259,7 @@ Fields:

- `scanType`: whether execution is expected to use an index or full scan.
- `indexesUsed`: selected index labels such as `equality:status` or `sorted:startedAt`.
- `selectedIndex`: the selected index label when an index plan is expected.
- `predicates`: structured predicates plus callback predicates.
- `predicateOrder`: structured predicate evaluation order after planner ordering.
- `projectionPushdown`: `true` when `select(...)` limits materialized columns.
@@ -371,7 +383,7 @@ users.getIndexedCandidatePlan(filters);
users.getIndexDebugPlan(filters);
```

Use `query.explain()` for public query diagnostics. Queries still expose `__debugPlan()` for internal tests and low-level debugging, but application code should not depend on it as a stable planning contract.
Use `query.explain()` for stable public query diagnostics. The scan/materialization counters and typed reads are advanced diagnostics. `getIndexedCandidatePlan()`, `getIndexDebugPlan()`, and query `__debugPlan()` are unstable internal diagnostics retained for tests and low-level debugging; application code should not depend on them as stable planning contracts.

## Errors

4 changes: 2 additions & 2 deletions package-lock.json

4 changes: 3 additions & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@colql/colql",
"version": "0.4.0",
"version": "0.5.0",
"description": "Memory-efficient in-memory columnar query engine for TypeScript",
"main": "dist/index.js",
"module": "dist/index.mjs",
@@ -12,6 +12,8 @@
"build": "tsup src/index.ts --format cjs,esm --dts",
"dev": "tsup src/index.ts --watch",
"test": "vitest run",
"test:types": "tsc --noEmit -p tsconfig.type-tests.json",
"check": "npm run build && npm run test:types && npm test",
"bench:codspeed": "vitest bench --run --config vitest.bench.config.mts",
"benchmark:memory": "node --expose-gc benchmarks/memory.mjs",
"benchmark:query": "node benchmarks/query.mjs",