Merged
4 changes: 3 additions & 1 deletion docs/doc/00-overview.md
@@ -40,7 +40,7 @@ Use ColQL when:
Avoid ColQL when:

- you need durable storage, transactions, joins, or SQL
- row indexes must be stable identifiers
- row indexes must be stable external identifiers
- every query requires arbitrary sorting or grouping
- you need concurrent writers or multi-process coordination

@@ -59,6 +59,8 @@ Avoid ColQL when:
- Runtime validation and structured `ColQLError` failures
- Binary serialization and deserialization

Indexes are derived performance structures. Query results must be the same whether ColQL uses an index or a full scan.
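
The equivalence rule can be treated as a testable property: for any data, an index lookup must return exactly the row positions a full scan returns, in the same order. A minimal sketch with plain arrays standing in for column storage; `scanEquals`, `buildEqualityIndex`, and `indexEquals` are hypothetical helpers, not ColQL APIs.

```ts
// Hypothetical helpers (not ColQL APIs) for checking index/scan equivalence.

// Full scan: test every row position from 0 to rowCount - 1, in order.
function scanEquals(column: readonly number[], value: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < column.length; i++) {
    if (column[i] === value) out.push(i);
  }
  return out;
}

// Equality index: value -> row positions, appended in ascending order.
function buildEqualityIndex(column: readonly number[]): Map<number, number[]> {
  const index = new Map<number, number[]>();
  column.forEach((v, i) => {
    const bucket = index.get(v);
    if (bucket) bucket.push(i);
    else index.set(v, [i]);
  });
  return index;
}

// Indexed lookup; returning a copy keeps callers from mutating the bucket.
function indexEquals(index: Map<number, number[]>, value: number): number[] {
  return (index.get(value) ?? []).slice();
}

// Property check: both access paths must agree for every probe value.
const ages = [31, 25, 31, 40, 25, 31];
const ageIndex = buildEqualityIndex(ages);
for (const probe of [25, 31, 40, 99]) {
  const viaScan = scanEquals(ages, probe).join(",");
  const viaIndex = indexEquals(ageIndex, probe).join(",");
  if (viaScan !== viaIndex) throw new Error(`index/scan mismatch for ${probe}`);
}
```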

## Quick Example

```ts
2 changes: 1 addition & 1 deletion docs/doc/03-inserts-and-bulk-loading.md
@@ -24,7 +24,7 @@ users.insertMany([
]);
```

`insertMany` validates every row before inserting any row. If one row is invalid, the table is not partially mutated.
`insertMany` validates every row before inserting any row. If one row is invalid, the table is not partially mutated. It is also optimized for bulk insertion, so prefer it over repeated `insert` calls when loading batches.
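
The all-or-nothing guarantee follows a validate-then-commit pattern: check the whole batch first, then append. A minimal sketch over two hypothetical columns; `Row`, `validateRow`, and `insertManyAtomic` are illustrative names, not ColQL internals, the validation rules are invented for the example, and a plain `Error` stands in for ColQL's structured errors.

```ts
// Hypothetical row shape and column storage, for illustration only.
type Row = { id: number; age: number };

const ids: number[] = [];
const ages: number[] = [];

function validateRow(row: Row, position: number): void {
  if (!Number.isInteger(row.id)) {
    throw new Error(`row ${position}: id must be an integer`);
  }
  if (!Number.isFinite(row.age) || row.age < 0) {
    throw new Error(`row ${position}: age must be a non-negative number`);
  }
}

function insertManyAtomic(rows: readonly Row[]): void {
  // Phase 1: validate every row; a throw here leaves the columns untouched.
  rows.forEach(validateRow);
  // Phase 2: append. Nothing in this batch can fail validation any more.
  for (const row of rows) {
    ids.push(row.id);
    ages.push(row.age);
  }
}

insertManyAtomic([{ id: 1, age: 30 }, { id: 2, age: 41 }]);
try {
  // The second row is invalid, so the first row of this batch is not appended either.
  insertManyAtomic([{ id: 3, age: 20 }, { id: 4, age: -1 }]);
} catch {
  // ids and ages still hold only the first batch
}
```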

```ts
try {
46 changes: 46 additions & 0 deletions docs/doc/04-querying.md
@@ -10,6 +10,21 @@ users.where("status", "=", "active");
users.where("is_active", "=", true);
```

Object predicates are also supported:

```ts
people.where({
age: { gt: 25 },
active: true,
});

people.where({
country: { in: ["TR", "US"] },
});
```

Object-form `where` is syntactic sugar over the same structured predicates as the tuple form `where(column, operator, value)`, so it can still use equality and sorted indexes when the translated predicates are indexable.
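
To make the "syntactic sugar" point concrete, here is a minimal sketch of how an object predicate could be flattened into tuple predicates, using the operator names this page documents (`eq` to `=`, `gt` to `>`, and so on). The `toTuples` function and the `Tuple` type are hypothetical, not ColQL's implementation.

```ts
// Hypothetical predicate types; ColQL's real types may differ.
type Scalar = number | string | boolean;
type TupleOp = "=" | ">" | ">=" | "<" | "<=" | "in";
type Tuple = [column: string, op: TupleOp, value: Scalar | Scalar[]];

// Object-predicate operator names and their tuple equivalents.
const OP_MAP = new Map<string, TupleOp>([
  ["eq", "="],
  ["gt", ">"],
  ["gte", ">="],
  ["lt", "<"],
  ["lte", "<="],
  ["in", "in"],
]);

function toTuples(predicate: Record<string, unknown>): Tuple[] {
  const tuples: Tuple[] = [];
  for (const [column, spec] of Object.entries(predicate)) {
    if (spec !== null && typeof spec === "object" && !Array.isArray(spec)) {
      // Operator object such as { gt: 25, lte: 65 }: one tuple per operator.
      for (const [name, value] of Object.entries(spec)) {
        const op = OP_MAP.get(name);
        if (op === undefined) throw new Error(`unknown operator "${name}"`);
        tuples.push([column, op, value as Scalar | Scalar[]]);
      }
    } else {
      // A bare value such as `active: true` is shorthand for equality.
      tuples.push([column, "=", spec as Scalar]);
    }
  }
  return tuples;
}
```

Because each tuple is an ordinary structured predicate, `{ age: { gt: 25 } }` and `where("age", ">", 25)` reach the planner in the same form, which is what lets the object form use indexes.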

Supported operators:

```txt
@@ -25,6 +40,35 @@ not in

Range operators (`>`, `>=`, `<`, `<=`) are supported for numeric columns. Equality and membership are supported for numeric, dictionary, and boolean columns, subject to validation.

Object predicates use these operator names:

```ts
users.where({ age: { eq: 25 } });
users.where({ age: { gt: 25, lte: 65 } });
users.where({ country: { in: ["TR", "US"] } });
```

Numeric columns support `eq`, `gt`, `gte`, `lt`, `lte`, and `in`. Boolean and dictionary columns support equality (including bare values such as `active: true`) and `in`.
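
The per-kind rules can be summarized as a lookup table. A minimal sketch, assuming the three column kinds are named `numeric`, `dictionary`, and `boolean` and that a bare value is normalized to `eq` before the check; `ALLOWED` and `isAllowed` are hypothetical names, and the exact error ColQL raises for a rejected combination is not shown here.

```ts
// Hypothetical column kinds and object-predicate operator names.
type ColumnKind = "numeric" | "dictionary" | "boolean";
type ObjectOp = "eq" | "gt" | "gte" | "lt" | "lte" | "in";

// Operators accepted per column kind, following the rules on this page.
// Assumption: a bare value is normalized to "eq" before this check runs.
const ALLOWED: Record<ColumnKind, ReadonlySet<ObjectOp>> = {
  numeric: new Set<ObjectOp>(["eq", "gt", "gte", "lt", "lte", "in"]),
  dictionary: new Set<ObjectOp>(["eq", "in"]),
  boolean: new Set<ObjectOp>(["eq", "in"]),
};

function isAllowed(kind: ColumnKind, op: ObjectOp): boolean {
  return ALLOWED[kind].has(op);
}

// Range operators are numeric-only; membership works on every kind.
const rangeOnDictionary = isAllowed("dictionary", "gt"); // false
const membershipOnBoolean = isAllowed("boolean", "in"); // true
```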

## Callback Filters

Use `filter(fn)` as a full-scan escape hatch when a predicate is easier to express in TypeScript:

```ts
users.filter((row) => row.age > 25);
```

Callback filters run after structured predicates:

```ts
const rows = users
.where({ status: "active" })
.filter((row) => row.age > 25)
.toArray();
```

`filter(fn)` is not index-aware. Structured predicates run first and may use an index; the callback is then called on every row that remains eligible, without index help.
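
The ordering can be pictured as a two-stage pipeline: structured predicates produce the candidate row positions, and the callback only sees rows that survive that stage. A minimal sketch over plain objects; `runQuery`, `EqPredicate`, and the row shape are hypothetical, and this is not ColQL's executor.

```ts
type User = { status: string; age: number };

// Hypothetical structured predicate: equality on one column.
type EqPredicate = { column: keyof User; value: string | number };

function runQuery(
  rows: readonly User[],
  structured: readonly EqPredicate[],
  callback?: (row: User) => boolean,
): number[] {
  // Stage 1: structured predicates narrow the candidates. A planner could
  // serve this stage from an index; a scan is used here for brevity.
  const candidates: number[] = [];
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (structured.every((p) => row[p.column] === p.value)) candidates.push(i);
  }
  // Stage 2: the callback runs once per surviving candidate, in row order.
  return callback ? candidates.filter((i) => callback(rows[i])) : candidates;
}

const users: User[] = [
  { status: "active", age: 22 },
  { status: "active", age: 37 },
  { status: "passive", age: 51 },
  { status: "active", age: 29 },
];

let callbackCalls = 0;
const matching = runQuery(users, [{ column: "status", value: "active" }], (row) => {
  callbackCalls++;
  return row.age > 25;
});
// matching is [1, 3]; the callback ran 3 times, never for the passive row.
```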

## Membership Helpers

```ts
@@ -90,4 +134,6 @@ for (const row of users.where("status", "=", "active")) {

Without a usable index, queries scan row indexes from `0` to `rowCount - 1`. If an equality or sorted index exists, ColQL may use it automatically when the planner estimates the candidate set is selective enough. Broad indexed queries may still fall back to scan to avoid index overhead.

Indexes and planner choices affect performance only. A query must return the same result whether ColQL uses an index or a full scan.

See [Equality Indexes](./06-indexing.md) and [Sorted Indexes](./07-sorted-indexes.md).
14 changes: 7 additions & 7 deletions docs/doc/06-indexing.md
@@ -1,6 +1,6 @@
# Equality Indexes

Equality indexes are optional derived structures for selective equality and membership queries.
Equality indexes are optional derived performance structures for selective equality and membership queries. A query must return the same result whether ColQL uses an index or a full scan.

```ts
users.createIndex("id");
@@ -25,7 +25,7 @@ users.rebuildIndexes();

## Supported Columns and Operators

Equality indexes support numeric and dictionary columns. Boolean columns are not supported by equality indexes.
Equality indexes support numeric and dictionary columns. Boolean columns are not supported by equality indexes because scanning low-cardinality boolean values is often as efficient as indexing them.

Indexed operators:

@@ -38,13 +38,13 @@ Not indexed:
- `!=`
- `not in`
- boolean columns
- compound predicates as a combined compound index
- multi-column compound indexes

Queries can still use unsupported predicates; they scan instead.
ColQL does not build a combined index for compound predicates; multiple predicates are combined at query time. Queries can still use predicates that no index supports; those are evaluated by scanning instead. Falling back to a scan affects performance only, not correctness.

## Planner Behavior

ColQL uses a cost-aware planner. If an index exists but would return too many candidate rows, the planner can fall back to a scan. This avoids allocating or iterating a broad index candidate set when a scan is likely cheaper.
ColQL uses a cost-aware planner. If an index exists but would return too many candidate rows, the planner can fall back to a scan. This avoids allocating or iterating a broad index candidate set when a scan is likely cheaper. Planner decisions affect performance only, not query results.

Indexes are most useful for selective predicates:

@@ -62,7 +62,7 @@ users.where("status", "in", ["active", "passive"]).count();

## Dirty and Lazy Rebuilds

Deletes and updates can change row indexes or indexed values. ColQL marks existing indexes dirty after nonzero mutations and rebuilds them lazily when an indexed query needs them. The first indexed query after a mutation may be slower than later queries.
Inserts, deletes, and updates can change internal row positions or indexed values. Row positions are not stable IDs and should not be used as external identifiers. If stable identity is required, define and index an ID column. ColQL marks existing indexes dirty after nonzero mutations. When an indexed query requires a dirty index, ColQL rebuilds it before use. The first indexed query after a mutation may be slower than later queries.

You can rebuild explicitly:

@@ -75,7 +75,7 @@ users.rebuildIndex("status");

## Serialization

Indexes are not serialized. They are derived data and can be recreated:
Indexes are not serialized. They are derived performance data and can be recreated:

```ts
const restored = table.deserialize(buffer);
14 changes: 7 additions & 7 deletions docs/doc/07-sorted-indexes.md
@@ -1,6 +1,6 @@
# Sorted Indexes

Sorted indexes accelerate selective numeric range queries.
Sorted indexes are optional derived performance structures that accelerate selective numeric range queries. A query must return the same result whether ColQL uses a sorted index or a full scan.

```ts
users.createSortedIndex("age");
@@ -23,11 +23,11 @@ users.rebuildSortedIndex("age");
users.rebuildIndexes();
```

Sorted indexes are separate from equality indexes because they store row IDs ordered by numeric column value instead of buckets by exact value.
Sorted indexes are separate from equality indexes because they store internal row positions ordered by numeric column value instead of buckets by exact value.

## Supported Columns and Operators

Sorted indexes are numeric-only.
Sorted indexes are numeric-only. Dictionary and boolean columns do not support sorted range indexes.

Supported range operators:

@@ -36,11 +36,11 @@ Supported range operators:
- `<`
- `<=`

Equality on a numeric column can use an equality index, not a sorted index.
Equality on a numeric column can use an equality index, not a sorted index. Multi-column compound indexes are not supported; multiple predicates are combined at query time.

## Planner Behavior

The planner estimates the number of matching rows from sorted-index bounds. If the range is selective enough, ColQL scans the candidate row IDs. If the range is broad, ColQL may fall back to a table scan.
The planner estimates the number of matching rows from sorted-index bounds. If the range is selective enough, ColQL scans the candidate row positions. If the range is broad, ColQL may fall back to a table scan. Planner decisions affect performance only, not query results.
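
A minimal sketch of the mechanism, assuming a single lower bound (`>`): row positions are kept sorted by value, a binary search finds where the range starts, the distance to the end gives the match count, and the index is used only when that count is small relative to the table. The `threshold` parameter and the helper names are illustrative assumptions, not ColQL's cost model. Candidates are re-sorted by position so output keeps logical row order, as this page requires.

```ts
// Row positions ordered by numeric value (ties broken by position).
function buildSortedIndex(column: readonly number[]): number[] {
  return column.map((_, i) => i).sort((a, b) => column[a] - column[b] || a - b);
}

// Binary search: first slot in `sorted` whose value is greater than `bound`.
function upperBound(column: readonly number[], sorted: readonly number[], bound: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (column[sorted[mid]] <= bound) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

type Plan = { kind: "index" | "scan"; positions: number[] };

// Hypothetical cost rule: use the index only if it yields at most
// `threshold` of the table; otherwise assume a plain scan is cheaper.
function planGreaterThan(
  column: readonly number[],
  sorted: readonly number[],
  bound: number,
  threshold = 0.1,
): Plan {
  const start = upperBound(column, sorted, bound);
  const estimate = sorted.length - start; // number of rows with value > bound
  if (estimate <= column.length * threshold) {
    // Selective: read candidates from the index, then restore scan order.
    return { kind: "index", positions: sorted.slice(start).sort((a, b) => a - b) };
  }
  // Broad: scan every position instead of materializing a large candidate list.
  const positions: number[] = [];
  for (let i = 0; i < column.length; i++) {
    if (column[i] > bound) positions.push(i);
  }
  return { kind: "scan", positions };
}

const scores = [120, 950, 40, 990, 300, 15, 870, 60, 910, 500];
const scoreIndex = buildSortedIndex(scores);
const highScores = planGreaterThan(scores, scoreIndex, 900, 0.3); // index: [1, 3, 8]
const manyRows = planGreaterThan(scores, scoreIndex, 10, 0.3); // scan: all 10 positions
```

Both plans return the same positions a brute-force filter would; only the access path differs.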

```ts
users.createSortedIndex("score");
@@ -49,11 +49,11 @@ const highScores = users.where("score", ">", 900).toArray();
const manyRows = users.where("score", ">", 10).count(); // may scan
```

Candidate row IDs are returned in scan order so query output preserves logical row order.
Candidate row positions are returned in scan order so query output preserves logical row order.

## Dirty and Lazy Rebuilds

Sorted indexes are marked dirty after inserts, deletes, and updates. They are rebuilt lazily when a query needs them, or eagerly with:
Sorted indexes are marked dirty after inserts, deletes, and updates. When an indexed query requires a dirty sorted index, ColQL rebuilds it before use. Dirty sorted indexes are not used to return stale results. You can also rebuild eagerly with:

```ts
users.rebuildSortedIndex("age");
24 changes: 23 additions & 1 deletion docs/doc/08-mutations.md
@@ -9,6 +9,9 @@ users.update(rowIndex, partialRow);
users.where(...).update(partialRow);
users.where(...).delete();

users.updateMany(predicate, partialRow);
users.deleteMany(predicate);

users.updateWhere(column, operator, value, partialRow);
users.deleteWhere(column, operator, value);
```
@@ -29,6 +32,8 @@ users.delete(rowIndex); // returns the table instance

`users.update(rowIndex, partialRow)` returns `{ affectedRows: 1 }` when successful. Predicate update/delete return `{ affectedRows: number }`; no-match predicate mutations return `{ affectedRows: 0 }`.

`updateMany` and `deleteMany` are the preferred table-level convenience wrappers for common predicate mutations. The query mutation APIs (`where(...).update(partialRow)` and `where(...).delete()`) remain available, and `updateWhere`/`deleteWhere` remain as legacy convenience aliases. No mutation APIs have been removed.
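
A minimal sketch of what a predicate mutation such as `updateMany` has to do, based on the return shape and safety rules on this page: snapshot the matching positions, apply the patch, mark indexes dirty only for a nonzero change, and report `{ affectedRows }`. `updateManyLike` and the `Table` shape are hypothetical, and the predicate is a plain function here rather than ColQL's object or tuple predicate.

```ts
type Row = { status: string; age: number };

// Hypothetical table: row objects plus one "indexes are dirty" flag.
type Table = { rows: Row[]; indexesDirty: boolean };

function updateManyLike(
  table: Table,
  predicate: (row: Row) => boolean,
  patch: Partial<Row>,
): { affectedRows: number } {
  // Snapshot matching positions before writing, so the affected set is decided
  // against pre-update values even if the patch changes the predicate column.
  const positions: number[] = [];
  table.rows.forEach((row, i) => {
    if (predicate(row)) positions.push(i);
  });
  for (const i of positions) {
    table.rows[i] = { ...table.rows[i], ...patch };
  }
  // Only nonzero mutations dirty the indexes.
  if (positions.length > 0) table.indexesDirty = true;
  return { affectedRows: positions.length };
}

const users: Table = {
  rows: [
    { status: "passive", age: 30 },
    { status: "active", age: 40 },
    { status: "passive", age: 12 },
  ],
  indexesDirty: false,
};

// Roughly the shape of updateMany({ status: "passive", age: { gte: 18 } }, { status: "active" }).
const result = updateManyLike(users, (r) => r.status === "passive" && r.age >= 18, { status: "active" });
// result is { affectedRows: 1 }; only the 30-year-old passive row changed.
```

A `deleteMany` counterpart would take the same snapshot and then remove those positions.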

## Single-Row Update

```ts
@@ -46,6 +51,15 @@ const result = users.updateWhere("status", "=", "passive", {
});
```

Object predicate form:

```ts
const result = users.updateMany(
{ status: "passive", age: { gte: 18 } },
{ status: "active" },
);
```

Query form:

```ts
@@ -72,6 +86,12 @@ users
const result = users.deleteWhere("age", "<", 18);
```

Object predicate form:

```ts
const result = users.deleteMany({ status: "archived" });
```

Query form:

```ts
@@ -82,7 +102,7 @@ const result = users
.delete();
```

Predicate deletes physically remove rows. Row indexes after deleted rows may shift.
Predicate deletes physically remove rows. Row indexes are not stable external identifiers and may shift after mutations.
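
Since deletes are physical, positions behave like array indexes: every row after a removed one moves down. A minimal sketch using a plain array as a stand-in for column storage shows both the shift and why deleting a batch of positions in descending order is safe; `deletePositions` is a hypothetical helper. The `id` value, not the position, is what survives the shift.

```ts
type Row = { id: number; label: string };

// Remove rows at the given (distinct) positions. Going from highest to lowest
// means each splice only shifts rows that are no longer scheduled for deletion.
function deletePositions(rows: Row[], positions: readonly number[]): void {
  const descending = positions.slice().sort((a, b) => b - a);
  for (const p of descending) rows.splice(p, 1);
}

const rows: Row[] = [10, 11, 12, 13, 14].map((id) => ({ id, label: `row-${id}` }));

const before = rows[2]; // id 12 sits at position 2
deletePositions(rows, [0, 1]);
const after = rows[2]; // position 2 now holds id 14
const byId = rows.findIndex((r) => r.id === 12); // id 12 is now at position 0

// For contrast, ascending order deletes the wrong second row: the first
// splice shifts the row that was at position 1 down to position 0.
const naive = [10, 11, 12, 13, 14];
for (const p of [0, 1]) naive.splice(p, 1); // leaves [11, 13, 14]
```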

## Safety Rules

@@ -96,6 +116,8 @@ ColQL applies mutation safety rules internally:
- nonzero update/delete mutations mark existing indexes dirty
- incremental index maintenance is not attempted

Dirty indexes are rebuilt before an indexed query uses them, so index dirtiness affects rebuild cost, not query correctness.
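
A minimal sketch of the dirty-then-rebuild rule for an equality index: mutations only set a flag, and the lookup path rebuilds before reading, so the cost lands on the first indexed query instead of on correctness. `LazyEqualityIndex` and its members are hypothetical, not ColQL's internal classes.

```ts
// Hypothetical equality index with a dirty flag and no incremental upkeep.
class LazyEqualityIndex {
  private readonly column: number[];
  private readonly buckets = new Map<number, number[]>();
  private dirty = true; // nothing built yet
  rebuilds = 0; // public only so the rebuild cost is observable

  constructor(column: number[]) {
    this.column = column;
  }

  // Called after any nonzero mutation of the column.
  markDirty(): void {
    this.dirty = true;
  }

  // Indexed lookup: a dirty index is rebuilt first, never read stale.
  lookup(value: number): number[] {
    if (this.dirty) this.rebuild();
    return (this.buckets.get(value) ?? []).slice();
  }

  private rebuild(): void {
    this.buckets.clear();
    this.column.forEach((v, i) => {
      const bucket = this.buckets.get(v);
      if (bucket) bucket.push(i);
      else this.buckets.set(v, [i]);
    });
    this.dirty = false;
    this.rebuilds++;
  }
}

const statusCodes = [1, 2, 1, 3];
const statusIndex = new LazyEqualityIndex(statusCodes);
statusIndex.lookup(1); // builds the index: [0, 2]
statusCodes.splice(0, 1); // physical delete: column is now [2, 1, 3]
statusIndex.markDirty();
statusIndex.lookup(1); // rebuilt before use: [1]
```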

Snapshotting matters when an update changes the predicate column:

```ts
4 changes: 2 additions & 2 deletions docs/doc/09-physical-deletes.md
@@ -10,7 +10,7 @@ users.delete(3);

## Row Index Semantics

Logical row order is preserved, but row indexes after the deleted row may shift.
Logical row order is preserved, but row indexes after the deleted row may shift. Row indexes are internal positions, not stable IDs.

```ts
const before = users.get(4);
@@ -46,7 +46,7 @@ Descending deletion avoids accidentally skipping rows as lower row indexes shift

## Indexes After Delete

Deletes mark existing equality and sorted indexes dirty. ColQL rebuilds dirty indexes lazily on the next indexed query, or explicitly:
Deletes mark existing equality and sorted indexes dirty. If an indexed query needs a dirty index, ColQL rebuilds it before use. Dirty indexes are not used to return stale results. You can also rebuild explicitly:

```ts
users.deleteWhere("age", "<", 18);
10 changes: 10 additions & 0 deletions docs/doc/10-error-handling.md
@@ -48,6 +48,7 @@ Query errors:

- `COLQL_INVALID_COLUMN`
- `COLQL_INVALID_OPERATOR`
- `COLQL_INVALID_PREDICATE`
- `COLQL_INVALID_LIMIT`
- `COLQL_INVALID_OFFSET`
- `COLQL_INVALID_ROW_INDEX`
@@ -81,6 +82,15 @@ users.where("status", ">", "active");
// COLQL_INVALID_OPERATOR because range operators require numeric columns
```

Invalid object predicate:

```ts
users.where({});
// COLQL_INVALID_PREDICATE
```

`COLQL_INVALID_PREDICATE` is thrown for empty object predicates, invalid object predicate operators, and invalid predicate shapes.
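
A minimal sketch of the three rejection cases listed above. Only the `COLQL_INVALID_PREDICATE` code string and the operator names come from these docs; `PredicateError` and `validateObjectPredicate` are hypothetical stand-ins, since the `ColQLError` constructor and fields are not shown here.

```ts
// Object-predicate operator names documented on the querying page.
const KNOWN_OPS = new Set(["eq", "gt", "gte", "lt", "lte", "in"]);

// Hypothetical error type carrying the documented error code.
class PredicateError extends Error {
  readonly code = "COLQL_INVALID_PREDICATE";
}

function isPlainObject(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

function validateObjectPredicate(predicate: unknown): void {
  // Invalid shape: null, arrays, and primitives are rejected.
  if (!isPlainObject(predicate)) {
    throw new PredicateError("predicate must be an object");
  }
  // Empty object predicate.
  if (Object.keys(predicate).length === 0) {
    throw new PredicateError("predicate must name at least one column");
  }
  for (const [column, spec] of Object.entries(predicate)) {
    if (!isPlainObject(spec)) continue; // bare value: shorthand for equality
    for (const op of Object.keys(spec)) {
      // Invalid operator name, e.g. { age: { greater: 25 } }.
      if (!KNOWN_OPS.has(op)) {
        throw new PredicateError(`unknown operator "${op}" on column "${column}"`);
      }
    }
  }
}
```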

Invalid row index:

```ts
2 changes: 1 addition & 1 deletion docs/doc/11-serialization.md
@@ -27,7 +27,7 @@ Indexes are not serialized:
- equality indexes
- sorted indexes

They are derived data and can be rebuilt after deserialization.
They are derived performance data and can be rebuilt after deserialization. Recreating indexes after deserialization affects performance only, not query correctness.

```ts
const restored = table.deserialize(buffer);
9 changes: 5 additions & 4 deletions docs/doc/12-memory-model.md
@@ -36,12 +36,12 @@ Chunking lets columns grow and physically delete rows without depending on one g

## Index Memory

Indexes are separate derived structures:
Indexes are separate derived performance structures:

- equality indexes store row-ID buckets by value
- sorted indexes store row IDs sorted by numeric value
- equality indexes store row-position buckets by value
- sorted indexes store row positions sorted by numeric value

Indexes improve selected query shapes but increase memory. Drop indexes if memory matters more than indexed lookup speed:
Indexes improve selected query shapes but increase memory. They do not change query correctness; the same query must return the same result through an index or a full scan. Drop indexes if memory matters more than indexed lookup speed:

```ts
users.dropIndex("status");
@@ -89,6 +89,7 @@ For very broad changes, expect temporary row-index snapshot memory.
- Use indexes for selective hot queries.
- Avoid indexes for columns with low selectivity unless queries prove useful.
- Drop indexes to recover derived-memory overhead.
- Expect the first indexed query after mutation to include lazy rebuild cost if the needed index is dirty.
- Avoid `toArray()` for huge result sets when counting or streaming is enough.
- Remember that `heapUsed` alone can under-report typed-array storage; inspect `arrayBuffers` too.

6 changes: 3 additions & 3 deletions docs/doc/13-performance-and-benchmarks.md
@@ -65,13 +65,13 @@ users.createIndex("id");
users.where("id", "=", 123).first();
```

Broad predicates may fall back to scan by planner choice. This is expected, not a failed index.
Broad predicates may fall back to scan by planner choice. This is expected, not a failed index. Planner choices affect performance only, not query results.

In the same local run, `benchmark:indexed` showed selective `id = 99990` queries benefiting from the equality index, while `status in all` was close to scan time because the planner avoids broad index work. This is the expected tradeoff: indexes help selective lookups and cost memory.

`benchmark:range` showed sorted indexes helping selective ranges such as `age > 90`, while broad `age > 10` was similar to scan. It also showed that combining a selective equality index with an additional range filter can be much faster than scanning the broad range first.

`benchmark:optimizer` measures the planner choosing the smallest useful indexed candidate source and then applying remaining filters.
`benchmark:optimizer` measures the planner choosing the smallest useful indexed candidate source and then applying remaining filters. Multiple predicates are combined at query time; ColQL does not build multi-column compound indexes.
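
A minimal sketch of that strategy: among the predicates that have an index, pick the one with the smallest estimated candidate set, iterate only its positions, and test the remaining predicates row by row. `Predicate`, `runPlanned`, and the `estimate` field are hypothetical, and the sketch omits the broad-predicate scan fallback described earlier.

```ts
type Row = { status: number; age: number };

// Hypothetical predicate: a row test plus optional index access.
type Predicate = {
  test: (row: Row) => boolean;
  index?: { estimate: number; positions: () => number[] };
};

function runPlanned(rows: readonly Row[], predicates: readonly Predicate[]): number[] {
  // Choose the indexed predicate with the smallest estimated candidate set.
  let driver: Predicate | undefined;
  let best = Infinity;
  for (const p of predicates) {
    if (p.index !== undefined && p.index.estimate < best) {
      driver = p;
      best = p.index.estimate;
    }
  }
  const source = driver?.index;
  const out: number[] = [];
  if (source === undefined) {
    // No usable index: scan and apply every predicate.
    rows.forEach((row, i) => {
      if (predicates.every((p) => p.test(row))) out.push(i);
    });
    return out;
  }
  // Drive from the chosen candidates, then apply the remaining predicates.
  const rest = predicates.filter((p) => p !== driver);
  for (const i of source.positions()) {
    if (rest.every((p) => p.test(rows[i]))) out.push(i);
  }
  return out;
}

const people: Row[] = [
  { status: 1, age: 95 },
  { status: 2, age: 40 },
  { status: 1, age: 20 },
  { status: 1, age: 92 },
  { status: 2, age: 99 },
  { status: 1, age: 30 },
];

// Stand-in for an index lookup; a real index would not rescan the rows.
function positionsWhere(test: (r: Row) => boolean): number[] {
  const out: number[] = [];
  people.forEach((r, i) => {
    if (test(r)) out.push(i);
  });
  return out;
}

const isActive = (r: Row) => r.status === 1;
const isSenior = (r: Row) => r.age > 90;
const plannedResult = runPlanned(people, [
  { test: isActive, index: { estimate: 4, positions: () => positionsWhere(isActive) } },
  { test: isSenior, index: { estimate: 3, positions: () => positionsWhere(isSenior) } },
]);
// Driven by the smaller age candidate set [0, 3, 4]; plannedResult is [0, 3].
```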

## Interpreting Delete and Mutation Benchmarks

@@ -87,7 +87,7 @@ The delete benchmark separates phases:
- materialized query output
- index drop

The first indexed query after mutation may include lazy index rebuild cost.
The first indexed query after mutation may include lazy index rebuild cost. Dirty indexes are rebuilt before use and are not used to return stale results.

In the local delete/mutation run, the first indexed query after dirtying indexes was much slower than the second indexed query because it paid lazy rebuild cost. The benchmark also shows `toArray()` as a separate memory phase because it materializes row objects.
