A database is the top-level container in NodeDB—a unit of deployment, collection namespace, quota parent, and the atomic unit for clone, mirror, and backup operations. One database hosts multiple tenants, but a tenant's data never spans databases (use MOVE TENANT to relocate). This design enables multi-tenancy within a shared binary while preserving strong isolation boundaries.
NodeDB reserves DatabaseId(0) for the special database named "default". This database:
- Is created automatically on first boot against any storage.
- Cannot be dropped (migration target for administrative operations).
- Can be renamed cosmetically; the numeric identity
DatabaseId(0)is immutable. - Becomes the implicit target when no database is specified in a connection.
Database identifiers in the range DatabaseId(0..=1023) are reserved for system use. User-created databases start at DatabaseId(1024). This allows NodeDB to grow future system databases without colliding with application namespaces.
CREATE DATABASE sales_prod;
CREATE DATABASE staging WITH (priority_class='bulk', max_memory_bytes=1073741824);Options:
priority_class—'critical','standard'(default), or'bulk'(affects WAL commit ordering and SPSC scheduling)max_memory_bytes— initial quota; can be changed later withALTER DATABASEmax_storage_bytes,max_qps,max_connections,cache_weight,maintenance_cpu_pct— quota fields (defaults inherited from cluster config)
DROP DATABASE staging;
DROP DATABASE staging CASCADE; -- drops all collections in it
DROP DATABASE staging FORCE; -- force-materializes all clones before droppingRestrictions:
- Cannot drop
default(returnsCANNOT_DROP_DEFAULT_DATABASE) - Non-CASCADE drop fails if the database has collections (returns
COLLECTION_EXISTS) - FORCE triggers materialization of all databases cloning from this one, blocks until complete, then drops
ALTER DATABASE staging RENAME TO staging_old;The numeric DatabaseId remains unchanged; only the name changes in the catalog.
A session binds to exactly one database for its lifetime. The binding is immutable and set at connection time via the resolution chain:
- Explicit parameter —
psql -d production, HTTPX-NodeDB-Database: production, native handshakedatabasefield - User default —
ALTER USER alice SET DEFAULT DATABASE prod_emea - Tenant default — per-tenant fallback (set via admin DDL)
- System default —
"default"
Once bound, the session is locked to that database. Switching databases requires USE DATABASE (or \c in psql), which aborts the current transaction, invalidates prepared statements, and rebinds to the new database.
USE DATABASE staging;- pgwire —
psql -d emp-prodordatabase=emp-prodin connection string;\c db_nameswitches (rebinds) - Native —
ConnectionBuilder::new(...).database("staging") - HTTP —
X-NodeDB-Databaseheader (preferred) or?database=stagingquery param - nodedb-cli —
--database stagingflag or\c stagingcommand
Access to a database is controlled via roles. Three new role types were introduced with database scoping:
DatabaseOwner(DatabaseId)— full control: alter config, materialize clones, backupDatabaseEditor(DatabaseId)— read and write collectionsDatabaseReader(DatabaseId)— read-only access
GRANT DATABASE_READER ON DATABASE staging TO alice;
GRANT DATABASE_EDITOR ON DATABASE staging TO bob;
GRANT DATABASE_OWNER ON DATABASE sales_prod TO carol;
ALTER USER alice SET DEFAULT DATABASE staging; -- sets alice's preferred databaseA user without an explicit grant cannot bind to a database (returns ACCESS_DENIED at session bind).
Accessing a collection in a different database than your bound database returns COLLECTION_NOT_FOUND — indistinguishable from a collection that doesn't exist anywhere. This prevents accidental cross-database leaks and simplifies the security model. Applications requiring cross-database operations must open two separate connections.
The only exception: privileged admin DDL (CLONE, MIRROR, MOVE TENANT) can reference multiple databases explicitly.
Quotas form a three-tier hierarchy: global (cluster-wide), database (per database), and tenant (per tenant within a database). Each tier has its own budget for memory, storage, queries-per-second, and connections. A request is admitted only if it passes all three tiers.
Set database quotas at creation or modify them later:
CREATE DATABASE analytics WITH (
max_memory_bytes=5368709120, -- 5 GB
max_storage_bytes=10737418240, -- 10 GB
max_qps=5000,
max_connections=500,
cache_weight=2,
priority_class='standard',
maintenance_cpu_pct=50
);
ALTER DATABASE analytics SET QUOTA (
max_qps=10000,
max_connections=1000
);Quota fields:
max_memory_bytes— total in-memory budget (L0 cache, memtables, indexes)max_storage_bytes— total durable storage (WAL, LSM segments, snapshots)max_qps— queries per second (rate limiter capacity)max_connections— concurrent connections (semaphore capacity)cache_weight— share of the global doc cache (relative weight; default 1)priority_class—'critical','standard','bulk'(affects WAL group commit order and SPSC scheduling fairness)maintenance_cpu_pct— max CPU percentage for background tasks like compaction (default 25%)
Tenants can have their own sub-quotas within a database:
ALTER TENANT acme IN DATABASE prod SET QUOTA (
max_memory_bytes=1073741824,
max_qps=500
);
SHOW TENANT QUOTA FOR acme IN DATABASE prod;Constraint: sum of all tenant quotas in a database ≤ the database's quota. Violated adds return QUOTA_OVERCOMMIT.
SHOW DATABASE QUOTA FOR analytics; -- configured limits
SHOW DATABASE USAGE FOR analytics; -- current consumption (memory, storage, active connections, qps)
SHOW TENANT QUOTA FOR acme IN DATABASE prod;
SHOW TENANT USAGE FOR acme IN DATABASE prod;CLONE DATABASE creates a copy-on-write (CoW) read-write database from a source, returning in milliseconds regardless of source size. The clone starts with a point-in-time snapshot of the source catalog and delegates reads to source storage up to that point. Writes go to the clone's own storage. The source is unaffected.
CLONE DATABASE preview FROM prod AS OF SYSTEM TIME 1730000000000;
CLONE DATABASE latest FROM prod; -- uses prod's current LSN
CLONE DATABASE snapshot FROM prod AS OF LATEST;Uses:
- Staging: clone prod for testing DDL changes or risky migrations
- Point-in-time recovery: clone at a past timestamp, inspect, then swap databases
- CI/CD: per-test-run clone from a baseline
Initially, the clone's collections carry a cloned_from marker pointing to the source collection at a specific LSN. Reads first check the clone's own storage; if a row is not found, they delegate to source storage (capped at the as_of LSN). Writes to the clone allocate fresh surrogates in clone storage, leaving source untouched.
A background materializer gradually copies remaining rows from source to clone. Once materialization completes, source delegation stops and the clone is fully independent.
AS OF SYSTEM TIME queries on a clone are correct across all time regimes:
- Query time > as_of: Returns post-clone writes from clone storage, querying at query time
- Query time ≤ as_of: Returns source state at query time (clone did not exist yet)
- Query time < clone_created_at: Returns empty (clone predates query point)
This enables realistic point-in-time staging: clone prod as of yesterday, inspect yesterday's state, make corrections, promote.
Clones can be nested (clone-of-clone). The lineage forms a tree. Hard limit: MAX_CLONE_DEPTH = 8. Exceeding returns CLONE_DEPTH_EXCEEDED. Force materialization with ALTER DATABASE name MATERIALIZE to flatten the lineage.
ALTER DATABASE my_clone MATERIALIZE; -- blocks until all rows copied from sourceA source database cannot be dropped while any database clones from it. Attempting DROP DATABASE source returns CLONE_DEPENDENCY with a list of dependent clones. Use DROP DATABASE source FORCE to force-materialize all dependents (blocking) before dropping.
SHOW DATABASE LINEAGE FOR my_clone;
-- Output: my_clone → staging → prod (ancestor chain)MIRROR DATABASE creates a continuously-updated read-only replica of a source database in a different region or cluster. The mirror is a Raft observer of the source: it applies the source's log entries under its own catalog but does not participate in quorum. Promotion to writable is one-way and permanent.
MIRROR DATABASE prod_eu FROM prod_us MODE = async;
MIRROR DATABASE prod_eu FROM prod_us MODE = sync;Modes:
async— mirror trails source asynchronously; lag is observable via metricssync— source waits for mirror acknowledgment before commit (latency cost; not recommended cross-region)
Uses:
- Disaster recovery: read-only replica in a different region; promote on regional failure
- Reporting: offload analytics to a replica
- Testing: read-only staging environment that auto-updates with production data
BoundedStaleness— served at mirror'slast_appliedLSNStrong— returnsSTALE_READ_NOT_LEADERwith source endpoint hintEventual— served immediately
SELECT * FROM orders CONSISTENCY='bounded_staleness'; -- served from mirrorBootstrap transfers a snapshot from source to mirror over QUIC, then log-streams changes. Lag is observable:
SHOW DATABASE MIRROR STATUS FOR prod_eu;
-- Output: status=Following, lag_ms=150, last_applied=12345, mode=asyncALTER DATABASE prod_eu PROMOTE;Promotion:
- Stops observing the source
- Becomes a normal Raft group with its own leader election
- Starts accepting writes
- Is one-way and permanent (no DEMOTE)
To re-mirror after promotion: drop the database and CLONE DATABASE from source.
- A mirror cannot be cloned (rejected with
CANNOT_CLONE_MIRROR); promote first if needed - A clone can be mirrored
- Pre-promotion writes return
MIRROR_READ_ONLY
MOVE TENANT relocates a tenant's data from one database to another. Offline in v1 (drains sessions, snapshots data, cuts over atomically).
MOVE TENANT acme FROM prod_us TO prod_eu;Internally:
- Pre-flight — verify target database has compatible schemas for all collections containing tenant data
- Drain — revoke tenant's active sessions on source; reject new writes; wait for in-flight requests with timeout
- Snapshot — backup tenant data to an in-cluster temp location (durable when backup is fsynced)
- Cutover — in a single Raft proposal, drop tenant from source and restore into target (atomic)
- Resume — writes accepted on target
Moving an already-moved tenant returns MOVE_TENANT_ALREADY_AT_TARGET:
MOVE TENANT acme FROM prod_us TO prod_eu; -- succeeds, or already-at-targetThis allows safe retry after network failures without duplicating the move.
SHOW AUDIT IN DATABASE staging [WHERE event_type = 'database_quota_changed'];Events tagged with database_id include: DatabaseCreated, DatabaseDropped, DatabaseRenamed, DatabaseQuotaChanged, DatabaseCloned, DatabaseMirrored, DatabasePromoted, TenantMoved, etc.
Enable per-database DML audit to track every row change:
ALTER DATABASE prod SET AUDIT_DML = 'all'; -- all DML + SELECT
ALTER DATABASE prod SET AUDIT_DML = 'writes'; -- INSERT / UPDATE / DELETE only
ALTER DATABASE prod SET AUDIT_DML = 'none'; -- disabled (default)Audit events include (database_id, tenant_id, user_id, statement_digest, row_count). Cost is the operator's choice; enabled via AUDIT_DML='all'|'writes'.
ALTER DATABASE analytics SET IDLE_TIMEOUT 1800; -- 30 minutes
ALTER DATABASE analytics SET IDLE_TIMEOUT 0; -- disabledSessions idle longer than the configured time are automatically closed with SESSION_IDLE_TIMEOUT. Tracks are monitored per-database; timeout is checked at request boundaries.
SHOW SESSIONS [IN DATABASE name];
-- Output: session_id, user, database, tenant, idle_seconds, bytes_in, bytes_out, current_statement_digest
KILL SESSION 'sid_abc123';KILL SESSION requires Superuser or DatabaseOwner of the session's database. Idle sessions are force-closed; active sessions are marked for revocation and close at the next request boundary.
NodeDB exposes per-database metrics for observability:
nodedb_database_qps{database="prod"}
nodedb_database_memory_bytes{database="prod"}
nodedb_database_storage_bytes{database="prod"}
nodedb_database_connections{database="prod"}
nodedb_database_mirror_lag_ms{database="prod_eu"} -- if mirrored
nodedb_database_wal_commit_latency_p99{database="prod"}
nodedb_database_maintenance_cpu_seconds{database="prod"}
Similarly, tenant metrics are labeled by both database and tenant:
nodedb_tenant_qps{database="prod", tenant="acme"}
nodedb_tenant_memory_bytes{database="prod", tenant="acme"}
Metrics are updated continuously and exposed via /metrics on the HTTP port.
Database operations are gated by role:
| DDL Operation | Required Role |
|---|---|
CREATE DATABASE |
ClusterAdmin or Superuser |
DROP DATABASE (non-default) |
Superuser |
DROP DATABASE … FORCE |
Superuser |
ALTER DATABASE … RENAME |
DatabaseOwner or Superuser |
ALTER DATABASE … SET QUOTA |
ClusterAdmin or Superuser |
ALTER DATABASE … MATERIALIZE |
DatabaseOwner or higher |
ALTER DATABASE … PROMOTE |
Superuser |
CLONE DATABASE |
Superuser |
MIRROR DATABASE |
Superuser |
MOVE TENANT |
Superuser |
BACKUP DATABASE |
DatabaseOwner or higher |
RESTORE DATABASE |
Superuser |
See Roles & Permissions for full role definitions.
| Error Code | Cause | Mitigation |
|---|---|---|
DATABASE_NOT_FOUND |
Database does not exist | Check database name; list with SHOW DATABASES |
ACCESS_DENIED |
User lacks grant for the database | Ask admin to GRANT DATABASE_READER ON DATABASE |
COLLECTION_NOT_FOUND |
Collection in another database, or truly absent | Open a second connection to the other database if needed |
MIRROR_READ_ONLY |
Writing to a pre-promotion mirror | Promote the mirror first, or write to source |
STALE_READ_NOT_LEADER |
Strong consistency requested on non-leader mirror |
Use BoundedStaleness or read from source |
CLONE_DEPTH_EXCEEDED |
Clone tree exceeds 8 levels | Materialize clones to flatten the lineage |
CANNOT_CLONE_MIRROR |
Attempting to clone from a mirror before promotion | Promote the mirror first |
CLONE_DEPENDENCY |
Cannot drop source while clones depend on it | Use DROP DATABASE source FORCE or drop clones first |
MOVE_TENANT_ALREADY_AT_TARGET |
Tenant already in target database (safe retry) | No action needed; operation succeeded |
*_QUOTA_EXCEEDED |
Memory, storage, QPS, connections, or budget limit hit | Increase quota or reduce load |
QUOTA_OVERCOMMIT |
Sum of tenant quotas exceeds database quota | Reduce or rebalance tenant quotas |
- Roles & Permissions — full GRANT/REVOKE reference
- Audit & Change Tracking — audit log format and filtering
- Multi-Tenancy — tenant isolation and lifetime management
- Architecture — how databases interact with the execution model