feat: Export events to JSON Lines #451
Force-pushed from 0ad3e13 to 8ed6a89
Hi @cameri, tested the exporter with the same events you sent :)
Pull request overview
Adds a new CLI entrypoint to export all non-deleted events from the relay database into a NIP-01 compliant JSON Lines (.jsonl) file, intended for portable backups/migration and large datasets via streaming.
Changes:
- Introduces `src/scripts/export-events.ts`, a streaming exporter that writes events as JSONL and logs progress.
- Adds an `npm run export` script to run the exporter via `ts-node`.
- Documents the export command in `README.md` and ignores `*.jsonl` outputs in `.gitignore`.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/scripts/export-events.ts | Implements streaming DB export of events to NIP-01 JSONL with progress logging. |
| package.json | Adds export npm script to run the new exporter. |
| README.md | Documents how to use the export command and notes DB env var usage. |
| .gitignore | Ignores generated .jsonl export output files. |
async function exportEvents(): Promise<void> {
  const filename = process.argv[2] || 'events.jsonl'
  const outputPath = path.resolve(filename)
  const db = knex(getDbConfig())
PR description mentions cleaning up the DB connection “on exit”, but the script doesn’t currently trap SIGINT/SIGTERM. If the process is interrupted mid-export, the transaction/stream and file descriptor may not be closed cleanly. Consider adding signal handlers to destroy the db stream, close the output stream, and db.destroy() before exiting.
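A minimal sketch of such handlers, assuming the row stream, the output write stream, and the knex instance are in scope (the identifiers below are illustrative, not the PR's actual names):

```typescript
// Illustrative cleanup wiring; dbStream, output and db are assumed to be the
// cursor-backed row stream, the .jsonl write stream and the knex instance.
const shutdown = async (signal: NodeJS.Signals) => {
  console.error(`Received ${signal}, aborting export`)
  dbStream.destroy()   // stop pulling rows from the cursor
  output.end()         // flush and close the .jsonl file descriptor
  await db.destroy()   // release the knex/pg connection pool
  process.exit(1)
}

process.on('SIGINT', shutdown)
process.on('SIGTERM', shutdown)
```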
const getDbConfig = () => ({
  client: 'pg',
  connection: process.env.DB_URI || {
    host: process.env.DB_HOST ?? 'localhost',
    port: Number(process.env.DB_PORT ?? 5432),
    user: process.env.DB_USER ?? 'postgres',
    password: process.env.DB_PASSWORD ?? 'postgres',
    database: process.env.DB_NAME ?? 'nostream',
  },
})
getDbConfig() duplicates the repo’s Knex configuration logic and doesn’t honor several DB_* settings the relay supports (e.g. pool sizing / acquire timeout). It also introduces default host/user/password/db values, which can make the script silently export from an unexpected database when env vars are missing. Consider reusing src/database/client.ts (or factoring out a shared config helper) so the export command uses the same connection behavior as the relay and fails fast when required env vars aren’t set.
Suggested change (replacing the `getDbConfig` shown above):

const parseOptionalNumber = (name: string): number | undefined => {
  const value = process.env[name]
  if (value == null || value === '') {
    return undefined
  }
  const parsed = Number(value)
  if (!Number.isFinite(parsed)) {
    throw new Error(`Invalid ${name}: expected a number`)
  }
  return parsed
}

const requireEnv = (name: string): string => {
  const value = process.env[name]
  if (value == null || value === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

const getDbConfig = () => {
  const acquireConnectionTimeout = parseOptionalNumber('DB_ACQUIRE_TIMEOUT')
  const poolMin = parseOptionalNumber('DB_POOL_MIN')
  const poolMax = parseOptionalNumber('DB_POOL_MAX')
  return {
    client: 'pg',
    connection: process.env.DB_URI
      ? process.env.DB_URI
      : {
        host: requireEnv('DB_HOST'),
        port: parseOptionalNumber('DB_PORT') ?? 5432,
        user: requireEnv('DB_USER'),
        password: requireEnv('DB_PASSWORD'),
        database: requireEnv('DB_NAME'),
      },
    ...(acquireConnectionTimeout === undefined
      ? {}
      : { acquireConnectionTimeout }),
    ...((poolMin === undefined && poolMax === undefined)
      ? {}
      : {
        pool: {
          ...(poolMin === undefined ? {} : { min: poolMin }),
          ...(poolMax === undefined ? {} : { max: poolMax }),
        },
      }),
  }
}
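Alternatively, the first part of the comment (reusing the relay's own Knex setup) could look roughly like the sketch below; the export name `getMasterDbClient` from `src/database/client.ts` is an assumption here, not something shown in this PR:

```typescript
// Sketch only: assumes src/database/client.ts exposes a factory (named
// getMasterDbClient here) that builds the same knex instance the relay uses,
// including its pool sizing and acquire-timeout settings.
import { getMasterDbClient } from '../database/client'

const db = getMasterDbClient()
// ... stream and write events ...
await db.destroy() // release the pool when the export finishes
```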
const [{ count }] = await db('events')
  .whereNull('deleted_at')
  .count('* as count')
const total = Number(count)

if (total === 0) {
  console.log('No events to export.')
  return
}

console.log(`Exporting ${total} events to ${outputPath}`)

const output = fs.createWriteStream(outputPath)
let exported = 0

const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
The count(*) is executed outside the repeatable read transaction, so total may not match the snapshot being streamed (new inserts/soft-deletes between the count and BEGIN can cause progress to be misleading and exported !== total). If you want a consistent snapshot, run the count inside the same read-only transaction before starting the stream; otherwise consider dropping the transaction / total and just log exported rows.
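One way to make the total consistent with the streamed rows, sketched with the same knex calls the script already uses (variable names are illustrative):

```typescript
// Sketch: take the snapshot first, then count and stream inside it so
// `total` matches exactly the rows that get exported.
const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
try {
  const [{ count }] = await trx('events')
    .whereNull('deleted_at')
    .count('* as count')
  const total = Number(count)
  console.log(`Exporting ${total} events to ${outputPath}`)
  // ... build the SELECT on trx and stream rows here ...
} finally {
  await trx.rollback() // read-only work, so rolling back is a clean way to end it
  await db.destroy()
}
```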
    'event_signature',
  )
  .whereNull('deleted_at')
  .orderBy('event_created_at', 'asc')
orderBy('event_created_at', 'asc') does not guarantee deterministic ordering when multiple events share the same event_created_at value, so repeated exports can legitimately produce different line orders. If stable output is desired, add a secondary tie-breaker (e.g. event_id or the PK id) to the ORDER BY.
Suggested change:
  .orderBy('event_created_at', 'asc')
  .orderBy('event_id', 'asc')
const output = fs.createWriteStream(outputPath)
let exported = 0

const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
Running a long-lived repeatable read transaction for a multi-million-row export can hold an old MVCC snapshot for the duration of the export, which can increase bloat and interfere with vacuum on busy relays. Consider using READ COMMITTED (still READ ONLY) or avoiding an explicit transaction unless a consistent snapshot is strictly required; alternatively document this operational impact and recommend running against a read replica.
Suggested change:
  const trx = await db.transaction(null, { isolationLevel: 'read committed' })
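If the explicit transaction stays, it can also be marked read only so accidental writes fail fast; a minimal sketch using a raw statement, since support for a `readOnly` flag in knex's transaction config varies by version:

```typescript
// Sketch: issue SET TRANSACTION READ ONLY as the first statement in the
// transaction so any accidental write fails immediately.
const trx = await db.transaction(null, { isolationLevel: 'read committed' })
await trx.raw('SET TRANSACTION READ ONLY')
// ... run the count and the streamed SELECT on trx ...
```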
@Anshumancanrock please address the Copilot comments I've thumbed up with an emoji.
Hi @cameri, thanks for the review. I've addressed all approved Copilot comments. Please take a look when you have a moment. Thanks again!
Description
This PR adds `npm run export [filename.jsonl]`, a streaming CLI command that writes all stored events to a `.jsonl` file in NIP-01 format.
- Uses `pg-query-stream` to read rows via a cursor (no full dataset in RAM)
- Each exported line contains the NIP-01 fields `id`, `pubkey`, `created_at`, `kind`, `tags`, `content`, `sig` (a sketch of the loop follows below)
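A rough sketch of what that streaming loop can look like; the column names mirror the SELECT shown in the review above, but the hex conversions and `created_at` handling are assumptions about the schema, not details confirmed in this PR:

```typescript
import fs from 'fs'
import type { Knex } from 'knex'

// Sketch of the export loop (assumed shape, not the PR's exact code).
async function writeEvents(db: Knex, outputPath: string): Promise<number> {
  const output = fs.createWriteStream(outputPath)
  const stream = db('events')
    .select(
      'event_id', 'event_pubkey', 'event_created_at', 'event_kind',
      'event_tags', 'event_content', 'event_signature',
    )
    .whereNull('deleted_at')
    .orderBy('event_created_at', 'asc')
    .stream() // knex delegates to pg-query-stream, so rows arrive via a cursor

  let exported = 0
  for await (const row of stream) {
    const event = {
      id: row.event_id.toString('hex'),   // assumes ids are stored as bytea
      pubkey: row.event_pubkey.toString('hex'),
      created_at: row.event_created_at,   // may need conversion to unix seconds
      kind: row.event_kind,
      tags: row.event_tags,
      content: row.event_content,
      sig: row.event_signature.toString('hex'),
    }
    // Respect backpressure so a slow disk doesn't balloon memory.
    if (!output.write(`${JSON.stringify(event)}\n`)) {
      await new Promise((resolve) => output.once('drain', resolve))
    }
    exported++
  }
  output.end()
  return exported
}
```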
Related Issue
Closes #405
Motivation and Context
Relay operators currently have no built-in way to export events in a portable format.
`pg_dump` works but is Postgres-specific and not useful for migrating between relay implementations or doing data analysis. This gives them a single command that outputs standard NIP-01 JSON Lines.

How Has This Been Tested?
- `tsc --noEmit` (compiles clean)
- `eslint` (no warnings)
- `npm run export` against a local Postgres seeded with 3 million events
- `npm run export -- backup.jsonl`

Screenshots (if appropriate):
Types of changes
Checklist: