
feat: Export events to JSON Lines #451

Merged
cameri merged 6 commits into cameri:main from Anshumancanrock:export-events on Apr 18, 2026

Conversation


@Anshumancanrock (Collaborator) commented Apr 10, 2026

Description

This PR adds npm run export [filename.jsonl], a streaming CLI command that writes all stored events to a .jsonl file in NIP-01 format.

  • Uses pg-query-stream to read rows via cursor (no full dataset in RAM; see the sketch after this list)
  • Each line is a JSON object with the 7 NIP-01 fields: id, pubkey, created_at, kind, tags, content, sig
  • Skips soft-deleted events
  • Logs progress every 10k rows
  • Cleans up DB connection on exit
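
For illustration, a minimal sketch of that shape (not the merged code: column names beyond the event_id, event_created_at, event_signature, and deleted_at visible in the review snippets below are assumptions, and the real columns may be binary and need hex conversion):

import fs from 'fs'
import path from 'path'
import knex from 'knex'

async function exportEvents(): Promise<void> {
  const filename = process.argv[2] || 'events.jsonl'
  const outputPath = path.resolve(filename)
  const db = knex({ client: 'pg', connection: process.env.DB_URI })
  const output = fs.createWriteStream(outputPath)
  let exported = 0

  // knex's .stream() rides on pg-query-stream for the pg client, so rows
  // arrive through a server-side cursor instead of being buffered in memory.
  const stream = db('events')
    .select(
      'event_id',
      'event_pubkey',      // assumed column name
      'event_created_at',
      'event_kind',        // assumed column name
      'event_tags',        // assumed column name
      'event_content',     // assumed column name
      'event_signature',
    )
    .whereNull('deleted_at') // skip soft-deleted events
    .orderBy('event_created_at', 'asc')
    .stream()

  for await (const row of stream) {
    // One JSON object per line, carrying the 7 NIP-01 fields.
    const event = {
      id: row.event_id,
      pubkey: row.event_pubkey,
      created_at: row.event_created_at,
      kind: row.event_kind,
      tags: row.event_tags,
      content: row.event_content,
      sig: row.event_signature,
    }
    // A production version would also respect output.write() backpressure.
    output.write(JSON.stringify(event) + '\n')
    if (++exported % 10_000 === 0) {
      console.log(`Exported ${exported} events...`)
    }
  }

  output.end()
  await db.destroy() // clean up the DB connection on exit
  console.log(`Done: ${exported} events written to ${outputPath}`)
}

exportEvents().catch((err) => {
  console.error(err)
  process.exit(1)
})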

Related Issue

Closes #405

Motivation and Context

Relay operators currently have no built-in way to export events in a portable format. pg_dump works but is Postgres-specific and not useful for migrating between relay implementations or doing data analysis. This gives them a single command that outputs standard NIP-01 JSON Lines.

How Has This Been Tested?

  • tsc --noEmit (compiles clean)
  • eslint (no warnings)
  • Ran npm run export against a local Postgres seeded with 3 million events
  • Validated every line in the output is parseable JSON with exactly the 7 NIP-01 fields (see the validation sketch after this list)
  • Tested custom filename: npm run export -- backup.jsonl
  • Verified both runs produce identical output
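
A validation pass along these lines would do it (a hypothetical helper, not part of the PR; the expected field set comes straight from NIP-01):

import fs from 'fs'
import readline from 'readline'

// Hypothetical checker: every line must parse as JSON and carry exactly
// the 7 NIP-01 fields, no more and no fewer.
const NIP01_FIELDS = ['content', 'created_at', 'id', 'kind', 'pubkey', 'sig', 'tags']

async function validateJsonl(file: string): Promise<void> {
  const rl = readline.createInterface({ input: fs.createReadStream(file) })
  let lineNo = 0
  for await (const line of rl) {
    lineNo++
    const keys = Object.keys(JSON.parse(line)).sort()
    if (keys.join(',') !== NIP01_FIELDS.join(',')) {
      throw new Error(`Line ${lineNo}: unexpected field set [${keys.join(', ')}]`)
    }
  }
  console.log(`${lineNo} lines validated`)
}

validateJsonl(process.argv[2] ?? 'events.jsonl').catch((err) => {
  console.error(err)
  process.exit(1)
})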

Screenshots (if appropriate):

[image]

Types of changes

  • [ ] Non-functional change (docs, style, minor refactor)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my code changes.
  • All new and existing tests passed.

@Anshumancanrock (Collaborator, Author)

Hi @cameri, I tested the exporter with the same events you sent :)

@Anshumancanrock (Collaborator, Author)

Hi @cameri, could you please take a look at this PR when possible so I can start working on the compressor issue (#407)?

@Anshumancanrock (Collaborator, Author)

Hi @cameri, could you please review this PR so I can proceed with issue #407, which is currently blocked.


Copilot AI left a comment


Pull request overview

Adds a new CLI entrypoint to export all non-deleted events from the relay database into a NIP-01 compliant JSON Lines (.jsonl) file, intended for portable backups/migration; streaming keeps large datasets out of memory.

Changes:

  • Introduces src/scripts/export-events.ts, a streaming exporter that writes events as JSONL and logs progress.
  • Adds an npm run export script to run the exporter via ts-node (wiring sketched after this list).
  • Documents the export command in README.md and ignores *.jsonl outputs in .gitignore.
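
That wiring is presumably a one-line script entry in package.json, something like the following (the exact entry is an assumption, not quoted from the diff):

{
  "scripts": {
    "export": "ts-node src/scripts/export-events.ts"
  }
}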

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.

  • src/scripts/export-events.ts: Implements streaming DB export of events to NIP-01 JSONL with progress logging.
  • package.json: Adds export npm script to run the new exporter.
  • README.md: Documents how to use the export command and notes DB env var usage.
  • .gitignore: Ignores generated .jsonl export output files.


Comment thread src/scripts/export-events.ts
Comment on lines +22 to +26
async function exportEvents(): Promise<void> {
  const filename = process.argv[2] || 'events.jsonl'
  const outputPath = path.resolve(filename)
  const db = knex(getDbConfig())


Copilot AI Apr 18, 2026


PR description mentions cleaning up the DB connection “on exit”, but the script doesn’t currently trap SIGINT/SIGTERM. If the process is interrupted mid-export, the transaction/stream and file descriptor may not be closed cleanly. Consider adding signal handlers to destroy the db stream, close the output stream, and db.destroy() before exiting.
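
One possible shape for that cleanup (a sketch; the trapSignals helper and its arguments are hypothetical, standing in for the exporter's row stream, file stream, and knex instance):

import { Readable } from 'stream'
import { WriteStream } from 'fs'
import { Knex } from 'knex'

// Hypothetical helper wiring the cleanup suggested above.
function trapSignals(stream: Readable, output: WriteStream, db: Knex): void {
  const cleanup = async (signal: NodeJS.Signals) => {
    console.error(`Received ${signal}, aborting export...`)
    stream.destroy()   // stop pulling rows through the cursor
    output.end()       // flush and close the output file descriptor
    await db.destroy() // release the connection pool
    process.exit(1)
  }
  process.once('SIGINT', () => void cleanup('SIGINT'))
  process.once('SIGTERM', () => void cleanup('SIGTERM'))
}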

Comment thread src/scripts/export-events.ts Outdated
Comment on lines +11 to +21
const getDbConfig = () => ({
  client: 'pg',
  connection: process.env.DB_URI || {
    host: process.env.DB_HOST ?? 'localhost',
    port: Number(process.env.DB_PORT ?? 5432),
    user: process.env.DB_USER ?? 'postgres',
    password: process.env.DB_PASSWORD ?? 'postgres',
    database: process.env.DB_NAME ?? 'nostream',
  },
})


Copilot AI Apr 18, 2026


getDbConfig() duplicates the repo’s Knex configuration logic and doesn’t honor several DB_* settings the relay supports (e.g. pool sizing / acquire timeout). It also introduces default host/user/password/db values, which can make the script silently export from an unexpected database when env vars are missing. Consider reusing src/database/client.ts (or factoring out a shared config helper) so the export command uses the same connection behavior as the relay and fails fast when required env vars aren’t set.

Suggested change
- const getDbConfig = () => ({
-   client: 'pg',
-   connection: process.env.DB_URI || {
-     host: process.env.DB_HOST ?? 'localhost',
-     port: Number(process.env.DB_PORT ?? 5432),
-     user: process.env.DB_USER ?? 'postgres',
-     password: process.env.DB_PASSWORD ?? 'postgres',
-     database: process.env.DB_NAME ?? 'nostream',
-   },
- })
+ const parseOptionalNumber = (name: string): number | undefined => {
+   const value = process.env[name]
+   if (value == null || value === '') {
+     return undefined
+   }
+   const parsed = Number(value)
+   if (!Number.isFinite(parsed)) {
+     throw new Error(`Invalid ${name}: expected a number`)
+   }
+   return parsed
+ }
+
+ const requireEnv = (name: string): string => {
+   const value = process.env[name]
+   if (value == null || value === '') {
+     throw new Error(`Missing required environment variable: ${name}`)
+   }
+   return value
+ }
+
+ const getDbConfig = () => {
+   const acquireConnectionTimeout = parseOptionalNumber('DB_ACQUIRE_TIMEOUT')
+   const poolMin = parseOptionalNumber('DB_POOL_MIN')
+   const poolMax = parseOptionalNumber('DB_POOL_MAX')
+   return {
+     client: 'pg',
+     connection: process.env.DB_URI
+       ? process.env.DB_URI
+       : {
+           host: requireEnv('DB_HOST'),
+           port: parseOptionalNumber('DB_PORT') ?? 5432,
+           user: requireEnv('DB_USER'),
+           password: requireEnv('DB_PASSWORD'),
+           database: requireEnv('DB_NAME'),
+         },
+     ...(acquireConnectionTimeout === undefined
+       ? {}
+       : { acquireConnectionTimeout }),
+     ...((poolMin === undefined && poolMax === undefined)
+       ? {}
+       : {
+           pool: {
+             ...(poolMin === undefined ? {} : { min: poolMin }),
+             ...(poolMax === undefined ? {} : { max: poolMax }),
+           },
+         }),
+   }
+ }

Comment thread src/scripts/export-events.ts Outdated
Comment on lines +28 to +43
const [{ count }] = await db('events')
  .whereNull('deleted_at')
  .count('* as count')
const total = Number(count)

if (total === 0) {
  console.log('No events to export.')
  return
}

console.log(`Exporting ${total} events to ${outputPath}`)

const output = fs.createWriteStream(outputPath)
let exported = 0

const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })

Copilot AI Apr 18, 2026


The count(*) is executed outside the repeatable read transaction, so total may not match the snapshot being streamed (new inserts/soft-deletes between the count and BEGIN can cause progress to be misleading and exported !== total). If you want a consistent snapshot, run the count inside the same read-only transaction before starting the stream; otherwise consider dropping the transaction / total and just log exported rows.
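
A sketch of the first option (mirroring the snippet above; the helper shape is hypothetical):

import { Knex } from 'knex'

// Sketch: take the count inside the same repeatable-read transaction that
// will stream the rows, so `total` matches the snapshot actually exported.
async function openSnapshot(db: Knex): Promise<{ trx: Knex.Transaction; total: number }> {
  const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
  const [row] = await trx('events')
    .whereNull('deleted_at')
    .count('* as count')
  return { trx, total: Number(row.count) }
}
// The caller streams rows via trx('events'), logs progress against `total`,
// and commits (or rolls back on error) once the stream ends.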

Comment thread src/scripts/export-events.ts Outdated
      'event_signature',
    )
    .whereNull('deleted_at')
    .orderBy('event_created_at', 'asc')

Copilot AI Apr 18, 2026


orderBy('event_created_at', 'asc') does not guarantee deterministic ordering when multiple events share the same event_created_at value, so repeated exports can legitimately produce different line orders. If stable output is desired, add a secondary tie-breaker (e.g. event_id or the PK id) to the ORDER BY.

Suggested change
-     .orderBy('event_created_at', 'asc')
+     .orderBy('event_created_at', 'asc')
+     .orderBy('event_id', 'asc')

Comment thread src/scripts/export-events.ts Outdated
const output = fs.createWriteStream(outputPath)
let exported = 0

const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })

Copilot AI Apr 18, 2026


Running a long-lived repeatable read transaction for a multi-million-row export can hold an old MVCC snapshot for the duration of the export, which can increase bloat and interfere with vacuum on busy relays. Consider using READ COMMITTED (still READ ONLY) or avoiding an explicit transaction unless a consistent snapshot is strictly required; alternatively document this operational impact and recommend running against a read replica.

Suggested change
- const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
+ const trx = await db.transaction(null, { isolationLevel: 'read committed' })


@cameri (Owner) commented Apr 18, 2026

@Anshumancanrock please address the Copilot comments I've thumbed up with an emoji

@Anshumancanrock (Collaborator, Author)

Hi @cameri, thanks for the review. I've addressed all the approved Copilot comments:

  • Reused the shared DB client config (removed duplicate/fallback setup)
  • Removed the count-before-stream and long repeatable-read transaction
  • Added deterministic ordering by event_created_at, then event_id
  • Added graceful handling for SIGINT/SIGTERM with cleanup and proper exit

Please take a look when you have a moment. Thanks again!



Development

Successfully merging this pull request may close these issues.

[FEATURE] Export events to JSON Lines

3 participants