Confluence Export Tool - CLI Usage Guide

Overview

The Confluence Export Tool v2.0 provides an interactive command-line interface for exporting selected Confluence pages as HTML, Markdown, or both, with error reporting and retry logic for pages that fail.

Features

  • 🎯 Interactive Page Selection: Choose specific pages to export rather than exporting everything
  • 📝 Multiple Content Formats: Export as HTML, Markdown, or both
  • ⚑ Rate Limiting: Intelligent API rate limiting with exponential backoff
  • πŸ“Š Progress Tracking: Real-time progress bars and detailed statistics
  • πŸ›‘οΈ Error Handling: Comprehensive error reporting with retry logic
  • πŸ“‹ Export Index: Optional index file with export metadata
  • πŸ’Ύ Smart Caching: Local page cache for instant startup (24-hour TTL)
  • πŸ“„ Pagination: Browse large spaces 25 pages at a time
  • πŸ“… Chronological Sorting: Pages sorted by last modified date (newest first)

Installation

npm install

Configuration

Set up your environment variables in a .env file:

CONFLUENCE_BASE_URL=https://your-domain.atlassian.net
CONFLUENCE_EMAIL=your-email@domain.com
CONFLUENCE_API_TOKEN=your-api-token
SPACE_KEY=YOUR_SPACE_KEY
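
As a minimal sketch of how these four variables might be checked before an export starts: the helper name `findMissingConfig` is hypothetical and is not part of the tool, and only the variable names come from the example above.

```javascript
// Hypothetical helper: reports which required variables are absent or blank.
// The variable names come from the .env example; the function itself is an
// illustration, not the tool's actual validation code.
const REQUIRED_VARS = [
  'CONFLUENCE_BASE_URL',
  'CONFLUENCE_EMAIL',
  'CONFLUENCE_API_TOKEN',
  'SPACE_KEY',
];

function findMissingConfig(env) {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

// Example: only the base URL is set, so the other three names are reported.
const missing = findMissingConfig({
  CONFLUENCE_BASE_URL: 'https://your-domain.atlassian.net',
});
console.log(missing);
```

In a real script the argument would typically be `process.env` after the .env file has been loaded.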

Getting Your API Token

  1. Go to Atlassian Account Settings
  2. Click "Create API token"
  3. Give it a label and copy the token
  4. Use this token as your CONFLUENCE_API_TOKEN

Usage

Interactive Mode (Default)

npm start

This will:

  1. Fetch all pages from your Confluence space
  2. Present an interactive list for page selection
  3. Allow you to configure export options
  4. Export selected pages with progress tracking

Non-Interactive Mode

Export all pages without prompts:

npm start -- --no-interactive

Command Line Options

| Option | Description | Default |
| --- | --- | --- |
| --dev | Enable development mode with verbose logging | false |
| --output <dir> | Output directory | ./output |
| --concurrency <number> | Max concurrent downloads (1-10) | 5 |
| --no-interactive | Disable interactive mode (export all pages) | false |
| --all-pages | Export all pages (not blog posts) without user input; exports as Markdown and HTML, skips attachments | false |
| --format <format> | Content format: both, html, markdown, storage | both |
| --no-attachments | Skip downloading attachments | false |
| --no-index | Skip creating index file | false |
| --clear-cache | Clear the page cache before starting | false |
| --cache-info | Show cache information and exit | false |
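
The documented ranges for --concurrency and --format can be expressed as a small validator. This is only a sketch: the allowed values and defaults are taken from the options listed above, while the function name `normalizeOptions` and the error messages are assumptions, not the tool's own code.

```javascript
// Hypothetical validator for two of the documented options.
const FORMATS = ['both', 'html', 'markdown', 'storage'];

function normalizeOptions({ concurrency = 5, format = 'both' } = {}) {
  // Command-line values arrive as strings, so convert before range-checking.
  const n = Number(concurrency);
  if (!Number.isInteger(n) || n < 1 || n > 10) {
    throw new RangeError(`--concurrency must be an integer from 1 to 10, got "${concurrency}"`);
  }
  if (!FORMATS.includes(format)) {
    throw new RangeError(`--format must be one of ${FORMATS.join(', ')}, got "${format}"`);
  }
  return { concurrency: n, format };
}

console.log(normalizeOptions({ concurrency: '2', format: 'markdown' }));
```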

Examples

Basic interactive export:

npm start

Export all pages as Markdown only:

npm start -- --no-interactive --format markdown

Export all pages (not blog posts) without user input:

npm start -- --all-pages

Development mode with verbose output:

npm start -- --dev

Custom output directory with limited concurrency:

npm start -- --output ./my-export --concurrency 2

Export without attachments or index:

npm start -- --no-attachments --no-index

Cache management:

# Show cache information
npm start -- --cache-info

# Clear cache and refresh data
npm start -- --clear-cache

# Force refresh without interactive cache choice
npm start -- --clear-cache --no-interactive

Interactive Features

Page Selection

When running in interactive mode, you'll see:

  1. Page List: All pages in your space, sorted by last modified date (newest first) and shown 25 at a time
  2. Special Options:
    • ✓ Select All Pages
    • ✗ Cancel Export
  3. Search/Filter: Use arrow keys and space to select pages
  4. Confirmation: Review your selection before export
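
The feature list describes pages ordered by last modified date (newest first) and browsed 25 at a time. A minimal sketch of that ordering and paging follows; the `lastModified` field name and both function names are assumptions for illustration.

```javascript
const PAGE_SIZE = 25; // from the feature list: 25 pages at a time

// Returns a new array ordered newest-first by the (assumed) lastModified field.
function sortNewestFirst(pages) {
  return [...pages].sort(
    (a, b) => new Date(b.lastModified) - new Date(a.lastModified)
  );
}

// Returns the pages shown on a zero-based screen index.
function getScreen(pages, index, size = PAGE_SIZE) {
  return pages.slice(index * size, (index + 1) * size);
}

const pages = [
  { title: 'Old', lastModified: '2023-01-01T00:00:00Z' },
  { title: 'New', lastModified: '2024-06-01T00:00:00Z' },
];
console.log(getScreen(sortNewestFirst(pages), 0).map((p) => p.title));
```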

Export Configuration

Choose from:

  • Content Format:

    • Both HTML and Markdown
    • HTML only
    • Markdown only
    • Raw Confluence storage format
  • Options:

    • Include attachments (recommended)
    • Create export index file
    • Set concurrency level

Output Structure

output/
├── export-index.json                   # Export metadata and page index
└── pages/
    ├── 123456-Page-Title/
    │   ├── Page-Title-metadata.json    # Page metadata
    │   ├── Page-Title.html             # HTML content (if selected)
    │   ├── Page-Title.md               # Markdown content (if selected)
    │   └── attachments/                # Page attachments (if enabled)
    │       ├── image1.png
    │       └── document.pdf
    └── 789012-Another-Page/
        ├── Another-Page-metadata.json
        ├── Another-Page.html
        └── Another-Page.md
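
To locate files in this layout from a script, the paths can be derived from a page ID and title. The sketch below reproduces the naming pattern shown in the tree; the slug rule (non-word characters collapsed to hyphens) is an assumption, and the tool's real rule may differ for titles with punctuation.

```javascript
const path = require('path');

// Assumed slug rule: runs of characters other than letters, digits,
// underscores, or hyphens become a single hyphen.
function slugify(title) {
  return title.trim().replace(/[^\w-]+/g, '-').replace(/^-+|-+$/g, '');
}

// Builds the expected paths for one exported page (POSIX separators).
function pagePaths(outputDir, pageId, title) {
  const slug = slugify(title);
  const dir = path.posix.join(outputDir, 'pages', `${pageId}-${slug}`);
  return {
    dir,
    metadata: path.posix.join(dir, `${slug}-metadata.json`),
    html: path.posix.join(dir, `${slug}.html`),
    markdown: path.posix.join(dir, `${slug}.md`),
    attachments: path.posix.join(dir, 'attachments'),
  };
}

console.log(pagePaths('output', '123456', 'Page Title').html);
// prints "output/pages/123456-Page-Title/Page-Title.html"
```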

Smart Caching System

The tool caches the page list locally so that later runs can start without refetching the whole space:

How It Works

  • First Run: Fetches all pages from Confluence (takes time) and caches them locally
  • Subsequent Runs: Loads from cache instantly (under 1 second vs 30+ seconds)
  • Cache Location: ./output/.cache/pages-cache.json
  • Cache TTL: 24 hours (automatically expires)
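
The 24-hour expiry amounts to a timestamp comparison. A minimal sketch, assuming the cache records the time it was written as a millisecond timestamp (how the tool actually stores this is not documented here):

```javascript
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as stated above

// cachedAt and now are millisecond timestamps (for example from Date.now()).
// A timestamp in the future is treated as not fresh.
function isCacheFresh(cachedAt, now = Date.now(), ttlMs = CACHE_TTL_MS) {
  const age = now - cachedAt;
  return age >= 0 && age < ttlMs;
}

const now = Date.parse('2024-01-02T12:00:00Z');
console.log(isCacheFresh(Date.parse('2024-01-02T00:00:00Z'), now)); // 12 hours old: true
console.log(isCacheFresh(Date.parse('2024-01-01T00:00:00Z'), now)); // 36 hours old: false
```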

Cache Management

  • Automatic: Tool asks if you want to use cache or refresh on each run
  • Manual: Use --cache-info to check cache status
  • Clear: Use --clear-cache to force refresh
  • Smart: Cache includes modification dates for accurate chronological sorting

Cache Benefits

  • ⚑ 99% faster startup for large spaces (1500+ pages)
  • πŸ”„ Automatic expiration ensures data freshness
  • πŸ’Ύ Minimal storage - only essential page metadata cached
  • 🎯 Perfect for browsing - immediate access to paginated page lists

Rate Limiting & Error Handling

The tool implements several strategies to handle Confluence API limitations:

Rate Limiting

  • Minimum 100ms between requests
  • Exponential backoff on rate limit errors
  • Automatic retry with increasing delays
  • Logs a clear message whenever a rate limit is hit
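
The two timing rules above can be sketched as pure functions. The 100 ms minimum interval comes from the list; the backoff base (1 second) and cap (60 seconds) are assumed values chosen for illustration, not documented settings.

```javascript
const MIN_INTERVAL_MS = 100; // minimum gap between requests, from the list above

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// baseMs and maxMs are assumptions.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Time still to wait so that two requests are at least MIN_INTERVAL_MS apart.
function throttleWaitMs(lastRequestAt, now) {
  return Math.max(0, MIN_INTERVAL_MS - (now - lastRequestAt));
}

console.log([0, 1, 2, 3].map((a) => backoffDelayMs(a))); // [ 1000, 2000, 4000, 8000 ]
console.log(throttleWaitMs(1000, 1040)); // 60
```

Capping the delay keeps a long run of failures from producing waits of many minutes.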

Error Handling

  • Comprehensive error reporting for each page
  • Continues export even if individual pages fail
  • Detailed error summary at the end
  • Specific guidance for common issues (auth, permissions, etc.)
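
Continuing after individual failures and summarizing at the end maps naturally onto `Promise.allSettled`. The sketch below shows only the summary step; the `{ id, title }` page shape and the function name `summarize` are assumptions, not the tool's internals.

```javascript
// Builds an end-of-run summary from Promise.allSettled-style results.
// `pages` and `settled` are parallel arrays.
function summarize(pages, settled) {
  const failures = [];
  settled.forEach((result, i) => {
    if (result.status === 'rejected') {
      const reason = result.reason;
      failures.push({
        id: pages[i].id,
        title: pages[i].title,
        error: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return {
    total: pages.length,
    succeeded: pages.length - failures.length,
    failed: failures.length,
    failures,
  };
}

// In an export loop the results could come from:
//   const settled = await Promise.allSettled(pages.map(exportPage));
const pages = [{ id: '1', title: 'A' }, { id: '2', title: 'B' }];
const settled = [
  { status: 'fulfilled', value: undefined },
  { status: 'rejected', reason: new Error('Request failed with status code 403') },
];
const summary = summarize(pages, settled);
console.log(summary);
```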

Progress Tracking

  • Real-time progress bar with ETA
  • Current page being processed
  • Success/error indicators
  • Final statistics with timing information
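
One simple way to compute an ETA like the one in the progress bar is to extrapolate from the average time per finished page. This is an illustrative formula under that assumption, not necessarily the one the tool uses.

```javascript
// Estimated seconds remaining, assuming pages take roughly equal time.
// Returns null until at least one page has finished.
function etaSeconds(done, total, elapsedMs) {
  if (done <= 0) return null;
  const msPerPage = elapsedMs / done;
  return Math.round(((total - done) * msPerPage) / 1000);
}

console.log(etaSeconds(20, 100, 30000)); // 20 pages in 30 s, 80 remaining: 120
```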

Troubleshooting

Authentication Issues

❌ Export failed: Request failed with status code 401

Solution: Check your credentials:

  • Verify CONFLUENCE_EMAIL and CONFLUENCE_API_TOKEN
  • Ensure your API token hasn't expired
  • Test with: curl -u email:token https://your-domain.atlassian.net/wiki/rest/api/space
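
The `curl -u email:token` check sends HTTP Basic authentication. For reproducing the same check from Node, a minimal sketch of building that header follows; the function name is hypothetical, and the commented `fetch` call assumes Node 18 or later.

```javascript
// Builds the Authorization header value that `curl -u email:token` sends.
function basicAuthHeader(email, apiToken) {
  const encoded = Buffer.from(`${email}:${apiToken}`, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}

// Hypothetical usage with the built-in fetch (not executed here):
//   fetch(`${baseUrl}/wiki/rest/api/space`, {
//     headers: { Authorization: basicAuthHeader(email, token) },
//   }).then((res) => console.log(res.status)); // 401 points to bad credentials

console.log(basicAuthHeader('a', 'b')); // prints "Basic YTpi"
```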

Permission Issues

❌ Export failed: Request failed with status code 403

Solution: Check permissions:

  • Ensure your account has read access to the space
  • Verify your API token has appropriate scopes
  • Contact your Confluence admin if needed

Space Not Found

❌ Export failed: Request failed with status code 404

Solution: Check configuration:

  • Verify SPACE_KEY is correct (case-sensitive)
  • Ensure CONFLUENCE_BASE_URL is correct
  • Confirm that the space exists and that you have access to it

Rate Limiting

⚠️ Rate limited. Waiting 60 seconds...

Solution: This is normal behavior. The tool will:

  • Automatically wait as requested by the API
  • Continue the export after the wait period

If this happens often, reduce --concurrency to make fewer parallel requests.
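
APIs commonly communicate the requested wait through the HTTP `Retry-After` header, which holds either a number of seconds or an HTTP date. Whether this tool reads that header is an assumption; the sketch below only shows how such a value can be converted to a delay.

```javascript
// Converts a Retry-After header value to milliseconds.
// Accepts delay-seconds ("60") or an HTTP date; uses fallbackMs otherwise.
function retryAfterMs(headerValue, now = Date.now(), fallbackMs = 1000) {
  if (headerValue == null) return fallbackMs;
  const text = String(headerValue).trim();
  if (/^\d+$/.test(text)) return Number(text) * 1000;
  const at = Date.parse(text);
  return Number.isNaN(at) ? fallbackMs : Math.max(0, at - now);
}

console.log(retryAfterMs('60')); // 60000
```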

Empty Pages

If exported pages appear empty:

  • The page might genuinely be empty in Confluence
  • Check if the page has content in the Confluence web interface
  • Some pages might only have metadata without body content

Best Practices

  1. Start Small: Test with a few pages first using interactive mode
  2. Monitor Rate Limits: Use --concurrency 2 for large exports
  3. Enable Dev Mode: Use --dev flag when troubleshooting
  4. Check Permissions: Ensure you have read access to all desired pages
  5. Regular Backups: Run exports regularly to maintain up-to-date backups

Advanced Usage

Environment Variables

You can override any configuration with environment variables:

export MAX_CONCURRENCY=2
export OUTPUT_DIR=./custom-output
export RETRY_ATTEMPTS=5
npm start
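
A minimal sketch of how overrides like these might be read with fallbacks: the defaults for concurrency (5) and output directory (./output) come from the command-line options, while the retry default (3) and the function name `readOverrides` are assumed placeholders.

```javascript
// Reads the three overrides shown above, falling back to defaults.
function readOverrides(env) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    maxConcurrency: toInt(env.MAX_CONCURRENCY, 5),
    outputDir: env.OUTPUT_DIR || './output',
    retryAttempts: toInt(env.RETRY_ATTEMPTS, 3), // assumed default
  };
}

console.log(readOverrides({ MAX_CONCURRENCY: '2', OUTPUT_DIR: './custom-output' }));
```

Passing `process.env` instead of a literal object would pick up the exported values from the shell.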

Automation

For automated exports (CI/CD, scheduled tasks):

# Non-interactive export with error handling
npm start -- --no-interactive --format both --output ./exports/$(date +%Y-%m-%d) || exit 1

Custom Scripts

You can also use the tool programmatically:

const { ConfluenceExporter } = require('./src/confluence-exporter');
const { Config } = require('./src/config');

async function customExport() {
  // Load settings from the environment (the same variables as in .env)
  const config = new Config();
  config.loadFromEnv();

  // Export Markdown only and skip attachments
  const exporter = new ConfluenceExporter(config, {
    contentFormat: 'markdown',
    includeAttachments: false
  });

  // ... implement custom logic
}

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Run with --dev flag for detailed error information
  3. Verify your Confluence permissions and API token
  4. Check the export statistics for specific error details

The tool provides comprehensive error reporting to help diagnose and resolve issues quickly.