Google Docs BFS Exporter

Export Google Docs by following links (BFS crawl). Perfect for discovering and archiving scattered documentation.

Setup (5 minutes)

Run the setup script to install everything automatically:

Mac/Linux: ./setup.sh | Windows: setup.bat (double-click or run in cmd)

The script will:

  1. Install uv (Python package manager) if needed
  2. Install all dependencies via uv sync
  3. Prompt you to add credentials.json (see below)

Getting credentials.json

Option A: A colleague shares it with you via Slack/email. The file identifies the OAuth app (client ID and client secret), not anyone's personal Google account, so it is safe to share with colleagues.

Option B: Create your own (2 min):

  1. Go to Google Cloud Console
  2. Create project → Enable "Google Docs API" and "Google Drive API"
  3. Credentials → Create OAuth client ID → Desktop app
  4. Download JSON as credentials.json

First run

uv run python main.py --seed-id YOUR_DOC_ID

On the first run, a browser window opens for authorization. You authorize with your own Google account, so each person gets their own access. Later runs reuse the cached credentials in token.pickle.
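
The caching behavior can be pictured with a small sketch. This is not the tool's actual code: `load_or_authorize` and the `authorize` callback are hypothetical names, and the real script presumably stores Google OAuth credential objects rather than the plain values used here.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_authorize(token_path: Path, authorize: Callable[[], Any]) -> Any:
    """Return cached credentials if the file exists; otherwise authorize and cache."""
    if token_path.exists():
        # Later runs: reuse whatever was stored on the first run.
        with token_path.open("rb") as fh:
            return pickle.load(fh)

    # First run: `authorize` is a hypothetical stand-in for the browser consent flow.
    creds = authorize()
    with token_path.open("wb") as fh:
        pickle.dump(creds, fh)
    return creds
```

Under this model, deleting token.pickle simply forces the authorization step to run again on the next invocation.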

Usage Examples

# Find your doc ID from the URL:
# https://docs.google.com/document/d/YOUR_DOC_ID_HERE/edit

# Export as Word docs locally (default)
uv run python main.py --seed-id YOUR_DOC_ID

# Export as markdown locally
uv run python main.py --seed-id YOUR_DOC_ID --format md

# Save copies to Google Drive folder (preserves as Google Docs)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID

# Save markdown to Drive with localized links (links point to other Drive files)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID --format md --localize-links

# Auto-request access to documents you don't have permission for
uv run python main.py --seed-id YOUR_DOC_ID --request-access

# Limit to 20 docs
uv run python main.py --seed-id YOUR_DOC_ID --max-docs 20
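
If you would rather not copy the ID out of the URL by hand, a short helper can do it. This is a minimal sketch, assuming the URL shape shown in the comment above; `extract_doc_id` is a hypothetical name and is not part of main.py.

```python
import re
from typing import Optional

# Assumed URL shape: https://docs.google.com/document/d/<DOC_ID>/edit
DOC_URL_RE = re.compile(r"docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def extract_doc_id(url: str) -> Optional[str]:
    """Return the document ID embedded in a Google Docs URL, or None if absent."""
    match = DOC_URL_RE.search(url)
    return match.group(1) if match else None
```

The returned string is what you would pass to --seed-id.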

How It Works

BFS crawl: starts from seed doc → exports it → extracts Google Docs links → queues them → repeats. Files saved to exported_docs/ (or Drive with --drive).
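
The crawl order can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: `bfs_crawl` and the `get_links` callback are hypothetical, the export step is reduced to recording the ID, and skipping already-seen documents is an assumption about how the tool avoids loops.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(
    seed_id: str,
    get_links: Callable[[str], Iterable[str]],
    max_docs: int = 100,
) -> List[str]:
    """Visit documents breadth-first from seed_id, stopping after max_docs."""
    visited = {seed_id}
    queue = deque([seed_id])
    exported: List[str] = []

    while queue and len(exported) < max_docs:
        doc_id = queue.popleft()
        exported.append(doc_id)  # the real tool would export the document here

        # Queue every linked doc that has not been seen yet (assumed dedup step).
        for linked_id in get_links(doc_id):
            if linked_id not in visited:
                visited.add(linked_id)
                queue.append(linked_id)

    return exported
```

Because the queue is first-in, first-out, documents linked directly from the seed are exported before documents two links away, which is why a --max-docs limit keeps the documents closest to the seed.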

Markdown support: Bold, italic, strikethrough, inline code, headings, links, lists (nested), basic tables. Not supported: images, comments, footnotes, drawings, complex table formatting. Use --format docx to preserve full formatting.

Index CSV: Auto-generated at exported_docs/index.csv mapping doc IDs to filenames/Drive IDs.
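
As a rough picture of what such an index might contain, here is a sketch that writes one row per exported document. The column names (`doc_id`, `filename`, `drive_file_id`) and both function names are assumptions for illustration; check the real index.csv for the actual header.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical column names; the real index.csv may use different ones.
FIELDS = ["doc_id", "filename", "drive_file_id"]


def write_index(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write a header plus one CSV row per document, leaving missing fields blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({field: row.get(field, "") for field in FIELDS})


def render_index(rows: Iterable[Dict[str, str]]) -> str:
    """Convenience wrapper that returns the CSV text instead of writing a file."""
    buffer = io.StringIO()
    write_index(rows, buffer)
    return buffer.getvalue()
```

A local export would fill in the filename column, while a Drive export would fill in the Drive file ID.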

Link Localization (--localize-links): Converts Google Docs URLs to local .md file links for offline browsing.
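
The rewrite can be sketched with a regular expression substitution. This is a simplified illustration, not the tool's implementation: `localize_links` and the `id_to_file` mapping are hypothetical, and the sketch assumes links to documents that were not exported are left unchanged.

```python
import re
from typing import Dict

# Matches a Google Docs URL and captures the document ID; the trailing part
# (/edit, #heading=..., query string) is consumed so it can be dropped.
DOC_LINK_RE = re.compile(
    r"https://docs\.google\.com/document/d/([A-Za-z0-9_-]+)[^\s)]*"
)


def localize_links(markdown: str, id_to_file: Dict[str, str]) -> str:
    """Replace Docs URLs with local file names for documents that were exported."""

    def swap(match: "re.Match[str]") -> str:
        local_name = id_to_file.get(match.group(1))
        # Assumption: unknown documents keep their original URL.
        return local_name if local_name else match.group(0)

    return DOC_LINK_RE.sub(swap, markdown)
```

For example, with `{"abc123": "spec.md"}` as the mapping, `[spec](https://docs.google.com/document/d/abc123/edit)` becomes `[spec](spec.md)`.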

Options

uv run python main.py --help
  • --seed-id: Document ID to start from (required)
  • --format: md or docx (default: docx)
  • --drive FOLDER_ID: Save to Google Drive folder instead of local export
  • --localize-links: Rewrite Google Docs links to point to the exported documents (with --drive and --format md, links point to the uploaded Drive files)
  • --request-access: Auto-request access to documents you can't view
  • --max-docs: Safety limit (default: 100)
  • --setup: Show OAuth setup instructions
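
For readers who want to see how these options fit together, here is a sketch of an equivalent argparse definition. It mirrors the flags and defaults listed above but is not copied from main.py; `build_parser` is a hypothetical name, and --seed-id is deliberately not marked as required here on the assumption that --setup can be used on its own.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--seed-id", help="Document ID to start from")
    parser.add_argument("--format", choices=["md", "docx"], default="docx")
    parser.add_argument("--drive", metavar="FOLDER_ID",
                        help="Save to this Google Drive folder instead of locally")
    parser.add_argument("--localize-links", action="store_true")
    parser.add_argument("--request-access", action="store_true")
    parser.add_argument("--max-docs", type=int, default=100,
                        help="Safety limit on the number of documents")
    parser.add_argument("--setup", action="store_true",
                        help="Show OAuth setup instructions")
    return parser
```

Parsing `["--seed-id", "abc", "--format", "md"]` with this parser yields `format == "md"` and leaves `max_docs` at its default of 100.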

Google Drive Export

Use --drive FOLDER_ID to save directly to a Google Drive folder instead of local export. Perfect for team archiving.

Get folder ID: Open folder in Drive, copy ID from URL: https://drive.google.com/drive/folders/YOUR_FOLDER_ID

What happens:

  • Markdown mode: Uploads .md files
  • Docx mode (default): Copies original Google Docs (preserves all formatting)
  • Index CSV saved locally with Drive file IDs

First time using --drive: Delete token.pickle so that the next run re-authenticates with Drive write permissions.

Troubleshooting

  • "Missing credentials.json": Run setup script again
  • "Access denied": Use --request-access to auto-request access from doc owners
  • Reset auth: Delete token.pickle and run again
  • "uv: command not found": Close/reopen terminal, run setup again

Files

  • setup.sh / setup.bat - One-click setup scripts
  • main.py - The script
  • credentials.json - OAuth app credentials (you provide, safe to share with colleagues)
  • token.pickle - Your personal auth tokens (auto-generated, never share)
  • exported_docs/ - Output folder
  • exported_docs/index.csv - Document index (auto-generated)

Known Limitations

Markdown export doesn't support: images, comments, complex table formatting, footnotes, drawings. Use --format docx for perfect formatting.

Development

See LLM.md for technical overview and architecture.