
Add Google Voice Takeout import support #225

Open
sternryan wants to merge 1 commit into wesm:main from sternryan:gvoice-upstream


@sternryan

Summary

  • New sync-gvoice command that imports SMS, MMS, and call records from Google Takeout Voice exports
  • Follows the established adapter pattern, implementing the gmail.API interface for plug-and-play integration
  • Deterministic message IDs via SHA-256 for idempotent re-imports

New files

  • internal/gvoice/client.go - gmail.API implementation over Takeout HTML
  • internal/gvoice/parser.go - HTML conversation parser
  • internal/gvoice/models.go - conversation and message types
  • internal/gvoice/parser_test.go - parser tests
  • cmd/msgvault/cmd/sync_gvoice.go - CLI command

Usage

msgvault sync-gvoice --takeout-dir ~/path/to/Takeout/Voice

Performance

  • Indexes ~120k entries from ~50k files in ~6 seconds
  • Full import at ~1,500 messages/sec

Implements sync-gvoice command that imports SMS, MMS, and call records
from a Google Takeout Voice export. Follows the established adapter
pattern from the iMessage integration, implementing gmail.API interface
to plug into the existing sync infrastructure.

Key features:
- Parses HTML conversation files for text messages and call logs
- Handles 1:1 texts, group conversations, and call records
- Deterministic message IDs via SHA-256 for idempotent re-imports
- Indexes ~120k entries from ~50k files in ~6 seconds
- Full import at ~1,500 messages/sec
@roborev-ci

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (cb5595d)

Verdict: The PR successfully introduces the Google Voice Takeout parser and sync pipeline, but requires crucial fixes for potential panics, data races, and an O(N²) performance bottleneck before merging.

High Severity

  • Location: internal/gvoice/client.go:153 (in buildIndex)
    Problem: The error path dereferences entry.ID after c.indexCallFile(...) returns an error. In that case, entry is nil, so a single malformed or unsupported call HTML file will panic the entire import instead of being skipped.
    Fix: Log the filename/path already in scope instead of entry.ID, or guard the dereference before logging.

  • Location: internal/gvoice/client.go (in GetMessagesRawBatch)
    Problem: Continuing after a GetMessageRaw error leaves nil pointers in the pre-allocated results slice. If callers iterate over the returned slice and assume all items are valid, it will trigger a panic.
    Fix: Initialize results with zero length (make([]*gmail.RawMessage, 0, len(messageIDs))) and append messages on success.

Medium Severity

  • Location: internal/gvoice/client.go (in getCachedMessages and buildIndex)
    Problem: The file cache (lastFilePath, lastMessages) and the lazy index initialization (c.indexBuilt) lack mutex protection. If the syncer fetches message batches concurrently, this will cause a data race.
    Fix: Protect the cache fields with a sync.Mutex and use sync.Once for buildIndex.

  • Location: internal/gvoice/client.go (in GetMessageRaw)
    Problem: A linear scan over c.index is performed for every message fetched. Doing this for a large number of messages yields O(N²) complexity, which will cause severe CPU stalling during large syncs.
    Fix: Populate a map[string]*indexEntry during buildIndex() to allow O(1) lookups by ID.

  • Location: internal/gvoice/client.go:219 and internal/gvoice/client.go:425
    Problem: For 1:1 conversations, the code derives the other participant only by scanning for a non-Me message. If a Takeout thread contains only outbound messages, indexTextFile falls back to the owner's own number for the thread ID and buildTextMessage emits no recipient at all. That can merge unrelated sent-only threads together and lose addressing metadata.
    Fix: Use the conversation/file metadata (contactName, filename, or parsed conversation header) as the fallback participant when no inbound message exists, and add a test for sent-only threads.

  • Location: internal/gvoice/client.go:449 and internal/gvoice/client.go:461
    Problem: MMS attachments are detected during HTML parsing, but buildTextMessage only adds the mms label and generates a plain-text MIME body with no attachment parts. The import drops all image/video content from MMS messages.
    Fix: Build multipart MIME that includes the referenced media files (or persist them through the attachment pipeline) and cover this with tests for image/video MMS messages.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)
