Yuutokata/Gmail-Exporter

Gmail Senders Extractor

A Python script that extracts and analyzes all unique email senders from your Gmail account, providing detailed statistics to help you organize and filter your email workflow.

Overview

This tool connects to your Gmail account via the Gmail API and extracts comprehensive information about all unique senders, including:

  • Sender names and email addresses
  • Email frequency (count of emails received)
  • Most recent email date
  • Complete sender history with progress tracking

This is useful for email migration analysis, spam filter setup, or general email organization, for example when moving to a service like Proton Mail.

Features

  • Complete Sender Analysis: Extracts all unique senders from your Gmail history
  • Frequency Tracking: Counts how many emails you've received from each sender
  • Recency Analysis: Tracks the most recent email from each sender
  • Progress Persistence: Resumes processing from where it left off if interrupted
  • Multiple Export Formats: Generates sorted lists by email, frequency, and recency
  • Robust Error Handling: Continues processing even if individual messages fail
  • Detailed Logging: Comprehensive logs for debugging and progress monitoring
  • Rate Limiting Safe: Handles Gmail API rate limits gracefully

Prerequisites

  • Python 3.6 or higher
  • Google Account with Gmail
  • Google Cloud Console project with Gmail API enabled

Installation

  1. Clone the repository:

    git clone https://github.com/Yuutokata/Gmail-Exporter
    cd Gmail-Exporter

  2. Install dependencies:

    pip install -r requirements.txt

  3. Set up Google API credentials:

  • Go to the Google Cloud Console
  • Create a new project or select an existing one
  • Enable the Gmail API
  • Create credentials (OAuth 2.0 Client ID) for a Desktop application
  • Download the credentials file and save it as credentials.json in the project directory

Usage

  1. Run the script:

    python main.py

  2. First-time authentication:

  • The script will open your browser for Google OAuth authentication
  • Grant the necessary permissions (read-only access to Gmail)
  • The authentication token will be saved for future runs

  3. Monitor progress:

  • The script provides real-time progress updates
  • You can safely interrupt (Ctrl+C) and resume later
  • Progress is automatically saved every 100 processed messages
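
The resume behaviour described above can be sketched as a small checkpoint loop. This is a minimal illustration, not the code in main.py: `process_all`, `load_progress`, `save_progress`, and the layout of the progress file (a list of processed message IDs plus a senders dictionary) are assumptions made for the example. Only the file name sender_progress.json and the default of 100 messages come from this README.

```python
import json
import os

PROGRESS_FILE = "sender_progress.json"  # file name taken from this README
SAVE_FREQUENCY = 100                    # README default: save every 100 messages


def load_progress(path=PROGRESS_FILE):
    """Return saved state, or an empty state on the first run."""
    if not os.path.exists(path):
        return {"processed_ids": [], "senders": {}}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_progress(state, path=PROGRESS_FILE):
    """Write to a temp file first so an interrupt cannot truncate the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def process_all(message_ids, handle_message, path=PROGRESS_FILE,
                save_frequency=SAVE_FREQUENCY):
    """Process each message once, skipping IDs recorded by an earlier run.

    handle_message(message_id, senders) is a hypothetical callback that
    updates the senders dictionary in place.
    """
    state = load_progress(path)
    done = set(state["processed_ids"])
    try:
        for message_id in message_ids:
            if message_id in done:
                continue
            handle_message(message_id, state["senders"])
            done.add(message_id)
            state["processed_ids"].append(message_id)
            if len(done) % save_frequency == 0:
                save_progress(state, path)
    finally:
        # Also runs on Ctrl+C, so the most recent work is kept.
        save_progress(state, path)
    return state
```

Writing the checkpoint through a temporary file and `os.replace` keeps the previous checkpoint intact if the process is killed mid-write.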

Output Files

The script generates several JSON files and a log file:

  • unique_senders.json: All senders sorted alphabetically by email
  • frequent_senders.json: Senders sorted by email frequency (most frequent first)
  • recent_senders.json: Senders sorted by recency (most recent first)
  • sender_progress.json: Progress tracking file (for resuming)
  • gmail_extractor.log: Detailed execution logs
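
As a rough sketch of how the three sorted exports relate to one another, the snippet below writes all three from a single list of sender entries. The function name `write_sorted_outputs` is hypothetical, and the field names follow the JSON structure documented in this README. It uses `datetime.fromisoformat`, which requires Python 3.7 or later. The actual script may sort differently.

```python
import json
from datetime import datetime
from pathlib import Path


def write_sorted_outputs(senders, out_dir="."):
    """Write the three sorted views of the same sender list and return them."""
    out = Path(out_dir)
    views = {
        # Alphabetical by email address
        "unique_senders.json": sorted(senders, key=lambda s: s["email"]),
        # Most frequent first
        "frequent_senders.json": sorted(
            senders, key=lambda s: s["count"], reverse=True
        ),
        # Most recent first; parse the ISO timestamp so offsets compare correctly
        "recent_senders.json": sorted(
            senders,
            key=lambda s: datetime.fromisoformat(s["last_date"]),
            reverse=True,
        ),
    }
    for filename, rows in views.items():
        (out / filename).write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return views
```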

JSON Structure

Each sender entry contains:

    {
      "name": "John Doe",
      "email": "john.doe@example.com",
      "original_from": "John Doe <john.doe@example.com>",
      "count": 25,
      "last_date": "2024-01-15T10:30:00+00:00",
      "last_date_str": "2024-01-15 10:30:00"
    }
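
For illustration, an entry with these fields could be derived from a message's From and Date headers using only the standard library. This is a sketch under assumptions: `build_sender_entry` is a hypothetical helper, and lowercasing the address is a choice made here, not something this README specifies.

```python
from email.utils import parseaddr, parsedate_to_datetime


def build_sender_entry(from_header, date_header, count=1):
    """Build one sender record shaped like the entry above."""
    # parseaddr splits "Name <addr>" into (name, addr)
    name, email = parseaddr(from_header)
    # parsedate_to_datetime parses an RFC 2822 date header
    last_date = parsedate_to_datetime(date_header)
    return {
        "name": name,
        "email": email.lower(),  # assumption: normalize case for deduplication
        "original_from": from_header,
        "count": count,
        "last_date": last_date.isoformat(),
        "last_date_str": last_date.strftime("%Y-%m-%d %H:%M:%S"),
    }


entry = build_sender_entry(
    "John Doe <john.doe@example.com>",
    "Mon, 15 Jan 2024 10:30:00 +0000",
    count=25,
)
```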

Configuration

You can modify these variables in main.py:

  • save_frequency: How often to save progress (default: 100 messages)
  • SCOPES: Gmail API permissions (default: read-only)
  • File names for output and progress files

Use Cases

  • Email Migration: Analyze your Gmail patterns before switching to Proton Mail or other providers
  • Filter Setup: Create email filters and folder structures based on sender frequency
  • Spam Analysis: Identify high-volume senders for potential filtering
  • Contact Management: Export your email contacts for import into other systems
  • Email Auditing: Understand your email communication patterns

Performance

  • Processing speed: ~10-50 messages/second (depending on API limits)
  • Memory usage: Minimal (processes messages one at a time)
  • Storage: JSON files are typically 1-10MB for most Gmail accounts
  • Resumability: Can handle Gmail accounts with 100k+ emails safely

Troubleshooting

Authentication Issues:

  • Ensure credentials.json is in the project directory
  • Delete token.json and re-authenticate if needed
  • Verify Gmail API is enabled in Google Cloud Console

Rate Limiting:

  • The script handles rate limits automatically
  • If you encounter persistent rate limits, the script will retry
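
This README does not say how the script retries, so the following is only a generic sketch of retrying with exponential backoff, a common way to handle rate-limited APIs. `call_with_backoff` and the `is_retryable` predicate are hypothetical names. With the Gmail API, such a predicate might check for an HTTP 429 response.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(), retrying retryable errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on non-retryable errors or once attempts are exhausted.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 1s of jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

The `sleep` parameter exists so the delay can be replaced in tests. In normal use the default `time.sleep` applies.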

Interrupted Processing:

  • Simply re-run the script - it will resume from where it left off
  • Check gmail_extractor.log for detailed error information

Large Mailboxes:

  • For Gmail accounts with 50k+ emails, expect several hours of processing
  • The script saves progress regularly, so interruptions are safe

Privacy & Security

  • Read-Only Access: The script only reads your Gmail data, never modifies or deletes
  • Local Processing: All data is processed locally on your machine
  • No Data Transmission: Sender information stays on your computer
  • Revocable Access: You can revoke API access anytime in your Google Account settings

About

Analyze and export all unique email senders from Gmail with frequency and recency data for email organization and migration planning.
