A Python script that extracts and analyzes all unique email senders from your Gmail account, providing detailed statistics to help you organize and filter your email workflow.
This tool connects to your Gmail account via the Gmail API and extracts comprehensive information about all unique senders, including:
- Sender names and email addresses
- Email frequency (count of emails received)
- Most recent email date
- Complete sender history with progress tracking
Perfect for email migration analysis, spam filtering setup, or general email organization when moving to services like Proton Mail.
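As a rough illustration of how the fields listed above could be derived, the sketch below builds one record per sender from a message's From and Date headers using only the standard library. The function name `update_sender`, the choice to lowercase addresses as dictionary keys, and the normalization of dates to UTC are assumptions made for this example and not necessarily what main.py does.

```python
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime
from typing import Dict, Optional


def update_sender(senders: Dict[str, dict], from_header: str,
                  date_header: Optional[str] = None) -> None:
    """Count one message for its sender and track the newest date seen."""
    name, address = parseaddr(from_header)
    if not address:
        return  # unparseable From header: nothing to record

    key = address.lower()  # assumption: treat addresses case-insensitively
    entry = senders.setdefault(key, {
        "name": name,
        "email": key,
        "original_from": from_header,
        "count": 0,
        "last_date": None,
        "last_date_str": None,
    })
    entry["count"] += 1

    if not date_header:
        return
    try:
        when = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return  # malformed Date header: keep the count, skip recency
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    when = when.astimezone(timezone.utc).replace(microsecond=0)

    # UTC ISO strings in one fixed format compare correctly as plain strings.
    iso = when.isoformat()
    if entry["last_date"] is None or iso > entry["last_date"]:
        entry["last_date"] = iso
        entry["last_date_str"] = when.strftime("%Y-%m-%d %H:%M:%S")
```

Calling `update_sender(senders, from_header, date_header)` once per message leaves `senders` holding one entry per address, with `count` as the frequency and `last_date` as the most recent message time.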
- Complete Sender Analysis: Extracts all unique senders from your Gmail history
- Frequency Tracking: Counts how many emails you've received from each sender
- Recency Analysis: Tracks the most recent email from each sender
- Progress Persistence: Resumes processing from where it left off if interrupted
- Multiple Export Formats: Generates sorted lists by email, frequency, and recency
- Robust Error Handling: Continues processing even if individual messages fail
- Detailed Logging: Comprehensive logs for debugging and progress monitoring
- Rate Limiting Safe: Handles Gmail API rate limits gracefully
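To make the "complete sender analysis" and "continues processing even if individual messages fail" features more concrete, here is a minimal sketch of paging through every message ID and fetching only the From and Date headers. It assumes `service` is a Gmail API client object as returned by google-api-python-client's `build("gmail", "v1", ...)`; the helper names `iter_message_ids` and `fetch_headers` are hypothetical and the real script may be organized differently.

```python
import logging
from typing import Any, Dict, Iterator, Optional

logger = logging.getLogger("gmail_extractor")


def iter_message_ids(service: Any, user_id: str = "me") -> Iterator[str]:
    """Yield every message ID in the mailbox, following nextPageToken."""
    page_token: Optional[str] = None
    while True:
        response = (
            service.users()
            .messages()
            .list(userId=user_id, pageToken=page_token, maxResults=500)
            .execute()
        )
        for message in response.get("messages", []):
            yield message["id"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # last page reached


def fetch_headers(service: Any, message_id: str,
                  user_id: str = "me") -> Optional[Dict[str, str]]:
    """Return the From and Date headers of one message, or None on failure."""
    try:
        message = (
            service.users()
            .messages()
            .get(userId=user_id, id=message_id, format="metadata",
                 metadataHeaders=["From", "Date"])
            .execute()
        )
    except Exception as exc:  # one bad message should not stop the whole run
        logger.warning("Skipping message %s: %s", message_id, exc)
        return None
    headers = message.get("payload", {}).get("headers", [])
    return {header["name"]: header["value"] for header in headers}
```

Requesting `format="metadata"` keeps each response small because message bodies are never downloaded, which suits a tool that only needs sender information.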
- Python 3.6 or higher
- Google Account with Gmail
- Google Cloud Console project with Gmail API enabled
- Clone the repository:
  git clone https://github.com/Yuutokata/Gmail-Exporter
  cd Gmail-Exporter
- Install dependencies:
  pip install -r requirements.txt
- Set up Google API credentials:
- Go to the Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API
- Create credentials (OAuth 2.0 Client ID) for a Desktop application
- Download the credentials file and save it as credentials.json in the project directory
- Run the script:
  python main.py
- First-time authentication:
- The script will open your browser for Google OAuth authentication
- Grant the necessary permissions (read-only access to Gmail)
- The authentication token will be saved for future runs
- Monitor progress:
- The script provides real-time progress updates
- You can safely interrupt (Ctrl+C) and resume later
- Progress is automatically saved every 100 processed messages
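The interrupt-and-resume behavior described above can be sketched roughly as follows. The layout of the progress file (a list of processed IDs plus the sender table) and the names `load_progress`, `save_progress`, and `process` are assumptions for illustration; only the file name sender_progress.json and the 100-message default come from this README.

```python
import json
import os
from typing import Callable, Iterable

PROGRESS_FILE = "sender_progress.json"
SAVE_FREQUENCY = 100  # matches the documented default


def load_progress(path: str = PROGRESS_FILE) -> dict:
    """Load saved state, or start fresh if no progress file exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"processed_ids": [], "senders": {}}


def save_progress(state: dict, path: str = PROGRESS_FILE) -> None:
    """Write to a temp file first so an interrupt cannot truncate the real one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def process(message_ids: Iterable[str], handle: Callable[[str, dict], None],
            path: str = PROGRESS_FILE,
            save_frequency: int = SAVE_FREQUENCY) -> None:
    """Run handle() on each unprocessed ID, checkpointing periodically."""
    state = load_progress(path)
    done = set(state["processed_ids"])
    since_save = 0
    try:
        for message_id in message_ids:
            if message_id in done:
                continue  # already handled in an earlier run
            handle(message_id, state["senders"])
            done.add(message_id)
            since_save += 1
            if since_save >= save_frequency:
                state["processed_ids"] = sorted(done)
                save_progress(state, path)
                since_save = 0
    finally:
        # Also runs on Ctrl+C (KeyboardInterrupt), so the latest work is kept.
        state["processed_ids"] = sorted(done)
        save_progress(state, path)
```

With this shape, at most `save_frequency` messages of work are at risk if the process is killed outright, and a normal Ctrl+C loses nothing.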
The script generates several output files:
- unique_senders.json: All senders sorted alphabetically by email
- frequent_senders.json: Senders sorted by email frequency (most frequent first)
- recent_senders.json: Senders sorted by recency (most recent first)
- sender_progress.json: Progress tracking file (for resuming)
- gmail_extractor.log: Detailed execution logs
Each sender entry contains:
{
  "name": "John Doe",
  "email": "john.doe@example.com",
  "original_from": "John Doe <john.doe@example.com>",
  "count": 25,
  "last_date": "2024-01-15T10:30:00+00:00",
  "last_date_str": "2024-01-15 10:30:00"
}
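Given a list of entries in that shape, the three sorted exports could be produced along these lines. This is a sketch: the function name `export_sorted` is hypothetical, and sorting `last_date` as a string assumes every value is an ISO 8601 timestamp with the same UTC offset, as in the sample entry.

```python
import json
import os
from typing import Dict, List


def export_sorted(senders: List[dict], directory: str = ".") -> Dict[str, List[dict]]:
    """Write the three sorted views and return them keyed by file name."""
    views = {
        # Alphabetical by address.
        "unique_senders.json": sorted(senders, key=lambda s: s["email"]),
        # Highest message count first.
        "frequent_senders.json": sorted(
            senders, key=lambda s: s["count"], reverse=True),
        # Newest first; entries without a date sort last.
        "recent_senders.json": sorted(
            senders, key=lambda s: s["last_date"] or "", reverse=True),
    }
    for filename, entries in views.items():
        with open(os.path.join(directory, filename), "w", encoding="utf-8") as fh:
            json.dump(entries, fh, indent=2, ensure_ascii=False)
    return views
```

`ensure_ascii=False` keeps non-ASCII sender names readable in the output files instead of escaping them.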
You can modify these variables in main.py:
- save_frequency: How often to save progress (default: 100 messages)
- SCOPES: Gmail API permissions (default: read-only)
- File names for output and progress files
- Email Migration: Analyze your Gmail patterns before switching to Proton Mail or other providers
- Filter Setup: Create email filters and folder structures based on sender frequency
- Spam Analysis: Identify high-volume senders for potential filtering
- Contact Management: Export your email contacts for import into other systems
- Email Auditing: Understand your email communication patterns
- Processing speed: ~10-50 messages/second (depending on API limits)
- Memory usage: Minimal (processes messages one at a time)
- Storage: JSON files are typically 1-10MB for most Gmail accounts
- Resumability: Can handle Gmail accounts with 100k+ emails safely
Authentication Issues:
- Ensure credentials.json is in the project directory
- Delete token.json and re-authenticate if needed
- Verify Gmail API is enabled in Google Cloud Console
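The roles of credentials.json and token.json are easier to see in a sketch of the usual desktop OAuth flow. This follows the pattern from Google's Python quickstart and assumes the google-auth, google-auth-oauthlib, and google-api-python-client packages are installed; the function name `get_gmail_service` is hypothetical, and main.py may differ in its details.

```python
import os

# Read-only scope: the script can list and read messages but not change them.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


def get_gmail_service(credentials_path: str = "credentials.json",
                      token_path: str = "token.json"):
    """Return an authorized Gmail API client, prompting in the browser if needed."""
    # Imported here so this sketch can be loaded even before the Google
    # client libraries are installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    creds = None
    if os.path.exists(token_path):
        # Reuse the token saved by a previous run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silent refresh, no browser needed
        else:
            # No usable token: open the browser for consent.
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(token_path, "w", encoding="utf-8") as fh:
            fh.write(creds.to_json())

    return build("gmail", "v1", credentials=creds)
```

In a flow like this, deleting token.json simply forces the browser consent step on the next run, which is why it is a reasonable first fix for stale or mismatched credentials.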
Rate Limiting:
- The script handles rate limits automatically
- If you encounter persistent rate limits, the script will retry
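One common way to retry rate-limited calls is exponential backoff with jitter, sketched below with only the standard library. The helper name `call_with_backoff` and its parameters are illustrative assumptions, not the script's actual retry logic. With google-api-python-client, a rate-limit failure typically surfaces as `googleapiclient.errors.HttpError` with status 429, which a caller could check inside `is_retryable`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(func: Callable[[], T],
                      is_retryable: Callable[[Exception], bool],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), retrying retryable errors with growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts or not a transient error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 1s of
            # random jitter so parallel clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.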
Interrupted Processing:
- Re-run the script; it resumes from where it left off
- Check gmail_extractor.log for detailed error information
Large Mailboxes:
- For Gmail accounts with 50k+ emails, expect several hours of processing
- The script saves progress regularly, so interruptions are safe
- Read-Only Access: The script only reads your Gmail data, never modifies or deletes
- Local Processing: All data is processed locally on your machine
- No Data Transmission: Sender information stays on your computer
- Revocable Access: You can revoke API access anytime in your Google Account settings