This guide explains how to use semcode's lore.kernel.org email archive search features in both the query tool and the MCP server.
Semcode can index and search lore.kernel.org email archives, providing Full Text Search (FTS) with regex post-filtering and semantic search capabilities across mailing list archives. This is particularly useful for finding discussions about specific commits, patches, or topics in kernel development.
All lore searches use a two-phase approach for fast, precise results:
- Phase 1: FTS (Full Text Search) - Fast keyword-based search using inverted indices
  - Patterns are normalized by extracting keywords (special characters stripped)
  - Example: `"[PATCH v2]"` → `"PATCH v2"` for FTS
  - Returns a superset of candidates very quickly
- Phase 2: Regex Post-Filter - Precise filtering in memory
  - The original regex pattern is applied to the FTS results
  - Ensures exact matching (e.g., `"clm@meta.com"` matches exactly)
  - Fast because it operates on the small FTS result set
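The Python sketch below makes the two phases concrete. It is a toy, not semcode's implementation: the tokenizer is an assumption, a linear scan stands in for the inverted index, and `fts_keywords`, `fts_candidates`, and `two_phase_search` are hypothetical names.

```python
import re
from typing import Dict, List


def fts_keywords(pattern: str) -> List[str]:
    """Phase 1 normalization (assumed): non-word characters become separators."""
    return re.sub(r"[^\w]+", " ", pattern).split()


def fts_candidates(pattern: str, docs: Dict[str, str]) -> List[str]:
    """Phase 1 stand-in: keep docs containing every keyword (case-insensitive)."""
    keywords = {k.lower() for k in fts_keywords(pattern)}
    hits = []
    for doc_id, text in docs.items():
        tokens = {t.lower() for t in re.findall(r"\w+", text)}
        if keywords <= tokens:  # superset of the true matches
            hits.append(doc_id)
    return hits


def two_phase_search(pattern: str, docs: Dict[str, str]) -> List[str]:
    """Phase 2: run the original regex only over the FTS candidates."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [d for d in fts_candidates(pattern, docs) if regex.search(docs[d])]
```

With a body like `"clm meta com discussion"`, Phase 1 still returns a candidate because every keyword is present; Phase 2 then discards it because the actual regex `clm@meta.com` does not match.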
All lore search commands (lore, dig, vlore) support date filtering with --since and --until flags to restrict results to specific time ranges.
The date filters accept flexible formats for user convenience:
Absolute Dates:
- `YYYY-MM-DD` format (e.g., `"2024-01-15"`)
- Times are assumed to be midnight (00:00:00) UTC

Relative Dates:
- `"today"` - Start of current day (00:00:00 UTC)
- `"yesterday"` - Start of previous day (00:00:00 UTC)
- `"N days ago"` - N days before current time (e.g., `"7 days ago"`, `"30 days ago"`)
- `"N weeks ago"` - N weeks before current time (e.g., `"2 weeks ago"`)
- `"N months ago"` - N months before current time (e.g., `"3 months ago"`)
- RFC 2822 Format: Email dates are stored in RFC 2822 format (e.g., `"Thu, 21 Nov 2019 14:22:24 -0800"`)
- Temporal Comparison: Dates are parsed and compared as datetime objects (not by string comparison)
- Inclusive Ranges:
  - `--since` includes emails from that date onwards (≥)
  - `--until` includes emails up to and including that date (≤)
- Combine Both: Use both flags to define a specific date range
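The date rules above can be sketched in Python as follows. The relative-date grammar, the calendar-month reading of "N months ago", and the names `parse_date_filter` and `in_range` are assumptions for illustration. The sketch applies the ≥/≤ rules literally, so a bare `YYYY-MM-DD` bound means midnight UTC at the start of that day.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_date_filter(text: str, now: datetime) -> datetime:
    """Convert a --since/--until value into an aware UTC datetime.

    `now` must be a timezone-aware UTC datetime.
    """
    text = text.strip().lower()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if text == "today":
        return midnight
    if text == "yesterday":
        return midnight - timedelta(days=1)
    m = re.fullmatch(r"(\d+) (day|week|month)s? ago", text)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        if unit == "day":
            return now - timedelta(days=n)
        if unit == "week":
            return now - timedelta(weeks=n)
        # Assumption: step back n calendar months, clamping the day
        # (e.g. March 31 minus one month becomes February 29 in 2024).
        total = now.year * 12 + (now.month - 1) - n
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        return now.replace(year=year, month=month0 + 1, day=min(now.day, last_day))
    # Absolute YYYY-MM-DD dates are taken as midnight UTC.
    return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def in_range(email_date: str, since: Optional[datetime], until: Optional[datetime]) -> bool:
    """Parse an RFC 2822 date and compare datetimes, inclusive on both ends."""
    sent = parsedate_to_datetime(email_date).astimezone(timezone.utc)
    if since is not None and sent < since:
        return False
    if until is not None and sent > until:
        return False
    return True
```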
# Recent activity
lore -s "PATCH" --since "7 days ago" # Patches from last week
vlore --since "today" "memory leak" # Today's emails about memory leaks
# Specific time periods
lore -f torvalds --since "2024-01-01" --until "2024-03-31" # Q1 2024
dig --since "2023-01-01" --until "2023-12-31" HEAD # All of 2023
# Open-ended ranges
lore -b btrfs --since "2024-01-01" # From 2024 onwards
lore -g malloc --until "2023-12-31" # Before 2024
# Relative dates for recent searches
vlore --since "yesterday" "kernel bug" # Yesterday and today
dig --since "30 days ago" abc123 # Last month of discussion

- Recent activity: Find current discussions with `--since "7 days ago"`
- Historical research: Study specific time periods with `--since YYYY-MM-DD --until YYYY-MM-DD`
- Version tracking: Filter by release dates to see discussion before/after a release
- Performance: Reduce result sets by limiting to relevant time periods
Before searching, you need to clone and index a lore archive:
# Index a lore archive (e.g., linux kernel mailing list)
semcode-index --lore lkml
# Index multiple lists at once
semcode-index --lore lkml,bpf
# The archive will be cloned to <db_dir>/lore/<repo_name>
# In this example: .semcode.db/lore/linux-kernel

To fetch new emails and index them without re-specifying archive names:
# Refresh all previously cloned lore archives
semcode-index --lore

When called without arguments, `--lore` discovers all git repositories under
`<db_dir>/lore/`, fetches new commits from each remote, and indexes any emails
not yet in the database. Use this for a simple one-command workflow to keep
lore archives up to date.
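A minimal sketch of the discovery step, assuming a repository is recognized by a `.git` entry (regular clone) or by a `HEAD` file plus an `objects/` directory (bare clone), and looking only one level below `<db_dir>/lore/`. The function name is hypothetical, and the fetch and indexing that follow are not shown.

```python
from pathlib import Path
from typing import List


def discover_lore_repos(db_dir: Path) -> List[Path]:
    """List git repositories directly under <db_dir>/lore/ (sketch)."""
    lore_root = db_dir / "lore"
    if not lore_root.is_dir():
        return []
    repos = []
    for child in sorted(lore_root.iterdir()):
        if not child.is_dir():
            continue  # skip stray files
        is_clone = (child / ".git").exists()
        is_bare = (child / "HEAD").is_file() and (child / "objects").is_dir()
        if is_clone or is_bare:
            repos.append(child)
    return repos
```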
To enable semantic search with the vlore command:
# Generate vector embeddings for indexed emails
semcode-index --lore lkml --vectors

Note: Vector generation is optional, but it is required for the `vlore` command.
Search lore emails using regex patterns on different fields (from, subject, body, recipients, symbols).
Syntax:
lore [-v] [-m <message_id>] [-f <regex>] [-s <regex>] [-b <regex>] [-t <regex>] [-g <regex>] [--limit <N>] [--since <date>] [--until <date>] [--thread] [--replies] [-o <output_file>]
Options:
- `-v` - Verbose mode: show full message bodies
- `-m <message_id>` - Look up a specific email by Message-ID
- `-f <regex>` - Filter by From address (can be specified multiple times)
- `-s <regex>` - Filter by Subject line (can be specified multiple times)
- `-b <regex>` - Filter by message Body (can be specified multiple times)
- `-t <regex>` - Filter by recipients (To/Cc) (can be specified multiple times)
- `-g <regex>` - Filter by symbols mentioned in any patches (can be specified multiple times)
- `--limit <N>` - Maximum number of results (default: 100)
- `--since <date>` - Only show emails from this date onwards (see the Date Filtering section above)
- `--until <date>` - Only show emails up to this date (see the Date Filtering section above)
- `--thread` - Show full email threads for each match (walks up to the root, then shows all descendants)
- `--replies` - Show all replies/subthreads under each match (descendants only, not ancestors)
- `-o <output_file>` - Write output to a file instead of stdout
Note: --thread and --replies are mutually exclusive.
Filter Logic:
- Multiple filters for the same field are combined with OR logic
  - Example: `-f torvalds -f gregkh` matches emails from torvalds OR gregkh
- Filters for different fields are combined with AND logic
  - Example: `-f torvalds -b btrfs` matches emails from torvalds AND whose body contains btrfs
Regex Tips:
- Case-insensitive by default: All regex patterns are automatically case-insensitive
  - Example: `-s 'patch'` matches "patch", "PATCH", "Patch", etc.
  - No need to use the `(?i)` flag
- For multiline matching (e.g., matching the start of a line within an email body), use the `(?m)` flag
  - Example: `-b '(?m)^Signed-off-by'` matches "Signed-off-by" at the start of any line
  - Without `(?m)`, `^` and `$` only match the start/end of the entire field
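The filter logic and regex tips above fit in a small predicate. This is a sketch: Python's `re` stands in for whatever regex engine semcode uses (syntax can differ in edge cases), the field names mirror the schema columns, and treating the `symbols` list as matching when any single symbol matches is an assumption. `matches` is a hypothetical name.

```python
import re
from typing import Dict, List


def matches(email: Dict, filters: Dict[str, List[str]]) -> bool:
    """OR within one field, AND across fields, case-insensitive by default."""
    for field, patterns in filters.items():
        value = email.get(field, "")
        # symbols is a list; other fields are single strings.
        values = value if isinstance(value, list) else [value]
        regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
        if not any(r.search(v) for r in regexes for v in values):
            return False  # one field failed, so the AND fails
    return True
```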
Examples:
# Search by subject
lore -s "memory leak"
# Search with verbose output and limit
lore -v -s "performance" --limit 50
# Search by sender
lore -f "torvalds@linux-foundation.org"
# Search by recipient
lore -t "netdev@vger.kernel.org"
# Search message body
lore -b "Signed-off-by.*Linus"
# Search by symbols mentioned in patches
lore -g "malloc"
lore -g "struct.*page"
# Look up specific email by Message-ID
lore -m "<20241201120000.12345@kernel.org>"
# Show threads (full thread including ancestors)
lore -v -f "torvalds" --thread
lore -v -s "memory leak" --thread --limit 5
# Show replies only (descendants, not ancestors)
lore -v -s "RFC" --replies
lore -m "<message.id@example.com>" --replies
# Combine filters (AND across fields)
lore -b btrfs -f clm@meta.com # Body contains btrfs AND from clm@meta.com
lore -f torvalds -f gregkh -b "memory leak" # From torvalds OR gregkh AND body contains memory leak
lore -g "schedule.*" -f "torvalds" # Symbols match schedule.* AND from torvalds
# Write output to file
lore -v -s "memory leak" -o results.txt # Save verbose results to file
lore -f torvalds --thread -o threads.txt # Save thread view to file
# Date filtering
lore -s "memory leak" --since "2024-01-01" # Emails from Jan 1, 2024 onwards
lore -f torvalds --until "2023-12-31" # Emails up to Dec 31, 2023
lore -b btrfs --since "7 days ago" # Emails from last week
lore -s "PATCH" --since "yesterday" # Emails from yesterday onwards
lore -g malloc --since "2024-01-01" --until "2024-06-30" # First half of 2024

Output Format:
- Summary view (default): Date, subject, from, Message-ID, and threading info
- Verbose view (`-v`): Includes full message body
- Thread view (`--thread`): Shows complete email threads in chronological order (walks up to the root, then shows the entire thread)
- Replies view (`--replies`): Shows all replies/subthreads under each match (descendants only; useful for seeing the discussion that followed)
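A sketch of how the thread and replies views differ, assuming threads are rebuilt only from each email's `in_reply_to` field (`references` is ignored here) and that a message whose parent is not in the archive counts as a root. All function names are hypothetical.

```python
from typing import Dict, List, Optional

Parents = Dict[str, Optional[str]]  # message_id -> in_reply_to


def children_index(parent_of: Parents) -> Dict[str, List[str]]:
    """Invert message -> parent into parent -> [replies]."""
    children: Dict[str, List[str]] = {}
    for msg, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(msg)
    return children


def replies(msg: str, parent_of: Parents) -> List[str]:
    """Like --replies: descendants only, depth-first."""
    children = children_index(parent_of)
    out: List[str] = []
    seen = {msg}
    stack = list(reversed(children.get(msg, [])))
    while stack:
        cur = stack.pop()
        if cur in seen:  # guard against malformed reply cycles
            continue
        seen.add(cur)
        out.append(cur)
        stack.extend(reversed(children.get(cur, [])))
    return out


def thread_root(msg: str, parent_of: Parents) -> str:
    """Walk in_reply_to links upward until the parent is missing or unknown."""
    seen = {msg}
    parent = parent_of.get(msg)
    while parent is not None and parent in parent_of and parent not in seen:
        seen.add(parent)
        msg = parent
        parent = parent_of.get(msg)
    return msg


def full_thread(msg: str, parent_of: Parents, dates: Dict[str, str]) -> List[str]:
    """Like --thread: find the root, then list the whole tree chronologically."""
    root = thread_root(msg, parent_of)
    return sorted([root, *replies(root, parent_of)], key=lambda m: dates[m])
```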
Search for lore emails related to a specific git commit by matching the commit's subject line. Results are ordered by date (newest first).
Syntax:
dig [-v] [-a] [--since <date>] [--until <date>] [--thread] [--replies] <commit>
Options:
- `-v` - Verbose mode: show full message bodies
- `-a` - Show all matching emails (default: only the most recent)
- `--since <date>` - Only show emails from this date onwards (see the Date Filtering section above)
- `--until <date>` - Only show emails up to this date (see the Date Filtering section above)
- `--thread` - Show full email threads for each match
- `--replies` - Show all replies/subthreads under each match
- `<commit>` - Any git reference (SHA, short SHA, branch name, HEAD, etc.)
Note: --thread and --replies are mutually exclusive.
Examples:
# Show most recent match thread for HEAD commit
dig HEAD
# Show most recent match with message body
dig -v abc123
# Show all matches (summary)
dig -a v6.5
# Show all matches with full threads
dig -a --thread HEAD
# Show all matches with threads and bodies
dig -v -a --thread abc123
# Show all matches with just replies (no ancestors)
dig -a --replies HEAD
# Show replies to most recent match
dig --replies abc123
# Date filtering
dig --since "2024-01-01" HEAD # Only emails from 2024 onwards
dig --until "2023-12-31" abc123 # Only emails from before 2024
dig -a --since "30 days ago" HEAD # All matches from last 30 days
dig --since "2024-01-01" --until "2024-06-30" v6.5 # First half of 2024

How It Works:
- Resolves the git reference to a commit SHA
- Extracts the commit's subject line
- Searches lore emails for exact subject matches
- Shows results ordered by date (newest first)
- By default shows only the most recent match; use `-a` to see all
Use Cases:
- Find mailing list discussion about a specific patch
- See review feedback for a commit
- Track the history of how a patch evolved from email to merge
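The steps above can be approximated in Python. `git log -1 --format=%s <ref>` is a standard way to resolve a reference and print its subject, but two points are assumptions: comparing against the stored `git_commit_subject` column (rather than the raw subject with its `[PATCH ...]` prefix), and sorting ISO 8601 date strings lexically. `commit_subject` and `dig_matches` are hypothetical names.

```python
import subprocess
from typing import Dict, List


def commit_subject(ref: str, repo: str = ".") -> str:
    """Resolve a git reference and return the commit's subject line."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%s", ref],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def dig_matches(subject: str, emails: List[Dict], show_all: bool = False) -> List[Dict]:
    """Exact subject matches, newest first; only the newest unless show_all."""
    hits = [e for e in emails if e.get("git_commit_subject") == subject]
    # Assumes ISO 8601 strings in one time zone, so string order is time order.
    hits.sort(key=lambda e: e["date"], reverse=True)
    return hits if show_all else hits[:1]
```

Usage would look like `dig_matches(commit_subject("HEAD"), emails, show_all=True)`, mirroring `dig -a HEAD`.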
Search for lore emails similar to the provided text using semantic vector embeddings. This allows you to find conceptually related discussions even when exact keywords don't match.
Syntax:
vlore [-f <from_regex>] [-s <subject_regex>] [-b <body_regex>] [-g <symbols_regex>] [-t <recipients_regex>] [--limit <N>] [--since <date>] [--until <date>] <query_text>
Options:
- `-f <from_regex>` - Filter results by From address (can be specified multiple times)
- `-s <subject_regex>` - Filter results by Subject (can be specified multiple times)
- `-b <body_regex>` - Filter results by message Body (can be specified multiple times)
- `-g <symbols_regex>` - Filter results by symbols mentioned in patches (can be specified multiple times)
- `-t <recipients_regex>` - Filter results by Recipients/To/Cc (can be specified multiple times)
- `--limit <N>` - Maximum number of results (default: 20, max: 100)
- `--since <date>` - Only show emails from this date onwards (see the Date Filtering section above)
- `--until <date>` - Only show emails up to this date (see the Date Filtering section above)
- `<query_text>` - Search query (required)
Prerequisites: Vector embeddings must be generated first:
semcode-index --lore <url> --vectors

Examples:
# Basic semantic search
vlore "memory leak fix"
# With custom limit
vlore --limit 10 "performance optimization"
# Filter by sender
vlore -f "torvalds" "merge pull request"
# Multiple subject filters (OR logic)
vlore -s "RFC" -s "PATCH" "new feature"
# Body filter
vlore -b "Signed-off-by.*Linus" "kernel patch"
# Symbol filter
vlore -g "malloc" "memory management"
# Recipients filter
vlore -t "netdev@vger.kernel.org" "network patch"
# Date filtering
vlore --since "2024-01-01" "memory leak fix" # Emails from 2024 onwards
vlore --until "2023-12-31" "performance optimization" # Emails from before 2024
vlore --since "30 days ago" "kernel bug" # Recent emails from last month
vlore --since "2024-01-01" --until "2024-06-30" "btrfs" # First half of 2024

When to Use Semantic vs Regex Search:
- Use `vlore` for: Finding conceptually similar discussions, broad topic searches, or when you're not sure of the exact keywords
- Use `lore` for: Exact pattern matching, specific authors or subjects, precise filtering
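Conceptually, `vlore` ranks emails by the similarity between the query's embedding and each email's stored embedding. The sketch below assumes cosine similarity over precomputed vectors (stored under a hypothetical `vector` key), applies the regex filters before taking the top results, and shows only the subject filter. This guide does not describe the embedding model or the exact scoring semcode uses.

```python
import math
import re
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vlore_rank(query_vec: Sequence[float], emails: List[Dict],
               subject_patterns: Sequence[str] = (), limit: int = 20) -> List[Dict]:
    """Rank emails by similarity to the query vector after regex filtering."""
    limit = min(limit, 100)  # default 20, capped at 100 as documented
    regexes = [re.compile(p, re.IGNORECASE) for p in subject_patterns]
    # Same-field patterns are OR-ed; no patterns means no filtering.
    kept = [e for e in emails
            if not regexes or any(r.search(e["subject"]) for r in regexes)]
    kept.sort(key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return kept[:limit]
```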
Export all indexed lore emails to a JSON file for external processing.
Syntax:
dump-lore <output_file>
Example:
dump-lore emails.json

Output Format: JSON array of email objects with fields: `message_id`, `subject`, `from`, `date`, `body`, `recipients`, `in_reply_to`, `references`.
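As an example of external processing, this short Python script counts emails per sender in a `dump-lore` export. It relies only on the fields listed above; `top_senders` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import List, Tuple


def top_senders(path: str, n: int = 5) -> List[Tuple[str, int]]:
    """Count emails per `from` value in a dump-lore JSON export."""
    with open(path, encoding="utf-8") as fh:
        emails = json.load(fh)  # a JSON array of email objects
    return Counter(e["from"] for e in emails).most_common(n)
```

For example, `top_senders("emails.json", 10)` returns the ten most frequent senders with their message counts.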
The semcode MCP server exposes the same lore search functionality for use with Claude Desktop and other MCP clients.
Search lore emails with regex filters. Same functionality as the query tool's lore command.
Parameters:
- `message_id` (string, optional) - Specific Message-ID to look up
- `from_patterns` (array of strings, optional) - From address regex patterns (OR logic)
- `subject_patterns` (array of strings, optional) - Subject regex patterns (OR logic)
- `body_patterns` (array of strings, optional) - Body regex patterns (OR logic)
- `recipient_patterns` (array of strings, optional) - Recipient regex patterns (OR logic)
- `symbols_patterns` (array of strings, optional) - Symbols regex patterns (OR logic)
- `limit` (number, optional) - Maximum results (default: 100)
- `since_date` (string, optional) - Only show emails from this date onwards (see the Date Filtering section above)
- `until_date` (string, optional) - Only show emails up to this date (see the Date Filtering section above)
- `verbose` (boolean, optional) - Show full message bodies (default: false)
- `show_thread` (boolean, optional) - Show full threads (default: false)
- `show_replies` (boolean, optional) - Show all replies/subthreads (default: false, mutually exclusive with `show_thread`)
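When scripting against the MCP server, the arguments for this tool form a plain JSON object built from the parameters above. This guide does not name the tool itself, so the sketch only builds and checks the arguments. `build_lore_args` is a hypothetical helper, and omitting unset parameters so that server-side defaults apply is an assumption.

```python
import json

# Parameter names as documented above for the MCP lore search tool.
LORE_PARAMS = {
    "message_id", "from_patterns", "subject_patterns", "body_patterns",
    "recipient_patterns", "symbols_patterns", "limit", "since_date",
    "until_date", "verbose", "show_thread", "show_replies",
}


def build_lore_args(**params) -> str:
    """Validate parameters and serialize them as a JSON arguments object."""
    unknown = set(params) - LORE_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if params.get("show_thread") and params.get("show_replies"):
        raise ValueError("show_thread and show_replies are mutually exclusive")
    # Drop unset values so the server-side defaults apply.
    args = {k: v for k, v in params.items() if v is not None}
    return json.dumps(args, sort_keys=True)
```

For instance, the CLI query `lore -f torvalds -b btrfs --thread` corresponds to `build_lore_args(from_patterns=["torvalds"], body_patterns=["btrfs"], show_thread=True)`.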
Example Usage in Claude Desktop:
When you ask Claude to search lore archives, it will automatically use this tool:
"Search lore for emails from Linus Torvalds about btrfs"
"Find emails with subject containing 'memory leak' and show threads"
"Look up email with message_id <20241201120000.12345@kernel.org>"
Search for lore emails related to a git commit. Same functionality as the query tool's dig command.
Parameters:
- `commit` (string, required) - Git reference (SHA, short SHA, branch, HEAD, etc.)
- `verbose` (boolean, optional) - Show full message bodies (default: false)
- `show_all` (boolean, optional) - Show all matches instead of only the most recent (default: false)
- `since_date` (string, optional) - Only show emails from this date onwards (see the Date Filtering section above)
- `until_date` (string, optional) - Only show emails up to this date (see the Date Filtering section above)
- `show_thread` (boolean, optional) - Show full threads (default: false)
- `show_replies` (boolean, optional) - Show all replies/subthreads (default: false, mutually exclusive with `show_thread`)
Example Usage in Claude Desktop:
"Find lore emails related to commit abc123"
"Show all lore discussions about HEAD commit with threads"
Semantic vector search for similar lore emails. Same functionality as the query tool's vlore command.
Parameters:
- `query_text` (string, required) - Search query
- `from_patterns` (array of strings, optional) - From address filters
- `subject_patterns` (array of strings, optional) - Subject filters
- `body_patterns` (array of strings, optional) - Body filters
- `symbols_patterns` (array of strings, optional) - Symbols filters
- `recipients_patterns` (array of strings, optional) - Recipients/To/Cc filters
- `limit` (number, optional) - Maximum results (default: 20, max: 100)
- `since_date` (string, optional) - Only show emails from this date onwards (see the Date Filtering section above)
- `until_date` (string, optional) - Only show emails up to this date (see the Date Filtering section above)
- `verbose` (boolean, optional) - Show full message bodies (default: false)
Prerequisites:
Vector embeddings must be generated with semcode-index --lore <url> --vectors.
Example Usage in Claude Desktop:
"Find lore emails similar to 'memory leak fix'"
"Search for emails like 'performance optimization' from Linus"
Lore emails are stored in the lore table with the following structure:
| Field | Type | Description |
|---|---|---|
| message_id | string | Unique Message-ID (primary key) |
| subject | string | Email subject line |
| from | string | Sender email address |
| date | string | ISO 8601 timestamp |
| body | string | Full message body (headers stripped) |
| recipients | string | Comma-separated To/Cc recipients |
| symbols | JSON array | List of symbols (functions, types, macros) extracted from patches |
| in_reply_to | string | Message-ID of parent email (if reply) |
| references | string | Space-separated Message-IDs of thread ancestors |
| git_commit_subject | string | Extracted commit subject (for patches) |
| commit_sha | string | Git SHA (for patches) |
Vector embeddings (if generated) are stored in the lore_vectors table.
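A Python dataclass mirroring the table can help when consuming rows from the `lore` table or exported data. The parsing choices are assumptions based on the column descriptions: splitting `recipients` on commas (naive for display names that contain commas), splitting `references` on whitespace, accepting `symbols` as either a JSON string or a list, and treating empty strings as missing. `LoreEmail` is a hypothetical name.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoreEmail:
    message_id: str
    subject: str
    from_: str  # "from" is a Python keyword
    date: str
    body: str
    recipients: List[str]
    symbols: List[str]
    in_reply_to: Optional[str]
    references: List[str]
    git_commit_subject: Optional[str]
    commit_sha: Optional[str]

    @classmethod
    def from_row(cls, row: Dict) -> "LoreEmail":
        raw_symbols = row.get("symbols") or []
        symbols = raw_symbols if isinstance(raw_symbols, list) else json.loads(raw_symbols)
        return cls(
            message_id=row["message_id"],
            subject=row["subject"],
            from_=row["from"],
            date=row["date"],
            body=row["body"],
            recipients=[r.strip() for r in row.get("recipients", "").split(",") if r.strip()],
            symbols=symbols,
            in_reply_to=row.get("in_reply_to") or None,
            references=(row.get("references") or "").split(),
            git_commit_subject=row.get("git_commit_subject") or None,
            commit_sha=row.get("commit_sha") or None,
        )
```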
# Find discussion about a specific commit
dig abc123
# See all versions and reviews
dig -a --thread abc123

# All emails from a specific maintainer about a topic
lore -f "torvalds@" -b "btrfs"
# Show full threads
lore -f "torvalds@" -b "btrfs" --thread
# Show just the replies to see discussion that followed
lore -f "torvalds@" -b "btrfs" --replies

# Semantic search for broad topic
vlore "memory management improvements"
# Exact pattern search
lore -s "mm:" -b "page.*allocation"

# Find all patches in a series
lore -s "\[PATCH.*\]" -f "developer@example.com" --limit 50
# Show as threads to see review flow
lore -s "\[PATCH v2" --thread
# Show just the replies to see what reviewers said
lore -s "\[PATCH v2" --replies

# Find patches that modify a specific function
lore -g "malloc"
# Find patches touching memory management structures
lore -g "struct.*page"
# Combine symbol search with other filters
lore -g "schedule.*" -f "torvalds"
lore -g "mutex_lock" -s "\[PATCH" --limit 20
# Find patches modifying multiple related symbols (OR logic)
lore -g "kmalloc" -g "kfree" -g "vmalloc"

- Start broad, then narrow: Use semantic search (`vlore`) to find topics, then use regex search (`lore`) for precision
- Use threading wisely: The `--thread` flag is powerful but verbose. Use it when you need full context, not for initial exploration
- Choose between `--thread` and `--replies`:
  - Use `--thread` when you want to see the complete discussion from the beginning (includes ancestors)
  - Use `--replies` when you want to see only the responses to a specific email (excludes ancestors)
  - Example: For a patch email, `--replies` shows you what reviewers said without the full version history
- Leverage git integration: The `dig` command is the easiest way to find discussion about commits you're already looking at in git
- Combine filters effectively: Remember that same-field filters use OR logic, while different-field filters use AND logic
- Watch your limits: Large result sets can be overwhelming. Use `--limit` to control output size
- Message-ID lookups: If you see an interesting Message-ID in results, use `lore -m <message_id>` to view the full email
- Export for analysis: Use `dump-lore` when you need to process emails with external tools or scripts
You need to index a lore archive first:
semcode-index --lore https://lore.kernel.org/linux-kernel

Vector embeddings need to be generated:
semcode-index --lore <url> --vectors

- Check your regex patterns; they may be too restrictive
- Try broadening your search or removing some filters
- Use semantic search (`vlore`) if regex isn't finding what you need
FTS indices need to be created after indexing:
# FTS indices are automatically created during --lore indexing
# If you see this error, re-run the indexing
rm -rf .semcode.db
semcode-index --lore <url>

You can omit the `< >` brackets when using `-m`:
lore -m message.id@domain.com # Works
lore -m <message.id@domain.com> # Also works

- MCP Server Documentation - Full MCP server setup and usage
- Schema Documentation - Complete database schema details
- Query Tool Guide - General query tool usage