semcode usage guide

All semcode functions are git aware and default to lookups on the current commit. You can also pass a specific commit you're interested in, or a branch name.

Note on Regex Patterns: All regex patterns in semcode are case-insensitive by default. This applies to all pattern matching including function names, commit messages, symbols, and lore email searches. You don't need to use the (?i) flag.
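As a rough illustration of what case-insensitive matching means in practice, the sketch below mimics the behaviour with Python's own `re` module. This is only an analogy: semcode's regex engine is not Python's, the list of function names is made up, and the helper `match_names` is hypothetical.

```python
import re

# Hypothetical function names, used only for illustration.
names = ["kmalloc", "KMALLOC_ARRAY", "kfree", "vmalloc"]

def match_names(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that match `pattern`, ignoring case."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [name for name in candidates if compiled.search(name)]

# The anchored pattern matches both the lower-case and upper-case names.
print(match_names(r"^kmalloc", names))
```

Adding `(?i)` to the pattern would give the same result here, which is why the flag is unnecessary.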

Branch Support: Most query tools support a branch parameter as an alternative to git_sha. When you specify a branch name (e.g., "main", "develop"), it will be resolved to the current tip commit of that branch. Branch takes precedence over git_sha if both are provided.
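The precedence rule can be sketched as follows. This is a minimal model of the behaviour described above, not semcode's actual implementation: the function `resolve_commit`, the `branch_tips` mapping, and the SHAs are all hypothetical.

```python
from typing import Optional

def resolve_commit(branch: Optional[str],
                   git_sha: Optional[str],
                   branch_tips: dict[str, str],
                   current: str) -> str:
    """Pick the commit a query runs against.

    branch wins over git_sha; with neither, fall back to the current commit.
    """
    if branch is not None:
        # A branch name resolves to the tip commit of that branch.
        return branch_tips[branch]
    if git_sha is not None:
        return git_sha
    return current

# Hypothetical branch tips and SHAs, for illustration only.
tips = {"main": "aaaa111", "develop": "bbbb222"}
print(resolve_commit("main", "cccc333", tips, current="dddd444"))
```

With both `branch` and `git_sha` supplied, the branch tip is used and `git_sha` is ignored.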

find_function: search for functions and macros

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • name: function/macro name, or a regex
  • also displays details on callers and callees

find_type: search for types and typedefs

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • name: type/typedef name or regex

find_callers: find all functions that call a function or macro

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • name: function to search

find_calls: find all functions called by a function or macro

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • name: function to search

find_callchain: search complete function/macro call chain (forward and reverse)

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • name: function or macro to search
  • up_levels: number of caller levels to show (default: 2, 0 = unlimited)
  • down_levels: number of callee levels to show (default: 3, 0 = unlimited)
  • calls_limit: max calls to show per level (default: 15, 0 = unlimited)

diff_functions: extract functions and types from a unified diff

  • diff_content: the string to analyze
  • Use this to determine which symbols are involved in a given diff

grep_functions: search function/macro bodies for a regex

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • pattern: the regex to search for
  • verbose: boolean, if true show full function bodies (default: false)
  • path_pattern: optional regex to filter results by path
  • limit: max number of results to return (default: 100, 0 = unlimited)
  • This only searches inside functions or macros, so there is no need to escape your pattern to limit the search.

vgrep_functions: vector embedding search on functions/macros/types

  • git_sha: indicates which commit to search (default: current)
  • branch: branch name to search (alternative to git_sha, e.g., "main", "develop")
  • query_text: text describing the kind of functions to find (e.g., "memory allocation", "string comparison")
  • path_pattern: optional regex to filter results by path
  • limit: max number of results to return (default: 10, max: 100)
  • Embedding searches are only useful when you want to search for broad concepts that a regex won't find well.
  • The database might not have embeddings indexed

find_commit: search for changes, potentially in a range of commits

  • This can return a large body of results. Use pagination to manage context
  • git_ref: single commit ref to lookup (sha, short sha, branch, HEAD etc)
  • git_range: optional git range to search multiple commits (e.g., HEAD~10..HEAD); cannot be combined with git_ref
  • author_patterns: optional array of regex to filter by author name/email (OR logic)
  • subject_patterns: optional array of regex to filter by subject line (OR logic)
  • regex_patterns: optional array of regex patterns to filter commits.
    • All patterns are AND'd together
    • Applied against the combination of commit message and unified diff
  • symbol_patterns: optional array of regex of symbols to search for
    • Use this to quickly find commits changing a function or type (w/regex)
  • path_patterns: optional regex to filter commits based on which files they change. Multiple regex can be passed and will be OR'd together
  • page: optional page number for pagination (1-based). Each page contains 50 lines, results indicate current page and total pages. Default: full results
  • reachable_sha: optional git sha, filter results to only those reachable from the sha provided. Mutually exclusive with git_range
  • verbose: show full diff in addition to metadata (default: false)

vcommit_similar_commits: search commits based on vector embeddings

  • git_range: optional git range to search multiple commits: HEAD~10..HEAD etc
  • query_text: search text
  • author_patterns: optional array of regex to filter by author name/email (OR logic)
  • subject_patterns: optional array of regex to filter by subject line (OR logic)
  • regex_patterns: array of regex AND'd together to limit search results
  • symbol_patterns: array of regex AND'd together to limit search results based on symbols changed in the commit
  • path_patterns: optional regex to filter commits based on which files they change. Multiple regex can be passed and will be OR'd together
  • limit: max results to return (default 10, max 50)
  • reachable_sha: optional git sha, filter results to only those reachable from the sha provided. Mutually exclusive with git_range
  • page: optional page number for pagination (1-based). Each page contains 50 lines, results indicate current page and total pages. Default: full results

lore_search: search lore.kernel.org email archives

  • from_patterns: optional array of regex to filter by sender (OR logic)
  • subject_patterns: optional array of regex to filter by subject (OR logic)
  • body_patterns: optional array of regex to filter by message body (OR logic)
  • symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
  • recipients_patterns: optional array of regex to filter by recipients (OR logic)
  • message_id: optional exact message ID for direct lookup
  • verbose: show full message body (default: false)
  • show_thread: show full email thread for each match (default: false)
  • show_replies: show replies/subthreads under each match (default: false, mutually exclusive with show_thread)
  • limit: max number of results (default: 100, 0 = unlimited)
  • since_date: filter emails from this date onwards (e.g., "yesterday", "2 weeks ago", "2024-01-15")
  • until_date: filter emails up to this date
  • mbox: output in MBOX format with full headers and body (default: false)
  • page: optional page number for pagination (1-based). Each page contains 50 lines, results indicate current page and total pages. Default: full results

dig: find lore.kernel.org emails related to a git commit

  • commit: git commit reference (SHA, short SHA, HEAD, branch name, etc.)
  • verbose: show full message body (default: false)
  • show_all: show all duplicate results, not just most recent (default: false)
  • show_thread: show full thread for each result (use with show_all, default: false)
  • show_replies: show replies/subthreads under each result (use with show_all, mutually exclusive with show_thread)
  • since_date: filter emails from this date onwards
  • until_date: filter emails up to this date
  • page: optional page number for pagination (1-based). Each page contains 50 lines, results indicate current page and total pages. Default: full results

vlore_similar_emails: semantic vector search over lore.kernel.org emails

  • query_text: text describing the kind of emails to find (e.g., "memory leak fix", "performance optimization")
  • from_patterns: optional array of regex to filter by sender (OR logic)
  • subject_patterns: optional array of regex to filter by subject (OR logic)
  • body_patterns: optional array of regex to filter by message body (OR logic)
  • symbols_patterns: optional array of regex to filter by symbols in patches (OR logic)
  • recipients_patterns: optional array of regex to filter by recipients (OR logic)
  • limit: max number of results to return (default: 20, max: 100)
  • since_date: filter emails from this date onwards
  • until_date: filter emails up to this date
  • page: optional page number for pagination (1-based). Each page contains 50 lines, results indicate current page and total pages. Default: full results
  • The database might not have lore embeddings indexed

list_branches: list all indexed branches with their status

  • No parameters required
  • Shows branch names, indexed commit SHAs, and freshness status
  • up-to-date: indexed commit matches current branch tip
  • outdated: branch has new commits since indexing (re-index to update)
  • Useful for tracking multiple stable branches (e.g., linux-5.10.y, 6.1.y, 6.12.y) and knowing when they need re-indexing after new releases

compare_branches: compare two branches and show their relationship

  • branch1: first branch name (e.g., "main")
  • branch2: second branch name (e.g., "feature-branch")
  • Shows merge base, ahead/behind status, and indexing status for both branches

indexing_status: check the status of background indexing operation

  • No parameters required
  • Shows current indexing progress, errors, and timing
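Several of the tools above share the same page parameter (1-based, 50 lines per page, full results when omitted). The sketch below models that contract on a plain list of output lines. It is an assumption about how the paging behaves, not semcode's code: the helper `paginate` and the out-of-range error are hypothetical.

```python
import math
from typing import Optional

PAGE_SIZE = 50  # lines per page, as stated in the parameter descriptions

def paginate(lines: list[str], page: Optional[int] = None) -> tuple[list[str], int]:
    """Return (lines for the requested page, total number of pages)."""
    total_pages = max(1, math.ceil(len(lines) / PAGE_SIZE))
    if page is None:
        # Default: full results, no paging applied.
        return lines, total_pages
    if not 1 <= page <= total_pages:
        # Hypothetical behaviour; the real tool may respond differently.
        raise ValueError(f"page {page} out of range 1..{total_pages}")
    start = (page - 1) * PAGE_SIZE
    return lines[start:start + PAGE_SIZE], total_pages

# 120 made-up result lines split into three pages: 50, 50 and 20 lines.
results = [f"line {i}" for i in range(120)]
last_page, total = paginate(results, page=3)
print(len(last_page), total)
```

Requesting pages one at a time keeps each response small, which is the point of the advice to use pagination to manage context.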

Lazy Loading

To reduce the initial context size consumed by the MCP server (saving ~96% of initial tokens), you can start the server in lazy mode using the --lazy flag.

In lazy mode, the server initially exposes only 3 meta-tools:

list_categories: List available tool categories

  • No parameters required
  • Returns a list of categories (e.g., code_lookup, code_search) and their descriptions
  • Use this first to discover what semcode can do

get_tools: Get full schemas for tools in a category

  • category: The name of the category to inspect (from list_categories)
  • Returns the full tool definitions for all tools in that category
  • Use this to learn how to call specific tools

call_tool: Execute a specific tool

  • tool_name: Name of the tool to execute (e.g., find_function)
  • arguments: Object containing the arguments for the tool
  • Use this to run tools after you've discovered them

Workflow:

  1. Call list_categories to see available functionality
  2. Call get_tools for a relevant category (e.g., code_lookup)
  3. Call call_tool to execute the desired tool (e.g., find_function)
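The workflow above can be sketched as three MCP requests. MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method; everything else here is an assumption made for illustration: the helper `tools_call`, the request ids, the searched function name `kmalloc`, and the idea that `find_function` lives in the `code_lookup` category. Transport and session setup are left out.

```python
import json
from itertools import count

_ids = count(1)  # simple request id generator for this sketch

def tools_call(name: str, arguments: dict) -> dict:
    """Build an MCP 'tools/call' JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

workflow = [
    # 1. Discover the available categories.
    tools_call("list_categories", {}),
    # 2. Fetch the full tool schemas for one category.
    tools_call("get_tools", {"category": "code_lookup"}),
    # 3. Run a discovered tool through the call_tool meta-tool.
    tools_call("call_tool", {
        "tool_name": "find_function",
        "arguments": {"name": "kmalloc", "branch": "main"},
    }),
]

for request in workflow:
    print(json.dumps(request))
```

Note the nesting in the last request: the outer arguments belong to call_tool, while the inner arguments object is passed through to find_function.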

Recipes

Searching for commits reachable from HEAD (or any other git sha)

If a repository heavily cherry-picks patches, it might have a backported commit under a different git sha. This means the most effective way to find the backported commit is searching by commit subject:

semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown"
semcode> commit -r "bnxt_en: Fix memory corruption when FW resources change during ifdown" --reachable HEAD

❌ WRONG: reachable_sha=HEAD + git_range=HEAD~5000..HEAD
❌ WRONG: git_range=HEAD~5000..HEAD
✅ CORRECT: reachable_sha=HEAD only (no git_range)
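When calling find_commit as a tool rather than from the semcode prompt, the same recipe might look like the argument objects below. The checker is a hypothetical client-side helper that encodes only the two exclusion rules stated in the find_commit parameter list; mapping the recipe to subject_patterns is an assumption.

```python
def check_find_commit_args(args: dict) -> list[str]:
    """Return the documented conflicts found in a find_commit argument set."""
    problems = []
    if "git_range" in args and "git_ref" in args:
        problems.append("git_range cannot be combined with git_ref")
    if "git_range" in args and "reachable_sha" in args:
        problems.append("reachable_sha is mutually exclusive with git_range")
    return problems

subject = "bnxt_en: Fix memory corruption when FW resources change during ifdown"

# Recommended form: search by subject, restricted to commits reachable from HEAD.
good = {"subject_patterns": [subject], "reachable_sha": "HEAD"}

# Conflicting form: reachable_sha combined with a git_range.
bad = {**good, "git_range": "HEAD~5000..HEAD"}

print(check_find_commit_args(good))
print(check_find_commit_args(bad))
```

The checker only catches the documented conflict. A git_range on its own passes the check, even though the recipe marks it as the wrong choice for this search.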