This feature introduces the capability for users to seamlessly integrate content from GitHub repositories directly into their LLM prompts. By providing the LLM with specific, up-to-date code, documentation, or other textual content from a GitHub repository, we aim to significantly enhance the LLM's understanding, reduce hallucinations, and improve the relevance and accuracy of its responses, especially for code-related assistance.
1. Core Concept
- GitHub Node: A new type of node that users can add to their canvas graph. It represents a specific GitHub repository linked by the user.
2. GitHub Node Functionality & Lifecycle
The GitHub Node will manage the connection to and content of a specified GitHub repository:
- Node Creation & Repository Input:
- Users can add a new "GitHub Repository" node to their canvas.
- The node requires a GitHub repository URL as input. This must support both public and private repositories.
- Accepted URL formats: HTTPS (e.g., https://github.com/user/repo.git) and SSH (e.g., git@github.com:user/repo.git).
- Initial state: The node will be visually marked as "Unpulled" or "Disconnected".
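A minimal sketch of how the backend might validate the two accepted URL formats before doing anything else. This also serves the SSRF point under Technical Considerations: anything that is not a github.com HTTPS or SSH URL is rejected outright. The names `RepoRef` and `parse_repo_url` are hypothetical. The character classes are an assumption that only approximates GitHub's owner and repository naming rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Only the two formats named in the spec are accepted (".git" suffix optional):
#   https://github.com/<owner>/<repo>.git
#   git@github.com:<owner>/<repo>.git
_HTTPS_RE = re.compile(r"^https://github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)(?:\.git)?/?$")
_SSH_RE = re.compile(r"^git@github\.com:([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)(?:\.git)?$")


@dataclass(frozen=True)
class RepoRef:
    owner: str
    name: str
    protocol: str  # "https" or "ssh"


def parse_repo_url(url: str) -> Optional[RepoRef]:
    """Return a RepoRef for an accepted GitHub URL, or None if the URL is rejected."""
    url = url.strip()
    for protocol, pattern in (("https", _HTTPS_RE), ("ssh", _SSH_RE)):
        match = pattern.match(url)
        if match and match.group(2) not in {".", ".."}:
            return RepoRef(owner=match.group(1), name=match.group(2), protocol=protocol)
    return None
```

Rejecting everything else (other hosts, `http://`, `file://`, local paths) keeps the clone step from being pointed at arbitrary or internal addresses.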
- Authentication for Private Repositories:
- Global Settings: A new section within the application's global settings will be dedicated to "Repository Credentials" or "GitHub SSH Keys".
- Users can securely add one or more SSH private keys. The system will store these keys encrypted at rest and use them to authenticate when cloning/pulling private repositories.
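One way the stored keys might be used without ever leaving the backend is to decrypt a key only for the duration of a single git operation. The key is written to a file that only the service user can read and deleted afterwards. This is a sketch under assumptions: decrypting the at-rest ciphertext is left to the caller, and `temporary_key_file` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temporary_key_file(private_key: bytes) -> Iterator[str]:
    """Expose an already-decrypted private key as a file for one git/ssh call, then delete it."""
    fd, path = tempfile.mkstemp(prefix="repo-key-")  # created readable/writable by owner only
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(private_key)
        os.chmod(path, 0o600)  # explicit: ssh ignores private keys that others can read
        yield path
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

The yielded path can be passed to ssh with `-i`, for example through git's `GIT_SSH_COMMAND` environment variable.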
- Pulling Repository Content:
- Trigger: A prominent "Pull" button will be present on the GitHub Node's UI. Clicking this button initiates the cloning/pulling process.
- Backend Process:
- The backend will clone the specified GitHub repository to a local directory.
- Each cloned repository will be stored in a unique folder, named with a UUID. This UUID will be associated with the user, the repository URL, and the node instance in the database to ensure isolation and traceability.
- Authentication for private repos will leverage the SSH keys configured in settings.
- Progress Feedback: The node's UI should display a loading indicator or progress bar during the pulling process.
- Visual Update: Upon successful completion, the node's visual state updates (e.g., "Pulled," green indicator, last pulled timestamp).
- Error Handling: If the pull fails (e.g., invalid URL, authentication error, network issue), the node should visually indicate an error state, and provide specific, user-friendly feedback (e.g., "Authentication Failed," "Repository Not Found").
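A sketch of the backend pull step, assuming the service shells out to the `git` CLI (a library binding would work as well). The sketch does four things:
- Clones into a fresh UUID-named folder.
- Optionally makes the clone shallow (see the performance notes on large repositories).
- Points ssh at a specific key through `GIT_SSH_COMMAND`.
- Maps common git/ssh failure messages to the user-facing errors listed above.

The function names, the timeout default, and the exact message strings matched are assumptions. git's wording can vary between versions.

```python
import os
import shlex
import shutil
import subprocess
import uuid
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def build_clone_command(repo_url: str, clones_root: Path, ssh_key_path: Optional[Path] = None,
                        shallow: bool = True) -> Tuple[List[str], Dict[str, str], Path]:
    """Return (argv, env, destination) for cloning repo_url into a new UUID folder."""
    # The caller records this UUID together with the user, repo URL and node ID.
    destination = clones_root / str(uuid.uuid4())
    argv = ["git", "clone"]
    if shallow:
        argv += ["--depth", "1"]  # history is not needed for prompt context
    # "--" stops git from treating a crafted URL as a command-line option.
    argv += ["--", repo_url, str(destination)]

    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of waiting for interactive input
    if ssh_key_path is not None:
        # Use this key, ignore identities offered by an ssh-agent, never prompt.
        env["GIT_SSH_COMMAND"] = (
            f"ssh -i {shlex.quote(str(ssh_key_path))} -o IdentitiesOnly=yes -o BatchMode=yes"
        )
    return argv, env, destination


def classify_git_error(stderr: str) -> str:
    """Translate common git/ssh failure output into a user-facing message."""
    text = stderr.lower()
    if ("permission denied (publickey" in text
            or "authentication failed" in text
            or "could not read username" in text):
        return "Authentication Failed"
    if "repository not found" in text or ("repository" in text and "not found" in text):
        return "Repository Not Found"
    if "could not resolve host" in text:
        return "Network Error"
    return "Pull Failed"


def pull_repository(repo_url: str, clones_root: Path, ssh_key_path: Optional[Path] = None,
                    timeout_s: int = 300) -> Tuple[Optional[Path], Optional[str]]:
    """Clone the repository; return (clone_dir, None) on success or (None, error_message)."""
    argv, env, destination = build_clone_command(repo_url, clones_root, ssh_key_path)
    try:
        result = subprocess.run(argv, env=env, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        shutil.rmtree(destination, ignore_errors=True)
        return None, "Pull Timed Out"
    if result.returncode != 0:
        shutil.rmtree(destination, ignore_errors=True)  # drop any half-written clone
        return None, classify_git_error(result.stderr)
    return destination, None
```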
- Content Filtering & Configuration (Advanced Settings Pop-up):
- Access: Once a repository has been successfully pulled, a "Configure Content" or "Filter Files" button will appear on the GitHub Node. Clicking this opens a new modal/pop-up window.
- Purpose: This pop-up allows users to precisely control which files and directories from the cloned repository will be considered for LLM context.
- Filtering Logic: Implement a powerful filtering mechanism inspired by .gitignore syntax. Users can define rules to include or exclude files based on:
- File Names: README.md
- File Extensions: *.py, !*.min.js
- Folder Paths: src/components/, !node_modules/
- Specific Paths: /config/secrets.json
- Rule Precedence: Rules defined later should override earlier rules.
- Default Behavior: If no custom configuration is provided, the system should default to including common text-based source code and documentation files, while excluding common binary files, large data files, and typical build/dependency directories (e.g., node_modules, dist, .git).
- Preview: The pop-up could show a real-time preview of which files would be included based on the current filtering rules.
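A minimal sketch of the rule engine described above. It makes three assumptions, as the examples suggest:
- A bare pattern includes files and a `!` prefix excludes them (the reverse of .gitignore's polarity).
- The last matching rule wins.
- A file matching no rule is left out.

Following .gitignore, a trailing `/` targets directories, and a leading or inner `/` anchors the pattern to the repository root. `**` is not supported. `DEFAULT_RULES` is illustrative, not a vetted list, and all names here are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Sequence, Tuple

# Illustrative default: common text/source files in, build and dependency folders out.
DEFAULT_RULES = [
    "*.md", "*.txt", "*.rst", "*.py", "*.js", "*.ts", "*.tsx", "*.go", "*.rs", "*.java",
    "*.json", "*.yaml", "*.yml", "*.toml",
    "!*.min.js", "!node_modules/", "!dist/", "!build/", "!.git/",
]


@dataclass(frozen=True)
class Rule:
    include: bool               # bare pattern -> include, "!" prefix -> exclude
    segments: Tuple[str, ...]   # pattern split on "/"
    anchored: bool              # must match starting at the repository root
    dir_only: bool              # trailing "/": only matches a parent directory

    def matches(self, parts: Sequence[str]) -> bool:
        n = len(self.segments)
        starts = [0] if self.anchored else range(len(parts) - n + 1)
        for i in starts:
            window = parts[i:i + n]
            if len(window) == n and all(fnmatchcase(p, s) for p, s in zip(window, self.segments)):
                # end == len(parts): the pattern matched the file itself;
                # end < len(parts): it matched one of the file's parent directories.
                end = i + n
                if end < len(parts) or not self.dir_only:
                    return True
        return False


def parse_rule(line: str) -> Rule:
    include = not line.startswith("!")
    pattern = line if include else line[1:]
    dir_only = pattern.endswith("/")
    segments = tuple(pattern.strip("/").split("/"))
    # As in .gitignore, a leading or inner "/" ties the pattern to the root.
    anchored = pattern.startswith("/") or len(segments) > 1
    return Rule(include, segments, anchored, dir_only)


def compile_rules(lines: Iterable[str]) -> List[Rule]:
    """Parse rules in order, skipping blank lines and '#' comments."""
    return [parse_rule(s.strip()) for s in lines if s.strip() and not s.strip().startswith("#")]


def is_included(path: str, rules: Sequence[Rule]) -> bool:
    parts = path.strip("/").split("/")
    included = False               # files that match no rule are left out
    for rule in rules:             # later rules override earlier ones
        if rule.matches(parts):
            included = rule.include
    return included


def select_files(paths: Iterable[str], rule_lines: Optional[Sequence[str]] = None) -> List[str]:
    """Apply the user's rules, or DEFAULT_RULES when none are configured."""
    rules = compile_rules(rule_lines if rule_lines else DEFAULT_RULES)
    return [p for p in paths if is_included(p, rules)]
```

Running `select_files` over the clone's file list yields exactly the set the preview would show. One deliberate difference from git: a later rule can re-include a file inside an excluded directory, which keeps "later rules override earlier ones" literally true. Size limits for large data files would need a separate check.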
3. LLM Prompt Integration
- Mention Syntax in Chat Input:
- In the AI Chat view's text input area, users can reference specific files from a linked GitHub node or a global repository using a dedicated mention syntax:
@git:<repo-alias>:<file-path>
- <repo-alias>: A user-defined short name for the attached GitHub node or the global repository (e.g., "my-project", "docs").
- <file-path>: The full path to the file within the repository (e.g., src/main.py, docs/api.md).
- Autocompletion: As the user types @git:, the system should provide intelligent autocompletion:
- First, suggest available repo-aliases (from attached nodes and global repos).
- Once a repo-alias is selected, suggest file paths within that repository, respecting the configured content filters.
- Global Repositories (Application Settings):
- A new section in the application's global settings will allow users to define "Global GitHub Repositories."
- These repositories are always available for mention in any canvas or chat, without needing to attach a specific GitHub Node.
- Each global repository will also have its own content filtering configuration (similar to node-specific filtering).
- Authentication for global private repos will use the SSH keys configured in the global settings.
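A sketch of parsing the mention syntax and producing autocompletion suggestions. It assumes three things:
- Aliases consist of letters, digits, `-` and `_`.
- A file path runs until the next whitespace.
- A bare `@git:<repo-alias>` refers to the whole filtered repository, one of the triggers described under Content Aggregation.

Suggestions here are plain prefix matches over each repository's already-filtered file list; a fuzzy matcher could replace that. `Mention`, `parse_mentions` and `complete` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# @git:<repo-alias>              -> whole filtered repository
# @git:<repo-alias>:<file-path>  -> a single file
MENTION_RE = re.compile(r"@git:(?P<alias>[A-Za-z0-9_-]+)(?::(?P<path>\S+))?")


@dataclass(frozen=True)
class Mention:
    alias: str
    path: Optional[str]  # None means the entire filtered repository


def parse_mentions(message: str) -> List[Mention]:
    mentions = []
    for m in MENTION_RE.finditer(message):
        path = m.group("path")
        if path is not None:
            # Drop sentence punctuation typed right after the path ("see @git:docs:api.md.").
            path = path.rstrip(".,;:!?)") or None
        mentions.append(Mention(m.group("alias"), path))
    return mentions


def complete(typed: str, repos: Dict[str, Sequence[str]]) -> List[str]:
    """Suggestions for the text typed after '@git:'.

    repos maps every available alias (attached nodes and global repositories)
    to that repository's already-filtered file paths.
    """
    if ":" not in typed:
        return sorted(alias for alias in repos if alias.startswith(typed))
    alias, _, partial = typed.partition(":")
    return sorted(p for p in repos.get(alias, ()) if p.startswith(partial))
```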
- Backend Prompt Construction:
- Trigger: When a user sends a chat message.
- Context Check: The backend will perform the following checks:
- Is a GitHub Node attached to the current chat's Generation Node via an "attachment" handle?
- Does the user's message contain @git: mentions (referencing either an attached node's repo or a global repo)?
- Content Aggregation:
- For each successfully mentioned file, its content is retrieved from the locally cloned repository.
- If the user intends to include the entire filtered repository (e.g., by mentioning the repo alias without a specific file, or if an "include all filtered files" option is set in the filtering config), all files respecting the content filters will be concatenated.
- LLM-Friendly Concatenation: The retrieved file contents must be concatenated into the LLM prompt in a structured, clear format that helps the LLM understand file boundaries and origins.
- Example Format:
--- Start of file: <repo-alias>/<file-path> ---
<file content>
--- End of file: <repo-alias>/<file-path> ---
This format aids the LLM in differentiating between multiple files.
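The retrieval and formatting steps can be sketched as follows. `resolve_in_repo` reads a mentioned path only if it stays inside the clone folder. A mention such as `@git:docs:../../etc/passwd`, or a symlink pointing outside the repository, therefore cannot leak server files. `format_file_block` emits exactly the delimiter lines from the example format. Both function names are hypothetical, and checking the path against the content filters is assumed to happen separately.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def resolve_in_repo(clone_dir: Path, file_path: str) -> Optional[Path]:
    """Return the mentioned file's real path if it is a file inside clone_dir, else None."""
    root = clone_dir.resolve()
    candidate = (root / file_path).resolve()  # follows symlinks and collapses ".."
    try:
        candidate.relative_to(root)
    except ValueError:
        return None  # escapes the clone folder
    return candidate if candidate.is_file() else None


def format_file_block(alias: str, path: str, content: str) -> str:
    """Wrap one file's content in start/end markers naming its origin."""
    # Make sure the end marker starts on its own line.
    body = content if content.endswith("\n") else content + "\n"
    return (
        f"--- Start of file: {alias}/{path} ---\n"
        f"{body}"
        f"--- End of file: {alias}/{path} ---\n"
    )


def build_context(files: Iterable[Tuple[str, str, str]]) -> str:
    """Concatenate (alias, path, content) triples, separated by a blank line."""
    return "\n".join(format_file_block(alias, path, content) for alias, path, content in files)
```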
- Token Management (CRITICAL):
- Pre-flight Token Calculation: Before sending the prompt to the LLM, the system must calculate the total token count, including the user's message, aggregated GitHub content, and relevant chat history.
- Handling Token Overages:
- Prioritization: A clear strategy is needed for truncation. Generally, the user's explicit prompt takes the highest priority. GitHub content might be truncated next, followed by older chat history.
- Truncation: If the total token count exceeds the LLM's limit, the aggregated GitHub content should be truncated first. This could involve:
- Truncating individual files.
- Prioritizing smaller files over larger ones.
- User Feedback: Inform the user if content was truncated due to token limits (e.g., via a toast notification or a message in the chat).
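A sketch of the pre-flight budgeting for the GitHub portion of the prompt, following the strategy above:
- The user's message and chat history are counted first, and the remaining budget goes to GitHub content.
- Smaller files are admitted before larger ones.
- The first file that does not fit is cut to whatever budget is left.

`approx_tokens` is a crude stand-in (roughly four characters per token) and should be replaced by the target model's tokenizer. Several details are assumptions: the reserve for the model's reply, the function names, and the choice to drop every file after the first truncated one. Trimming older chat history, when even an empty GitHub section does not fit, is left out.

```python
from typing import Callable, List, Sequence, Tuple

TokenCounter = Callable[[str], int]


def approx_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); use the model's real tokenizer instead."""
    return (len(text) + 3) // 4


def github_budget(context_limit: int, user_message: str, history: Sequence[str],
                  reply_reserve: int, count: TokenCounter = approx_tokens) -> int:
    """Tokens left for GitHub content after the user's message, history and reply reserve."""
    used = count(user_message) + sum(count(m) for m in history) + reply_reserve
    return max(0, context_limit - used)


def _longest_prefix_within(text: str, limit: int, count: TokenCounter) -> str:
    """Binary-search the longest prefix within limit (assumes counts grow with length)."""
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count(text[:mid]) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo]


def fit_github_files(files: Sequence[Tuple[str, str]], budget: int,
                     count: TokenCounter = approx_tokens) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (kept (label, content) pairs, labels that were truncated or dropped)."""
    kept: List[Tuple[str, str]] = []
    cut: List[str] = []
    remaining = budget
    # Smaller files first, so one huge file cannot crowd out many small ones.
    for label, content in sorted(files, key=lambda f: count(f[1])):
        cost = count(content)
        if cost <= remaining:
            kept.append((label, content))
            remaining -= cost
            continue
        if remaining > 0:
            partial = _longest_prefix_within(content, remaining, count)
            if partial:
                kept.append((label, partial))
            remaining = 0  # stop after the first truncated file
        cut.append(label)
    return kept, cut
```

A non-empty `cut` list is the signal for the truncation notice (toast or chat message) described above.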
4. Advanced User Settings
- "Always Resend GitHub Content" (Per Node/Conversation Setting)
- "Auto Pull" (Per Repository Setting): An @git: mention in the chat input for a repository with "Auto Pull" enabled triggers a check.
5. Technical Considerations & Edge Cases
- Security:
- SSH Key Management: Implement robust security measures for storing and accessing SSH private keys (e.g., encryption at rest, strict access controls, never exposing keys to the frontend).
- SSRF Prevention: Validate and sanitize all repository URLs to prevent Server-Side Request Forgery attacks.
- Malicious Content: Implement safeguards against excessively large repositories, infinite redirects, and potentially malicious content (e.g., file size limits per clone, timeouts).
- Performance:
- Cloning Large Repositories: Provide clear progress feedback. Consider shallow clones (git clone --depth 1) when the full history is not needed.
- File Reading: Efficiently read and concatenate potentially many files.
- Token Calculation: Optimize token counting for large context windows.
- Storage:
- Plan for local disk space usage by cloned repositories.
- Implement a cleanup strategy for inactive or deleted repository clones.
- Error Handling:
- Comprehensive error handling at every stage: invalid repository links, authentication failures, network issues, mentioned files that cannot be found, and content parsing errors.
- Clear error messages for the user in the UI.
- UI/UX:
- Clear visual states for the GitHub Node (unpulled, pulling, pulled, error).
- Intuitive and responsive autocompletion for @git: mentions.
- User-friendly interface for the content filtering pop-up.
- Visual feedback for token limits and content truncation.
- Scalability: Consider the impact of many users cloning and updating numerous repositories on backend resources.