The Code Analysis Agent is a high-performance, AI-driven tool designed for automated code inspection, dependency analysis, and security research in controlled environments. It leverages modern Python frameworks (e.g., LangChain, Ollama, Tavily) to:
- Parse and analyze codebases for structural, logical, and security-related patterns.
- Automate dependency resolution and vulnerability assessment.
- Execute controlled experiments in isolated environments (e.g., sandboxed LLM interactions).
Key Use Cases:
- Security Research: Identify potential vulnerabilities in Python projects (e.g., SSRF risks, prompt injection vectors).
- Dependency Auditing: Enforce strict version pinning and detect supply-chain risks.
- Workflow Automation: Integrate with CI/CD pipelines for pre-deployment code analysis.
The agent is built on a modular architecture with the following core components:
- LangChain/LangGraph: Orchestrates multi-step analysis workflows (e.g., code parsing → LLM evaluation → report generation).
- Ollama Integration: Enables local LLM inference for offline analysis (e.g., Llama3, Mistral).
- Tavily Search: Augments analysis with real-time threat intelligence (e.g., CVE lookups).
- uv Package Manager: Ensures reproducible builds via locked dependency versions (`uv.lock`).
- Pydantic: Validates configuration and input/output schemas.
- Structlog: Provides structured logging for auditing and debugging.
- aiohttp: Async HTTP client for external API interactions (e.g., Tavily, SerpAPI).
- Network Policies: Configurable allow/deny lists (e.g., `deny-networks = ["*"]` with exceptions for `127.0.0.1`).
- Environment Variables: Securely load API keys and settings via `python-dotenv`.
- Plugin System: Extend functionality with custom LangChain tools (e.g., static analysis, fuzzing).
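The network-policy component above can be sketched as a small allowlist check. The `host_allowed` helper and its semantics (allow entries take precedence over a `deny-networks = ["*"]` catch-all) are assumptions for illustration, not the agent's actual API:

```python
# Illustrative sketch of an allow/deny network policy check. Assumes a
# "deny everything except explicit allow entries" configuration; the
# function name and pattern semantics are hypothetical.
from fnmatch import fnmatch
from urllib.parse import urlparse

def host_allowed(url: str, allow: list[str], deny: list[str]) -> bool:
    """Return True if the URL's host passes the policy (allow wins over deny)."""
    host = urlparse(url).hostname or ""
    if any(fnmatch(host, pattern) for pattern in allow):
        return True
    return not any(fnmatch(host, pattern) for pattern in deny)

# "Deny all networks, except loopback" policy:
policy_allow = ["127.0.0.1"]
policy_deny = ["*"]

print(host_allowed("http://127.0.0.1:8080/health", policy_allow, policy_deny))   # True
print(host_allowed("https://internal.example.com/", policy_allow, policy_deny))  # False
```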
- Python 3.10–3.12 (see `requires-python` in `pyproject.toml`).
- `uv` (recommended) or `pip` for dependency management.
- Optional: Local LLM (e.g., Ollama) for offline analysis.
1. Clone the Repository:

   ```bash
   git clone https://github.com/your-repo/code-analysis-agent.git
   cd code-analysis-agent
   ```

2. Install Dependencies:

   - Using `uv` (recommended): `uv sync`
   - Using `pip`: `pip install -e .`
3. Configure Environment Variables:

   - Copy `.env.example` to `.env` and populate it with API keys (e.g., `TAVILY_API_KEY`, `SERPAPI_KEY`). Example:

     ```bash
     TAVILY_API_KEY=your_key_here
     OLLAMA_MODEL=llama3
     ```
4. Verify Installation:

   ```bash
   python -c "from code_analysis import Agent; print('Agent loaded successfully')"
   ```
| Variable | Description | Default |
|---|---|---|
| `TAVILY_API_KEY` | API key for Tavily search (threat intelligence). | None |
| `SERPAPI_KEY` | API key for SerpAPI (Google search results). | None |
| `OLLAMA_MODEL` | Local LLM model name (e.g., `llama3`, `mistral`). | `llama3` |
| `LOG_LEVEL` | Logging verbosity (`DEBUG`, `INFO`, `WARNING`, `ERROR`). | `INFO` |
| `ALLOW_NETWORK` | Comma-separated list of allowed domains (e.g., `tavily.com,serpapi.com`). | `127.0.0.1` |
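A minimal sketch of how these variables might be read at startup, using `python-dotenv` and the defaults from the table above. The `settings` dict is illustrative, not the agent's actual configuration object:

```python
# Illustrative startup configuration loader using the documented defaults.
import os

try:
    # python-dotenv reads key=value pairs from a local .env file; fall
    # back to plain environment variables if it is not installed.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

settings = {
    "TAVILY_API_KEY": os.getenv("TAVILY_API_KEY"),           # no default
    "SERPAPI_KEY": os.getenv("SERPAPI_KEY"),                 # no default
    "OLLAMA_MODEL": os.getenv("OLLAMA_MODEL", "llama3"),
    "LOG_LEVEL": os.getenv("LOG_LEVEL", "INFO"),
    "ALLOW_NETWORK": os.getenv("ALLOW_NETWORK", "127.0.0.1").split(","),
}
```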
Configure LLM behavior in `config.yaml` (or via CLI):

```yaml
llm:
  temperature: 0.2  # Lower for deterministic outputs
  max_tokens: 4096  # Adjust based on model limits
tools:              # Enable/disable LangChain tools
  - "python_repl"   # Caution: high-risk for code execution
  - "tavily_search"
```

Analyze a Python file for security risks:
```bash
python -m code_analysis analyze --file target.py
```

Output:
- Dependency graph.
- Potential vulnerabilities (e.g., unsafe `eval()` usage).
- LLM-generated remediation suggestions.
Check for outdated or vulnerable dependencies:
```bash
python -m code_analysis audit --path /path/to/project
```

Output:
- CVSS scores for vulnerable packages.
- Version upgrade recommendations.
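A hedged sketch of the version comparison behind upgrade recommendations. A real audit would consult an advisory database (e.g., OSV or CVE feeds); `needs_upgrade` is a hypothetical helper that only handles simple dotted versions:

```python
# Illustrative version check: is the installed version older than the
# version in which a known vulnerability was fixed?
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a simple dotted version like '2.31.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(installed: str, fixed_in: str) -> bool:
    return parse_version(installed) < parse_version(fixed_in)

print(needs_upgrade("2.28.1", "2.31.0"))  # True: upgrade recommended
print(needs_upgrade("2.31.0", "2.31.0"))  # False: already patched
```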
Start an interactive session with the agent:
```bash
python -m code_analysis chat --model ollama
```

Example Prompt:

> Analyze this code for SSRF risks:

```python
import aiohttp

async def fetch():
    # User-supplied URL flows directly into an HTTP request (SSRF risk).
    url = input("Enter URL: ")
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
```

### **4. Batch Processing**
Analyze multiple repositories in parallel:

```bash
python -m code_analysis batch --repos repo1/ repo2/ --output reports/
```
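The parallel fan-out behind batch mode might look like the following sketch; `analyze_repo` is a placeholder, not the agent's real pipeline:

```python
# Illustrative batch fan-out: analyze several repository paths
# concurrently and collect one report per repo.
from concurrent.futures import ThreadPoolExecutor

def analyze_repo(path: str) -> dict:
    # Placeholder: the real agent would parse, analyze, and report here.
    return {"repo": path, "findings": []}

def run_batch(paths: list[str]) -> list[dict]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order in the results.
        return list(pool.map(analyze_repo, paths))

reports = run_batch(["repo1/", "repo2/"])
print([r["repo"] for r in reports])  # ['repo1/', 'repo2/']
```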
- Input: Provide code (file/directory) or a Git repository URL.
- Parsing: The agent extracts dependencies, imports, and code structure.
- Analysis:
- Static analysis (e.g., AST parsing for unsafe functions).
- Dynamic analysis (e.g., LLM evaluation of code logic).
- Network checks (e.g., SSRF risks in HTTP clients).
- Reporting: Generate JSON/HTML reports with findings and remediations.
- Extend: Add custom LangChain tools for domain-specific analysis.
- Automate: Integrate with GitHub Actions or GitLab CI.
- Hardening: Review `IMPROVEMENTS.md` for security best practices.
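The plugin system mentioned above could be sketched as a minimal tool registry. In the real agent these callables would be wrapped as LangChain tools (e.g., via its `@tool` decorator); everything below is illustrative and stdlib-only:

```python
# Illustrative plugin registry for custom analysis tools. Names and the
# registry pattern are hypothetical, not the agent's actual plugin API.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def register_tool(name: str):
    """Decorator that records a tool under the given name."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("line_count")
def line_count(source: str) -> str:
    """Toy static-analysis tool: report how many lines a snippet has."""
    return f"{len(source.splitlines())} lines"

print(TOOLS["line_count"]("a = 1\nb = 2"))  # 2 lines
```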