Skip to content

Latest commit

 

History

History
442 lines (325 loc) · 13.4 KB

File metadata and controls

442 lines (325 loc) · 13.4 KB

SQLMind logo

SQLMind

SQL intelligence layer for LLM agents

by VeloceAI.in

Python Tests License PyPI Dialects MCP

Stop writing SQL prompts. Start reasoning in execution order.
SQLMind gives agents a property graph schema, a 7-phase generation protocol,
and a 5-layer validation gate — across 10 SQL engines, in any framework.

Get Started · Docs · Contributing

SQLMind hero

How it works

Stage What happens Needs API key?
01 Property Graph DSL / live DB / DDL → TableNodes, ColumnNodes, FK edges No
02 Schema Linking NL question → matched tables, columns, BFS join paths No
03 Phase-locked SQL LLM reasons FROM→WHERE→GROUP BY→HAVING→SELECT→ORDER BY→LIMIT Yes (LLM only)
04 Validation Gate 5 layers: Phase · JOIN · Schema · Dialect · Syntax No

The API key only touches stage 3 — the LLM that decides which tool to call. All tools, the graph engine, validation, and dialect rules are pure Python.


Why SQLMind

Feature SQLMind Raw LLM LangChain SQL Agent
Phase-locked reasoning ✅ Built-in 7-phase protocol ❌ Write-order only ❌ Single prompt
Schema as property graph ✅ TableNodes + FK edges ❌ Raw text only ⚠️ DDL string injection
Join path discovery ✅ BFS over FK graph ❌ Guesses from text ❌ Not available
Pre-execution validation ✅ 5 independent layers ❌ None ⚠️ Basic syntax only
Self-correction loop ✅ Targeted, max 2 retries ❌ None ⚠️ Retry on exec error
Multi-dialect support ✅ 10 dialects, editable YAML ❌ Ad-hoc per prompt ⚠️ Dialect param only
MCP server ✅ 8 tools, editor-ready ❌ None ❌ None
Tools need API key ✅ Pure Python — no key ❌ Always ❌ Always

Quick Start

Install

SQLMind is not yet on PyPI. Install directly from source:

# Clone the repo
git clone https://github.com/Veloce-AI/sqlmind.git
cd sqlmind

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate        # Linux / Mac
.venv\Scripts\activate           # Windows

# Install with all extras
pip install -e ".[all]"

# Or just the core + database + graph
pip install -e ".[db,graph]"

PyPI release coming soon. Once published, pip install sqlmind will work.

Set your API key

cp .env.example .env
# Edit .env — add your key:
# ANTHROPIC_API_KEY=sk-ant-...

5 commands to get SQL generated

# 1. Inspect a schema file
python sqlmind_graph.py inspect examples/schema.sqlmind.yaml

# 2. Find a join path between two tables
python sqlmind_graph.py join-path examples/schema.sqlmind.yaml orders products

# 3. Schema-link a natural language question
python sqlmind_graph.py link examples/schema.sqlmind.yaml "revenue by customer region"

# 4. Export a Mermaid ERD
python sqlmind_graph.py erd examples/schema.sqlmind.yaml > schema.mmd

# 5. Generate SQL from natural language (calls Anthropic API)
python sqlmind_graph.py generate examples/schema.sqlmind.yaml \
  "top 10 customers by total revenue last month" \
  --dialect postgresql

Repo Structure

sqlmind/
├── sqlmind_graph.py              ← Property graph engine (SchemaGraph, DialectRegistry, CLI)
├── sqlmind_mcp_server.py         ← MCP server (8 tools)
├── sqlmind.py                    ← Standalone Python ADK agent
├── dialects.yaml                 ← User-editable dialect config (10 engines)
├── SKILL.md                      ← Claude Code skill (phase-locked protocol)
├── CLAUDE.md                     ← Drop in project root to activate skill
├── docs/
│   ├── sqlmind_hero.svg          ← Wide hero image (1360×680) for GitHub README top
│   └── logo.svg                  ← Standalone icon (200×200) for favicon / social preview
├── integrations/
│   ├── sqlmind_google_adk.py     ← Google ADK agent
│   ├── sqlmind_openai_agents.py  ← OpenAI Agents SDK agent
│   └── sqlmind_langgraph.py      ← LangGraph agent + FastAPI server
├── tests/
│   ├── test_graph.py             ← Graph engine tests (14 tests)
│   └── test_dialects.py          ← Dialect registry tests
├── examples/
│   └── schema.sqlmind.yaml       ← Example schema (orders/customers/products)
├── pyproject.toml
├── .env.example
├── CONTRIBUTING.md
└── .github/workflows/ci.yml      ← GitHub Actions CI (pytest on push)

How the Property Graph Works

from sqlmind_graph import SchemaGraph

# Load from .sqlmind.yaml
graph = SchemaGraph().load_from_yaml("examples/schema.sqlmind.yaml")

# Load from a live database (requires sqlalchemy)
graph = SchemaGraph().load_from_db("postgresql://user:pass@localhost/mydb")

# Load from DDL
graph = SchemaGraph().load_from_ddl("""
    CREATE TABLE orders (
        id          SERIAL PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      NUMERIC(10,2)
    );
""")

# BFS join path discovery
path = graph.find_join_path("orders", "products")
print(path.to_sql())
# orders
# INNER JOIN order_items ON orders.id = order_items.order_id
# INNER JOIN products ON order_items.product_id = products.id

# Alias-aware column validation
errors = graph.validate_sql_columns("SELECT o.bad_col FROM orders o")
# → [{"type": "COLUMN_NOT_FOUND", "table": "orders", "column": "bad_col", ...}]

Schema DSL Format

The .sqlmind.yaml file is the canonical schema format:

tables:
  orders:
    name: orders
    description: "Customer purchase orders"
    row_count: 2500000
    columns:
      - { name: id,          type: INT,       pk: true }
      - { name: customer_id, type: INT,       fk: true, references: customers.id, indexed: true }
      - { name: amount,      type: DECIMAL }
      - { name: status,      type: VARCHAR,   enum: [pending, confirmed, shipped, cancelled] }
      - { name: created_at,  type: TIMESTAMP, indexed: true }
edges:
  - { from_table: orders, from_col: customer_id, to_table: customers, to_col: id }

You can also write schemas in compact DSL notation (supported by load_from_dsl):

TABLE orders (
  id          INT     PK
  customer_id INT     FK→customers.id  IDX
  amount      DECIMAL
  status      VARCHAR [pending, confirmed, shipped, cancelled]
  created_at  TIMESTAMP IDX
)

Dialects

10 SQL engines supported out of the box — all rules live in dialects.yaml:

ID Engine Key differences
postgresql PostgreSQL ILIKE, :: cast, STRING_AGG, RETURNING
mysql MySQL / MariaDB Backticks, GROUP_CONCAT, no DATE_TRUNC
sqlite SQLite No RIGHT JOIN, strftime(), permissive GROUP BY
mssql SQL Server (T-SQL) TOP, GETDATE(), OFFSET/FETCH, [] identifiers
bigquery Google BigQuery Backtick tables, QUALIFY, ARRAY_AGG, EXCEPT DISTINCT
snowflake Snowflake QUALIFY, ILIKE, LISTAGG, TRY_CAST
redshift Amazon Redshift LISTAGG, GETDATE(), ILIKE, limited arrays
databricks Databricks SQL COLLECT_LIST, LATERAL VIEW EXPLODE, time travel
spark_sql Apache Spark SQL RLIKE, LATERAL VIEW EXPLODE, COLLECT_LIST
oracle Oracle Database FETCH FIRST, SYSDATE, NVL, CONNECT BY, MINUS

Add your own — copy any block in dialects.yaml and override the fields that differ.


MCP Server

Claude Code

# Add to Claude Code (stdio mode — works in any directory)
claude mcp add sqlmind --command python --args /path/to/sqlmind_mcp_server.py

# Or drop CLAUDE.md in your project root — Claude Code auto-discovers it

Once connected, 8 tools are available:

Tool What it does
sqlmind_introspect Connect to a live DB → returns SQLMind DSL schema
sqlmind_link_schema Map NL entities to tables, columns, and join paths
sqlmind_validate 7-phase + schema column check on any SQL string
sqlmind_build_prompt Build the phase-locked generation prompt
sqlmind_generate NL → SQL via Anthropic API, auto-validated
sqlmind_explain Run EXPLAIN ANALYZE and summarize the plan
sqlmind_transpile Convert SQL between dialects via sqlglot
sqlmind_score_complexity L1–L4 complexity score + generation strategy

Example prompt to Claude Code once connected:

Run sqlmind_introspect on postgresql://localhost/mydb, then use
sqlmind_generate to write SQL for "monthly revenue by customer tier".

Cursor

  1. Open Cursor Settings → MCP → Add Server
  2. Name: sqlmind | Command: python | Args: ["/absolute/path/sqlmind_mcp_server.py"]
  3. Restart Cursor. Tools appear in the Agent panel (Cmd+I / Ctrl+I).

Windsurf (Codeium)

Add to .windsurf/config.json in your project root:

{
  "mcp": {
    "servers": {
      "sqlmind": {
        "command": "python",
        "args": ["sqlmind_mcp_server.py"],
        "cwd": "/path/to/sqlmind"
      }
    }
  }
}

Reload Windsurf. The Cascade agent can now call all 8 sqlmind tools.

Codex / OpenAI Plugins / HTTP clients

# Start in HTTP mode
python sqlmind_mcp_server.py --transport http --port 8765

# Point any HTTP-based MCP client to:
# http://localhost:8765

Antispace / Any MCP-compatible client

Any tool that supports the MCP protocol can connect via stdio or HTTP mode. Use the command line above.


Agent Framework Integrations

Google ADK

from integrations.sqlmind_google_adk import create_sqlmind_agent

agent = create_sqlmind_agent(
    schema_path="examples/schema.sqlmind.yaml",
    dialect="bigquery",
)
result = await agent.run("top 5 products by revenue last quarter")
print(result.sql)

Deploy to Vertex AI Agent Engine:

gcloud ai agents deploy sqlmind-agent \
  --region=us-central1 \
  --source=integrations/sqlmind_google_adk.py

OpenAI Agents SDK

from integrations.sqlmind_openai_agents import build_sql_agent

agent = build_sql_agent(schema_yaml="examples/schema.sqlmind.yaml")
result = agent.run("monthly revenue by region")
print(result.final_output)

LangGraph + FastAPI

# Start the streaming FastAPI server
uvicorn integrations.sqlmind_langgraph:app --reload --port 8000

# POST a query
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"query": "orders placed last week", "dialect": "postgresql"}'

Running Tests

# Install dev dependencies
pip install -e ".[db,graph,dev]"

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=sqlmind_graph --cov-report=term-missing

Current suite: 18 tests, all passing (14 graph + 4 dialect).


Deploying the MCP Server

Docker

FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -e ".[db,graph]"
EXPOSE 8765
CMD ["python", "sqlmind_mcp_server.py", "--transport", "http", "--port", "8765"]
docker build -t sqlmind-mcp .
docker run -p 8765:8765 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY sqlmind-mcp

Railway

# railway.toml
[build]
builder = "NIXPACKS"

[deploy]
startCommand = "python sqlmind_mcp_server.py --transport http --port $PORT"
healthcheckPath = "/"
restartPolicyType = "ON_FAILURE"

CLI Reference

python sqlmind_graph.py <command> [options]

Commands:
  inspect    <file>              Print schema stats + DSL
  join-path  <file> <t1> <t2>   Find shortest join path between two tables
  link       <file> <query>      Schema-link a natural language query
  erd        <file>              Export Mermaid ERD to stdout
  generate   <file> <query>      Generate SQL via Anthropic API
               --dialect         Target SQL dialect (default: postgresql)
               --model           Model ID (default: claude-sonnet-4-6)

Contributors

Thanks to everyone who has contributed to SQLMind!

Contributions are welcome! See CONTRIBUTING.md for how to:

  • Add a new SQL dialect to dialects.yaml
  • Write a new validation rule
  • Add agent framework integrations
  • Run the test suite and open a PR

SQLMind · SQL intelligence layer for LLM agents

Developed with ♥ by VeloceAI.in — open source for the community

GitHub · Discord · MIT License