This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Correctness is our number one priority. No hacks, no workarounds, no shortcuts.
- Code must be maximally correct - if something feels wrong, it probably is
- Match established semantics (Kubernetes, SQL/PostgreSQL, Rust idioms)
- Special cases are acceptable ONLY when they reflect actual underlying semantics (e.g., ConfigMaps have
datanotspecbecause that's how K8s works) - NEVER assume or decide to take shortcuts without consulting me first
- Any deviations from standards, best practices, or established patterns must be discussed before implementation
- Anything that would trigger the "spidey senses" of a very senior engineer/architect should make you pause and ask
- When in doubt, ask - do not proceed with assumptions
Performance is our second priority - still very important.
- Minimize API calls to Kubernetes (push filters to API when possible)
- Parallel queries across clusters
- Efficient data structures and algorithms
- But never sacrifice correctness for performance
- Do NOT add Claude attributions to commit messages (no "Generated with Claude Code", no "Co-Authored-By: Claude")
- Write clear, conventional commit messages focusing on what changed and why
- Always run
cargo fmtbefore committing - Always verify with
cargo clippyfor lint issues before committing
- Use the Serena MCP with rust-analyzer for code navigation and symbol lookup
- Prefer symbolic tools (find_symbol, find_referencing_symbols, get_symbols_overview) over grep for Rust code
k8sql exposes the Kubernetes API as a SQL-compatible database using Apache DataFusion as the query engine. kubectl contexts are treated as databases (switchable with USE cluster1;) and Kubernetes resources are exposed as tables. The _cluster column is part of every table's primary key, enabling cross-cluster queries.
k8sql supports both SQL and PRQL (Pipelined Relational Query Language) with automatic detection - just write either language and it works.
cargo build # Debug build
cargo build --release # Release build
cargo test # Run all tests# Interactive REPL (default)
k8sql
# Batch mode - SQL queries
k8sql -q "SELECT * FROM pods"
k8sql -q "SELECT name, status->>'phase' as phase FROM pods WHERE namespace = 'kube-system'"
# Batch mode - PRQL queries (auto-detected)
k8sql -q "from pods | take 5"
k8sql -q "from pods | filter namespace == 'kube-system' | select {name, namespace} | take 10"
# Multi-cluster queries
k8sql -q "SELECT * FROM pods WHERE _cluster = 'prod'"
k8sql -q "SELECT * FROM pods WHERE _cluster = '*'" # All clusters
# Batch mode with context and output format
k8sql -c prod-cluster -q "SELECT * FROM deployments" -o json
# Daemon mode (PostgreSQL wire protocol)
k8sql daemon --port 15432src/
├── main.rs # Entry point, CLI argument parsing, mode dispatch
├── cli/
│ ├── mod.rs
│ ├── args.rs # Clap argument definitions
│ └── repl.rs # Rustyline REPL with completion/highlighting
├── datafusion_integration/
│ ├── mod.rs
│ ├── context.rs # K8sSessionContext - DataFusion session setup
│ ├── provider.rs # K8sTableProvider - TableProvider implementation
│ ├── convert.rs # JSON to Arrow RecordBatch conversion
│ ├── preprocess.rs # Query preprocessing (PRQL compilation, JSON paths, ->> precedence)
│ ├── prql.rs # PRQL detection and compilation to SQL
│ ├── json_path.rs # JSON path syntax preprocessor (.field, [n], [])
│ └── hooks.rs # Query hooks for filter extraction
├── kubernetes/
│ ├── mod.rs # ApiFilters type for filter pushdown
│ ├── cache.rs # Resource discovery cache persistence
│ ├── client.rs # K8sClientPool - multi-cluster connection caching
│ ├── discovery.rs # Dynamic resource discovery and schema generation
│ └── field_selectors.rs # Field selector registry and types for K8s API pushdown
├── output/
│ ├── mod.rs # QueryResult type and format dispatch
│ ├── table.rs # Pretty table output
│ ├── json.rs # JSON output
│ ├── csv.rs # CSV output
│ └── yaml.rs # YAML output
└── daemon/
├── mod.rs
└── pgwire_server.rs # PostgreSQL wire protocol server
k8sql uses Apache DataFusion as its SQL query engine:
-
K8sSessionContext (
context.rs): Wraps DataFusion's SessionContext, registers all K8s resources as tables, provides JSON functions for querying nested fields. -
K8sTableProvider (
provider.rs): Implements DataFusion's TableProvider trait. Fetches data from Kubernetes API on scan, extracts and pushes filters (namespace, cluster, label selectors) to the API. -
JSON to Arrow conversion (
convert.rs): Converts K8s JSON resources to Arrow RecordBatches. Metadata fields use native Arrow types (Timestamp for dates, Int64 for integers); labels/annotations/spec/status are stored as JSON strings. -
Query Preprocessing (
preprocess.rs): Three-stage pipeline: (1) PRQL-to-SQL compilation if needed, (2) JSON path syntax conversion, (3) arrow operator precedence fix. Ensures both SQL and PRQL users get intuitive JSON access. -
JSON Path Preprocessor (
json_path.rs): Converts intuitive dot notation (status.phase,spec.containers[0].image) to PostgreSQL arrow operators. Uses DataFusion's SQL tokenizer to avoid matching inside strings. Supports array indexing[n]and array expansion[]via UNNEST. -
PRQL Support (
prql.rs): Detects PRQL queries (starting withfrom,let, orprql) and compiles them to SQL using the prqlc compiler. Auto-detection means users can write either language without mode switching.
Dynamic resource discovery at startup:
- Discovers all available resources including CRDs
- Generates consistent schema:
_cluster,api_version,kind, metadata fields,spec/statusas JSON - Special schemas for ConfigMaps, Secrets, Events, and metrics resources
- Core resources (v1, apps, batch, etc.) take priority over conflicting names from other API groups
Schema definitions use a type-safe enum for Arrow data types:
pub enum ColumnDataType {
Text, // Arrow Utf8 - strings, JSON blobs
Timestamp, // Arrow Timestamp(ms) - creation/deletion timestamps
Integer, // Arrow Int64 - generation, count fields
}Helper constructors simplify schema definition:
ColumnDef::text("name", "metadata.name")
ColumnDef::timestamp("created", "metadata.creationTimestamp")
ColumnDef::integer("generation", "metadata.generation")
ColumnDef::text_raw("_cluster") // No JSON path (virtual column)Manages connections to multiple Kubernetes clusters:
- Caches clients by context name
- Caches resource registries with TTL
- Supports parallel queries across clusters
Discovery Cache (30-minute TTL):
- Resource discovery (CRDs, schemas, available tables) is cached for 30 minutes
USEcommands use cached discovery for instant context switching- Cache reduces API load and improves performance with large kubeconfigs
- Only discovery metadata is cached - actual resource queries always hit the API
- Force fresh discovery: restart k8sql or use
--refresh-crdsflag
Retry Logic:
- Connection and discovery failures retry up to 5 times with exponential backoff (100ms, 200ms, 400ms, 800ms, 1600ms base delays)
- ±25% jitter added to prevent thundering herd when multiple clients retry simultaneously
- Handles intermittent network issues and proxy problems
- All requested contexts must succeed - no partial failures (ensures predictability for scripting)
- datafusion: SQL query engine (Apache Arrow-based)
- datafusion-functions-json: JSON querying functions (
json_get_str,json_get_int, etc.) - datafusion-postgres: PostgreSQL wire protocol for daemon mode
- prqlc: PRQL compiler (compiles PRQL to SQL)
- kube: Kubernetes client (kube-rs)
- k8s-openapi: Kubernetes API type definitions (v1.32)
- rustyline: REPL with readline support
- clap: CLI argument parsing
- fastrand: Fast random number generation (for retry jitter)
All resources share a consistent schema:
| Column | Arrow Type | Description |
|---|---|---|
_cluster |
Utf8 | Kubernetes context name |
api_version |
Utf8 | API version (v1, apps/v1, etc.) |
kind |
Utf8 | Resource kind (Pod, Deployment, etc.) |
name |
Utf8 | Resource name |
namespace |
Utf8 | Namespace (null for cluster-scoped) |
uid |
Utf8 | Unique identifier |
created |
Timestamp(ms) | Creation timestamp (native, supports date comparisons) |
generation |
Int64 | Spec change counter (native, supports numeric comparisons) |
labels |
Utf8 (JSON) | JSON object, access with labels->>'key' |
annotations |
Utf8 (JSON) | JSON object, access with annotations->>'key' |
spec |
Utf8 (JSON) | Resource specification |
status |
Utf8 (JSON) | Resource status |
Special cases:
- ConfigMaps/Secrets:
datacolumn instead of spec/status - Events:
type,reason,message,count, etc. - PodMetrics/NodeMetrics:
timestamp,window,containers/usage - CustomResourceDefinitions:
group,scope,resource_kind,plural,singular,short_names,categories(queryable ascrdsorcustomresourcedefinitions)
| SQL Condition | K8s API Optimization |
|---|---|
namespace = 'x' |
Uses namespaced API endpoint |
labels->>'app' = 'nginx' |
K8s label selector |
labels->>'app' = 'x' AND labels->>'env' = 'y' |
Combined label selector: app=x,env=y |
status->>'phase' = 'Running' |
K8s field selector (pods only) |
spec->>'nodeName' = 'node-1' |
K8s field selector (pods only) |
type = 'Opaque' |
K8s field selector (secrets only) |
name = 'pod-123' |
K8s field selector (metadata.name) |
_cluster = 'prod' |
Only queries that cluster |
_cluster = '*' |
Queries all clusters in parallel |
Field Selectors (src/kubernetes/field_selectors.rs):
- Limited to
=and!=operators (K8s restriction) - Resource-specific support (e.g.,
status.phasefor pods,typefor secrets) - Hardcoded registry based on K8s API documentation
- Namespace field selector intentionally not used (namespaced API endpoints are more efficient)
Supported resources and fields:
- All resources:
metadata.name - Pods:
spec.nodeName,spec.restartPolicy,spec.schedulerName,spec.serviceAccountName,spec.hostNetwork,status.phase,status.podIP,status.nominatedNodeName - Events:
involvedObject.kind,involvedObject.namespace,involvedObject.name,involvedObject.uid,involvedObject.apiVersion,involvedObject.resourceVersion,involvedObject.fieldPath,reason,reportingComponent,source,type - Secrets:
type - Namespaces:
status.phase - ReplicaSets:
status.replicas - Jobs:
status.successful - Nodes:
spec.unschedulable - CertificateSigningRequests:
spec.signerName
- Unsupported field selectors (fields not in registry)
- LIKE patterns
- Complex expressions
- ORDER BY, LIMIT, GROUP BY
- Operators not supported by K8s field selectors (IN, NOT IN, etc.)
Use json_get_array() with UNNEST for array expansion:
SELECT json_get_str(UNNEST(json_get_array(spec, 'containers')), 'image') FROM podsProjection Pushdown (automatic optimization):
- k8sql automatically skips converting unrequested JSON columns
SELECT name FROM podsonly processes ~50 bytes per pod (vs ~2KB for full resource)SELECT name, namespace, labels FROM podsprocesses ~200 bytes per pod- Impact: 10-40x reduction in CPU/memory for selective queries
- No user action needed - optimization happens automatically
Query Performance Best Practices:
- Select only needed columns:
SELECT name, namespace FROM podsis much faster thanSELECT * FROM pods - Use namespace filters:
WHERE namespace = 'kube-system'uses efficient namespaced API endpoint - Use label selectors:
WHERE labels->>'app' = 'nginx'pushes to K8s API (reduces network transfer) - Use field selectors:
WHERE status->>'phase' = 'Running'for pods pushes to K8s API - Cluster filters:
WHERE _cluster = 'prod'avoids querying unnecessary clusters - Combine filters: Multiple AND conditions on labels/fields push to K8s as combined selectors
What to Avoid:
SELECT * FROM podswhen you only need a few columns (wastes CPU converting large JSON fields)- Queries without namespace/cluster filters on large clusters (fetches all resources)
- Client-side filtering (LIKE, complex expressions) when API-pushable filters exist
- Tab: Auto-complete keywords/tables/columns
- ↑/↓: Navigate history
- Ctrl+D: Quit
- \dt: SHOW TABLES (displays: table_name, aliases, group, version, kind, scope, resource_type)
- \l: SHOW DATABASES
- \d table: DESCRIBE table
- \x: Toggle expanded display
- \q: Quit
PRQL (Pipelined Relational Query Language) is supported as an alternative to SQL. Queries are auto-detected based on syntax - no mode switching required.
- PRQL: Queries starting with
from,let, orprql - SQL: Queries starting with
SELECT,WITH,SHOW,DESCRIBE,EXPLAIN,USE, etc.
# Basic query
from pods | take 5
# Filter and select
from pods
filter namespace == "kube-system"
select {name, namespace, created}
take 10
# Sort and limit
from deployments
filter namespace == "default"
sort name
take 20
# Aggregation
from pods
group namespace (aggregate {count = count this})
sort {-count}
| PRQL | SQL |
|---|---|
from pods | take 5 |
SELECT * FROM pods LIMIT 5 |
from pods | filter namespace == "x" |
SELECT * FROM pods WHERE namespace = 'x' |
from pods | select {name, namespace} |
SELECT name, namespace FROM pods |
from pods | sort created |
SELECT * FROM pods ORDER BY created |
from pods | sort {-created} |
SELECT * FROM pods ORDER BY created DESC |
k8sql provides intuitive dot notation for accessing nested JSON fields. This works in both SQL and PRQL:
| Syntax | Meaning | Example |
|---|---|---|
.field |
Object field access | status.phase |
[n] |
Array index (0-based) | spec.containers[0] |
[] |
Array expansion (UNNEST) | spec.containers[] |
Conversion Examples:
| Input | Converted To |
|---|---|
status.phase |
status->>'phase' |
spec.selector.app |
spec->'selector'->>'app' |
p.status.phase |
p.status->>'phase' |
spec.containers[0].image |
spec->'containers'->0->>'image' |
spec.containers[].image |
(UNNEST(json_get_array(spec->'containers', NULL)))->>'image' |
SQL Examples:
-- Simple field access
SELECT name, status.phase FROM pods WHERE status.phase = 'Running'
-- Array indexing
SELECT name, spec.containers[0].image FROM pods
-- Array expansion (one row per container)
SELECT name, spec.containers[].image AS container_image FROM pods
-- With table alias
SELECT p.name, p.spec.containers[0].image FROM pods pPRQL Examples:
# Simple field access (no s-strings needed!)
from pods
filter status.phase == "Running"
select {name, phase = status.phase}
# Array indexing
from pods
select {name, image = spec.containers[0].image}
# Array expansion
from pods
select {name, container_image = spec.containers[].image}
Known JSON Columns (auto-detected):
labels,annotations,spec,status,data,owner_references- CRD fields detected from OpenAPI schema (object/array types)
Non-JSON columns (preserved as-is):
pods.name→ stayspods.name(table.column reference)public.pods→ stayspublic.pods(schema.table reference)
Advanced: Raw Arrow Operators
For complex cases, use PostgreSQL arrow operators directly:
-- Already-converted syntax is not double-processed
SELECT spec->>'replicas' FROM deployments
-- Complex JSON functions
SELECT json_get_str(UNNEST(json_get_array(spec, 'containers')), 'image') FROM podsIn PRQL, use s-strings for raw SQL when needed:
from pods
select {name, containers_json = s"spec->'containers'"}