From adb479877998041c0b7d6beeb409c5b51ec975e5 Mon Sep 17 00:00:00 2001
From: Lorenzo Mangani
Date: Sat, 30 May 2026 07:40:28 +0200
Subject: [PATCH 01/25] Add DuckLake + Quack server combo on ducklake branch.

Server bootstrap attaches a local DuckLake catalog (inventory table + Parquet
under /work/lake), then quack_serve. Client e2e queries remote.lake.inventory
through tailscale_quack_forward. Install ducklake in compose images alongside
quack.

Co-authored-by: Cursor
---
 docs/DUCKLAKE_TAILNET.md                   | 45 ++++++++------
 examples/.env.example                      |  3 +
 examples/Dockerfile                        |  4 +-
 examples/README.md                         |  4 +-
 examples/docker-compose.yml                |  4 ++
 examples/ducklake/README.md                | 68 ++++++++++++++++++++++
 examples/ducklake/local-demo.sql           | 41 +++++++++++++
 scripts/e2e/quacktail-compose-bootstrap.sh | 47 +++++++++++++--
 scripts/e2e/quacktail-entrypoint.sh        | 15 +++--
 scripts/lib/quacktail_ext.sh               | 54 +++++++++++++++++
 10 files changed, 255 insertions(+), 30 deletions(-)
 create mode 100644 examples/ducklake/README.md
 create mode 100644 examples/ducklake/local-demo.sql

diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md
index 6a516bf..efccc4d 100644
--- a/docs/DUCKLAKE_TAILNET.md
+++ b/docs/DUCKLAKE_TAILNET.md
@@ -1,38 +1,45 @@
-# DuckLake over QuackTail (planned)
+# DuckLake over QuackTail
 
-Goal: serve DuckLake (or SQLite / Postgres-backed catalogs) on a node via **Quack**, reachable only on the **Headscale tailnet**, with **discovery** similar to `quack_discover()`.
+Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet**, with discovery similar to `quack_discover()`.
 
-## Target architecture +## Status on branch `ducklake` + +| Piece | Status | +|-------|--------| +| Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** (compose bootstrap) | +| Client: `tailscale_quack_forward` + `remote.lake.*` queries | **Done** (compose e2e) | +| `ducklake_discover()` / enriched `quack_discover` | TBD | +| Client `ducklake:quack:` attach with client-side `DATA_PATH` | Documented, not in compose e2e | +| CI DuckLake profile | TBD | + +## Architecture ```text ┌─────────────────────┐ tailnet ┌─────────────────────┐ │ quacktail-client │ ◄──────────────► │ quacktail-server │ -│ ATTACH quack:… │ │ quack_serve │ -│ quack_discover() │ │ ATTACH ducklake:… │ -└─────────────────────┘ │ (lake catalog) │ +│ ATTACH quack:… │ │ ATTACH ducklake:… │ +│ remote.lake.* │ │ quack_serve │ +└─────────────────────┘ │ Parquet → /work/lake/data └─────────────────────┘ ``` -1. **Server** joins tailnet, runs `quack_serve`, attaches local DuckLake (or other catalog) as the served DuckDB session catalog. -2. **Client** joins tailnet, `quack_discover()` finds `quack::9494`, `ATTACH`es, queries `remote..`. -3. **Discovery extension** (future QuackScale work): advertise DuckLake URIs alongside Quack URIs, e.g. columns `listen_uri`, `catalog_type` (`quack`, `ducklake`, `sqlite`), `attach_hint`. +1. **Server** joins tailnet, attaches DuckLake (`lake` catalog, local metadata + Parquet), runs `quack_serve` + `tailscale_serve_local`. +2. **Client** joins tailnet, `tailscale_quack_forward`, `ATTACH`es Quack at `127.0.0.1:19494`, queries `remote.lake.inventory`. -## Constraints (today) +## Constraints -- **Quack streaming-scan limit** — one remote read or write per SQL statement on an attached catalog; see [QUACK_STREAMING.md](QUACK_STREAMING.md). DuckLake workloads often use separate statements or server-side execution, so parallelism is less blocked than multi-scan single statements on ATTACH. 
-- **Nested catalogs** — Quack ATTACH exposes the server's session catalogs; deep names like `remote.lake.schema.table` may need `quack_query()` until Quack nested-catalog support lands ([duckdb#22605](https://github.com/duckdb/duckdb/issues/22605)).
+- **Quack streaming-scan limit** — one remote read or write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md).
+- **Nested catalogs** — `remote.lake.table` works when the server attached `lake` before `quack_serve`. Deep paths may need `quack_query()` until upstream nested-catalog support lands.
 
-## Demo recipe (next step after compose e2e)
+## Demo
 
-1. Server bootstrap SQL: `INSTALL ducklake; LOAD ducklake; ATTACH 'ducklake:…' AS lake …;` then `quack_serve`.
-2. Client: same compose flow as today; `SELECT * FROM remote.lake.main.my_table LIMIT 5`.
-3. CI: extend `scripts/ci_headscale_e2e.sh` with optional DuckLake profile (Postgres or local metadata).
+See [examples/ducklake/README.md](../examples/ducklake/README.md).
 
 ## QuackScale changes (not in core `quack`)
 
 | Piece | Owner | Notes |
 |-------|--------|------|
-| Tailnet join, `quack_uri`, `quack_discover` | quackscale | Done |
-| Compose / Headscale demo | quackscale | Done |
-| `ducklake_discover()` or enriched `quack_discover` | quackscale | TBD — metadata from server whoami / config |
+| Tailnet join, `tailscale_quack_forward` | quackscale | Done |
+| Compose DuckLake server bootstrap | quackscale | Done on `ducklake` branch |
+| `ducklake_discover()` or enriched `quack_discover` | quackscale | TBD |
 | Quack multi-scan planner | duckdb-quack | Upstream |
diff --git a/examples/.env.example b/examples/.env.example
index dad1ac1..0791504 100644
--- a/examples/.env.example
+++ b/examples/.env.example
@@ -7,6 +7,9 @@ QUACK_PORT=9494
 QUACK_FORWARD_LOCAL_PORT=19494
 BUILD_FROM_SOURCE=1
 QUACKTAIL_RELEASE_TAG=v1.0.2
+QUACKTAIL_ENABLE_DUCKLAKE=1
+QUACKTAIL_LAKE_NAME=lake
+QUACKTAIL_LAKE_DATA_PATH=/work/lake/data
 
 # Headscale preauth key (generated at
bootstrap — NOT the Quack token above): # docker compose exec -T quacktail-server cat /work/authkey diff --git a/examples/Dockerfile b/examples/Dockerfile index a2f10ab..36fd618 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -78,7 +78,9 @@ RUN mkdir -p /duckdb_extensions \ cp -a /opt/quacktail-build/quackscale-ext/. /duckdb_extensions/; \ fi \ && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core; LOAD quack; SELECT 1;" \ - || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core_nightly; LOAD quack; SELECT 1;" + || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core_nightly; LOAD quack; SELECT 1;" \ + && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL ducklake FROM core; LOAD ducklake; SELECT 1;" \ + || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL ducklake FROM core_nightly; LOAD ducklake; SELECT 1;" COPY scripts/e2e/quacktail-entrypoint.sh /usr/local/bin/quacktail-entrypoint.sh COPY scripts/e2e/quacktail-compose-bootstrap.sh /usr/local/bin/quacktail-compose-bootstrap.sh diff --git a/examples/README.md b/examples/README.md index ba70fe2..06136f4 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,6 +1,8 @@ # QuackTail Docker Compose example -Two-node **Headscale + QuackTail** demo on Linux: a long-lived **server** DuckDB joins the tailnet and serves Quack on port 9494; a one-shot **client** joins the same tailnet, forwards Quack HTTP through tsnet, and `ATTACH`es the remote database. +Two-node **Headscale + QuackTail** demo on Linux: server joins the tailnet and serves Quack; client `ATTACH`es via `tailscale_quack_forward`. + +**DuckLake combo (branch `ducklake`):** [ducklake/README.md](ducklake/README.md) **Requires:** Linux, Docker Compose v2, `/dev/net/tun`, outbound HTTPS. 
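
Note on the extension install step in the Dockerfile hunk above: it tries `core` first and falls back to `core_nightly` through a chain of `&&` / `||`, once for quack and once for ducklake. A minimal sketch of the same idea as a reusable function follows. The names `install_with_fallback` and `duckdb_install` are hypothetical and not part of this patch, and the sketch assumes a `duckdb` CLI on `PATH` with the `-batch -c` invocation already used in the Dockerfile.

```shell
# Hypothetical helper (not in this patch): try each repository in order
# until the runner reports success, then print the repository that worked.
#   $1     command to run as: <runner> <extension> <repository>
#   $2     extension name
#   $3...  repositories to try, in order
install_with_fallback() {
  local runner="$1"
  local ext="$2"
  shift 2
  local repo
  for repo in "$@"; do
    if "$runner" "$ext" "$repo"; then
      echo "$repo"
      return 0
    fi
  done
  echo "error: could not install ${ext} from: $*" >&2
  return 1
}

# Hypothetical runner mirroring the Dockerfile invocation.
# Assumes the extension directory defaults to /duckdb_extensions.
duckdb_install() {
  local ext="$1"
  local repo="$2"
  local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-/duckdb_extensions}"
  duckdb :memory: -batch -c \
    "SET extension_directory='${ext_dir}'; INSTALL ${ext} FROM ${repo}; LOAD ${ext}; SELECT 1;"
}
```

For example, `install_with_fallback duckdb_install ducklake core core_nightly` would stop at the first repository that installs and loads the extension, which keeps the quack and ducklake steps independent of each other's exit status.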
diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 70f20d2..10d070f 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -29,6 +29,10 @@ x-env: &env SERVER_HOST: quacktail-server CLIENT_HOST: quacktail-client DUCKDB_EXTENSION_DIRECTORY: /duckdb_extensions + QUACKTAIL_ENABLE_DUCKLAKE: ${QUACKTAIL_ENABLE_DUCKLAKE:-1} + QUACKTAIL_LAKE_NAME: ${QUACKTAIL_LAKE_NAME:-lake} + QUACKTAIL_LAKE_METADATA: ${QUACKTAIL_LAKE_METADATA:-/work/lake/metadata/inventory.ducklake} + QUACKTAIL_LAKE_DATA_PATH: ${QUACKTAIL_LAKE_DATA_PATH:-/work/lake/data} x-headscale-volumes: &headscale_volumes configs: diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md new file mode 100644 index 0000000..886e324 --- /dev/null +++ b/examples/ducklake/README.md @@ -0,0 +1,68 @@ +# DuckLake + Quack on QuackTail + +Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client queries `remote.lake.inventory` through `tailscale_quack_forward`. + +## Architecture + +```text +quacktail-server quacktail-client +───────────────── ───────────────── +tailscale_up tailscale_up +ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward + └─ /work/lake/data/*.parquet ATTACH quack:127.0.0.1:19494 AS remote +quack_serve(127.0.0.1:9494) SELECT * FROM remote.lake.inventory +tailscale_serve_local +``` + +Parquet files live on the **server** (`QUACKTAIL_LAKE_DATA_PATH`). Clients reach the catalog over Quack — no direct file sharing. + +## Run the demo + +Same commands as the base compose demo (on this branch): + +```bash +cd examples +docker compose build quacktail-server quacktail-client +docker compose up -d --force-recreate headscale quacktail-server +docker compose --profile test run --rm quacktail-client +``` + +Expect `PASSED` with `inventory_rows = 2` and two inventory rows `(101, 50)`, `(102, 120)`. 
+
+Set `QUACKTAIL_ENABLE_DUCKLAKE=0` to run the original Quack-only e2e.
+
+## Environment
+
+| Variable | Default |
+|----------|---------|
+| `QUACKTAIL_ENABLE_DUCKLAKE` | `1` |
+| `QUACKTAIL_LAKE_NAME` | `lake` |
+| `QUACKTAIL_LAKE_METADATA` | `/work/lake/metadata/inventory.ducklake` |
+| `QUACKTAIL_LAKE_DATA_PATH` | `/work/lake/data` |
+
+## Local SQL reference
+
+See [local-demo.sql](local-demo.sql) for the standalone DuckLake + Quack pattern (single host, no tailnet).
+
+**Tailnet client** (after `tailscale_quack_forward`):
+
+```sql
+LOAD quack;
+CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494');
+ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack);
+SELECT * FROM remote.lake.inventory;
+```
+
+**Direct DuckLake-over-Quack attach** (metadata via Quack URI — optional pattern):
+
+```sql
+LOAD ducklake;
+LOAD quack;
+CREATE SECRET (TYPE quack, TOKEN 'your_token', SCOPE 'quack:127.0.0.1:19494');
+ATTACH 'ducklake:quack:127.0.0.1:19494' AS my_lake (DATA_PATH '/path/to/local/parquet/');
+USE my_lake;
+```
+
+Use this when Parquet files are local to the client; the compose demo uses server-side storage and `remote.lake.*` instead.
+
+See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) for roadmap (discovery, CI profile).
diff --git a/examples/ducklake/local-demo.sql b/examples/ducklake/local-demo.sql
new file mode 100644
index 0000000..ea84c92
--- /dev/null
+++ b/examples/ducklake/local-demo.sql
@@ -0,0 +1,41 @@
+-- DuckLake + Quack on one host (no tailnet). Requires DuckDB 1.5+ with quack + ducklake from core.
+-- +-- Server session (terminal 1): +-- duckdb server.duckdb +-- +-- Client session (terminal 2): +-- duckdb + +-- === Server === +INSTALL quack FROM core; +INSTALL ducklake FROM core; +LOAD quack; +LOAD ducklake; + +ATTACH 'ducklake:./lake/metadata/inventory.ducklake' AS lake (DATA_PATH './lake/data/'); +USE lake; + +CREATE TABLE IF NOT EXISTS inventory (item_id INT, quantity INT); +INSERT INTO inventory VALUES (101, 50), (102, 120); + +CALL quack_serve( + 'quack:127.0.0.1:9494', + allow_other_hostname => true, + token => 'quackscale-demo-token' +); + +-- === Client (new duckdb process) === +-- INSTALL quack FROM core; +-- INSTALL ducklake FROM core; +-- LOAD quack; +-- LOAD ducklake; +-- +-- CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:9494'); +-- +-- Option A: query lake tables via Quack attach +-- ATTACH 'quack:127.0.0.1:9494' AS remote (TYPE quack); +-- SELECT * FROM remote.lake.inventory; +-- +-- Option B: DuckLake metadata via Quack URI (local Parquet path on client) +-- ATTACH 'ducklake:quack:127.0.0.1:9494' AS my_lake (DATA_PATH './lake/data/'); +-- SELECT * FROM my_lake.inventory; diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 5533a07..86d6a00 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -8,6 +8,10 @@ CLIENT_HOST="${CLIENT_HOST:-quacktail-client}" QUACK_PORT="${QUACK_PORT:-9494}" QUACK_FORWARD_LOCAL_PORT="${QUACK_FORWARD_LOCAL_PORT:-19494}" QUACK_TOKEN="${QUACK_TAILNET_TOKEN:-quackscale-demo-token}" +LAKE_NAME="${QUACKTAIL_LAKE_NAME:-lake}" +LAKE_METADATA="${QUACKTAIL_LAKE_METADATA:-${WORK}/lake/metadata/inventory.ducklake}" +LAKE_DATA_PATH="${QUACKTAIL_LAKE_DATA_PATH:-${WORK}/lake/data}" +ENABLE_DUCKLAKE="${QUACKTAIL_ENABLE_DUCKLAKE:-1}" CONTROL_URL="${HEADSCALE_CONTROL_URL:-http://headscale:8080}" HS_USER="${HEADSCALE_USER:-quackscale-demo}" 
HS_CFG="${HEADSCALE_CONFIG:-/etc/headscale/config.yaml}" @@ -74,6 +78,22 @@ SQL fi } +write_server_ducklake_sql() { + [[ "$ENABLE_DUCKLAKE" == "1" ]] || return 0 + mkdir -p "$(dirname "$LAKE_METADATA")" "$LAKE_DATA_PATH" + cat >"$WORK/server_ducklake.sql" <"$WORK/client_session.sql" </dev/null; }; then + write_server_ducklake_sql + echo "✓ server ducklake SQL ready — ${LAKE_NAME} @ ${LAKE_METADATA}" + fi if [[ "${COMPOSE_REFRESH_CLIENT_SQL:-}" == "1" ]] \ || [[ ! -f "$WORK/client_session.sql" ]] \ || [[ ! -f "$WORK/client_init.sql" ]] \ @@ -257,7 +291,8 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_ping' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'quack_query' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'ON CONFLICT' "$WORK/client_session.sql"; } \ - || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_proxy' "$WORK/client_session.sql"; }; then + || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_proxy' "$WORK/client_session.sql"; } \ + || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! 
grep -q "remote.${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then refresh_client_sql "$AUTHKEY" echo "✓ client SQL ready — attach ${ATTACH_URI}" fi @@ -399,6 +434,8 @@ SQL write_server_quack_sql +write_server_ducklake_sql + refresh_client_sql "$AUTHKEY" echo "✓ Headscale authkey ready — attach URI ${ATTACH_URI}" diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 60b07a1..ad7fbef 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -23,7 +23,7 @@ fi ensure_quack() { local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-$(quacktail_ext_container_dir)}" export DUCKDB_EXTENSION_DIRECTORY="$ext_dir" - quacktail_ci_ensure_quack "$DUCKDB" "$ext_dir" load_only + quacktail_ci_ensure_demo_extensions "$DUCKDB" "$ext_dir" load_only } quacktail_sql_extension_directory() { @@ -105,13 +105,20 @@ ensure_server_hosts_mapping() { run_server() { maybe_compose_bootstrap if [[ -f "${WORK}/authkey" ]] && [[ -x /usr/local/bin/quacktail-compose-bootstrap.sh ]]; then - COMPOSE_REFRESH_SERVER_QUACK=1 QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh + COMPOSE_REFRESH_SERVER_QUACK=1 COMPOSE_REFRESH_SERVER_DUCKLAKE=1 QUACKTAIL_AUTO_BOOTSTRAP=1 \ + /usr/local/bin/quacktail-compose-bootstrap.sh fi ensure_quack rm -f "${WORK}/quack_ready" - cat "${WORK}/server_setup.sql" "${WORK}/server_quack.sql" >"$INIT_SQL" + { + cat "${WORK}/server_setup.sql" + if [[ -f "${WORK}/server_ducklake.sql" ]]; then + cat "${WORK}/server_ducklake.sql" + fi + cat "${WORK}/server_quack.sql" + } >"$INIT_SQL" if [[ "$QUIET" == "1" ]]; then - echo "→ quacktail-server: join tailnet + quack_serve(127.0.0.1:${PORT}) + tailscale_serve_local" + echo "→ quacktail-server: tailnet + ducklake + quack_serve(127.0.0.1:${PORT}) + tailscale_serve_local" echo " (libtailscale logs → ${WORK}/server.log)" else echo "=== server init SQL ===" diff --git a/scripts/lib/quacktail_ext.sh b/scripts/lib/quacktail_ext.sh index 
8aa7570..d9a53c1 100755 --- a/scripts/lib/quacktail_ext.sh +++ b/scripts/lib/quacktail_ext.sh @@ -74,6 +74,60 @@ quacktail_ci_ensure_quack() { "${set_ext} LOAD quack; SELECT extension_name, loaded, install_path FROM duckdb_extensions() WHERE extension_name='quack';" } +# Install/load ducklake (core, then core_nightly). +quacktail_ci_ensure_ducklake() { + local duckdb_bin="${1:?duckdb binary}" + local ext_dir="${2:-}" + local mode="${3:-install}" + + if [[ -z "$ext_dir" ]]; then + ext_dir="$(quacktail_ext_container_dir)" + fi + mkdir -p "$ext_dir" + + local set_ext + set_ext="$(quacktail_ext_sql_set "$ext_dir")" + + if [[ "$mode" == "load_only" ]]; then + if ! "$duckdb_bin" :memory: -batch -c "${set_ext} LOAD ducklake; SELECT 1;" >/dev/null; then + echo "error: ducklake not available at ${ext_dir}" >&2 + return 1 + fi + elif ! "$duckdb_bin" :memory: -batch -c "${set_ext} LOAD ducklake; SELECT 1;" >/dev/null; then + echo "Installing ducklake (core, then core_nightly) into ${ext_dir} ..." + if ! 
"$duckdb_bin" :memory: -batch -c "${set_ext} INSTALL ducklake FROM core; LOAD ducklake; SELECT 1;"; then + "$duckdb_bin" :memory: -batch -c "${set_ext} INSTALL ducklake FROM core_nightly; LOAD ducklake; SELECT 1;" + fi + fi + + if [[ "${QUACKTAIL_QUIET:-}" == "1" ]]; then + return 0 + fi + + echo "=== ducklake extension (${mode}) ===" + "$duckdb_bin" :memory: -batch -echo -c \ + "${set_ext} LOAD ducklake; SELECT extension_name, loaded, install_path FROM duckdb_extensions() WHERE extension_name='ducklake';" +} + +quacktail_ci_ensure_demo_extensions() { + local duckdb_bin="${1:?duckdb binary}" + local ext_dir="${2:-}" + local mode="${3:-install}" + quacktail_ci_ensure_quack "$duckdb_bin" "$ext_dir" "$mode" + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-1}" == "1" ]]; then + quacktail_ci_ensure_ducklake "$duckdb_bin" "$ext_dir" "$mode" + fi +} + +quacktail_ext_sql_load_demo() { + local ext_dir="${1:?extension directory required}" + echo "$(quacktail_ext_sql_set "$ext_dir")" + echo "LOAD quack;" + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-1}" == "1" ]]; then + echo "LOAD ducklake;" + fi +} + # Server init finished: explicit marker and/or quack_serve + tailscale_serve_local output in server.log. quacktail_server_log_ready() { local log="${1:?server.log path required}" From 782c25e1042538fc783bf21dbf1ef0af5a2365c7 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 07:45:12 +0200 Subject: [PATCH 02/25] Persist DuckLake data in dedicated ducklake-lake compose volume. Mount /var/lib/ducklake on quacktail-server only; seed inventory on first boot and attach-only on restart so stop/start keeps Parquet and metadata. 
Co-authored-by: Cursor --- docs/DUCKLAKE_TAILNET.md | 2 +- examples/.env.example | 2 +- examples/docker-compose.yml | 10 ++++++-- examples/ducklake/README.md | 30 +++++++++++++++++----- scripts/e2e/quacktail-compose-bootstrap.sh | 21 +++++++++++---- 5 files changed, 50 insertions(+), 15 deletions(-) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md index efccc4d..4c80954 100644 --- a/docs/DUCKLAKE_TAILNET.md +++ b/docs/DUCKLAKE_TAILNET.md @@ -19,7 +19,7 @@ Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Heads │ quacktail-client │ ◄──────────────► │ quacktail-server │ │ ATTACH quack:… │ │ ATTACH ducklake:… │ │ remote.lake.* │ │ quack_serve │ -└─────────────────────┘ │ Parquet → /work/lake/data +└─────────────────────┘ │ Parquet → /var/lib/ducklake (volume ducklake-lake) └─────────────────────┘ ``` diff --git a/examples/.env.example b/examples/.env.example index 0791504..a5ccbb6 100644 --- a/examples/.env.example +++ b/examples/.env.example @@ -9,7 +9,7 @@ BUILD_FROM_SOURCE=1 QUACKTAIL_RELEASE_TAG=v1.0.2 QUACKTAIL_ENABLE_DUCKLAKE=1 QUACKTAIL_LAKE_NAME=lake -QUACKTAIL_LAKE_DATA_PATH=/work/lake/data +QUACKTAIL_LAKE_DATA_PATH=/var/lib/ducklake/data # Headscale preauth key (generated at bootstrap — NOT the Quack token above): # docker compose exec -T quacktail-server cat /work/authkey diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 10d070f..86d7c84 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -31,8 +31,8 @@ x-env: &env DUCKDB_EXTENSION_DIRECTORY: /duckdb_extensions QUACKTAIL_ENABLE_DUCKLAKE: ${QUACKTAIL_ENABLE_DUCKLAKE:-1} QUACKTAIL_LAKE_NAME: ${QUACKTAIL_LAKE_NAME:-lake} - QUACKTAIL_LAKE_METADATA: ${QUACKTAIL_LAKE_METADATA:-/work/lake/metadata/inventory.ducklake} - QUACKTAIL_LAKE_DATA_PATH: ${QUACKTAIL_LAKE_DATA_PATH:-/work/lake/data} + QUACKTAIL_LAKE_METADATA: ${QUACKTAIL_LAKE_METADATA:-/var/lib/ducklake/metadata/inventory.ducklake} + QUACKTAIL_LAKE_DATA_PATH: 
${QUACKTAIL_LAKE_DATA_PATH:-/var/lib/ducklake/data} x-headscale-volumes: &headscale_volumes configs: @@ -135,6 +135,11 @@ services: depends_on: headscale: condition: service_healthy + volumes: + - quacktail-work:/work + - ducklake-lake:/var/lib/ducklake + - headscale-data:/var/lib/headscale + - headscale-run:/var/run/headscale environment: <<: *env QUACKTAIL_WORK: /work @@ -189,6 +194,7 @@ volumes: headscale-data: headscale-run: quacktail-work: + ducklake-lake: networks: quacktail: diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md index 886e324..2b16e66 100644 --- a/examples/ducklake/README.md +++ b/examples/ducklake/README.md @@ -9,12 +9,29 @@ quacktail-server quacktail-client ───────────────── ───────────────── tailscale_up tailscale_up ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward - └─ /work/lake/data/*.parquet ATTACH quack:127.0.0.1:19494 AS remote -quack_serve(127.0.0.1:9494) SELECT * FROM remote.lake.inventory -tailscale_serve_local + └─ ducklake-lake volume (/var/lib/ducklake/data/*.parquet) +quack_serve(127.0.0.1:9494) ATTACH quack:127.0.0.1:19494 AS remote +tailscale_serve_local SELECT * FROM remote.lake.inventory ``` -Parquet files live on the **server** (`QUACKTAIL_LAKE_DATA_PATH`). Clients reach the catalog over Quack — no direct file sharing. +Parquet + metadata live on a **named Docker volume** (`ducklake-lake` → `/var/lib/ducklake`). Survives `docker compose stop` / `down` (without `-v`). + +## Persistence + +| Action | DuckLake data | +|--------|----------------| +| `docker compose stop` / `start` | **Kept** — same inventory rows | +| `docker compose down` (no `-v`) | **Kept** | +| `docker compose down -v` | **Wiped** — re-seeds on next first boot | + +First boot creates metadata + demo rows `(101,50)`, `(102,120)`. Restarts **attach only** — no `DELETE` / re-seed. 
+ +```bash +# stop stack, start again — data should remain +docker compose stop +docker compose up -d headscale quacktail-server +docker compose --profile test run --rm quacktail-client +``` ## Run the demo @@ -37,8 +54,9 @@ Set `QUACKTAIL_ENABLE_DUCKLAKE=0` to run the original Quack-only e2e. |----------|---------| | `QUACKTAIL_ENABLE_DUCKLAKE` | `1` | | `QUACKTAIL_LAKE_NAME` | `lake` | -| `QUACKTAIL_LAKE_METADATA` | `/work/lake/metadata/inventory.ducklake` | -| `QUACKTAIL_LAKE_DATA_PATH` | `/work/lake/data` | +| `QUACKTAIL_LAKE_METADATA` | `/var/lib/ducklake/metadata/inventory.ducklake` | +| `QUACKTAIL_LAKE_DATA_PATH` | `/var/lib/ducklake/data` | +| Docker volume | `ducklake-lake` → `/var/lib/ducklake` (server only) | ## Local SQL reference diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 86d6a00..1225024 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -9,8 +9,8 @@ QUACK_PORT="${QUACK_PORT:-9494}" QUACK_FORWARD_LOCAL_PORT="${QUACK_FORWARD_LOCAL_PORT:-19494}" QUACK_TOKEN="${QUACK_TAILNET_TOKEN:-quackscale-demo-token}" LAKE_NAME="${QUACKTAIL_LAKE_NAME:-lake}" -LAKE_METADATA="${QUACKTAIL_LAKE_METADATA:-${WORK}/lake/metadata/inventory.ducklake}" -LAKE_DATA_PATH="${QUACKTAIL_LAKE_DATA_PATH:-${WORK}/lake/data}" +LAKE_METADATA="${QUACKTAIL_LAKE_METADATA:-/var/lib/ducklake/metadata/inventory.ducklake}" +LAKE_DATA_PATH="${QUACKTAIL_LAKE_DATA_PATH:-/var/lib/ducklake/data}" ENABLE_DUCKLAKE="${QUACKTAIL_ENABLE_DUCKLAKE:-1}" CONTROL_URL="${HEADSCALE_CONTROL_URL:-http://headscale:8080}" HS_USER="${HEADSCALE_USER:-quackscale-demo}" @@ -81,7 +81,18 @@ SQL write_server_ducklake_sql() { [[ "$ENABLE_DUCKLAKE" == "1" ]] || return 0 mkdir -p "$(dirname "$LAKE_METADATA")" "$LAKE_DATA_PATH" - cat >"$WORK/server_ducklake.sql" <"$WORK/server_ducklake.sql" <"$WORK/server_ducklake.sql" </dev/null; }; then + && ! 
grep -q 'quacktail: lake-' "$WORK/server_ducklake.sql" 2>/dev/null; }; then write_server_ducklake_sql echo "✓ server ducklake SQL ready — ${LAKE_NAME} @ ${LAKE_METADATA}" fi From f9a83504e56f7aef31a7f9e38a10d1dd8e7b2aad Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 07:49:05 +0200 Subject: [PATCH 03/25] Fix DuckLake client SQL quoting and make compose client honor CTRL-C. Use $'...' for lake inventory statements so DuckDB gets real newlines, and stop the client retry loop on SIGINT/SIGTERM instead of treating interrupts as transient failures. Co-authored-by: Cursor --- scripts/e2e/quacktail-compose-bootstrap.sh | 4 +- scripts/e2e/quacktail-entrypoint.sh | 61 +++++++++++++++++----- 2 files changed, 50 insertions(+), 15 deletions(-) diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 1225024..648fb38 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -154,8 +154,8 @@ write_client_session_sql() { fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then lake_load=$'LOAD ducklake;\n' - lake_select=$"SELECT * FROM remote.${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5;\n" - lake_passed_col=$",\n (SELECT COUNT(*)::INTEGER FROM remote.${LAKE_NAME}.inventory) AS inventory_rows" + lake_select=$'SELECT * FROM remote.'"${LAKE_NAME}"$'.inventory ORDER BY item_id LIMIT 5;\n' + lake_passed_col=$',\n (SELECT COUNT(*)::INTEGER FROM remote.'"${LAKE_NAME}"$'.inventory) AS inventory_rows' fi cat >"$WORK/client_session.sql" </dev/null; then + kill -INT "$QUACKTAIL_CLIENT_SESSION_PID" 2>/dev/null || true + wait "$QUACKTAIL_CLIENT_SESSION_PID" 2>/dev/null || true + fi + echo "Interrupted — stopping client demo" >&2 + exit "$rc" +} + run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" local demo_timeout="${3:?timeout}" local tsnet_log="${WORK}/client-tsnet.log" local ext_cmd duckdb_rc=0 + local timeout_cmd=(timeout 
--foreground "$demo_timeout") ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" # Same invocation as scripts/local_remote_headscale_test.sh (-f, no -bail, no -init file db). - set +o pipefail - if [[ "$QUIET" == "1" ]]; then - timeout "$demo_timeout" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>>"$tsnet_log" | quacktail_filter_demo_stream | tee "$out" - else - timeout "$demo_timeout" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | quacktail_filter_demo_stream | tee "$out" - fi - duckdb_rc=$? - set -o pipefail + # Subshell + background wait so SIGINT can kill the whole pipeline via trap. + ( + set +o pipefail + if [[ "$QUIET" == "1" ]]; then + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>>"$tsnet_log" | quacktail_filter_demo_stream | tee "$out" + else + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>&1 | quacktail_filter_demo_stream | tee "$out" + fi + ) & + QUACKTAIL_CLIENT_SESSION_PID=$! + wait "$QUACKTAIL_CLIENT_SESSION_PID" || duckdb_rc=$? + QUACKTAIL_CLIENT_SESSION_PID="" if [[ "$duckdb_rc" -eq 124 ]]; then echo "error: client demo timed out after ${demo_timeout}s" >&2 quacktail_dump_client_failure return 124 fi + if quacktail_is_signal_rc "$duckdb_rc"; then + return "$duckdb_rc" + fi return "$duckdb_rc" } @@ -215,6 +244,9 @@ run_client() { local duckdb_rc=0 local attempt + trap 'quacktail_client_on_signal INT' INT + trap 'quacktail_client_on_signal TERM' TERM + wait_for_tailnet_server ensure_quack ensure_server_hosts_mapping @@ -240,13 +272,16 @@ run_client() { duckdb_rc=0 run_duckdb_client_session "$session_sql" "$out" "$demo_timeout" \ || duckdb_rc=$? 
+ if quacktail_is_signal_rc "$duckdb_rc"; then + exit "$duckdb_rc" + fi if [[ "$duckdb_rc" -eq 0 ]] && grep -q "PASSED" "$out" 2>/dev/null; then break fi if (( attempt < max_attempts )); then [[ "$QUIET" == "1" ]] && echo "→ retry ${attempt}/${max_attempts} ..." quacktail_dump_client_failure - sleep "$poll_sec" + sleep "$poll_sec" || exit 130 fi done From 1de40984a9a7549618815c5ee69e88422c79b625 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:00:28 +0200 Subject: [PATCH 04/25] Use ducklake:quack attach for tailnet DuckLake queries. Plain quack ATTACH only exposes the primary catalog; switch the client to quack_discover plus ducklake:quack with shared Parquet DATA_PATH. Co-authored-by: Cursor --- docs/DUCKLAKE_TAILNET.md | 49 ++++++++++++++---- examples/docker-compose.yml | 5 ++ examples/ducklake/README.md | 60 ++++++++++------------ examples/ducklake/local-demo.sql | 9 ++-- scripts/e2e/quacktail-compose-bootstrap.sh | 28 +++++++--- scripts/e2e/quacktail-entrypoint.sh | 16 +++++- 6 files changed, 109 insertions(+), 58 deletions(-) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md index 4c80954..913e120 100644 --- a/docs/DUCKLAKE_TAILNET.md +++ b/docs/DUCKLAKE_TAILNET.md @@ -1,15 +1,15 @@ # DuckLake over QuackTail -Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet**, with discovery similar to `quack_discover()`. +Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet**, with discovery via `quack_discover()` and queries via the official **`ducklake:quack:`** attach pattern. 
## Status on branch `ducklake` | Piece | Status | |-------|--------| | Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** (compose bootstrap) | -| Client: `tailscale_quack_forward` + `remote.lake.*` queries | **Done** (compose e2e) | +| Client: `quack_discover()` + `ducklake:quack:` attach + inventory query | **Done** (compose e2e) | | `ducklake_discover()` / enriched `quack_discover` | TBD | -| Client `ducklake:quack:` attach with client-side `DATA_PATH` | Documented, not in compose e2e | +| Client `DATA_PATH` on object storage (no shared volume) | TBD (use `s3://…` per DuckLake docs) | | CI DuckLake profile | TBD | ## Architecture @@ -17,19 +17,46 @@ Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Heads ```text ┌─────────────────────┐ tailnet ┌─────────────────────┐ │ quacktail-client │ ◄──────────────► │ quacktail-server │ -│ ATTACH quack:… │ │ ATTACH ducklake:… │ -│ remote.lake.* │ │ quack_serve │ -└─────────────────────┘ │ Parquet → /var/lib/ducklake (volume ducklake-lake) - └─────────────────────┘ +│ quack_discover() │ │ ATTACH ducklake:… │ +│ ducklake:quack:… │ │ quack_serve │ +│ + ro Parquet vol │ │ Parquet → ducklake-lake +└─────────────────────┘ └─────────────────────┘ ``` -1. **Server** joins tailnet, attaches DuckLake (`lake` catalog, local metadata + Parquet), runs `quack_serve` + `tailscale_serve_local`. -2. **Client** joins tailnet, `tailscale_quack_forward`, `ATTACH`es Quack at `127.0.0.1:19494`, queries `remote.lake.inventory`. +1. **Server** joins tailnet, attaches DuckLake (`lake` catalog, metadata + Parquet on `ducklake-lake`), runs `quack_serve` + `tailscale_serve_local`. +2. **Client** joins tailnet, `CALL quack_discover()` to find Quack URIs, `tailscale_quack_forward`, then **`ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '…')`** and queries `lake.inventory`. + +## Why not `remote.lake.*`? 
+
+`ATTACH 'quack:…' AS remote` exposes the server's **primary** DuckDB catalog only (`remote.e2e_payload` works). Nested attached databases (the server's local `lake` DuckLake catalog) are **not** visible as `remote.lake.table`.
+
+DuckDB **v1.5.3** added the supported pattern: use the remote Quack server as the DuckLake **catalog database** ([announcement](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)):
+
+```sql
+-- Server
+CALL quack_serve('quack:127.0.0.1:9494', token => '…');
+
+-- Client
+LOAD ducklake;
+CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494');
+ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '/var/lib/ducklake/data');
+SELECT * FROM lake.inventory;
+```
+
+Catalog metadata flows over Quack; **`DATA_PATH` must still resolve to the Parquet files** (shared volume in compose, or `s3://` / `https://` in production — see [DuckLake remote data path](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path)).
+
+## Discovery
+
+| What | How |
+|------|-----|
+| Find Quack servers on tailnet | `FROM quack_discover();` (after `tailscale_up`) |
+| Connect | `tailscale_quack_forward` → `ducklake:quack:127.0.0.1:` |
+| DuckLake-specific discovery | TBD (`ducklake_discover()` enriching `quack_discover`) |
 
 ## Constraints
 
-- **Quack streaming-scan limit** — one remote read or write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md).
-- **Nested catalogs** — `remote.lake.table` works when the server attached `lake` before `quack_serve`. Deep paths may need `quack_query()` until upstream nested-catalog support lands.
+- **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). DuckLake attach is separate from plain `quack:` attach.
+- **Parquet path** — client `DATA_PATH` must match where files live (compose: read-only mount of `ducklake-lake` at the same path as the server).
 
## Demo diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 86d7c84..5c96afe 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -163,6 +163,11 @@ services: depends_on: quacktail-server: condition: service_healthy + volumes: + - quacktail-work:/work + - ducklake-lake:/var/lib/ducklake:ro + - headscale-data:/var/lib/headscale + - headscale-run:/var/run/headscale environment: <<: *env QUACKTAIL_WORK: /work diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md index 2b16e66..6de38f7 100644 --- a/examples/ducklake/README.md +++ b/examples/ducklake/README.md @@ -1,6 +1,6 @@ # DuckLake + Quack on QuackTail -Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client queries `remote.lake.inventory` through `tailscale_quack_forward`. +Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client **discovers** Quack endpoints with `quack_discover()`, then queries inventory via the official **`ducklake:quack:`** attach ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). 
## Architecture @@ -8,13 +8,18 @@ Branch **`ducklake`** extends the compose demo: the server attaches a local Duck quacktail-server quacktail-client ───────────────── ───────────────── tailscale_up tailscale_up -ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward - └─ ducklake-lake volume (/var/lib/ducklake/data/*.parquet) -quack_serve(127.0.0.1:9494) ATTACH quack:127.0.0.1:19494 AS remote -tailscale_serve_local SELECT * FROM remote.lake.inventory +ATTACH ducklake:… AS lake (local Parquet) quack_discover() + └─ ducklake-lake volume tailscale_quack_forward +quack_serve(127.0.0.1:9494) ATTACH ducklake:quack:127.0.0.1:19494 +tailscale_serve_local └─ ro ducklake-lake (same DATA_PATH) + SELECT * FROM lake.inventory ``` -Parquet + metadata live on a **named Docker volume** (`ducklake-lake` → `/var/lib/ducklake`). Survives `docker compose stop` / `down` (without `-v`). +Parquet + metadata live on a **named Docker volume** (`ducklake-lake` → `/var/lib/ducklake`). The client mounts the same volume **read-only** so `DATA_PATH` resolves to the server's Parquet files. + +## Why not `remote.lake.inventory`? + +Plain `ATTACH 'quack:…' AS remote` only exposes the server's primary catalog (`remote.e2e_payload` works). The server's attached DuckLake catalog is **not** nested under `remote.lake.*`. Use **`ducklake:quack:`** instead — Quack carries catalog metadata; `DATA_PATH` points at Parquet. ## Persistence @@ -26,17 +31,8 @@ Parquet + metadata live on a **named Docker volume** (`ducklake-lake` → `/var/ First boot creates metadata + demo rows `(101,50)`, `(102,120)`. Restarts **attach only** — no `DELETE` / re-seed. 
-```bash -# stop stack, start again — data should remain -docker compose stop -docker compose up -d headscale quacktail-server -docker compose --profile test run --rm quacktail-client -``` - ## Run the demo -Same commands as the base compose demo (on this branch): - ```bash cd examples docker compose build quacktail-server quacktail-client @@ -44,7 +40,7 @@ docker compose up -d --force-recreate headscale quacktail-server docker compose --profile test run --rm quacktail-client ``` -Expect `PASSED` with `inventory_rows = 2` and two inventory rows `(101, 50)`, `(102, 120)`. +Expect `PASSED` (Quack e2e) and `LAKE_PASSED` with `inventory_rows = 2`. Set `QUACKTAIL_ENABLE_DUCKLAKE=0` to run the original Quack-only e2e. @@ -56,31 +52,29 @@ Set `QUACKTAIL_ENABLE_DUCKLAKE=0` to run the original Quack-only e2e. | `QUACKTAIL_LAKE_NAME` | `lake` | | `QUACKTAIL_LAKE_METADATA` | `/var/lib/ducklake/metadata/inventory.ducklake` | | `QUACKTAIL_LAKE_DATA_PATH` | `/var/lib/ducklake/data` | -| Docker volume | `ducklake-lake` → `/var/lib/ducklake` (server only) | - -## Local SQL reference +| Docker volume | `ducklake-lake` → `/var/lib/ducklake` (server rw, client ro) | -See [local-demo.sql](local-demo.sql) for the standalone DuckLake + Quack pattern (single host, no tailnet). 
+## Tailnet client SQL -**Tailnet client** (after `tailscale_quack_forward`): +After `tailscale_up` and `tailscale_quack_forward`: ```sql +LOAD quackscale; +FROM quack_discover(); + LOAD quack; +LOAD ducklake; CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494'); -ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -SELECT * FROM remote.lake.inventory; -``` -**Direct DuckLake-over-Quack attach** (metadata via Quack URI — optional pattern): +-- Quack e2e table (primary catalog) +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); +SELECT * FROM remote.e2e_payload; -```sql -LOAD ducklake; -LOAD quack; -CREATE SECRET (TYPE quack, TOKEN 'your_token', SCOPE 'quack:127.0.0.1:19494'); -ATTACH 'ducklake:quack:127.0.0.1:19494' AS my_lake (DATA_PATH '/path/to/local/parquet/'); -USE my_lake; +-- DuckLake over Quack (catalog via Quack, Parquet via DATA_PATH) +ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '/var/lib/ducklake/data'); +SELECT * FROM lake.inventory; ``` -Use this when Parquet files are local to the client; the compose demo uses server-side storage and `remote.lake.*` instead. +For production tailnets without a shared volume, use a remote `DATA_PATH` (`s3://…`, `https://…`) per [DuckLake docs](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path). -See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) for roadmap (discovery, CI profile). +See [local-demo.sql](local-demo.sql) for single-host reference and [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) for roadmap. 
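Because the client-side `DATA_PATH` has to resolve to the Parquet files, a pre-flight check before the `ducklake:quack:` attach can catch a missing mount early. The following is a minimal sketch and is not part of this patch series: the helper name `data_path_kind` is hypothetical, and the default path is the `QUACKTAIL_LAKE_DATA_PATH` default from the environment table above.

```shell
# Hypothetical helper (not in the patch): classify a DuckLake DATA_PATH
# as a remote URL, an existing local directory, or missing.
data_path_kind() {
  case "$1" in
    s3://*|https://*) echo remote ;;
    *)
      if [ -d "$1" ]; then
        echo local
      else
        echo missing
      fi
      ;;
  esac
}

# Default taken from the README's environment table.
lake_data_path="${QUACKTAIL_LAKE_DATA_PATH:-/var/lib/ducklake/data}"
echo "DATA_PATH ${lake_data_path}: $(data_path_kind "$lake_data_path")"
```

A result of `missing` suggests the client has no view of the server's Parquet files, so the attach is unlikely to return rows until a shared volume or a remote `DATA_PATH` is configured.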
diff --git a/examples/ducklake/local-demo.sql b/examples/ducklake/local-demo.sql index ea84c92..1f3fef9 100644 --- a/examples/ducklake/local-demo.sql +++ b/examples/ducklake/local-demo.sql @@ -32,10 +32,9 @@ CALL quack_serve( -- -- CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:9494'); -- --- Option A: query lake tables via Quack attach +-- Option A: query lake tables via DuckLake-over-Quack (catalog over Quack, Parquet via DATA_PATH) -- ATTACH 'quack:127.0.0.1:9494' AS remote (TYPE quack); --- SELECT * FROM remote.lake.inventory; +-- ATTACH 'ducklake:quack:127.0.0.1:9494' AS lake (DATA_PATH './lake/data/'); +-- SELECT * FROM lake.inventory; -- --- Option B: DuckLake metadata via Quack URI (local Parquet path on client) --- ATTACH 'ducklake:quack:127.0.0.1:9494' AS my_lake (DATA_PATH './lake/data/'); --- SELECT * FROM my_lake.inventory; +-- Do NOT use remote.lake.inventory — plain quack attach does not expose nested DuckLake catalogs. diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 648fb38..f18f030 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -78,6 +78,15 @@ SQL fi } +compose_sql_attach_ducklake() { + local attach_uri="${1:?attach uri required}" + local lake_name="${2:?lake name required}" + local data_path="${3:?data path required}" + cat <"$WORK/client_session.sql" </dev/null; then - break + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" != "1" ]] || grep -q "LAKE_PASSED" "$out" 2>/dev/null; then + break + fi fi if (( attempt < max_attempts )); then [[ "$QUIET" == "1" ]] && echo "→ retry ${attempt}/${max_attempts} ..." @@ -297,8 +299,18 @@ run_client() { exit 1 fi + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]] && ! 
grep -q "LAKE_PASSED" "$out" 2>/dev/null; then + echo "error: expected LAKE_PASSED row missing (DuckLake inventory query failed)" >&2 + quacktail_dump_client_failure + exit 1 + fi + if [[ "$QUIET" == "1" ]]; then - echo "✓ Demo passed — two-node QuackTail cluster is working" + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]]; then + echo "✓ Demo passed — QuackTail cluster + DuckLake over tailnet" + else + echo "✓ Demo passed — two-node QuackTail cluster is working" + fi else echo "ok: client e2e passed (PASSED row present)" fi From f89921e9650a2074086ce168b88288e01eaaf597 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:06:04 +0200 Subject: [PATCH 05/25] Fix client hang: foreground DuckDB pipeline and quack_query for DuckLake. Avoid background pipe deadlock; discover and query lake tables via quack_query on the server instead of client ducklake:quack attach with mismatched paths. Co-authored-by: Cursor --- docs/DUCKLAKE_TAILNET.md | 88 +++++++++++----------- examples/docker-compose.yml | 5 -- examples/ducklake/README.md | 65 +++++----------- scripts/e2e/quacktail-compose-bootstrap.sh | 41 +++++++--- scripts/e2e/quacktail-entrypoint.sh | 40 ++++------ 5 files changed, 102 insertions(+), 137 deletions(-) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md index 913e120..238a54b 100644 --- a/docs/DUCKLAKE_TAILNET.md +++ b/docs/DUCKLAKE_TAILNET.md @@ -1,72 +1,68 @@ # DuckLake over QuackTail -Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet**, with discovery via `quack_discover()` and queries via the official **`ducklake:quack:`** attach pattern. +Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet** — **find** endpoints and **query** tables from any tailnet client. 
## Status on branch `ducklake` | Piece | Status | |-------|--------| -| Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** (compose bootstrap) | -| Client: `quack_discover()` + `ducklake:quack:` attach + inventory query | **Done** (compose e2e) | +| Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** | +| Client: `quack_query` → `quack_discover()` + `lake.inventory` | **Done** | +| Client `ducklake:quack:` attach (client-side `DATA_PATH`) | Documented — use when Parquet is local/shared | | `ducklake_discover()` / enriched `quack_discover` | TBD | -| Client `DATA_PATH` on object storage (no shared volume) | TBD (use `s3://…` per DuckLake docs) | -| CI DuckLake profile | TBD | -## Architecture +## Find + query on tailnet -```text -┌─────────────────────┐ tailnet ┌─────────────────────┐ -│ quacktail-client │ ◄──────────────► │ quacktail-server │ -│ quack_discover() │ │ ATTACH ducklake:… │ -│ ducklake:quack:… │ │ quack_serve │ -│ + ro Parquet vol │ │ Parquet → ducklake-lake -└─────────────────────┘ └─────────────────────┘ -``` +### 1) Find Quack / DuckLake servers -1. **Server** joins tailnet, attaches DuckLake (`lake` catalog, metadata + Parquet on `ducklake-lake`), runs `quack_serve` + `tailscale_serve_local`. -2. **Client** joins tailnet, `CALL quack_discover()` to find Quack URIs, `tailscale_quack_forward`, then **`ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '…')`** and queries `lake.inventory`. +`FROM quack_discover()` on **this** node lists **local** tailnet URIs only. To discover a **remote** server's endpoints, run discover **on the server** via Quack: -## Why not `remote.lake.*`? +```sql +CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); -`ATTACH 'quack:…' AS remote` exposes the server's **primary** DuckDB catalog only (`remote.e2e_payload` works). 
Nested attached databases (the server's local `lake` DuckLake catalog) are **not** visible as `remote.lake.table`. +FROM quack_query( + 'quack:127.0.0.1:19494', + 'FROM quack_discover()', + token => '…', + disable_ssl => true +); +-- → quack:quacktail-server:9494, quack:100.64.x.x:9494, … +``` -DuckDB **v1.5.3** added the supported pattern: use the remote Quack server as the DuckLake **catalog database** ([announcement](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)): +### 2) Query DuckLake tables -```sql --- Server -CALL quack_serve('quack:127.0.0.1:9494', token => '…'); +When the server owns DuckLake metadata + Parquet (compose demo), run lake SQL **on the server** through `quack_query`: --- Client -LOAD ducklake; -CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); -ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '/var/lib/ducklake/data'); -SELECT * FROM lake.inventory; +```sql +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT * FROM lake.inventory ORDER BY item_id', + token => '…', + disable_ssl => true +); ``` -Catalog metadata flows over Quack; **`DATA_PATH` must still resolve to the Parquet files** (shared volume in compose, or `s3://` / `https://` in production — see [DuckLake remote data path](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path)). +**Why not `remote.lake.inventory`?** Plain `ATTACH 'quack:…' AS remote` exposes the primary catalog only — not nested attached DuckLake catalogs. -## Discovery +**Why not `ducklake:quack:` in compose?** That pattern ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)) uses Quack as the catalog DB and requires client `DATA_PATH` to resolve Parquet. Our demo stores Parquet on the **server volume** only; `ducklake:quack:` attach can block when paths don't align. Use it when the client has a local or object-store `DATA_PATH` (`s3://…`). 
-| What | How | -|------|-----| -| Find Quack servers on tailnet | `FROM quack_discover();` (after `tailscale_up`) | -| Connect | `tailscale_quack_forward` → `ducklake:quack:127.0.0.1:` | -| DuckLake-specific discovery | TBD (`ducklake_discover()` enriching `quack_discover`) | +## Architecture + +```text +┌─────────────────────┐ tailnet ┌─────────────────────┐ +│ quacktail-client │ ◄──────────────► │ quacktail-server │ +│ quack_query(…) │ │ ATTACH ducklake:… │ +│ (find + lake SQL) │ │ quack_serve │ +└─────────────────────┘ │ ducklake-lake vol │ + └─────────────────────┘ +``` ## Constraints -- **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). DuckLake attach is separate from plain `quack:` attach. -- **Parquet path** — client `DATA_PATH` must match where files live (compose: read-only mount of `ducklake-lake` at the same path as the server). +- **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). Each `quack_query` call is one statement. +- **Discovery** — remote discover = `quack_query(..., 'FROM quack_discover()')` until `ducklake_discover()` lands. ## Demo -See [examples/ducklake/README.md](../examples/ducklake/README.md). 
- -## QuackScale changes (not in core `quack`) - -| Piece | Owner | Notes | -|-------|--------|------| -| Tailnet join, `tailscale_quack_forward` | quackscale | Done | -| Compose DuckLake server bootstrap | quackscale | Done on `ducklake` branch | -| `ducklake_discover()` or enriched `quack_discover` | quackscale | TBD | -| Quack multi-scan planner | duckdb-quack | Upstream | +[examples/ducklake/README.md](../examples/ducklake/README.md) diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 5c96afe..86d7c84 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -163,11 +163,6 @@ services: depends_on: quacktail-server: condition: service_healthy - volumes: - - quacktail-work:/work - - ducklake-lake:/var/lib/ducklake:ro - - headscale-data:/var/lib/headscale - - headscale-run:/var/run/headscale environment: <<: *env QUACKTAIL_WORK: /work diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md index 6de38f7..d976a00 100644 --- a/examples/ducklake/README.md +++ b/examples/ducklake/README.md @@ -1,6 +1,6 @@ # DuckLake + Quack on QuackTail -Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client **discovers** Quack endpoints with `quack_discover()`, then queries inventory via the official **`ducklake:quack:`** attach ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). +Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client discovers and queries the lake **via `quack_query`** (SQL runs on the server where DuckLake is attached). 
## Architecture @@ -8,28 +8,21 @@ Branch **`ducklake`** extends the compose demo: the server attaches a local Duck quacktail-server quacktail-client ───────────────── ───────────────── tailscale_up tailscale_up -ATTACH ducklake:… AS lake (local Parquet) quack_discover() - └─ ducklake-lake volume tailscale_quack_forward -quack_serve(127.0.0.1:9494) ATTACH ducklake:quack:127.0.0.1:19494 -tailscale_serve_local └─ ro ducklake-lake (same DATA_PATH) - SELECT * FROM lake.inventory +ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward + └─ ducklake-lake volume quack_query → quack_discover() (find) +quack_serve(127.0.0.1:9494) quack_query → lake.inventory (query) +tailscale_serve_local ATTACH quack:… AS remote (e2e) ``` -Parquet + metadata live on a **named Docker volume** (`ducklake-lake` → `/var/lib/ducklake`). The client mounts the same volume **read-only** so `DATA_PATH` resolves to the server's Parquet files. +Parquet + metadata live on **`ducklake-lake`** on the server only (`/var/lib/ducklake`). -## Why not `remote.lake.inventory`? +## Access patterns -Plain `ATTACH 'quack:…' AS remote` only exposes the server's primary catalog (`remote.e2e_payload` works). The server's attached DuckLake catalog is **not** nested under `remote.lake.*`. Use **`ducklake:quack:`** instead — Quack carries catalog metadata; `DATA_PATH` points at Parquet. - -## Persistence - -| Action | DuckLake data | -|--------|----------------| -| `docker compose stop` / `start` | **Kept** — same inventory rows | -| `docker compose down` (no `-v`) | **Kept** | -| `docker compose down -v` | **Wiped** — re-seeds on next first boot | - -First boot creates metadata + demo rows `(101,50)`, `(102,120)`. Restarts **attach only** — no `DELETE` / re-seed. +| Pattern | When to use | +|---------|-------------| +| **`quack_query(uri, '…')`** | Server owns DuckLake files (compose demo). Find + query without client-side Parquet. 
| +| **`ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')`** | Client has local or shared Parquet path ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). | +| **`ATTACH 'quack:…' AS remote`** | Primary catalog only (`remote.e2e_payload`). **Not** nested `remote.lake.*`. | ## Run the demo @@ -42,39 +35,17 @@ docker compose --profile test run --rm quacktail-client Expect `PASSED` (Quack e2e) and `LAKE_PASSED` with `inventory_rows = 2`. -Set `QUACKTAIL_ENABLE_DUCKLAKE=0` to run the original Quack-only e2e. - -## Environment - -| Variable | Default | -|----------|---------| -| `QUACKTAIL_ENABLE_DUCKLAKE` | `1` | -| `QUACKTAIL_LAKE_NAME` | `lake` | -| `QUACKTAIL_LAKE_METADATA` | `/var/lib/ducklake/metadata/inventory.ducklake` | -| `QUACKTAIL_LAKE_DATA_PATH` | `/var/lib/ducklake/data` | -| Docker volume | `ducklake-lake` → `/var/lib/ducklake` (server rw, client ro) | - ## Tailnet client SQL -After `tailscale_up` and `tailscale_quack_forward`: - ```sql -LOAD quackscale; -FROM quack_discover(); - -LOAD quack; -LOAD ducklake; +CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494'); --- Quack e2e table (primary catalog) -ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -SELECT * FROM remote.e2e_payload; +-- Find: run quack_discover on the server (not locally — local discover lists this node only) +FROM quack_query('quack:127.0.0.1:19494', 'FROM quack_discover()', token => '…', disable_ssl => true); --- DuckLake over Quack (catalog via Quack, Parquet via DATA_PATH) -ATTACH 'ducklake:quack:127.0.0.1:19494' AS lake (DATA_PATH '/var/lib/ducklake/data'); -SELECT * FROM lake.inventory; +-- Query: SQL executes on server where lake is attached +FROM quack_query('quack:127.0.0.1:19494', 'SELECT * FROM lake.inventory', token => '…', disable_ssl => true); ``` -For production tailnets without a shared volume, use a remote 
`DATA_PATH` (`s3://…`, `https://…`) per [DuckLake docs](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path). - -See [local-demo.sql](local-demo.sql) for single-host reference and [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) for roadmap. +See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) and [local-demo.sql](local-demo.sql). diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index f18f030..b2c9fd5 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -87,6 +87,26 @@ ATTACH 'ducklake:${attach_uri}' AS ${lake_name} (DATA_PATH '${data_path}'); SQL } +# Escape single quotes for SQL embedded in quack_query(..., '...'). +compose_sql_escape() { + printf "%s" "${1:?}" | sed "s/'/''/g" +} + +compose_sql_quack_query() { + local attach_uri="${1:?attach uri required}" + local sql="${2:?sql required}" + local escaped + escaped="$(compose_sql_escape "$sql")" + cat < '${QUACK_TOKEN}', + disable_ssl => true +); +SQL +} + write_server_ducklake_sql() { [[ "$ENABLE_DUCKLAKE" == "1" ]] || return 0 mkdir -p "$(dirname "$LAKE_METADATA")" "$LAKE_DATA_PATH" @@ -150,9 +170,7 @@ write_client_session_sql() { local attach_uri="${2:?attach uri required}" local ping_sql="" local forward_sql="" - local lake_load="" local lake_discover="" - local lake_attach="" local lake_select="" local lake_passed_sql="" if duckdb_has_quackscale_function tailscale_ping; then @@ -164,11 +182,9 @@ write_client_session_sql() { forward_sql="CALL tailscale_quack_proxy();" fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then - lake_load=$'LOAD ducklake;\n' - lake_discover=$'FROM quack_discover();\n' - lake_attach="$(compose_sql_attach_ducklake "$attach_uri" "$LAKE_NAME" "$LAKE_DATA_PATH")" - lake_select=$'SELECT * FROM '"${LAKE_NAME}"$'.inventory ORDER BY item_id LIMIT 5;\n' - lake_passed_sql=$'SELECT\n '"'"'LAKE_PASSED'"'"' AS status,\n COUNT(*)::INTEGER AS inventory_rows\nFROM 
'"${LAKE_NAME}"$'.inventory;\n' + lake_discover="$(compose_sql_quack_query "$attach_uri" "FROM quack_discover();")" + lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" + lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" fi cat >"$WORK/client_session.sql" </dev/null; then - kill -INT "$QUACKTAIL_CLIENT_SESSION_PID" 2>/dev/null || true - wait "$QUACKTAIL_CLIENT_SESSION_PID" 2>/dev/null || true - fi echo "Interrupted — stopping client demo" >&2 - exit "$rc" + exit 130 } run_duckdb_client_session() { @@ -205,23 +196,18 @@ run_duckdb_client_session() { ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" - # Same invocation as scripts/local_remote_headscale_test.sh (-f, no -bail, no -init file db). - # Subshell + background wait so SIGINT can kill the whole pipeline via trap. - ( - set +o pipefail - if [[ "$QUIET" == "1" ]]; then - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>>"$tsnet_log" | quacktail_filter_demo_stream | tee "$out" - else - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | quacktail_filter_demo_stream | tee "$out" - fi - ) & - QUACKTAIL_CLIENT_SESSION_PID=$! - wait "$QUACKTAIL_CLIENT_SESSION_PID" || duckdb_rc=$? - QUACKTAIL_CLIENT_SESSION_PID="" + # Foreground pipeline (background subshell caused stdout backpressure / apparent hangs). + set +o pipefail + if [[ "$QUIET" == "1" ]]; then + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>>"$tsnet_log" | quacktail_filter_demo_stream | tee "$out" || duckdb_rc=$? + else + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>&1 | quacktail_filter_demo_stream | tee "$out" || duckdb_rc=$? 
+ fi + set -o pipefail if [[ "$duckdb_rc" -eq 124 ]]; then echo "error: client demo timed out after ${demo_timeout}s" >&2 From dea410cd9ae48094b49279e254935374f09e1ecf Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:10:35 +0200 Subject: [PATCH 06/25] =?UTF-8?q?Drop=20remote=20quack=5Fdiscover=20quack?= =?UTF-8?q?=5Fquery=20=E2=80=94=20it=20hangs=20over=20Quack.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Probe lake via duckdb_databases() and run all quack_query lake checks before ATTACH to avoid nested remote session deadlocks. Co-authored-by: Cursor --- docs/DUCKLAKE_TAILNET.md | 16 ++++++++++------ examples/ducklake/README.md | 17 +++++++++++------ scripts/e2e/quacktail-compose-bootstrap.sh | 10 +++++----- 3 files changed, 26 insertions(+), 17 deletions(-) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md index 238a54b..a2123c0 100644 --- a/docs/DUCKLAKE_TAILNET.md +++ b/docs/DUCKLAKE_TAILNET.md @@ -7,29 +7,33 @@ Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Heads | Piece | Status | |-------|--------| | Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** | -| Client: `quack_query` → `quack_discover()` + `lake.inventory` | **Done** | +| Client: tailscale forward + `quack_query` lake probe/query | **Done** | | Client `ducklake:quack:` attach (client-side `DATA_PATH`) | Documented — use when Parquet is local/shared | | `ducklake_discover()` / enriched `quack_discover` | TBD | ## Find + query on tailnet -### 1) Find Quack / DuckLake servers +### 1) Find the server and its DuckLake catalog -`FROM quack_discover()` on **this** node lists **local** tailnet URIs only. 
To discover a **remote** server's endpoints, run discover **on the server** via Quack: +On the client, **tailnet routing** finds the server; **`quack_discover()` must run locally** (do not invoke it via `quack_query` on the server — that hangs over Quack): ```sql CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); +CALL tailscale_ping(host => 'quacktail-server', port => 9494); + CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); +-- Confirm the lake catalog exists on the server FROM quack_query( 'quack:127.0.0.1:19494', - 'FROM quack_discover()', + 'SELECT database_name FROM duckdb_databases() WHERE database_name = ''lake''', token => '…', disable_ssl => true ); --- → quack:quacktail-server:9494, quack:100.64.x.x:9494, … ``` +Run `FROM quack_discover()` **on the server node** (or locally to list this host's Quack URIs). + ### 2) Query DuckLake tables When the server owns DuckLake metadata + Parquet (compose demo), run lake SQL **on the server** through `quack_query`: @@ -61,7 +65,7 @@ FROM quack_query( ## Constraints - **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). Each `quack_query` call is one statement. -- **Discovery** — remote discover = `quack_query(..., 'FROM quack_discover()')` until `ducklake_discover()` lands. +- **Discovery** — client uses `tailscale_quack_forward` + `tailscale_ping`; probe lake with `quack_query(..., duckdb_databases())`. Do **not** `quack_query(..., 'FROM quack_discover()')`. 
## Demo diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md index d976a00..4e8ba47 100644 --- a/examples/ducklake/README.md +++ b/examples/ducklake/README.md @@ -9,8 +9,8 @@ quacktail-server quacktail-client ───────────────── ───────────────── tailscale_up tailscale_up ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward - └─ ducklake-lake volume quack_query → quack_discover() (find) -quack_serve(127.0.0.1:9494) quack_query → lake.inventory (query) + └─ ducklake-lake volume tailscale_ping + quack_query (find lake) +quack_serve(127.0.0.1:9494) quack_query → lake.inventory tailscale_serve_local ATTACH quack:… AS remote (e2e) ``` @@ -39,13 +39,18 @@ Expect `PASSED` (Quack e2e) and `LAKE_PASSED` with `inventory_rows = 2`. ```sql CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); +CALL tailscale_ping(host => 'quacktail-server', port => 9494); CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494'); --- Find: run quack_discover on the server (not locally — local discover lists this node only) -FROM quack_query('quack:127.0.0.1:19494', 'FROM quack_discover()', token => '…', disable_ssl => true); +-- Find lake catalog on server (do not quack_query quack_discover — it hangs) +FROM quack_query('quack:127.0.0.1:19494', + 'SELECT database_name FROM duckdb_databases() WHERE database_name = ''lake''', + token => 'quackscale-demo-token', disable_ssl => true); --- Query: SQL executes on server where lake is attached -FROM quack_query('quack:127.0.0.1:19494', 'SELECT * FROM lake.inventory', token => '…', disable_ssl => true); +-- Query inventory (SQL runs on server) +FROM quack_query('quack:127.0.0.1:19494', + 'SELECT * FROM lake.inventory', + token => 'quackscale-demo-token', disable_ssl => true); ``` See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) and [local-demo.sql](local-demo.sql). 
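The doubled quotes around `''lake''` in the `quack_query` calls above follow the SQL rule that a single quote inside a string literal is written twice. The sketch below shows that escaping step in isolation. It is modeled on the `compose_sql_escape` helper from this series, but `sql_escape` and the surrounding variables are illustrative names, and the token is the demo token from the README rather than a real secret.

```shell
# Illustrative helper modeled on compose_sql_escape: double every single
# quote so the text can be embedded inside a SQL string literal.
sql_escape() {
  printf '%s' "$1" | sed "s/'/''/g"
}

uri='quack:127.0.0.1:19494'
token='quackscale-demo-token'
inner="SELECT database_name FROM duckdb_databases() WHERE database_name = 'lake'"
escaped="$(sql_escape "$inner")"

# Print the outer statement with the escaped inner query embedded.
printf "FROM quack_query('%s', '%s', token => '%s', disable_ssl => true);\n" \
  "$uri" "$escaped" "$token"
```

Escaping only the inner SQL keeps the outer statement's own quotes intact, which is why the helper is applied to the query text and not to the whole statement.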
diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index b2c9fd5..a198f31 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -182,7 +182,8 @@ write_client_session_sql() { forward_sql="CALL tailscale_quack_proxy();" fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then - lake_discover="$(compose_sql_quack_query "$attach_uri" "FROM quack_discover();")" + # Do not quack_query('…', 'FROM quack_discover()') — hangs when invoked on server over Quack. + lake_discover="$(compose_sql_quack_query "$attach_uri" "SELECT database_name FROM duckdb_databases() WHERE database_name = '${LAKE_NAME}'")" lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" fi @@ -216,13 +217,12 @@ FROM quack_query( token => '${QUACK_TOKEN}', disable_ssl => true ); - -$(compose_sql_attach_remote "$attach_uri") - -SELECT * FROM remote.e2e_payload LIMIT 5; ${lake_discover} ${lake_select} ${lake_passed_sql} +$(compose_sql_attach_remote "$attach_uri") + +SELECT * FROM remote.e2e_payload LIMIT 5; SELECT 'PASSED' AS status, '${attach_uri}' AS attach_uri, From 75e1d3e744cbb60094355e1e295cc63d2b5eba26 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:11:50 +0200 Subject: [PATCH 07/25] Fix client hang: drop quack_discover quack_query and grep pipe. quack_query(FROM quack_discover()) deadlocks on the server; use forward quack_uri for discovery. Write DuckDB stdout directly to client.out and run lake quack_query before ATTACH remote. 
Co-authored-by: Cursor --- docs/DUCKLAKE_TAILNET.md | 26 ++++++------------ examples/ducklake/README.md | 32 ++++++++++------------ scripts/e2e/quacktail-compose-bootstrap.sh | 7 ++--- scripts/e2e/quacktail-entrypoint.sh | 7 +++-- 4 files changed, 30 insertions(+), 42 deletions(-) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md index a2123c0..f3023ed 100644 --- a/docs/DUCKLAKE_TAILNET.md +++ b/docs/DUCKLAKE_TAILNET.md @@ -7,36 +7,26 @@ Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Heads | Piece | Status | |-------|--------| | Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** | -| Client: tailscale forward + `quack_query` lake probe/query | **Done** | +| Client: `quack_query` → `lake.inventory` (before ATTACH remote) | **Done** | | Client `ducklake:quack:` attach (client-side `DATA_PATH`) | Documented — use when Parquet is local/shared | | `ducklake_discover()` / enriched `quack_discover` | TBD | ## Find + query on tailnet -### 1) Find the server and its DuckLake catalog +### 1) Find Quack servers on the tailnet -On the client, **tailnet routing** finds the server; **`quack_discover()` must run locally** (do not invoke it via `quack_query` on the server — that hangs over Quack): +Use **`tailscale_quack_forward`** (returns `quack_uri`) or **`CALL quack_discover()`** on a node that runs quackscale locally. + +Do **not** run `quack_query(..., 'FROM quack_discover()')` — it can deadlock when the server executes discover inside Quack's query handler (tsnet + quack lock contention). Remote discovery is TBD (`ducklake_discover()`). 
```sql CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); -CALL tailscale_ping(host => 'quacktail-server', port => 9494); - -CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); - --- Confirm the lake catalog exists on the server -FROM quack_query( - 'quack:127.0.0.1:19494', - 'SELECT database_name FROM duckdb_databases() WHERE database_name = ''lake''', - token => '…', - disable_ssl => true -); +-- → quack_uri = quack:127.0.0.1:19494 ``` -Run `FROM quack_discover()` **on the server node** (or locally to list this host's Quack URIs). - ### 2) Query DuckLake tables -When the server owns DuckLake metadata + Parquet (compose demo), run lake SQL **on the server** through `quack_query`: +Run lake SQL **before** `ATTACH … AS remote` in the same session (mixing attached Quack catalog + extra `quack_query` calls can stall). ```sql FROM quack_query( @@ -65,7 +55,7 @@ FROM quack_query( ## Constraints - **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). Each `quack_query` call is one statement. -- **Discovery** — client uses `tailscale_quack_forward` + `tailscale_ping`; probe lake with `quack_query(..., duckdb_databases())`. Do **not** `quack_query(..., 'FROM quack_discover()')`. +- **Discovery** — use `tailscale_quack_forward` / local `quack_discover()`; not `quack_query(..., quack_discover)` (deadlocks). ## Demo diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md index 4e8ba47..61aae51 100644 --- a/examples/ducklake/README.md +++ b/examples/ducklake/README.md @@ -1,6 +1,6 @@ # DuckLake + Quack on QuackTail -Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client discovers and queries the lake **via `quack_query`** (SQL runs on the server where DuckLake is attached). 
+Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client queries the lake **via `quack_query`** (SQL runs on the server where DuckLake is attached). ## Architecture @@ -8,10 +8,10 @@ Branch **`ducklake`** extends the compose demo: the server attaches a local Duck quacktail-server quacktail-client ───────────────── ───────────────── tailscale_up tailscale_up -ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward - └─ ducklake-lake volume tailscale_ping + quack_query (find lake) -quack_serve(127.0.0.1:9494) quack_query → lake.inventory -tailscale_serve_local ATTACH quack:… AS remote (e2e) +ATTACH ducklake:… AS lake (local Parquet) tailscale_quack_forward → quack_uri + └─ ducklake-lake volume quack_query → lake.inventory (before ATTACH) +quack_serve(127.0.0.1:9494) ATTACH quack:… AS remote (e2e) +tailscale_serve_local ``` Parquet + metadata live on **`ducklake-lake`** on the server only (`/var/lib/ducklake`). @@ -20,8 +20,9 @@ Parquet + metadata live on **`ducklake-lake`** on the server only (`/var/lib/duc | Pattern | When to use | |---------|-------------| -| **`quack_query(uri, '…')`** | Server owns DuckLake files (compose demo). Find + query without client-side Parquet. | -| **`ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')`** | Client has local or shared Parquet path ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). | +| **`quack_query(uri, '…')`** | Server owns DuckLake files (compose demo). Run **before** `ATTACH quack AS remote`. | +| **`tailscale_quack_forward`** | Find/connect — returns `quack_uri` for the forwarder. | +| **`ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')`** | Client has local or shared Parquet ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). | | **`ATTACH 'quack:…' AS remote`** | Primary catalog only (`remote.e2e_payload`). 
**Not** nested `remote.lake.*`. | ## Run the demo @@ -33,24 +34,21 @@ docker compose up -d --force-recreate headscale quacktail-server docker compose --profile test run --rm quacktail-client ``` -Expect `PASSED` (Quack e2e) and `LAKE_PASSED` with `inventory_rows = 2`. +Expect `DISCOVERED`, inventory rows, `LAKE_PASSED`, and `PASSED`. ## Tailnet client SQL ```sql CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); -CALL tailscale_ping(host => 'quacktail-server', port => 9494); +-- Find: quack_uri from forward result (do not quack_query quack_discover — deadlocks) + CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494'); --- Find lake catalog on server (do not quack_query quack_discover — it hangs) -FROM quack_query('quack:127.0.0.1:19494', - 'SELECT database_name FROM duckdb_databases() WHERE database_name = ''lake''', - token => 'quackscale-demo-token', disable_ssl => true); +-- Query lake on server (before ATTACH remote) +FROM quack_query('quack:127.0.0.1:19494', 'SELECT * FROM lake.inventory', token => '…', disable_ssl => true); --- Query inventory (SQL runs on server) -FROM quack_query('quack:127.0.0.1:19494', - 'SELECT * FROM lake.inventory', - token => 'quackscale-demo-token', disable_ssl => true); +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); +SELECT * FROM remote.e2e_payload; ``` See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) and [local-demo.sql](local-demo.sql). diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index a198f31..3c2e8d5 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -182,8 +182,7 @@ write_client_session_sql() { forward_sql="CALL tailscale_quack_proxy();" fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then - # Do not quack_query('…', 'FROM quack_discover()') — hangs when invoked on server over Quack. 
- lake_discover="$(compose_sql_quack_query "$attach_uri" "SELECT database_name FROM duckdb_databases() WHERE database_name = '${LAKE_NAME}'")" + lake_discover=$"SELECT 'DISCOVERED' AS status, '${attach_uri}' AS quack_uri, '${SERVER_HOST}' AS server_host;\n" lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" fi @@ -217,8 +216,8 @@ FROM quack_query( token => '${QUACK_TOKEN}', disable_ssl => true ); -${lake_discover} -${lake_select} + +${lake_discover}${lake_select} ${lake_passed_sql} $(compose_sql_attach_remote "$attach_uri") diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 0f66e09..af14164 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -196,16 +196,17 @@ run_duckdb_client_session() { ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" - # Foreground pipeline (background subshell caused stdout backpressure / apparent hangs). + # Write directly to client.out — never pipe through grep (stdout backpressure deadlock). set +o pipefail if [[ "$QUIET" == "1" ]]; then "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - 2>>"$tsnet_log" | quacktail_filter_demo_stream | tee "$out" || duckdb_rc=$? + >"$out" 2>>"$tsnet_log" || duckdb_rc=$? + cat "$out" else "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | quacktail_filter_demo_stream | tee "$out" || duckdb_rc=$? + 2>&1 | tee "$out" || duckdb_rc=$? fi set -o pipefail From 06481701dcc8d21a9636409d61e3bc86975d07ed Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:17:47 +0200 Subject: [PATCH 08/25] Fix client demo slowness, literal \\n SQL, and pointless retries. 
Stop forcing bootstrap refresh every run (wrong stale check), restore live tee output, fail fast on parser errors, fix DISCOVERED SQL quoting, and lower demo timeout/retries to 60s / 3 attempts. Co-authored-by: Cursor --- examples/docker-compose.yml | 4 +-- scripts/e2e/quacktail-compose-bootstrap.sh | 15 ++++++++--- scripts/e2e/quacktail-entrypoint.sh | 29 +++++++++++++++++++--- 3 files changed, 38 insertions(+), 10 deletions(-) diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 86d7c84..8cfb195 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -19,8 +19,8 @@ x-env: &env QUACK_FORWARD_LOCAL_PORT: "19494" QUACK_TAILNET_TOKEN: ${QUACK_TAILNET_TOKEN:-quackscale-demo-token} QUACKTAIL_QUIET: "1" - QUACKTAIL_DEMO_TIMEOUT_SEC: "90" - QUACKTAIL_CLIENT_ATTEMPTS: "15" + QUACKTAIL_DEMO_TIMEOUT_SEC: "60" + QUACKTAIL_CLIENT_ATTEMPTS: "3" QUACKTAIL_CLIENT_POLL_SEC: "2" QUACKTAIL_WAIT_ATTEMPTS: "15" QUACKTAIL_WAIT_POLL_SEC: "1" diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 3c2e8d5..061af83 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -170,9 +170,9 @@ write_client_session_sql() { local attach_uri="${2:?attach uri required}" local ping_sql="" local forward_sql="" - local lake_discover="" local lake_select="" local lake_passed_sql="" + local lake_discover_sql="" if duckdb_has_quackscale_function tailscale_ping; then ping_sql="CALL tailscale_ping(host => '${SERVER_HOST}', port => ${QUACK_PORT});" fi @@ -182,7 +182,7 @@ write_client_session_sql() { forward_sql="CALL tailscale_quack_proxy();" fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then - lake_discover=$"SELECT 'DISCOVERED' AS status, '${attach_uri}' AS quack_uri, '${SERVER_HOST}' AS server_host;\n" + lake_discover_sql="SELECT 'DISCOVERED' AS status, '${attach_uri}' AS quack_uri, '${SERVER_HOST}' AS server_host;" lake_select="$(compose_sql_quack_query 
"$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" fi @@ -217,7 +217,8 @@ FROM quack_query( disable_ssl => true ); -${lake_discover}${lake_select} +${lake_discover_sql} +${lake_select} ${lake_passed_sql} $(compose_sql_attach_remote "$attach_uri") @@ -230,6 +231,10 @@ SELECT COUNT(*)::INTEGER AS total_rows FROM remote.e2e_payload; SQL + if grep -q '\\n' "$WORK/client_session.sql" 2>/dev/null; then + echo "error: generated client_session.sql contains literal \\n" >&2 + exit 1 + fi write_client_init_sql "$authkey" cp "$WORK/client_session.sql" "$WORK/client_demo.sql" } @@ -332,7 +337,9 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_ping' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'quack_query' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'ON CONFLICT' "$WORK/client_session.sql"; } \ - || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_proxy' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_forward' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && grep -q '\\n' "$WORK/client_session.sql"; } \ + || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! 
grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then refresh_client_sql "$AUTHKEY" echo "✓ client SQL ready — attach ${ATTACH_URI}" diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index af14164..47fcd48 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -138,12 +138,16 @@ quacktail_filter_demo_stream() { ensure_client_sql() { if [[ -f "${WORK}/authkey" ]] && [[ -x /usr/local/bin/quacktail-compose-bootstrap.sh ]]; then - COMPOSE_REFRESH_CLIENT_SQL=1 QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh + QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh fi if [[ ! -f "${WORK}/client_session.sql" ]]; then echo "error: ${WORK}/client_session.sql missing" >&2 exit 1 fi + if grep -q '\\n' "${WORK}/client_session.sql" 2>/dev/null; then + echo "error: ${WORK}/client_session.sql contains literal \\n (regenerate bootstrap)" >&2 + exit 1 + fi } client_attach_uri() { @@ -185,6 +189,15 @@ quacktail_client_on_signal() { exit 130 } +quacktail_client_has_fatal_sql_error() { + local out="${1:?client out file}" + local tsnet_log="${WORK}/client-tsnet.log" + grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$out" 2>/dev/null \ + && return 0 + [[ -s "$tsnet_log" ]] \ + && grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$tsnet_log" 2>/dev/null +} + run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" @@ -196,13 +209,12 @@ run_duckdb_client_session() { ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" - # Write directly to client.out — never pipe through grep (stdout backpressure deadlock). + # Live tee to stdout + client.out — do not pipe through grep (deadlock) or buffer until exit. 
set +o pipefail if [[ "$QUIET" == "1" ]]; then "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - >"$out" 2>>"$tsnet_log" || duckdb_rc=$? - cat "$out" + 2>>"$tsnet_log" | tee "$out" || duckdb_rc=$? else "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ @@ -234,6 +246,10 @@ run_client() { trap 'quacktail_client_on_signal INT' INT trap 'quacktail_client_on_signal TERM' TERM + if [[ "$QUIET" == "1" ]]; then + echo "→ preparing client (tailnet wait, extensions, session SQL) ..." + fi + wait_for_tailnet_server ensure_quack ensure_server_hosts_mapping @@ -262,6 +278,11 @@ run_client() { if quacktail_is_signal_rc "$duckdb_rc"; then exit "$duckdb_rc" fi + if quacktail_client_has_fatal_sql_error "$out"; then + echo "error: non-retryable SQL failure in client session" >&2 + quacktail_dump_client_failure + exit 1 + fi if [[ "$duckdb_rc" -eq 0 ]] && grep -q "PASSED" "$out" 2>/dev/null; then if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" != "1" ]] || grep -q "LAKE_PASSED" "$out" 2>/dev/null; then break From b2927c3e5d867791d2334dc5e16b3e59a5b53536 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:23:01 +0200 Subject: [PATCH 09/25] Exit client demo cleanly after PASSED via tailscale_down. One-shot clients hung forever because tsnet and quack_forward background threads kept DuckDB alive. Add CALL tailscale_down(), CLIENT_DEMO_DONE marker, and entrypoint watchdog to SIGINT the session after teardown. 
Co-authored-by: Cursor --- scripts/e2e/quacktail-compose-bootstrap.sh | 11 ++++ scripts/e2e/quacktail-entrypoint.sh | 77 +++++++++++++++++----- src/quackscale_extension.cpp | 25 +++++++ 3 files changed, 96 insertions(+), 17 deletions(-) diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 061af83..3a39ad6 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -173,6 +173,10 @@ write_client_session_sql() { local lake_select="" local lake_passed_sql="" local lake_discover_sql="" + local client_cleanup_sql="" + if duckdb_has_quackscale_function tailscale_down; then + client_cleanup_sql="CALL tailscale_down();" + fi if duckdb_has_quackscale_function tailscale_ping; then ping_sql="CALL tailscale_ping(host => '${SERVER_HOST}', port => ${QUACK_PORT});" fi @@ -230,6 +234,11 @@ SELECT MAX(CASE WHEN source = 'client' THEN msg END) AS client_row, COUNT(*)::INTEGER AS total_rows FROM remote.e2e_payload; + +DETACH remote; +${client_cleanup_sql} + +SELECT 'CLIENT_DEMO_DONE' AS status; SQL if grep -q '\\n' "$WORK/client_session.sql" 2>/dev/null; then echo "error: generated client_session.sql contains literal \\n" >&2 @@ -339,6 +348,8 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'ON CONFLICT' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_forward' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q '\\n' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'DETACH remote' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! 
grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then refresh_client_sql "$AUTHKEY" diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 47fcd48..37f37f4 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -198,6 +198,16 @@ quacktail_client_has_fatal_sql_error() { && grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$tsnet_log" 2>/dev/null } +quacktail_client_session_succeeded() { + local out="${1:?client out file}" + grep -q "CLIENT_DEMO_DONE" "$out" 2>/dev/null || return 1 + grep -q "PASSED" "$out" 2>/dev/null || return 1 + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]]; then + grep -q "LAKE_PASSED" "$out" 2>/dev/null || return 1 + fi + return 0 +} + run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" @@ -205,21 +215,56 @@ run_duckdb_client_session() { local tsnet_log="${WORK}/client-tsnet.log" local ext_cmd duckdb_rc=0 local timeout_cmd=(timeout --foreground "$demo_timeout") + local session_pid=0 + local deadline=0 ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" + : >"$out" - # Live tee to stdout + client.out — do not pipe through grep (deadlock) or buffer until exit. + # Background pipeline: duckdb | tee. After PASSED+LAKE_PASSED, stop tsnet/forward threads + # (otherwise DuckDB never exits). tailscale_down in SQL is preferred; this is the fallback. set +o pipefail - if [[ "$QUIET" == "1" ]]; then - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>>"$tsnet_log" | tee "$out" || duckdb_rc=$? - else - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | tee "$out" || duckdb_rc=$? 
- fi + ( + if [[ "$QUIET" == "1" ]]; then + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>>"$tsnet_log" | tee "$out" + else + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>&1 | tee "$out" + fi + ) & + session_pid=$! + deadline=$((SECONDS + demo_timeout)) + + while kill -0 "$session_pid" 2>/dev/null; do + if quacktail_client_session_succeeded "$out"; then + sleep 0.5 + kill -INT "$session_pid" 2>/dev/null || true + wait "$session_pid" 2>/dev/null || duckdb_rc=0 + set -o pipefail + return 0 + fi + if quacktail_client_has_fatal_sql_error "$out"; then + kill -INT "$session_pid" 2>/dev/null || true + wait "$session_pid" 2>/dev/null || true + set -o pipefail + return 1 + fi + if (( SECONDS >= deadline )); then + kill -TERM "$session_pid" 2>/dev/null || true + wait "$session_pid" 2>/dev/null || duckdb_rc=124 + set -o pipefail + echo "error: client demo timed out after ${demo_timeout}s" >&2 + quacktail_dump_client_failure + return 124 + fi + sleep 0.2 + done + + wait "$session_pid" || duckdb_rc=$? 
set -o pipefail if [[ "$duckdb_rc" -eq 124 ]]; then @@ -236,8 +281,8 @@ run_duckdb_client_session() { run_client() { local session_sql="${WORK}/client_session.sql" local out="${WORK}/client.out" - local demo_timeout="${QUACKTAIL_DEMO_TIMEOUT_SEC:-90}" - local max_attempts="${QUACKTAIL_CLIENT_ATTEMPTS:-15}" + local demo_timeout="${QUACKTAIL_DEMO_TIMEOUT_SEC:-60}" + local max_attempts="${QUACKTAIL_CLIENT_ATTEMPTS:-3}" local poll_sec="${QUACKTAIL_CLIENT_POLL_SEC:-2}" local attach_uri local duckdb_rc=0 @@ -283,10 +328,8 @@ run_client() { quacktail_dump_client_failure exit 1 fi - if [[ "$duckdb_rc" -eq 0 ]] && grep -q "PASSED" "$out" 2>/dev/null; then - if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" != "1" ]] || grep -q "LAKE_PASSED" "$out" 2>/dev/null; then - break - fi + if [[ "$duckdb_rc" -eq 0 ]] && grep -q "CLIENT_DEMO_DONE" "$out" 2>/dev/null; then + break fi if (( attempt < max_attempts )); then [[ "$QUIET" == "1" ]] && echo "→ retry ${attempt}/${max_attempts} ..." @@ -320,7 +363,7 @@ run_client() { echo "✓ Demo passed — two-node QuackTail cluster is working" fi else - echo "ok: client e2e passed (PASSED row present)" + echo "ok: client e2e passed (CLIENT_DEMO_DONE)" fi } diff --git a/src/quackscale_extension.cpp b/src/quackscale_extension.cpp index f9003b7..4ccd72b 100644 --- a/src/quackscale_extension.cpp +++ b/src/quackscale_extension.cpp @@ -130,6 +130,28 @@ static void QuackscaleStatusFunction(ClientContext &context, TableFunctionInput bind.finished = true; } +struct QuackscaleDownBindData : public TableFunctionData { + bool finished = false; +}; + +static unique_ptr<FunctionData> QuackscaleDownBind(ClientContext &context, TableFunctionBindInput &input, + vector<LogicalType> &return_types, vector<string> &names) { + return_types = {LogicalType::BOOLEAN}; + names = {"shutdown_ok"}; + return make_uniq<QuackscaleDownBindData>(); +} + +static void QuackscaleDownFunction(ClientContext &context, TableFunctionInput &data_p, DataChunk &output) { + auto &bind = data_p.bind_data->CastNoConst<QuackscaleDownBindData>(); + if (bind.finished) { + return; + } +
TailscaleBridge::Get().Shutdown(); + output.SetCardinality(1); + output.SetValue(0, 0, Value::BOOLEAN(true)); + bind.finished = true; +} + static void QuackscaleQuackUriFunction(DataChunk &args, ExpressionState &state, Vector &result) { auto uri = TailscaleBridge::Get().QuackListenURI(QUACKSCALE_DEFAULT_QUACK_PORT); result.Reference(Value(uri)); @@ -460,6 +482,9 @@ static void LoadInternal(ExtensionLoader &loader) { RegisterAuthParameters(up_function); loader.RegisterFunction(up_function); + TableFunction down_function("tailscale_down", {}, QuackscaleDownFunction, QuackscaleDownBind); + loader.RegisterFunction(down_function); + TableFunction login_function("tailscale_login", {}, QuackscaleBeginLoginFunction, QuackscaleBeginLoginBind); RegisterAuthParameters(login_function); From a0426ddf867ebc493f58c1f5f0ac2cbc5c6a315a Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:28:08 +0200 Subject: [PATCH 10/25] Exit client demo quickly after CLIENT_DEMO_DONE. Always emit CALL tailscale_down in client SQL; wait for teardown then SIGTERM/KILL the DuckDB process instead of leaving tsnet threads running. 
Co-authored-by: Cursor --- scripts/e2e/quacktail-compose-bootstrap.sh | 9 +-- scripts/e2e/quacktail-entrypoint.sh | 67 +++++++++++++--------- 2 files changed, 43 insertions(+), 33 deletions(-) diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 3a39ad6..efc1aa2 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -173,10 +173,6 @@ write_client_session_sql() { local lake_select="" local lake_passed_sql="" local lake_discover_sql="" - local client_cleanup_sql="" - if duckdb_has_quackscale_function tailscale_down; then - client_cleanup_sql="CALL tailscale_down();" - fi if duckdb_has_quackscale_function tailscale_ping; then ping_sql="CALL tailscale_ping(host => '${SERVER_HOST}', port => ${QUACK_PORT});" fi @@ -236,9 +232,10 @@ SELECT FROM remote.e2e_payload; DETACH remote; -${client_cleanup_sql} SELECT 'CLIENT_DEMO_DONE' AS status; + +CALL tailscale_down(); SQL if grep -q '\\n' "$WORK/client_session.sql" 2>/dev/null; then echo "error: generated client_session.sql contains literal \\n" >&2 @@ -348,8 +345,8 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'ON CONFLICT' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_forward' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q '\\n' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CALL tailscale_down' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql"; } \ - || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'DETACH remote' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! 
grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then refresh_client_sql "$AUTHKEY" diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 37f37f4..5c9bca8 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -208,63 +208,76 @@ quacktail_client_session_succeeded() { return 0 } +quacktail_stop_process() { + local pid="${1:?pid}" + local wait_ms="${2:-1500}" + local elapsed=0 + kill -0 "$pid" 2>/dev/null || return 0 + while (( elapsed < wait_ms )); do + kill -0 "$pid" 2>/dev/null || break + sleep 0.1 + elapsed=$((elapsed + 100)) + done + kill -0 "$pid" 2>/dev/null || { wait "$pid" 2>/dev/null || true; return 0; } + kill -TERM "$pid" 2>/dev/null || true + sleep 0.2 + kill -0 "$pid" 2>/dev/null || { wait "$pid" 2>/dev/null || true; return 0; } + kill -KILL "$pid" 2>/dev/null || true + wait "$pid" 2>/dev/null || true +} + run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" local demo_timeout="${3:?timeout}" local tsnet_log="${WORK}/client-tsnet.log" local ext_cmd duckdb_rc=0 - local timeout_cmd=(timeout --foreground "$demo_timeout") - local session_pid=0 + local timeout_cmd=(timeout --foreground --kill-after=3 "$demo_timeout") + local duck_pid=0 local deadline=0 ext_cmd="$(quacktail_sql_extension_directory)" : >"$tsnet_log" : >"$out" - # Background pipeline: duckdb | tee. After PASSED+LAKE_PASSED, stop tsnet/forward threads - # (otherwise DuckDB never exits). tailscale_down in SQL is preferred; this is the fallback. + # Live tee + monitor: after CLIENT_DEMO_DONE, allow tailscale_down then SIGTERM/KILL if needed. 
set +o pipefail - ( - if [[ "$QUIET" == "1" ]]; then - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>>"$tsnet_log" | tee "$out" - else - "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ - -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | tee "$out" - fi - ) & - session_pid=$! - deadline=$((SECONDS + demo_timeout)) + if [[ "$QUIET" == "1" ]]; then + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + > >(tee "$out") 2>>"$tsnet_log" & + else + "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ + -cmd "$ext_cmd" -f "$session_sql" \ + 2>&1 | tee "$out" & + fi + duck_pid=$! + deadline=$((SECONDS + demo_timeout + 5)) - while kill -0 "$session_pid" 2>/dev/null; do + while kill -0 "$duck_pid" 2>/dev/null; do if quacktail_client_session_succeeded "$out"; then - sleep 0.5 - kill -INT "$session_pid" 2>/dev/null || true - wait "$session_pid" 2>/dev/null || duckdb_rc=0 + # CLIENT_DEMO_DONE is printed before CALL tailscale_down(); allow it to run. + sleep 1.5 + quacktail_stop_process "$duck_pid" 500 set -o pipefail return 0 fi if quacktail_client_has_fatal_sql_error "$out"; then - kill -INT "$session_pid" 2>/dev/null || true - wait "$session_pid" 2>/dev/null || true + quacktail_stop_process "$duck_pid" 500 set -o pipefail return 1 fi if (( SECONDS >= deadline )); then - kill -TERM "$session_pid" 2>/dev/null || true - wait "$session_pid" 2>/dev/null || duckdb_rc=124 + quacktail_stop_process "$duck_pid" 500 set -o pipefail echo "error: client demo timed out after ${demo_timeout}s" >&2 quacktail_dump_client_failure return 124 fi - sleep 0.2 + sleep 0.1 done - wait "$session_pid" || duckdb_rc=$? + wait "$duck_pid" || duckdb_rc=$? 
set -o pipefail if [[ "$duckdb_rc" -eq 124 ]]; then From 59332523573c2f5e39e7599c4dee7a5de7d04c3e Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:33:46 +0200 Subject: [PATCH 11/25] Fix client teardown when tailscale_down is missing and stop double retries. Probe for tailscale_down before emitting client SQL, run teardown before CLIENT_DEMO_DONE, merge stderr into client.out, and verify the function at image build time when compiling from source. Co-authored-by: Cursor --- README.md | 2 + docs/README.md | 14 +- docs/usage.md | 470 +++++++++++++++++++++ examples/Dockerfile | 7 +- scripts/e2e/quacktail-compose-bootstrap.sh | 11 +- scripts/e2e/quacktail-entrypoint.sh | 33 +- 6 files changed, 507 insertions(+), 30 deletions(-) create mode 100644 docs/usage.md diff --git a/README.md b/README.md index f65d3b8..86ef4d4 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,7 @@ LOAD quackscale; -- tailscale_up, quack_uri, quack_token, ... | Doc | Contents | |-----|----------| +| [docs/usage.md](docs/usage.md) | **Use cases** — Quack, DuckLake, S3, discovery, production patterns | | [docs/PLAN.md](docs/PLAN.md) | Architecture, roadmap, risks | | [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md) | **Tailscale** — auth keys, browser login, `TS_AUTHKEY` | | [docs/HEADSCALE.md](docs/HEADSCALE.md) | **Headscale** — self-hosted control plane (`control_url`, preauth keys) | @@ -174,6 +175,7 @@ Load with `LOAD quackscale;`. 
Use **`CALL`** for table functions (same style as | `CALL tailscale_login_status()` | Poll login (`starting` / `needs_login` / `up` / `error`) | | `CALL tailscale_status()` | libtailscale linked?, running, hostname, tailnet IPs | | `CALL tailscale_quack_forward(host => 'peer', port => 9494)` | Localhost TCP → `tailscale_dial` (preferred for Quack ATTACH; no ALL_PROXY) | +| `CALL tailscale_down()` | Stop forwarder + close tsnet (one-shot clients — required or process hangs) | | `CALL tailscale_quack_proxy()` | Legacy SOCKS + ALL_PROXY | | `CALL tailscale_proxy_status()` | Legacy SOCKS status | diff --git a/docs/README.md b/docs/README.md index c4be666..cf78406 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,12 +2,14 @@ ## Start here -1. **[../README.md](../README.md)** — build, quick start, SQL reference -2. **[AUTHENTICATION.md](AUTHENTICATION.md)** — Tailscale (`TS_AUTHKEY`, `tailscale_up`, browser login) -3. **[HEADSCALE.md](HEADSCALE.md)** — self-hosted [Headscale](https://github.com/juanfont/headscale) (`control_url`, preauth keys) -4. **[QUACK_AUTH.md](QUACK_AUTH.md)** — Quack tokens for QuackTail (`QUACK_TAILNET_TOKEN`, shared secrets, auth macros) -5. **[PLAN.md](PLAN.md)** — architecture, API roadmap, risks -6. **[../examples/README.md](../examples/README.md)** — Docker Compose two-node Headscale demo +1. **[usage.md](usage.md)** — **use cases & solution patterns** (Quack, DuckLake, S3, discovery) +2. **[../README.md](../README.md)** — build, quick start, SQL reference +3. **[AUTHENTICATION.md](AUTHENTICATION.md)** — Tailscale (`TS_AUTHKEY`, `tailscale_up`, browser login) +4. **[HEADSCALE.md](HEADSCALE.md)** — self-hosted [Headscale](https://github.com/juanfont/headscale) (`control_url`, preauth keys) +5. **[QUACK_AUTH.md](QUACK_AUTH.md)** — Quack tokens for QuackTail (`QUACK_TAILNET_TOKEN`, shared secrets, auth macros) +6. **[DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md)** — DuckLake over the tailnet (compose demo, patterns B/C) +7. 
**[PLAN.md](PLAN.md)** — architecture, API roadmap, risks +8. **[../examples/README.md](../examples/README.md)** — Docker Compose two-node Headscale demo ## QuackTail authentication at a glance diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..baa3844 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,470 @@ +# QuackTail usage guide + +QuackTail combines three ideas: + +1. **Tailscale (or Headscale)** — private mesh network between nodes +2. **Quack** — DuckDB’s HTTP client/server protocol (`quack:` URIs) +3. **QuackScale** — joins DuckDB to the tailnet and bridges Quack across it + +Optional fourth ingredient: **DuckLake** — transactional lakehouse catalog + Parquet, which [integrates with Quack in DuckDB 1.5.3+](https://duckdb.org/2026/05/20/announcing-duckdb-153.html). + +This guide is for **designing solutions**: what works today, how the compose demo maps to real deployments, and how to grow toward S3, many lakes, and fleet discovery. + +--- + +## What you can build + +| Idea | Building blocks | Tailnet role | +|------|-----------------|--------------| +| **Shared analytics DB** | `quack_serve` + client `ATTACH` | One DuckDB process serves tables; many clients query/write over Quack | +| **Edge ingest → central DuckDB** | Quack `INSERT` from clients | Laptops/containers push rows to a central server without copying files | +| **Lakehouse catalog server** | DuckLake on server + Quack | Server owns metadata + Parquet; clients query lake tables remotely | +| **Distributed lake readers** | `ducklake:quack:` + shared `DATA_PATH` | Catalog over Quack; Parquet on S3 or a path every reader can see | +| **Hybrid** | Quack primary DB + attached DuckLake | Operational tables via `remote.*`; historical Parquet via lake SQL | + +QuackScale’s job is **not** to replace Quack or DuckLake — it makes them reachable on `100.x.x.x` / MagicDNS without exposing the public internet. 
+ +--- + +## Mental model + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ quacktail-server (long-lived) │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │ +│ │ quackscale │──►│ tsnet │──►│ tailnet :9494 │ │ +│ │ tailscale_up│ │ serve_local │ │ (MagicDNS / 100.x) │ │ +│ └─────────────┘ └──────────────┘ └───────────┬─────────────┘ │ +│ ┌─────────────┐ quack_serve(127.0.0.1:9494) │ │ +│ │ quack │◄────────────────────────────────┘ │ +│ │ DuckDB │ optional: ATTACH ducklake:… AS lake │ +│ └─────────────┘ Parquet → local disk or s3://bucket/prefix/ │ +└─────────────────────────────────────────────────────────────────┘ + ▲ + │ tailscale_dial (encrypted) + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ quacktail-client (batch job, laptop, second container) │ +│ tailscale_up → tailscale_quack_forward → quack:127.0.0.1:19494 │ +│ ATTACH / quack_query / ducklake:quack:… │ +│ CALL tailscale_down() ← one-shot clients must tear down tsnet │ +└─────────────────────────────────────────────────────────────────┘ +``` + +**Two credentials, always:** + +| Layer | Question | Typical secret | +|-------|----------|----------------| +| Tailscale | Is this node on our mesh? | `TS_AUTHKEY` / Headscale preauth key | +| Quack | May this caller run SQL? | `QUACK_TAILNET_TOKEN` + `CREATE SECRET` | + +See [AUTHENTICATION.md](AUTHENTICATION.md) and [QUACK_AUTH.md](QUACK_AUTH.md). + +--- + +## Choose a pattern + +```text +Need remote DuckDB tables (CRUD, dashboards)? + └─► Pattern A: ATTACH 'quack:…' AS remote + +Need lakehouse tables (Parquet, time travel, many files)? + ├─► Server owns all Parquet? + │ └─► Pattern B: quack_query(server, 'SELECT … FROM lake.t') + └─► Clients can read same Parquet path (disk mount, S3, GCS)? + └─► Pattern C: ATTACH 'ducklake:quack:…' AS lake (DATA_PATH 's3://…') + +Both operational + lake on one node? 
+ └─► Pattern D: Hybrid (B + A in one session — mind ordering & limits) +``` + +### Pattern comparison + +| Pattern | Client SQL | Parquet location | Best for | +|---------|------------|------------------|----------| +| **A — Quack attach** | `ATTACH 'quack:host:9494' AS remote` | Server DuckDB file / memory | Shared tables, multi-writer Quack | +| **B — quack_query lake** | `quack_query(uri, 'SELECT … FROM lake.t')` | Server-only paths | Compose demo, server-side lake | +| **C — ducklake:quack** | `ATTACH 'ducklake:quack:host' AS lake (DATA_PATH '…')` | Shared object store or mount | Fleet of readers, [DuckDB 1.5.3 pattern](https://duckdb.org/2026/05/20/announcing-duckdb-153.html) | +| **D — Hybrid** | B first, then A (separate statements) | Mixed | Apps + analytics on one tailnet node | + +**Common mistake:** `SELECT * FROM remote.lake.inventory` — plain Quack attach exposes the **primary catalog only**, not nested attached DuckLake databases. Use pattern B or C instead. + +--- + +## Use case 1 — Remote DuckDB over Quack (analytics hub) + +**Story:** A central DuckDB node serves live tables to analysts and services on the tailnet. + +### Server (long-lived) + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up( + hostname => 'analytics-hub', + state_dir => '/var/lib/quacktail/hub', + authkey => getenv('TS_AUTHKEY') -- or Headscale preauth key +); + +CREATE TABLE IF NOT EXISTS events (id INTEGER, payload VARCHAR, ts TIMESTAMP); +INSERT INTO events VALUES (1, 'hello-tailnet', now()); + +CALL quack_serve( + 'quack:127.0.0.1:9494', + allow_other_hostname => true, + token => quack_token() +); +CALL tailscale_serve_local(port => 9494); + +-- What clients can use: +FROM quack_discover(); +``` + +Keep this process running (systemd, Kubernetes, `quacktail-server` container). Do **not** call `tailscale_down()` on the server. 
+ +### Client (analyst laptop or job) + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(hostname => 'analyst-laptop', state_dir => '…', authkey => '…'); + +-- Forward tailnet Quack to loopback (required when client uses embedded tsnet) +CALL tailscale_quack_forward(host => 'analytics-hub', port => 9494, local_port => 19494); + +CREATE SECRET ( + TYPE quack, + TOKEN getenv('QUACK_TAILNET_TOKEN'), + SCOPE 'quack:127.0.0.1:19494' +); + +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); + +SELECT * FROM remote.events ORDER BY ts DESC LIMIT 10; +INSERT INTO remote.events VALUES (2, 'from-client', now()); + +-- One-shot jobs: tear down so the process exits +DETACH remote; +CALL tailscale_down(); +``` + +**Expand:** add read replicas (multiple Quack servers), token allowlists ([Quack security](https://duckdb.org/docs/current/quack/security)), TLS termination in front of Quack for non-tailnet clients. + +**Runnable demo:** [examples/README.md](../examples/README.md) + +--- + +## Use case 2 — DuckLake on a QuackTail node (server owns Parquet) + +**Story:** One node holds the DuckLake catalog and Parquet files; tailnet clients query inventory, metrics, or historical tables without copying the lake. + +### Server + +```sql +LOAD quack; +LOAD ducklake; +LOAD quackscale; + +CALL tailscale_up(hostname => 'lake-server', …); + +ATTACH 'ducklake:/data/lake/metadata/warehouse.ducklake' AS warehouse ( + DATA_PATH '/data/lake/parquet/' +); +USE warehouse; + +CREATE TABLE IF NOT EXISTS inventory (item_id INTEGER, quantity INTEGER); +INSERT INTO inventory VALUES (101, 50), (102, 120); + +CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token()); +CALL tailscale_serve_local(port => 9494); +``` + +Persist `/data/lake/` on disk, EBS, or sync to object storage out-of-band. + +### Client — query via `quack_query` (verified in compose) + +Run lake SQL **on the server** through stateless Quack HTTP. 
Use **`quack_query` before `ATTACH remote`** in the same session. + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL tailscale_quack_forward(host => 'lake-server', port => 9494, local_port => 19494); + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +-- Lake query (executes on server where `warehouse` is attached) +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT * FROM warehouse.inventory ORDER BY item_id', + token => '…', + disable_ssl => true +); + +-- Optional: also use Quack attach for non-lake tables +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); +SELECT * FROM remote.some_operational_table; + +DETACH remote; +CALL tailscale_down(); +``` + +**Expand to S3 on the server:** attach DuckLake with `DATA_PATH 's3://my-bucket/lake/parquet/'` ([DuckLake remote data path](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path)). Clients keep using `quack_query` — they never need direct S3 credentials if the server holds them. + +**Runnable demo:** [examples/ducklake/README.md](../examples/ducklake/README.md) (branch `ducklake`) + +--- + +## Use case 3 — DuckLake over Quack with shared Parquet (client-side `DATA_PATH`) + +**Story:** Catalog metadata flows over Quack; every reader resolves Parquet from a **shared** location (S3, GCS, NFS, read-only volume). 
+ +Official [DuckDB 1.5.3 example](https://duckdb.org/2026/05/20/announcing-duckdb-153.html): + +### Server (catalog via Quack only) + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(hostname => 'lake-catalog', …); +CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token()); +CALL tailscale_serve_local(port => 9494); +``` + +### Client + +```sql +LOAD ducklake; +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL tailscale_quack_forward(host => 'lake-catalog', port => 9494, local_port => 19494); + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +ATTACH 'ducklake:quack:127.0.0.1:19494' AS warehouse ( + DATA_PATH 's3://my-bucket/lake/parquet/' +); +-- Or: DATA_PATH '/mnt/shared/lake/parquet/' when NFS/K8s volume is mounted identically + +SELECT * FROM warehouse.inventory; +INSERT INTO warehouse.inventory VALUES (103, 7); + +CALL tailscale_down(); +``` + +**When to choose this:** many readers, object-store Parquet, clients need local DuckLake semantics (attach, `USE`, DML) — not just single-statement `quack_query`. + +**Requirements:** + +- `DATA_PATH` must be reachable from **each client** (same bucket prefix or shared mount). +- Configure DuckDB `httpfs` / cloud secrets on clients for `s3://` paths. +- Align with [DuckLake attach options](https://duckdb.org/docs/stable/duckdb/attach) (`OVERRIDE_DATA_PATH`, etc.) when migrating paths. + +--- + +## Use case 4 — Hybrid hub (Quack tables + DuckLake) + +**Story:** Same tailnet node serves live operational tables **and** a lake catalog. + +```text +analytics-hub +├── primary catalog → remote.orders, remote.users (Pattern A) +└── ATTACH ducklake AS warehouse → lake SQL via Pattern B or C +``` + +**Session discipline:** + +1. Lake reads/writes: `quack_query(…, '… lake …')` **or** `ducklake:quack:` attach +2. Operational tables: `ATTACH 'quack:…' AS remote` +3. 
**One remote Quack read/write per SQL statement** ([QUACK_STREAMING.md](QUACK_STREAMING.md)) +4. End one-shot clients with `DETACH remote; CALL tailscale_down();` + +--- + +## Connection recipe (tailnet client) + +This sequence is what the compose e2e proves end-to-end: + +```sql +LOAD quackscale; +CALL tailscale_up(hostname => '…', control_url => '…', authkey => '…', …); + +CALL tailscale_quack_forward(host => 'peer-hostname', port => 9494, local_port => 19494); +CALL tailscale_ping(host => 'peer-hostname', port => 9494); -- optional readiness + +LOAD quack; +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +FROM quack_query('quack:127.0.0.1:19494', 'SELECT 1 AS probe', token => '…', disable_ssl => true); + +-- Pattern B/C/D statements here … + +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -- Pattern A +-- … + +DETACH remote; +CALL tailscale_down(); +``` + +**Why `tailscale_quack_forward`?** Quack clients use normal HTTP/TCP. Embedded tsnet does not automatically route kernel TCP to tailnet IPs. The forwarder listens on `127.0.0.1:19494` and dials the peer via `tailscale_dial`. + +**Why `tailscale_down`?** `tailscale_up` and the forwarder start background threads. One-shot DuckDB processes **hang after SQL finishes** unless you shut tsnet down. 
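The recipe above can be generated per peer instead of being edited by hand. The following is a minimal shell sketch under stated assumptions, not code from this repository: `emit_client_session_sql` is a hypothetical helper name, the default ports are the ones used throughout this guide, and the `tailscale_up` call is left as a comment because its arguments depend on your control plane.

```shell
# Hypothetical helper (not in the repo): print a one-shot client session
# for a single peer. tailscale_up stays a comment on purpose, since its
# arguments (hostname, control_url, authkey, ...) are deployment-specific.
emit_client_session_sql() {
    local peer_host="${1:?peer hostname required}"
    local remote_port="${2:-9494}"
    local local_port="${3:-19494}"
    local quack_uri="quack:127.0.0.1:${local_port}"

    cat <<SQL
LOAD quackscale;
-- CALL tailscale_up(...) goes here with your own arguments.
CALL tailscale_quack_forward(host => '${peer_host}', port => ${remote_port}, local_port => ${local_port});
LOAD quack;
CREATE SECRET (TYPE quack, TOKEN getenv('QUACK_TAILNET_TOKEN'), SCOPE '${quack_uri}');
ATTACH '${quack_uri}' AS remote (TYPE quack);
-- Pattern A statements go here, one remote read or write per statement.
DETACH remote;
CALL tailscale_down();
SQL
}

# Example: session for the hub used in use case 1, with default ports.
emit_client_session_sql analytics-hub
```

The output could be written to a file and passed to `duckdb -batch -f`, much as the compose entrypoint does with its generated session file. The closing `DETACH remote; CALL tailscale_down();` pair is what lets a one-shot process exit.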
+ +--- + +## Finding peers on the tailnet + +| Method | Scope | Status | +|--------|--------|--------| +| **`CALL tailscale_quack_forward(…)`** | Returns `quack_uri` for a **known** host | **Use today** | +| **`FROM quack_discover()`** on **this** node | Lists URIs this node would advertise | **Use today** (server-side) | +| **Config / service registry** | Helm values, Consul, env vars | **Use today** (operations) | +| **`quack_query(…, 'FROM quack_discover()')`** | Remote discover via Quack | **Avoid** — can deadlock on server | +| **`ducklake_discover()`** | Enriched discovery (lake + Quack) | **Planned** ([DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md)) | + +**Practical fleet pattern today:** + +1. Deploy lake/analytics nodes with stable Headscale/Tailscale hostnames (`analytics-hub`, `lake-server`). +2. Document `quack_uri` from server bootstrap (`quack_discover()` in server init logs). +3. Clients use MagicDNS names in `tailscale_quack_forward(host => 'analytics-hub', …)`. + +**Multiple lakes on one server:** attach each with a distinct alias: + +```sql +ATTACH 'ducklake:/data/sales.ducklake' AS sales (DATA_PATH 's3://bucket/sales/'); +ATTACH 'ducklake:/data/support.ducklake' AS support (DATA_PATH 's3://bucket/support/'); +``` + +Clients query with fully qualified names: + +```sql +FROM quack_query(uri, 'SELECT * FROM sales.orders LIMIT 100', …); +FROM quack_query(uri, 'SELECT count(*) FROM support.tickets', …); +``` + +Each call is one statement — compatible with Quack streaming limits. 
+ +**Multiple Quack servers:** one forwarder port per peer (`19494`, `19495`, …) or sequential client sessions: + +```sql +CALL tailscale_quack_forward(host => 'hub-a', port => 9494, local_port => 19494); +-- work with hub-a … +CALL tailscale_down(); + +CALL tailscale_up(…); -- or reuse session if your app keeps tsnet up +CALL tailscale_quack_forward(host => 'hub-b', port => 9494, local_port => 19494); +``` + +--- + +## Expanding toward production + +### Object storage (S3 / GCS / Azure) + +| Role | Approach | +|------|----------| +| **Server owns lake** | `DATA_PATH 's3://bucket/prefix/'` in server `ATTACH ducklake:…`; clients use Pattern B | +| **Readers with credentials** | Pattern C — each client `ATTACH 'ducklake:quack:…' (DATA_PATH 's3://…')` | +| **Inline small files** | DuckLake [data inlining](https://duckdb.org/docs/stable/duckdb/ducklake) — future Quack+DuckLake perf wins on tailnet | + +Load `httpfs` / cloud extensions and set secrets on whichever node reads/writes `s3://`. + +### Persistence & lifecycle + +| Deployment | `state_dir` | `tailscale_down` | +|------------|-------------|------------------| +| Long-lived server | Persistent volume | **Never** on steady state | +| Cron / CI job | Ephemeral or persistent | **Always** at end | +| Compose client profile | `/tmp/client-tailscale` | **Always** (see compose bootstrap) | + +DuckLake metadata: file (`*.ducklake`), Postgres, or DuckDB file — see [DuckLake catalog options](https://duckdb.org/docs/stable/duckdb/attach). 
+ +### Security hardening + +- Rotate `QUACK_TAILNET_TOKEN`; use [multi-token tables](https://duckdb.org/docs/current/quack/security#example-multi-token-table) +- Prefer Headscale ACLs ([examples compose policy](../examples/docker-compose.yml)) +- Do not commit auth keys; use K8s secrets / systemd `EnvironmentFile` +- Consider TLS in front of Quack for non-tailnet callers (QuackScale handles tailnet encryption only) + +### Observability + +- Server: `CALL tailscale_status()`, Quack logs, `/work/server.log` in compose +- Client: `/work/client.out`, `/work/client-tsnet.log` +- Readiness: `CALL tailscale_ping(host => 'peer', port => 9494)` before heavy queries + +--- + +## Limitations & workarounds + +| Issue | Workaround | +|-------|------------| +| `remote.lake.table` does not exist | Use `quack_query` or `ducklake:quack:` (patterns B/C) | +| Multiple Quack scans in one SQL | Split statements; see [QUACK_STREAMING.md](QUACK_STREAMING.md) | +| `quack_query` + `ATTACH remote` stalls | Run lake `quack_query` **before** attach; separate statements | +| Client hangs after success | `CALL tailscale_down()` (and `DETACH remote`) | +| `quack_query(…, quack_discover())` hangs | Discover locally or via known hostname — not via remote quack_query | +| Kernel TCP to `100.x:9494` fails from tsnet client | Use `tailscale_quack_forward` | + +--- + +## Runnable demos + +| Demo | Command | Proves | +|------|---------|--------| +| **Quack two-node cluster** | [examples/README.md](../examples/README.md) | tailnet + forward + Quack ATTACH | +| **DuckLake + Quack** | [examples/ducklake/README.md](../examples/ducklake/README.md) | Pattern B lake queries over tailnet | +| **Host DuckDB → compose stack** | `scripts/local_remote_headscale_test.sh` | Laptop joins same Headscale | +| **Vanilla tailnet probe** | `docker compose --profile debug run tailscale-probe` | Network vs DuckDB isolation | + +Quick start: + +```bash +cd examples +docker compose build quacktail-server quacktail-client 
+docker compose up -d headscale quacktail-server +docker compose --profile test run --rm quacktail-client # Quack + DuckLake on ducklake branch +``` + +--- + +## Sketch: multi-lake SaaS on one tailnet (future-friendly) + +```text + ┌─ sales.ducklake ── s3://tenant-a/sales/ +lake-server ────────┼─ metrics.ducklake ─ s3://tenant-a/metrics/ + quack_serve └─ archive.ducklake ─ s3://tenant-a/archive/ + ▲ + │ quack_query / ducklake:quack: + │ + ┌────┴────┬────────────┐ + │ │ │ + BI tool ETL job notebook + (Pattern C) (B) (B + A) +``` + +1. **ETL** (batch): `quack_query` to run server-side `COPY` / `INSERT` into lake tables. +2. **BI** (interactive): `ducklake:quack:` + `DATA_PATH` on S3 with read-only IAM. +3. **Ops** (live): Quack `ATTACH` to `remote.*` staging tables before lake merge. + +QuackScale roadmap for richer discovery: [PLAN.md](PLAN.md), [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md). + +--- + +## Further reading + +| Doc | Topic | +|-----|--------| +| [README.md](../README.md) | Build, SQL reference | +| [AUTHENTICATION.md](AUTHENTICATION.md) | Tailscale keys, browser login | +| [HEADSCALE.md](HEADSCALE.md) | Self-hosted control plane | +| [QUACK_AUTH.md](QUACK_AUTH.md) | Shared Quack tokens | +| [QUACK_STREAMING.md](QUACK_STREAMING.md) | One remote op per statement | +| [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md) | Lake-specific tailnet notes | +| [Quack overview](https://duckdb.org/docs/current/quack/overview) | Upstream Quack protocol | +| [DuckLake docs](https://duckdb.org/docs/stable/duckdb/ducklake) | Catalog, Parquet, attach | diff --git a/examples/Dockerfile b/examples/Dockerfile index 36fd618..83d96e9 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -75,7 +75,12 @@ ENV QUACK_PORT=9494 RUN mkdir -p /duckdb_extensions \ && if [ -d /opt/quacktail-build/quackscale-ext ]; then \ - cp -a /opt/quacktail-build/quackscale-ext/. /duckdb_extensions/; \ + cp -a /opt/quacktail-build/quackscale-ext/. 
/duckdb_extensions/ \ + && duckdb :memory: -batch -csv -noheader -c \ + "SET extension_directory='/duckdb_extensions'; LOAD quackscale; \ + SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='tailscale_down';" \ + | grep -qx 1 \ + || { echo "error: built quackscale missing tailscale_down (rebuild with --no-cache)" >&2; exit 1; }; \ fi \ && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core; LOAD quack; SELECT 1;" \ || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core_nightly; LOAD quack; SELECT 1;" \ diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index efc1aa2..ef2dc01 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -170,6 +170,7 @@ write_client_session_sql() { local attach_uri="${2:?attach uri required}" local ping_sql="" local forward_sql="" + local teardown_sql="" local lake_select="" local lake_passed_sql="" local lake_discover_sql="" @@ -181,6 +182,9 @@ write_client_session_sql() { elif duckdb_has_quackscale_function tailscale_quack_proxy; then forward_sql="CALL tailscale_quack_proxy();" fi + if duckdb_has_quackscale_function tailscale_down; then + teardown_sql="CALL tailscale_down();" + fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then lake_discover_sql="SELECT 'DISCOVERED' AS status, '${attach_uri}' AS quack_uri, '${SERVER_HOST}' AS server_host;" lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" @@ -233,9 +237,9 @@ FROM remote.e2e_payload; DETACH remote; -SELECT 'CLIENT_DEMO_DONE' AS status; +${teardown_sql} -CALL tailscale_down(); +SELECT 'CLIENT_DEMO_DONE' AS status; SQL if grep -q '\\n' "$WORK/client_session.sql" 2>/dev/null; then echo "error: generated client_session.sql contains literal \\n" >&2 @@ -345,7 +349,8 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then 
|| { [[ -f "$WORK/client_session.sql" ]] && grep -q 'ON CONFLICT' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'tailscale_quack_forward' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q '\\n' "$WORK/client_session.sql"; } \ - || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CALL tailscale_down' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'CALL tailscale_down' "$WORK/client_session.sql" \ + && ! duckdb_has_quackscale_function tailscale_down; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 5c9bca8..fcaeadc 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -240,24 +240,22 @@ run_duckdb_client_session() { : >"$tsnet_log" : >"$out" - # Live tee + monitor: after CLIENT_DEMO_DONE, allow tailscale_down then SIGTERM/KILL if needed. + # Live tee + monitor: CLIENT_DEMO_DONE is last (after optional tailscale_down); then SIGTERM/KILL. set +o pipefail if [[ "$QUIET" == "1" ]]; then "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - > >(tee "$out") 2>>"$tsnet_log" & + 2>&1 | tee "$out" "$tsnet_log" >/dev/null & else "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | tee "$out" & + 2>&1 | tee "$out" "$tsnet_log" & fi duck_pid=$! 
deadline=$((SECONDS + demo_timeout + 5)) while kill -0 "$duck_pid" 2>/dev/null; do if quacktail_client_session_succeeded "$out"; then - # CLIENT_DEMO_DONE is printed before CALL tailscale_down(); allow it to run. - sleep 1.5 quacktail_stop_process "$duck_pid" 500 set -o pipefail return 0 @@ -341,9 +339,16 @@ run_client() { quacktail_dump_client_failure exit 1 fi - if [[ "$duckdb_rc" -eq 0 ]] && grep -q "CLIENT_DEMO_DONE" "$out" 2>/dev/null; then + if quacktail_client_session_succeeded "$out"; then + duckdb_rc=0 break fi + if grep -q "PASSED" "$out" 2>/dev/null \ + && { [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" != "1" ]] || grep -q "LAKE_PASSED" "$out" 2>/dev/null; }; then + echo "error: demo passed but CLIENT_DEMO_DONE missing (teardown or exit failed)" >&2 + quacktail_dump_client_failure + exit 1 + fi if (( attempt < max_attempts )); then [[ "$QUIET" == "1" ]] && echo "→ retry ${attempt}/${max_attempts} ..." quacktail_dump_client_failure @@ -351,20 +356,8 @@ run_client() { fi done - if [[ "$duckdb_rc" -ne 0 ]]; then - echo "error: client demo failed (exit ${duckdb_rc})" >&2 - quacktail_dump_client_failure - exit 1 - fi - - if ! grep -q "PASSED" "$out" 2>/dev/null; then - echo "error: expected PASSED row missing after ${max_attempts} attempts" >&2 - quacktail_dump_client_failure - exit 1 - fi - - if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]] && ! grep -q "LAKE_PASSED" "$out" 2>/dev/null; then - echo "error: expected LAKE_PASSED row missing (DuckLake inventory query failed)" >&2 + if ! quacktail_client_session_succeeded "$out"; then + echo "error: client demo failed after ${max_attempts} attempt(s) (exit ${duckdb_rc})" >&2 quacktail_dump_client_failure exit 1 fi From d7c14969b0d1bfd1b9c212fd9c0f833d6172fa8b Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 08:39:46 +0200 Subject: [PATCH 12/25] Restore visible client demo output in quiet mode. 
Write DuckDB results to client.out during the background session, then print the filtered transcript before the success banner (same as before). Co-authored-by: Cursor --- scripts/e2e/quacktail-entrypoint.sh | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index fcaeadc..e71ef78 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -226,6 +226,12 @@ quacktail_stop_process() { wait "$pid" 2>/dev/null || true } +quacktail_show_client_demo_output() { + local out="${1:-${WORK}/client.out}" + [[ -s "$out" ]] || return 0 + quacktail_filter_demo_stream <"$out" +} + run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" @@ -240,16 +246,16 @@ run_duckdb_client_session() { : >"$tsnet_log" : >"$out" - # Live tee + monitor: CLIENT_DEMO_DONE is last (after optional tailscale_down); then SIGTERM/KILL. + # Background duckdb → client.out; monitor file for CLIENT_DEMO_DONE then SIGTERM/KILL. set +o pipefail if [[ "$QUIET" == "1" ]]; then "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | tee "$out" "$tsnet_log" >/dev/null & + >"$out" 2>&1 & else "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ -cmd "$ext_cmd" -f "$session_sql" \ - 2>&1 | tee "$out" "$tsnet_log" & + 2>&1 | tee "$out" & fi duck_pid=$! deadline=$((SECONDS + demo_timeout + 5)) @@ -363,6 +369,8 @@ run_client() { fi if [[ "$QUIET" == "1" ]]; then + quacktail_show_client_demo_output "$out" + echo "" if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]]; then echo "✓ Demo passed — QuackTail cluster + DuckLake over tailnet" else From 58275be9771ec2e03c2330d1f589d566082f29a7 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 09:02:56 +0200 Subject: [PATCH 13/25] Add attach_ducklake for transparent remote DuckLake reads. 
Discover server-side lake tables via quack_query and create local views so clients query lake.* without hand-written wrappers or client DATA_PATH. Compose bootstrap prefers attach_ducklake when the function is available. Co-authored-by: Cursor --- CMakeLists.txt | 2 +- README.md | 1 + docs/DUCKLAKE_REMOTE_ATTACH.md | 116 +++++++++++++ docs/README.md | 5 +- docs/usage.md | 3 +- scripts/e2e/quacktail-compose-bootstrap.sh | 15 +- src/attach_ducklake.cpp | 180 +++++++++++++++++++++ src/include/attach_ducklake.hpp | 9 ++ src/quackscale_extension.cpp | 3 + test/sql/quackscale.test | 5 + 10 files changed, 333 insertions(+), 6 deletions(-) create mode 100644 docs/DUCKLAKE_REMOTE_ATTACH.md create mode 100644 src/attach_ducklake.cpp create mode 100644 src/include/attach_ducklake.hpp diff --git a/CMakeLists.txt b/CMakeLists.txt index 8de0b79..cadc899 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -14,7 +14,7 @@ set(CMAKE_CXX_STANDARD_REQUIRED ON) include_directories(src/include) -set(EXTENSION_SOURCES src/quackscale_extension.cpp src/tailscale_bridge.cpp src/tailscale_forwarder.cpp src/tailscale_log_capture.cpp) +set(EXTENSION_SOURCES src/quackscale_extension.cpp src/attach_ducklake.cpp src/tailscale_bridge.cpp src/tailscale_forwarder.cpp src/tailscale_log_capture.cpp) if(QUACKSCALE_WITH_TAILSCALE) include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/Libtailscale.cmake) diff --git a/README.md b/README.md index 86ef4d4..363f20b 100644 --- a/README.md +++ b/README.md @@ -176,6 +176,7 @@ Load with `LOAD quackscale;`. 
Use **`CALL`** for table functions (same style as | `CALL tailscale_status()` | libtailscale linked?, running, hostname, tailnet IPs | | `CALL tailscale_quack_forward(host => 'peer', port => 9494)` | Localhost TCP → `tailscale_dial` (preferred for Quack ATTACH; no ALL_PROXY) | | `CALL tailscale_down()` | Stop forwarder + close tsnet (one-shot clients — required or process hangs) | +| `CALL attach_ducklake(uri, …)` | Create local views over a remote DuckLake catalog (server-owned Parquet) — see [DUCKLAKE_REMOTE_ATTACH.md](docs/DUCKLAKE_REMOTE_ATTACH.md) | | `CALL tailscale_quack_proxy()` | Legacy SOCKS + ALL_PROXY | | `CALL tailscale_proxy_status()` | Legacy SOCKS status | diff --git a/docs/DUCKLAKE_REMOTE_ATTACH.md b/docs/DUCKLAKE_REMOTE_ATTACH.md new file mode 100644 index 0000000..4492695 --- /dev/null +++ b/docs/DUCKLAKE_REMOTE_ATTACH.md @@ -0,0 +1,116 @@ +# Remote DuckLake attach (server-owned Parquet) + +Goal: query **`lake.inventory`** on a tailnet client with normal SQL — no hand-written `quack_query(...)` — when the DuckLake catalog and Parquet files live **only on the server**. + +Existing patterns stay supported; this adds an optional facade. + +## Why the obvious attaches fail + +| Approach | What happens | +|----------|----------------| +| `ATTACH 'quack:…' AS remote` | Primary catalog only. **`remote.lake.*` does not exist.** | +| `ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')` | Catalog metadata over Quack; **Parquet reads use client `DATA_PATH`**. Compose demo stores files on the server volume only → hang or empty reads. | +| `quack_query(uri, 'SELECT … FROM lake.t')` | **Works** — SQL runs on server where DuckLake is attached. Verbose and easy to get wrong (quoting, session ordering). | + +Reference: [DuckDB 1.5.3 DuckLake + Quack](https://duckdb.org/2026/05/20/announcing-duckdb-153.html), [usage.md](usage.md) patterns B/C. 
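The quoting hazard noted for hand-written `quack_query` calls can be illustrated with a small shell sketch. `sql_escape` and `build_quack_query` are hypothetical names, not functions from this repository; the statement shape (optional `token`, then `disable_ssl => true`) follows the `quack_query` calls shown elsewhere in this patch. The only technique shown is doubling single quotes before embedding one SQL string inside another.

```shell
# Hypothetical helpers (not in the repo).

# Double every single quote so the value can sit inside a SQL string literal.
sql_escape() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# Wrap remote SQL in a quack_query() call. The token argument is optional.
build_quack_query() {
    local quack_uri="${1:?quack uri required}"
    local remote_sql="${2:?remote sql required}"
    local token="${3:-}"
    local stmt
    stmt="FROM quack_query('$(sql_escape "$quack_uri")', '$(sql_escape "$remote_sql")'"
    if [ -n "$token" ]; then
        stmt="${stmt}, token => '$(sql_escape "$token")'"
    fi
    printf '%s, disable_ssl => true);\n' "$stmt"
}

# The inner literal 'LAKE_PASSED' is emitted as ''LAKE_PASSED''.
build_quack_query 'quack:127.0.0.1:19494' \
    "SELECT 'LAKE_PASSED' AS status FROM lake.inventory"
```

Without the escaping step, the inner `'LAKE_PASSED'` would terminate the outer string literal early and the statement would fail to parse on the client before reaching the server.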
+ +## Design tiers + +### Tier 1 — `quack_query` (today, unchanged) + +```sql +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT * FROM lake.inventory', + token => '…', + disable_ssl => true +); +``` + +### Tier 2 — `CALL attach_ducklake(...)` (**implemented**) + +One-time setup per session: discover remote tables via `quack_query`, create **local views** that delegate to the server. + +```sql +LOAD quack; +LOAD quackscale; + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +CALL attach_ducklake( + 'quack:127.0.0.1:19494', + remote_catalog => 'lake', + alias => 'lake', + token => '…', + disable_ssl => true +); +-- → local_view | remote_table | status +-- lake.inventory | lake.inventory | created + +SELECT * FROM lake.inventory ORDER BY item_id; +SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM lake.inventory; +``` + +**How it works** + +1. `quack_query` → `duckdb_tables()` on the **server** for `database_name = remote_catalog` +2. For each table: `CREATE OR REPLACE VIEW alias.table AS FROM quack_query(..., 'SELECT * FROM lake.table', ...)` +3. 
Client reads look like normal SQL; execution still happens on the server (server reads its Parquet) + +**Limits (Tier 2)** + +- No predicate/column pushdown (each scan is `SELECT *` on the server unless you add filters manually) +- Inserts/updates/deletes not supported through views (read path only for now) +- Re-run `CALL attach_ducklake` after server schema changes to refresh views +- Still obeys [Quack streaming-scan rules](QUACK_STREAMING.md) for attached `quack:` catalogs in the **same** statement; view scans use `quack_query`, not Quack attach scans + +### Tier 3 — `ATTACH … TYPE quacktail_lake` (planned) + +Register a **StorageExtension** in quackscale so attach syntax is native: + +```sql +ATTACH 'quacktail-lake:quack:127.0.0.1:19494/lake' AS lake ( + TYPE quacktail_lake, + TOKEN '…', + DISABLE_SSL true +); +SELECT * FROM lake.inventory; +``` + +Requires a custom `Catalog` + table scan operator that issues remote reads (same transport as Tier 2, better planner integration and room for pushdown later). Does not need client `DATA_PATH`. + +**Not implemented yet** — Tier 2 unblocks compose and docs without DuckDB catalog internals. + +## Comparison + +| | `quack_query` | Tier 2 views | `ducklake:quack:` | Tier 3 attach | +|--|---------------|--------------|-------------------|---------------| +| Server-owned Parquet | Yes | Yes | No (needs shared path) | Yes | +| SQL ergonomics | Poor | Good | Best (when paths align) | Best | +| Pushdown | N/A | No | Yes (local Parquet) | Possible | +| DML | Via wrapped SQL | Read-only | Full DuckLake | Planned | +| Changes quack/ducklake | No | No | No | No | + +## Compose demo + +When the built quackscale includes `attach_ducklake`, bootstrap generates: + +```sql +CALL attach_ducklake('quack:127.0.0.1:19494', …); +SELECT * FROM lake.inventory …; +``` + +Older images fall back to `quack_query` automatically. + +## Session order (unchanged) + +1. `tailscale_up` → `tailscale_quack_forward` +2. `LOAD quack` + secret +3. 
**Lake reads** (Tier 2 or `quack_query`) **before** `ATTACH quack AS remote` if mixing with e2e attach in one session +4. `DETACH remote` + `tailscale_down` for one-shot clients + +## Related + +- [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md) — tailnet-specific notes +- [usage.md](usage.md) — patterns A–D +- `ducklake_discover()` — planned enriched discovery (hostname + lake catalogs) diff --git a/docs/README.md b/docs/README.md index cf78406..3ea18f3 100644 --- a/docs/README.md +++ b/docs/README.md @@ -8,8 +8,9 @@ 4. **[HEADSCALE.md](HEADSCALE.md)** — self-hosted [Headscale](https://github.com/juanfont/headscale) (`control_url`, preauth keys) 5. **[QUACK_AUTH.md](QUACK_AUTH.md)** — Quack tokens for QuackTail (`QUACK_TAILNET_TOKEN`, shared secrets, auth macros) 6. **[DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md)** — DuckLake over the tailnet (compose demo, patterns B/C) -7. **[PLAN.md](PLAN.md)** — architecture, API roadmap, risks -8. **[../examples/README.md](../examples/README.md)** — Docker Compose two-node Headscale demo +7. **[DUCKLAKE_REMOTE_ATTACH.md](DUCKLAKE_REMOTE_ATTACH.md)** — transparent remote lake reads (`attach_ducklake`) +8. **[PLAN.md](PLAN.md)** — architecture, API roadmap, risks +9. **[../examples/README.md](../examples/README.md)** — Docker Compose two-node Headscale demo ## QuackTail authentication at a glance diff --git a/docs/usage.md b/docs/usage.md index baa3844..ad76456 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -83,7 +83,8 @@ Both operational + lake on one node? 
| Pattern | Client SQL | Parquet location | Best for | |---------|------------|------------------|----------| | **A — Quack attach** | `ATTACH 'quack:host:9494' AS remote` | Server DuckDB file / memory | Shared tables, multi-writer Quack | -| **B — quack_query lake** | `quack_query(uri, 'SELECT … FROM lake.t')` | Server-only paths | Compose demo, server-side lake | +| **B — quack_query lake** | `quack_query(uri, 'SELECT … FROM lake.t')` | Server-only paths | Compose demo fallback | +| **B+ — remote lake views** | `CALL attach_ducklake(...)` then `SELECT … FROM lake.t` | Server-only paths | **Preferred** when quackscale ≥ ducklake branch ([DUCKLAKE_REMOTE_ATTACH.md](DUCKLAKE_REMOTE_ATTACH.md)) | | **C — ducklake:quack** | `ATTACH 'ducklake:quack:host' AS lake (DATA_PATH '…')` | Shared object store or mount | Fleet of readers, [DuckDB 1.5.3 pattern](https://duckdb.org/2026/05/20/announcing-duckdb-153.html) | | **D — Hybrid** | B first, then A (separate statements) | Mixed | Apps + analytics on one tailnet node | diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index ef2dc01..da61e11 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -171,6 +171,7 @@ write_client_session_sql() { local ping_sql="" local forward_sql="" local teardown_sql="" + local lake_attach_sql="" local lake_select="" local lake_passed_sql="" local lake_discover_sql="" @@ -187,8 +188,14 @@ write_client_session_sql() { fi if [[ "$ENABLE_DUCKLAKE" == "1" ]]; then lake_discover_sql="SELECT 'DISCOVERED' AS status, '${attach_uri}' AS quack_uri, '${SERVER_HOST}' AS server_host;" - lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" - lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" + if duckdb_has_quackscale_function 
attach_ducklake; then + lake_attach_sql="CALL attach_ducklake('${attach_uri}', remote_catalog => '${LAKE_NAME}', alias => '${LAKE_NAME}', token => '${QUACK_TOKEN}', disable_ssl => true);" + lake_select="SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5;" + lake_passed_sql="SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory;" + else + lake_select="$(compose_sql_quack_query "$attach_uri" "SELECT * FROM ${LAKE_NAME}.inventory ORDER BY item_id LIMIT 5")" + lake_passed_sql="$(compose_sql_quack_query "$attach_uri" "SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM ${LAKE_NAME}.inventory")" + fi fi cat >"$WORK/client_session.sql" < + +namespace duckdb { + +namespace { + +static constexpr const char *kIdentPattern = "^[A-Za-z_][A-Za-z0-9_]*$"; + +static void ValidateIdentifier(const string &name, const char *label) { + std::regex re(kIdentPattern); + if (!std::regex_match(name, re)) { + throw InvalidInputException("%s must match %s (got '%s')", label, kIdentPattern, name); + } +} + +static string EscapeSqlString(const string &value) { + return StringUtil::Replace(value, "'", "''"); +} + +static string BuildQuackQueryFromClause(const string &quack_uri, const string &remote_sql, const string &token, + bool disable_ssl) { + string sql = "FROM quack_query('" + EscapeSqlString(quack_uri) + "', '" + EscapeSqlString(remote_sql) + "'"; + if (!token.empty()) { + sql += ", token => '" + EscapeSqlString(token) + "'"; + } + if (disable_ssl) { + sql += ", disable_ssl => true"; + } + sql += ")"; + return sql; +} + +static void EnsureQuackLoaded(Connection &conn) { + auto result = conn.Query("SELECT COUNT(*) FROM duckdb_functions() WHERE function_name = 'quack_query'"); + if (result->HasError()) { + throw InvalidInputException("attach_ducklake requires LOAD quack; %s", result->GetError()); + } + auto count = result->GetValue(0, 0).GetValue(); + if (count == 0) { + throw InvalidInputException("attach_ducklake 
requires LOAD quack (quack_query not registered)"); + } +} + +static void RunStatement(Connection &conn, const string &sql) { + auto result = conn.Query(sql); + if (result->HasError()) { + throw InvalidInputException("attach_ducklake failed: %s\nStatement: %s", result->GetError(), sql); + } +} + +struct RemoteLakeAttachBindData : public TableFunctionData { + string quack_uri; + string remote_catalog; + string alias; + string token; + bool disable_ssl = true; + bool finished = false; + vector<string> created_views; +}; + +static unique_ptr<FunctionData> RemoteLakeAttachBind(ClientContext &context, TableFunctionBindInput &input, + vector<LogicalType> &return_types, vector<string> &names) { + if (input.inputs.empty() || input.inputs[0].IsNull()) { + throw InvalidInputException("attach_ducklake requires quack_uri"); + } + + auto bind = make_uniq<RemoteLakeAttachBindData>(); + bind->quack_uri = input.inputs[0].GetValue<string>(); + + auto catalog_it = input.named_parameters.find("remote_catalog"); + if (catalog_it != input.named_parameters.end()) { + bind->remote_catalog = catalog_it->second.GetValue<string>(); + } else { + bind->remote_catalog = "lake"; + } + auto alias_it = input.named_parameters.find("alias"); + if (alias_it != input.named_parameters.end()) { + bind->alias = alias_it->second.GetValue<string>(); + } else { + bind->alias = bind->remote_catalog; + } + auto token_it = input.named_parameters.find("token"); + if (token_it != input.named_parameters.end() && !token_it->second.IsNull()) { + bind->token = token_it->second.GetValue<string>(); + } + auto ssl_it = input.named_parameters.find("disable_ssl"); + if (ssl_it != input.named_parameters.end()) { + bind->disable_ssl = ssl_it->second.GetValue<bool>(); + } + + ValidateIdentifier(bind->remote_catalog, "remote_catalog"); + ValidateIdentifier(bind->alias, "alias"); + + return_types = {LogicalType::VARCHAR, LogicalType::VARCHAR, LogicalType::VARCHAR}; + names = {"local_view", "remote_table", "status"}; + return std::move(bind); +} + +static void RemoteLakeAttachFunction(ClientContext &context, TableFunctionInput
&data_p, DataChunk &output) { + auto &bind = data_p.bind_data->CastNoConst<RemoteLakeAttachBindData>(); + if (bind.finished) { + return; + } + + Connection conn(*context.db); + EnsureQuackLoaded(conn); + + RunStatement(conn, "CREATE SCHEMA IF NOT EXISTS " + bind.alias); + + const string list_sql = StringUtil::Format( + "SELECT table_name FROM duckdb_tables() WHERE database_name = '%s' ORDER BY table_name", + EscapeSqlString(bind.remote_catalog)); + const auto list_from = BuildQuackQueryFromClause(bind.quack_uri, list_sql, bind.token, bind.disable_ssl); + + auto tables = conn.Query(list_from); + if (tables->HasError()) { + throw InvalidInputException("attach_ducklake: could not list remote tables: %s", + tables->GetError()); + } + + idx_t row_count = 0; + for (idx_t row = 0; row < tables->RowCount(); row++) { + auto table_name = tables->GetValue(0, row).ToString(); + if (table_name.empty()) { + continue; + } + ValidateIdentifier(table_name, "remote table name"); + + const string remote_select = + StringUtil::Format("SELECT * FROM %s.%s", bind.remote_catalog, table_name); + const string view_sql = StringUtil::Format( + "CREATE OR REPLACE VIEW %s.%s AS %s", bind.alias, table_name, + BuildQuackQueryFromClause(bind.quack_uri, remote_select, bind.token, bind.disable_ssl)); + + RunStatement(conn, view_sql); + bind.created_views.push_back(bind.alias + "." + table_name); + row_count++; + } + + if (row_count == 0) { + throw InvalidInputException( + "attach_ducklake: no tables found in remote catalog '%s' (is DuckLake attached on the server?)", + bind.remote_catalog); + } + + output.SetCardinality(row_count); + for (idx_t row = 0; row < row_count; row++) { + const auto &view_name = bind.created_views[row]; + const auto dot = view_name.find('.'); + const string table_only = dot == string::npos ?
view_name : view_name.substr(dot + 1); + output.SetValue(0, row, Value(view_name)); + output.SetValue(1, row, Value(StringUtil::Format("%s.%s", bind.remote_catalog, table_only))); + output.SetValue(2, row, Value("created")); + } + + bind.finished = true; +} + +} // namespace + +void RegisterAttachDucklakeFunctions(ExtensionLoader &loader) { + TableFunction attach("attach_ducklake", {LogicalType::VARCHAR}, RemoteLakeAttachFunction, + RemoteLakeAttachBind); + attach.named_parameters["remote_catalog"] = LogicalType::VARCHAR; + attach.named_parameters["alias"] = LogicalType::VARCHAR; + attach.named_parameters["token"] = LogicalType::VARCHAR; + attach.named_parameters["disable_ssl"] = LogicalType::BOOLEAN; + loader.RegisterFunction(attach); +} + +} // namespace duckdb diff --git a/src/include/attach_ducklake.hpp b/src/include/attach_ducklake.hpp new file mode 100644 index 0000000..8281719 --- /dev/null +++ b/src/include/attach_ducklake.hpp @@ -0,0 +1,9 @@ +#pragma once + +#include "duckdb/main/extension/extension_loader.hpp" + +namespace duckdb { + +void RegisterAttachDucklakeFunctions(ExtensionLoader &loader); + +} // namespace duckdb diff --git a/src/quackscale_extension.cpp b/src/quackscale_extension.cpp index 4ccd72b..711c0ee 100644 --- a/src/quackscale_extension.cpp +++ b/src/quackscale_extension.cpp @@ -2,6 +2,7 @@ #include "quackscale_extension.hpp" #include "quackscale_defaults.hpp" +#include "attach_ducklake.hpp" #include "tailscale_bridge.hpp" #include "duckdb.hpp" @@ -526,6 +527,8 @@ static void LoadInternal(ExtensionLoader &loader) { loader.RegisterFunction(ScalarFunction("quack_uri", {}, LogicalType::VARCHAR, QuackscaleQuackUriFunction)); loader.RegisterFunction(ScalarFunction("quack_token", {}, LogicalType::VARCHAR, QuackTokenFunction)); + + RegisterAttachDucklakeFunctions(loader); } } // namespace diff --git a/test/sql/quackscale.test b/test/sql/quackscale.test index fad793b..5d1e775 100644 --- a/test/sql/quackscale.test +++ b/test/sql/quackscale.test @@ 
-51,3 +51,8 @@ statement error SELECT quack_token(); ---- quack_token(): set QUACK_TAILNET_TOKEN + +statement error +CALL attach_ducklake('quack:127.0.0.1:19494'); +---- +attach_ducklake requires LOAD quack From 9ec3825456431436542f43d75e4ff8e3fe6533e4 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 09:10:53 +0200 Subject: [PATCH 14/25] Fix double demo run and stale client SQL refresh. Add QUACKTAIL_ROLE=bootstrap to refresh /work without running DuckDB; detect missing attach_ducklake/tailscale_down in volume SQL; verify attach_ducklake at image build time. Co-authored-by: Cursor --- examples/Dockerfile | 6 ++-- examples/README.md | 31 ++++++++++++++------ scripts/e2e/quacktail-compose-bootstrap.sh | 19 +++++++++---- scripts/e2e/quacktail-entrypoint.sh | 33 ++++++++++++++++++++-- 4 files changed, 70 insertions(+), 19 deletions(-) diff --git a/examples/Dockerfile b/examples/Dockerfile index 83d96e9..2a061db 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -78,9 +78,9 @@ RUN mkdir -p /duckdb_extensions \ cp -a /opt/quacktail-build/quackscale-ext/. 
/duckdb_extensions/ \ && duckdb :memory: -batch -csv -noheader -c \ "SET extension_directory='/duckdb_extensions'; LOAD quackscale; \ - SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='tailscale_down';" \ - | grep -qx 1 \ - || { echo "error: built quackscale missing tailscale_down (rebuild with --no-cache)" >&2; exit 1; }; \ + SELECT COUNT(*) FROM duckdb_functions() WHERE function_name IN ('tailscale_down', 'attach_ducklake');" \ + | grep -qx '2' \ + || { echo "error: built quackscale missing tailscale_down or attach_ducklake (rebuild with --no-cache)" >&2; exit 1; }; \ fi \ && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core; LOAD quack; SELECT 1;" \ || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core_nightly; LOAD quack; SELECT 1;" \ diff --git a/examples/README.md b/examples/README.md index 06136f4..384c10c 100644 --- a/examples/README.md +++ b/examples/README.md @@ -31,13 +31,22 @@ Quack HTTP uses **kernel TCP**. Embedded tsnet does not route that traffic. `tai ```bash git pull && cd examples -docker compose build quacktail-server quacktail-client +docker compose build --no-cache quacktail-server quacktail-client docker compose up -d --force-recreate headscale quacktail-server docker compose --profile test run --rm quacktail-client ``` Use **`--force-recreate`** on the server after script or SQL changes (otherwise the old DuckDB process keeps running). +**Refresh stale `/work` SQL without running the client demo** (one container, no DuckDB session): + +```bash +docker compose run --rm -e QUACKTAIL_ROLE=bootstrap quacktail-client +docker compose --profile test run --rm quacktail-client +``` + +Do **not** use `quacktail-client true` — compose sets `QUACKTAIL_ROLE=client`, so that still runs the full demo. 
+ **Release binary instead of source build:** ```bash @@ -61,17 +70,21 @@ Expect: QuackTail cluster demo ====================== -→ join tailnet, tailscale_ping quacktail-server:9494, quack_query, ATTACH quack:127.0.0.1:19494 ... - -CALL tailscale_up(...); → running true -CALL tailscale_quack_forward(...); → active true, quack:127.0.0.1:19494 -CALL tailscale_ping(...); → reachable true -FROM quack_query(...); → probe 1 +→ join tailnet, forward, attach_ducklake, ATTACH quack:127.0.0.1:19494 ... + +CALL tailscale_up(...); → running true +CALL tailscale_quack_forward(...); → quack:127.0.0.1:19494 +CALL tailscale_ping(...); → reachable true +FROM quack_query(...); → probe 1 +CALL attach_ducklake(...); → lake.inventory view created +SELECT * FROM lake.inventory ...; +SELECT 'LAKE_PASSED' ...; ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -SELECT * FROM remote.e2e_payload LIMIT 5; SELECT 'PASSED' ...; +CALL tailscale_down(); +SELECT 'CLIENT_DEMO_DONE' ...; -✓ Demo passed — two-node QuackTail cluster is working +✓ Demo passed — QuackTail cluster + DuckLake over tailnet ``` The client runs one DuckDB session (`duckdb -batch -echo -f /work/client_session.sql`). Compose waits for `quacktail-server` **healthy** (server.log shows `quack_serve` + `tailscale_serve_local`) before starting the client. 
diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index da61e11..eedec93 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -3,6 +3,7 @@ set -euo pipefail WORK="${QUACKTAIL_WORK:-/work}" +DUCKDB_BIN="${DUCKDB_BIN:-/usr/local/bin/duckdb}" SERVER_HOST="${SERVER_HOST:-quacktail-server}" CLIENT_HOST="${CLIENT_HOST:-quacktail-client}" QUACK_PORT="${QUACK_PORT:-9494}" @@ -50,11 +51,13 @@ resolve_client_attach_uri() { duckdb_has_quackscale_function() { local fn="$1" - local ext_dir="/duckdb_extensions" - command -v duckdb >/dev/null 2>&1 \ - && duckdb :memory: -batch -csv -noheader -c \ - "SET extension_directory='${ext_dir}'; LOAD quackscale; SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ - 2>/dev/null | grep -qx '1' + local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-/duckdb_extensions}" + local count + count="$("$DUCKDB_BIN" :memory: -batch -csv -noheader -c \ + "SET extension_directory='${ext_dir}'; LOAD quackscale; \ + SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ + 2>/dev/null | tr -d '[:space:]')" + [[ "$count" == "1" ]] } compose_attach_is_local_forward() { @@ -359,11 +362,17 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && grep -q '\\n' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'CALL tailscale_down' "$WORK/client_session.sql" \ && ! duckdb_has_quackscale_function tailscale_down; } \ + || { [[ -f "$WORK/client_session.sql" ]] && duckdb_has_quackscale_function tailscale_down \ + && ! grep -q 'CALL tailscale_down' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! 
grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && grep -q 'quacktail_attach_remote_lake' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && duckdb_has_quackscale_function attach_ducklake \ && ! grep -q 'attach_ducklake' "$WORK/client_session.sql"; } \ + || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && duckdb_has_quackscale_function attach_ducklake \ + && grep -q 'FROM quack_query' "$WORK/client_session.sql" \ + && grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql" \ + && ! grep -q 'attach_ducklake' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q "${LAKE_NAME}.inventory" "$WORK/client_session.sql"; }; then refresh_client_sql "$AUTHKEY" echo "✓ client SQL ready — attach ${ATTACH_URI}" diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index e71ef78..230d92c 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -295,6 +295,34 @@ run_duckdb_client_session() { return "$duckdb_rc" } +run_bootstrap() { + if [[ ! -f "${WORK}/authkey" ]]; then + echo "error: ${WORK}/authkey missing — start headscale + quacktail-server first" >&2 + exit 1 + fi + if [[ "$QUIET" == "1" ]]; then + echo "→ refreshing /work SQL on volume (no client demo) ..." 
+ fi + COMPOSE_REFRESH_CLIENT_SQL=1 COMPOSE_REFRESH_SERVER_QUACK=1 \ + QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh + if [[ "$QUIET" == "1" ]]; then + echo "✓ bootstrap complete — run: docker compose --profile test run --rm quacktail-client" + else + echo "ok: bootstrap complete" + fi +} + +client_demo_banner() { + local session_sql="${1:?session sql}" + local attach_uri="${2:?attach uri}" + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]] \ + && grep -q 'attach_ducklake' "$session_sql" 2>/dev/null; then + echo "→ join tailnet, forward, attach_ducklake, ATTACH ${attach_uri} ..." + else + echo "→ join tailnet, tailscale_ping ${SERVER_HOST}:${PORT}, quack_query, ATTACH ${attach_uri} ..." + fi +} + run_client() { local session_sql="${WORK}/client_session.sql" local out="${WORK}/client.out" @@ -326,7 +354,7 @@ run_client() { echo "" echo "QuackTail cluster demo" echo "======================" - echo "→ join tailnet, tailscale_ping ${SERVER_HOST}:${PORT}, quack_query, ATTACH ${attach_uri} ..." + client_demo_banner "$session_sql" "$attach_uri" echo "" else echo "=== client session SQL (-f) ===" @@ -384,5 +412,6 @@ run_client() { case "$ROLE" in server) run_server ;; client) run_client ;; - *) echo "error: unknown QUACKTAIL_ROLE '$ROLE'" >&2; exit 1 ;; + bootstrap) run_bootstrap ;; + *) echo "error: unknown QUACKTAIL_ROLE '$ROLE' (use server, client, or bootstrap)" >&2; exit 1 ;; esac From b0bce06d31492197b4882684da32e7ac267d090c Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 09:20:44 +0200 Subject: [PATCH 15/25] Regenerate client SQL from image on each client run. Stop the server from writing stale client_session.sql to the shared volume; always refresh client SQL on the client container; print quackscale caps so BUILD_FROM_SOURCE=0 / v1.0.2 release gaps are obvious. 
Co-authored-by: Cursor --- examples/README.md | 5 +- examples/docker-compose.yml | 2 + scripts/e2e/quacktail-compose-bootstrap.sh | 95 ++++++++++++---------- scripts/e2e/quacktail-entrypoint.sh | 18 +++- scripts/lib/quacktail_ext.sh | 19 +++++ 5 files changed, 92 insertions(+), 47 deletions(-) diff --git a/examples/README.md b/examples/README.md index 384c10c..94a22b4 100644 --- a/examples/README.md +++ b/examples/README.md @@ -29,6 +29,8 @@ Quack HTTP uses **kernel TCP**. Embedded tsnet does not route that traffic. `tai ## Run the demo +**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default) for `attach_ducklake` and `tailscale_down`. Release tag `v1.0.2` does not include them. + ```bash git pull && cd examples docker compose build --no-cache quacktail-server quacktail-client @@ -66,7 +68,8 @@ Expect: ```text → waiting for quacktail-server on tailnet ... ✓ quacktail-server on tailnet -✓ client SQL ready — attach quack:127.0.0.1:19494 +✓ client SQL ready — attach quack:127.0.0.1:19494 (lake: attach_ducklake) +→ quackscale: lake=attach_ducklake tailscale_down=yes QuackTail cluster demo ====================== diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 8cfb195..09e61fa 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -145,6 +145,7 @@ services: QUACKTAIL_WORK: /work QUACKTAIL_ROLE: server QUACKTAIL_AUTO_BOOTSTRAP: "1" + QUACKTAIL_MANAGE_CLIENT_SQL: "0" HEADSCALE_CONFIG: /etc/headscale/config.yaml healthcheck: test: [CMD, /usr/local/bin/quacktail-server-healthcheck.sh] @@ -167,6 +168,7 @@ services: <<: *env QUACKTAIL_WORK: /work QUACKTAIL_ROLE: client + QUACKTAIL_MANAGE_CLIENT_SQL: "1" QUACKTAIL_WAIT_SERVER: quacktail-server HEADSCALE_CONFIG: /etc/headscale/config.yaml restart: "no" diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index eedec93..2ad8cc8 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ 
b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -4,6 +4,14 @@ set -euo pipefail WORK="${QUACKTAIL_WORK:-/work}" DUCKDB_BIN="${DUCKDB_BIN:-/usr/local/bin/duckdb}" + +# shellcheck source=/dev/null +source /usr/local/lib/quacktail_ext.sh + +duckdb_has_quackscale_function() { + quacktail_has_quackscale_function "$@" +} + SERVER_HOST="${SERVER_HOST:-quacktail-server}" CLIENT_HOST="${CLIENT_HOST:-quacktail-client}" QUACK_PORT="${QUACK_PORT:-9494}" @@ -29,6 +37,16 @@ mkdir -p "$WORK" CLIENT_STATE_DIR="/tmp/client-tailscale" ATTACH_URI="quack:${SERVER_HOST}:${QUACK_PORT}" +client_sql_lake_mode() { + if [[ "$ENABLE_DUCKLAKE" != "1" ]]; then + echo "off" + elif duckdb_has_quackscale_function attach_ducklake; then + echo "attach_ducklake" + else + echo "quack_query" + fi +} + resolve_server_tailnet_ip() { "${HS[@]}" nodes list 2>/dev/null | grep -F "$SERVER_HOST" | grep -oE '100\.64\.[0-9]+\.[0-9]+' | head -1 || true } @@ -38,28 +56,14 @@ resolve_attach_uri() { } resolve_client_attach_uri() { - local ext_dir="/duckdb_extensions" - if command -v duckdb >/dev/null 2>&1 \ - && duckdb :memory: -batch -csv -noheader -c \ - "SET extension_directory='${ext_dir}'; LOAD quackscale; SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='tailscale_quack_forward';" \ - 2>/dev/null | grep -qx '1'; then + local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-/duckdb_extensions}" + if duckdb_has_quackscale_function tailscale_quack_forward; then echo "quack:127.0.0.1:${QUACK_FORWARD_LOCAL_PORT}" else resolve_attach_uri fi } -duckdb_has_quackscale_function() { - local fn="$1" - local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-/duckdb_extensions}" - local count - count="$("$DUCKDB_BIN" :memory: -batch -csv -noheader -c \ - "SET extension_directory='${ext_dir}'; LOAD quackscale; \ - SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ - 2>/dev/null | tr -d '[:space:]')" - [[ "$count" == "1" ]] -} - compose_attach_is_local_forward() { case "$1" in quack:127.0.0.1:* | 
quack:localhost:* | quack:127.0.0.1 | quack:localhost) return 0 ;; @@ -201,6 +205,7 @@ write_client_session_sql() { fi fi cat >"$WORK/client_session.sql" <&2 @@ -304,7 +305,7 @@ run_bootstrap() { echo "→ refreshing /work SQL on volume (no client demo) ..." fi COMPOSE_REFRESH_CLIENT_SQL=1 COMPOSE_REFRESH_SERVER_QUACK=1 \ - QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh + QUACKTAIL_MANAGE_CLIENT_SQL=1 QUACKTAIL_AUTO_BOOTSTRAP=1 /usr/local/bin/quacktail-compose-bootstrap.sh if [[ "$QUIET" == "1" ]]; then echo "✓ bootstrap complete — run: docker compose --profile test run --rm quacktail-client" else @@ -323,6 +324,18 @@ client_demo_banner() { fi } +quacktail_log_quackscale_caps() { + [[ "$QUIET" == "1" ]] || return 0 + local lake_mode="quack_query" + local down="no" + quacktail_has_quackscale_function attach_ducklake && lake_mode="attach_ducklake" + quacktail_has_quackscale_function tailscale_down && down="yes" + echo "→ quackscale: lake=${lake_mode} tailscale_down=${down}" + if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" && "$lake_mode" == "quack_query" ]]; then + echo " (rebuild images with BUILD_FROM_SOURCE=1 for attach_ducklake — release v1.0.2 lacks it)" + fi +} + run_client() { local session_sql="${WORK}/client_session.sql" local out="${WORK}/client.out" @@ -342,6 +355,7 @@ run_client() { wait_for_tailnet_server ensure_quack + quacktail_log_quackscale_caps ensure_server_hosts_mapping ensure_client_sql attach_uri="$(client_attach_uri)" diff --git a/scripts/lib/quacktail_ext.sh b/scripts/lib/quacktail_ext.sh index d9a53c1..2ffcf18 100755 --- a/scripts/lib/quacktail_ext.sh +++ b/scripts/lib/quacktail_ext.sh @@ -12,6 +12,25 @@ quacktail_ext_container_dir() { echo "${QUACKTAIL_CONTAINER_EXT_DIR:-/duckdb_extensions}" } +quacktail_has_quackscale_function() { + local fn="${1:?function name required}" + local duckdb_bin="${DUCKDB_BIN:-/usr/local/bin/duckdb}" + local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-$(quacktail_ext_container_dir)}" + local 
count + [[ -x "$duckdb_bin" ]] || return 1 + count="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ + "SET extension_directory='${ext_dir}'; LOAD quackscale; \ + SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ + 2>/dev/null | tr -d '[:space:]')" + if [[ "$count" == "1" ]]; then + return 0 + fi + count="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ + "LOAD quackscale; SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ + 2>/dev/null | tr -d '[:space:]')" + [[ "$count" == "1" ]] +} + quacktail_ext_verify_artifact() { local install_path="${1:?install path}" if [[ -f "$install_path" ]]; then From 1578843f29918b3bea44f890dc3acde83728ba36 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 09:29:36 +0200 Subject: [PATCH 16/25] Fail Docker build unless attach_ducklake is in the image. Install quack/ducklake before copying quackscale to the nested extension path, verify at image build time, stamp build-info, and fail the client demo when release binaries (v1.0.2) lack attach_ducklake. Co-authored-by: Cursor --- examples/.env.example | 2 ++ examples/Dockerfile | 40 ++++++++++++++++++--------- examples/README.md | 11 +++++++- examples/docker-compose.yml | 2 ++ scripts/e2e/quacktail-entrypoint.sh | 20 +++++++++++++- scripts/e2e/quacktail-verify-image.sh | 27 ++++++++++++++++++ scripts/lib/quacktail_ext.sh | 33 +++++++++++++++------- 7 files changed, 110 insertions(+), 25 deletions(-) create mode 100644 scripts/e2e/quacktail-verify-image.sh diff --git a/examples/.env.example b/examples/.env.example index a5ccbb6..d54c5ab 100644 --- a/examples/.env.example +++ b/examples/.env.example @@ -6,8 +6,10 @@ SERVER_HOST=quacktail-server QUACK_PORT=9494 QUACK_FORWARD_LOCAL_PORT=19494 BUILD_FROM_SOURCE=1 +# Do not set BUILD_FROM_SOURCE=0 for DuckLake demo — v1.0.2 release lacks attach_ducklake. 
QUACKTAIL_RELEASE_TAG=v1.0.2 QUACKTAIL_ENABLE_DUCKLAKE=1 +QUACKTAIL_REQUIRE_ATTACH_DUCKLAKE=1 QUACKTAIL_LAKE_NAME=lake QUACKTAIL_LAKE_DATA_PATH=/var/lib/ducklake/data diff --git a/examples/Dockerfile b/examples/Dockerfile index 2a061db..3add590 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -3,6 +3,7 @@ # # BUILD_FROM_SOURCE=1 (default in compose): build DuckDB + quackscale from this repo. # BUILD_FROM_SOURCE=0: pull pinned GitHub release binary (QUACKTAIL_RELEASE_TAG, default v1.0.2). +# Release v1.0.2 lacks attach_ducklake/tailscale_down — DuckLake demo requires source build. ARG BUILD_FROM_SOURCE=1 ARG GITHUB_REPO=quackscience/duckdb-quackscale @@ -16,7 +17,6 @@ ARG GITHUB_REPO ARG QUACKTAIL_RELEASE_TAG ENV DEBIAN_FRONTEND=noninteractive -# stage-1 always COPY --from=builder /out/ — create it even when downloading a release. RUN mkdir -p /out RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ @@ -33,7 +33,13 @@ RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ git submodule update --init --recursive \ && GEN=ninja make release \ && install -m755 build/release/duckdb /out/duckdb \ - && cp -a build/release/extension/quackscale /out/quackscale-ext 2>/dev/null || true; \ + && test -f build/release/extension/quackscale/quackscale.duckdb_extension \ + && cp -a build/release/extension/quackscale /out/quackscale-ext \ + && git rev-parse HEAD > /out/git-rev \ + && echo "build_from_source=1" > /out/build-info; \ + else \ + echo "build_from_source=0" > /out/build-info \ + && echo "release_tag=${QUACKTAIL_RELEASE_TAG}" >> /out/build-info; \ fi FROM ubuntu:24.04 @@ -73,25 +79,33 @@ ENV DUCKDB_BIN=/usr/local/bin/duckdb ENV DUCKDB_EXTENSION_DIRECTORY=/duckdb_extensions ENV QUACK_PORT=9494 -RUN mkdir -p /duckdb_extensions \ - && if [ -d /opt/quacktail-build/quackscale-ext ]; then \ - cp -a /opt/quacktail-build/quackscale-ext/. 
/duckdb_extensions/ \ - && duckdb :memory: -batch -csv -noheader -c \ - "SET extension_directory='/duckdb_extensions'; LOAD quackscale; \ - SELECT COUNT(*) FROM duckdb_functions() WHERE function_name IN ('tailscale_down', 'attach_ducklake');" \ - | grep -qx '2' \ - || { echo "error: built quackscale missing tailscale_down or attach_ducklake (rebuild with --no-cache)" >&2; exit 1; }; \ - fi \ +# Install quack + ducklake first, then lay down our quackscale artifact (must not be overwritten). +RUN mkdir -p /duckdb_extensions /etc/quacktail \ && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core; LOAD quack; SELECT 1;" \ || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL quack FROM core_nightly; LOAD quack; SELECT 1;" \ && duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL ducklake FROM core; LOAD ducklake; SELECT 1;" \ - || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL ducklake FROM core_nightly; LOAD ducklake; SELECT 1;" + || duckdb :memory: -batch -c "SET extension_directory='/duckdb_extensions'; INSTALL ducklake FROM core_nightly; LOAD ducklake; SELECT 1;" \ + && if [ -d /opt/quacktail-build/quackscale-ext ]; then \ + install -d /duckdb_extensions/quackscale; \ + install -m644 /opt/quacktail-build/quackscale-ext/quackscale.duckdb_extension \ + /duckdb_extensions/quackscale/quackscale.duckdb_extension; \ + fi \ + && if [ -f /opt/quacktail-build/build-info ]; then \ + cp /opt/quacktail-build/build-info /etc/quacktail/build-info; \ + fi \ + && if [ -f /opt/quacktail-build/git-rev ]; then \ + cp /opt/quacktail-build/git-rev /etc/quacktail/git-rev; \ + fi + +COPY scripts/lib/quacktail_ext.sh /usr/local/lib/quacktail_ext.sh +COPY scripts/e2e/quacktail-verify-image.sh /usr/local/bin/quacktail-verify-image.sh +RUN chmod +x /usr/local/bin/quacktail-verify-image.sh \ + && /usr/local/bin/quacktail-verify-image.sh COPY 
scripts/e2e/quacktail-entrypoint.sh /usr/local/bin/quacktail-entrypoint.sh COPY scripts/e2e/quacktail-compose-bootstrap.sh /usr/local/bin/quacktail-compose-bootstrap.sh COPY scripts/e2e/quacktail-server-run.sh /usr/local/bin/quacktail-server-run.sh COPY scripts/e2e/quacktail-server-healthcheck.sh /usr/local/bin/quacktail-server-healthcheck.sh -COPY scripts/lib/quacktail_ext.sh /usr/local/lib/quacktail_ext.sh RUN chmod +x /usr/local/bin/quacktail-entrypoint.sh /usr/local/bin/quacktail-compose-bootstrap.sh \ /usr/local/bin/quacktail-server-run.sh /usr/local/bin/quacktail-server-healthcheck.sh diff --git a/examples/README.md b/examples/README.md index 94a22b4..38ce8e2 100644 --- a/examples/README.md +++ b/examples/README.md @@ -29,7 +29,16 @@ Quack HTTP uses **kernel TCP**. Embedded tsnet does not route that traffic. `tai ## Run the demo -**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default) for `attach_ducklake` and `tailscale_down`. Release tag `v1.0.2` does not include them. +**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default). Release `v1.0.2` does **not** include `attach_ducklake` or `tailscale_down`. + +Verify the image before running the client demo: + +```bash +docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client +docker compose run --rm --entrypoint cat quacktail-client /etc/quacktail/build-info +``` + +Expect `build_from_source=1` and `ok: quackscale image verify`. 
```bash git pull && cd examples diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml index 09e61fa..7feb915 100644 --- a/examples/docker-compose.yml +++ b/examples/docker-compose.yml @@ -30,6 +30,7 @@ x-env: &env CLIENT_HOST: quacktail-client DUCKDB_EXTENSION_DIRECTORY: /duckdb_extensions QUACKTAIL_ENABLE_DUCKLAKE: ${QUACKTAIL_ENABLE_DUCKLAKE:-1} + QUACKTAIL_REQUIRE_ATTACH_DUCKLAKE: ${QUACKTAIL_REQUIRE_ATTACH_DUCKLAKE:-1} QUACKTAIL_LAKE_NAME: ${QUACKTAIL_LAKE_NAME:-lake} QUACKTAIL_LAKE_METADATA: ${QUACKTAIL_LAKE_METADATA:-/var/lib/ducklake/metadata/inventory.ducklake} QUACKTAIL_LAKE_DATA_PATH: ${QUACKTAIL_LAKE_DATA_PATH:-/var/lib/ducklake/data} @@ -52,6 +53,7 @@ x-quacktail: &quacktail BUILD_FROM_SOURCE: ${BUILD_FROM_SOURCE:-1} GITHUB_REPO: ${GITHUB_REPO:-quackscience/duckdb-quackscale} QUACKTAIL_RELEASE_TAG: ${QUACKTAIL_RELEASE_TAG:-v1.0.2} + # DuckLake demo needs attach_ducklake — do not set BUILD_FROM_SOURCE=0 until a release includes it. cap_add: [NET_ADMIN] devices: [/dev/net/tun] <<: *headscale_volumes diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index ef556fd..963a581 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -328,14 +328,31 @@ quacktail_log_quackscale_caps() { [[ "$QUIET" == "1" ]] || return 0 local lake_mode="quack_query" local down="no" + if [[ -f /etc/quacktail/build-info ]]; then + echo "→ image: $(tr '\n' ' ' < /etc/quacktail/build-info)" + fi quacktail_has_quackscale_function attach_ducklake && lake_mode="attach_ducklake" quacktail_has_quackscale_function tailscale_down && down="yes" echo "→ quackscale: lake=${lake_mode} tailscale_down=${down}" if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" && "$lake_mode" == "quack_query" ]]; then - echo " (rebuild images with BUILD_FROM_SOURCE=1 for attach_ducklake — release v1.0.2 lacks it)" + echo " error: attach_ducklake missing — image was not built from source with current ducklake branch" >&2 + 
quacktail_list_quackscale_functions >&2 || true fi } +quacktail_require_attach_ducklake() { + [[ "${QUACKTAIL_REQUIRE_ATTACH_DUCKLAKE:-0}" == "1" ]] || return 0 + [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]] || return 0 + quacktail_has_quackscale_function attach_ducklake && return 0 + echo "error: attach_ducklake required but not in this image" >&2 + [[ -f /etc/quacktail/build-info ]] && cat /etc/quacktail/build-info >&2 + [[ -f /etc/quacktail/git-rev ]] && echo "git-rev: $(cat /etc/quacktail/git-rev)" >&2 + echo "Rebuild: cd examples && docker compose build --no-cache quacktail-client" >&2 + echo "Ensure BUILD_FROM_SOURCE=1 (release v1.0.2 does not include attach_ducklake)" >&2 + quacktail_list_quackscale_functions >&2 || true + exit 1 +} + run_client() { local session_sql="${WORK}/client_session.sql" local out="${WORK}/client.out" @@ -355,6 +372,7 @@ run_client() { wait_for_tailnet_server ensure_quack + quacktail_require_attach_ducklake quacktail_log_quackscale_caps ensure_server_hosts_mapping ensure_client_sql diff --git a/scripts/e2e/quacktail-verify-image.sh b/scripts/e2e/quacktail-verify-image.sh new file mode 100644 index 0000000..aa6b531 --- /dev/null +++ b/scripts/e2e/quacktail-verify-image.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# Verify quackscale in the container image (Dockerfile + runtime diagnostics). +set -euo pipefail + +DUCKDB_BIN="${DUCKDB_BIN:-/usr/local/bin/duckdb}" +EXT_DIR="${DUCKDB_EXTENSION_DIRECTORY:-/duckdb_extensions}" + +# shellcheck source=/dev/null +source /usr/local/lib/quacktail_ext.sh + +require_fn() { + local fn="$1" + if ! 
quacktail_has_quackscale_function "$fn"; then + echo "error: quackscale missing required function: ${fn}" >&2 + echo "extension_directory=${EXT_DIR}" >&2 + ls -la "$EXT_DIR" "$EXT_DIR/quackscale" 2>/dev/null >&2 || true + echo "registered quackscale functions:" >&2 + quacktail_list_quackscale_functions >&2 || true + [[ -f /etc/quacktail/build-info ]] && cat /etc/quacktail/build-info >&2 + exit 1 + fi +} + +require_fn attach_ducklake +require_fn tailscale_down +require_fn tailscale_quack_forward +echo "ok: quackscale image verify (attach_ducklake, tailscale_down, tailscale_quack_forward)" diff --git a/scripts/lib/quacktail_ext.sh b/scripts/lib/quacktail_ext.sh index 2ffcf18..5206ac2 100755 --- a/scripts/lib/quacktail_ext.sh +++ b/scripts/lib/quacktail_ext.sh @@ -16,21 +16,34 @@ quacktail_has_quackscale_function() { local fn="${1:?function name required}" local duckdb_bin="${DUCKDB_BIN:-/usr/local/bin/duckdb}" local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-$(quacktail_ext_container_dir)}" - local count + local out count [[ -x "$duckdb_bin" ]] || return 1 - count="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ + out="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ "SET extension_directory='${ext_dir}'; LOAD quackscale; \ - SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ - 2>/dev/null | tr -d '[:space:]')" - if [[ "$count" == "1" ]]; then - return 0 - fi - count="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ - "LOAD quackscale; SELECT COUNT(*) FROM duckdb_functions() WHERE function_name='${fn}';" \ - 2>/dev/null | tr -d '[:space:]')" + SELECT CAST(COUNT(*) AS VARCHAR) FROM duckdb_functions() WHERE function_name='${fn}';" \ + 2>&1)" || true + count="$(printf '%s\n' "$out" | tail -1 | tr -d '[:space:]')" + [[ "$count" == "1" ]] && return 0 + out="$("$duckdb_bin" :memory: -batch -csv -noheader -c \ + "LOAD quackscale; SELECT CAST(COUNT(*) AS VARCHAR) FROM duckdb_functions() WHERE function_name='${fn}';" \ + 2>&1)" || true + 
count="$(printf '%s\n' "$out" | tail -1 | tr -d '[:space:]')" [[ "$count" == "1" ]] } +quacktail_list_quackscale_functions() { + local duckdb_bin="${DUCKDB_BIN:-/usr/local/bin/duckdb}" + local ext_dir="${DUCKDB_EXTENSION_DIRECTORY:-$(quacktail_ext_container_dir)}" + [[ -x "$duckdb_bin" ]] || return 1 + "$duckdb_bin" :memory: -batch -csv -noheader -c \ + "SET extension_directory='${ext_dir}'; LOAD quackscale; \ + SELECT function_name FROM duckdb_functions() \ + WHERE function_name LIKE 'tailscale_%' \ + OR function_name LIKE 'attach_%' \ + OR function_name IN ('quack_uri', 'quack_token', 'quack_discover') \ + ORDER BY 1;" +} + quacktail_ext_verify_artifact() { local install_path="${1:?install path}" if [[ -f "$install_path" ]]; then From f2f9106fb80ddb0d13836c0aa4d539c7dd44d6db Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 09:44:47 +0200 Subject: [PATCH 17/25] Fix Docker source build on ubuntu:24.04 builder. Exclude host build/ from context, install patch and build-essential for libtailscale, and use a dedicated builder script with clearer errors. Co-authored-by: Cursor --- .dockerignore | 12 +++++ .github/workflows/docker-compose-build.yml | 37 +++++++++++++++ examples/Dockerfile | 15 +++--- examples/README.md | 9 +++- scripts/e2e/docker-build-quackscale.sh | 53 ++++++++++++++++++++++ 5 files changed, 115 insertions(+), 11 deletions(-) create mode 100644 .dockerignore create mode 100644 .github/workflows/docker-compose-build.yml create mode 100755 scripts/e2e/docker-build-quackscale.sh diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 0000000..9064e5a --- /dev/null +++ b/.dockerignore @@ -0,0 +1,12 @@ +# Host build outputs — never send macOS/other host artifacts into Linux builds. 
+/build/ +build/ +**/build/ +.cache/ +**/.cache/ +.e2e-work/ +examples/.e2e-work/ + +# Editor / OS noise +**/.DS_Store +**/*~ diff --git a/.github/workflows/docker-compose-build.yml b/.github/workflows/docker-compose-build.yml new file mode 100644 index 0000000..8b16b90 --- /dev/null +++ b/.github/workflows/docker-compose-build.yml @@ -0,0 +1,37 @@ +# Build examples/docker-compose images from source (catches missing apt deps like patch). +name: Docker compose build + +on: + workflow_dispatch: + pull_request: + paths: + - '.github/workflows/docker-compose-build.yml' + - 'examples/Dockerfile' + - 'examples/docker-compose.yml' + - 'scripts/e2e/docker-build-quackscale.sh' + - '.dockerignore' + - 'src/**' + - 'cmake/**' + +jobs: + compose-build: + name: docker compose build (source) + runs-on: ubuntu-latest + timeout-minutes: 60 + permissions: + contents: read + + steps: + - uses: actions/checkout@v4 + with: + submodules: recursive + + - name: Build quacktail-server image + run: | + cd examples + docker compose build quacktail-server + + - name: Verify image + run: | + cd examples + docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-server diff --git a/examples/Dockerfile b/examples/Dockerfile index 3add590..30746fd 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -21,22 +21,19 @@ RUN mkdir -p /out RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ apt-get update \ - && apt-get install -y --no-install-recommends bash ca-certificates curl git golang-go \ - cmake ninja-build g++ make python3 \ + && apt-get install -y --no-install-recommends bash ca-certificates curl git \ + build-essential cmake ninja-build patch python3 \ && rm -rf /var/lib/apt/lists/*; \ fi WORKDIR /src COPY . 
/src/ +COPY scripts/e2e/docker-build-quackscale.sh /usr/local/bin/docker-build-quackscale.sh +RUN chmod +x /usr/local/bin/docker-build-quackscale.sh + RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ - git submodule update --init --recursive \ - && GEN=ninja make release \ - && install -m755 build/release/duckdb /out/duckdb \ - && test -f build/release/extension/quackscale/quackscale.duckdb_extension \ - && cp -a build/release/extension/quackscale /out/quackscale-ext \ - && git rev-parse HEAD > /out/git-rev \ - && echo "build_from_source=1" > /out/build-info; \ + /usr/local/bin/docker-build-quackscale.sh /out; \ else \ echo "build_from_source=0" > /out/build-info \ && echo "release_tag=${QUACKTAIL_RELEASE_TAG}" >> /out/build-info; \ diff --git a/examples/README.md b/examples/README.md index 38ce8e2..581b962 100644 --- a/examples/README.md +++ b/examples/README.md @@ -29,9 +29,14 @@ Quack HTTP uses **kernel TCP**. Embedded tsnet does not route that traffic. `tai ## Run the demo -**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default). Release `v1.0.2` does **not** include `attach_ducklake` or `tailscale_down`. +**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default). Clone **with submodules** before building: -Verify the image before running the client demo: +```bash +git pull +git submodule update --init --recursive +cd examples +docker compose build --no-cache quacktail-server quacktail-client +``` ```bash docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client diff --git a/scripts/e2e/docker-build-quackscale.sh b/scripts/e2e/docker-build-quackscale.sh new file mode 100755 index 0000000..85132d9 --- /dev/null +++ b/scripts/e2e/docker-build-quackscale.sh @@ -0,0 +1,53 @@ +#!/usr/bin/env bash +# Builder-stage script for examples/Dockerfile (BUILD_FROM_SOURCE=1). 
+set -euo pipefail + +OUT="${1:-/out}" +mkdir -p "$OUT" + +need_submodules() { + [[ -f duckdb/CMakeLists.txt ]] \ + && [[ -f extension-ci-tools/makefiles/duckdb_extension.Makefile ]] \ + && [[ -f third_party/libtailscale/go.mod || -f third_party/libtailscale/README.md ]] \ + && return 1 + return 0 +} + +if need_submodules; then + echo "→ initializing git submodules ..." + if [[ -d .git ]]; then + git submodule sync --recursive + git submodule update --init --recursive + else + echo "error: git submodules missing and .git not available in build context" >&2 + echo " clone with: git clone --recurse-submodules …" >&2 + echo " or ensure duckdb/ and extension-ci-tools/ are populated before docker build" >&2 + exit 1 + fi +else + echo "→ submodules present in build context (skipping git submodule update)" +fi + +# Never reuse host build trees (especially dangerous when building linux from macOS). +rm -rf build .cache + +echo "→ make release (GEN=ninja, $(nproc) jobs) ..." +GEN=ninja make release -j"$(nproc)" + +EXT_ART="build/release/extension/quackscale/quackscale.duckdb_extension" +if [[ ! -f "$EXT_ART" ]]; then + echo "error: quackscale loadable extension not found at ${EXT_ART}" >&2 + ls -la build/release/extension/quackscale 2>/dev/null >&2 || true + exit 1 +fi + +install -m755 build/release/duckdb "$OUT/duckdb" +cp -a build/release/extension/quackscale "$OUT/quackscale-ext" + +if [[ -d .git ]]; then + git rev-parse HEAD > "$OUT/git-rev" +else + echo "docker-build" > "$OUT/git-rev" +fi +echo "build_from_source=1" > "$OUT/build-info" +echo "✓ quackscale builder done — $(wc -c < "$EXT_ART") byte extension, duckdb at $OUT/duckdb" From d80eae18976ad2bf95827bb31a2d47f47c9de73f Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 10:05:06 +0200 Subject: [PATCH 18/25] Fix client demo hang after PASSED on tailscale_down. 
Emit CLIENT_DEMO_DONE before teardown so the compose watchdog can exit, and make tailscale_close asynchronous so CALL tailscale_down() returns. Co-authored-by: Cursor --- examples/README.md | 2 +- scripts/e2e/quacktail-compose-bootstrap.sh | 8 ++++++-- scripts/e2e/quacktail-entrypoint.sh | 3 ++- src/include/tailscale_bridge.hpp | 1 + src/tailscale_bridge.cpp | 21 +++++++++++++++------ 5 files changed, 25 insertions(+), 10 deletions(-) diff --git a/examples/README.md b/examples/README.md index 581b962..123c16e 100644 --- a/examples/README.md +++ b/examples/README.md @@ -98,8 +98,8 @@ SELECT * FROM lake.inventory ...; SELECT 'LAKE_PASSED' ...; ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); SELECT 'PASSED' ...; -CALL tailscale_down(); SELECT 'CLIENT_DEMO_DONE' ...; +CALL tailscale_down(); ✓ Demo passed — QuackTail cluster + DuckLake over tailnet ``` diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index 2ad8cc8..a60cb9e 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -253,9 +253,9 @@ FROM remote.e2e_payload; DETACH remote; -${teardown_sql} - SELECT 'CLIENT_DEMO_DONE' AS status; + +${teardown_sql} SQL if grep -q '\\n' "$WORK/client_session.sql" 2>/dev/null; then echo "error: generated client_session.sql contains literal \\n" >&2 @@ -371,6 +371,10 @@ if [[ -f "$WORK/server_setup.sql" && -f "$WORK/authkey" ]]; then || { [[ -f "$WORK/client_session.sql" ]] && duckdb_has_quackscale_function tailscale_down \ && ! grep -q 'CALL tailscale_down' "$WORK/client_session.sql"; } \ || { [[ -f "$WORK/client_session.sql" ]] && ! 
grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql"; } \ + || { [[ -f "$WORK/client_session.sql" ]] && grep -q 'CALL tailscale_down' "$WORK/client_session.sql" \ + && grep -q 'CLIENT_DEMO_DONE' "$WORK/client_session.sql" \ + && [[ "$(grep -n 'CALL tailscale_down' "$WORK/client_session.sql" | head -1 | cut -d: -f1)" \ + -lt "$(grep -n "CLIENT_DEMO_DONE" "$WORK/client_session.sql" | head -1 | cut -d: -f1)" ]]; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && ! grep -q 'DISCOVERED' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && grep -q 'quacktail_attach_remote_lake' "$WORK/client_session.sql"; } \ || { [[ "$ENABLE_DUCKLAKE" == "1" && -f "$WORK/client_session.sql" ]] && duckdb_has_quackscale_function attach_ducklake \ diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 963a581..5a4cef6 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -247,7 +247,8 @@ run_duckdb_client_session() { : >"$tsnet_log" : >"$out" - # Background duckdb → client.out; monitor file for CLIENT_DEMO_DONE then SIGTERM/KILL. + # Background duckdb → client.out; monitor client.out for CLIENT_DEMO_DONE then SIGTERM/KILL. + # CLIENT_DEMO_DONE is emitted before tailscale_down (tsnet close can block). 
set +o pipefail if [[ "$QUIET" == "1" ]]; then "${timeout_cmd[@]}" stdbuf -oL -eL "$DUCKDB" -batch -echo \ diff --git a/src/include/tailscale_bridge.hpp b/src/include/tailscale_bridge.hpp index 4a061b0..367982c 100644 --- a/src/include/tailscale_bridge.hpp +++ b/src/include/tailscale_bridge.hpp @@ -97,6 +97,7 @@ class TailscaleBridge { void RefreshIPs(); string LastErrorMessage() const; void JoinLoginThread(); + void DetachLoginThread(); string ResolveAuthKey(const string &authkey) const; void MaybeStartLoopbackProxy(bool enable); void StartLoopbackProxy(); diff --git a/src/tailscale_bridge.cpp b/src/tailscale_bridge.cpp index 470b8c5..551dc8c 100644 --- a/src/tailscale_bridge.cpp +++ b/src/tailscale_bridge.cpp @@ -137,6 +137,12 @@ void TailscaleBridge::JoinLoginThread() { } } +void TailscaleBridge::DetachLoginThread() { + if (login_thread.joinable()) { + login_thread.detach(); + } +} + TailscaleStatus TailscaleBridge::Status() const { TailscaleStatus status; status.linked = @@ -386,18 +392,21 @@ TailscaleLoginStatus TailscaleBridge::LoginStatus() const { void TailscaleBridge::Shutdown() { std::lock_guard guard(g_tailscale_mutex); log_capture.Stop(); - JoinLoginThread(); + // Do not join — interactive login or tsnet teardown can block indefinitely. + DetachLoginThread(); forwarder.Stop(); ClearProxyEnvironment(); #ifdef QUACKSCALE_WITH_TAILSCALE - if (handle >= 0) { - tailscale_clear_serve(handle); - tailscale_close(handle); - handle = -1; + int closing = handle; + handle = -1; + running = false; + if (closing >= 0) { + tailscale_clear_serve(closing); + // tailscale_close waits for AuthLoop; detach so CALL tailscale_down() returns. 
+ std::thread([closing]() { tailscale_close(closing); }).detach(); } #else #endif - running = false; ips.clear(); login_state = "idle"; login_message.clear(); From 5b5863e5939a81e46a1395c8ef572706db4402d0 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 10:19:00 +0200 Subject: [PATCH 19/25] Remove debug scaffolding from the DuckLake compose demo. Drop runtime capability logging, unused client-tsnet.log handling, and duplicate README build steps while keeping build-time image verify. Co-authored-by: Cursor --- examples/Dockerfile | 5 ++- examples/README.md | 17 ++------- scripts/e2e/docker-build-quackscale.sh | 2 +- scripts/e2e/quacktail-compose-bootstrap.sh | 4 +-- scripts/e2e/quacktail-entrypoint.sh | 40 +--------------------- scripts/e2e/quacktail-verify-image.sh | 2 +- 6 files changed, 8 insertions(+), 62 deletions(-) diff --git a/examples/Dockerfile b/examples/Dockerfile index 30746fd..95ca291 100644 --- a/examples/Dockerfile +++ b/examples/Dockerfile @@ -29,11 +29,10 @@ RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ WORKDIR /src COPY . /src/ -COPY scripts/e2e/docker-build-quackscale.sh /usr/local/bin/docker-build-quackscale.sh -RUN chmod +x /usr/local/bin/docker-build-quackscale.sh +RUN chmod +x /src/scripts/e2e/docker-build-quackscale.sh RUN if [ "$BUILD_FROM_SOURCE" = "1" ]; then \ - /usr/local/bin/docker-build-quackscale.sh /out; \ + /src/scripts/e2e/docker-build-quackscale.sh /out; \ else \ echo "build_from_source=0" > /out/build-info \ && echo "release_tag=${QUACKTAIL_RELEASE_TAG}" >> /out/build-info; \ diff --git a/examples/README.md b/examples/README.md index 123c16e..b12e812 100644 --- a/examples/README.md +++ b/examples/README.md @@ -29,25 +29,14 @@ Quack HTTP uses **kernel TCP**. Embedded tsnet does not route that traffic. `tai ## Run the demo -**Requires source-built images** (`BUILD_FROM_SOURCE=1`, compose default). 
Clone **with submodules** before building: +Source build is required for the DuckLake demo (`attach_ducklake`, `tailscale_down`). ```bash git pull git submodule update --init --recursive cd examples docker compose build --no-cache quacktail-server quacktail-client -``` - -```bash docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client -docker compose run --rm --entrypoint cat quacktail-client /etc/quacktail/build-info -``` - -Expect `build_from_source=1` and `ok: quackscale image verify`. - -```bash -git pull && cd examples -docker compose build --no-cache quacktail-server quacktail-client docker compose up -d --force-recreate headscale quacktail-server docker compose --profile test run --rm quacktail-client ``` @@ -82,8 +71,6 @@ Expect: ```text → waiting for quacktail-server on tailnet ... ✓ quacktail-server on tailnet -✓ client SQL ready — attach quack:127.0.0.1:19494 (lake: attach_ducklake) -→ quackscale: lake=attach_ducklake tailscale_down=yes QuackTail cluster demo ====================== @@ -106,7 +93,7 @@ CALL tailscale_down(); The client runs one DuckDB session (`duckdb -batch -echo -f /work/client_session.sql`). Compose waits for `quacktail-server` **healthy** (server.log shows `quack_serve` + `tailscale_serve_local`) before starting the client. -Set `QUACKTAIL_QUIET=0` to print full SQL. libtailscale detail: `/work/client-tsnet.log` (client), `/work/server.log` (server). +Set `QUACKTAIL_QUIET=0` to print full SQL. Server libtailscale logs: `/work/server.log`. 
## Services diff --git a/scripts/e2e/docker-build-quackscale.sh b/scripts/e2e/docker-build-quackscale.sh index 85132d9..10353a0 100755 --- a/scripts/e2e/docker-build-quackscale.sh +++ b/scripts/e2e/docker-build-quackscale.sh @@ -50,4 +50,4 @@ else echo "docker-build" > "$OUT/git-rev" fi echo "build_from_source=1" > "$OUT/build-info" -echo "✓ quackscale builder done — $(wc -c < "$EXT_ART") byte extension, duckdb at $OUT/duckdb" +echo "✓ quackscale builder done" diff --git a/scripts/e2e/quacktail-compose-bootstrap.sh b/scripts/e2e/quacktail-compose-bootstrap.sh index a60cb9e..2051b18 100644 --- a/scripts/e2e/quacktail-compose-bootstrap.sh +++ b/scripts/e2e/quacktail-compose-bootstrap.sh @@ -205,7 +205,7 @@ write_client_session_sql() { fi fi cat >"$WORK/client_session.sql" <&2 - echo "headscale users list:" >&2 "${HS[@]}" users list >&2 || true - echo "headscale preauthkeys create (debug):" >&2 create_authkey >&2 || true exit 1 fi diff --git a/scripts/e2e/quacktail-entrypoint.sh b/scripts/e2e/quacktail-entrypoint.sh index 5a4cef6..549bb06 100755 --- a/scripts/e2e/quacktail-entrypoint.sh +++ b/scripts/e2e/quacktail-entrypoint.sh @@ -167,15 +167,10 @@ client_attach_uri() { quacktail_dump_client_failure() { local out="${WORK}/client.out" - local tsnet_log="${WORK}/client-tsnet.log" if [[ -s "$out" ]]; then echo "--- client.out (tail) ---" >&2 tail -30 "$out" >&2 fi - if [[ -s "$tsnet_log" ]]; then - echo "--- client-tsnet.log (tail) ---" >&2 - tail -30 "$tsnet_log" >&2 - fi } quacktail_is_signal_rc() { @@ -192,11 +187,7 @@ quacktail_client_on_signal() { quacktail_client_has_fatal_sql_error() { local out="${1:?client out file}" - local tsnet_log="${WORK}/client-tsnet.log" - grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$out" 2>/dev/null \ - && return 0 - [[ -s "$tsnet_log" ]] \ - && grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$tsnet_log" 2>/dev/null + grep -qE 'Parser Error:|Catalog Error:|Binder Error:|Syntax Error:' "$out" 
2>/dev/null } quacktail_client_session_succeeded() { @@ -237,14 +228,12 @@ run_duckdb_client_session() { local session_sql="${1:?session sql file}" local out="${2:?out file}" local demo_timeout="${3:?timeout}" - local tsnet_log="${WORK}/client-tsnet.log" local ext_cmd duckdb_rc=0 local timeout_cmd=(timeout --foreground --kill-after=3 "$demo_timeout") local duck_pid=0 local deadline=0 ext_cmd="$(quacktail_sql_extension_directory)" - : >"$tsnet_log" : >"$out" # Background duckdb → client.out; monitor client.out for CLIENT_DEMO_DONE then SIGTERM/KILL. @@ -325,32 +314,12 @@ client_demo_banner() { fi } -quacktail_log_quackscale_caps() { - [[ "$QUIET" == "1" ]] || return 0 - local lake_mode="quack_query" - local down="no" - if [[ -f /etc/quacktail/build-info ]]; then - echo "→ image: $(tr '\n' ' ' < /etc/quacktail/build-info)" - fi - quacktail_has_quackscale_function attach_ducklake && lake_mode="attach_ducklake" - quacktail_has_quackscale_function tailscale_down && down="yes" - echo "→ quackscale: lake=${lake_mode} tailscale_down=${down}" - if [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" && "$lake_mode" == "quack_query" ]]; then - echo " error: attach_ducklake missing — image was not built from source with current ducklake branch" >&2 - quacktail_list_quackscale_functions >&2 || true - fi -} - quacktail_require_attach_ducklake() { [[ "${QUACKTAIL_REQUIRE_ATTACH_DUCKLAKE:-0}" == "1" ]] || return 0 [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" == "1" ]] || return 0 quacktail_has_quackscale_function attach_ducklake && return 0 echo "error: attach_ducklake required but not in this image" >&2 - [[ -f /etc/quacktail/build-info ]] && cat /etc/quacktail/build-info >&2 - [[ -f /etc/quacktail/git-rev ]] && echo "git-rev: $(cat /etc/quacktail/git-rev)" >&2 echo "Rebuild: cd examples && docker compose build --no-cache quacktail-client" >&2 - echo "Ensure BUILD_FROM_SOURCE=1 (release v1.0.2 does not include attach_ducklake)" >&2 - quacktail_list_quackscale_functions >&2 || true exit 1 } @@ 
-374,7 +343,6 @@ run_client() { wait_for_tailnet_server ensure_quack quacktail_require_attach_ducklake - quacktail_log_quackscale_caps ensure_server_hosts_mapping ensure_client_sql attach_uri="$(client_attach_uri)" @@ -410,12 +378,6 @@ run_client() { duckdb_rc=0 break fi - if grep -q "PASSED" "$out" 2>/dev/null \ - && { [[ "${QUACKTAIL_ENABLE_DUCKLAKE:-0}" != "1" ]] || grep -q "LAKE_PASSED" "$out" 2>/dev/null; }; then - echo "error: demo passed but CLIENT_DEMO_DONE missing (teardown or exit failed)" >&2 - quacktail_dump_client_failure - exit 1 - fi if (( attempt < max_attempts )); then [[ "$QUIET" == "1" ]] && echo "→ retry ${attempt}/${max_attempts} ..." quacktail_dump_client_failure diff --git a/scripts/e2e/quacktail-verify-image.sh b/scripts/e2e/quacktail-verify-image.sh index aa6b531..37a4b21 100644 --- a/scripts/e2e/quacktail-verify-image.sh +++ b/scripts/e2e/quacktail-verify-image.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# Verify quackscale in the container image (Dockerfile + runtime diagnostics). +# Verify required quackscale functions are present in the container image. set -euo pipefail DUCKDB_BIN="${DUCKDB_BIN:-/usr/local/bin/duckdb}" From e8f817196749779ec2a4b7bf6a29daf3bc635908 Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 10:19:07 +0200 Subject: [PATCH 20/25] Fix client log path in usage doc after tsnet.log removal. 
Co-authored-by: Cursor --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index ad76456..f91852f 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -396,7 +396,7 @@ DuckLake metadata: file (`*.ducklake`), Postgres, or DuckDB file — see [DuckLa ### Observability - Server: `CALL tailscale_status()`, Quack logs, `/work/server.log` in compose -- Client: `/work/client.out`, `/work/client-tsnet.log` +- Client: `/work/client.out` - Readiness: `CALL tailscale_ping(host => 'peer', port => 9494)` before heavy queries --- From e1aa09b710b979b3fc8b95386ac4711f0d08afea Mon Sep 17 00:00:00 2001 From: Lorenzo Mangani Date: Sat, 30 May 2026 10:22:21 +0200 Subject: [PATCH 21/25] Consolidate docs for integrators into a cohesive set. Replace nine overlapping docs with GUIDE (use cases and patterns), AUTHENTICATION (tailnet + Quack credentials), and DEVELOPMENT (contributor build/CI). Update README and examples cross-links. Co-authored-by: Cursor --- README.md | 31 ++- docs/AUTHENTICATION.md | 258 +++++++++++------- docs/DEVELOPMENT.md | 109 ++++++++ docs/DUCKLAKE_REMOTE_ATTACH.md | 116 -------- docs/DUCKLAKE_TAILNET.md | 62 ----- docs/GUIDE.md | 382 ++++++++++++++++++++++++++ docs/HEADSCALE.md | 90 ------- docs/PLAN.md | 150 ----------- docs/QUACK_AUTH.md | 258 ------------------ docs/QUACK_STREAMING.md | 65 ----- docs/README.md | 53 ++-- docs/UPDATING.md | 23 -- docs/usage.md | 471 --------------------------------- examples/README.md | 4 +- examples/ducklake/README.md | 34 ++- 15 files changed, 722 insertions(+), 1384 deletions(-) create mode 100644 docs/DEVELOPMENT.md delete mode 100644 docs/DUCKLAKE_REMOTE_ATTACH.md delete mode 100644 docs/DUCKLAKE_TAILNET.md create mode 100644 docs/GUIDE.md delete mode 100644 docs/HEADSCALE.md delete mode 100644 docs/PLAN.md delete mode 100644 docs/QUACK_AUTH.md delete mode 100644 docs/QUACK_STREAMING.md delete mode 100644 docs/UPDATING.md delete mode 100644 docs/usage.md diff --git 
a/README.md b/README.md index 363f20b..cc6544b 100644 --- a/README.md +++ b/README.md @@ -13,23 +13,22 @@ LOAD quackscale; -- tailscale_up, quack_uri, quack_token, ... ## Documentation -| Doc | Contents | -|-----|----------| -| [docs/usage.md](docs/usage.md) | **Use cases** — Quack, DuckLake, S3, discovery, production patterns | -| [docs/PLAN.md](docs/PLAN.md) | Architecture, roadmap, risks | -| [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md) | **Tailscale** — auth keys, browser login, `TS_AUTHKEY` | -| [docs/HEADSCALE.md](docs/HEADSCALE.md) | **Headscale** — self-hosted control plane (`control_url`, preauth keys) | -| [docs/QUACK_AUTH.md](docs/QUACK_AUTH.md) | **Quack** — shared tokens, env provisioning, overriding `quack_authentication_function` | +| Doc | Audience | Contents | +|-----|----------|----------| +| [docs/GUIDE.md](docs/GUIDE.md) | **Integrators** | Use cases, patterns, DuckLake, demos, limitations | +| [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md) | **Integrators** | Tailscale, Headscale, Quack tokens | +| [docs/README.md](docs/README.md) | Everyone | Documentation index | +| [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) | Contributors | Build, CI, roadmap | +| [examples/README.md](examples/README.md) | Integrators | Docker Compose two-node demo | ## Authentication (two layers) -QuackTail uses **two separate** credential systems. Both are required in production unless you deliberately relax Quack auth on a locked-down tailnet. +QuackTail uses **two separate** credential systems. See [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md). -| Layer | Question | Provisioned via | QuackScale / Quack | -|-------|----------|-----------------|-------------------| -| **Tailscale** | Is this process on our tailnet? 
| `TS_AUTHKEY`, `CALL tailscale_up`, or browser login | `tailscale_*` SQL — [AUTHENTICATION.md](docs/AUTHENTICATION.md) | -| **Headscale** (optional) | Same, self-hosted control server | `control_url` + Headscale preauth key | Same SQL — [HEADSCALE.md](docs/HEADSCALE.md) | -| **Quack** | May this caller run SQL on this server? | Shared env token, DuckDB secrets, or custom auth macro | `quack_token()`, `quack_serve(token => ...)`, `CREATE SECRET` — see [QUACK_AUTH.md](docs/QUACK_AUTH.md) | +| Layer | Question | Provisioned via | +|-------|----------|-----------------| +| **Tailnet** | Is this process on our mesh? | `TS_AUTHKEY`, Headscale preauth key, or browser login | +| **Quack** | May this caller run SQL on this server? | `QUACK_TAILNET_TOKEN`, `CREATE SECRET`, or auth macro | **Do not** copy the random `auth_token` column from each `CALL quack_serve` by hand. For a fleet of servers and clients, use a **network-wide shared token** (or allowlist) as described in [Quack security — Overriding authentication](https://duckdb.org/docs/current/quack/security#overriding-authentication). @@ -125,7 +124,7 @@ ATTACH 'quack:my-duckdb-node:9494' AS remote ( FROM remote.query('SELECT 42'); ``` -Use the hostname from `tailscale_up(hostname => ...)` and Quack’s default port **9494**. Details and multi-token setups: [docs/QUACK_AUTH.md](docs/QUACK_AUTH.md). +Use the hostname from `tailscale_up(hostname => ...)` and Quack’s default port **9494**. Details: [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md). ### Tailscale login (first-time / laptop) @@ -160,7 +159,7 @@ Example: [examples/headscale_quacktail.sql](examples/headscale_quacktail.sql). 
C | **Multi-token allowlist** | Teams, rotation, multiple clients | `SET GLOBAL quack_authentication_function = '...'` + token table — [Quack docs](https://duckdb.org/docs/current/quack/security#example-multi-token-table) | | **Developer mode** | Lab tailnet only | Auth macro always `true` — [Quack docs](https://duckdb.org/docs/current/quack/security#example-developer-mode-always-allow) | -Full walkthrough: [docs/QUACK_AUTH.md](docs/QUACK_AUTH.md). +Full walkthrough: [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md). ## SQL reference @@ -176,7 +175,7 @@ Load with `LOAD quackscale;`. Use **`CALL`** for table functions (same style as | `CALL tailscale_status()` | libtailscale linked?, running, hostname, tailnet IPs | | `CALL tailscale_quack_forward(host => 'peer', port => 9494)` | Localhost TCP → `tailscale_dial` (preferred for Quack ATTACH; no ALL_PROXY) | | `CALL tailscale_down()` | Stop forwarder + close tsnet (one-shot clients — required or process hangs) | -| `CALL attach_ducklake(uri, …)` | Create local views over a remote DuckLake catalog (server-owned Parquet) — see [DUCKLAKE_REMOTE_ATTACH.md](docs/DUCKLAKE_REMOTE_ATTACH.md) | +| `CALL attach_ducklake(uri, …)` | Create local views over a remote DuckLake catalog (server-owned Parquet) — see [docs/GUIDE.md](docs/GUIDE.md) | | `CALL tailscale_quack_proxy()` | Legacy SOCKS + ALL_PROXY | | `CALL tailscale_proxy_status()` | Legacy SOCKS status | diff --git a/docs/AUTHENTICATION.md b/docs/AUTHENTICATION.md index 75bcdd1..7c55716 100644 --- a/docs/AUTHENTICATION.md +++ b/docs/AUTHENTICATION.md @@ -1,163 +1,225 @@ -# Tailscale authentication (QuackScale) +# Authentication -This document covers **only Tailscale** — getting a DuckDB process onto your tailnet. +QuackTail uses **two independent credential layers**. Both matter in production unless you deliberately relax Quack auth on a locked-down tailnet. 
-For **Quack HTTP tokens** (shared secrets between QuackTail servers and clients), see **[QUACK_AUTH.md](QUACK_AUTH.md)**. You need both layers in production. +| Layer | Question | Configure with | +|-------|----------|----------------| +| **Tailnet** | Is this process on our mesh? | `TS_AUTHKEY`, Headscale preauth key, or browser login → `CALL tailscale_up` | +| **Quack** | May this caller run SQL over HTTP? | `QUACK_TAILNET_TOKEN`, `CREATE SECRET`, or custom auth macro | -| Doc | Topic | -|-----|--------| -| [QUACK_AUTH.md](QUACK_AUTH.md) | Shared `QUACK_TAILNET_TOKEN`, `quack_token()`, `CREATE SECRET`, overriding `quack_authentication_function` | -| [PLAN.md](PLAN.md) | Architecture and roadmap | -| [../README.md](../README.md) | Quick start and SQL reference | - -## How it fits QuackTail - -``` - Client Server - │ │ - │ ① Tailscale (wire) │ CALL tailscale_up - │ TS_AUTHKEY / login │ → node on tailnet - │ │ - │ ② Quack HTTP :9494 │ CALL quack_serve(..., token => quack_token()) - │ QUACK_TAILNET_TOKEN │ → SQL API on tailnet IP - └─────────────────────────────────────────┘ -``` - -Tailscale ACLs control **who can open TCP to port 9494**. Quack tokens control **who may run SQL** once connected. See [Quack security](https://duckdb.org/docs/current/quack/security). +Tailnet ACLs control **who can open TCP to port 9494**. Quack tokens control **who may execute SQL** once connected. See [Quack security](https://duckdb.org/docs/current/quack/security). --- -QuackScale embeds [libtailscale](https://github.com/tailscale/libtailscale) (Go **tsnet**). Joining a tailnet matches other embedded Tailscale apps: **auth keys**, **environment variables**, **persisted state**, or **interactive browser login**. +## Tailnet login (Tailscale SaaS) -## How tsnet authenticates +QuackScale embeds [libtailscale](https://github.com/tailscale/libtailscale) (tsnet). Joining matches other embedded Tailscale apps. 
| Mode | How | Best for | |------|-----|----------| | **Auth key** | `authkey` in `CALL tailscale_up`, or `TS_AUTHKEY` env | Servers, CI, automation | -| **Persisted state** | `state_dir` — keys on disk after first login | Laptops, repeat use | -| **Interactive login** | Login URL in logs; open in browser | First-time dev setup | -| **Headscale** | `control_url` → your [Headscale](https://github.com/juanfont/headscale) URL + Headscale preauth key | Self-hosted tailnet (Tailscale-compatible) | -| **Test control** | `control_url` → [tstestcontrol](https://github.com/tailscale/libtailscale/tree/main/tstestcontrol) | libtailscale unit tests | +| **Persisted state** | `state_dir` on disk after first login | Laptops, repeat use | +| **Browser login** | `CALL tailscale_login` → open `login_url` | First-time dev setup | + +### Production server + +```sh +export TS_AUTHKEY='tskey-auth-...' +``` -The libtailscale C API exposes `tailscale_set_authkey`, `tailscale_set_dir`, `tailscale_set_control_url`, `tailscale_set_logfd`, and `tailscale_up`. There is no C API that returns a login URL directly — tsnet prints `https://login.tailscale.com/a/…` on the **log stream** (see [libtailscale Python README](https://github.com/tailscale/libtailscale/blob/main/python/README.md)). +```sql +LOAD quackscale; -Reference: [tsnet.Server · Tailscale Docs](https://tailscale.com/kb/1522/tsnet-server). +CALL tailscale_up( + hostname => 'analytics-hub', + state_dir => '/var/lib/duckdb/tailscale' +); +``` -## Loopback forward (Quack HTTP over the tailnet) +Do not commit auth keys in SQL — use env or your secret store. -Embedded tsnet can dial peers (`tailscale_ping`), but **Quack uses normal HTTP/TCP**. Kernel sockets cannot reach tailnet IPs without help. +### Developer laptop -The native libtailscale path ([tsnetctest](https://github.com/tailscale/libtailscale/blob/main/tsnetctest/tsnetctest.go)) uses `tailscale_dial`. 
QuackScale exposes that for Quack via a **localhost TCP forwarder** — no SOCKS, no `ALL_PROXY`: +`CALL tailscale_up()` **blocks** until login completes. For a non-blocking flow: ```sql -CALL tailscale_up(hostname => 'my-client', authkey => '...', state_dir => '/var/lib/duckdb/ts'); -CALL tailscale_quack_forward(host => 'peer-hostname', port => 9494, local_port => 19494); --- quack_uri => quack:127.0.0.1:19494 - -CREATE SECRET (TYPE quack, TOKEN '...', SCOPE 'quack:127.0.0.1:19494'); -ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack, DISABLE_SSL true); +CALL tailscale_login( + hostname => 'my-laptop', + state_dir => '~/.local/share/duckdb/quackscale' +); +CALL tailscale_login_status(); -- poll until status = 'up' ``` -`tailscale_quack_forward` listens on `127.0.0.1:local_port` and dials `host:port` over tsnet for each Quack HTTP connection. +Open `login_url` in a browser. Reuse `state_dir` on later runs. + +### Environment variables (tailnet) + +| Variable | Effect | +|----------|--------| +| `TS_AUTHKEY` | Auth key if not passed in `CALL tailscale_up` | +| `TSNET_FORCE_LOGIN` | Force browser login even when an auth key is set (rare) | -Legacy: `CALL tailscale_quack_proxy()` (SOCKS + `ALL_PROXY`) remains but is deprecated. +--- -## Recommended patterns +## Headscale (self-hosted control plane) -### Production / servers — auth key +[Headscale](https://github.com/juanfont/headscale) implements the Tailscale control server API. QuackScale uses the same parameters as `tailscale up --login-server`: -Create a [reusable or ephemeral auth key](https://tailscale.com/kb/1085/auth-keys), then: +| Tailscale CLI | QuackScale | +|---------------|------------| +| `--login-server https://hs.example.com` | `control_url => 'https://hs.example.com'` | +| `--authkey …` | `authkey => '…'` or `TS_AUTHKEY` | +| `--hostname` | `hostname => '…'` | +| state directory | `state_dir => '…'` | + +Create Headscale preauth keys with `headscale preauthkeys create` (not the Tailscale admin UI). 
```sh -export TS_AUTHKEY='tskey-auth-...' +headscale users create quackscale +headscale preauthkeys create --user 1 --reusable --expiration 168h ``` ```sql -LOAD quackscale; - CALL tailscale_up( - hostname => 'analytics-duck-1', - state_dir => '/var/lib/duckdb/tailscale' + hostname => 'duckdb-node-a', + control_url => 'https://headscale.example.com', + authkey => '', + state_dir => '/var/lib/duckdb/headscale-state' ); ``` -Or pass the key in SQL: `CALL tailscale_up(authkey => 'tskey-auth-...', ...)`. +**Compose demo:** control URL `http://headscale:8080`, preauth key written to `/work/authkey`. See [examples/README.md](../examples/README.md). -Do not commit auth keys in SQL files — use env or your orchestrator’s secret store. +**Notes:** Production `server_url` should be HTTPS. MagicDNS is optional; `quack_uri()` prefers MagicDNS when available, else tailnet IP. -### Developer laptop — browser login +--- -`CALL tailscale_up()` **blocks** until login completes. For a non-blocking flow: +## Quack HTTP tokens + +After a node is on the tailnet, Quack still requires application-level auth. + +### Default Quack behavior (why you override it) + +`CALL quack_serve(...)` generates a **random** token unless you pass `token => '...'`. That is fine for local experiments; **fleets need a shared token or allowlist**. + +QuackScale provides `quack_token()` to read a shared secret from the environment on the **server**. Clients use the same value via `CREATE SECRET` or `TOKEN`. + +### Environment variables (Quack) + +Set on **both** servers and clients: + +| Variable | Role | +|----------|------| +| `QUACK_TAILNET_TOKEN` | **Preferred** — shared token (≥ 4 characters) | +| `QUACK_TOKEN` | Fallback if `QUACK_TAILNET_TOKEN` is unset | + +Keep **`TS_AUTHKEY`** separate from Quack tokens. 
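
The precedence in the table above can be checked before DuckDB starts. The following shell sketch is a hypothetical preflight helper: `resolve_quack_token` is not part of QuackScale, it assumes `quack_token()` applies the same precedence and minimum length listed above, and it treats an empty variable the same as an unset one.

```sh
# Hypothetical preflight helper; not part of QuackScale.
# Prints the shared Quack token, preferring QUACK_TAILNET_TOKEN and
# falling back to QUACK_TOKEN. An empty variable is treated as unset.
resolve_quack_token() {
    _qt_token="${QUACK_TAILNET_TOKEN:-${QUACK_TOKEN:-}}"

    # The table above asks for a token of at least 4 characters.
    if [ "${#_qt_token}" -lt 4 ]; then
        echo "Quack token missing or shorter than 4 characters" >&2
        return 1
    fi

    # Assumption: reusing the tailnet auth key as the Quack token is a
    # mistake worth catching, since the two layers are independent.
    if [ -n "${TS_AUTHKEY:-}" ] && [ "$_qt_token" = "$TS_AUTHKEY" ]; then
        echo "Quack token must differ from TS_AUTHKEY" >&2
        return 1
    fi

    printf '%s\n' "$_qt_token"
}
```

A launcher script could run `token="$(resolve_quack_token)" || exit 1` before starting DuckDB, so a misconfigured node fails fast instead of serving with an unexpected token.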
+ +--- + +## Quack auth modes + +### Mode 1 — Single shared token (recommended) + +**Server:** ```sql +LOAD quack; LOAD quackscale; -CALL tailscale_login( - hostname => 'my-laptop-duckdb', - state_dir => '~/.local/share/duckdb/quackscale' +CALL tailscale_up(hostname => 'warehouse-a', state_dir => '…'); + +CALL quack_serve( + 'quack:127.0.0.1:9494', + allow_other_hostname => true, + token => quack_token() ); --- Returns status, login_url, message +CALL tailscale_serve_local(port => 9494); +``` -CALL tailscale_login_status(); -- poll until status = 'up' +**Client** (after `tailscale_quack_forward` — see [GUIDE.md](GUIDE.md)): + +```sql +LOAD quack; + +CREATE SECRET ( + TYPE quack, + TOKEN 'your-shared-quack-secret', + SCOPE 'quack:127.0.0.1:19494' +); + +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack, DISABLE_SSL true); ``` -Open `login_url` in a browser and approve the device. tsnet may also print the same URL on DuckDB stderr. +`SCOPE` must match how the client reaches the server. With the forwarder, that is `quack:127.0.0.1:`. + +**Stateless queries:** -After the first login, reuse `state_dir`; later `CALL tailscale_up()` usually needs no browser. +```sql +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT 42', + token => 'your-shared-quack-secret', + disable_ssl => true +); +``` -### Self-hosted — Headscale +### Mode 2 — Token allowlist (rotation / teams) -[Headscale](https://github.com/juanfont/headscale) implements the Tailscale control server API. 
QuackScale uses the same knobs as the Tailscale CLI: +Use Quack’s [multi-token table](https://duckdb.org/docs/current/quack/security#example-multi-token-table): ```sql -CALL tailscale_up( - hostname => 'my-node', - control_url => 'https://headscale.example.com', - authkey => '', - state_dir => '/var/lib/duckdb/headscale-state' +CREATE TABLE quacktail_tokens (auth_token VARCHAR PRIMARY KEY, label VARCHAR); +INSERT INTO quacktail_tokens VALUES ('primary-2026', 'analytics'); + +CREATE MACRO quacktail_check_token(sid, client_token, server_token) AS ( + EXISTS (SELECT 1 FROM quacktail_tokens WHERE auth_token = client_token) ); +SET GLOBAL quack_authentication_function = 'quacktail_check_token'; ``` -Create keys with `headscale preauthkeys create`. Full walkthrough: **[HEADSCALE.md](HEADSCALE.md)** and [examples/headscale_quacktail.sql](../examples/headscale_quacktail.sql). +Validate **`client_token`** (what the caller sent), not `server_token`. -### CI / tests +### Mode 3 — Developer mode (lab only) -| Workflow | Control plane | -|----------|----------------| -| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Docker Headscale + `CALL tailscale_up` | -| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | Two-node QuackTail e2e (linux, manual dispatch) | -| [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | libtailscale `tstestcontrol` (`go test`) | +```sql +CREATE MACRO quacktail_dev_auth(sid, client_token, server_token) AS true; +SET GLOBAL quack_authentication_function = 'quacktail_dev_auth'; +``` -## SQL surface (Tailscale only) +**Not for production.** See [Quack developer mode](https://duckdb.org/docs/current/quack/security#example-developer-mode-always-allow). 
-Invoke with **`CALL`**, like Quack: +--- -| Command | Purpose | -|---------|---------| -| `CALL tailscale_up(...)` | Blocking join; `authkey` or `TS_AUTHKEY`; optional `state_dir`, `control_url`, `ephemeral` | -| `CALL tailscale_login(...)` | Background join; returns `login_url` | -| `CALL tailscale_login_status()` | Poll `status`, `login_url`, tailnet IPs | -| `CALL tailscale_status()` | Linked?, running, hostname, IPs | +## End-to-end checklist -## Environment variables +**Each long-lived server** -| Variable | Effect | -|----------|--------| -| `TS_AUTHKEY` | Tailscale auth key if not passed in `CALL tailscale_up` | -| `TSNET_FORCE_LOGIN` | Force interactive login even if an auth key is set (rare) | +1. `export TS_AUTHKEY` (or Headscale preauth key) and `export QUACK_TAILNET_TOKEN` +2. `LOAD quack; LOAD quackscale;` +3. `CALL tailscale_up(...)` with persistent `state_dir` +4. Optional: `SET GLOBAL quack_authentication_function` (Modes 2–3) +5. `CALL quack_serve(..., token => quack_token()); CALL tailscale_serve_local(port => 9494);` +6. Do **not** call `tailscale_down()` on steady-state servers + +**Each one-shot client** -**Quack tokens are separate:** `QUACK_TAILNET_TOKEN` / `QUACK_TOKEN` — see [QUACK_AUTH.md](QUACK_AUTH.md). +1. Same `QUACK_TAILNET_TOKEN` available for secrets / `quack_query` +2. `LOAD quackscale; CALL tailscale_up(...); CALL tailscale_quack_forward(...);` +3. `LOAD quack; CREATE SECRET ...;` then query / attach +4. `DETACH remote; SELECT 'done'; CALL tailscale_down();` — required or the process hangs + +--- -## Security notes +## Security -- Treat `TS_AUTHKEY` like any infrastructure secret. -- Tailnet [ACLs](https://tailscale.com/kb/1018/acls) should restrict who can reach peer TCP **9494** (Quack). -- QuackScale advertises `quack:` URIs; it does not replace Quack’s application-level auth. 
+- Rotate `QUACK_TAILNET_TOKEN` like an API key; update servers and clients together +- Restrict tailnet ACLs to who may reach peer TCP **9494** +- `allow_other_hostname => true` is for tailnet binds — do not expose raw Quack on the public internet without TLS in front ([Quack exposure model](https://duckdb.org/docs/current/quack/security#exposure-model)) -## Related reading +## References -- [QUACK_AUTH.md](QUACK_AUTH.md) — Quack / QuackTail application tokens -- [HEADSCALE.md](HEADSCALE.md) — self-hosted Headscale -- [libtailscale](https://github.com/tailscale/libtailscale) -- [Headscale](https://github.com/juanfont/headscale) +- [Quack security](https://duckdb.org/docs/current/quack/security) +- [Quack overview — Authentication](https://duckdb.org/docs/current/quack/overview#authentication) - [Tailscale auth keys](https://tailscale.com/kb/1085/auth-keys) +- [Headscale docs](https://headscale.net/) diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md new file mode 100644 index 0000000..095181d --- /dev/null +++ b/docs/DEVELOPMENT.md @@ -0,0 +1,109 @@ +# Development + +This document is for **extension contributors** — building QuackScale, updating DuckDB, and CI. Integrators should read [GUIDE.md](GUIDE.md) and [AUTHENTICATION.md](AUTHENTICATION.md). + +## What QuackScale is + +QuackScale is a DuckDB community extension embedding [libtailscale](https://github.com/tailscale/libtailscale) so a DuckDB process can join a tailnet and reach the [Quack](https://duckdb.org/docs/current/quack/overview) HTTP protocol on tailnet addresses. + +QuackScale does **not** reimplement Quack. It provides tailnet lifecycle SQL, a localhost forwarder for Quack clients, and helpers such as `attach_ducklake`. + +```text +DuckDB + quackscale + libtailscale + → tailscale_up, tailscale_quack_forward, quack_uri, attach_ducklake +DuckDB + quack (core) + → quack_serve, ATTACH, quack_query +``` + +## Build + +Prerequisites: C++17, cmake, ninja or make, Go 1.25+ (CGO), git submodules. 
+ +```sh +git clone --recurse-submodules https://github.com/quackscience/duckdb-quackscale.git +cd duckdb-quackscale +GEN=ninja make release +``` + +Artifacts: + +- `build/release/duckdb` +- `build/release/extension/quackscale/quackscale.duckdb_extension` + +Disable libtailscale (stub build): + +```sh +make CMAKE_VARS="-DQUACKSCALE_WITH_TAILSCALE=OFF" +``` + +Docker Compose images build from source by default — see [examples/Dockerfile](../examples/Dockerfile) and `.dockerignore`. + +## Repository layout + +```text +cmake/Libtailscale.cmake Go c-archive build + Go 1.25.5 bootstrap +third_party/libtailscale/ git submodule +src/ C++ extension (bridge, forwarder, attach_ducklake) +scripts/e2e/ Compose entrypoint, bootstrap, verify-image +examples/ Docker Compose two-node demo +duckdb/ DuckDB submodule +extension-ci-tools/ Extension build makefile submodule +``` + +## libtailscale integration + +- Built with `go build -buildmode=c-archive` → `libtailscale.a` +- C API: `tailscale_up`, `tailscale_dial`, `tailscale_close`, etc. +- CMake option `QUACKSCALE_WITH_TAILSCALE` (default ON) +- Ubuntu Docker builder needs `build-essential` and `patch` for the libtailscale patch step + +## Updating DuckDB + +When bumping the DuckDB target: + +1. Update `./duckdb` submodule to the latest stable tag +2. Update `./extension-ci-tools` to the branch matching that DuckDB version (e.g. `v1.5.3`) +3. Update `duckdb_version` in [MainDistributionPipeline.yml](../.github/workflows/MainDistributionPipeline.yml) +4. 
Rebuild — the DuckDB C++ API is not stable; fix compile breaks using [release notes](https://github.com/duckdb/duckdb/releases) and core extension patches + +## CI workflows + +| Workflow | Purpose | +|----------|---------| +| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Build from source + Headscale smoke | +| [docker-compose-build.yml](../.github/workflows/docker-compose-build.yml) | Build compose image + verify-image | +| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | Two-node e2e with release binary | +| [Release.yml](../.github/workflows/Release.yml) | Linux release tarball on GitHub Release | +| [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | libtailscale `go test` | + +## Roadmap (selected) + +| Item | Status | +|------|--------| +| `tailscale_up`, `tailscale_quack_forward`, `tailscale_down` | Done | +| `attach_ducklake` (Tier 2 remote lake views) | Done | +| Headscale + Compose e2e | Done | +| `ATTACH … TYPE quacktail_lake` (Tier 3 native catalog) | Planned | +| `ducklake_discover()` enriched discovery | Planned | +| `quackscale_serve()` one-call server bootstrap | Planned | +| Community extension descriptor publish | Planned | + +## Risks + +| Risk | Mitigation | +|------|------------| +| Large binary (Go runtime) | Document size; `QUACKSCALE_WITH_TAILSCALE=OFF` stub | +| Quack API churn | Pin DuckDB; integration tests against pinned quack | +| Secrets in SQL | Env / orchestrator secrets — see [AUTHENTICATION.md](AUTHENTICATION.md) | + +## Tests + +```sh +make test +``` + +SQL unit tests do not require a live tailnet. E2e: [test/e2e/README.md](../test/e2e/README.md), [examples/README.md](../examples/README.md). + +## License + +MIT (extension template). libtailscale is [BSD-3-Clause](https://github.com/tailscale/libtailscale/blob/main/LICENSE). 
diff --git a/docs/DUCKLAKE_REMOTE_ATTACH.md b/docs/DUCKLAKE_REMOTE_ATTACH.md deleted file mode 100644 index 4492695..0000000 --- a/docs/DUCKLAKE_REMOTE_ATTACH.md +++ /dev/null @@ -1,116 +0,0 @@ -# Remote DuckLake attach (server-owned Parquet) - -Goal: query **`lake.inventory`** on a tailnet client with normal SQL — no hand-written `quack_query(...)` — when the DuckLake catalog and Parquet files live **only on the server**. - -Existing patterns stay supported; this adds an optional facade. - -## Why the obvious attaches fail - -| Approach | What happens | -|----------|----------------| -| `ATTACH 'quack:…' AS remote` | Primary catalog only. **`remote.lake.*` does not exist.** | -| `ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')` | Catalog metadata over Quack; **Parquet reads use client `DATA_PATH`**. Compose demo stores files on the server volume only → hang or empty reads. | -| `quack_query(uri, 'SELECT … FROM lake.t')` | **Works** — SQL runs on server where DuckLake is attached. Verbose and easy to get wrong (quoting, session ordering). | - -Reference: [DuckDB 1.5.3 DuckLake + Quack](https://duckdb.org/2026/05/20/announcing-duckdb-153.html), [usage.md](usage.md) patterns B/C. - -## Design tiers - -### Tier 1 — `quack_query` (today, unchanged) - -```sql -FROM quack_query( - 'quack:127.0.0.1:19494', - 'SELECT * FROM lake.inventory', - token => '…', - disable_ssl => true -); -``` - -### Tier 2 — `CALL attach_ducklake(...)` (**implemented**) - -One-time setup per session: discover remote tables via `quack_query`, create **local views** that delegate to the server. 
- -```sql -LOAD quack; -LOAD quackscale; - -CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); - -CALL attach_ducklake( - 'quack:127.0.0.1:19494', - remote_catalog => 'lake', - alias => 'lake', - token => '…', - disable_ssl => true -); --- → local_view | remote_table | status --- lake.inventory | lake.inventory | created - -SELECT * FROM lake.inventory ORDER BY item_id; -SELECT 'LAKE_PASSED' AS status, COUNT(*)::INTEGER AS inventory_rows FROM lake.inventory; -``` - -**How it works** - -1. `quack_query` → `duckdb_tables()` on the **server** for `database_name = remote_catalog` -2. For each table: `CREATE OR REPLACE VIEW alias.table AS FROM quack_query(..., 'SELECT * FROM lake.table', ...)` -3. Client reads look like normal SQL; execution still happens on the server (server reads its Parquet) - -**Limits (Tier 2)** - -- No predicate/column pushdown (each scan is `SELECT *` on the server unless you add filters manually) -- Inserts/updates/deletes not supported through views (read path only for now) -- Re-run `CALL attach_ducklake` after server schema changes to refresh views -- Still obeys [Quack streaming-scan rules](QUACK_STREAMING.md) for attached `quack:` catalogs in the **same** statement; view scans use `quack_query`, not Quack attach scans - -### Tier 3 — `ATTACH … TYPE quacktail_lake` (planned) - -Register a **StorageExtension** in quackscale so attach syntax is native: - -```sql -ATTACH 'quacktail-lake:quack:127.0.0.1:19494/lake' AS lake ( - TYPE quacktail_lake, - TOKEN '…', - DISABLE_SSL true -); -SELECT * FROM lake.inventory; -``` - -Requires a custom `Catalog` + table scan operator that issues remote reads (same transport as Tier 2, better planner integration and room for pushdown later). Does not need client `DATA_PATH`. - -**Not implemented yet** — Tier 2 unblocks compose and docs without DuckDB catalog internals. 
- -## Comparison - -| | `quack_query` | Tier 2 views | `ducklake:quack:` | Tier 3 attach | -|--|---------------|--------------|-------------------|---------------| -| Server-owned Parquet | Yes | Yes | No (needs shared path) | Yes | -| SQL ergonomics | Poor | Good | Best (when paths align) | Best | -| Pushdown | N/A | No | Yes (local Parquet) | Possible | -| DML | Via wrapped SQL | Read-only | Full DuckLake | Planned | -| Changes quack/ducklake | No | No | No | No | - -## Compose demo - -When the built quackscale includes `attach_ducklake`, bootstrap generates: - -```sql -CALL attach_ducklake('quack:127.0.0.1:19494', …); -SELECT * FROM lake.inventory …; -``` - -Older images fall back to `quack_query` automatically. - -## Session order (unchanged) - -1. `tailscale_up` → `tailscale_quack_forward` -2. `LOAD quack` + secret -3. **Lake reads** (Tier 2 or `quack_query`) **before** `ATTACH quack AS remote` if mixing with e2e attach in one session -4. `DETACH remote` + `tailscale_down` for one-shot clients - -## Related - -- [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md) — tailnet-specific notes -- [usage.md](usage.md) — patterns A–D -- `ducklake_discover()` — planned enriched discovery (hostname + lake catalogs) diff --git a/docs/DUCKLAKE_TAILNET.md b/docs/DUCKLAKE_TAILNET.md deleted file mode 100644 index f3023ed..0000000 --- a/docs/DUCKLAKE_TAILNET.md +++ /dev/null @@ -1,62 +0,0 @@ -# DuckLake over QuackTail - -Goal: serve DuckLake on a QuackTail node via **Quack**, reachable on the **Headscale tailnet** — **find** endpoints and **query** tables from any tailnet client. 
- -## Status on branch `ducklake` - -| Piece | Status | -|-------|--------| -| Server: local DuckLake + `quack_serve` + `tailscale_serve_local` | **Done** | -| Client: `quack_query` → `lake.inventory` (before ATTACH remote) | **Done** | -| Client `ducklake:quack:` attach (client-side `DATA_PATH`) | Documented — use when Parquet is local/shared | -| `ducklake_discover()` / enriched `quack_discover` | TBD | - -## Find + query on tailnet - -### 1) Find Quack servers on the tailnet - -Use **`tailscale_quack_forward`** (returns `quack_uri`) or **`CALL quack_discover()`** on a node that runs quackscale locally. - -Do **not** run `quack_query(..., 'FROM quack_discover()')` — it can deadlock when the server executes discover inside Quack's query handler (tsnet + quack lock contention). Remote discovery is TBD (`ducklake_discover()`). - -```sql -CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494); --- → quack_uri = quack:127.0.0.1:19494 -``` - -### 2) Query DuckLake tables - -Run lake SQL **before** `ATTACH … AS remote` in the same session (mixing attached Quack catalog + extra `quack_query` calls can stall). - -```sql -FROM quack_query( - 'quack:127.0.0.1:19494', - 'SELECT * FROM lake.inventory ORDER BY item_id', - token => '…', - disable_ssl => true -); -``` - -**Why not `remote.lake.inventory`?** Plain `ATTACH 'quack:…' AS remote` exposes the primary catalog only — not nested attached DuckLake catalogs. - -**Why not `ducklake:quack:` in compose?** That pattern ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)) uses Quack as the catalog DB and requires client `DATA_PATH` to resolve Parquet. Our demo stores Parquet on the **server volume** only; `ducklake:quack:` attach can block when paths don't align. Use it when the client has a local or object-store `DATA_PATH` (`s3://…`). 
- -## Architecture - -```text -┌─────────────────────┐ tailnet ┌─────────────────────┐ -│ quacktail-client │ ◄──────────────► │ quacktail-server │ -│ quack_query(…) │ │ ATTACH ducklake:… │ -│ (find + lake SQL) │ │ quack_serve │ -└─────────────────────┘ │ ducklake-lake vol │ - └─────────────────────┘ -``` - -## Constraints - -- **Quack streaming-scan limit** — one remote Quack read/write per SQL statement; see [QUACK_STREAMING.md](QUACK_STREAMING.md). Each `quack_query` call is one statement. -- **Discovery** — use `tailscale_quack_forward` / local `quack_discover()`; not `quack_query(..., quack_discover)` (deadlocks). - -## Demo - -[examples/ducklake/README.md](../examples/ducklake/README.md) diff --git a/docs/GUIDE.md b/docs/GUIDE.md new file mode 100644 index 0000000..a55cff6 --- /dev/null +++ b/docs/GUIDE.md @@ -0,0 +1,382 @@ +# QuackTail integration guide + +QuackTail combines: + +1. **Tailscale or Headscale** — private mesh between nodes +2. **Quack** — DuckDB’s HTTP protocol (`quack:` URIs, port **9494**) +3. **QuackScale** — joins DuckDB to the tailnet and forwards Quack across it +4. **DuckLake** (optional) — lakehouse catalog + Parquet on a QuackTail node + +QuackScale does **not** replace Quack or DuckLake. It makes them reachable on MagicDNS / `100.x.x.x` without exposing the public internet. + +Credentials: [AUTHENTICATION.md](AUTHENTICATION.md). Build and SQL reference: [README.md](../README.md). 
+ +--- + +## Mental model + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ quacktail-server (long-lived) │ +│ tailscale_up → quack_serve(127.0.0.1:9494) → tailscale_serve_local +│ optional: ATTACH ducklake:… AS lake (local or s3:// Parquet) │ +└───────────────────────────────┬─────────────────────────────────┘ + │ tailscale_dial (encrypted) +┌───────────────────────────────▼─────────────────────────────────┐ +│ quacktail-client (job, laptop, container) │ +│ tailscale_up → tailscale_quack_forward → quack:127.0.0.1:19494 │ +│ quack_query / attach_ducklake / ATTACH quack AS remote │ +│ tailscale_down() at end of one-shot sessions │ +└─────────────────────────────────────────────────────────────────┘ +``` + +**Why `tailscale_quack_forward`?** Quack uses normal HTTP/TCP. Embedded tsnet does not route kernel TCP to tailnet IPs. The forwarder listens on loopback and dials the peer via `tailscale_dial`. + +**Why `tailscale_down`?** `tailscale_up` and the forwarder start background threads. One-shot DuckDB processes **hang after SQL finishes** unless tsnet is shut down. + +--- + +## Choose a pattern + +```text +Remote DuckDB tables (CRUD, dashboards)? + └─► Pattern A: ATTACH 'quack:…' AS remote + +Lakehouse tables (Parquet, server owns all files)? + └─► Pattern B+: CALL attach_ducklake(...) then SELECT FROM lake.* + +Lakehouse (shared Parquet — S3, NFS, identical mount on each reader)? + └─► Pattern C: ATTACH 'ducklake:quack:…' AS lake (DATA_PATH 's3://…') + +Both operational tables + lake on one node? 
+ └─► Pattern D: Lake queries first, then ATTACH quack AS remote (separate statements) +``` + +| Pattern | Client SQL | Parquet location | Best for | +|---------|------------|------------------|----------| +| **A — Quack attach** | `ATTACH 'quack:…' AS remote` | Server DuckDB / memory | Shared tables, multi-writer Quack | +| **B — quack_query** | `quack_query(uri, 'SELECT … FROM lake.t')` | Server-only | Fallback when `attach_ducklake` unavailable | +| **B+ — attach_ducklake** | `CALL attach_ducklake(...)` then `SELECT … FROM lake.t` | Server-only | **Preferred** for server-owned lakes | +| **C — ducklake:quack** | `ATTACH 'ducklake:quack:…' (DATA_PATH '…')` | Shared store / mount | Many readers with object-store access | +| **D — Hybrid** | B/B+ then A in same session | Mixed | Ops tables + lake on one tailnet node | + +**Common mistake:** `SELECT * FROM remote.lake.inventory` — plain Quack attach exposes the **primary catalog only**, not nested DuckLake databases. Use B, B+, or C. + +--- + +## Standard client connection recipe + +This sequence is what the [Compose demo](../examples/README.md) proves: + +```sql +LOAD quackscale; + +CALL tailscale_up( + hostname => 'my-client', + control_url => 'http://headscale:8080', -- omit for Tailscale SaaS + authkey => '…', + state_dir => '/tmp/client-tailscale', + ephemeral => true +); + +CALL tailscale_quack_forward( + host => 'quacktail-server', + port => 9494, + local_port => 19494 +); +CALL tailscale_ping(host => 'quacktail-server', port => 9494); -- optional + +LOAD quack; +CREATE SECRET ( + TYPE quack, + TOKEN 'your-shared-token', + SCOPE 'quack:127.0.0.1:19494' +); + +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT 1 AS probe', + token => 'your-shared-token', + disable_ssl => true +); + +-- Pattern B+, A, C, or D statements here … + +DETACH remote; -- if Pattern A used +SELECT 'CLIENT_DEMO_DONE' AS status; -- before tailscale_down (compose watchdog) +CALL tailscale_down(); +``` + +--- + +## Use case 1 — Remote 
DuckDB hub (Pattern A) + +**Story:** A central DuckDB node serves live tables to analysts and services on the tailnet. + +### Server (long-lived) + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(hostname => 'analytics-hub', state_dir => '/var/lib/quacktail/hub', …); + +CREATE TABLE IF NOT EXISTS events (id INTEGER, payload VARCHAR, ts TIMESTAMP); + +CALL quack_serve( + 'quack:127.0.0.1:9494', + allow_other_hostname => true, + token => quack_token() +); +CALL tailscale_serve_local(port => 9494); + +FROM quack_discover(); +``` + +Run under systemd, Kubernetes, or the `quacktail-server` container. **Do not** call `tailscale_down()`. + +### Client + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL tailscale_quack_forward(host => 'analytics-hub', port => 9494, local_port => 19494); + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); + +SELECT * FROM remote.events ORDER BY ts DESC LIMIT 10; + +DETACH remote; +CALL tailscale_down(); +``` + +--- + +## Use case 2 — DuckLake on the server (Patterns B / B+) + +**Story:** One node holds the DuckLake catalog and Parquet; tailnet clients query without copying files. 
+ +### Server + +```sql +LOAD quack; +LOAD ducklake; +LOAD quackscale; + +CALL tailscale_up(hostname => 'lake-server', …); + +ATTACH 'ducklake:/data/lake/metadata/warehouse.ducklake' AS lake ( + DATA_PATH '/data/lake/parquet/' +); +-- Or: DATA_PATH 's3://bucket/prefix/' with httpfs secrets on the server + +-- Qualify with the lake catalog so the table lands in DuckLake, not the default database +CREATE TABLE IF NOT EXISTS lake.inventory (item_id INTEGER, quantity INTEGER); +INSERT INTO lake.inventory VALUES (101, 50); + +CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token()); +CALL tailscale_serve_local(port => 9494); +``` + +### Client — `attach_ducklake` (preferred) + +Creates local **views** that delegate to the server via `quack_query`: + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL tailscale_quack_forward(host => 'lake-server', port => 9494, local_port => 19494); + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +CALL attach_ducklake( + 'quack:127.0.0.1:19494', + remote_catalog => 'lake', + alias => 'lake', + token => '…', + disable_ssl => true +); + +SELECT * FROM lake.inventory ORDER BY item_id; +``` + +**How it works:** discovers tables on the server with `quack_query` → `duckdb_tables()`, then `CREATE VIEW lake.t AS FROM quack_query(..., 'SELECT * FROM lake.t', ...)`. + +**Limits:** read-only through views; no predicate pushdown; re-run after server schema changes. + +### Client — raw `quack_query` (fallback) + +```sql +FROM quack_query( + 'quack:127.0.0.1:19494', + 'SELECT * FROM lake.inventory ORDER BY item_id', + token => '…', + disable_ssl => true +); +``` + +Run lake SQL **before** `ATTACH 'quack:…' AS remote` in the same session. + +**Why not `ATTACH 'ducklake:quack:…'` here?** That pattern needs client-side `DATA_PATH` to resolve Parquet. When files live only on the server volume, reads hang or return empty. Use B/B+ instead. 
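
The "How it works" note above can be made concrete by printing the statement shape it describes. This shell sketch only emits SQL text and never contacts a server. `emit_lake_view_sql` is a hypothetical helper, the URI and token arguments are placeholders, the local alias is assumed to equal the remote catalog name (as in the example), and the real `attach_ducklake` may generate different SQL.

```sh
# Hypothetical helper; prints one view definition per table name in the
# shape described above. No identifier or string escaping is attempted.
# usage: emit_lake_view_sql <quack_uri> <catalog> <token> <table>...
emit_lake_view_sql() {
    if [ "$#" -lt 4 ]; then
        echo "usage: emit_lake_view_sql <quack_uri> <catalog> <token> <table>..." >&2
        return 2
    fi
    _lv_uri="$1"; _lv_catalog="$2"; _lv_token="$3"
    shift 3

    # One CREATE VIEW per remaining argument; each view delegates the
    # scan to the server through quack_query.
    for _lv_table in "$@"; do
        printf "CREATE VIEW %s.%s AS FROM quack_query('%s', 'SELECT * FROM %s.%s', token => '%s', disable_ssl => true);\n" \
            "$_lv_catalog" "$_lv_table" "$_lv_uri" "$_lv_catalog" "$_lv_table" "$_lv_token"
    done
}
```

Calling `emit_lake_view_sql 'quack:127.0.0.1:19494' lake "$token" inventory` prints a single statement for `lake.inventory`. In this sketch the token is embedded in the printed text, so the output should be handled like a secret.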
+ +--- + +## Use case 3 — Shared Parquet (Pattern C) + +**Story:** Catalog metadata flows over Quack; every reader reads Parquet from a **shared** path ([DuckDB 1.5.3 pattern](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). + +### Server (catalog only) + +```sql +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token()); +CALL tailscale_serve_local(port => 9494); +``` + +### Client + +```sql +LOAD ducklake; +LOAD quack; +LOAD quackscale; + +CALL tailscale_up(…); +CALL tailscale_quack_forward(host => 'lake-catalog', port => 9494, local_port => 19494); + +CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494'); + +ATTACH 'ducklake:quack:127.0.0.1:19494' AS warehouse ( + DATA_PATH 's3://my-bucket/lake/parquet/' +); + +SELECT * FROM warehouse.inventory; +CALL tailscale_down(); +``` + +**Requirements:** `DATA_PATH` reachable from **each client**; configure `httpfs` / cloud secrets on clients for `s3://`. + +--- + +## Use case 4 — Hybrid hub (Pattern D) + +Same tailnet node serves operational Quack tables **and** a DuckLake catalog: + +1. Lake reads: `attach_ducklake` or `quack_query` +2. Operational tables: `ATTACH 'quack:…' AS remote` +3. **One remote Quack read/write per SQL statement** (see limitations below) +4. 
End one-shot clients with `DETACH remote; CALL tailscale_down()` + +--- + +## Finding peers + +| Method | Status | Notes | +|--------|--------|-------| +| **`tailscale_quack_forward(host => '…')`** | Use today | Returns `quack_uri` for a known hostname | +| **`FROM quack_discover()`** on server | Use today | URIs this node advertises | +| **Config / DNS** | Use today | Stable Headscale hostnames set in Helm or compose config | +| **`quack_query(…, 'FROM quack_discover()')`** | Avoid | Can deadlock on the server | + +**Fleet pattern:** deploy nodes with stable hostnames (`analytics-hub`, `lake-server`); clients call `tailscale_quack_forward(host => 'analytics-hub', …)`. + +**Multiple servers:** use different local ports (`19494`, `19495`) or sequential sessions with `tailscale_down()` between peers. + +**Multiple lakes on one server:** attach each with a distinct alias; clients query with fully qualified names in `quack_query` or `attach_ducklake`. + +--- + +## Production notes + +### Object storage + +| Role | Approach | +|------|----------| +| Server owns lake | `DATA_PATH 's3://…'` in server `ATTACH ducklake`; clients use Pattern B+ | +| Readers with credentials | Pattern C: each client runs `ATTACH 'ducklake:quack:…' (DATA_PATH 's3://…')` | + +### Lifecycle + +| Deployment | `state_dir` | `tailscale_down` | +|------------|-------------|-------------------| +| Long-lived server | Persistent volume | Never in steady state | +| Cron / CI / compose client | Ephemeral OK | Always at end | + +DuckLake metadata can live in a file (`*.ducklake`), in Postgres, or in DuckDB; see [DuckLake attach](https://duckdb.org/docs/stable/duckdb/attach). 
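
For a long-lived server, a Postgres-backed metadata catalog is one of the options listed above. The following is a hedged sketch of what the server-side attach might look like, following the attach string format from the DuckLake documentation; the database name `ducklake_catalog`, the host `localhost`, and the data path are assumptions, and the Postgres database must already exist.

```sql
-- Sketch only: dbname, host, and DATA_PATH are placeholder assumptions.
INSTALL ducklake;
INSTALL postgres;
LOAD ducklake;

-- Metadata lives in Postgres; Parquet files stay on the server volume.
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' AS lake (
    DATA_PATH '/data/lake/parquet/'
);
```

Clients are unaffected by this choice in Patterns B / B+, since they only ever talk to the server over Quack and never open the metadata store themselves.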
+ +### Observability + +- Server: `CALL tailscale_status()`, `/work/server.log` in compose +- Client: `/work/client.out` in compose +- Readiness: `CALL tailscale_ping(host => 'peer', port => 9494)` before heavy queries + +--- + +## Limitations and workarounds + +| Issue | Workaround | +|-------|------------| +| `remote.lake.table` does not exist | Use `attach_ducklake`, `quack_query`, or `ducklake:quack:` | +| Client hangs after SQL completes | Emit a done marker, then `CALL tailscale_down()` | +| Kernel TCP to `100.x:9494` fails from tsnet client | Use `tailscale_quack_forward` | +| `quack_query` + `ATTACH remote` stalls | Run lake queries **before** attach; separate statements | +| `quack_query(…, 'FROM quack_discover()')` hangs | Discover locally or use a known hostname | + +### Quack “multiple streaming scans” + +This is a **core `quack` extension** limit, not QuackScale. A single SQL statement cannot perform more than one streaming read (or read + write) on the same attached Quack catalog. + +Fails (one remote read plus one remote write in a single statement): + +```sql +INSERT INTO remote.t SELECT 1, 'x' WHERE NOT EXISTS (SELECT 1 FROM remote.t WHERE id = 1); +``` + +Works (separate statements): + +```sql +INSERT INTO remote.t VALUES (1, 'x') ON CONFLICT DO NOTHING; +SELECT * FROM remote.t; +``` + +Upstream: [duckdb/duckdb#22605](https://github.com/duckdb/duckdb/issues/22605). Split statements or use `quack_query` for one-off remote SQL. 
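
When the insert really does depend on a remote read, one way to split it is to stage the read in a local temporary table first. This is a sketch under stated assumptions: `remote.t` is taken to have columns `(id, payload)`, the staging table name `pending` is hypothetical, and the behavior has not been verified against a live Quack server.

```sql
-- Statement 1: a single remote read, materialized locally.
-- Column names (id, payload) are assumed for illustration.
CREATE TEMP TABLE pending AS
SELECT 1 AS id, 'x' AS payload
WHERE NOT EXISTS (SELECT 1 FROM remote.t WHERE id = 1);

-- Statement 2: a single remote write that scans only local data.
INSERT INTO remote.t SELECT * FROM pending;

DROP TABLE pending;
```

Each statement touches the attached Quack catalog at most once, which is the shape the limit requires. The two steps are not atomic, so another writer could insert the same row in between; `ON CONFLICT DO NOTHING` remains the simpler choice when the target table has a suitable key.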
+ +--- + +## Runnable demos + +| Demo | Command | +|------|---------| +| **Two-node cluster + DuckLake** | [examples/README.md](../examples/README.md) | +| **DuckLake compose details** | [examples/ducklake/README.md](../examples/ducklake/README.md) | +| **Host DuckDB → compose stack** | `scripts/local_remote_headscale_test.sh` | +| **Network-only probe** | `docker compose --profile debug run --rm tailscale-probe` | + +```bash +git submodule update --init --recursive +cd examples +docker compose build quacktail-server quacktail-client +docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client +docker compose up -d --force-recreate headscale quacktail-server +docker compose --profile test run --rm quacktail-client +``` + +Expect `LAKE_PASSED`, `PASSED`, and `✓ Demo passed`. + +--- + +## Further reading + +| Resource | Topic | +|----------|--------| +| [AUTHENTICATION.md](AUTHENTICATION.md) | Tailnet + Quack credentials | +| [DEVELOPMENT.md](DEVELOPMENT.md) | Extension architecture and roadmap | +| [Quack overview](https://duckdb.org/docs/current/quack/overview) | Upstream Quack protocol | +| [DuckLake docs](https://duckdb.org/docs/stable/duckdb/ducklake) | Catalog, Parquet, attach | diff --git a/docs/HEADSCALE.md b/docs/HEADSCALE.md deleted file mode 100644 index 95a94f1..0000000 --- a/docs/HEADSCALE.md +++ /dev/null @@ -1,90 +0,0 @@ -# Headscale (self-hosted control plane) - -[Headscale](https://github.com/juanfont/headscale) is an open-source, self-hosted implementation of the **Tailscale control server**. Tailscale clients (and embedded **tsnet** / [libtailscale](https://github.com/tailscale/libtailscale)) treat it like Tailscale SaaS: you point at a custom **login server** URL and register with a **preauth key** or browser flow. - -QuackScale adds **no Headscale-specific code**. 
The integration curve is effectively **zero**: use `control_url` on `CALL tailscale_up` / `CALL tailscale_login` the same way you would pass `tailscale up --login-server`. - -| Topic | Doc | -|-------|-----| -| Tailscale SaaS auth keys, browser login | [AUTHENTICATION.md](AUTHENTICATION.md) | -| Quack HTTP tokens on the tailnet | [QUACK_AUTH.md](QUACK_AUTH.md) | -| Example SQL | [../examples/headscale_quacktail.sql](../examples/headscale_quacktail.sql) | - -## Mapping: Tailscale CLI → QuackScale - -| Tailscale CLI | QuackScale | -|---------------|------------| -| `tailscale up --login-server https://hs.example.com` | `control_url => 'https://hs.example.com'` | -| `--authkey tskey-...` or Headscale preauth key | `authkey => '...'` or `TS_AUTHKEY` | -| `--hostname` | `hostname => '...'` | -| state under `~/.local/share/tailscale` | `state_dir => '...'` | - -Headscale preauth keys are created with `headscale preauthkeys create` (not the Tailscale admin UI). See [Headscale — Getting started](https://headscale.net/stable/usage/getting-started/). - -## Minimal server setup - -1. Install and configure Headscale ([releases](https://github.com/juanfont/headscale/releases), [docs](https://headscale.net/)). -2. Set `server_url` in `config.yaml` to the URL **clients** use (e.g. `https://headscale.my.net`). -3. Create a user and reusable key: - -```sh -headscale users create quackscale -headscale preauthkeys create --user 1 --reusable --expiration 168h -``` - -4. 
On each DuckDB host: - -```sh -export HEADSCALE_URL='https://headscale.my.net' -export HEADSCALE_PREAUTH_KEY='' -export QUACK_TAILNET_TOKEN='' -./build/release/duckdb -``` - -```sql -LOAD quack; -LOAD quackscale; - -CALL tailscale_up( - hostname => 'duckdb-node-a', - control_url => getenv('HEADSCALE_URL'), - authkey => getenv('HEADSCALE_PREAUTH_KEY'), - state_dir => '/var/lib/duckdb/headscale-state' -); - -CALL quack_serve(quack_uri(), allow_other_hostname => true, token => quack_token()); -``` - -## Docker (lab / CI) - -Uses the [official Headscale container](https://headscale.net/stable/setup/install/container/) (`docker.io/headscale/headscale:0.28.0`) — no custom images. Started after checkout on Docker network `quacktail-ci` with hostname alias **`headscale`**. Control URL: **`http://headscale:8080`**. - -```sh -export HEADSCALE_CI_ROOT=$PWD -source scripts/lib/headscale_ci.sh -headscale_ci_start /tmp/headscale-data -./scripts/ci_headscale_smoke.sh # after make release -headscale_ci_stop -``` - -## CI in this repository - -| Workflow | What it validates | -|----------|-------------------| -| [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | libtailscale `go test` (tstestcontrol) | -| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Headscale + `CALL tailscale_up` smoke | -| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | Two-node QuackTail e2e (manual; uses [release](../.github/workflows/Release.yml) binary) | -| [Release.yml](../.github/workflows/Release.yml) | Build linux `duckdb` + quackscale on **Release published** | - -## Notes - -- **DERP / NAT**: Headscale can use public Tailscale DERP relays (`derp.urls` in config) or your own; mesh connectivity depends on your network, not QuackScale. -- **TLS**: Production `server_url` should be `https://…`; lab CI uses plain `http://127.0.0.1:8080`. 
-- **MagicDNS**: Optional; `quack_uri()` prefers MagicDNS when Headscale provides it, else tailnet IP. -- Headscale is **not** affiliated with Tailscale Inc.; QuackScale links both projects as compatible stacks for QuackTail. - -## Related - -- [Headscale repository](https://github.com/juanfont/headscale) -- [Tailscale tsnet](https://tailscale.com/kb/1522/tsnet-server) -- [AUTHENTICATION.md](AUTHENTICATION.md) diff --git a/docs/PLAN.md b/docs/PLAN.md deleted file mode 100644 index 95fbaaf..0000000 --- a/docs/PLAN.md +++ /dev/null @@ -1,150 +0,0 @@ -# QuackScale — Research & Implementation Plan - -QuackScale is a DuckDB **community extension** that embeds [libtailscale](https://github.com/tailscale/libtailscale) so a DuckDB process can join a tailnet and expose (or reach) the [Quack](https://duckdb.org/docs/current/quack/overview) remote protocol on tailnet addresses instead of only localhost. - -## Problem - -[Quack](https://duckdb.org/docs/current/quack/overview) turns DuckDB into an HTTP server (`quack_serve`) so other DuckDB clients can `ATTACH` or run `quack_query` remotely. By default, `quack_serve` only binds **localhost** unless `allow_other_hostname => true`, and production setups typically use a TLS reverse proxy. - -For private teams, binding Quack on a **Tailscale IP** gives: - -- Encrypted tailnet transport without exposing the service on the public internet -- Stable reachability via MagicDNS / tailnet IPs -- No custom wire protocol — Quack stays HTTP + `application/duckdb` - -QuackScale does **not** reimplement Quack. It brings the process onto the tailnet; Quack remains the core `quack` extension. 
- -## Architecture - -```mermaid -flowchart LR - subgraph process [DuckDB + QuackScale] - QS[tailscale_up / status] - LT[libtailscale / tsnet] - Q[quack extension] - QS --> LT - Q --> HTTP[HTTP :9494] - LT --> TSIP[tailnet IP] - HTTP --> TSIP - end - Client[DuckDB client] -->|quack:100.x.x.x:9494| TSIP -``` - -### Component roles - -| Component | Role | -|-----------|------| -| **libtailscale** | Userspace Tailscale (tsnet): auth, tailnet IP, listen/dial on tailnet | -| **QuackScale** | C++ extension: lifecycle SQL API, tailnet IP → `quack:` URI helpers | -| **Quack (core)** | HTTP server, serialization, attach/catalog — unchanged | - -### Target user flow (phase 1 — manual compose) - -```sql -LOAD quack; -LOAD quackscale; - --- Join tailnet (authkey via env/secret in production) -CALL tailscale_up(authkey => 'tskey-auth-...', hostname => 'analytics-duck-1'); - --- Advertise Quack on the tailnet (default port 9494) -CALL quack_discover(); - --- Expose Quack on tailnet with shared token (env: QUACK_TAILNET_TOKEN) -CALL quack_serve( - quack_uri(), - allow_other_hostname => true, - token => quack_token() -); - --- Remote client: same token via CREATE SECRET or TOKEN (see QUACK_AUTH.md) -ATTACH 'quack:analytics-duck-1:9494' AS remote (TYPE quack, DISABLE_SSL true); -``` - -Phase 2 may add `quackscale_serve()` that chains up + `quack_serve` in one call (needs stable inter-extension hooks or documented SQL orchestration). - -## Authentication (two layers) - -| Layer | Doc | -|-------|-----| -| **Tailscale** (node on tailnet) | [AUTHENTICATION.md](AUTHENTICATION.md) — `TS_AUTHKEY`, `CALL tailscale_up`, browser login | -| **Quack** (SQL over HTTP) | [QUACK_AUTH.md](QUACK_AUTH.md) — `QUACK_TAILNET_TOKEN`, `quack_token()`, `CREATE SECRET`, `quack_authentication_function` | - -QuackTail fleets should use a **shared Quack token** (or allowlist), not per-server random `auth_token` values from `quack_serve`. 
See [Quack — Overriding authentication](https://duckdb.org/docs/current/quack/security#overriding-authentication). - -## libtailscale integration notes - -- Built with `go build -buildmode=c-archive` → `libtailscale.a` + generated header (see [libtailscale README](https://github.com/tailscale/libtailscale)). -- C API: `tailscale_new`, `tailscale_up`, `tailscale_getips`, `tailscale_listen` / `tailscale_dial`, etc. ([`tailscale.h`](https://github.com/tailscale/libtailscale/blob/main/tailscale.h)). -- **Build requirement**: Go toolchain + CGO; CMake option `QUACKSCALE_WITH_TAILSCALE` (default ON). -- **CI implication**: extension distribution jobs must install Go; first bootstrapped CI may only run tests that do not call `tailscale_up` without credentials. - -### Risks - -| Risk | Mitigation | -|------|------------| -| Large binary (Go runtime) | Document size; optional `QUACKSCALE_WITH_TAILSCALE=OFF` stub build | -| macOS min OS (libtailscale Makefile targets 15.0 for some Swift paths) | CMake sets `MACOSX_DEPLOYMENT_TARGET=11.0` for archive build; validate in CI | -| Quack API churn (beta) | Pin DuckDB version; integration tests against pinned `quack` | -| Auth secrets in SQL | `TS_AUTHKEY` + `QUACK_TAILNET_TOKEN` via env/secrets; see [QUACK_AUTH.md](QUACK_AUTH.md) | - -## Quack protocol recap (relevant bits) - -From the [Quack overview](https://duckdb.org/docs/current/quack/overview): - -- HTTP(S), default port **9494**, URI scheme `quack:host:port` -- Server: `CALL quack_serve('quack:...', allow_other_hostname => true, token => '...')` -- Client: `ATTACH 'quack:host' AS db (TOKEN '...', DISABLE_SSL true)` for tailnet HTTP without public TLS -- Token auth via secrets or explicit `TOKEN` - -QuackScale’s job is **advertising** reachable **`quack::9494`** endpoints on the tailnet after `tailscale_up` — MagicDNS hostname first (for discovery), plus each tailnet IP. 
- -## SQL API (bootstrapped) - -| Function | Purpose | -|----------|---------| -| `CALL tailscale_status()` | Whether libtailscale is linked, running, hostname, tailnet IPs | -| `CALL tailscale_up(...)` | Join tailnet; named params: `hostname`, `authkey`, `control_url`, `state_dir`, `ephemeral` | -| `CALL quack_discover(port => 9494)` | All tailnet `quack:` URIs clients can use (default port **9494**) | -| `quack_uri()` | Scalar for `CALL quack_serve(quack_uri(), ...)` (hostname if set, else first IP; port **9494**) | -| `quack_token()` | Scalar — shared token from `QUACK_TAILNET_TOKEN` / `QUACK_TOKEN` env | - -Planned: - -- `tailscale_down()` — `tailscale_close` -- `quack_serve_on_tailnet(port, ...)` — orchestrate Quack when `quack` is loaded -- Settings: default port, auto-load `quack`, state directory -- [x] Headscale CI smoke (`scripts/ci_headscale_smoke.sh`, `.github/workflows/headscale-integration.yml`) -- [x] Two-node QuackTail e2e over Headscale ([`scripts/ci_headscale_e2e.sh`](../scripts/ci_headscale_e2e.sh), [`.github/workflows/headscale-e2e.yml`](../.github/workflows/headscale-e2e.yml)) - -## Repository layout - -``` -duckdb_tailscale/ -├── cmake/Libtailscale.cmake # Go c-archive build -├── third_party/libtailscale/ # git submodule -├── src/ -│ ├── quackscale_extension.cpp -│ └── tailscale_bridge.cpp -├── docs/PLAN.md -├── test/sql/quackscale.test -└── duckdb/ + extension-ci-tools/ # submodules -``` - -## Community extension checklist - -- [x] Fork/bootstrap [extension-template](https://github.com/duckdb/extension-template) -- [x] Rename to `quackscale` -- [x] libtailscale submodule + CMake -- [ ] Green `make` / `make test` locally -- [ ] Add Go to GitHub Actions (custom step or workflow env) -- [ ] PR to [community-extensions](https://github.com/duckdb/community-extensions) descriptor -- [ ] README: install `INSTALL quackscale FROM community` + Quack dependency docs -- [ ] Security section: tailnet ACLs, tokens, no Funnel unless explicit - -## Phased 
delivery - -1. **Bootstrap (current)** — template, libtailscale link, status/up/uri SQL, plan doc -2. **Quack glue** — docs + example script; optional `quackscale_serve` wrapper -3. **CI hardening** — Go in matrix, optional e2e with test auth key -4. **Community release** — descriptor, versioning aligned with DuckDB 1.5.x + Quack beta diff --git a/docs/QUACK_AUTH.md b/docs/QUACK_AUTH.md deleted file mode 100644 index fd4cec4..0000000 --- a/docs/QUACK_AUTH.md +++ /dev/null @@ -1,258 +0,0 @@ -# Quack authentication on a tailnet (QuackTail) - -This document covers **only Quack** — HTTP application tokens after your node is already on the tailnet. - -For **Tailscale node login** (`TS_AUTHKEY`, browser URL, `state_dir`), see **[AUTHENTICATION.md](AUTHENTICATION.md)**. - -| Doc | Topic | -|-----|--------| -| [AUTHENTICATION.md](AUTHENTICATION.md) | Tailscale — `tailscale_up`, `TS_AUTHKEY`, browser login | -| [PLAN.md](PLAN.md) | Architecture and roadmap | -| [../README.md](../README.md) | End-to-end quick start | - ---- - -## Goal: semi-automatic QuackTail peers - -You want DuckDB servers and clients on the **same tailnet** to: - -1. Find each other via `quack:hostname:9494` (QuackScale + MagicDNS). -2. Authenticate to Quack **without** copying a new random `auth_token` from every `CALL quack_serve`. - -That is supported today using Quack’s built-in **`token =>`** parameter and **[Overriding authentication](https://duckdb.org/docs/current/quack/security#overriding-authentication)** — no QuackScale changes to Quack’s wire protocol. 
- -## Two layers (do not merge them) - -| Layer | Proves | You configure | -|-------|--------|----------------| -| **Tailscale** | Machine is on the tailnet | `TS_AUTHKEY`, `CALL tailscale_up`, ACLs — [AUTHENTICATION.md](AUTHENTICATION.md) | -| **Quack** | Caller may use this DuckDB session over HTTP | `QUACK_TAILNET_TOKEN`, `CREATE SECRET`, or custom `quack_authentication_function` | - -A host on your tailnet is **not** automatically trusted for SQL. Tailscale is necessary but not sufficient. - -## What Quack does by default (and why you override it) - -From [Quack security — Default configuration](https://duckdb.org/docs/current/quack/security#default-configuration): - -1. `CALL quack_serve(...)` **generates a random token** and returns it in the `auth_token` column — unless you pass `token => '...'`. -2. The default hook `quack_check_token` requires **client token == server token** for that listener. - -That default is fine for a single local experiment. For a **fleet** of QuackTail nodes, you want: - -- The **same** token on every server and client (env / secret manager), **or** -- A **shared allowlist** of valid tokens (SQL table + custom macro). - -QuackScale’s `quack_token()` only helps read a shared token from the environment on the **server** side. Clients still use `CREATE SECRET` or `TOKEN` with the same value. - -## Environment variables (Quack layer) - -Set on **both** servers and clients (container env, systemd, K8s `Secret`, etc.): - -| Variable | Role | -|----------|------| -| `QUACK_TAILNET_TOKEN` | **Preferred** — shared Quack auth token (≥ 4 characters) | -| `QUACK_TOKEN` | Fallback alias if `QUACK_TAILNET_TOKEN` is unset | - -Keep **`TS_AUTHKEY`** separate — it is Tailscale-only ([AUTHENTICATION.md](AUTHENTICATION.md)). - -Example provisioning: - -```sh -export TS_AUTHKEY='tskey-auth-...' 
-export QUACK_TAILNET_TOKEN='your-shared-quack-secret' -duckdb -``` - ---- - -## Mode 1 — Single shared token (recommended for QuackTail) - -One secret, same everywhere. Matches default `quack_check_token` when server and client use the same string. - -### Server - -```sql -LOAD quack; -LOAD quackscale; - -CALL tailscale_up( - hostname => 'warehouse-a', - state_dir => '/var/lib/duckdb/tailscale' -); - -CALL quack_serve( - quack_uri(), - allow_other_hostname => true, - token => quack_token() -- reads QUACK_TAILNET_TOKEN / QUACK_TOKEN -); - -CALL quack_discover(); -``` - -`quack_token()` fails fast if the env var is missing or shorter than 4 characters (Quack’s minimum). - -Without the helper, pass the literal explicitly: - -```sql -CALL quack_serve(quack_uri(), allow_other_hostname => true, token => 'same-secret-as-clients'); -``` - -### Client — semi-automatic with `CREATE SECRET` - -Create the secret once per client process (inject the token from your shell/env when launching DuckDB — do not hardcode in shared SQL files): - -```sql -LOAD quack; - -CREATE SECRET ( - TYPE quack, - TOKEN 'your-shared-quack-secret', - SCOPE 'quack:warehouse-a:9494' -); - -ATTACH 'quack:warehouse-a:9494' AS warehouse ( - TYPE quack, - DISABLE_SSL true -); - -FROM warehouse.query('SELECT 42'); -``` - -- **`SCOPE`** must match how clients reach the server (`quack::9494`). Use the hostname from `CALL tailscale_up(hostname => 'warehouse-a')`. -- After the secret exists, `ATTACH` needs no `TOKEN` clause — Quack picks it up automatically ([overview — Authentication](https://duckdb.org/docs/current/quack/overview#authentication)). 
- -### Client — explicit token per attach - -```sql -ATTACH 'quack:warehouse-a:9494' AS warehouse ( - TYPE quack, - TOKEN 'your-shared-quack-secret', - DISABLE_SSL true -); -``` - -### Client — stateless `quack_query` - -```sql -FROM quack_query( - 'quack:warehouse-a:9494', - 'SELECT 42', - token => 'your-shared-quack-secret', - disable_ssl => true -); -``` - ---- - -## Mode 2 — Shared allowlist (multiple tokens / rotation) - -When several tokens should work across the fleet (teams, rotation, read-only clients), use Quack’s **[multi-token table](https://duckdb.org/docs/current/quack/security#example-multi-token-table)** pattern. - -Run once per DuckDB database (or in your bootstrap migration): - -```sql -CREATE TABLE quacktail_tokens ( - auth_token VARCHAR PRIMARY KEY, - label VARCHAR -); - -INSERT INTO quacktail_tokens VALUES - ('primary-team-token-2026', 'analytics'), - ('readonly-team-token-2026', 'readonly'); - -CREATE MACRO quacktail_check_token(sid, client_token, server_token) AS ( - EXISTS (SELECT 1 FROM quacktail_tokens WHERE auth_token = client_token) -); - -SET GLOBAL quack_authentication_function = 'quacktail_check_token'; -``` - -Important behavior ([Overriding authentication](https://duckdb.org/docs/current/quack/security#overriding-authentication)): - -| Argument | Meaning | -|----------|---------| -| `client_token` | What the client sent (`TOKEN`, secret, or `quack_query`) — **validate this** | -| `server_token` | From `quack_serve(token => ...)` — you may **ignore** it when using a table | - -Any token listed in `quacktail_tokens` is accepted on **every** server that uses this macro (unless you add per-server logic). - -Start the server with a token that satisfies Quack’s length check (still pass `token =>`): - -```sql -CALL quack_serve(quack_uri(), allow_other_hostname => true, token => quack_token()); -``` - -Clients use **their** token from the allowlist (via `CREATE SECRET` or `TOKEN`). 
- -Populate `quacktail_tokens` from your deployment tool at startup (INSERT from env). The auth callback runs in a **fresh transient connection** — it cannot read your shell env by itself. - ---- - -## Mode 3 — Developer mode (tailnet is the only gate) - -Only on isolated lab tailnets. Admits **all** Quack clients with no token check: - -```sql -CREATE MACRO quacktail_dev_auth(sid, client_token, server_token) AS true; -SET GLOBAL quack_authentication_function = 'quacktail_dev_auth'; -``` - -See [developer mode](https://duckdb.org/docs/current/quack/security#example-developer-mode-always-allow). **Not for production.** - ---- - -## End-to-end checklist - -**On each server** - -1. `export TS_AUTHKEY` and `export QUACK_TAILNET_TOKEN` -2. `LOAD quack; LOAD quackscale;` -3. `CALL tailscale_up(hostname => '...', state_dir => '...');` -4. (Optional) `SET GLOBAL quack_authentication_function` if using Mode 2 or 3 -5. `CALL quack_serve(quack_uri(), allow_other_hostname => true, token => quack_token());` - -**On each client** - -1. Same `QUACK_TAILNET_TOKEN` available when creating secrets or attaching -2. `LOAD quack;` -3. `CREATE SECRET (TYPE quack, TOKEN '...', SCOPE 'quack::9494');` -4. `ATTACH 'quack::9494' AS ... 
(TYPE quack, DISABLE_SSL true);` - -**Network** - -- Tailscale ACL: allow tagged nodes → TCP 9494 on peers -- Quack default port: **9494** ([overview](https://duckdb.org/docs/current/quack/overview)) - ---- - -## Comparison: default vs QuackTail - -| | Default Quack | QuackTail (Mode 1) | -|---|---------------|---------------------| -| Server token | Random per `quack_serve` | Fixed via `QUACK_TAILNET_TOKEN` / `quack_token()` | -| Client setup | Copy `auth_token` from server output | Same env secret or `CREATE SECRET` | -| Discovery | Manual URI | `CALL quack_discover()`, `quack_uri()` | -| Transport | Often localhost | Tailscale tailnet + `allow_other_hostname => true` | - ---- - -## What QuackScale does not do (yet) - -- Does not call `quack_serve` or install auth macros automatically — compose SQL after `CALL tailscale_up` -- Does not sync tokens over Tailscale — use env, K8s, Vault, etc. -- Planned: `quacktail_serve()` helper chaining tailnet up + shared token + `quack_serve` - ---- - -## Security notes - -- Rotate `QUACK_TAILNET_TOKEN` like an API key; update servers, client secrets, and `quacktail_tokens` together -- Use [Tailscale ACLs](https://tailscale.com/kb/1018/acls) to limit who reaches port 9494 -- `allow_other_hostname => true` is for tailnet binds — do not expose raw Quack to the public internet without a TLS reverse proxy ([Quack security — Exposure model](https://duckdb.org/docs/current/quack/security#exposure-model)) - -## References - -- [Quack security](https://duckdb.org/docs/current/quack/security) — overriding authentication & authorization -- [Quack overview — Authentication](https://duckdb.org/docs/current/quack/overview#authentication) -- [DuckDB secrets manager](https://duckdb.org/docs/current/configuration/secrets_manager) -- [Tailscale authentication (QuackScale)](AUTHENTICATION.md) diff --git a/docs/QUACK_STREAMING.md b/docs/QUACK_STREAMING.md deleted file mode 100644 index a5c988a..0000000 --- a/docs/QUACK_STREAMING.md +++ /dev/null 
@@ -1,65 +0,0 @@ -# Quack “Multiple streaming scans” limitation - -This is **not** a QuackScale (`quackscale`) limitation. It comes from the **core `quack` extension** shipped with DuckDB. - -## Source code - -[`duckdb-quack/src/storage/quack_optimizer.cpp`](https://github.com/duckdb/duckdb-quack/blob/main/src/storage/quack_optimizer.cpp) - -Before executing a query, `QuackOptimizer` walks the plan and counts, per Quack connection: - -- **streaming scans** — reads from attached Quack tables (`LogicalGet` on Quack scans) -- **writes** — `INSERT` / `CREATE TABLE AS` targeting a Quack catalog - -If `scans + inserts > 1` **within the same query**, it throws: - -```text -Not implemented Error: Multiple streaming scans or streaming scans + CTAS / insert in the same query are not currently supported -``` - -## What triggers it - -Any **single SQL statement** that both: - -1. reads from an attached Quack catalog (`remote.table`, `FROM remote…`, subqueries), and -2. writes to the same attached Quack catalog (`INSERT INTO remote…`, CTAS into `remote`) - -Examples that fail: - -```sql --- INSERT + correlated read in one statement -INSERT INTO remote.t -SELECT 1, 'x' -WHERE NOT EXISTS (SELECT 1 FROM remote.t WHERE id = 1); - --- Multiple remote reads in one statement (e.g. SHOW TABLES on nested Quack catalogs) -SHOW TABLES; -``` - -Examples that work (separate statements, one remote op each): - -```sql -ATTACH 'quack:host:9494' AS remote (TYPE quack, DISABLE_SSL true); - -INSERT INTO remote.t VALUES (1, 'x') -ON CONFLICT DO NOTHING; - -SELECT * FROM remote.t; -``` - -`CALL tailscale_up()` is a **local** QuackScale table function — it is **not** a Quack streaming scan and is not the cause of this error. - -## Upstream status - -- Reported in [duckdb/duckdb#22605](https://github.com/duckdb/duckdb/issues/22605) (remote catalog / `SHOW TABLES`). 
-- A community PR to lift the restriction ([duckdb/duckdb-quack#126](https://github.com/duckdb/duckdb-quack/pull/126)) was **not merged** as of May 2026 — maintainers want smaller, incremental changes. - -QuackScale cannot patch this inside `quackscale`; fixes belong in **`duckdb-quack`** (or query shape/workarounds in client SQL). - -## Demo / DuckLake guidance - -For attached remote writes: - -- Prefer plain `INSERT INTO remote.t VALUES (…)` or `ON CONFLICT DO NOTHING` for idempotency. -- Avoid `INSERT … SELECT … WHERE NOT EXISTS (SELECT … FROM remote.t)` in one statement. -- Split read and write into **separate SQL statements**, or use `quack_query(uri, '…')` for one-off remote SQL when ATTACH + DML in one plan is awkward. diff --git a/docs/README.md b/docs/README.md index 3ea18f3..381aa4d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,23 +1,36 @@ -# QuackScale documentation +# QuackTail documentation + +QuackTail is **DuckDB + Quack + QuackScale** on a private tailnet (Tailscale or Headscale). These docs are for **integrators** — operators wiring servers, clients, tokens, and lake catalogs — not for extension C++ development. ## Start here -1. **[usage.md](usage.md)** — **use cases & solution patterns** (Quack, DuckLake, S3, discovery) -2. **[../README.md](../README.md)** — build, quick start, SQL reference -3. **[AUTHENTICATION.md](AUTHENTICATION.md)** — Tailscale (`TS_AUTHKEY`, `tailscale_up`, browser login) -4. **[HEADSCALE.md](HEADSCALE.md)** — self-hosted [Headscale](https://github.com/juanfont/headscale) (`control_url`, preauth keys) -5. **[QUACK_AUTH.md](QUACK_AUTH.md)** — Quack tokens for QuackTail (`QUACK_TAILNET_TOKEN`, shared secrets, auth macros) -6. **[DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md)** — DuckLake over the tailnet (compose demo, patterns B/C) -7. **[DUCKLAKE_REMOTE_ATTACH.md](DUCKLAKE_REMOTE_ATTACH.md)** — transparent remote lake reads (`attach_ducklake`) -8. **[PLAN.md](PLAN.md)** — architecture, API roadmap, risks -9. 
**[../examples/README.md](../examples/README.md)** — Docker Compose two-node Headscale demo - -## QuackTail authentication at a glance - -| Step | Layer | Action | -|------|--------|--------| -| 1 | Tailscale | `export TS_AUTHKEY=...` → `CALL tailscale_up(hostname => 'node-a', ...)` | -| 2 | Quack (server) | `export QUACK_TAILNET_TOKEN=...` → `CALL quack_serve(..., token => quack_token())` + `tailscale_serve_local` | -| 3 | Quack (client) | `CALL tailscale_quack_forward(...)` → `CREATE SECRET` → `ATTACH 'quack:127.0.0.1:19494'` | - -Do **not** rely on the random `auth_token` column from default `quack_serve`. Use a **shared** token or [override `quack_authentication_function`](https://duckdb.org/docs/current/quack/security#overriding-authentication). +| Document | Read when you need to… | +|----------|-------------------------| +| **[GUIDE.md](GUIDE.md)** | Pick a pattern, run use cases, connect clients, query DuckLake, avoid known pitfalls | +| **[AUTHENTICATION.md](AUTHENTICATION.md)** | Configure tailnet login, Headscale, and Quack HTTP tokens | +| **[../examples/README.md](../examples/README.md)** | Run the two-node Docker Compose demo | +| **[../README.md](../README.md)** | Build the extension from source or look up the SQL command reference | + +## Extension developers + +| Document | Contents | +|----------|----------| +| **[DEVELOPMENT.md](DEVELOPMENT.md)** | Architecture, roadmap, updating DuckDB submodules, CI | + +## Quick orientation + +```text +Tailscale / Headscale → Is this machine on our mesh? +Quack token → May this caller run SQL over HTTP? +tailscale_quack_forward → Route Quack from embedded tsnet to 127.0.0.1 +quack_serve + serve_local → Expose DuckDB on the tailnet (:9494) +``` + +Load both extensions in every session: + +```sql +LOAD quack; -- HTTP server, ATTACH, quack_query +LOAD quackscale; -- tailscale_up, forwarder, attach_ducklake, … +``` + +Do **not** copy the random `auth_token` from each `CALL quack_serve`. 
Use a **shared** fleet token — see [AUTHENTICATION.md](AUTHENTICATION.md). diff --git a/docs/UPDATING.md b/docs/UPDATING.md deleted file mode 100644 index a3ac73e..0000000 --- a/docs/UPDATING.md +++ /dev/null @@ -1,23 +0,0 @@ -# Extension updating -When cloning this template, the target version of DuckDB should be the latest stable release of DuckDB. However, there -will inevitably come a time when a new DuckDB is released and the extension repository needs updating. This process goes -as follows: - -- Bump submodules - - `./duckdb` should be set to latest tagged release - - `./extension-ci-tools` should be set to updated branch corresponding to latest DuckDB release. So if you're building for DuckDB `v1.1.0` there will be a branch in `extension-ci-tools` named `v1.1.0` to which you should check out. -- Bump versions in `./github/workflows` - - `duckdb_version` input in `duckdb-stable-build` job in `MainDistributionPipeline.yml` should be set to latest tagged release - - `duckdb_version` input in `duckdb-stable-deploy` job in `MainDistributionPipeline.yml` should be set to latest tagged release - - the reusable workflow `duckdb/extension-ci-tools/.github/workflows/_extension_distribution.yml` for the `duckdb-stable-build` job should be set to latest tagged release - -# API changes -DuckDB extensions built with this extension template are built against the internal C++ API of DuckDB. This API is not guaranteed to be stable. -What this means for extension development is that when updating your extensions DuckDB target version using the above steps, you may run into the fact that your extension no longer builds properly. - -Currently, DuckDB does not (yet) provide a specific change log for these API changes, but it is generally not too hard to figure out what has changed. 
-
-For figuring out how and why the C++ API changed, we recommend using the following resources:
-- DuckDB's [Release Notes](https://github.com/duckdb/duckdb/releases)
-- DuckDB's history of [Core extension patches](https://github.com/duckdb/duckdb/commits/main/.github/patches/extensions)
-- The git history of the relevant C++ Header file of the API that has changed
\ No newline at end of file
diff --git a/docs/usage.md b/docs/usage.md
deleted file mode 100644
index f91852f..0000000
--- a/docs/usage.md
+++ /dev/null
@@ -1,471 +0,0 @@
-# QuackTail usage guide
-
-QuackTail combines three ideas:
-
-1. **Tailscale (or Headscale)** — private mesh network between nodes
-2. **Quack** — DuckDB’s HTTP client/server protocol (`quack:` URIs)
-3. **QuackScale** — joins DuckDB to the tailnet and bridges Quack across it
-
-Optional fourth ingredient: **DuckLake** — transactional lakehouse catalog + Parquet, which [integrates with Quack in DuckDB 1.5.3+](https://duckdb.org/2026/05/20/announcing-duckdb-153.html).
-
-This guide is for **designing solutions**: what works today, how the compose demo maps to real deployments, and how to grow toward S3, many lakes, and fleet discovery.
-
----
-
-## What you can build
-
-| Idea | Building blocks | Tailnet role |
-|------|-----------------|--------------|
-| **Shared analytics DB** | `quack_serve` + client `ATTACH` | One DuckDB process serves tables; many clients query/write over Quack |
-| **Edge ingest → central DuckDB** | Quack `INSERT` from clients | Laptops/containers push rows to a central server without copying files |
-| **Lakehouse catalog server** | DuckLake on server + Quack | Server owns metadata + Parquet; clients query lake tables remotely |
-| **Distributed lake readers** | `ducklake:quack:` + shared `DATA_PATH` | Catalog over Quack; Parquet on S3 or a path every reader can see |
-| **Hybrid** | Quack primary DB + attached DuckLake | Operational tables via `remote.*`; historical Parquet via lake SQL |
-
-QuackScale’s job is **not** to replace Quack or DuckLake — it makes them reachable on `100.x.x.x` / MagicDNS without exposing the public internet.
-
----
-
-## Mental model
-
-```text
-┌─────────────────────────────────────────────────────────────────┐
-│ quacktail-server (long-lived) │
-│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
-│ │ quackscale │──►│ tsnet │──►│ tailnet :9494 │ │
-│ │ tailscale_up│ │ serve_local │ │ (MagicDNS / 100.x) │ │
-│ └─────────────┘ └──────────────┘ └───────────┬─────────────┘ │
-│ ┌─────────────┐ quack_serve(127.0.0.1:9494) │ │
-│ │ quack │◄────────────────────────────────┘ │
-│ │ DuckDB │ optional: ATTACH ducklake:… AS lake │
-│ └─────────────┘ Parquet → local disk or s3://bucket/prefix/ │
-└─────────────────────────────────────────────────────────────────┘
- ▲
- │ tailscale_dial (encrypted)
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ quacktail-client (batch job, laptop, second container) │
-│ tailscale_up → tailscale_quack_forward → quack:127.0.0.1:19494 │
-│ ATTACH / quack_query / ducklake:quack:… │
-│ CALL tailscale_down() ← one-shot clients must tear down tsnet │
-└─────────────────────────────────────────────────────────────────┘
-```
-
-**Two credentials, always:**
-
-| Layer | Question | Typical secret |
-|-------|----------|----------------|
-| Tailscale | Is this node on our mesh? | `TS_AUTHKEY` / Headscale preauth key |
-| Quack | May this caller run SQL? | `QUACK_TAILNET_TOKEN` + `CREATE SECRET` |
-
-See [AUTHENTICATION.md](AUTHENTICATION.md) and [QUACK_AUTH.md](QUACK_AUTH.md).
-
----
-
-## Choose a pattern
-
-```text
-Need remote DuckDB tables (CRUD, dashboards)?
-  └─► Pattern A: ATTACH 'quack:…' AS remote
-
-Need lakehouse tables (Parquet, time travel, many files)?
-  ├─► Server owns all Parquet?
-  │     └─► Pattern B: quack_query(server, 'SELECT … FROM lake.t')
-  └─► Clients can read same Parquet path (disk mount, S3, GCS)?
-        └─► Pattern C: ATTACH 'ducklake:quack:…' AS lake (DATA_PATH 's3://…')
-
-Both operational + lake on one node?
-  └─► Pattern D: Hybrid (B + A in one session — mind ordering & limits)
-```
-
-### Pattern comparison
-
-| Pattern | Client SQL | Parquet location | Best for |
-|---------|------------|------------------|----------|
-| **A — Quack attach** | `ATTACH 'quack:host:9494' AS remote` | Server DuckDB file / memory | Shared tables, multi-writer Quack |
-| **B — quack_query lake** | `quack_query(uri, 'SELECT … FROM lake.t')` | Server-only paths | Compose demo fallback |
-| **B+ — remote lake views** | `CALL attach_ducklake(...)` then `SELECT … FROM lake.t` | Server-only paths | **Preferred** when quackscale ≥ ducklake branch ([DUCKLAKE_REMOTE_ATTACH.md](DUCKLAKE_REMOTE_ATTACH.md)) |
-| **C — ducklake:quack** | `ATTACH 'ducklake:quack:host' AS lake (DATA_PATH '…')` | Shared object store or mount | Fleet of readers, [DuckDB 1.5.3 pattern](https://duckdb.org/2026/05/20/announcing-duckdb-153.html) |
-| **D — Hybrid** | B first, then A (separate statements) | Mixed | Apps + analytics on one tailnet node |
-
-**Common mistake:** `SELECT * FROM remote.lake.inventory` — plain Quack attach exposes the **primary catalog only**, not nested attached DuckLake databases. Use pattern B or C instead.
-
----
-
-## Use case 1 — Remote DuckDB over Quack (analytics hub)
-
-**Story:** A central DuckDB node serves live tables to analysts and services on the tailnet.
-
-### Server (long-lived)
-
-```sql
-LOAD quack;
-LOAD quackscale;
-
-CALL tailscale_up(
-  hostname => 'analytics-hub',
-  state_dir => '/var/lib/quacktail/hub',
-  authkey => getenv('TS_AUTHKEY') -- or Headscale preauth key
-);
-
-CREATE TABLE IF NOT EXISTS events (id INTEGER, payload VARCHAR, ts TIMESTAMP);
-INSERT INTO events VALUES (1, 'hello-tailnet', now());
-
-CALL quack_serve(
-  'quack:127.0.0.1:9494',
-  allow_other_hostname => true,
-  token => quack_token()
-);
-CALL tailscale_serve_local(port => 9494);
-
--- What clients can use:
-FROM quack_discover();
-```
-
-Keep this process running (systemd, Kubernetes, `quacktail-server` container). Do **not** call `tailscale_down()` on the server.
-
-### Client (analyst laptop or job)
-
-```sql
-LOAD quack;
-LOAD quackscale;
-
-CALL tailscale_up(hostname => 'analyst-laptop', state_dir => '…', authkey => '…');
-
--- Forward tailnet Quack to loopback (required when client uses embedded tsnet)
-CALL tailscale_quack_forward(host => 'analytics-hub', port => 9494, local_port => 19494);
-
-CREATE SECRET (
-  TYPE quack,
-  TOKEN getenv('QUACK_TAILNET_TOKEN'),
-  SCOPE 'quack:127.0.0.1:19494'
-);
-
-ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack);
-
-SELECT * FROM remote.events ORDER BY ts DESC LIMIT 10;
-INSERT INTO remote.events VALUES (2, 'from-client', now());
-
--- One-shot jobs: tear down so the process exits
-DETACH remote;
-CALL tailscale_down();
-```
-
-**Expand:** add read replicas (multiple Quack servers), token allowlists ([Quack security](https://duckdb.org/docs/current/quack/security)), TLS termination in front of Quack for non-tailnet clients.
-
-**Runnable demo:** [examples/README.md](../examples/README.md)
-
----
-
-## Use case 2 — DuckLake on a QuackTail node (server owns Parquet)
-
-**Story:** One node holds the DuckLake catalog and Parquet files; tailnet clients query inventory, metrics, or historical tables without copying the lake.
-
-### Server
-
-```sql
-LOAD quack;
-LOAD ducklake;
-LOAD quackscale;
-
-CALL tailscale_up(hostname => 'lake-server', …);
-
-ATTACH 'ducklake:/data/lake/metadata/warehouse.ducklake' AS warehouse (
-  DATA_PATH '/data/lake/parquet/'
-);
-USE warehouse;
-
-CREATE TABLE IF NOT EXISTS inventory (item_id INTEGER, quantity INTEGER);
-INSERT INTO inventory VALUES (101, 50), (102, 120);
-
-CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token());
-CALL tailscale_serve_local(port => 9494);
-```
-
-Persist `/data/lake/` on disk, EBS, or sync to object storage out-of-band.
-
-### Client — query via `quack_query` (verified in compose)
-
-Run lake SQL **on the server** through stateless Quack HTTP. Use **`quack_query` before `ATTACH remote`** in the same session.
-
-```sql
-LOAD quack;
-LOAD quackscale;
-
-CALL tailscale_up(…);
-CALL tailscale_quack_forward(host => 'lake-server', port => 9494, local_port => 19494);
-
-CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494');
-
--- Lake query (executes on server where `warehouse` is attached)
-FROM quack_query(
-  'quack:127.0.0.1:19494',
-  'SELECT * FROM warehouse.inventory ORDER BY item_id',
-  token => '…',
-  disable_ssl => true
-);
-
--- Optional: also use Quack attach for non-lake tables
-ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack);
-SELECT * FROM remote.some_operational_table;
-
-DETACH remote;
-CALL tailscale_down();
-```
-
-**Expand to S3 on the server:** attach DuckLake with `DATA_PATH 's3://my-bucket/lake/parquet/'` ([DuckLake remote data path](https://duckdb.org/docs/stable/duckdb/guides/using_a_remote_data_path)). Clients keep using `quack_query` — they never need direct S3 credentials if the server holds them.
-
-**Runnable demo:** [examples/ducklake/README.md](../examples/ducklake/README.md) (branch `ducklake`)
-
----
-
-## Use case 3 — DuckLake over Quack with shared Parquet (client-side `DATA_PATH`)
-
-**Story:** Catalog metadata flows over Quack; every reader resolves Parquet from a **shared** location (S3, GCS, NFS, read-only volume).
-
-Official [DuckDB 1.5.3 example](https://duckdb.org/2026/05/20/announcing-duckdb-153.html):
-
-### Server (catalog via Quack only)
-
-```sql
-LOAD quack;
-LOAD quackscale;
-
-CALL tailscale_up(hostname => 'lake-catalog', …);
-CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token());
-CALL tailscale_serve_local(port => 9494);
-```
-
-### Client
-
-```sql
-LOAD ducklake;
-LOAD quack;
-LOAD quackscale;
-
-CALL tailscale_up(…);
-CALL tailscale_quack_forward(host => 'lake-catalog', port => 9494, local_port => 19494);
-
-CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494');
-
-ATTACH 'ducklake:quack:127.0.0.1:19494' AS warehouse (
-  DATA_PATH 's3://my-bucket/lake/parquet/'
-);
--- Or: DATA_PATH '/mnt/shared/lake/parquet/' when NFS/K8s volume is mounted identically
-
-SELECT * FROM warehouse.inventory;
-INSERT INTO warehouse.inventory VALUES (103, 7);
-
-CALL tailscale_down();
-```
-
-**When to choose this:** many readers, object-store Parquet, clients need local DuckLake semantics (attach, `USE`, DML) — not just single-statement `quack_query`.
-
-**Requirements:**
-
-- `DATA_PATH` must be reachable from **each client** (same bucket prefix or shared mount).
-- Configure DuckDB `httpfs` / cloud secrets on clients for `s3://` paths.
-- Align with [DuckLake attach options](https://duckdb.org/docs/stable/duckdb/attach) (`OVERRIDE_DATA_PATH`, etc.) when migrating paths.
-
----
-
-## Use case 4 — Hybrid hub (Quack tables + DuckLake)
-
-**Story:** Same tailnet node serves live operational tables **and** a lake catalog.
-
-```text
-analytics-hub
-├── primary catalog → remote.orders, remote.users (Pattern A)
-└── ATTACH ducklake AS warehouse → lake SQL via Pattern B or C
-```
-
-**Session discipline:**
-
-1. Lake reads/writes: `quack_query(…, '… lake …')` **or** `ducklake:quack:` attach
-2. Operational tables: `ATTACH 'quack:…' AS remote`
-3. **One remote Quack read/write per SQL statement** ([QUACK_STREAMING.md](QUACK_STREAMING.md))
-4. End one-shot clients with `DETACH remote; CALL tailscale_down();`
-
----
-
-## Connection recipe (tailnet client)
-
-This sequence is what the compose e2e proves end-to-end:
-
-```sql
-LOAD quackscale;
-CALL tailscale_up(hostname => '…', control_url => '…', authkey => '…', …);
-
-CALL tailscale_quack_forward(host => 'peer-hostname', port => 9494, local_port => 19494);
-CALL tailscale_ping(host => 'peer-hostname', port => 9494); -- optional readiness
-
-LOAD quack;
-CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494');
-
-FROM quack_query('quack:127.0.0.1:19494', 'SELECT 1 AS probe', token => '…', disable_ssl => true);
-
--- Pattern B/C/D statements here …
-
-ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -- Pattern A
--- …
-
-DETACH remote;
-CALL tailscale_down();
-```
-
-**Why `tailscale_quack_forward`?** Quack clients use normal HTTP/TCP. Embedded tsnet does not automatically route kernel TCP to tailnet IPs. The forwarder listens on `127.0.0.1:19494` and dials the peer via `tailscale_dial`.
-
-**Why `tailscale_down`?** `tailscale_up` and the forwarder start background threads. One-shot DuckDB processes **hang after SQL finishes** unless you shut tsnet down.
-
----
-
-## Finding peers on the tailnet
-
-| Method | Scope | Status |
-|--------|--------|--------|
-| **`CALL tailscale_quack_forward(…)`** | Returns `quack_uri` for a **known** host | **Use today** |
-| **`FROM quack_discover()`** on **this** node | Lists URIs this node would advertise | **Use today** (server-side) |
-| **Config / service registry** | Helm values, Consul, env vars | **Use today** (operations) |
-| **`quack_query(…, 'FROM quack_discover()')`** | Remote discover via Quack | **Avoid** — can deadlock on server |
-| **`ducklake_discover()`** | Enriched discovery (lake + Quack) | **Planned** ([DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md)) |
-
-**Practical fleet pattern today:**
-
-1. Deploy lake/analytics nodes with stable Headscale/Tailscale hostnames (`analytics-hub`, `lake-server`).
-2. Document `quack_uri` from server bootstrap (`quack_discover()` in server init logs).
-3. Clients use MagicDNS names in `tailscale_quack_forward(host => 'analytics-hub', …)`.
-
-**Multiple lakes on one server:** attach each with a distinct alias:
-
-```sql
-ATTACH 'ducklake:/data/sales.ducklake' AS sales (DATA_PATH 's3://bucket/sales/');
-ATTACH 'ducklake:/data/support.ducklake' AS support (DATA_PATH 's3://bucket/support/');
-```
-
-Clients query with fully qualified names:
-
-```sql
-FROM quack_query(uri, 'SELECT * FROM sales.orders LIMIT 100', …);
-FROM quack_query(uri, 'SELECT count(*) FROM support.tickets', …);
-```
-
-Each call is one statement — compatible with Quack streaming limits.
-
-**Multiple Quack servers:** one forwarder port per peer (`19494`, `19495`, …) or sequential client sessions:
-
-```sql
-CALL tailscale_quack_forward(host => 'hub-a', port => 9494, local_port => 19494);
--- work with hub-a …
-CALL tailscale_down();
-
-CALL tailscale_up(…); -- or reuse session if your app keeps tsnet up
-CALL tailscale_quack_forward(host => 'hub-b', port => 9494, local_port => 19494);
-```
-
----
-
-## Expanding toward production
-
-### Object storage (S3 / GCS / Azure)
-
-| Role | Approach |
-|------|----------|
-| **Server owns lake** | `DATA_PATH 's3://bucket/prefix/'` in server `ATTACH ducklake:…`; clients use Pattern B |
-| **Readers with credentials** | Pattern C — each client `ATTACH 'ducklake:quack:…' (DATA_PATH 's3://…')` |
-| **Inline small files** | DuckLake [data inlining](https://duckdb.org/docs/stable/duckdb/ducklake) — future Quack+DuckLake perf wins on tailnet |
-
-Load `httpfs` / cloud extensions and set secrets on whichever node reads/writes `s3://`.
-
-### Persistence & lifecycle
-
-| Deployment | `state_dir` | `tailscale_down` |
-|------------|-------------|------------------|
-| Long-lived server | Persistent volume | **Never** on steady state |
-| Cron / CI job | Ephemeral or persistent | **Always** at end |
-| Compose client profile | `/tmp/client-tailscale` | **Always** (see compose bootstrap) |
-
-DuckLake metadata: file (`*.ducklake`), Postgres, or DuckDB file — see [DuckLake catalog options](https://duckdb.org/docs/stable/duckdb/attach).
-
-### Security hardening
-
-- Rotate `QUACK_TAILNET_TOKEN`; use [multi-token tables](https://duckdb.org/docs/current/quack/security#example-multi-token-table)
-- Prefer Headscale ACLs ([examples compose policy](../examples/docker-compose.yml))
-- Do not commit auth keys; use K8s secrets / systemd `EnvironmentFile`
-- Consider TLS in front of Quack for non-tailnet callers (QuackScale handles tailnet encryption only)
-
-### Observability
-
-- Server: `CALL tailscale_status()`, Quack logs, `/work/server.log` in compose
-- Client: `/work/client.out`
-- Readiness: `CALL tailscale_ping(host => 'peer', port => 9494)` before heavy queries
-
----
-
-## Limitations & workarounds
-
-| Issue | Workaround |
-|-------|------------|
-| `remote.lake.table` does not exist | Use `quack_query` or `ducklake:quack:` (patterns B/C) |
-| Multiple Quack scans in one SQL | Split statements; see [QUACK_STREAMING.md](QUACK_STREAMING.md) |
-| `quack_query` + `ATTACH remote` stalls | Run lake `quack_query` **before** attach; separate statements |
-| Client hangs after success | `CALL tailscale_down()` (and `DETACH remote`) |
-| `quack_query(…, quack_discover())` hangs | Discover locally or via known hostname — not via remote quack_query |
-| Kernel TCP to `100.x:9494` fails from tsnet client | Use `tailscale_quack_forward` |
-
----
-
-## Runnable demos
-
-| Demo | Command | Proves |
-|------|---------|--------|
-| **Quack two-node cluster** | [examples/README.md](../examples/README.md) | tailnet + forward + Quack ATTACH |
-| **DuckLake + Quack** | [examples/ducklake/README.md](../examples/ducklake/README.md) | Pattern B lake queries over tailnet |
-| **Host DuckDB → compose stack** | `scripts/local_remote_headscale_test.sh` | Laptop joins same Headscale |
-| **Vanilla tailnet probe** | `docker compose --profile debug run tailscale-probe` | Network vs DuckDB isolation |
-
-Quick start:
-
-```bash
-cd examples
-docker compose build quacktail-server quacktail-client
-docker compose up -d headscale quacktail-server
-docker compose --profile test run --rm quacktail-client # Quack + DuckLake on ducklake branch
-```
-
----
-
-## Sketch: multi-lake SaaS on one tailnet (future-friendly)
-
-```text
-                    ┌─ sales.ducklake ── s3://tenant-a/sales/
-lake-server ────────┼─ metrics.ducklake ─ s3://tenant-a/metrics/
-  quack_serve       └─ archive.ducklake ─ s3://tenant-a/archive/
-       ▲
-       │ quack_query / ducklake:quack:
-       │
-  ┌────┴────┬────────────┐
-  │         │            │
-BI tool   ETL job     notebook
-(Pattern C)  (B)      (B + A)
-```
-
-1. **ETL** (batch): `quack_query` to run server-side `COPY` / `INSERT` into lake tables.
-2. **BI** (interactive): `ducklake:quack:` + `DATA_PATH` on S3 with read-only IAM.
-3. **Ops** (live): Quack `ATTACH` to `remote.*` staging tables before lake merge.
-
-QuackScale roadmap for richer discovery: [PLAN.md](PLAN.md), [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md).
-
----
-
-## Further reading
-
-| Doc | Topic |
-|-----|--------|
-| [README.md](../README.md) | Build, SQL reference |
-| [AUTHENTICATION.md](AUTHENTICATION.md) | Tailscale keys, browser login |
-| [HEADSCALE.md](HEADSCALE.md) | Self-hosted control plane |
-| [QUACK_AUTH.md](QUACK_AUTH.md) | Shared Quack tokens |
-| [QUACK_STREAMING.md](QUACK_STREAMING.md) | One remote op per statement |
-| [DUCKLAKE_TAILNET.md](DUCKLAKE_TAILNET.md) | Lake-specific tailnet notes |
-| [Quack overview](https://duckdb.org/docs/current/quack/overview) | Upstream Quack protocol |
-| [DuckLake docs](https://duckdb.org/docs/stable/duckdb/ducklake) | Catalog, Parquet, attach |
diff --git a/examples/README.md b/examples/README.md
index b12e812..5c97753 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -2,7 +2,7 @@
 
 Two-node **Headscale + QuackTail** demo on Linux: server joins the tailnet and serves Quack; client `ATTACH`es via `tailscale_quack_forward`.
 
-**DuckLake combo (branch `ducklake`):** [ducklake/README.md](ducklake/README.md)
+**Integration guide:** [docs/GUIDE.md](../docs/GUIDE.md) · **DuckLake demo:** [ducklake/README.md](ducklake/README.md)
 
 **Requires:** Linux, Docker Compose v2, `/dev/net/tun`, outbound HTTPS.
 
@@ -178,4 +178,4 @@ docker compose --profile test run --rm quacktail-client
 
 **Client logs:** `docker compose exec quacktail-server cat /work/client.out` (last run, shared volume)
 
-See also [docs/AUTHENTICATION.md](../docs/AUTHENTICATION.md) (Tailscale + forwarder) and [docs/QUACK_AUTH.md](../docs/QUACK_AUTH.md) (Quack tokens).
+See also [docs/GUIDE.md](../docs/GUIDE.md) (integration patterns) and [docs/AUTHENTICATION.md](../docs/AUTHENTICATION.md) (credentials).
diff --git a/examples/ducklake/README.md b/examples/ducklake/README.md
index 61aae51..e6d06c8 100644
--- a/examples/ducklake/README.md
+++ b/examples/ducklake/README.md
@@ -1,6 +1,6 @@
 # DuckLake + Quack on QuackTail
 
-Branch **`ducklake`** extends the compose demo: the server attaches a local DuckLake catalog, seeds an `inventory` table, then `quack_serve` exposes it on the tailnet. The client queries the lake **via `quack_query`** (SQL runs on the server where DuckLake is attached).
+The compose demo on branch **`ducklake`**: the server attaches a local DuckLake catalog, seeds `inventory`, and exposes it on the tailnet. The client queries the lake via **`attach_ducklake`** (preferred) or `quack_query`.
 
 ## Architecture
 
@@ -8,8 +8,8 @@ Branch **`ducklake`** extends the compose demo: the server attaches a local Duck
 quacktail-server                             quacktail-client
 ─────────────────                            ─────────────────
 tailscale_up                                 tailscale_up
-ATTACH ducklake:… AS lake (local Parquet)    tailscale_quack_forward → quack_uri
-  └─ ducklake-lake volume                    quack_query → lake.inventory (before ATTACH)
+ATTACH ducklake:… AS lake (local Parquet)    tailscale_quack_forward → quack:127.0.0.1:19494
+  └─ ducklake-lake volume                    attach_ducklake → SELECT FROM lake.inventory
 quack_serve(127.0.0.1:9494)                  ATTACH quack:… AS remote (e2e)
 tailscale_serve_local
 ```
@@ -20,10 +20,13 @@ Parquet + metadata live on **`ducklake-lake`** on the server only (`/var/lib/duc
 
 | Pattern | When to use |
 |---------|-------------|
-| **`quack_query(uri, '…')`** | Server owns DuckLake files (compose demo). Run **before** `ATTACH quack AS remote`. |
-| **`tailscale_quack_forward`** | Find/connect — returns `quack_uri` for the forwarder. |
-| **`ATTACH 'ducklake:quack:…' AS lake (DATA_PATH '…')`** | Client has local or shared Parquet ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)). |
-| **`ATTACH 'quack:…' AS remote`** | Primary catalog only (`remote.e2e_payload`). **Not** nested `remote.lake.*`. |
+| **`CALL attach_ducklake(...)`** | Server owns DuckLake files — **preferred** |
+| **`quack_query(uri, '…')`** | Same as above; fallback for older images |
+| **`tailscale_quack_forward`** | Required on tsnet clients before Quack ATTACH |
+| **`ATTACH 'ducklake:quack:…' (DATA_PATH '…')`** | Client has shared Parquet ([DuckDB 1.5.3](https://duckdb.org/2026/05/20/announcing-duckdb-153.html)) |
+| **`ATTACH 'quack:…' AS remote`** | Primary catalog only — **not** `remote.lake.*` |
+
+Full pattern guide: [docs/GUIDE.md](../docs/GUIDE.md).
 
 ## Run the demo
 
@@ -34,21 +37,26 @@ docker compose up -d --force-recreate headscale quacktail-server
 docker compose --profile test run --rm quacktail-client
 ```
 
-Expect `DISCOVERED`, inventory rows, `LAKE_PASSED`, and `PASSED`.
+Expect `LAKE_PASSED`, `PASSED`, and `✓ Demo passed`.
 
-## Tailnet client SQL
+## Tailnet client SQL (sketch)
 
 ```sql
 CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494);
--- Find: quack_uri from forward result (do not quack_query quack_discover — deadlocks)
 CREATE SECRET (TYPE quack, TOKEN 'quackscale-demo-token', SCOPE 'quack:127.0.0.1:19494');
--- Query lake on server (before ATTACH remote)
-FROM quack_query('quack:127.0.0.1:19494', 'SELECT * FROM lake.inventory', token => '…', disable_ssl => true);
+CALL attach_ducklake(
+  'quack:127.0.0.1:19494',
+  remote_catalog => 'lake',
+  alias => 'lake',
+  token => 'quackscale-demo-token',
+  disable_ssl => true
+);
+SELECT * FROM lake.inventory;
 ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack);
 SELECT * FROM remote.e2e_payload;
 ```
 
-See [docs/DUCKLAKE_TAILNET.md](../docs/DUCKLAKE_TAILNET.md) and [local-demo.sql](local-demo.sql).
+See [local-demo.sql](local-demo.sql) for a standalone script.
From fffe850dc4a5d5c9f55437a154687dae68f37687 Mon Sep 17 00:00:00 2001
From: Lorenzo Mangani
Date: Sat, 30 May 2026 10:26:16 +0200
Subject: [PATCH 22/25] Improve README with embedded Tailscale value prop and SQL API map.

Highlight WireGuard security, identity-based access, and in-process tailnet management; reorganize functions by category with doc links.

Co-authored-by: Cursor
---
 README.md | 254 ++++++++++++++++++++++++++----------------------------
 1 file changed, 124 insertions(+), 130 deletions(-)

diff --git a/README.md b/README.md
index cc6544b..5dfbab2 100644
--- a/README.md
+++ b/README.md
@@ -1,77 +1,125 @@
 # QuackScale
 
-DuckDB community extension that joins a [Tailscale](https://tailscale.com) tailnet and exposes the [Quack](https://duckdb.org/docs/current/quack/overview) remote protocol on tailnet addresses — so DuckDB peers can `ATTACH` and query each other over easily and securely.
+**QuackScale** embeds a [Tailscale](https://tailscale.com) / WireGuard client ([libtailscale](https://github.com/tailscale/libtailscale)) inside DuckDB so a process can join a private tailnet and reach peers over encrypted mesh networking — without a separate VPN sidecar, tunnel daemon, or public ingress.
 
-**QuackTail** = DuckDB + `quack` (core) + `quackscale` (this extension) on the same tailnet.
-
-QuackScale does **not** replace the core `quack` extension. Load both:
+Combined with DuckDB’s [Quack](https://duckdb.org/docs/current/quack/overview) HTTP protocol, you get **QuackTail**: SQL engines that discover each other on `100.x.x.x` / MagicDNS, authenticate callers, and run `ATTACH`, `quack_query`, and DuckLake workloads across the mesh.
 
 ```sql
-LOAD quack; -- HTTP server, quack_serve, ATTACH quack:...
-LOAD quackscale; -- tailscale_up, quack_uri, quack_token, ...
+LOAD quack; -- HTTP server, ATTACH, quack_query
+LOAD quackscale; -- tailnet join, dial, forward, serve — all from SQL
 ```
 
-## Documentation
+QuackScale does **not** replace `quack` or `ducklake`. It provides the **network layer** Quack needs on a tailnet.
 
-| Doc | Audience | Contents |
-|-----|----------|----------|
-| [docs/GUIDE.md](docs/GUIDE.md) | **Integrators** | Use cases, patterns, DuckLake, demos, limitations |
-| [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md) | **Integrators** | Tailscale, Headscale, Quack tokens |
-| [docs/README.md](docs/README.md) | Everyone | Documentation index |
-| [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) | Contributors | Build, CI, roadmap |
-| [examples/README.md](examples/README.md) | Integrators | Docker Compose two-node demo |
+| Goal | Start here |
+|------|------------|
+| Design a deployment (patterns, DuckLake, demos) | [docs/GUIDE.md](docs/GUIDE.md) |
+| Tailnet login, Headscale, Quack tokens | [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md) |
+| Two-node proof (Docker Compose) | [examples/README.md](examples/README.md) |
+| Build from source, CI, roadmap | [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) |
+| Full doc index | [docs/README.md](docs/README.md) |
 
-## Authentication (two layers)
+---
 
-QuackTail uses **two separate** credential systems. See [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md).
+## Why embedded Tailscale in DuckDB?
 
-| Layer | Question | Provisioned via |
-|-------|----------|-----------------|
-| **Tailnet** | Is this process on our mesh? | `TS_AUTHKEY`, Headscale preauth key, or browser login |
-| **Quack** | May this caller run SQL on this server? | `QUACK_TAILNET_TOKEN`, `CREATE SECRET`, or auth macro |
+Traditional setups expose DuckDB/Quack on localhost or bind a public IP and add TLS, firewalls, and VPN appliances around it. QuackScale flips that model: **each DuckDB process carries its own tailnet identity** and speaks WireGuard to peers that your control plane already trusts.
 
-**Do not** copy the random `auth_token` column from each `CALL quack_serve` by hand. For a fleet of servers and clients, use a **network-wide shared token** (or allowlist) as described in [Quack security — Overriding authentication](https://duckdb.org/docs/current/quack/security#overriding-authentication).
+| Benefit | What it means for you |
+|---------|------------------------|
+| **WireGuard encryption** | Traffic between tailnet nodes is encrypted end-to-end ([Noise](https://tailscale.com/blog/how-tailscale-works) / WireGuard). Quack HTTP rides inside that mesh — not cleartext on the public internet. |
+| **No public listener by default** | Nodes get tailnet IPs (`100.64.0.0/10`). Quack binds loopback; `tailscale_serve_local` exposes **9494** only on the mesh. Nothing needs a world-routable address. |
+| **Identity-based access** | Tailscale or [Headscale](https://github.com/juanfont/headscale) ACLs decide **which nodes** may open TCP to a peer. Quack tokens decide **which callers** may run SQL — [defense in depth](docs/AUTHENTICATION.md). |
+| **No sidecar VPN** | libtailscale (tsnet) runs in-process. One binary, one lifecycle — ideal for containers, batch jobs, and edge nodes that should not run `tailscaled` separately. |
+| **NAT traversal** | Mesh connectivity works across NATs and regions (direct paths or DERP relays). DuckDB nodes on laptops, cloud VMs, and on-prem can mesh without manual port forwarding. |
+| **Self-hosted or SaaS control plane** | Same SQL API for [Tailscale](https://tailscale.com) and [Headscale](https://headscale.net/) — set `control_url` and a preauth key. |
+| **Manage the tailnet from SQL** | Join, status, ping, forward, serve, and teardown are **`CALL` table functions** — scriptable in migrations, init SQL, and orchestration hooks. |
 
-```sh
-# Same value on every QuackTail server and client (K8s secret, systemd, etc.)
-export QUACK_TAILNET_TOKEN='your-shared-secret-at-least-4-chars'
-export TS_AUTHKEY='tskey-auth-...' # Tailscale — separate secret
-```
+QuackScale handles **reachability and transport**. You still configure [Quack application auth](docs/AUTHENTICATION.md) (`QUACK_TAILNET_TOKEN`, secrets, allowlists) for who may execute SQL.
 
-## Prerequisites
+---
 
-- C++17 toolchain, `cmake`, `make` (or `ninja` + `ccache`)
-- **Go 1.25+** with CGO (for libtailscale; CMake bootstraps Go 1.25.5 automatically if the host toolchain is older)
-- DuckDB with core **`quack`** extension (e.g. v1.5.3+)
-- Git submodules: `duckdb`, `extension-ci-tools`, `third_party/libtailscale`
+## How QuackTail fits together
 
-```sh
-git clone --recurse-submodules https://github.com/quackscience/duckdb_tailscale.git
-cd duckdb_tailscale
-git submodule update --init --recursive # if you cloned without --recurse-submodules
+```text
+  Server (long-lived)                    Client (job / laptop)
+  ───────────────────                    ─────────────────────
+  CALL tailscale_up(...)                 CALL tailscale_up(...)
+  CALL quack_serve(127.0.0.1:9494)       CALL tailscale_quack_forward(host => …)
+  CALL tailscale_serve_local(:9494)            │
+         │                                     ▼
+         │         WireGuard mesh        quack:127.0.0.1:19494
+         └◄──────── tailscale_dial ────────────┘
+                                         ATTACH / quack_query / attach_ducklake
 ```
 
-## Build
+**`tailscale_quack_forward`** is required when the client uses embedded tsnet: Quack speaks normal HTTP/TCP, which kernel routing does not send over the tailnet by itself. The forwarder listens on loopback and dials peers via `tailscale_dial`.
 
-```sh
-make
-# faster rebuilds: GEN=ninja make
-```
+End-to-end recipes and DuckLake patterns: **[docs/GUIDE.md](docs/GUIDE.md)**.
+
+---
+
+## SQL API (`LOAD quackscale`)
+
+Use **`CALL`** for table functions (same style as `CALL quack_serve`). Parameters for `tailscale_up` / `tailscale_login`: `hostname`, `authkey` (or `TS_AUTHKEY` env), `control_url`, `state_dir`, `ephemeral`, `loopback_proxy`.
+
+### Tailnet lifecycle
+
+| Command | Purpose |
+|---------|---------|
+| [`CALL tailscale_up(...)`](docs/AUTHENTICATION.md#tailnet-login-tailscale-saas) | Join the tailnet (blocking). Server automation and CI. |
+| [`CALL tailscale_login(...)`](docs/AUTHENTICATION.md#developer-laptop) | Non-blocking join; returns `login_url` for browser auth. |
+| [`CALL tailscale_login_status()`](docs/AUTHENTICATION.md#developer-laptop) | Poll login state (`starting` / `needs_login` / `up` / `error`). |
+| [`CALL tailscale_status()`](docs/GUIDE.md#observability) | Linked?, running, hostname, tailnet IPs. |
+| [`CALL tailscale_down()`](docs/GUIDE.md#standard-client-connection-recipe) | Stop forwarder and close tsnet. **Required** for one-shot clients or the process hangs. |
+
+### Connectivity on the mesh
 
-Artifacts:
+| Command | Purpose |
+|---------|---------|
+| [`CALL tailscale_serve_local(port => 9494)`](docs/GUIDE.md#use-case-1--remote-duckdb-hub-pattern-a) | Tailscale Serve: tailnet TCP **→** `127.0.0.1:9494`. Run after local `quack_serve`. |
+| [`CALL tailscale_ping(host => 'peer', port => 9494)`](docs/GUIDE.md#observability) | TCP dial to a peer over tsnet — readiness before Quack `ATTACH`. |
+| [`CALL tailscale_quack_forward(host => 'peer', port => 9494)`](docs/GUIDE.md#standard-client-connection-recipe) | Listen on loopback; dial peer for each Quack HTTP connection. Returns `quack_uri`. **Preferred client path.** |
+| [`CALL tailscale_quack_proxy()`](docs/DEVELOPMENT.md) | Legacy SOCKS proxy + `ALL_PROXY` — deprecated; use `tailscale_quack_forward`. |
+| [`CALL tailscale_proxy_status()`](docs/DEVELOPMENT.md) | Legacy SOCKS status. |
 
-- `./build/release/duckdb` — shell with extension preloaded
-- `./build/release/extension/quackscale/quackscale.duckdb_extension` — loadable binary
+### Quack on tailnet (helpers; `LOAD quack` required for serve/attach)
 
-Disable Tailscale embedding (stub build, no Go):
+| Function | Purpose |
+|----------|---------|
+| `quack_uri()` | This node’s client-facing `quack::9494` (MagicDNS or tailnet IP). |
+| `quack_token()` | Shared Quack secret from `QUACK_TAILNET_TOKEN` / `QUACK_TOKEN` env. |
+| [`CALL quack_discover(port => 9494)`](docs/GUIDE.md#finding-peers) | All `quack:` URIs this node advertises on the tailnet. |
+
+Core Quack (`LOAD quack`): `quack_serve`, `quack_stop`, `ATTACH`, `quack_query`, etc.
+
+### Remote DuckLake
+
+| Command | Purpose |
+|---------|---------|
+| [`CALL attach_ducklake(uri, ...)`](docs/GUIDE.md#use-case-2--ducklake-on-the-server-patterns-b--b) | Local views over a remote DuckLake catalog when Parquet lives on the server. |
+
+---
+
+## Authentication (two layers)
+
+| Layer | Question | Details |
+|-------|----------|---------|
+| **Tailnet** | Is this machine on our mesh? | [docs/AUTHENTICATION.md — Tailnet login](docs/AUTHENTICATION.md#tailnet-login-tailscale-saas) |
+| **Quack** | May this caller run SQL? | [docs/AUTHENTICATION.md — Quack tokens](docs/AUTHENTICATION.md#quack-http-tokens) |
 
 ```sh
-make CMAKE_VARS="-DQUACKSCALE_WITH_TAILSCALE=OFF"
+export TS_AUTHKEY='tskey-auth-...' # or Headscale preauth key
+export QUACK_TAILNET_TOKEN='shared-quack-secret' # same on servers and clients
 ```
 
-## Quick start — QuackTail server
+Do **not** copy the random `auth_token` from each `CALL quack_serve`. Use a fleet-wide shared token or [Quack allowlist](https://duckdb.org/docs/current/quack/security#overriding-authentication).
+
+---
+
+## Quick start
 
-Set env vars **before** starting DuckDB (see [authentication](#authentication-two-layers)):
+### Server
 
 ```sh
 export TS_AUTHKEY='tskey-auth-...'
@@ -83,13 +131,11 @@ export QUACK_TAILNET_TOKEN='your-shared-quack-token'
 LOAD quack;
 LOAD quackscale;
 
--- 1) Join tailnet
 CALL tailscale_up(
   hostname => 'my-duckdb-node',
   state_dir => '~/.local/share/duckdb/quackscale'
 );
 
--- 2) Quack on loopback; Tailscale Serve exposes port 9494 on the tailnet
 CALL quack_serve(
   'quack:127.0.0.1:9494',
   allow_other_hostname => true,
@@ -97,99 +143,55 @@ CALL quack_serve(
 );
 CALL tailscale_serve_local(port => 9494);
 
--- 3) See what clients should connect to
-CALL quack_discover();
+FROM quack_discover();
 ```
 
-For **local-only** (no tailnet), the [Quack docs](https://duckdb.org/docs/current/quack/overview) use `CALL quack_serve('quack:localhost', token => ...)` and `ATTACH 'quack:localhost' AS remote (TYPE quack)` with `SCOPE 'quack:localhost'` — plain HTTP is automatic for local URIs.
+Long-lived servers: persistent `state_dir`, **no** `tailscale_down()`. Headscale: add `control_url` and preauth key — [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md).
 
-## Quick start — QuackTail client
-
-Same `QUACK_TAILNET_TOKEN` on the client machine:
+### Client
 
 ```sql
+LOAD quackscale;
 LOAD quack;
 
+CALL tailscale_up(hostname => 'my-client', state_dir => '…', …);
+CALL tailscale_quack_forward(host => 'my-duckdb-node', port => 9494, local_port => 19494);
+
 CREATE SECRET (
   TYPE quack,
   TOKEN 'your-shared-quack-token',
-  SCOPE 'quack:my-duckdb-node:9494'
-);
-
-ATTACH 'quack:my-duckdb-node:9494' AS remote (
-  TYPE quack,
-  DISABLE_SSL true
+  SCOPE 'quack:127.0.0.1:19494'
 );
+ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack, DISABLE_SSL true);
 
 FROM remote.query('SELECT 42');
-```
-
-Use the hostname from `tailscale_up(hostname => ...)` and Quack’s default port **9494**. Details: [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md).
-
-### Tailscale login (first-time / laptop)
-
-| Scenario | Command |
-|----------|---------|
-| Server / automation | `export TS_AUTHKEY=...` then `CALL tailscale_up(...)` |
-| Interactive browser | `CALL tailscale_login(...)` → open `login_url` → `CALL tailscale_login_status()` until `status = 'up'` |
-| Repeat visits | Reuse `state_dir` — usually no browser |
-
-See [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md).
-
-### Headscale (self-hosted tailnet)
-
-[Headscale](https://github.com/juanfont/headscale) is API-compatible with Tailscale’s control server — no extra QuackScale APIs:
 
-```sql
-CALL tailscale_up(
-  hostname => 'my-duckdb-node',
-  control_url => 'https://headscale.example.com',
-  authkey => '',
-  state_dir => '~/.local/share/duckdb/quackscale'
-);
+DETACH remote;
+CALL tailscale_down();
 ```
 
-Example: [examples/headscale_quacktail.sql](examples/headscale_quacktail.sql). CI runs [`.github/workflows/headscale-integration.yml`](.github/workflows/headscale-integration.yml).
+Full client recipe (probe, DuckLake, compose markers): **[docs/GUIDE.md](docs/GUIDE.md)**.
 
-### Quack auth modes (pick one)
+---
 
-| Mode | When | How |
-|------|------|-----|
-| **Shared env token** | Default for QuackTail fleets | `QUACK_TAILNET_TOKEN` + `quack_token()` on serve; matching `CREATE SECRET` or `TOKEN` on clients |
-| **Multi-token allowlist** | Teams, rotation, multiple clients | `SET GLOBAL quack_authentication_function = '...'` + token table — [Quack docs](https://duckdb.org/docs/current/quack/security#example-multi-token-table) |
-| **Developer mode** | Lab tailnet only | Auth macro always `true` — [Quack docs](https://duckdb.org/docs/current/quack/security#example-developer-mode-always-allow) |
-
-Full walkthrough: [docs/AUTHENTICATION.md](docs/AUTHENTICATION.md).
+## Build
 
-## SQL reference
+**Prerequisites:** C++17, cmake, make or ninja, Go 1.25+ (CGO; CMake bootstraps Go 1.25.5 if needed), DuckDB with core **`quack`**, git submodules (`duckdb`, `extension-ci-tools`, `third_party/libtailscale`).
 
-Load with `LOAD quackscale;`. Use **`CALL`** for table functions (same style as `CALL quack_serve`), not `SELECT` / `FROM`.
+```sh
+git clone --recurse-submodules https://github.com/quackscience/duckdb-quackscale.git
+cd duckdb-quackscale
+GEN=ninja make release
+```
 
-### Tailscale (`quackscale` extension)
+- `./build/release/duckdb` — shell with extension
+- `./build/release/extension/quackscale/quackscale.duckdb_extension` — loadable binary
 
-| Command | Description |
-|---------|-------------|
-| `CALL tailscale_up(...)` | Join tailnet; params: `hostname`, `state_dir`, `control_url`, `ephemeral`, `authkey` / `TS_AUTHKEY` |
-| `CALL tailscale_login(...)` | Non-blocking join; returns `login_url` for browser auth |
-| `CALL tailscale_login_status()` | Poll login (`starting` / `needs_login` / `up` / `error`) |
-| `CALL tailscale_status()` | libtailscale linked?, running, hostname, tailnet IPs |
-| `CALL tailscale_quack_forward(host => 'peer', port => 9494)` | Localhost TCP → `tailscale_dial` (preferred for Quack ATTACH; no ALL_PROXY) |
-| `CALL tailscale_down()` | Stop forwarder + close tsnet (one-shot clients — required or process hangs) |
-| `CALL attach_ducklake(uri, …)` | Create local views over a remote DuckLake catalog (server-owned Parquet) — see [docs/GUIDE.md](docs/GUIDE.md) |
-| `CALL tailscale_quack_proxy()` | Legacy SOCKS + ALL_PROXY |
-| `CALL tailscale_proxy_status()` | Legacy SOCKS status |
+Stub build without Tailscale: `make CMAKE_VARS="-DQUACKSCALE_WITH_TAILSCALE=OFF"`.
 
-### Quack on tailnet (helpers; requires core `quack` for `quack_serve`)
+Docker images (source build + verify): **[examples/README.md](examples/README.md)**.
 
-| Function | Description |
-|----------|-------------|
-| `quack_uri()` | Client-facing `quack::9494` for discovery/ATTACH |
-| `CALL tailscale_serve_local(port => 9494)` | Tailscale Serve: tailnet TCP → `127.0.0.1:9494` (run after local `quack_serve`) |
-| `CALL tailscale_ping(host => 'peer', port => 9494)` | tsnet TCP dial to peer (readiness check before Quack ATTACH) |
-| `quack_token()` | Shared Quack token from `QUACK_TAILNET_TOKEN` / `QUACK_TOKEN` env |
-| `CALL quack_discover()` | All `quack:` URIs this node advertises (`magicdns` / `tailnet_ip`) |
-
-Core Quack (`LOAD quack`): `quack_serve`, `quack_stop`, `ATTACH`, `quack_query`, etc.
+---
 
 ## Tests
 
@@ -197,20 +199,12 @@ Core Quack (`LOAD quack`): `quack_serve`, `quack_stop`, `ATTACH`, `quack_query`,
 make test
 ```
 
-SQL unit tests do not require a live tailnet or `QUACK_TAILNET_TOKEN`. See [test/README.md](test/README.md).
-
-### Integration (Headscale + QuackTail)
-
-- **Docker Compose demo:** [examples/README.md](examples/README.md) — two-node cluster with `tailscale_quack_forward` + Quack `ATTACH`
-- **CI e2e:** [`.github/workflows/headscale-e2e.yml`](.github/workflows/headscale-e2e.yml) (`workflow_dispatch`, release binary `v1.0.2` by default)
-- **Host helper:** `scripts/local_remote_headscale_test.sh` — join a running compose stack from host DuckDB
+Unit tests need no live tailnet. Integration: [examples/README.md](examples/README.md), [test/e2e/README.md](test/e2e/README.md), [`.github/workflows/headscale-integration.yml`](.github/workflows/headscale-integration.yml).
 
-Details: [test/e2e/README.md](test/e2e/README.md).
-
-## Based on
-
-[duckdb/extension-template](https://github.com/duckdb/extension-template)
+---
 
 ## License
 
 MIT (extension template). libtailscale is [BSD-3-Clause](https://github.com/tailscale/libtailscale/blob/main/LICENSE).
+
+Based on [duckdb/extension-template](https://github.com/duckdb/extension-template).
From 3ecca9e8dbeee86bb51a85c02ba1b467fe7f8354 Mon Sep 17 00:00:00 2001
From: Lorenzo Mangani
Date: Sat, 30 May 2026 10:35:25 +0200
Subject: [PATCH 23/25] Align CI with compose demo: source build e2e and verify-image.

Replace release-binary headscale-e2e with scripts/ci_compose_e2e.sh (examples/docker-compose.yml). Add patch to headscale-integration build deps; document workflow split in test/e2e and DEVELOPMENT docs.

Co-authored-by: Cursor
---
 .github/workflows/docker-compose-build.yml  |  25 +++--
 .github/workflows/headscale-e2e.yml         | 109 ++++++++------------
 .github/workflows/headscale-integration.yml |   4 +-
 README.md                                   |   2 +-
 docs/DEVELOPMENT.md                         |   6 +-
 scripts/ci_compose_e2e.sh                   |  41 ++++++++
 scripts/ci_headscale_e2e.sh                 |   4 +-
 test/e2e/README.md                          |  73 +++++++------
 8 files changed, 149 insertions(+), 115 deletions(-)
 create mode 100755 scripts/ci_compose_e2e.sh

diff --git a/.github/workflows/docker-compose-build.yml b/.github/workflows/docker-compose-build.yml
index 8b16b90..250db1b 100644
--- a/.github/workflows/docker-compose-build.yml
+++ b/.github/workflows/docker-compose-build.yml
@@ -1,4 +1,4 @@
-# Build examples/docker-compose images from source (catches missing apt deps like patch).
+# Fast PR check: build examples images and verify quackscale functions (no full e2e).
 name: Docker compose build
 
 on:
@@ -9,15 +9,19 @@ on:
       - 'examples/Dockerfile'
       - 'examples/docker-compose.yml'
       - 'scripts/e2e/docker-build-quackscale.sh'
+      - 'scripts/e2e/quacktail-verify-image.sh'
       - '.dockerignore'
       - 'src/**'
       - 'cmake/**'
 
+env:
+  COMPOSE_PROJECT_NAME: quacktail-ci-build
+
 jobs:
   compose-build:
-    name: docker compose build (source)
+    name: docker compose build + verify
     runs-on: ubuntu-latest
-    timeout-minutes: 60
+    timeout-minutes: 90
     permissions:
       contents: read
 
@@ -26,12 +30,13 @@ jobs:
         with:
           submodules: recursive
 
-      - name: Build quacktail-server image
-        run: |
-          cd examples
-          docker compose build quacktail-server
-
-      - name: Verify image
+      - name: Build and verify quacktail-server image
+        working-directory: examples
         run: |
-          cd examples
+          docker compose build quacktail-server quacktail-client
           docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-server
+
+      - name: Teardown
+        if: always()
+        working-directory: examples
+        run: docker compose down --remove-orphans || true
diff --git a/.github/workflows/headscale-e2e.yml b/.github/workflows/headscale-e2e.yml
index 78bed2b..fed03d2 100644
--- a/.github/workflows/headscale-e2e.yml
+++ b/.github/workflows/headscale-e2e.yml
@@ -1,88 +1,65 @@
-# QuackTail e2e over Headscale — one job, Headscale service + concurrent DuckDB workers.
+# QuackTail e2e — same flow as examples/README.md (source-built compose demo).
 name: Headscale QuackTail e2e
 
 on:
   workflow_dispatch:
-    inputs:
-      release_tag:
-        description: 'GitHub release tag (use "latest" for newest release)'
-        required: false
-        default: v1.0.2
-        type: string
+  pull_request:
+    paths:
+      - '.github/workflows/headscale-e2e.yml'
+      - 'examples/**'
+      - 'scripts/e2e/**'
+      - 'scripts/ci_compose_e2e.sh'
+      - 'scripts/lib/quacktail_ext.sh'
+      - 'src/**'
+      - 'cmake/**'
+      - 'third_party/libtailscale/**'
+      - '.dockerignore'
 
 env:
-  HEADSCALE_HOST: headscale
-  HEADSCALE_CONTROL_URL: http://headscale:8080
-  HEADSCALE_MAGICDNS_BASE_DOMAIN: quackscale-ci.test
-  E2E_QUACK_ATTACH_HOST: hostname
-  E2E_QUACK_SERVE_MODE: loopback_serve
-  E2E_CLIENT_TIMEOUT_SEC: 180
-  QUACKTAIL_CLIENT_ATTEMPTS: 15
-  QUACKTAIL_CLIENT_POLL_SEC: 2
+  COMPOSE_PROJECT_NAME: quacktail-ci
 
 jobs:
-  quacktail-e2e:
-    name: QuackTail e2e (Headscale + server + client)
+  compose-e2e:
+    name: Compose e2e (Headscale + DuckLake + Quack)
     runs-on: ubuntu-latest
-    timeout-minutes: 30
+    timeout-minutes: 90
     permissions:
       contents: read
 
     steps:
       - uses: actions/checkout@v4
+        with:
+          submodules: recursive
 
-      - name: Download QuackTail release binary
-        run: |
-          chmod +x scripts/ci_download_release_duckdb.sh
-          eval "$(./scripts/ci_download_release_duckdb.sh "${{ inputs.release_tag }}")"
-          echo "DUCKDB=$DUCKDB" >> "$GITHUB_ENV"
-          "$DUCKDB" -version || "$DUCKDB" --version
-
-      - name: Install quack extension
-        env:
-          DUCKDB_EXTENSION_DIRECTORY: ${{ runner.temp }}/duckdb_extensions
-        run: |
-          echo "DUCKDB_EXTENSION_DIRECTORY=$DUCKDB_EXTENSION_DIRECTORY" >> "$GITHUB_ENV"
-          chmod +x scripts/ci_ensure_quack.sh scripts/lib/quacktail_ci.sh scripts/lib/quacktail_ext.sh
-          ./scripts/ci_ensure_quack.sh
-
-      - name: Start Headscale
-        run: |
-          chmod +x scripts/lib/headscale_ci.sh
-          export HEADSCALE_CI_ROOT="$GITHUB_WORKSPACE"
-          export HEADSCALE_DATA_DIR="$RUNNER_TEMP/headscale-data"
-          # shellcheck source=scripts/lib/headscale_ci.sh
-          source scripts/lib/headscale_ci.sh
-          headscale_ci_start "$HEADSCALE_DATA_DIR"
-          AUTHKEY="$(headscale_ci_create_authkey)"
-          echo "$AUTHKEY" > "$RUNNER_TEMP/headscale-authkey"
-          chmod 600 "$RUNNER_TEMP/headscale-authkey"
-
-      - name: Run QuackTail e2e (server + client concurrent)
-        env:
-          HEADSCALE_AUTHKEY_FILE: ${{ runner.temp }}/headscale-authkey
-          HEADSCALE_ALREADY_RUNNING: "1"
-          DUCKDB_EXTENSION_DIRECTORY: ${{ runner.temp }}/duckdb_extensions
+      - name: Run compose e2e
         run: |
-          chmod +x scripts/ci_headscale_e2e.sh scripts/lib/headscale_ci.sh scripts/lib/quacktail_ci.sh scripts/e2e/quacktail-entrypoint.sh
-          ./scripts/ci_headscale_e2e.sh
+          chmod +x scripts/ci_compose_e2e.sh
+          ./scripts/ci_compose_e2e.sh
 
-      - name: Stop containers
-        if: always()
+      - name: Collect compose logs on failure
+        if: failure()
+        working-directory: examples
         run: |
-          export HEADSCALE_CI_ROOT="$GITHUB_WORKSPACE"
-          export QUACKTAIL_CI_ROOT="$GITHUB_WORKSPACE"
-          # shellcheck source=scripts/lib/quacktail_ci.sh
-          source scripts/lib/quacktail_ci.sh
-          quacktail_ci_stop
-          # shellcheck source=scripts/lib/headscale_ci.sh
-          source scripts/lib/headscale_ci.sh
-          headscale_ci_stop
+          docker compose ps -a || true
+          docker compose logs --no-color > "${RUNNER_TEMP}/compose-logs.txt" 2>&1 || true
+          docker compose exec -T quacktail-server tail -100 /work/server.log 2>/dev/null \
+            > "${RUNNER_TEMP}/server.log" || true
+          docker compose run --rm --entrypoint tail quacktail-client -100 /work/client.out 2>/dev/null \
+            > "${RUNNER_TEMP}/client.out" || true
 
-      - name: Upload e2e logs
-        if: always()
+      - name: Upload e2e artifacts
+        if: failure()
         uses: actions/upload-artifact@v4
         with:
-          name: quacktail-e2e-logs
-          path: .e2e-work/
+          name: quacktail-compose-e2e-logs
+          path: |
+            ${{ runner.temp }}/quacktail-compose-e2e.log
+            ${{ runner.temp }}/compose-logs.txt
+            ${{ runner.temp }}/server.log
+            ${{ runner.temp }}/client.out
           if-no-files-found: warn
+
+      - name: Teardown
+        if: always()
+        working-directory: examples
+        run: docker compose --profile test down --remove-orphans -v || true
diff --git a/.github/workflows/headscale-integration.yml b/.github/workflows/headscale-integration.yml
index e426e66..aa3c7c3 100644
--- a/.github/workflows/headscale-integration.yml
+++ b/.github/workflows/headscale-integration.yml
@@ -12,6 +12,8 @@ on:
       - 'src/**'
       - 'cmake/**'
       - 'third_party/libtailscale/**'
+      - 'examples/Dockerfile'
+      - 'scripts/e2e/**'
 
 env:
   HEADSCALE_HOST: headscale
@@ -38,7 +40,7 @@ jobs:
       - name: Install build dependencies
         run: |
           sudo apt-get update
-          sudo apt-get install -y build-essential cmake ninja-build ccache curl
+          sudo apt-get install -y build-essential cmake ninja-build patch ccache curl
 
       - name: Start Headscale
         run: |
diff --git a/README.md b/README.md
index 5dfbab2..1467304 100644
--- a/README.md
+++ b/README.md
@@ -199,7 +199,7 @@ Docker images (source build + verify): **[examples/README.md](examples/README.md
 make test
 ```
 
-Unit tests need no live tailnet. Integration: [examples/README.md](examples/README.md), [test/e2e/README.md](test/e2e/README.md), [`.github/workflows/headscale-integration.yml`](.github/workflows/headscale-integration.yml).
+Unit tests need no live tailnet. Integration: [examples/README.md](examples/README.md), [test/e2e/README.md](test/e2e/README.md), [`.github/workflows/headscale-e2e.yml`](.github/workflows/headscale-e2e.yml).
 
 ---
 
diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md
index 095181d..70f6773 100644
--- a/docs/DEVELOPMENT.md
+++ b/docs/DEVELOPMENT.md
@@ -70,9 +70,9 @@ When bumping the DuckDB target:
 
 | Workflow | Purpose |
 |----------|---------|
-| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Build from source + Headscale smoke |
-| [docker-compose-build.yml](../.github/workflows/docker-compose-build.yml) | Build compose image + verify-image |
-| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | Two-node e2e with release binary |
+| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | **Full compose e2e** — build, verify-image, DuckLake + Quack demo |
+| [docker-compose-build.yml](../.github/workflows/docker-compose-build.yml) | Compose build + verify-image (PR gate) |
+| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke on host |
 | [Release.yml](../.github/workflows/Release.yml) | Linux release tarball on GitHub Release |
 | [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | libtailscale `go test` |
 
diff --git a/scripts/ci_compose_e2e.sh b/scripts/ci_compose_e2e.sh
new file mode 100755
index 0000000..b852fa8
--- /dev/null
+++ b/scripts/ci_compose_e2e.sh
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+# Full QuackTail e2e via examples/docker-compose.yml (same path as examples/README.md).
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+EXAMPLES="$ROOT/examples"
+LOG="${CI_COMPOSE_E2E_LOG:-${RUNNER_TEMP:-/tmp}/quacktail-compose-e2e.log}"
+
+cd "$EXAMPLES"
+
+echo "=== docker compose build (source) ==="
+docker compose build quacktail-server quacktail-client
+
+echo "=== verify image (attach_ducklake, tailscale_down, tailscale_quack_forward) ==="
+docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client
+
+echo "=== start headscale + quacktail-server ==="
+docker compose up -d --force-recreate headscale quacktail-server
+
+echo "=== run quacktail-client (profile test) ==="
+: >"$LOG"
+docker compose --profile test run --rm quacktail-client 2>&1 | tee "$LOG"
+
+grep -q 'LAKE_PASSED' "$LOG" || {
+  echo "error: LAKE_PASSED missing from client output" >&2
+  exit 1
+}
+grep -q 'PASSED' "$LOG" || {
+  echo "error: PASSED missing from client output" >&2
+  exit 1
+}
+grep -qE 'Demo passed|CLIENT_DEMO_DONE' "$LOG" || {
+  echo "error: demo completion marker missing" >&2
+  exit 1
+}
+grep -q 'attach_ducklake' "$LOG" || {
+  echo "error: attach_ducklake path not used (rebuild image from source?)" >&2
+  exit 1
+}
+
+echo "ok: Headscale QuackTail compose e2e passed"
diff --git a/scripts/ci_headscale_e2e.sh b/scripts/ci_headscale_e2e.sh
index 3f167d4..16f7875 100755
--- a/scripts/ci_headscale_e2e.sh
+++ b/scripts/ci_headscale_e2e.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
-# Two-node QuackTail e2e: Headscale + server + client DuckDB containers overlap.
-# Server stays up (-d); client starts while server is still booting; client polls then ATTACH.
+# Legacy two-node e2e: bind-mounted host duckdb + test/e2e/Dockerfile.quacktail.
+# Prefer scripts/ci_compose_e2e.sh (examples/docker-compose.yml) — same as examples/README.md.
 set -euo pipefail
 
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
diff --git a/test/e2e/README.md b/test/e2e/README.md
index 10925db..7b0fc95 100644
--- a/test/e2e/README.md
+++ b/test/e2e/README.md
@@ -2,55 +2,64 @@
 
 Integration tests for a two-node QuackTail cluster over [Headscale](https://github.com/juanfont/headscale).
 
-## Where tests live
+## Canonical path (matches examples demo)
 
-| Test | How to run |
-|------|------------|
-| **Docker Compose demo** | [examples/README.md](../../examples/README.md) — `docker compose --profile test run --rm quacktail-client` |
-| **GitHub Actions e2e** | [`.github/workflows/headscale-e2e.yml`](../../.github/workflows/headscale-e2e.yml) — manual `workflow_dispatch` |
-| **Host script** | [`scripts/ci_headscale_e2e.sh`](../../scripts/ci_headscale_e2e.sh) — concurrent server + client containers |
-| **Local host DuckDB** | [`scripts/local_remote_headscale_test.sh`](../../scripts/local_remote_headscale_test.sh) — join a running compose stack from the host |
+```bash
+git submodule update --init --recursive
+chmod +x scripts/ci_compose_e2e.sh
+./scripts/ci_compose_e2e.sh
+```
 
-All paths share the same client SQL shape (see [`scripts/lib/headscale_ci.sh`](../../scripts/lib/headscale_ci.sh) `headscale_ci_sql_client_session` and [`scripts/e2e/quacktail-compose-bootstrap.sh`](../../scripts/e2e/quacktail-compose-bootstrap.sh) `write_client_session_sql`).
+Same steps as [examples/README.md](../../examples/README.md):
 
-## Server (`loopback_serve`)
+1. `docker compose build` (source, `BUILD_FROM_SOURCE=1`)
+2. `quacktail-verify-image.sh` (`attach_ducklake`, `tailscale_down`, `tailscale_quack_forward`)
+3. `docker compose up -d headscale quacktail-server`
+4. `docker compose --profile test run --rm quacktail-client`
 
-Quack binds loopback; `tailscale_serve_local` publishes port 9494 on the tailnet:
+Expect `LAKE_PASSED`, `PASSED`, `attach_ducklake`, and `✓ Demo passed`.
 
-```sql
-CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token());
-CALL tailscale_serve_local(port => 9494);
-```
+## CI workflows
 
-Healthcheck: `/work/server.log` contains `quack:127.0.0.1:9494` and `local_forward`.
+| Workflow | What it runs |
+|----------|----------------|
+| [headscale-e2e.yml](../../.github/workflows/headscale-e2e.yml) | Full compose e2e (`scripts/ci_compose_e2e.sh`) on PR + manual dispatch |
+| [docker-compose-build.yml](../../.github/workflows/docker-compose-build.yml) | Build + verify-image only (faster PR gate) |
+| [headscale-integration.yml](../../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke (`tailscale_up` on host) |
+
+## Legacy host-container e2e
+
+[`scripts/ci_headscale_e2e.sh`](../../scripts/ci_headscale_e2e.sh) runs concurrent server/client containers with a bind-mounted host `duckdb` binary and [`test/e2e/Dockerfile.quacktail`](Dockerfile.quacktail). Prefer **compose e2e** above — it uses [`examples/Dockerfile`](../../examples/Dockerfile) and the same entrypoint/bootstrap as the documented demo.
 
-## Client (one DuckDB session, no curl)
+## Client session shape
+
+Generated by [`scripts/e2e/quacktail-compose-bootstrap.sh`](../../scripts/e2e/quacktail-compose-bootstrap.sh):
 
 ```sql
 LOAD quackscale;
 CALL tailscale_up(...);
 CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_port => 19494);
-CALL tailscale_ping(host => 'quacktail-server', port => 9494);
-LOAD quack;
-CREATE SECRET (TYPE quack, TOKEN '…', SCOPE 'quack:127.0.0.1:19494');
-FROM quack_query('quack:127.0.0.1:19494', 'SELECT 1 AS probe', ...);
+CALL tailscale_ping(...);
+LOAD quack; CREATE SECRET ...;
+FROM quack_query(..., 'SELECT 1 AS probe', ...);
+CALL attach_ducklake(...);
+SELECT * FROM lake.inventory ...;
+SELECT 'LAKE_PASSED' ...;
 ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack);
-SELECT * FROM remote.e2e_payload LIMIT 5;
-SELECT 'PASSED' AS status, ... FROM remote.e2e_payload;
+SELECT 'PASSED' ...;
+SELECT 'CLIENT_DEMO_DONE' AS status;
+CALL tailscale_down();
 ```
 
-Invoked as: `duckdb -batch -echo -f client_session.sql` (in-memory; no `-bail` / `-init` file DB).
-
-Compose waits for `quacktail-server` **healthy** before starting the client. The client retries the full session until a `PASSED` row appears.
-
-## CI workflow
+## Server (`loopback_serve`)
 
-[`headscale-e2e.yml`](../../.github/workflows/headscale-e2e.yml):
+```sql
+CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => quack_token());
+CALL tailscale_serve_local(port => 9494);
+```
 
-1. Download release binary (`v1.0.2` by default)
-2. Start Headscale in Docker
-3. Run `scripts/ci_headscale_e2e.sh` (server container + client container)
+Healthcheck: `/work/server.log` contains `quack:127.0.0.1:9494` and `local_forward`.
 
 ## Debug probe
 
-[`examples/docker-compose.yml`](../../examples/docker-compose.yml) profile `debug`: vanilla `tailscale/tailscale` container — isolates tailnet connectivity from DuckDB tsnet.
+[examples/docker-compose.yml](../../examples/docker-compose.yml) profile `debug`: vanilla `tailscale/tailscale` — isolates tailnet connectivity from DuckDB tsnet.
From a55def39a1cebc39ed22f972e66226613e8cd39a Mon Sep 17 00:00:00 2001
From: Lorenzo Mangani
Date: Sat, 30 May 2026 10:39:41 +0200
Subject: [PATCH 24/25] Keep headscale-e2e manual-only (workflow_dispatch).

Remove pull_request trigger added by mistake; PR CI uses docker-compose-build.

Co-authored-by: Cursor
---
 .github/workflows/headscale-e2e.yml | 11 -----------
 docs/DEVELOPMENT.md                 |  2 +-
 test/e2e/README.md                  |  2 +-
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/.github/workflows/headscale-e2e.yml b/.github/workflows/headscale-e2e.yml
index fed03d2..3609f47 100644
--- a/.github/workflows/headscale-e2e.yml
+++ b/.github/workflows/headscale-e2e.yml
@@ -3,17 +3,6 @@ name: Headscale QuackTail e2e
 
 on:
   workflow_dispatch:
-  pull_request:
-    paths:
-      - '.github/workflows/headscale-e2e.yml'
-      - 'examples/**'
-      - 'scripts/e2e/**'
-      - 'scripts/ci_compose_e2e.sh'
-      - 'scripts/lib/quacktail_ext.sh'
-      - 'src/**'
-      - 'cmake/**'
-      - 'third_party/libtailscale/**'
-      - '.dockerignore'
 
 env:
   COMPOSE_PROJECT_NAME: quacktail-ci
diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md
index 70f6773..e272b38 100644
--- a/docs/DEVELOPMENT.md
+++ b/docs/DEVELOPMENT.md
@@ -70,7 +70,7 @@ When bumping the DuckDB target:
 
 | Workflow | Purpose |
 |----------|---------|
-| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | **Full compose e2e** — build, verify-image, DuckLake + Quack demo |
+| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | **Full compose e2e** (manual `workflow_dispatch` only) |
 | [docker-compose-build.yml](../.github/workflows/docker-compose-build.yml) | Compose build + verify-image (PR gate) |
 | [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke on host |
 | [Release.yml](../.github/workflows/Release.yml) | Linux release tarball on GitHub Release |
diff --git a/test/e2e/README.md b/test/e2e/README.md
index 7b0fc95..fa0de37 100644
--- a/test/e2e/README.md
+++ b/test/e2e/README.md
@@ -23,7 +23,7 @@ Expect `LAKE_PASSED`, `PASSED`, `attach_ducklake`, and `✓ Demo passed`.
 
 | Workflow | What it runs |
 |----------|----------------|
-| [headscale-e2e.yml](../../.github/workflows/headscale-e2e.yml) | Full compose e2e (`scripts/ci_compose_e2e.sh`) on PR + manual dispatch |
+| [headscale-e2e.yml](../../.github/workflows/headscale-e2e.yml) | Full compose e2e (`scripts/ci_compose_e2e.sh`) — **manual dispatch only** |
 | [docker-compose-build.yml](../../.github/workflows/docker-compose-build.yml) | Build + verify-image only (faster PR gate) |
 | [headscale-integration.yml](../../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke (`tailscale_up` on host) |
 
From 4c108a3fba97ff7a3cb6b972714da1e048e7c6fc Mon Sep 17 00:00:00 2001
From: Lorenzo Mangani
Date: Sat, 30 May 2026 10:41:04 +0200
Subject: [PATCH 25/25] Restore release-binary e2e; remove PR compose source builds.

headscale-e2e is workflow_dispatch only and uses ci_headscale_e2e.sh with pre-built release duckdb. Delete docker-compose-build workflow; keep ci_compose_e2e.sh for local source-build demo only.
-name: Docker compose build - -on: - workflow_dispatch: - pull_request: - paths: - - '.github/workflows/docker-compose-build.yml' - - 'examples/Dockerfile' - - 'examples/docker-compose.yml' - - 'scripts/e2e/docker-build-quackscale.sh' - - 'scripts/e2e/quacktail-verify-image.sh' - - '.dockerignore' - - 'src/**' - - 'cmake/**' - -env: - COMPOSE_PROJECT_NAME: quacktail-ci-build - -jobs: - compose-build: - name: docker compose build + verify - runs-on: ubuntu-latest - timeout-minutes: 90 - permissions: - contents: read - - steps: - - uses: actions/checkout@v4 - with: - submodules: recursive - - - name: Build and verify quacktail-server image - working-directory: examples - run: | - docker compose build quacktail-server quacktail-client - docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-server - - - name: Teardown - if: always() - working-directory: examples - run: docker compose down --remove-orphans || true diff --git a/.github/workflows/headscale-e2e.yml b/.github/workflows/headscale-e2e.yml index 3609f47..f308c7f 100644 --- a/.github/workflows/headscale-e2e.yml +++ b/.github/workflows/headscale-e2e.yml @@ -1,54 +1,88 @@ -# QuackTail e2e — same flow as examples/README.md (source-built compose demo). +# QuackTail e2e over Headscale — manual only; uses GitHub release binaries (no source build). 
name: Headscale QuackTail e2e on: workflow_dispatch: + inputs: + release_tag: + description: 'GitHub release tag (use "latest" for newest release)' + required: false + default: v1.0.2 + type: string env: - COMPOSE_PROJECT_NAME: quacktail-ci + HEADSCALE_HOST: headscale + HEADSCALE_CONTROL_URL: http://headscale:8080 + HEADSCALE_MAGICDNS_BASE_DOMAIN: quackscale-ci.test + E2E_QUACK_ATTACH_HOST: hostname + E2E_QUACK_SERVE_MODE: loopback_serve + E2E_CLIENT_TIMEOUT_SEC: 180 + QUACKTAIL_CLIENT_ATTEMPTS: 15 + QUACKTAIL_CLIENT_POLL_SEC: 2 jobs: - compose-e2e: - name: Compose e2e (Headscale + DuckLake + Quack) + quacktail-e2e: + name: QuackTail e2e (release binary + Headscale) runs-on: ubuntu-latest - timeout-minutes: 90 + timeout-minutes: 30 permissions: contents: read steps: - uses: actions/checkout@v4 - with: - submodules: recursive - - name: Run compose e2e + - name: Download QuackTail release binary + run: | + chmod +x scripts/ci_download_release_duckdb.sh + eval "$(./scripts/ci_download_release_duckdb.sh "${{ inputs.release_tag }}")" + echo "DUCKDB=$DUCKDB" >> "$GITHUB_ENV" + "$DUCKDB" -version || "$DUCKDB" --version + + - name: Install quack extension + env: + DUCKDB_EXTENSION_DIRECTORY: ${{ runner.temp }}/duckdb_extensions + run: | + echo "DUCKDB_EXTENSION_DIRECTORY=$DUCKDB_EXTENSION_DIRECTORY" >> "$GITHUB_ENV" + chmod +x scripts/ci_ensure_quack.sh scripts/lib/quacktail_ci.sh scripts/lib/quacktail_ext.sh + ./scripts/ci_ensure_quack.sh + + - name: Start Headscale + run: | + chmod +x scripts/lib/headscale_ci.sh + export HEADSCALE_CI_ROOT="$GITHUB_WORKSPACE" + export HEADSCALE_DATA_DIR="$RUNNER_TEMP/headscale-data" + # shellcheck source=scripts/lib/headscale_ci.sh + source scripts/lib/headscale_ci.sh + headscale_ci_start "$HEADSCALE_DATA_DIR" + AUTHKEY="$(headscale_ci_create_authkey)" + echo "$AUTHKEY" > "$RUNNER_TEMP/headscale-authkey" + chmod 600 "$RUNNER_TEMP/headscale-authkey" + + - name: Run QuackTail e2e (server + client concurrent) + env: + HEADSCALE_AUTHKEY_FILE: 
${{ runner.temp }}/headscale-authkey + HEADSCALE_ALREADY_RUNNING: "1" + DUCKDB_EXTENSION_DIRECTORY: ${{ runner.temp }}/duckdb_extensions run: | - chmod +x scripts/ci_compose_e2e.sh - ./scripts/ci_compose_e2e.sh + chmod +x scripts/ci_headscale_e2e.sh scripts/lib/headscale_ci.sh scripts/lib/quacktail_ci.sh scripts/e2e/quacktail-entrypoint.sh + ./scripts/ci_headscale_e2e.sh - - name: Collect compose logs on failure - if: failure() - working-directory: examples + - name: Stop containers + if: always() run: | - docker compose ps -a || true - docker compose logs --no-color > "${RUNNER_TEMP}/compose-logs.txt" 2>&1 || true - docker compose exec -T quacktail-server tail -100 /work/server.log 2>/dev/null \ - > "${RUNNER_TEMP}/server.log" || true - docker compose run --rm --entrypoint tail quacktail-client -100 /work/client.out 2>/dev/null \ - > "${RUNNER_TEMP}/client.out" || true - - - name: Upload e2e artifacts - if: failure() + export HEADSCALE_CI_ROOT="$GITHUB_WORKSPACE" + export QUACKTAIL_CI_ROOT="$GITHUB_WORKSPACE" + # shellcheck source=scripts/lib/quacktail_ci.sh + source scripts/lib/quacktail_ci.sh + quacktail_ci_stop + # shellcheck source=scripts/lib/headscale_ci.sh + source scripts/lib/headscale_ci.sh + headscale_ci_stop + + - name: Upload e2e logs + if: always() uses: actions/upload-artifact@v4 with: - name: quacktail-compose-e2e-logs - path: | - ${{ runner.temp }}/quacktail-compose-e2e.log - ${{ runner.temp }}/compose-logs.txt - ${{ runner.temp }}/server.log - ${{ runner.temp }}/client.out + name: quacktail-e2e-logs + path: .e2e-work/ if-no-files-found: warn - - - name: Teardown - if: always() - working-directory: examples - run: docker compose --profile test down --remove-orphans -v || true diff --git a/README.md b/README.md index 1467304..20f5506 100644 --- a/README.md +++ b/README.md @@ -199,7 +199,7 @@ Docker images (source build + verify): **[examples/README.md](examples/README.md make test ``` -Unit tests need no live tailnet. 
Integration: [examples/README.md](examples/README.md), [test/e2e/README.md](test/e2e/README.md), [`.github/workflows/headscale-e2e.yml`](.github/workflows/headscale-e2e.yml). +Unit tests need no live tailnet. **E2e (manual):** [`.github/workflows/headscale-e2e.yml`](.github/workflows/headscale-e2e.yml) — release binary, `workflow_dispatch` only. **Local full demo:** [examples/README.md](examples/README.md). **PR smoke:** [headscale-integration.yml](.github/workflows/headscale-integration.yml). --- diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index e272b38..82d5a47 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -68,13 +68,15 @@ When bumping the DuckDB target: ## CI workflows -| Workflow | Purpose | -|----------|---------| -| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | **Full compose e2e** (manual `workflow_dispatch` only) | -| [docker-compose-build.yml](../.github/workflows/docker-compose-build.yml) | Compose build + verify-image (PR gate) | -| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke on host | -| [Release.yml](../.github/workflows/Release.yml) | Linux release tarball on GitHub Release | -| [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | libtailscale `go test` | +| Workflow | Trigger | Purpose | +|----------|---------|---------| +| [headscale-e2e.yml](../.github/workflows/headscale-e2e.yml) | **Manual only** | Release-binary two-node e2e (no source build) | +| [headscale-integration.yml](../.github/workflows/headscale-integration.yml) | PR | Source build + Headscale smoke | +| [Release.yml](../.github/workflows/Release.yml) | Release published | Build linux release tarball | +| [libtailscale-integration.yml](../.github/workflows/libtailscale-integration.yml) | PR | libtailscale `go test` | +| [MainDistributionPipeline.yml](../.github/workflows/MainDistributionPipeline.yml) | PR | Extension distribution CI | + +**E2e never 
runs on push/PR** and never compiles DuckDB in CI — use `workflow_dispatch` on `headscale-e2e` with a release tag. Full DuckLake compose demo is local dev only (`scripts/ci_compose_e2e.sh`). ## Roadmap (selected) diff --git a/scripts/ci_compose_e2e.sh b/scripts/ci_compose_e2e.sh index b852fa8..4cd89c0 100755 --- a/scripts/ci_compose_e2e.sh +++ b/scripts/ci_compose_e2e.sh @@ -1,5 +1,6 @@ #!/usr/bin/env bash -# Full QuackTail e2e via examples/docker-compose.yml (same path as examples/README.md). +# Local/dev only: full compose e2e with SOURCE-built images (examples/docker-compose.yml). +# CI e2e uses release binaries — see .github/workflows/headscale-e2e.yml (workflow_dispatch). set -euo pipefail ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" @@ -8,10 +9,10 @@ LOG="${CI_COMPOSE_E2E_LOG:-${RUNNER_TEMP:-/tmp}/quacktail-compose-e2e.log}" cd "$EXAMPLES" -echo "=== docker compose build (source) ===" +echo "=== docker compose build (BUILD_FROM_SOURCE=1 — local dev only) ===" docker compose build quacktail-server quacktail-client -echo "=== verify image (attach_ducklake, tailscale_down, tailscale_quack_forward) ===" +echo "=== verify image ===" docker compose run --rm --entrypoint /usr/local/bin/quacktail-verify-image.sh quacktail-client echo "=== start headscale + quacktail-server ===" @@ -21,21 +22,9 @@ echo "=== run quacktail-client (profile test) ===" : >"$LOG" docker compose --profile test run --rm quacktail-client 2>&1 | tee "$LOG" -grep -q 'LAKE_PASSED' "$LOG" || { - echo "error: LAKE_PASSED missing from client output" >&2 - exit 1 -} -grep -q 'PASSED' "$LOG" || { - echo "error: PASSED missing from client output" >&2 - exit 1 -} -grep -qE 'Demo passed|CLIENT_DEMO_DONE' "$LOG" || { - echo "error: demo completion marker missing" >&2 - exit 1 -} -grep -q 'attach_ducklake' "$LOG" || { - echo "error: attach_ducklake path not used (rebuild image from source?)" >&2 - exit 1 -} - -echo "ok: Headscale QuackTail compose e2e passed" +grep -q 'LAKE_PASSED' "$LOG" || { 
echo "error: LAKE_PASSED missing" >&2; exit 1; } +grep -q 'PASSED' "$LOG" || { echo "error: PASSED missing" >&2; exit 1; } +grep -qE 'Demo passed|CLIENT_DEMO_DONE' "$LOG" || { echo "error: demo completion marker missing" >&2; exit 1; } +grep -q 'attach_ducklake' "$LOG" || { echo "error: attach_ducklake path not used" >&2; exit 1; } + +echo "ok: compose e2e passed (source build)" diff --git a/scripts/ci_headscale_e2e.sh b/scripts/ci_headscale_e2e.sh index 16f7875..246342d 100755 --- a/scripts/ci_headscale_e2e.sh +++ b/scripts/ci_headscale_e2e.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash -# Legacy two-node e2e: bind-mounted host duckdb + test/e2e/Dockerfile.quacktail. -# Prefer scripts/ci_compose_e2e.sh (examples/docker-compose.yml) — same as examples/README.md. +# CI e2e: release duckdb bind-mounted into minimal containers (no DuckDB compile). +# For source-built compose demo locally, use scripts/ci_compose_e2e.sh instead. set -euo pipefail ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" diff --git a/test/e2e/README.md b/test/e2e/README.md index fa0de37..83d2b56 100644 --- a/test/e2e/README.md +++ b/test/e2e/README.md @@ -2,38 +2,48 @@ Integration tests for a two-node QuackTail cluster over [Headscale](https://github.com/juanfont/headscale). 
-## Canonical path (matches examples demo) +## CI e2e (release binary, manual only) + +GitHub Actions: [`.github/workflows/headscale-e2e.yml`](../../.github/workflows/headscale-e2e.yml) + +- **Trigger:** `workflow_dispatch` only (never on push/PR) +- **DuckDB:** pre-built from a [GitHub release](https://github.com/quackscience/duckdb-quackscale/releases) via `scripts/ci_download_release_duckdb.sh` (default tag `v1.0.2`, or `latest`) +- **Runner:** `scripts/ci_headscale_e2e.sh` — Headscale + concurrent server/client containers with the release `duckdb` bind-mounted (minimal `test/e2e/Dockerfile.quacktail`, **no compile in CI**) ```bash -git submodule update --init --recursive -chmod +x scripts/ci_compose_e2e.sh -./scripts/ci_compose_e2e.sh +# Local equivalent (after downloading a release binary): +export DUCKDB=/path/to/release/duckdb +chmod +x scripts/ci_headscale_e2e.sh +./scripts/ci_headscale_e2e.sh ``` -Same steps as [examples/README.md](../../examples/README.md): +Expect `PASSED`, client `insert-from-client`, and server `seed-from-server` in client logs. + +Release binaries may not include newer SQL helpers (`attach_ducklake`, `tailscale_down`) — the release e2e validates **Quack over tailnet** (`tailscale_quack_forward`, `ATTACH`, DML), not the full DuckLake compose demo. -1. `docker compose build` (source, `BUILD_FROM_SOURCE=1`) -2. `quacktail-verify-image.sh` (`attach_ducklake`, `tailscale_down`, `tailscale_quack_forward`) -3. `docker compose up -d headscale quacktail-server` -4. `docker compose --profile test run --rm quacktail-client` +## Local compose e2e (source build — not CI) -Expect `LAKE_PASSED`, `PASSED`, `attach_ducklake`, and `✓ Demo passed`. 
+For the full DuckLake + `attach_ducklake` demo (builds DuckDB in Docker): -## CI workflows +```bash +git submodule update --init --recursive +chmod +x scripts/ci_compose_e2e.sh +./scripts/ci_compose_e2e.sh +``` -| Workflow | What it runs | -|----------|----------------| -| [headscale-e2e.yml](../../.github/workflows/headscale-e2e.yml) | Full compose e2e (`scripts/ci_compose_e2e.sh`) — **manual dispatch only** | -| [docker-compose-build.yml](../../.github/workflows/docker-compose-build.yml) | Build + verify-image only (faster PR gate) | -| [headscale-integration.yml](../../.github/workflows/headscale-integration.yml) | Source build + Headscale smoke (`tailscale_up` on host) | +Same as [examples/README.md](../../examples/README.md). Use this on a dev machine; **do not** wire it to push/PR workflows. -## Legacy host-container e2e +## PR / push CI (not e2e) -[`scripts/ci_headscale_e2e.sh`](../../scripts/ci_headscale_e2e.sh) runs concurrent server/client containers with a bind-mounted host `duckdb` binary and [`test/e2e/Dockerfile.quacktail`](Dockerfile.quacktail). Prefer **compose e2e** above — it uses [`examples/Dockerfile`](../../examples/Dockerfile) and the same entrypoint/bootstrap as the documented demo. +| Workflow | Trigger | Builds DuckDB? 
| +|----------|---------|----------------| +| [headscale-integration.yml](../../.github/workflows/headscale-integration.yml) | PR | Yes — smoke test only | +| [libtailscale-integration.yml](../../.github/workflows/libtailscale-integration.yml) | PR | Go tests | +| [MainDistributionPipeline.yml](../../.github/workflows/MainDistributionPipeline.yml) | PR / release | Extension CI | -## Client session shape +## Client session (release e2e) -Generated by [`scripts/e2e/quacktail-compose-bootstrap.sh`](../../scripts/e2e/quacktail-compose-bootstrap.sh): +Generated by [`scripts/lib/headscale_ci.sh`](../../scripts/lib/headscale_ci.sh) `headscale_ci_sql_client_session`: ```sql LOAD quackscale; @@ -42,15 +52,15 @@ CALL tailscale_quack_forward(host => 'quacktail-server', port => 9494, local_por CALL tailscale_ping(...); LOAD quack; CREATE SECRET ...; FROM quack_query(..., 'SELECT 1 AS probe', ...); -CALL attach_ducklake(...); -SELECT * FROM lake.inventory ...; -SELECT 'LAKE_PASSED' ...; ATTACH 'quack:127.0.0.1:19494' AS remote (TYPE quack); -SELECT 'PASSED' ...; -SELECT 'CLIENT_DEMO_DONE' AS status; -CALL tailscale_down(); +INSERT INTO remote.e2e_payload ...; +SELECT 'PASSED' ... FROM remote.e2e_payload; ``` +## Compose demo client session (source / local) + +See [`scripts/e2e/quacktail-compose-bootstrap.sh`](../../scripts/e2e/quacktail-compose-bootstrap.sh) — adds DuckLake, `attach_ducklake`, `CLIENT_DEMO_DONE`, `tailscale_down`. + ## Server (`loopback_serve`) ```sql @@ -58,8 +68,6 @@ CALL quack_serve('quack:127.0.0.1:9494', allow_other_hostname => true, token => CALL tailscale_serve_local(port => 9494); ``` -Healthcheck: `/work/server.log` contains `quack:127.0.0.1:9494` and `local_forward`. - ## Debug probe -[examples/docker-compose.yml](../../examples/docker-compose.yml) profile `debug`: vanilla `tailscale/tailscale` — isolates tailnet connectivity from DuckDB tsnet. 
+[examples/docker-compose.yml](../../examples/docker-compose.yml) profile `debug`: vanilla `tailscale/tailscale` container.