
Feature/migrar questdb a extension timescaledb cu 869c821qk#53

Open
alafca wants to merge 43 commits into dev from feature/Migrar-questdb-a-extension-timescaledb_CU-869c821qk

Conversation

@alafca

@alafca alafca commented Jul 3, 2026


Summary by CodeRabbit

  • New Features

    • Switched the app’s data storage and metrics stack to TimescaleDB.
    • Chat history now persists reliably with improved timestamp handling.
  • Bug Fixes

    • Updated database queries and API models for better Postgres-compatible behavior.
    • Improved session and activity reporting calculations.
  • Documentation

    • Refreshed setup, deployment, and environment examples to match the new database setup.
  • Chores

    • Updated container and local run configuration, plus ignored local TimescaleDB files.

@alafca alafca self-assigned this Jul 3, 2026
@coderabbitai

coderabbitai Bot commented Jul 3, 2026


Walkthrough

The database backend is migrated from QuestDB to TimescaleDB/PostgreSQL. Connection configuration, LangGraph checkpointing/store, chat storage, auth/dashboard SQL queries, and server endpoints are updated to use pg_pool/PostgresChatStore. New SQL init scripts, Docker Compose services, and documentation reflect the change.

Changes

QuestDB to TimescaleDB migration

| Layer | File(s) | Summary |
|---|---|---|
| TimescaleDB schema and init scripts | `sql/init_tsdb.sql`, `sql/init_tsdb.sh` | New scripts create TimescaleDB extension, chats/messages/metrics/credential tables, hypertables, compression policies, and readiness/psql init logic. |
| Postgres connection configuration | `.env.example`, `backend/src/metrics/connection.py`, `backend/pyproject.toml` | Environment variables and connection factories move from QUESTDB_* to PG_*/DATABASE_URL, with new psycopg/langgraph-checkpoint-postgres dependencies. |
| LangGraph Postgres checkpointer and store | `backend/graph/agent.py`, `backend/graph/__init__.py`, `backend/graph/tools.py`, `backend/benchmark.py` | Adds get_checkpointer/get_store Postgres helpers, updates build_graph signature, and switches pool config keys from questdb_pool to pg_pool. |
| Postgres-backed chat storage | `backend/src/chat/store.py`, `backend/src/model/endpoints.py` | ChatStore (SQLite) is replaced by PostgresChatStore with cursor-based transactions; chat timestamp fields become datetime. |
| Server runtime migration | `backend/server.py` | Lifespan, endpoints, and metrics jobs switch from QuestDB pool to pg_pool/async_pg_pool and PostgresChatStore. |
| SQL query syntax updates | `backend/src/config/auth.py`, `backend/src/metrics/dashboard_queries.py`, `backend/src/metrics/metrics.py` | Auth queries use SELECT DISTINCT ON; dashboard queries use EXTRACT(EPOCH...)/to_char(); unused log_request helper removed. |
| Docker Compose TimescaleDB services | `docker-compose*.yml`, `.gitignore` | Compose files add timescaledb/timescaledb-init services, adjust volumes, depends_on, and port bindings. |
| Documentation updates | `README.md`, `DOKPLOY_HOME_SERVER.md`, `LOGTO_SETUP.md`, `frontend/README.md`, `run.sh` | Docs updated to describe TimescaleDB configuration, exposure rules, and deployment guidance. |

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related PRs

  • in2ai/ASM2-client#26: Introduces QuestDB pool wiring in backend/server.py/backend/src/metrics/connection.py that this PR migrates to pg_pool.
  • in2ai/ASM2-client#33: Touches the same auth/token-refresh plumbing (refresh_tokens, build_sources_status, backend/src/config/auth.py) modified here for Postgres.
  • in2ai/ASM2-client#47: Overlaps on backend/graph/tools.py (vectordb_search) and backend/server.py metrics recording logic changed in this PR.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title reflects the main change: migrating from QuestDB to TimescaleDB, though it includes extra branch/commit noise. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/server.py (1)

321-350: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Move Postgres access off the event loop. These async handlers still call synchronous psycopg2 helpers directly (login_source, source-status/auth flows, chat store methods, and metrics endpoints). Each getconn()/execute()/putconn() sequence blocks the event loop for a full DB round trip, so concurrent requests can stall each other. Offload the DB work to worker threads or switch to an async Postgres driver.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/server.py` around lines 321 - 350, The login_source handler is still
calling synchronous Postgres helper work directly on the event loop via
add_credentials and pg_pool access. Move the database write off the async path
by offloading the add_credentials/pg_pool work to a worker thread or replacing
the psycopg2-based flow with an async Postgres client, and apply the same
pattern anywhere login_source relies on synchronous DB helpers.
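The worker-thread option can be sketched with `asyncio.to_thread`. This is a minimal illustration, not the project's code: `add_credentials_blocking` is a hypothetical stand-in for the real psycopg2 helper, and the `time.sleep` call only simulates a database round trip.

```python
import asyncio
import time


def add_credentials_blocking(user_id: str, source: str) -> dict:
    """Hypothetical stand-in for a synchronous psycopg2 helper."""
    time.sleep(0.05)  # simulates the getconn()/execute()/putconn() round trip
    return {"user_id": user_id, "source": source, "stored": True}


async def login_source(user_id: str, source: str) -> dict:
    # Run the blocking helper in a worker thread so the event loop stays free
    # to serve other requests while the database call is in flight.
    return await asyncio.to_thread(add_credentials_blocking, user_id, source)


async def demo() -> list:
    # Three concurrent logins overlap instead of running back to back.
    calls = (login_source(f"user-{i}", "example-source") for i in range(3))
    return await asyncio.gather(*calls)
```

With a thread-based offload the synchronous pool still bounds concurrency, so the pool size and the thread count need to be considered together.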
🧹 Nitpick comments (5)
sql/init_tsdb.sql (2)

90-98: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

No supporting index for the credentials refresh query.

get_credentials_to_refresh (backend/src/config/auth.py) filters on needs_refresh_at IS NOT NULL AND needs_refresh_at < NOW(), but only idx_credentials_user_source_issued (user_id, source, issued_at DESC) exists, which doesn't cover this predicate. Given credentials tables are typically small, this is likely low impact, but worth a partial index if the table grows.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sql/init_tsdb.sql` around lines 90 - 98, The refresh query in
get_credentials_to_refresh is not backed by an index that matches its filter on
needs_refresh_at, so add a supporting partial index for credentials on
needs_refresh_at with the predicate needs_refresh_at IS NOT NULL. Keep the
NOW() comparison in the query only: PostgreSQL rejects non-immutable functions
such as NOW() in an index predicate. Update the credentials table definition in
init_tsdb.sql near the CREATE TABLE IF NOT EXISTS credentials block, keeping the
existing idx_credentials_user_source_issued untouched, and make the new index
specific enough to speed the refresh lookup without broadening write overhead
unnecessarily.

30-71: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Add filter-column indexes on hypertables before enabling compression.

create_hypertable() only creates a default index on ts. Downstream dashboard queries (backend/src/metrics/dashboard_queries.py) filter/group heavily by user_id (and tag) over time ranges (e.g. WHERE user_id = %s, PARTITION BY user_id ORDER BY ts), which will force sequential/chunk scans without a supporting index. Consider adding explicit composite indexes matching your compression segmentby columns, e.g. (user_id, ts DESC) on metrics, word_counts, topic_counts, user_activity, and (topic_id, ts DESC) on topic_intl, mirroring the segmentby definitions already declared at Lines 73-77.

♻️ Suggested indexes
+CREATE INDEX IF NOT EXISTS idx_metrics_user_ts ON metrics (user_id, ts DESC);
+CREATE INDEX IF NOT EXISTS idx_word_counts_user_ts ON word_counts (user_id, ts DESC);
+CREATE INDEX IF NOT EXISTS idx_topic_counts_user_ts ON topic_counts (user_id, ts DESC);
+CREATE INDEX IF NOT EXISTS idx_topic_intl_topic_ts ON topic_intl (topic_id, ts DESC);
+CREATE INDEX IF NOT EXISTS idx_user_activity_user_ts ON user_activity (user_id, ts DESC);

 SELECT create_hypertable('metrics', 'ts', if_not_exists => TRUE);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sql/init_tsdb.sql` around lines 30 - 71, The hypertables currently rely only
on the default ts index, which won’t support the common dashboard filters and
grouping patterns. Add explicit composite indexes in the init script on the
affected tables, matching the query and compression segment keys used by
metrics, word_counts, topic_counts, user_activity, and topic_intl, so lookups
like user_id/tag/topic_id with ts are efficient. Place these index definitions
alongside the existing CREATE TABLE and create_hypertable statements in the same
init flow before compression is enabled.
backend/src/metrics/connection.py (1)

18-28: 🚀 Performance & Scalability | 🔵 Trivial

Consider revisiting pool size now that chat storage shares this pool.

This pool (maxconn=10) previously served metrics/credentials queries; it's now also backing PostgresChatStore for every chat CRUD/message operation (server.py: PostgresChatStore(app.state.pg_pool)). Combined with synchronous calls issued directly from async request handlers (see server.py comment), 10 connections may become a bottleneck under concurrent chat + dashboard traffic. Consider making pool size configurable via env var and monitoring saturation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/metrics/connection.py` around lines 18 - 28, The fixed maxconn=10
in get_pg_pool is now a shared bottleneck for both metrics queries and
PostgresChatStore traffic. Update get_pg_pool to read pool sizing from an
environment/config value instead of hardcoding it, and keep the existing
ThreadedConnectionPool setup in place. Make sure the new configurable max
connection setting is easy to tune for concurrent chat CRUD/message operations
and dashboard traffic.
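One way to make the pool size tunable is sketched below. The variable names `PG_POOL_MIN` and `PG_POOL_MAX` and the default minimum of 1 are assumptions for illustration; the PR does not define them. The default maximum mirrors the current `maxconn=10`.

```python
import os


def pool_bounds_from_env(env=os.environ, default_max: int = 10) -> tuple:
    """Return (minconn, maxconn); PG_POOL_MIN and PG_POOL_MAX are hypothetical names."""
    minconn = int(env.get("PG_POOL_MIN", "1"))
    maxconn = int(env.get("PG_POOL_MAX", str(default_max)))
    # Reject bounds the pool could not honour instead of failing later at runtime.
    if minconn < 1 or maxconn < minconn:
        raise ValueError(f"invalid pool bounds: min={minconn}, max={maxconn}")
    return minconn, maxconn


# Possible use inside get_pg_pool(), keeping the existing pool class:
#   minconn, maxconn = pool_bounds_from_env()
#   pool = ThreadedConnectionPool(minconn, maxconn, host=..., port=..., ...)
```

Validating at startup means a bad value fails the deployment immediately rather than surfacing as pool exhaustion under load.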
backend/graph/agent.py (1)

13-17: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Consider enabling strict msgpack deserialization for the new Postgres checkpointer.

The library's docs recommend setting LANGGRAPH_STRICT_MSGPACK=true (or passing allowed_msgpack_modules) to restrict checkpoint deserialization to known-safe types, since by default any Python type can be deserialized from checkpoint data — a risk if the database is ever compromised. This is newly relevant now that checkpoints persist to a real Postgres database instead of in-memory.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/graph/agent.py` around lines 13 - 17, The new Postgres checkpointer
setup in get_checkpointer() should enable strict msgpack deserialization by
default. Update the AsyncPostgresSaver initialization path to pass a safe
allowed_msgpack_modules configuration or otherwise honor
LANGGRAPH_STRICT_MSGPACK=true so only trusted types can be deserialized from
checkpoint data. Keep get_store() unchanged, and make the change in the
checkpointer construction so the safety setting applies wherever
AsyncPostgresSaver is created.
backend/pyproject.toml (1)

68-68: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Loose version floors risk drift to incompatible releases.

langgraph-checkpoint-postgres>=2.0 has no upper bound, so it can resolve to any later release, including new major versions. This library has had breaking schema/behavior changes between minor versions (e.g., a regression requiring rollback to 2.0.8 was reported upstream). Consider an upper bound or exact pin, especially since this pairs with custom sql/init_tsdb.sql schema that must stay compatible with the library's internal migrations.

Suggested tightening
-    "langgraph-checkpoint-postgres>=2.0",
+    "langgraph-checkpoint-postgres>=2.0,<4.0",

Also applies to: 107-108

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/pyproject.toml` at line 68, The dependency floor for
langgraph-checkpoint-postgres is too loose and can drift into incompatible
releases, so tighten the version constraint in pyproject.toml where the
langgraph-checkpoint-postgres entry is declared. Add an upper bound or pin to a
known-good version to keep it compatible with the custom sql/init_tsdb.sql
schema and the package’s internal migrations, and apply the same tightening to
the other affected dependency entries mentioned in the review.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.example:
- Around line 34-41: The PostgreSQL env setup is duplicated and can drift:
`PG_HOST/PG_PORT/PG_USER/PG_PASSWORD/PG_DB` are used by `get_pg_pool()` while
`DATABASE_URL` is read separately in `server.py` for the async pool. Update the
startup/config flow so `DATABASE_URL` is derived from the `PG_*` values as the
single source of truth, and handle the unset case explicitly before creating
`AsyncConnectionPool`; use `get_pg_pool()` and `DATABASE_URL`/`server.py` as the
key symbols to locate the config wiring.

In `@backend/graph/agent.py`:
- Around line 13-17: The shared Postgres pool used by get_checkpointer and
get_store is missing row_factory=dict_row, which can break AsyncPostgresSaver
and AsyncPostgresStore reads at runtime. Update the pool configuration where
that shared pool is created so it uses dict_row in addition to autocommit=True,
and keep get_checkpointer/get_store pointing at the same pool setup.

In `@backend/src/config/auth.py`:
- Around line 72-81: The credential selection in get_credentials_to_refresh is
picking rows after applying the refresh filter, which can return an older stale
credential for the same (user_id, source). Update the query so it first chooses
the latest row per (user_id, source) using the ordering on issued_at in
get_credentials_to_refresh, then applies the needs_refresh_at and expires_at
conditions to that selected row.

In `@docker-compose.dokploy.yml`:
- Around line 21-25: The backend startup dependency is missing the
timescaledb-init job, so backend may come up before the Postgres tables are
created. Update the backend service’s depends_on block to wait for
timescaledb-init to complete successfully, alongside the existing qdrant and
timescaledb health checks, so the init job finishes before backend starts
accepting traffic.

In `@docker-compose.local.yml`:
- Around line 47-49: The Logto port bindings are exposing services on all
interfaces instead of localhost-only. Update the port mappings in the docker
compose configuration so the Logto app and admin console bind to loopback only,
keeping the existing localhost-only behavior consistent with the local docs and
run script.
- Around line 3-8: The backend startup dependencies only wait for timescaledb to
be healthy, so schema initialization can still race the backend on first boot.
Update the backend service’s depends_on block in docker-compose.local.yml to
also require timescaledb-init with service_completed_successfully, alongside the
existing qdrant and timescaledb checks, so startup waits for the init job before
launching backend.

In `@docker-compose.yml`:
- Line 77: The dashboard port mapping in docker-compose should remain
loopback-only instead of exposing the UI on all interfaces. Update the dashboard
service’s port definition in docker-compose to bind through 127.0.0.1 for the
3001:80 mapping, and keep the configuration aligned with the localhost-only
behavior described in the stack docs. If broad network access is intended, make
that explicit elsewhere; otherwise restore the loopback-bound mapping.

---

Outside diff comments:
In `@backend/server.py`:
- Around line 321-350: The login_source handler is still calling synchronous
Postgres helper work directly on the event loop via add_credentials and pg_pool
access. Move the database write off the async path by offloading the
add_credentials/pg_pool work to a worker thread or replacing the psycopg2-based
flow with an async Postgres client, and apply the same pattern anywhere
login_source relies on synchronous DB helpers.

---

Nitpick comments:
In `@backend/graph/agent.py`:
- Around line 13-17: The new Postgres checkpointer setup in get_checkpointer()
should enable strict msgpack deserialization by default. Update the
AsyncPostgresSaver initialization path to pass a safe allowed_msgpack_modules
configuration or otherwise honor LANGGRAPH_STRICT_MSGPACK=true so only trusted
types can be deserialized from checkpoint data. Keep get_store() unchanged, and
make the change in the checkpointer construction so the safety setting applies
wherever AsyncPostgresSaver is created.

In `@backend/pyproject.toml`:
- Line 68: The dependency floor for langgraph-checkpoint-postgres is too loose
and can drift into incompatible releases, so tighten the version constraint in
pyproject.toml where the langgraph-checkpoint-postgres entry is declared. Add an
upper bound or pin to a known-good version to keep it compatible with the custom
sql/init_tsdb.sql schema and the package’s internal migrations, and apply the
same tightening to the other affected dependency entries mentioned in the
review.

In `@backend/src/metrics/connection.py`:
- Around line 18-28: The fixed maxconn=10 in get_pg_pool is now a shared
bottleneck for both metrics queries and PostgresChatStore traffic. Update
get_pg_pool to read pool sizing from an environment/config value instead of
hardcoding it, and keep the existing ThreadedConnectionPool setup in place. Make
sure the new configurable max connection setting is easy to tune for concurrent
chat CRUD/message operations and dashboard traffic.

In `@sql/init_tsdb.sql`:
- Around line 90-98: The refresh query in get_credentials_to_refresh is not
backed by an index that matches its filter on needs_refresh_at, so add a
supporting partial index for credentials using needs_refresh_at with a predicate
for non-null values and timestamps before now. Update the credentials table
definition in init_tsdb.sql near the CREATE TABLE IF NOT EXISTS credentials
block, keeping the existing idx_credentials_user_source_issued untouched, and
make the new index specific enough to speed the refresh lookup without
broadening write overhead unnecessarily.
- Around line 30-71: The hypertables currently rely only on the default ts
index, which won’t support the common dashboard filters and grouping patterns.
Add explicit composite indexes in the init script on the affected tables,
matching the query and compression segment keys used by metrics, word_counts,
topic_counts, user_activity, and topic_intl, so lookups like
user_id/tag/topic_id with ts are efficient. Place these index definitions
alongside the existing CREATE TABLE and create_hypertable statements in the same
init flow before compression is enabled.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63cd321a-7f82-46bf-8593-d714d1b5159c

📥 Commits

Reviewing files that changed from the base of the PR and between 7d3fae8 and b1e9111.

⛔ Files ignored due to path filters (1)
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • .env.example
  • .gitignore
  • DOKPLOY_HOME_SERVER.md
  • LOGTO_SETUP.md
  • README.md
  • backend/benchmark.py
  • backend/graph/__init__.py
  • backend/graph/agent.py
  • backend/graph/tools.py
  • backend/pyproject.toml
  • backend/server.py
  • backend/src/chat/store.py
  • backend/src/config/auth.py
  • backend/src/metrics/connection.py
  • backend/src/metrics/context.py
  • backend/src/metrics/dashboard_queries.py
  • backend/src/metrics/metrics.py
  • backend/src/model/endpoints.py
  • docker-compose.dokploy.yml
  • docker-compose.local.yml
  • docker-compose.yml
  • frontend/README.md
  • run.sh
  • sql/init.sh
  • sql/init.sql
  • sql/init_tsdb.sh
  • sql/init_tsdb.sql
💤 Files with no reviewable changes (3)
  • sql/init.sh
  • backend/src/metrics/metrics.py
  • sql/init.sql

Comment thread .env.example
Comment on lines +34 to +41
+# TIMESCALEDB
 # ============================================
-
-# QuestDB connection settings
-# Choose ONE of the following configurations:
-#
-# ┌─────────────────────────────────────────────────────────────────┐
-# │ OPTION A: Local QuestDB (via docker-compose.yml) │
-# │ Use when running the full stack locally with QuestDB container │
-# │ Command: docker compose up │
-# └─────────────────────────────────────────────────────────────────┘
-# QUESTDB_HOST=questdb
-# QUESTDB_PORT=8812
-# QUESTDB_USER=admin
-# QUESTDB_PASSWORD=change_me_for_local_pgwire
-# QUESTDB_HTTP_USER=admin
-# QUESTDB_HTTP_PASSWORD=change_me_for_local_http
-#
-# ┌─────────────────────────────────────────────────────────────────┐
-# │ OPTION B: Remote QuestDB on VPS │
-# │ Use when connecting to QuestDB deployed on a VPS │
-# │ Command: ./run.sh up --remote │
-# └─────────────────────────────────────────────────────────────────┘
-# QUESTDB_HOST=your-vps-ip-or-hostname
-# QUESTDB_PORT=8812
-# QUESTDB_USER=admin
-# QUESTDB_PASSWORD=your_secure_password
-# QUESTDB_HTTP_USER=your_http_username
-# QUESTDB_HTTP_PASSWORD=your_http_password
-#
-# Required VPS firewall ports:
-# - 8812: PostgreSQL wire protocol (required)
-# - 9000: Web console/REST API (optional)
-# - 9009: InfluxDB Line Protocol (if needed)
-
-QUESTDB_HOST=questdb
-QUESTDB_PORT=8812
-QUESTDB_USER=admin
-QUESTDB_PASSWORD=change_me_for_local_pgwire
-QUESTDB_DB=qdb
-QUESTDB_HTTP_USER=admin
-QUESTDB_HTTP_PASSWORD=change_me_for_local_http
+PG_HOST=
+PG_PORT=5432
+PG_USER=
+PG_PASSWORD=
+PG_DB=
+DATABASE_URL=postgresql://


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Two independent DB connection configs must be kept in sync manually.

PG_HOST/PG_PORT/PG_USER/PG_PASSWORD/PG_DB feed the sync pool (get_pg_pool() in backend/src/metrics/connection.py, used for metrics/credentials/chat storage), while DATABASE_URL is a separately-typed value read directly in backend/server.py (DATABASE_URL = os.getenv("DATABASE_URL")) to build the async pool backing the LangGraph checkpointer/store. Nothing derives one from the other, so if PG_PASSWORD/PG_HOST/etc. change (e.g. credential rotation, host migration) without updating DATABASE_URL accordingly, the sync and async pools can silently point at different databases/credentials — chat history and graph checkpoints could end up split across environments. DATABASE_URL is also left unvalidated in server.py if unset (passing None as conninfo to AsyncConnectionPool).

Consider deriving DATABASE_URL from the PG_* values at startup (single source of truth) instead of requiring both to be maintained in parallel.

💡 Example: derive DATABASE_URL from PG_* in server.py
-    DATABASE_URL = os.getenv("DATABASE_URL")
+    DATABASE_URL = os.getenv("DATABASE_URL") or (
+        f"postgresql://{PG_USER}:{PG_PASSWORD}@{PG_HOST}:{PG_PORT}/{PG_DB}"
+    )
+    if not DATABASE_URL:
+        raise RuntimeError("DATABASE_URL or PG_* environment variables must be set")
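Note that an f-string fallback always produces a non-empty string, so the `if not DATABASE_URL` guard in the suggestion above would never fire when the PG_* values are missing. A minimal sketch that validates before building the URL follows; `resolve_database_url` is a hypothetical helper name, and the default port of 5432 is taken from `.env.example`.

```python
from urllib.parse import quote


def resolve_database_url(env: dict) -> str:
    """Prefer DATABASE_URL; otherwise build it from PG_* and fail loudly if incomplete."""
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit
    required = ("PG_HOST", "PG_USER", "PG_PASSWORD", "PG_DB")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"DATABASE_URL is unset and missing: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or '/' do not break the URL.
    user = quote(env["PG_USER"], safe="")
    password = quote(env["PG_PASSWORD"], safe="")
    port = env.get("PG_PORT", "5432")
    return f"postgresql://{user}:{password}@{env['PG_HOST']}:{port}/{env['PG_DB']}"
```

Calling it as `resolve_database_url(dict(os.environ))` at startup would give both pools one derived value. The placeholder `DATABASE_URL=postgresql://` from `.env.example` is non-empty, so it would still be returned as-is and may need its own check.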
🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 39-39: [UnorderedKey] The PG_PASSWORD key should go before the PG_PORT key
[warning] 40-40: [UnorderedKey] The PG_DB key should go before the PG_HOST key
[warning] 41-41: [UnorderedKey] The DATABASE_URL key should go before the PG_DB key

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.example around lines 34 - 41, The PostgreSQL env setup is duplicated
and can drift: `PG_HOST/PG_PORT/PG_USER/PG_PASSWORD/PG_DB` are used by
`get_pg_pool()` while `DATABASE_URL` is read separately in `server.py` for the
async pool. Update the startup/config flow so `DATABASE_URL` is derived from the
`PG_*` values as the single source of truth, and handle the unset case
explicitly before creating `AsyncConnectionPool`; use `get_pg_pool()` and
`DATABASE_URL`/`server.py` as the key symbols to locate the config wiring.

Comment thread backend/graph/agent.py
Comment on lines +13 to +17
+def get_checkpointer(pool):
+    return AsyncPostgresSaver(pool)
+
+def get_store(pool):
+    return AsyncPostgresStore(pool)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== file list =="
git ls-files | rg '^(backend/graph/agent\.py|backend/.*server\.py|backend/.*pool|backend/.*graph).*'

echo
echo "== backend/graph/agent.py =="
cat -n backend/graph/agent.py

echo
echo "== search for async_pg_pool and dict_row =="
rg -n "async_pg_pool|dict_row|AsyncPostgresSaver|AsyncPostgresStore|row_factory|prepare_threshold|autocommit" backend

Repository: in2ai/ASM2-client

Length of output: 3267


Add row_factory=dict_row to the shared Postgres pool. autocommit=True alone is not enough for AsyncPostgresSaver/AsyncPostgresStore; the current pool config can break checkpoint/store reads at runtime.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/graph/agent.py` around lines 13 - 17, The shared Postgres pool used
by get_checkpointer and get_store is missing row_factory=dict_row, which can
break AsyncPostgresSaver and AsyncPostgresStore reads at runtime. Update the
pool configuration where that shared pool is created so it uses dict_row in
addition to autocommit=True, and keep get_checkpointer/get_store pointing at the
same pool setup.
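A sketch of the connection settings this comment asks for. `checkpointer_pool_kwargs` is a hypothetical helper; the row factory is passed in so the snippet runs without psycopg installed, and the commented lines show the assumed wiring with `psycopg_pool.AsyncConnectionPool`, whose `kwargs` argument is forwarded to each new connection.

```python
def checkpointer_pool_kwargs(row_factory) -> dict:
    """Connection kwargs for a pool shared by the checkpointer and the store."""
    return {
        "autocommit": True,          # already set in the PR, per the comment above
        "row_factory": row_factory,  # the missing piece: rows returned as dicts
    }


# Assumed wiring in the code that creates the shared async pool:
#   from psycopg.rows import dict_row
#   from psycopg_pool import AsyncConnectionPool
#   async_pg_pool = AsyncConnectionPool(
#       conninfo=DATABASE_URL,
#       kwargs=checkpointer_pool_kwargs(dict_row),
#   )
```

Because both `get_checkpointer` and `get_store` receive the same pool, setting the row factory once at pool creation covers both.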

Comment thread backend/src/config/auth.py
Comment on lines 72 to 81
 def get_credentials_to_refresh(pool: ThreadedConnectionPool):
     query = """
-        SELECT user_id, source, credentials, is_admin
+        SELECT DISTINCT ON (user_id, source) user_id, source, credentials, is_admin
         FROM credentials
-        WHERE
+        WHERE
             (expires_at IS NULL OR expires_at > NOW())
             AND needs_refresh_at IS NOT NULL
             AND needs_refresh_at < NOW()
-        LATEST ON issued_at PARTITION BY user_id, source
+        ORDER BY user_id, source, issued_at DESC
     """


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect credentials table definition + add_credentials to understand row multiplicity per (user_id, source)
fd -a 'init_tsdb.sql' | xargs -I{} sed -n '1,200p' {}
rg -nP -C4 'def add_credentials' backend/src/config/auth.py

Repository: in2ai/ASM2-client

Length of output: 3782


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- auth.py outline ---'
ast-grep outline backend/src/config/auth.py --view expanded || true

echo '--- auth.py relevant lines ---'
nl -ba backend/src/config/auth.py | sed -n '1,260p'

echo '--- server.py relevant lines ---'
nl -ba backend/server.py | sed -n '150,240p'

echo '--- search for credential deletion/upsert ---'
rg -n "delete.*credential|upsert.*credential|ON CONFLICT|add_credentials\(" backend/src backend/server.py

Repository: in2ai/ASM2-client

Length of output: 1004


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
for p in [Path('backend/src/config/auth.py'), Path('backend/server.py')]:
    print(f'FILE {p} lines={sum(1 for _ in p.open())}')
PY

Repository: in2ai/ASM2-client

Length of output: 230


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- auth.py relevant lines ---'
sed -n '1,190p' backend/src/config/auth.py

echo '--- server.py relevant lines ---'
sed -n '160,230p' backend/server.py

echo '--- search for credential deletion/upsert ---'
grep -RInE "delete.*credential|upsert.*credential|ON CONFLICT|add_credentials\(" backend/src backend/server.py || true

Repository: in2ai/ASM2-client

Length of output: 7002


Select the latest credential per (user_id, source) before filtering on needs_refresh_at. add_credentials() appends a new row, so older refreshable rows stay in the table; this query can keep returning an older stale row even after a newer credential has been written.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/config/auth.py` around lines 72 - 81, The credential selection in
get_credentials_to_refresh is picking rows after applying the refresh filter,
which can return an older stale credential for the same (user_id, source).
Update the query so it first chooses the latest row per (user_id, source) using
the ordering on issued_at in get_credentials_to_refresh, then applies the
needs_refresh_at and expires_at conditions to that selected row.
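The ordering problem can be reproduced in plain Python. The rows below are invented for illustration, and the `expires_at` check is omitted for brevity; the two functions differ only in whether the refresh filter runs before or after picking the latest row per (user_id, source).

```python
from datetime import datetime, timedelta

NOW = datetime(2026, 7, 3, 12, 0, 0)  # fixed clock for the example

# (user_id, source, issued_at, needs_refresh_at); illustrative rows only
ROWS = [
    ("u1", "src", NOW - timedelta(days=2), NOW - timedelta(hours=1)),     # old, refresh due
    ("u1", "src", NOW - timedelta(minutes=5), NOW + timedelta(hours=1)),  # new, still fresh
]


def latest_per_key(rows):
    latest = {}
    for row in sorted(rows, key=lambda r: r[2]):
        latest[(row[0], row[1])] = row  # a later issued_at overwrites an earlier one
    return list(latest.values())


def is_due(row, now):
    return row[3] is not None and row[3] < now


def filter_then_latest(rows, now):
    # Mirrors WHERE ... followed by DISTINCT ON: the fresh row is filtered out first.
    return latest_per_key([r for r in rows if is_due(r, now)])


def latest_then_filter(rows, now):
    # Mirrors a DISTINCT ON subquery with the WHERE clause applied outside it.
    return [r for r in latest_per_key(rows) if is_due(r, now)]
```

Here `filter_then_latest` still returns the two-day-old credential, while `latest_then_filter` returns nothing because the newest credential is not yet due. In SQL the second shape corresponds to moving the `DISTINCT ON ... ORDER BY user_id, source, issued_at DESC` selection into a subquery and applying the refresh conditions in the outer query.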

Comment thread docker-compose.dokploy.yml
Comment on lines 21 to +25
     depends_on:
       qdrant:
         condition: service_healthy
-      questdb:
-        condition: service_started
+      timescaledb:
+        condition: service_healthy


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the compose file and relevant backend/init references.
git ls-files | rg '(^|/)docker-compose\.dokploy\.yml$|(^|/)docker-compose.*yml$|(^|/)compose.*yml$|(^|/)backend|(^|/)timescaledb-init|(^|/)init_tsdb\.sql$'

echo "---- docker-compose.dokploy.yml (lines around relevant services) ----"
nl -ba docker-compose.dokploy.yml | sed -n '1,220p'

echo "---- search for timescaledb-init / init_tsdb / backend startup dependencies ----"
rg -n "timescaledb-init|init_tsdb|service_completed_successfully|service_healthy|depends_on|postgres|timescale|pool|retry|missing table|relation .* does not exist" -S .

echo "---- outline likely backend files if present ----"
fd -a -t f '.*\.(js|ts|py|go|rb|java|kt|cs|php)$' . | sed -n '1,200p'

Repository: in2ai/ASM2-client

Length of output: 1879


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- docker-compose.dokploy.yml ----"
cat -n docker-compose.dokploy.yml | sed -n '1,260p'

echo "---- relevant compose snippets across repo ----"
rg -n -C 3 "timescaledb-init|service_completed_successfully|service_healthy|depends_on:" docker-compose*.yml

echo "---- backend files mentioning postgres/timescale/retry/pool/startup ----"
rg -n -C 3 "postgres|timescale|retry|pool|missing table|relation .* does not exist|OperationalError|psycopg|sqlalchemy|asyncpg" backend/src backend/server.py backend/graph -S

echo "---- init SQL ----"
cat -n sql/init_tsdb.sql | sed -n '1,240p'

Repository: in2ai/ASM2-client

Length of output: 50375


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- backend/server.py ----"
cat -n backend/server.py | sed -n '1,260p'

echo "---- backend/src/config/config.py ----"
cat -n backend/src/config/config.py | sed -n '1,260p'

echo "---- backend/src/config/env.py ----"
cat -n backend/src/config/env.py | sed -n '1,240p'

echo "---- backend/src/connectors/store.py ----"
cat -n backend/src/connectors/store.py | sed -n '1,260p'

echo "---- backend/src/metrics/connection.py ----"
cat -n backend/src/metrics/connection.py | sed -n '1,260p'

Repository: in2ai/ASM2-client

Length of output: 24467


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- compose file with line numbers ----"
sed -n '1,220p' docker-compose.dokploy.yml | cat -n

echo "---- compose references to backend and timescaledb-init ----"
rg -n -C 4 "backend:|timescaledb-init:|condition: service_completed_successfully|condition: service_healthy|depends_on:" docker-compose.dokploy.yml

echo "---- backend startup entrypoints ----"
rg -n -C 3 "^if __name__ ==|FastAPI|uvicorn|app =" backend/server.py backend/src -S

echo "---- direct database connection / retry code ----"
rg -n -C 3 "connect\(|create_engine|Session|engine|pool|retry|OperationalError|asyncio\.sleep|wait_for|tenacity|backoff|psycopg|sqlalchemy" backend/src backend/server.py -S

Repository: in2ai/ASM2-client

Length of output: 50375


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- docker-compose.dokploy.yml ----"
sed -n '1,220p' docker-compose.dokploy.yml

echo "---- backend/server.py ----"
sed -n '1,260p' backend/server.py

echo "---- backend/src/connectors/store.py ----"
sed -n '1,260p' backend/src/connectors/store.py

echo "---- backend/src/metrics/connection.py ----"
sed -n '1,260p' backend/src/metrics/connection.py

echo "---- backend/src/config/config.py ----"
sed -n '1,260p' backend/src/config/config.py

Repository: in2ai/ASM2-client

Length of output: 23837


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- sql/init_tsdb.sh ----"
cat -n sql/init_tsdb.sh | sed -n '1,240p'

echo "---- sql/init_tsdb.sql ----"
cat -n sql/init_tsdb.sql | sed -n '1,260p'

echo "---- backend graph setup references ----"
rg -n -C 3 "setup\(\)|CREATE TABLE|CREATE EXTENSION|timescaledb|hypertable|credentials|source_preferences|user_activity|metrics|chats|messages|graph" backend/src backend/graph backend/server.py -S

Repository: in2ai/ASM2-client

Length of output: 50373


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- backend/src/utils/helpers.py ----"
cat -n backend/src/utils/helpers.py | sed -n '1,220p'

echo "---- backend/graph/agent.py ----"
cat -n backend/graph/agent.py | sed -n '1,220p'

echo "---- backend/src/chat/store.py ----"
cat -n backend/src/chat/store.py | sed -n '1,260p'

Repository: in2ai/ASM2-client

Length of output: 13175


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- backend/server.py (periodic jobs section) ----"
sed -n '258,340p' backend/server.py | cat -n

echo "---- backend/server.py (chat/message endpoints section) ----"
sed -n '410,570p' backend/server.py | cat -n

Repository: in2ai/ASM2-client

Length of output: 9608


Wait for timescaledb-init before starting backend

backend can become ready before the init job creates the Postgres tables, so early requests on a fresh deploy can fail until timescaledb-init finishes.

🔧 Suggested fix
     depends_on:
       qdrant:
         condition: service_healthy
       timescaledb:
         condition: service_healthy
+      timescaledb-init:
+        condition: service_completed_successfully
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    depends_on:
-      qdrant:
-        condition: service_healthy
-      questdb:
-        condition: service_started
-      timescaledb:
-        condition: service_healthy
+    depends_on:
+      qdrant:
+        condition: service_healthy
+      timescaledb:
+        condition: service_healthy
+      timescaledb-init:
+        condition: service_completed_successfully
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.dokploy.yml` around lines 21 - 25, The backend startup
dependency is missing the timescaledb-init job, so backend may come up before
the Postgres tables are created. Update the backend service’s depends_on block
to wait for timescaledb-init to complete successfully, alongside the existing
qdrant and timescaledb health checks, so the init job finishes before backend
starts accepting traffic.
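
One way to keep this dependency from regressing is a small check over the parsed compose file. The sketch below is an assumption-laden illustration: it takes an already-parsed mapping (how the YAML is loaded is left out), and the function name and the `REQUIRED` table are hypothetical. Only the service names and conditions come from the suggestion above.

```python
# Service names and conditions taken from the suggested depends_on block.
REQUIRED = {
    "qdrant": "service_healthy",
    "timescaledb": "service_healthy",
    "timescaledb-init": "service_completed_successfully",
}


def missing_dependencies(compose, service="backend", required=REQUIRED):
    """List every required dependency that is absent or has the wrong condition."""
    depends_on = compose.get("services", {}).get(service, {}).get("depends_on", {})
    # The short list form of depends_on carries no explicit condition.
    if isinstance(depends_on, list):
        depends_on = {name: {} for name in depends_on}
    problems = []
    for name, condition in required.items():
        actual = depends_on.get(name, {}).get("condition")
        if actual != condition:
            problems.append(f"{name}: expected {condition}, found {actual}")
    return problems


# Parsed form of the depends_on block as it stands in this PR.
current = {
    "services": {
        "backend": {
            "depends_on": {
                "qdrant": {"condition": "service_healthy"},
                "timescaledb": {"condition": "service_healthy"},
            }
        }
    }
}

# Same block with the suggested timescaledb-init entry added.
fixed = {
    "services": {
        "backend": {
            "depends_on": {
                **current["services"]["backend"]["depends_on"],
                "timescaledb-init": {"condition": "service_completed_successfully"},
            }
        }
    }
}
```

Calling `missing_dependencies(current)` reports the absent init job, while `missing_dependencies(fixed)` returns an empty list.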

Comment thread docker-compose.local.yml
Comment on lines 3 to 8
     depends_on:
       qdrant:
         condition: service_started
-      questdb:
-        condition: service_started
+      timescaledb:
+        condition: service_healthy

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== Files ==\n'
git ls-files 'docker-compose*.yml' 'docker-compose*.yaml'

printf '\n== Relevant compose snippets ==\n'
for f in docker-compose.local.yml docker-compose.dokploy.yml; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    sed -n '1,220p' "$f"
    echo
  fi
done

printf '\n== Search for timescaledb-init and backend depends_on ==\n'
rg -n -C 3 'timescaledb-init|depends_on:|condition: service_completed_successfully|condition: service_healthy|backend:' docker-compose*.yml docker-compose*.yaml

Repository: in2ai/ASM2-client

Length of output: 15099


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== Backend references to TimescaleDB / init SQL ==\n'
rg -n -C 3 'timescaledb|postgres|psycopg|asyncpg|sql/init_tsdb|init_tsdb|migration|migrate|schema|table' backend . --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**'

printf '\n== Files that look like backend entrypoints / DB setup ==\n'
git ls-files 'backend/**' | rg '(__main__|main\.py|app\.py|server\.py|start|init|migrat|db|database|postgres|sql)'

printf '\n== init_tsdb scripts ==\n'
for f in sql/init_tsdb.sql sql/init_tsdb.sh; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    sed -n '1,220p' "$f"
    echo
  fi
done

Repository: in2ai/ASM2-client

Length of output: 50376


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== backend/server.py outline ==\n'
ast-grep outline backend/server.py --view expanded || true

printf '\n== backend/server.py startup / DB sections ==\n'
sed -n '1,260p' backend/server.py
echo

printf '\n== backend/src/config/auth.py outline ==\n'
ast-grep outline backend/src/config/auth.py --view expanded || true

printf '\n== backend/src/config/auth.py ==\n'
sed -n '1,260p' backend/src/config/auth.py
echo

printf '\n== sql/init_tsdb.sql (first 220 lines) ==\n'
sed -n '1,220p' sql/init_tsdb.sql
echo

printf '\n== sql/init_tsdb.sh ==\n'
sed -n '1,220p' sql/init_tsdb.sh

Repository: in2ai/ASM2-client

Length of output: 18470


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== Create-table / setup references in backend ==\n'
rg -n -C 2 'CREATE TABLE IF NOT EXISTS|setup\(\)|AsyncPostgresSaver|AsyncPostgresStore|PostgresChatStore|create_hypertable|source_preferences|credentials|messages|chats' backend graph --glob '!**/node_modules/**'

printf '\n== graph package files ==\n'
git ls-files 'graph/**' 'backend/src/chat/**' 'backend/src/metrics/**' | sed -n '1,200p'

printf '\n== likely startup-related graph files ==\n'
for f in graph/__init__.py graph/agent.py backend/src/chat/store.py backend/src/metrics/connection.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    sed -n '1,260p' "$f"
    echo
  fi
done

Repository: in2ai/ASM2-client

Length of output: 32870


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== backend/src/metrics/connection.py ==\n'
sed -n '1,240p' backend/src/metrics/connection.py
echo

printf '\n== backend/src/chat/store.py ==\n'
sed -n '1,280p' backend/src/chat/store.py
echo

printf '\n== graph files under backend ==\n'
git ls-files 'backend/graph/**' | sed -n '1,200p'
echo

for f in backend/graph/__init__.py backend/graph/agent.py backend/graph/nodes.py backend/graph/tools.py backend/graph/state.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    sed -n '1,240p' "$f"
    echo
  fi
done

Repository: in2ai/ASM2-client

Length of output: 17450


Wait for timescaledb-init before starting backend
backend uses tables created by sql/init_tsdb.sql (chats, messages, credentials, source_preferences), but this stack only waits for timescaledb to be healthy. Add timescaledb-init with service_completed_successfully here too, so the first boot can’t race schema setup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.local.yml` around lines 3 - 8, The backend startup
dependencies only wait for timescaledb to be healthy, so schema initialization
can still race the backend on first boot. Update the backend service’s
depends_on block in docker-compose.local.yml to also require timescaledb-init
with service_completed_successfully, alongside the existing qdrant and
timescaledb checks, so startup waits for the init job before launching backend.
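
As a complement to the compose-level fix (not a replacement for it, and not something this PR implements), the backend could also refuse to serve until the schema exists. The sketch below is hypothetical: `wait_for_tables` and its parameters are invented for illustration, and `list_tables` stands in for whatever query the backend would run to list existing tables. Only the four table names come from the comment above.

```python
import time

# Tables the comment above says the backend relies on.
REQUIRED_TABLES = frozenset({"chats", "messages", "credentials", "source_preferences"})


def wait_for_tables(list_tables, required=REQUIRED_TABLES, attempts=30,
                    delay=1.0, sleep=time.sleep):
    """Poll list_tables() until every required table exists, or raise TimeoutError.

    list_tables is a caller-supplied callable returning the table names that
    currently exist; sleep is injectable so the loop can be tested without waiting.
    """
    missing = set(required)
    for attempt in range(attempts):
        missing = set(required) - set(list_tables())
        if not missing:
            return
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next probe
    raise TimeoutError(f"tables still missing: {sorted(missing)}")
```

The dependency on `timescaledb-init` remains the simpler fix. A guard like this only matters if the backend can be started outside compose or restarted while the init job is still running.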

Comment thread docker-compose.local.yml
Comment on lines 47 to +49
     ports:
-      - "127.0.0.1:3011:3001"
-      - "127.0.0.1:3002:3002"
+      - "3011:3001"
+      - "3002:3002"

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== docker-compose.local.yml (relevant section) =="
nl -ba docker-compose.local.yml | sed -n '1,120p'

echo
echo "== Search for Logto / port bindings in compose files =="
rg -n --hidden --glob '!*node_modules*' --glob '!*.git*' -e '127\.0\.0\.1:3011:3001|127\.0\.0\.1:3002:3002|3011:3001|3002:3002|logto' .

Repository: in2ai/ASM2-client

Length of output: 245


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== docker-compose.local.yml (relevant section) =="
awk 'NR>=1 && NR<=140 { printf "%4d  %s\n", NR, $0 }' docker-compose.local.yml

echo
echo "== Search for Logto / port bindings in repo =="
rg -n --hidden --glob '!*node_modules*' --glob '!*.git*' \
  -e '127\.0\.0\.1:3011:3001' \
  -e '127\.0\.0\.1:3002:3002' \
  -e '3011:3001' \
  -e '3002:3002' \
  -e 'logto' .

Repository: in2ai/ASM2-client

Length of output: 15564


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== README.md around docker-compose.local.yml mention =="
awk 'NR>=132 && NR<=148 { printf "%4d  %s\n", NR, $0 }' README.md

echo
echo "== run.sh relevant lines =="
awk 'NR>=1 && NR<=30 { printf "%4d  %s\n", NR, $0 }' run.sh

Repository: in2ai/ASM2-client

Length of output: 2647


Restore localhost-only Logto ports

These bindings publish Logto and the admin console on every interface; the local compose docs and run script still describe Logto as localhost-only, so keep the loopback binding unless public access is intended.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.local.yml` around lines 47 - 49, The Logto port bindings are
exposing services on all interfaces instead of localhost-only. Update the port
mappings in the docker compose configuration so the Logto app and admin console
bind to loopback only, keeping the existing localhost-only behavior consistent
with the local docs and run script.
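
The difference between the two binding forms can be checked mechanically. The following is a rough sketch, assuming compose short-syntax port strings (`[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]`); the helper name `is_loopback_only` is hypothetical. By default Docker publishes a mapping without a host IP on all interfaces.

```python
import ipaddress


def is_loopback_only(port_mapping):
    """Return True if a short-syntax port mapping is bound to a loopback address."""
    # Drop an optional protocol suffix such as "/tcp".
    spec = str(port_mapping).split("/", 1)[0]
    # Split from the right so an IPv6 host such as "[::1]" stays in one piece.
    parts = spec.rsplit(":", 2)
    if len(parts) < 3:
        # No host IP given: published on all interfaces by default.
        return False
    host_ip = parts[0].strip("[]")
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        # Not a literal IP address; treat it as not provably loopback.
        return False
```

Applied to the mappings in this diff, `is_loopback_only("127.0.0.1:3011:3001")` is true and `is_loopback_only("3011:3001")` is false, which is the regression this comment describes.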

Comment thread docker-compose.yml
         condition: service_started
     ports:
-      - "127.0.0.1:3001:80"
+      - "3001:80"

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate compose files and inspect the relevant service block
git ls-files | rg '(^|/)(docker-compose|compose)(\..+)?\.ya?ml$|docker-compose\.yml$|compose\.yml$'
echo '--- docker-compose.yml (around dashboard service) ---'
cat -n docker-compose.yml | sed -n '1,180p'

echo '--- search for dashboard port / Dokploy / TimescaleDB references ---'
rg -n '127\.0\.0\.1:3001:80|3001:80|dashboard|Dokploy|TimescaleDB|QuestDB' -S .

Repository: in2ai/ASM2-client

Length of output: 17245


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- docker-compose.dokploy.yml dashboard block ---'
cat -n docker-compose.dokploy.yml | sed -n '60,100p'

echo '--- docker-compose.local.yml dashboard-related references ---'
rg -n 'dashboard|3001|127\.0\.0\.1:3001:80|80"' docker-compose.local.yml docker-compose.bench.yml docker-compose.gpu.yml docker-compose.qdrant-amd.yml docker-compose.qdrant-nvidia.yml README.md DOKPLOY_HOME_SERVER.md frontend/README.md

echo '--- README relevant sections ---'
cat -n README.md | sed -n '130,320p'

echo '--- DOKPLOY_HOME_SERVER relevant sections ---'
cat -n DOKPLOY_HOME_SERVER.md | sed -n '1,220p'

Repository: in2ai/ASM2-client

Length of output: 21721


Keep dashboard loopback-only in docker-compose.yml:77.
3001:80 publishes the UI on all interfaces, while the base stack docs describe it as localhost:3001. If public access is intended, document it explicitly; otherwise restore 127.0.0.1:3001:80.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.yml` at line 77, The dashboard port mapping in docker-compose
should remain loopback-only instead of exposing the UI on all interfaces. Update
the dashboard service’s port definition in docker-compose to bind through
127.0.0.1 for the 3001:80 mapping, and keep the configuration aligned with the
localhost-only behavior described in the stack docs. If broad network access is
intended, make that explicit elsewhere; otherwise restore the loopback-bound
mapping.
