Refactor `PgvectorDocumentStore` to make it reusable for future PostgreSQL-related integrations #3239

## Context

We currently have three PostgreSQL-backed document stores, two in the repo and one in review:

- **`PgvectorDocumentStore`** — pure PostgreSQL access via `psycopg`
- **`SupabasePgvectorDocumentStore`** — subclasses `PgvectorDocumentStore` and overrides only the connection layer
- **`AlloyDBDocumentStore`** — currently in an open PR; requires the `google-cloud-alloydb-connector` package

`PgvectorDocumentStore` can be conceptually split into two layers:

1. **Connection layer** — responsible for the `psycopg` connection, cursor management, etc.
2. **Data layer** — SQL schema, filters, converters, retrieval logic (pure PostgreSQL)

## Problem

The Supabase integration follows a clean pattern: it subclasses `PgvectorDocumentStore` and overrides only the connection layer, inheriting all SQL/data logic.

The AlloyDB store in the open PR, by contrast, duplicates the entire data layer from pgvector. Any bug fix or feature added to one store must therefore be mirrored manually in the other, which is error-prone and becomes unsustainable as more PostgreSQL-backed integrations are added.

## Proposal

Separate the connection and data layers in the pgvector package:

- Extract the data layer into a base class inside the pgvector package that exposes abstract connection hooks (`_ensure_db_setup()` and, optionally, `_ensure_db_setup_async()`, as in the structure below), plus any other connection-specific hooks needed.
- `PgvectorDocumentStore` becomes a thin subclass that implements these hooks using `psycopg` directly.
- Future PostgreSQL-backed integrations (AlloyDB, and any others) follow the same pattern as Supabase: one new class that overrides connection-related methods only, with no duplicated SQL or data-layer code.
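A minimal sketch of the proposed split, under stated assumptions: the base class keeps the SQL constants and data methods, and a subclass supplies only the connection. The hook is named `_ensure_db_setup()` to match the `_base.py` outline in this issue. Everything else is hypothetical: `_get_ready_connection()`, `InMemoryDemoStore`, the two-column table, and the use of `sqlite3` as a stand-in for `psycopg` so the example runs without a PostgreSQL server. The data layer only calls `Connection.execute(...).fetchone()`, which both `psycopg` 3 and `sqlite3` provide.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Optional


class PostgreSQLDocumentStore(ABC):
    """Hypothetical sketch of _base.py: SQL and data methods live here, once."""

    # Shared SQL constants. This subset is also valid SQLite, which is what lets
    # the demo subclass below run without a PostgreSQL server.
    CREATE_TABLE_STATEMENT = (
        "CREATE TABLE IF NOT EXISTS {table_name} (id VARCHAR(128) PRIMARY KEY, content TEXT)"
    )
    COUNT_STATEMENT = "SELECT COUNT(*) FROM {table_name}"

    def __init__(self, table_name: str = "haystack_documents") -> None:
        self.table_name = table_name
        self._connection: Optional[Any] = None  # populated by _ensure_db_setup()

    @abstractmethod
    def _ensure_db_setup(self) -> None:
        """Connection-layer hook: open a connection and assign it to self._connection."""

    def _get_ready_connection(self) -> Any:
        # Connect lazily through the subclass hook, then create the schema once.
        if self._connection is None:
            self._ensure_db_setup()
            self._connection.execute(self.CREATE_TABLE_STATEMENT.format(table_name=self.table_name))
        return self._connection

    def count_documents(self) -> int:
        # Pure data-layer logic: it never knows which driver produced the connection.
        row = self._get_ready_connection().execute(
            self.COUNT_STATEMENT.format(table_name=self.table_name)
        ).fetchone()
        return row[0]


class InMemoryDemoStore(PostgreSQLDocumentStore):
    """Hypothetical thin subclass: it implements only the connection hook."""

    def _ensure_db_setup(self) -> None:
        # A pgvector store would open a psycopg connection here instead;
        # sqlite3 is only a stand-in so this sketch is self-contained.
        self._connection = sqlite3.connect(":memory:")


store = InMemoryDemoStore()
print(store.count_documents())  # 0
```

Swapping `InMemoryDemoStore` for a psycopg- or AlloyDB-backed subclass would leave `count_documents()` untouched, which is the property this refactor aims for.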

The proposed structure:

```text
  pgvector/
  ├── _base.py  ← new file: PostgreSQLDocumentStore (abstract)
  │     - all SQL constants (CREATE_TABLE_STATEMENT, etc.)
  │     - all data methods (count, filter, write, delete, retrieve, ...)
  │     - abstract _ensure_db_setup(self) -> None
  │     - abstract _ensure_db_setup_async(self) -> None  (optional: may raise NotImplementedError)
  │
  └── document_store.py  ← PgvectorDocumentStore(PostgreSQLDocumentStore)
        __init__: takes connection_string
        _ensure_db_setup: Connection.connect(conn_str)
        _ensure_db_setup_async: AsyncConnection.connect(conn_str)

  alloydb/
  └── document_store.py  ← AlloyDBDocumentStore(PostgreSQLDocumentStore)
        __init__: takes instance_uri, user, password, ip_type, enable_iam_auth
        _ensure_db_setup: Connector(...).connect(instance_uri, ...)
        _ensure_db_setup_async: NotImplementedError (until the connector supports it)

  supabase/
  └── document_store.py  ← SupabasePgvectorDocumentStore(PgvectorDocumentStore)
        __init__: reads SUPABASE_DB_URL, create_extension=False  ← unchanged, already correct
```
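One way the class hierarchy above could look in code, as a hedged sketch. The required sync hook is abstract, while the async hook gets a default that raises `NotImplementedError`; that is one reading of the "optional" note on `_base.py`. Constructor arguments mirror the structure above, but the real connection calls are replaced by placeholder strings (marked in comments) so the sketch runs without `psycopg` or `google-cloud-alloydb-connector`. Reading `SUPABASE_DB_URL` through `os.environ` is also an assumption about how the Supabase store loads its URL.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from typing import Any, Optional


class PostgreSQLDocumentStore(ABC):
    """Hypothetical _base.py: all SQL and data methods would live here (omitted)."""

    def __init__(self, *, create_extension: bool = True) -> None:
        self.create_extension = create_extension
        self._connection: Optional[Any] = None

    @abstractmethod
    def _ensure_db_setup(self) -> None:
        """Required hook: open a sync connection and assign it to self._connection."""

    async def _ensure_db_setup_async(self) -> None:
        """Optional hook: stores without async driver support inherit this default."""
        raise NotImplementedError(f"{type(self).__name__} does not support async operations yet")


class PgvectorDocumentStore(PostgreSQLDocumentStore):
    def __init__(self, *, connection_string: str, create_extension: bool = True) -> None:
        super().__init__(create_extension=create_extension)
        self.connection_string = connection_string

    def _ensure_db_setup(self) -> None:
        # Placeholder for a psycopg Connection.connect(conn_str) call.
        self._connection = f"sync connection to {self.connection_string}"

    async def _ensure_db_setup_async(self) -> None:
        # Placeholder for a psycopg AsyncConnection.connect(conn_str) call.
        self._connection = f"async connection to {self.connection_string}"


class SupabasePgvectorDocumentStore(PgvectorDocumentStore):
    def __init__(self) -> None:
        # Only configuration changes; connection hooks and data logic are inherited.
        super().__init__(connection_string=os.environ["SUPABASE_DB_URL"], create_extension=False)


class AlloyDBDocumentStore(PostgreSQLDocumentStore):
    def __init__(self, *, instance_uri: str, user: str, password: str,
                 ip_type: str, enable_iam_auth: bool) -> None:
        super().__init__()
        self.instance_uri = instance_uri
        self.user = user
        self.password = password
        self.ip_type = ip_type
        self.enable_iam_auth = enable_iam_auth

    def _ensure_db_setup(self) -> None:
        # Placeholder for Connector(...).connect(self.instance_uri, ...).
        self._connection = f"AlloyDB connection to {self.instance_uri}"

    # _ensure_db_setup_async is deliberately not overridden, so the base default
    # raises NotImplementedError until the connector supports async.
```

Because `SupabasePgvectorDocumentStore` extends `PgvectorDocumentStore` rather than the base class, it also inherits the async hook, while `AlloyDBDocumentStore` inherits only the data layer and the base's async default.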

The base class lives inside the pgvector package, so no new package is needed. `alloydb-haystack` would depend on `pgvector-haystack` instead of reimplementing the data layer; `supabase-haystack` already does this and requires no change.

## Benefits

- All SQL, filtering, and conversion logic is written and tested once, then inherited by every store.
- Bug fixes and improvements to the data layer automatically propagate to all PostgreSQL-backed stores.
- New integrations only need to implement the connection layer.
- Consistent behavior across `PgvectorDocumentStore`, `SupabasePgvectorDocumentStore`, `AlloyDBDocumentStore`, and future variants.

## Related

- Open PR for AlloyDB integration (to be updated to follow this pattern once the refactor lands).