Context
We currently have three PostgreSQL-backed document stores in the repo:
- PgvectorDocumentStore: pure PostgreSQL access via psycopg
- SupabasePgvectorDocumentStore: subclasses PgvectorDocumentStore and overrides only the connection layer
- AlloyDBDocumentStore: currently in an open PR, requires the google-cloud-alloydb-connector
PgvectorDocumentStore can be conceptually split into two layers:
- Connection layer: responsible for the psycopg connection, cursor management, etc.
- Data layer: SQL schema, filters, converters, retrieval logic (pure PostgreSQL)
Problem
The Supabase integration follows a clean pattern: it subclasses PgvectorDocumentStore and overrides only the connection layer, inheriting all SQL/data logic.
AlloyDB duplicates the entire data layer from pgvector. This means any bug fixed or feature added in one store must be manually mirrored to the other, which is error-prone and unsustainable as more PostgreSQL-backed integrations are added.
Proposal
Separate the connection and data layers in the pgvector package:
- Extract the data layer into a base class inside the pgvector package, exposing an abstract _get_connection() method (and any other connection-specific hooks needed).
- PgvectorDocumentStore becomes a thin subclass that implements _get_connection() using psycopg directly.
- Future PostgreSQL-backed integrations (AlloyDB, and any others) follow the same pattern as Supabase: one new class that overrides connection-related methods only, with no duplicated SQL or data-layer code.
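A minimal sketch of the split described above, under loud assumptions: the standard-library sqlite3 module stands in for psycopg so the example runs without a database, the SQL statements are simplified placeholders, and SqliteStandInStore, _connected, and the method bodies are hypothetical names for illustration, not the existing pgvector code.

```python
import sqlite3
from abc import ABC, abstractmethod


class PostgreSQLDocumentStore(ABC):
    """Data layer: owns the SQL and never decides how a connection is opened."""

    # Simplified, hypothetical statements; the real schema is richer.
    # sqlite3 uses "?" placeholders, psycopg would use "%s".
    CREATE_TABLE_STATEMENT = (
        "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, content TEXT)"
    )
    INSERT_STATEMENT = "INSERT INTO documents (id, content) VALUES (?, ?)"
    COUNT_STATEMENT = "SELECT COUNT(*) FROM documents"

    def __init__(self) -> None:
        self._connection = None

    @abstractmethod
    def _get_connection(self):
        """Connection layer hook: return a new, open connection."""

    def _connected(self):
        # Open the connection lazily, then create the table once.
        if self._connection is None:
            self._connection = self._get_connection()
            self._connection.execute(self.CREATE_TABLE_STATEMENT)
        return self._connection

    def write_documents(self, documents) -> int:
        conn = self._connected()
        for doc_id, content in documents:
            conn.execute(self.INSERT_STATEMENT, (doc_id, content))
        conn.commit()
        return len(documents)

    def count_documents(self) -> int:
        return self._connected().execute(self.COUNT_STATEMENT).fetchone()[0]


class SqliteStandInStore(PostgreSQLDocumentStore):
    """Thin subclass: the only thing it adds is how to connect."""

    def _get_connection(self):
        # Assumption: a real PgvectorDocumentStore would open a psycopg connection here.
        return sqlite3.connect(":memory:")


store = SqliteStandInStore()
store.write_documents([("1", "first"), ("2", "second")])
print(store.count_documents())  # prints 2
```

The subclass contains no SQL at all, so a fix to write_documents or count_documents in the base class would reach every store that only overrides _get_connection().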
The proposed structure
pgvector/
├── _base.py              ← new file: PostgreSQLDocumentStore (abstract)
│     - all SQL constants (CREATE_TABLE_STATEMENT, etc.)
│     - all data methods (count, filter, write, delete, retrieve...)
│     - abstract _ensure_db_setup(self) -> None
│     - abstract _ensure_db_setup_async(self) -> None (optional: NotImplementedError)
└── document_store.py     ← PgvectorDocumentStore(PostgreSQLDocumentStore)
      __init__: takes connection_string
      _ensure_db_setup: Connection.connect(conn_str)
      _ensure_db_setup_async: AsyncConnection.connect(conn_str)

alloydb/
└── document_store.py     ← AlloyDBDocumentStore(PostgreSQLDocumentStore)
      __init__: takes instance_uri, user, password, ip_type, enable_iam_auth
      _ensure_db_setup: Connector(...).connect(instance_uri, ...)
      _ensure_db_setup_async: NotImplementedError (until connector supports it)

supabase/
└── document_store.py     ← SupabasePgvectorDocumentStore(PgvectorDocumentStore)
      __init__: reads SUPABASE_DB_URL, create_extension=False   ← unchanged, already correct
The base class lives inside the pgvector package (no new package needed). alloydb-haystack would depend on pgvector-haystack rather than reimplementing it. supabase-haystack already does this correctly and requires no change.
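The sync/async hook pair in the structure above could look roughly like the following. This is a sketch under assumptions, not the proposed implementation: sqlite3 again stands in for a real PostgreSQL driver, SyncOnlyDocumentStore and its injectable connect callable are hypothetical stand-ins for an AlloyDB-style store whose connector has no async path yet, and _initialize_table is an illustrative helper. The async data method mirrors how a psycopg AsyncConnection is typically used, but that branch is not exercised here.

```python
import asyncio
import sqlite3
from abc import ABC, abstractmethod


class PostgreSQLDocumentStore(ABC):
    COUNT_STATEMENT = "SELECT COUNT(*) FROM documents"

    def __init__(self) -> None:
        self._connection = None
        self._async_connection = None

    @abstractmethod
    def _ensure_db_setup(self) -> None:
        """Open the sync connection if needed and store it on self._connection."""

    async def _ensure_db_setup_async(self) -> None:
        # Default for backends whose connector has no async driver yet.
        raise NotImplementedError(
            f"{type(self).__name__} has no async connection support"
        )

    def _initialize_table(self) -> None:
        # Simplified, hypothetical schema.
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY)"
        )

    def count_documents(self) -> int:
        self._ensure_db_setup()
        return self._connection.execute(self.COUNT_STATEMENT).fetchone()[0]

    async def count_documents_async(self) -> int:
        # Fails fast with NotImplementedError on sync-only backends.
        await self._ensure_db_setup_async()
        cursor = await self._async_connection.execute(self.COUNT_STATEMENT)
        return (await cursor.fetchone())[0]


class SyncOnlyDocumentStore(PostgreSQLDocumentStore):
    """Implements only the sync hook and inherits the async default."""

    def __init__(self, connect=lambda: sqlite3.connect(":memory:")) -> None:
        super().__init__()
        # Assumption: a real store would wrap its cloud connector call here.
        self._connect = connect

    def _ensure_db_setup(self) -> None:
        if self._connection is None:
            self._connection = self._connect()
            self._initialize_table()


store = SyncOnlyDocumentStore()
print(store.count_documents())

try:
    asyncio.run(store.count_documents_async())
except NotImplementedError as exc:
    print(f"async path unavailable: {exc}")
```

Keeping the async hook non-abstract with a NotImplementedError default is one way to read "optional" in the structure: a sync-only backend stays instantiable, and callers get a clear error instead of a missing method.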
Benefits
- All SQL, filtering, and conversion logic is inherited and tested once.
- Bug fixes and improvements to the data layer automatically propagate to all PostgreSQL-backed stores.
- New integrations only need to implement the connection layer.
- Consistent behavior across Pgvector, Supabase, AlloyDB, and future variants.
Related
- Open PR for AlloyDB integration (to be updated to follow this pattern once the refactor lands).