Vector storage #212
Pull Request Overview
This PR adds end-to-end vector store support to DataPusher Plus, embedding resource data via a local SentenceTransformer model and querying via ChromaDB and OpenRouter.
- Introduces a new `DataPusherVectorStore` class for embedding, querying, and managing vector data.
- Integrates vector embedding into the upload job pipeline with optional temporal coverage extraction.
- Adds configuration settings and a helper to check embedding status.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ckanext/datapusher_plus/vector_store.py | New module implementing vector store integration |
| ckanext/datapusher_plus/jobs.py | Hooks embedding into the datapusher job pipeline |
| ckanext/datapusher_plus/helpers.py | Adds helper to query embedding status |
| ckanext/datapusher_plus/config.py | Adds configuration flags and defaults for vector store |
Comments suppressed due to low confidence (2)
ckanext/datapusher_plus/jobs.py:1599
- The new vector store embedding workflow in the job pipeline lacks corresponding unit or integration tests. Consider adding tests to cover `DataPusherVectorStore.embed_resource` and the job integration path.
if conf.ENABLE_VECTOR_STORE and VECTOR_STORE_AVAILABLE:
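One way to cover the guarded path is sketched below using only `unittest.mock`. The helper `run_embedding_step` is a hypothetical stand-in for the guarded block in `jobs.py`, and the `embed_resource(resource_id)` call signature is an assumption rather than something taken from the PR; the real arguments may differ.

```python
from types import SimpleNamespace
from unittest import mock


def run_embedding_step(conf, store, resource_id, available=True):
    """Hypothetical stand-in for the guarded block in jobs.py.

    Calls store.embed_resource only when the feature flag is on and the
    optional vector store dependencies were importable.
    """
    if conf.ENABLE_VECTOR_STORE and available:
        store.embed_resource(resource_id)
        return True
    return False


def check_embedding_guard():
    """Exercise both branches of the flag with a mocked vector store."""
    store = mock.Mock()

    # Flag on, dependencies available: embed_resource runs exactly once.
    enabled = SimpleNamespace(ENABLE_VECTOR_STORE=True)
    ran = run_embedding_step(enabled, store, "res-1")
    store.embed_resource.assert_called_once_with("res-1")

    # Flag off: the store must not be touched.
    store.reset_mock()
    disabled = SimpleNamespace(ENABLE_VECTOR_STORE=False)
    skipped = run_embedding_step(disabled, store, "res-1")
    store.embed_resource.assert_not_called()

    return ran, skipped
```

Mocking the store keeps such a test independent of ChromaDB and the embedding model, so it can run in CI without downloading anything.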
ckanext/datapusher_plus/jobs.py:1638
- The function `parsedate` is not imported in this module, causing a NameError at runtime. Add the appropriate import (e.g., `from dateutil.parser import parse as parsedate`).
min_year = parsedate(str(min_date)).year
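A minimal sketch of the suggested fix follows. The helper name `year_range` and the `max_date` argument are illustrative and do not come from the PR. The fallback to `datetime.fromisoformat` is an assumption added only so the sketch runs without `python-dateutil` installed; it handles ISO 8601 strings only.

```python
from datetime import datetime

try:
    # Preferred: the import suggested in the review comment.
    from dateutil.parser import parse as parsedate
except ImportError:
    # Assumption for this sketch only: fall back to ISO 8601 parsing
    # when python-dateutil is not installed.
    parsedate = datetime.fromisoformat


def year_range(min_date, max_date):
    """Return (min_year, max_year) for two date-like values."""
    min_year = parsedate(str(min_date)).year
    max_year = parsedate(str(max_date)).year
    return min_year, max_year
```

For example, `year_range("2019-03-01", "2024-12-31")` yields the two years bounding the temporal coverage.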
        "ckanext.datapusher_plus.embedding_device", "cpu"
    )
    # OpenRouter API Key
    OPENROUTER_API_KEY = tk.config.get(
Copilot
AI
Jul 15, 2025
A default OpenRouter API key is hard-coded in source. This poses a security risk; consider loading it exclusively from a secure environment variable.
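One possible shape for this is sketched below, with no default value in source. The function name `load_openrouter_api_key` and the environment variable name `OPENROUTER_API_KEY` are assumptions for illustration; the project may prefer a different name or a CKAN config option.

```python
import os


def load_openrouter_api_key(environ=None):
    """Read the OpenRouter API key from the environment only.

    No fallback value is embedded in the source; a missing or blank
    key raises so the misconfiguration is visible immediately.
    """
    # Accept an explicit mapping to make the function easy to test.
    environ = os.environ if environ is None else environ
    key = (environ.get("OPENROUTER_API_KEY") or "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; refusing to use a built-in default"
        )
    return key
```

Failing fast is one design choice; returning `None` and disabling the OpenRouter query path would also avoid shipping a key in the repository.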
ef4f36d to 62962b5
No description provided.