@minhajuddin2510
Collaborator

No description provided.

@minhajuddin2510 minhajuddin2510 requested a review from Copilot July 15, 2025 18:06
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds end-to-end vector store support to DataPusher Plus, embedding resource data via a local SentenceTransformer model and querying via ChromaDB and OpenRouter.

  • Introduces a new DataPusherVectorStore class for embedding, querying, and managing vector data.
  • Integrates vector embedding into the upload job pipeline with optional temporal coverage extraction.
  • Adds configuration settings and a helper to check embedding status.

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| ---- | ----------- |
| ckanext/datapusher_plus/vector_store.py | New module implementing vector store integration |
| ckanext/datapusher_plus/jobs.py | Hooks embedding into the datapusher job pipeline |
| ckanext/datapusher_plus/helpers.py | Adds helper to query embedding status |
| ckanext/datapusher_plus/config.py | Adds configuration flags and defaults for vector store |

Comments suppressed due to low confidence (2)

ckanext/datapusher_plus/jobs.py:1599

  • The new vector store embedding workflow in the job pipeline lacks corresponding unit or integration tests. Consider adding tests to cover DataPusherVectorStore.embed_resource and the job integration path.
    if conf.ENABLE_VECTOR_STORE and VECTOR_STORE_AVAILABLE:
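
A test for the gated path could look roughly like the sketch below. Because the real job function and the actual `embed_resource` signature are not shown in this review, `run_embedding_step` is a hypothetical stand-in for the `if conf.ENABLE_VECTOR_STORE and VECTOR_STORE_AVAILABLE:` block, and the store is a `unittest.mock.Mock` rather than a real DataPusherVectorStore.

```python
from unittest import mock


def run_embedding_step(enabled, available, store, resource_id):
    """Hypothetical stand-in for the gated embedding block in jobs.py."""
    if enabled and available:
        # Assumed call shape; the real embed_resource may take other arguments.
        store.embed_resource(resource_id)
        return True
    return False


def test_embeds_when_enabled_and_available():
    store = mock.Mock()
    assert run_embedding_step(True, True, store, "res-1") is True
    store.embed_resource.assert_called_once_with("res-1")


def test_skips_when_disabled_or_unavailable():
    for enabled, available in [(False, True), (True, False), (False, False)]:
        store = mock.Mock()
        assert run_embedding_step(enabled, available, store, "res-1") is False
        store.embed_resource.assert_not_called()
```

Covering all three "off" combinations guards against the flag check regressing to only one of the two conditions.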

ckanext/datapusher_plus/jobs.py:1638

  • The function parsedate is not imported in this module, causing a NameError at runtime. Add the appropriate import (e.g., from dateutil.parser import parse as parsedate).
                                        min_year = parsedate(str(min_date)).year
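
One way the suggested import and the year extraction might fit together is sketched below. The `temporal_coverage` helper and the ISO-only fallback are assumptions for illustration, not code from the PR; only the `parsedate` alias and the `parsedate(str(min_date)).year` expression come from the comment above.

```python
from datetime import datetime

try:
    # Import suggested in the review comment (python-dateutil is third-party).
    from dateutil.parser import parse as parsedate
except ImportError:
    # Assumed fallback for environments without dateutil: ISO 8601 strings only.
    def parsedate(value):
        return datetime.fromisoformat(value)


def temporal_coverage(min_date, max_date):
    """Return (min_year, max_year), or None if either value cannot be parsed."""
    try:
        min_year = parsedate(str(min_date)).year
        max_year = parsedate(str(max_date)).year
    except (ValueError, OverflowError):
        # Unparseable input: skip temporal coverage instead of failing the job.
        return None
    return min_year, max_year


coverage = temporal_coverage("2019-03-01", "2024-11-30")
```

Returning `None` on bad input keeps an optional metadata step from aborting the whole upload job; whether that is the desired behavior here is a design choice for the authors.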

"ckanext.datapusher_plus.embedding_device", "cpu"
)
# OpenRouter API Key
OPENROUTER_API_KEY = tk.config.get(

Copilot AI Jul 15, 2025

A default OpenRouter API key is hard-coded in source. This poses a security risk; consider loading it exclusively from a secure environment variable.
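
A minimal sketch of the environment-only approach follows. The variable name `OPENROUTER_API_KEY` and the helper `load_openrouter_api_key` are assumptions chosen for illustration; the project may prefer a different name or a CKAN config option with no default.

```python
import os


def load_openrouter_api_key(environ=os.environ):
    """Read the key from the environment and fail loudly if it is missing.

    No default value is provided, so a secret never needs to live in source.
    """
    key = environ.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; refusing to fall back to a default."
        )
    return key
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real process environment.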

@minhajuddin2510 minhajuddin2510 marked this pull request as draft July 15, 2025 18:07
@jqnatividad jqnatividad added this to the sept25-release milestone Aug 18, 2025