[Sparkth] Import files from Google Drive into Vector Store -- BE #203

@MahnoorArbisoft

Description

Enable users to connect their Google Drive via OAuth, select multiple files, and ingest them into the system. Selected files will be processed (parse → chunk → embed) and stored in the vector store. These resources will later be used as context in the chat-based course creation flow.

File Selection

  • Fetch and display user’s Google Drive files
  • Allow multi-select
  • Restrict to supported file types:
    • PDF
    • DOCX
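
One way to enforce the file-type restriction is to filter on MIME type, both in the Drive query and again on the backend before ingestion. The sketch below is illustrative only: the helper names `build_drive_query` and `filter_supported` are hypothetical, and it assumes the query string is passed as the `q` parameter of the Drive v3 `files.list` call.

```python
# MIME types for the two formats this ticket supports.
SUPPORTED_MIME_TYPES = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
}


def build_drive_query() -> str:
    """Build a `q` filter that matches only supported, non-trashed files."""
    type_clause = " or ".join(f"mimeType = '{m}'" for m in SUPPORTED_MIME_TYPES)
    return f"({type_clause}) and trashed = false"


def filter_supported(files: list[dict]) -> list[dict]:
    """Backend guard: drop any file entry whose mimeType is not supported."""
    return [f for f in files if f.get("mimeType") in SUPPORTED_MIME_TYPES]
```

Filtering twice is deliberate in this sketch: the query keeps the file picker clean, and the backend check covers requests that bypass the picker.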

Ingestion Pipeline
For each selected file:

  • Fetch file from Google Drive
  • Parse content to text
  • Chunk content
  • Generate embeddings
  • Store in vector DB
  • Store metadata per file:
    • file_name
    • file_type
    • source = "gdrive"
    • created_at
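
The per-file steps above could be wired together roughly as follows. This is a minimal sketch, not the intended implementation: `ingest_file`, `chunk_text`, and `ResourceMetadata` are hypothetical names, the parser, embedder, and vector store are injected as plain callables, and the chunk size and overlap are placeholder values.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ResourceMetadata:
    # Metadata fields listed in this ticket.
    file_name: str
    file_type: str
    source: str = "gdrive"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not just leftover overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def ingest_file(
    file_name: str,
    file_type: str,
    raw: bytes,
    parse: Callable[[bytes], str],
    embed: Callable[[str], list[float]],
    store: Callable[[str, list[str], list[list[float]], ResourceMetadata], None],
) -> str:
    """Run parse -> chunk -> embed -> store for one file; return the new resource ID."""
    text = parse(raw)
    chunks = chunk_text(text)
    vectors = [embed(chunk) for chunk in chunks]
    resource_id = str(uuid.uuid4())
    metadata = ResourceMetadata(file_name=file_name, file_type=file_type)
    store(resource_id, chunks, vectors, metadata)
    return resource_id
```

Keeping each stage behind a callable means a status field could later be updated between stages without reshaping the pipeline.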

Resources Storage
Persist each processed file as a “Resource”. Resources should be queryable for downstream use (chat retrieval).

Resources Tab (UI dependency)
Expose an API to list resources.

Fields required:

  • file_name
  • file_type

Note: Status tracking (processing, ready, failed) and retry are out of scope for this ticket, but the ingestion pipeline should be designed to support status tracking later.
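
A possible shape for the resource record and the list response is sketched below. The names `Resource`, `ResourceStatus`, and `list_resources` are hypothetical. Two assumptions go beyond the required fields: the response includes an `id` so the chat flow can reference resources, and a `status` field is stored but not returned, leaving room for status tracking later.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceStatus(str, Enum):
    # Values taken from the note above; not exposed by this ticket.
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


@dataclass
class Resource:
    id: str
    file_name: str
    file_type: str
    # Assumption: stored now, surfaced by a later ticket.
    status: ResourceStatus = ResourceStatus.READY


def list_resources(resources: list[Resource]) -> list[dict]:
    """Serialize resources for the Resources tab (status intentionally omitted)."""
    return [
        {"id": r.id, "file_name": r.file_name, "file_type": r.file_type}
        for r in resources
    ]
```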

Chat Integration
Expose an API to:

  • Fetch available resources
  • Pass selected resource IDs to retrieval layer
  • Retrieval should query the vector store, restricted to the selected resources
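
Restricting retrieval to the selected resources amounts to filtering chunks by resource ID before ranking them. The sketch below uses an in-memory list and cosine similarity purely for illustration; `StoredChunk` and `retrieve` are hypothetical names, and a real vector store would normally apply the same restriction through its own metadata filter.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredChunk:
    resource_id: str
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vector: list[float],
    chunks: list[StoredChunk],
    resource_ids: list[str],
    top_k: int = 3,
) -> list[StoredChunk]:
    """Return the top_k most similar chunks, drawn only from the selected resources."""
    allowed = set(resource_ids)
    candidates = [c for c in chunks if c.resource_id in allowed]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]
```

Filtering before ranking guarantees that a highly similar chunk from an unselected resource can never appear in the results.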

Acceptance Criteria
File Selection

  • User can fetch and view Drive files
  • User can select multiple files
  • Unsupported file types are excluded

Ingestion

  • Files are fetched successfully
  • Files are parsed and chunked
  • Embeddings are generated
  • Data is stored in vector DB
  • Metadata is stored and retrievable

Resources API

  • API returns list of resources
  • Each resource includes name and type

Chat Integration

  • Resource IDs can be passed to retrieval layer
  • Retrieval returns relevant chunks from selected resources only
