Harden ingestion workflow and add multi-document readiness #5

Merged: Aryan1718 merged 5 commits into Aryan1718:develop from varun3011:main on Jun 8, 2026
varun3011 (Contributor) commented Jun 8, 2026
Summary

Completes docs/task/ingestion-follow-up-hardening-and-validation.md.

This PR hardens the ingestion workflow, improves observability and recovery, adds repeatable load validation, and
extends query readiness for multi-document retrieval.

  • Align document status schema with runtime ingestion statuses.
  • Add ingestion timing fields and expose them in document APIs.
  • Add ingestion health and manual reconciliation endpoints.
  • Add stale ingestion reconciliation logic.
  • Refresh ingestion run status from worker success/failure paths.
  • Add focused tests for retry, reindex, reconciliation, ingestion runs, worker callbacks, and multi-document query
    behavior.
  • Add repeatable ingestion load-test tooling and documentation.
  • Add load-test artifacts for 10, 25, and 50 valid PDF runs.
  • Extend query APIs to support document_ids while keeping existing document_id compatibility.
  • Search retrieval chunks across selected documents and log all searched documents.
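
The `document_ids` compatibility item above could be handled by normalizing both fields into one list before retrieval. The following is a minimal sketch, not the code from this PR: the helper name `resolve_document_ids`, the de-duplication step, and the choice to reject a request that names no document are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def resolve_document_ids(
    document_id: Optional[str] = None,
    document_ids: Optional[Sequence[str]] = None,
) -> list[str]:
    """Merge the legacy single-ID field and the list field into one list.

    Hypothetical helper: the name and the error behavior are assumptions,
    not taken from the PR.
    """
    merged: list[str] = []
    if document_ids:
        merged.extend(document_ids)
    if document_id is not None:
        merged.append(document_id)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(merged))
    if not unique:
        # Assumption: a query must target at least one document.
        raise ValueError("Provide document_id or document_ids")
    return unique
```

With a helper like this, an existing client that sends only `document_id` and a newer client that sends `document_ids` both reach the retrieval step with the same list shape, and that list can also be what gets logged as the searched documents.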

Additional Frontend Observability Update

  • Show ingestion timing data on the Upload / Ingestion workspace page.
  • Add an Ingestion timing section to Observability.
  • Display average extraction, indexing, and total ingestion times, plus batch wall time.
  • Stabilize document ordering on the Upload page when timestamps match.
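
To make the four displayed metrics concrete, here is one way they could be derived from per-document timing records. This is only a sketch under stated assumptions: the PR does not list the actual timing fields, so `started_at`, `finished_at`, `extraction_seconds`, and `indexing_seconds` are hypothetical names, and batch wall time is assumed to mean the span from the earliest start to the latest finish in the batch.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class IngestionTiming:
    # Hypothetical field names; the PR does not specify the real columns.
    started_at: datetime
    finished_at: datetime
    extraction_seconds: float
    indexing_seconds: float


def summarize(timings: list[IngestionTiming]) -> dict[str, float]:
    """Compute per-document averages and the wall time of the whole batch."""
    if not timings:
        return {}
    totals = [(t.finished_at - t.started_at).total_seconds() for t in timings]
    # Assumed definition: earliest start to latest finish across the batch.
    wall = max(t.finished_at for t in timings) - min(t.started_at for t in timings)
    return {
        "avg_extraction_seconds": mean(t.extraction_seconds for t in timings),
        "avg_indexing_seconds": mean(t.indexing_seconds for t in timings),
        "avg_total_seconds": mean(totals),
        "batch_wall_seconds": wall.total_seconds(),
    }
```

Because documents in a batch can be ingested concurrently, batch wall time is usually smaller than the sum of per-document totals, which is why it is worth reporting separately from the averages.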

Validation

  • Real PDF upload/indexing passed.
  • 10 valid PDF load test: completed, 10 indexed, 0 failed.
  • 25 valid PDF load test: completed, 25 indexed, 0 failed.
  • 50 valid PDF load test: completed, 50 indexed, 0 failed.
  • Focused backend tests passed: 17 passed.
  • Worker regression tests passed: 2 passed.
  • Client build passed: npm run build.
  • Black checks passed for touched backend/worker files.

Notes

Supabase environments need scripts/schema.supabase.sql applied before using the new ingestion statuses and timing
fields.

Aryan1718 merged commit fb714dc into Aryan1718:develop on Jun 8, 2026
1 of 2 checks passed