
Introduce Provider-Agnostic Local-First Execution (Ollama Backend) with Runtime Decoupling and Performance Hardening#145

Open
spice14 wants to merge 22 commits into VectifyAI:main from spice14:main

Conversation


@spice14 spice14 commented Mar 4, 2026

This pull request introduces a provider-agnostic, local-first execution mode for PageIndex, enabling fully offline document indexing and reasoning using Ollama as the default LLM backend.

This is not a simple endpoint swap. The PR introduces runtime abstraction, response normalization, prompt governance, and performance hardening to make local execution stable, reproducible, and production-viable.

The core PageIndex tree-based, vectorless reasoning architecture is preserved. The inference layer and operational surface are restructured to support pluggable providers, with Ollama as the primary local backend.

Key Enhancements

  1. Provider Runtime Decoupling
  • Replaced OpenAI-tied wrapper calls with provider-routed interfaces
  • Centralized provider selection into configuration instead of business logic
  • Preserved compatibility for optional future providers
  • This creates a clean inference boundary and prevents provider assumptions from leaking into traversal/indexing flows.
  2. Finish-Reason Normalization
  • Introduced a response handling layer to standardize continuation behavior
  • Normalized provider-specific stop/truncation semantics
  • Reduced brittle assumptions in recursive tree traversal flows
  • This improves determinism across different model backends.
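One plausible shape for such a normalization layer, assuming a small canonical enum and a provider-to-canonical mapping (the names and the raw finish strings below are illustrative, not taken from the PR):

```python
from enum import Enum

class FinishReason(Enum):
    STOP = "stop"            # model finished naturally
    TRUNCATED = "truncated"  # hit a token limit; caller should continue
    ERROR = "error"          # anything unrecognized is treated as an error

# Provider-specific finish strings collapsed to canonical reasons (illustrative).
_FINISH_MAP = {
    ("openai", "stop"):   FinishReason.STOP,
    ("openai", "length"): FinishReason.TRUNCATED,
    ("ollama", "stop"):   FinishReason.STOP,
    ("ollama", "length"): FinishReason.TRUNCATED,
}

def normalize_finish(provider: str, raw_reason: str) -> FinishReason:
    """Map a provider's raw finish/stop reason onto one canonical enum,
    so recursive traversal code branches on FinishReason, never on
    provider-specific strings."""
    return _FINISH_MAP.get((provider, raw_reason), FinishReason.ERROR)
```

Traversal code then decides "continue generating or stop" from `FinishReason` alone, which is what makes behavior consistent across backends.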
  3. Local-First CLI &amp; Operational Shift
  • Added CLI (cli.py) aligned with local inference defaults
  • Ollama integrated as first-class backend (mistral-based defaults, configurable)
  • Removed OpenAI dependency from default local workflow
  • Setup scripts for local model provisioning
  • No API keys required for default usage.
  4. Prompt Governance Refactor
  • Externalized prompts into a registry-driven loader system
  • Removed large inline prompt strings from core logic
  • Enables easier cross-model tuning and reproducibility
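A registry-driven loader can be as small as the sketch below; the directory layout, default-prompt dict, and function names are assumptions for illustration, with on-disk files overriding in-code defaults:

```python
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed layout: prompts/<name>.txt

# In-code defaults; external files under prompts/ override these (illustrative).
_DEFAULT_PROMPTS = {
    "toc_detect": "List the table-of-contents entries in:\n{document}",
    "summarize": "Summarize this section:\n{document}",
}

def load_prompt(name: str) -> str:
    """Resolve a named prompt template: file on disk wins, else the default."""
    path = PROMPT_DIR / f"{name}.txt"
    if path.exists():
        return path.read_text(encoding="utf-8")
    return _DEFAULT_PROMPTS[name]

def render_prompt(name: str, **fields: str) -> str:
    """Fill {placeholder} fields so core logic never embeds prompt strings."""
    return load_prompt(name).format(**fields)
```

Because prompts are resolved by name, tuning a prompt for a different model is a file edit, not a code change.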
  5. Bounded Async Concurrency
  • Introduced semaphore-controlled parallelization in TOC detection and summarization flows
  • Improves throughput for slower local models
  • Maintains deterministic behavior
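Semaphore-bounded fan-out in asyncio typically looks like the following sketch; `summarize_node`, `bounded_gather`, and the default limit are illustrative stand-ins, not the PR's code:

```python
import asyncio

async def summarize_node(node_id: int) -> str:
    """Stand-in for a per-node LLM summarization call."""
    await asyncio.sleep(0)  # placeholder for real inference latency
    return f"summary-{node_id}"

async def bounded_gather(node_ids, limit: int = 4):
    """Summarize nodes concurrently, but never more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def run(nid: int) -> str:
        async with sem:  # blocks when `limit` tasks are already in flight
            return await summarize_node(nid)

    # gather() returns results in input order, which keeps output deterministic
    # even though completion order varies.
    return await asyncio.gather(*(run(n) for n in node_ids))

results = asyncio.run(bounded_gather(range(3)))
```

The bound matters more for local backends than hosted ones: a slow local model saturates quickly, and an unbounded fan-out just queues work inside the server.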
  6. Adaptive Fallback &amp; Chunking Improvements
  • Hardened no-TOC and hierarchical fallback paths
  • Improved resilience under degraded-model scenarios
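The no-TOC path presumably falls back to structure-free chunking; a minimal sketch of that idea (the function name, page-based input, and size bound are all assumptions, not the PR's implementation):

```python
def chunk_without_toc(pages: list[str], max_chars: int = 2000) -> list[str]:
    """Fallback: when no TOC is detected, group consecutive pages into
    bounded-size chunks instead of TOC-derived sections."""
    chunks: list[str] = []
    current = ""
    for page in pages:
        # Start a new chunk when adding this page would exceed the bound.
        if current and len(current) + len(page) > max_chars:
            chunks.append(current)
            current = ""
        current += page
    if current:
        chunks.append(current)
    return chunks
```

A degraded model that fails TOC extraction then still yields indexable units, at the cost of less meaningful section boundaries.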
  7. Expanded Validation Surface
  • Added e2e and integration tests
  • Added parallel-processing validation scripts
  • Verified tree generation, node selection, and answer generation using Ollama
  • This reduces regression risk introduced by provider variability.

Why This Matters

  • Upstream PageIndex is a powerful tree-based, vectorless reasoning framework for long documents.
  • This PR expands its footprint by:
    • Enabling fully offline execution
    • Reducing cloud/API coupling
    • Improving runtime abstraction boundaries
    • Increasing robustness across model providers
    • Supporting reproducible local experimentation
  • It moves PageIndex from an OpenAI-assumed execution model to a viable multi-provider, local-first architecture.

What’s Preserved

  • Core tree construction and reasoning logic
  • Indexing pipeline
  • High-level API contract and traversal design
  • No changes were made to the fundamental reasoning strategy.

Caveats

  • Smaller local models (e.g., ~3B) may struggle with complex reasoning compared to large hosted models.
  • Performance depends on local hardware (RAM/VRAM).
  • Upstream provider SDK code remains for compatibility; this PR reorients defaults but does not remove multi-provider flexibility.

Testing & Verification

  • Verified end-to-end CLI workflows with Ollama
  • Added automated tests covering:
    • Tree generation
    • Node selection
    • Answer generation
    • Parallel execution flows
  • Updated documentation for local setup and configuration

@spice14 spice14 changed the title Add local-first support with Ollama backend for PageIndex CLI and workflows Introduce Provider-Agnostic Local-First Execution (Ollama Backend) with Runtime Decoupling and Performance Hardening Mar 4, 2026