Declarative Document Indexing (DDI) framework for Python. Define schemas, extract structured indices, search smarter.
Updated Apr 20, 2026 - Python
Context Search Engine is an AI-powered semantic document search platform built for learning, experimentation, and real-world prototyping. It demonstrates the full lifecycle of modern vector-based search — from document ingestion to chunking, embedding, indexing, and contextual query matching.
A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.
Dead-simple document indexing and search, nothing fancy.
Atlas - Enterprise document indexing plugin for OpenClaw. Vectorless RAG using PageIndex with async indexing, incremental updates, and smart caching. Scales from 10 to 5000+ documents. Perfect for financial reports, legal docs, technical manuals, and research papers.
Filesystem watcher that keeps a Qdrant vector store in sync with document changes. Config-driven rules engine, semantic search API, and CLI.
Local Search Engine Implementation with Document Indexing
This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.
A chatbot built for the Hackathon 2024 (Microsoft & Sword) competition on MS Azure AI Studio. It aims to assist HR with frequently asked questions.
Educational Document Base prototype that runs queries based on similarity and dissimilarity measures between documents to which stemming, lemmatization, and latent semantic indexing were applied.
An AI-powered solution for efficient document querying. It uses LlamaIndex for vector-based indexing and OpenAI's GPT to interpret natural language queries, providing accurate search results.
Portable knowledge database with PySide6 GUI and web viewer. Indexes documents (PDF, DOCX, TXT, MD, HTML), FTS5 search, optional LLM summarization.
A program that simulates a document indexing algorithm similar to Google's. It can identify occurrences of terms in TXT files.
This project also compares the efficiency and performance of two methods for handling search operations: the inverted index and the term-document matrix.
Self-hosted document indexing for AI agents. Upload docs, get searchable tables of contents.
Compact Hierarchical Memory Engine (CHME): an in-memory memory-orchestration engine that provides multi-collection support, keyword-based retrieval, automatic routing, and snapshot persistence for LLM applications. Features deterministic behavior and supports both local (Ollama) and cloud (OpenAI-compatible) providers.
A real-time Personal Document Intelligence system that utilizes Java filesystem monitoring and Python RAG orchestration with Google Gemini to automatically index and semantically query local documents
Implementation of Document Indexing as part of COMP34711: NLP
Lightweight Python information-retrieval toolkit that builds and queries a document index, performs content retrieval and user profiling, provides ranking-analysis visualizations, and includes a Flask web UI with demos.
LLM-powered knowledge base indexer that builds a growing semantic layer of keywords and relationships for intelligent document search