# document-indexing

Here are 24 public repositories matching this topic...

vunone / ennoia

Declarative Document Indexing (DDI) framework for Python. Define schemas, extract structured indices, search smarter.

python ai retrieval semantic-search rag document-indexing rag-pipeline

Updated Apr 20, 2026
Python

inboxpraveen / Context-Search-Engine

Context Search Engine is an AI-powered semantic document search platform built for learning, experimentation, and real-world prototyping. It demonstrates the full lifecycle of modern vector-based search — from document ingestion to chunking, embedding, indexing, and contextual query matching.

nlp search-engine natural-language-processing language-model bert data-ingestion context-embeddings rag vector-search bert-embeddings huggingface vector-database huggingface-transformers document-indexing llm

Updated Dec 24, 2025
Python

kyr0 / clientside-search

A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.

nodejs multilingual search-engine browser trie fuzzy-matching full-text-search lucene tf-idf client-side phonetics text-processing bk-tree bm25 text-search document-search damerau-levenshtein-distance document-indexing state-hydration

Updated Jul 21, 2023
TypeScript

lethalbit / bookwurm

dead simple document index and search, nothing fancy

document-search document-indexing

Updated Mar 28, 2024
Python

joshuaswarren / openclaw-atlas

Atlas - Enterprise document indexing plugin for OpenClaw. Vectorless RAG using PageIndex with async indexing, incremental updates, and smart caching. Scales from 10 to 5000+ documents. Perfect for financial reports, legal docs, technical manuals, and research papers.

typescript artificial-intelligence knowledge-base document-management knowledge-management enterprise-search pageindex document-search rag vector-embeddings document-indexing llm llm-reasoning openclaw openclaw-plugin vectorless-search enterprise-scalability pdf-indexing

Updated Mar 23, 2026
TypeScript

karmaniverous / jeeves-watcher

Filesystem watcher that keeps a Qdrant vector store in sync with document changes. Config-driven rules engine, semantic search API, and CLI.

cli typescript embeddings gemini semantic-search rag filesystem-watcher document-indexing qdrant langchain vector-store

Updated Apr 15, 2026
TypeScript

ak811 / ase

Local Search Engine Implementation with Document Indexing

java search-engine dictionary android-sdk tf-idf search-algorithm n-gram document-indexing

Updated Jan 21, 2023
Java

SubhangiSati / RAG-using-DeepSeek-R1

This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.

api gpt embedding-models fine-tuning rag huggingface-transformers document-indexing llm generative-ai langchain deepseek

Updated Jan 24, 2025
Jupyter Notebook

jeanbou / chatbot-hr-assistant

A chatbot built as part of the Hackathon 2024 (Microsoft & Sword) competition on MS Azure AI Studio. The chatbot aims to assist HR with frequently asked questions.

love chatbots azure-search sword python-backend nodejs-react react-frontend openai-api document-indexing gpt4 azure-openai azure-ai-studio hr-assistant custom-qna enterprise-bot ai-powered-hr chatgpt-enterprise internal-support-bot

Updated Jan 6, 2025
Python

victor-cali / DocumentBase

Educational Document Base prototype that performs queries based on similarity and dissimilarity measures of documents to which stemming, lemmatization, and latent semantic indexing were applied.

information-retrieval stemming latent-semantic-indexing document-indexing lemmantization

Updated Aug 25, 2021
Jupyter Notebook

sanu0711 / llama-index-and-openai

An AI-powered solution for efficient document querying. It uses Llama Index for vector-based indexing and OpenAI's GPT to interpret natural language queries, providing accurate search results.

natural-language-processing document-search vector-search openai-api document-indexing llma-index retrieval-augmented-generation vectorstoreindex

Updated Sep 21, 2024
Jupyter Notebook

file-bricks / knowledgedigest

Portable knowledge database with PySide6 GUI and web viewer. Indexes documents (PDF, DOCX, TXT, MD, HTML), FTS5 search, optional LLM summarization.

python ai sqlite knowledge-base document-processing fts5 pyside6 document-indexing llm

Updated Apr 12, 2026
Python

trkotovicz / document-indexing-algorithm-py

A program that simulates a document indexing algorithm similar to Google's. It can identify occurrences of terms in TXT files.

python stack queue estrutura-de-dados tads fifo lifo dequeue linked-lists doubly-linked-list-python document-indexing

Updated Feb 10, 2023
Python

MaximLevchenko / Boolean-Model-Implementations-Comparison

The purpose of this project is to compare the efficiency and performance of two methods for handling search operations: the inverted index and the term-document matrix.

react python search-engine flask web-application full-stack inverted-index term-document-matrix boolean-model nltk-python document-indexing

Updated Jul 30, 2024
Python

grignolalouis / logidoc-server

Self-hosted document indexing for AI agents. Upload docs, get searchable tables of contents.

go ai agents document-indexing

Updated Apr 5, 2026
Go

tahsinkoc / CHME

Compact Hierarchical Memory Engine (CHME) is an in-memory memory orchestration engine that provides multi-collection support, keyword-based retrieval, automatic routing, and snapshot persistence for LLM applications. It features deterministic behavior and supports both local (Ollama) and cloud (OpenAI-compatible) providers.

nodejs typescript memory hierarchical-data rag document-indexing llm context-management openai-compatible

Updated Apr 16, 2026
TypeScript

krishs-23 / mySearch

A real-time Personal Document Intelligence system that uses Java filesystem monitoring and Python RAG orchestration with Google Gemini to automatically index and semantically query local documents.

Updated Feb 5, 2026
Python

Vladislavlhp7 / document_indexing

Implementation of Document Indexing as part of COMP34711: NLP

python nlp proximity-search inverted-indexing positional-indexing document-indexing

Updated Aug 5, 2023
Jupyter Notebook

nikhil-nandanwar / ir-model

Lightweight Python information-retrieval toolkit that builds and queries a document index, performs content retrieval and user profiling, provides ranking-analysis visualizations, and includes a Flask web UI with demos.

visualization search flask information-retrieval retrieval personalization webapp user-profiles document-indexing ranking-analysis

Updated Oct 21, 2025
Python

ThinkerYzu / kb-indexer

LLM-powered knowledge base indexer that builds a growing semantic layer of keywords and relationships for intelligent document search

knowledge-base semantic-search cli-tool ai-search document-indexing llm

Updated Oct 30, 2025
Python
