A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
Updated Dec 24, 2025 - Python
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
Production-ready Snowflake RAG system with type-specific chunking
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
"My complete LangChain learning journey — from basics to advanced RAG, LCEL, LangGraph, LangServe, LangSmith with hands-on code examples."
This repository provides a fully modular implementation of a Retrieval-Augmented Generation (RAG) pipeline tailored for Italian legal-domain documents.
Smart text chunking tool for RAG systems. Splits long texts into sentence-based chunks with ~10%-15% overlap for better context retention. Runs fully in-browser with a clean UI and copyable outputs.
📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.
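One of the entries above describes splitting long text into sentence-based chunks with roughly 10%-15% overlap. The following is a minimal sketch of that general idea, not code from any of the listed repositories. The function names (`split_sentences`, `tail_overlap`, `chunk_sentences`) and the parameters `max_chars` and `overlap_ratio` are hypothetical, and the regex-based sentence splitter is deliberately naive.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tail_overlap(sentences, budget):
    # Collect trailing sentences until their combined length reaches the budget.
    carried = []
    size = 0
    for sentence in reversed(sentences):
        if size >= budget:
            break
        carried.insert(0, sentence)
        size += len(sentence)
    return carried


def chunk_sentences(text, max_chars=500, overlap_ratio=0.12):
    # Group whole sentences into chunks of about max_chars characters.
    # Each new chunk starts with the tail of the previous one, sized to
    # roughly overlap_ratio * max_chars (rounded up to whole sentences).
    budget = int(max_chars * overlap_ratio)
    chunks = []
    current = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Skip the first sentence so a chunk is never repeated in full.
            current = tail_overlap(current[1:], budget)
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One. Two. Three. Four. Five. Six."
    for chunk in chunk_sentences(sample, max_chars=15, overlap_ratio=0.3):
        print(chunk)
```

Because the overlap is rounded up to whole sentences, the actual overlap can exceed the requested ratio when sentences are long relative to `max_chars`. A token-based budget would be a closer match for LLM context limits, at the cost of a tokenizer dependency.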