text-splitting

Here are 23 public repositories matching this topic...

isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated May 6, 2026
Python

jparkerweb / semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking equill-library

Updated Apr 17, 2026
JavaScript

messkan / rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

python nlp ia chunking rag vector-search embedding-vectors llm langchain retrieval-augmented-generation text-splitting rag-pipeline document-chunking

Updated Jan 18, 2026
Python

dimicx / griffo

Kerning-aware text splitting

react javascript typescript animation motion typography gsap morph morphing text-animation kerning framer-motion split-text text-splitting

Updated Feb 24, 2026
TypeScript

speedyk-005 / chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

visualization python nlp natural-language-processing chunking code-structure code-chunking rag chunks-processing chunks-algorithm text-splitting document-chunking

Updated May 11, 2026
Python

sentencizer / sentencizer

A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.

golang natural-language-processing ai nlp-library sentence-tokenizer sentence-segmentation sentence-boundary-detection sentence-splitting rag sentence-splitter sentence-segmenter text-splitter llm retrieval-augmented-generation text-splitting

Updated Aug 31, 2025
Go

jchunk-io / jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text chunking within Java applications

java chunk chunking etl-pipeline rag text-splitter text-splitting

Updated Apr 13, 2026
Java

ResetNetwork / n8n-nodes

A collection of custom n8n nodes for enhanced document processing, text splitting, and embeddings generation

typescript ai monorepo embeddings document-processing n8n langchain text-splitting n8n-community-nodes

Updated Nov 24, 2025
TypeScript

ekimetrics / adaptive-chunking

Adaptive Chunking: automatically select the best chunking method per document for RAG. Accepted at LREC 2026.

nlp information-retrieval chunking rag llm text-splitting

Updated Mar 27, 2026
Python

philnash / chunkers

An exploration of text splitting and chunking in JavaScript

text-splitter llamaindex langchain-js text-chunking text-splitting

Updated Nov 20, 2025
TypeScript

HemalDholakiya12 / PDFChat

A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.

Updated May 5, 2025
JavaScript

HamedFathi / RecursiveTextSplitter

A smart C# text splitting library that intelligently chunks text while preserving semantic boundaries. Uses a hierarchical approach with configurable overlap and detailed metadata.

csharp dotnet text dotnetcore dotnet-core recursive recursive-algorithm dotnet-library text-split text-splitter text-splitting recursive-text-splitter

Updated Jun 18, 2025
C#

VaidehiShyara14 / Ayurveda-PDF-Q-A-Chatbot

An intelligent chatbot that allows users to upload text-based Ayurveda PDFs and ask questions based on the content using RAG (Retrieval-Augmented Generation) combining semantic search and LLM-based responses.

python pdf embeddings question-answering pymupdf fastapi vector-database llm langchain text-splitting llama3 fiass langchain-groq pdfprocessing

Updated Jun 28, 2025
Python

shantanu-deshmukh / chunktuner

Benchmark chunking strategies for your RAG corpus. Get a recommended config. CLI, Python library, and MCP server.

retrieval optimization mcp evaluation chunking embedding rag vector-database llm langchain llamaindex litellm text-splitting ragas

Updated May 10, 2026
Python

Shuvob4 / LangChain-Tutorial

LangChain is a framework that makes it easy to build applications on top of available large language models.

embeddings openai text-summarization vector-database langchain prompt-template text-splitting

Updated Mar 10, 2024
Jupyter Notebook

pranav-kural / ledaa-text-splitter

Specialized markdown text splitter - part of LEDAA project's data ingestion pipeline for RAG.

python conversational-ai langchain text-splitting ledaa

Updated Feb 19, 2025
Python

1rishu0 / News-Research-Tool-Project

I built a News Research Tool with Streamlit and LangChain that fetches news articles from URLs, processes them with text splitting and embeddings, and stores them in a FAISS vector DB. Users can query articles via a RetrievalQA chain to get precise, source-backed insights—showcasing my skills in LLMs and vector search.

serialization persistence embeddings web-scraping text-processing retrieval-based-dialog-system streamlit vector-database environment-management text-splitting llm-integration

Updated Sep 1, 2025
Jupyter Notebook

samliebl / word-matching

Matching strings between lists based on length

string text text-splitter block-splitting text-splitting

Updated Sep 15, 2024
JavaScript

shikhar13012001 / research-papers-QA-langchain-pinecone

This is an experiment in learning langchain, pinecone and stuff, don't mind

typescript ai serverless nextjs embeddings qna pinecone cohere-ai langchain pineconedb text-splitting fireworkai recursivecharactertextsplitter

Updated Jun 25, 2024
TypeScript

ABDELRAHMAN-ELRAYES / go-chunker

A zero-dependency Go library for splitting text into overlap-aware chunks optimized for embeddings and RAG pipelines.

golang data-preprocessing chunking rag llm text-splitting text-processiong

Updated Apr 5, 2026
Go
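
Several of the libraries listed above describe splitting text into chunks with a configurable overlap. As a rough illustration of that general idea only, here is a minimal character-based sliding-window sketch in Python. The function name `chunk_with_overlap` and its parameters are hypothetical and are not taken from any repository on this page; real libraries typically split on sentence or token boundaries rather than raw characters.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window after the first starts `overlap` characters before the
    end of the previous window, so neighbouring chunks share context.
    Hypothetical helper for illustration; not any listed library's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is made up purely of repeated characters.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more chunks to embed and store.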
