Feature Roadmap

This document is the feature roadmap for the Serpent project. It lists the planned features by area, along with their current status.

Important

This roadmap is a work in progress and is subject to change.

1. Document Loaders

  • Plain text loader (for .txt, .md, and .rst files)
  • JSON/JSONL loader with field extraction
  • HTML loader with tag stripping
  • PDF loader (via PoDoFo submodule)
  • Word document loader (.docx via miniz)
  • CSV and Excel loader

2. Text Chunking

  • Recursive character text splitter
  • Configurable chunk size and overlap
  • Semantic chunking (sentence-aware)
  • Token-based chunking
  • Markdown-aware chunking
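
To illustrate what "configurable chunk size and overlap" means, here is a minimal sketch of a fixed-size character splitter. It is not Serpent's implementation and it is simpler than the recursive splitter listed above; the function name `chunk_text` and its parameters are hypothetical, chosen only for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters.

    Hypothetical helper for illustration; not part of the Serpent API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `overlap=2`, each chunk repeats the last two characters of the previous one, which helps keep context that straddles a chunk boundary retrievable.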

3. Retrieval and Search

  • BM25 sparse retrieval index
  • HNSW dense vector store (hnswlib)
  • RRF hybrid fusion
  • Hybrid retriever (dense + sparse)
  • Metadata filtering
  • HNSW persistence with documents
  • MMR diversity reranking
  • Cross-encoder reranking
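
As a rough illustration of the RRF hybrid fusion item, the sketch below merges a sparse and a dense ranking with Reciprocal Rank Fusion: each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name `rrf_fuse` is hypothetical, and `k = 60` is the commonly used default rather than a value confirmed for Serpent.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; not part of the Serpent API.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse_hits = ["a", "b", "c"]  # e.g. from a BM25 index
dense_hits = ["b", "c", "d"]   # e.g. from an HNSW vector store
fused = rrf_fuse([sparse_hits, dense_hits])
```

Because RRF uses only rank positions, it needs no normalization between BM25 scores and vector similarities, which is one reason it is a common choice for combining sparse and dense retrievers.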

4. LLM Providers

  • llama.cpp local inference (CUDA support)
  • Embedding generation via llama.cpp
  • OpenRouter API integration
  • OpenAI-compatible API
  • Streaming generation
  • Batch inference

5. REST API Server (Removed for Simplification)

Note

The REST API server and web UI have been removed to simplify the project. These may be re-added in a future release.

6. Core Infrastructure

  • Document store with deduplication
  • Content hashing (FNV-1a)
  • Chunk embedding storage
  • Model auto-download (from HuggingFace)
  • Version header with library metadata
  • Persistent document storage (SQLite)
  • Database backend (PostgreSQL)
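
The following sketch shows how FNV-1a content hashing can back document deduplication. It assumes the 64-bit variant of FNV-1a (the roadmap does not say which width Serpent uses), and the `DedupStore` class is a hypothetical stand-in rather than Serpent's document store.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data (XOR the byte in, then multiply)."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF  # keep the result in 64 bits
    return h


class DedupStore:
    """Hypothetical in-memory store that skips documents with a known content hash."""

    def __init__(self) -> None:
        self._docs: dict[int, str] = {}

    def add(self, text: str) -> bool:
        """Store text and return True, or return False if its hash is already present."""
        key = fnv1a_64(text.encode("utf-8"))
        if key in self._docs:
            return False
        self._docs[key] = text
        return True

    def __len__(self) -> int:
        return len(self._docs)
```

FNV-1a is fast but not cryptographic, so distinct documents can collide. A store that must never drop a document would compare the stored text as well before treating a matching hash as a duplicate.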

7. Python Bindings

  • pybind11 module structure
  • Document loading API
  • Retrieval API
  • LLM generation API
  • PyPI packaging

8. Utilities

  • Model manager with HuggingFace download
  • File utilities (header-only, not integrated)
  • String utilities (header-only, not integrated)
  • Embedding caching
  • Logging framework

9. Documentation and Testing

  • Unit tests (for types, chunker, loaders, and retrieval)
  • End-to-end tests (for the pipeline, persistence, and hybrid)
  • API server tests
  • Provider tests
  • MkDocs documentation site (basic)
  • Example notebooks
  • Performance benchmarks