R8R (Rapid RAG Runtime) is an end-to-end intelligent RAG workflow platform that turns weeks of development into a 5-minute setup. Build advanced retrieval pipelines visually, execute them via API, or create them directly through Telegram.
Instead of writing 1000+ lines of RAG logic, you get:
- Visual Workflow Builder - Drag, drop, deploy
- Intelligent Memory System - 95.7% duplicate detection accuracy
- Multi-LLM Orchestration - Run GPT-4, Claude & Gemini in parallel
- Telegram Integration - Build workflows through chat
- Real-time Analytics - Cost tracking & performance monitoring
Building production-ready RAG systems is painful:
| Challenge | Reality |
|---|---|
| Time | 2-4 weeks to build a basic pipeline |
| Complexity | 1,000+ lines of code for query enhancement, retrieval, reranking |
| Repetition | Every project rebuilds the same logic |
| Cost | Manual LLM orchestration burns tokens unnecessarily |
| Memory | No context persistence across sessions |
| Debugging | Multi-step failures are difficult to trace |
Developers waste countless hours rebuilding query enhancers, rerankers, Hyde processes, and memory systems, again and again.
R8R provides pre-built, optimized RAG workflows accessible through:
- REST API - Single endpoint for any workflow
- Visual Canvas - Drag-and-drop workflow builder
- Telegram Bot - Natural language workflow creation
- Dashboard - Analytics, debugging, and cost tracking
```diff
- Before R8R: 1000+ lines of code, 2 weeks of development
+ With R8R: 5 minutes to deploy, one API call to execute
- Before: Manual LLM calls, no memory, hallucinations
+ With R8R: Multi-LLM consensus, 95.7% memory accuracy, verified outputs
- Before: Custom debugging, no visibility
+ With R8R: Real-time analytics, step-by-step logging, replay functionality
```

```bash
# Sign up at https://r8r.ai
# Free tier: 1,000 queries/month

npm install r8r-client
# or
pip install r8r-client
```

JavaScript/TypeScript:
```javascript
import R8R from 'r8r-client';

const client = new R8R('your-api-key');

const result = await client.query(
  "What are the latest treatments for type 2 diabetes?",
  {
    pipeline: 'advanced',
    memory: true,
    llms: ['gpt-4', 'claude-3']
  }
);

console.log(result.answer);
console.log(result.sources);
```

Python:
```python
from r8r_client import R8RClient

client = R8RClient(api_key="your-api-key")

response = client.query(
    "Explain quantum computing applications in healthcare",
    pipeline="research",
    memory=True
)

print(response['answer'])
```

cURL:
```bash
curl -X POST https://api.r8r.ai/v1/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does photosynthesis work?",
    "pipeline": "standard",
    "format": "detailed"
  }'
```

Build RAG pipelines without code using our drag-and-drop canvas:
Available Nodes:
- Query Rewriter - Reformulates user queries for better retrieval
- Hyde Generator - Creates hypothetical answers to enhance context matching
- Vector Search - Semantic search using embeddings (text-embedding-3-small)
- Reranker - Re-scores retrieved documents for relevance
- LLM Response - Generates answers using retrieved context
- Memory Store - Persists conversation history across sessions
Example Workflow:
```
User Query → Query Rewriter → Hyde Generator → Vector Search
→ Reranker → Memory Check → LLM Response → Memory Store
```
Deploy your workflow and get an instant API endpoint.
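Conceptually, a deployed workflow is just an ordered chain of node functions, each transforming a shared context. A toy sketch of that execution model (purely illustrative, not R8R's actual engine):

```python
# Toy sketch of a workflow as a chain of node functions. Each node takes a
# context dict and returns an updated one. This is a conceptual model only,
# not R8R's actual execution engine.

def query_rewriter(ctx):
    # Normalize the query (real rewriters reformulate it for retrieval).
    ctx["query"] = ctx["query"].strip().rstrip("?") + "?"
    return ctx

def vector_search(ctx):
    # Placeholder retrieval: attach canned documents.
    ctx["docs"] = ["doc about " + ctx["query"]]
    return ctx

def llm_response(ctx):
    # Placeholder generation step using the retrieved context.
    ctx["answer"] = f"Based on {len(ctx['docs'])} document(s): ..."
    return ctx

def run_workflow(nodes, query):
    ctx = {"query": query}
    for node in nodes:          # nodes execute in the order they were wired
        ctx = node(ctx)
    return ctx

result = run_workflow([query_rewriter, vector_search, llm_response],
                      "How does photosynthesis work")
```

Deploying on the canvas effectively serializes such a chain and exposes it behind the API endpoint.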
R8R implements a three-tier memory architecture for persistent, context-aware conversations:
Architecture:
```
┌──────────────────────────────────────────┐
│ Redis (Hot Memory)                       │
│ • Current session context                │
│ • Sub-10ms access time                   │
└──────────────────────────────────────────┘
                     ↓
┌──────────────────────────────────────────┐
│ Qdrant Vector DB (Warm Memory)           │
│ • Semantic search across past sessions   │
│ • ~50ms retrieval time                   │
│ • 95.7% duplicate detection accuracy     │
└──────────────────────────────────────────┘
                     ↓
┌──────────────────────────────────────────┐
│ PostgreSQL (Cold Memory)                 │
│ • Full historical data                   │
│ • Structured queries for analytics       │
└──────────────────────────────────────────┘
```
Key Capabilities:
- 95.7% duplicate detection accuracy - Prevents memory bloat
- 93.4% similarity matching - Finds relevant past conversations
- Cross-session persistence - Context survives restarts
- Automatic consolidation - Background jobs optimize memory storage
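The tiered lookup and duplicate check can be sketched in a few lines. The snippet below is a conceptual model only: it swaps Redis, Qdrant, and PostgreSQL for in-memory stand-ins, and treating the 95.7% figure as a cosine-similarity threshold is our assumption, not a documented R8R internal:

```python
import math

# In-memory stand-ins for the three tiers (hypothetical, for illustration).
hot_cache = {}     # Redis stand-in: session_id -> recent turns
warm_store = []    # Qdrant stand-in: (embedding, text) pairs
cold_log = []      # PostgreSQL stand-in: full history rows

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def remember(session_id, text, embedding, dup_threshold=0.957):
    """Store a turn, skipping near-duplicates already in warm memory."""
    if any(cosine(embedding, e) >= dup_threshold for e, _ in warm_store):
        return False  # duplicate detected; avoid memory bloat
    hot_cache.setdefault(session_id, []).append(text)
    warm_store.append((embedding, text))
    cold_log.append({"session": session_id, "text": text})
    return True

def recall(session_id, query_embedding, top_k=3):
    """Hot memory first, then semantic search over warm memory."""
    recent = hot_cache.get(session_id, [])[-top_k:]
    ranked = sorted(warm_store, key=lambda p: cosine(query_embedding, p[0]),
                    reverse=True)
    return recent, [text for _, text in ranked[:top_k]]
```

The key design point is the ordering: the cheap exact-session cache answers first, the vector tier handles "have we talked about this before?", and the relational tier exists for analytics rather than the hot path.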
Run multiple LLMs simultaneously for deeper, faster, more reliable answers:
Sequential (Old Way):

```
GPT-4 (3s) → Claude (3s) → Gemini (3s) = 9 seconds total
```

Parallel (R8R Way):

```
┌─ GPT-4 ──┐
├─ Claude ─┤ → Ensemble → Final Answer (3 seconds total)
└─ Gemini ─┘
```
Benefits:
- 45% faster response times
- Better accuracy through consensus
- 99.8% uptime (fallback when providers fail)
- Smart routing to the cheapest suitable model
Supported LLMs:
- OpenAI (GPT-4, GPT-3.5-turbo)
- Anthropic (Claude 3 Opus, Claude 3 Sonnet)
- Google (Gemini Pro, Gemini Ultra)
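The parallel fan-out is straightforward to sketch with `asyncio`. The provider calls below are stubs, and the "consensus" step is deliberately trivial; R8R's actual ensemble logic is not public:

```python
import asyncio

# Conceptual sketch of parallel multi-LLM fan-out with an ensemble step.
# call_llm is a stub standing in for a real provider SDK call.

async def call_llm(name, query, latency):
    await asyncio.sleep(latency)  # stands in for a network round-trip
    return {"provider": name, "answer": f"{name} answer to: {query}"}

async def ensemble(query):
    providers = [("gpt-4", 0.03), ("claude-3", 0.03), ("gemini-pro", 0.03)]
    # gather() runs all calls concurrently: wall time ~= slowest provider,
    # not the sum of all three.
    results = await asyncio.gather(
        *(call_llm(name, query, lat) for name, lat in providers),
        return_exceptions=True,  # a failing provider doesn't kill the run
    )
    ok = [r for r in results if isinstance(r, dict)]
    # Trivial "consensus": take the first successful answer. A real ensemble
    # would score, vote on, or merge the candidates.
    return {"answers": ok, "final": ok[0]["answer"] if ok else None}

result = asyncio.run(ensemble("What is RAG?"))
```

`return_exceptions=True` is what makes the uptime claim work in this model: one provider timing out degrades the ensemble instead of failing the request.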
What is Hyde? Hypothetical Document Embeddings - generate a hypothetical answer and search with that instead of the raw query.
Why It Works: User questions are often vague. A hypothetical answer is semantically closer to the documents you want to retrieve.
Example:

```
User asks: "How do I fix the login bug?"
→ Poor retrieval results

Hyde generates: "To fix the login bug, update the authentication
middleware to handle token expiration by refreshing tokens..."
→ Excellent retrieval results
```
Impact:
- 60% reduction in hallucinations
- 40% improvement in retrieval quality
- Works automatically in advanced pipelines
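A minimal sketch of the Hyde idea, with a toy bag-of-words "embedder" and a hard-coded stand-in for the LLM; neither reflects the models R8R actually uses:

```python
# Sketch of Hyde: embed a hypothetical answer instead of the raw query,
# then retrieve with that embedding. All components here are toy stand-ins.

DOCS = [
    "Refresh expired auth tokens in the authentication middleware.",
    "Photosynthesis converts light energy into chemical energy.",
]

def fake_llm(query):
    # Stand-in for "generate a hypothetical answer to the query".
    return ("To fix the login bug, update the authentication middleware "
            "to refresh expired tokens.")

def embed(text):
    # Toy bag-of-words embedding over a tiny fixed vocabulary.
    vocab = ["authentication", "tokens", "photosynthesis", "light", "login"]
    words = [w.strip(".,").lower() for w in text.split()]
    return [words.count(v) for v in vocab]

def search(query_vec):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(DOCS, key=lambda d: dot(query_vec, embed(d)))

def hyde_search(query):
    hypothetical = fake_llm(query)        # step 1: hallucinate an answer
    return search(embed(hypothetical))    # step 2: retrieve with its embedding

best = hyde_search("How do I fix the login bug?")
```

The hypothetical answer shares vocabulary with the target document ("authentication", "tokens") that the raw question never mentions, which is exactly why the technique helps.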
Build entire RAG workflows directly from Telegram - no website needed.
How It Works:
1. Message the bot:

```
/create Build a customer support RAG workflow.
Use GPT-4, enable memory, and search my knowledge base.
```
2. R8R analyzes your request:
- Extracts intent and requirements
- Selects appropriate nodes
- Generates workflow configuration
3. Receive your API key:

```
Workflow created: "Customer Support RAG"
API Key: r8r_sk_abc123xyz
Endpoint: https://api.r8r.ai/v1/workflows/cs-support
```
Test it:

```bash
curl -X POST https://api.r8r.ai/v1/workflows/cs-support \
  -H "Authorization: Bearer r8r_sk_abc123xyz" \
  -d '{"query": "How do I reset my password?"}'
```
Available Commands:
- `/create` - Create a new workflow
- `/list` - Show all your workflows
- `/stats` - View usage analytics
- `/edit <workflow_id>` - Modify a workflow
- `/delete <workflow_id>` - Remove a workflow
Integration: Telegram data is stored directly in PostgreSQL, unified with workflows created on the web.
Real-Time Metrics:
- Total queries processed
- Average response time
- Token usage & cost breakdown
- Error rates by node
- Retrieval quality scores

Performance Monitoring:
- Latency heatmaps
- Cache hit rates
- Memory usage trends
- Step-by-step execution logs
Cost Tracking:
- Per-workflow cost analysis
- Daily/weekly/monthly spend
- Provider-level breakdown (OpenAI vs Claude vs Gemini)
- Budget alerts and quotas
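For intuition, per-query cost tracking reduces to token counts times per-token rates. The prices below are placeholders for illustration, not actual provider pricing:

```python
# Illustrative per-query cost calculation. The per-1K-token prices below
# are hypothetical, not real OpenAI/Anthropic rates.

PRICE_PER_1K = {  # USD per 1,000 tokens (placeholder values)
    "gpt-4":    {"input": 0.03,  "output": 0.06},
    "claude-3": {"input": 0.015, "output": 0.075},
}

def query_cost(usage):
    """usage: list of (model, input_tokens, output_tokens) per LLM call."""
    total = 0.0
    for model, tokens_in, tokens_out in usage:
        p = PRICE_PER_1K[model]
        total += (tokens_in / 1000) * p["input"] + (tokens_out / 1000) * p["output"]
    return round(total, 6)

# A multi-LLM query sends the same prompt to both providers, so input
# tokens are counted once per provider.
cost = query_cost([("gpt-4", 1200, 300), ("claude-3", 1200, 250)])
```

Summing these per call, per workflow, and per day is all the dashboard's cost breakdown fundamentally is.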
Debugging Tools:
- Timeline view of execution flow
- Node-level performance profiling
- Error stack traces with context
- Query replay for A/B testing
Frontend:
- Next.js 15 (App Router) + TypeScript
- Tailwind CSS for styling
- Canvas-based workflow editor
- React Server Components
Backend:
- Node.js + Express + TypeScript
- RESTful API design
- WebSocket for real-time updates
- JWT authentication
Databases:

```
PostgreSQL (Prisma ORM)
├── User accounts & authentication
├── Workflow schemas & configurations
├── API keys & permissions
├── Telegram user mappings
└── Execution logs & analytics

Qdrant Vector Database
├── Document embeddings
├── Conversation memory embeddings
├── Query history for caching
└── HNSW index optimization

Redis
├── Session management
├── Rate limiting
├── Short-term conversation cache
└── Job queue for async processing
```
AI Infrastructure:
- Multi-LLM orchestration layer
- text-embedding-3-small for vectorization
- Parallel execution engine
- Automatic fallback and retry logic
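The fallback-and-retry idea can be sketched as "try providers in priority order, retrying transient failures with exponential backoff". The provider callables here are placeholders, not real SDK clients:

```python
import time

# Hedged sketch of fallback + retry. Each provider is any callable that
# takes a query and returns an answer or raises on failure.

def with_fallback(providers, query, retries=2, backoff=0.01):
    last_err = None
    for call in providers:  # priority order: primary/cheapest first
        for attempt in range(retries + 1):
            try:
                return call(query)
            except Exception as err:  # real code would catch specific errors
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_err}")

def flaky(query, _state={"calls": 0}):
    # Toy provider that fails once, then succeeds (simulates a transient error).
    _state["calls"] += 1
    if _state["calls"] < 2:
        raise TimeoutError("transient")
    return "ok from flaky"

answer = with_fallback([flaky], "hello")
```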
Telegram Integration:
- Telegram Bot API with webhooks
- Natural language workflow parsing
- Direct PostgreSQL integration (unified data model)
All requests require an API key:

```
Authorization: Bearer YOUR_API_KEY
```

`POST /v1/query` - Execute a workflow with a query.
Request:

```json
{
  "query": "What are the benefits of exercise?",
  "pipeline": "advanced",
  "response_format": "detailed",
  "llm_preferences": ["gpt-4", "claude-3"],
  "memory": true,
  "metadata": {
    "user_id": "user_123",
    "session_id": "session_456"
  }
}
```

Response:
```json
{
  "success": true,
  "data": {
    "answer": "Exercise provides numerous benefits including...",
    "sources": [
      {
        "title": "Harvard Health - Exercise Benefits",
        "url": "https://example.com/article",
        "confidence": 0.95,
        "relevance_score": 0.89
      }
    ],
    "metadata": {
      "pipelines_used": ["vector", "hybrid"],
      "llms_used": ["gpt-4", "claude-3"],
      "confidence_score": 0.94,
      "execution_time_ms": 1234,
      "cost_usd": 0.0045
    }
  }
}
```

Additional workflow endpoints let you list all your workflows, create a new workflow, get workflow details, update a workflow, delete a workflow, and get usage analytics and metrics.
Choose from optimized workflows for different use cases:
| Pipeline | Speed | Accuracy | Cost | Use Case |
|---|---|---|---|---|
| `standard` | ⚡⚡⚡ | ⭐⭐⭐ | 💰 | General Q&A, FAQs |
| `advanced` | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰 | Research, technical docs |
| `research` | ⚡ | ⭐⭐⭐⭐⭐ | 💰💰💰 | Academic papers, analysis |
| `enterprise` | ⚡ | ⭐⭐⭐⭐⭐ | 💰💰💰💰 | Mission-critical, compliance |
| `custom` | ⚙️ | ⚙️ | ⚙️ | Build your own |
Pipeline Configurations:

```javascript
// Standard: fast, cost-effective
{
  nodes: ['query_rewriter', 'vector_search', 'llm_response'],
  llm: 'gpt-3.5-turbo',
  top_k: 5
}

// Advanced: multi-strategy retrieval
{
  nodes: ['query_rewriter', 'hyde', 'vector_search', 'reranker', 'llm_response'],
  llm: 'gpt-4',
  top_k: 10,
  rerank_threshold: 0.7
}

// Research: maximum accuracy
{
  nodes: ['query_rewriter', 'hyde', 'vector_search', 'reranker', 'verification', 'llm_response'],
  llm: ['gpt-4', 'claude-3'],
  top_k: 20,
  rerank_threshold: 0.8,
  require_citations: true
}
```

Next.js API route:

```typescript
// app/api/chat/route.ts
import R8R from 'r8r-client';

export async function POST(req: Request) {
  const { message } = await req.json();
  const client = new R8R(process.env.R8R_API_KEY!);

  const result = await client.query(message, {
    pipeline: 'advanced',
    memory: true
  });

  return Response.json(result);
}
```

React (with the `r8r-react` hook):

```tsx
import { useState } from 'react';
import { useR8R } from 'r8r-react';

function ChatApp() {
  const { query, loading, error } = useR8R(process.env.NEXT_PUBLIC_R8R_KEY);
  const [messages, setMessages] = useState([]);

  const handleSend = async (message: string) => {
    const response = await query(message, { pipeline: 'standard' });
    setMessages([...messages, { role: 'assistant', content: response.answer }]);
  };

  return <ChatInterface onSendMessage={handleSend} />;
}
```

Flask:

```python
import os

from flask import Flask, request, jsonify
from r8r_client import R8RClient

app = Flask(__name__)
client = R8RClient(api_key=os.getenv('R8R_API_KEY'))

@app.route('/api/chat', methods=['POST'])
def chat():
    data = request.json
    response = client.query(
        data['message'],
        pipeline='advanced',
        memory=True
    )
    return jsonify(response)
```

Express:

```javascript
const express = require('express');
const R8R = require('r8r-client');

const app = express();
app.use(express.json());
const client = new R8R(process.env.R8R_API_KEY);

app.post('/api/query', async (req, res) => {
  const result = await client.query(req.body.question, {
    pipeline: 'research',
    memory: true
  });
  res.json(result);
});
```

| Plan | Queries/Month | Features | Price |
|---|---|---|---|
| Free | 1,000 | Standard workflows, basic analytics, 5 custom workflows, community support | $0 |
| Pro | 50,000 | All advanced workflows, full analytics dashboard, unlimited custom workflows, memory system access, Telegram integration, priority support, custom domain | $49/mo |
| Enterprise | Unlimited | Everything in Pro, plus dedicated instances, SLA guarantees (99.9%), on-premise deployment, team collaboration, SSO & advanced security, custom integrations, 24/7 support | Custom |
All plans include:
- All LLM providers (OpenAI, Claude, Gemini)
- Vector database access
- API & SDK access
- Basic rate limiting
- 90% faster deployment - 2 weeks → 5 minutes
- 95.7% memory accuracy - Industry-leading duplicate detection
- 45% faster responses - Through parallel LLM execution
- 60% fewer hallucinations - Via the Hyde process
- 99.8% uptime - Multi-provider redundancy
- 1000+ lines of code → one API call
- $15,000 saved per project (avg. developer time)
- 50+ early adopters with positive feedback
- "Enterprise-level GenAI infra" - Beta Tester
- Getting Started Guide
- API Reference
- Workflow Builder Docs
- Telegram Bot Guide
- Memory System Deep Dive
- Best Practices
- Telegram Workflow Builder - Natural language workflow creation (90% complete)
- Memory Optimization - Testing advanced consolidation strategies
- Memory Summarization Engine - Compress older histories into summaries
- Self-Optimizing Pipelines - Auto-adjust based on query patterns
- Template Marketplace - Share and reuse community workflows
- Team Collaboration - Multi-user workspaces and permissions
- Multi-Agent Workflows - Specialized agents working together
- Multi-language Support - Beyond English
- Mobile SDKs - iOS and Android native clients
- Third-party Integrations - Slack, Discord, Microsoft Teams
- Fine-tuning Platform - Custom model training
- Enterprise Features - Advanced compliance, audit logs, SSO
- SOC 2 Type II Certified (in progress)
- GDPR Compliant - Data privacy by design
- CCPA Compliant - California data rights
- End-to-End Encryption - TLS 1.3, AES-256
- Zero Data Retention - Optional mode for sensitive use cases
- On-Premise Deployment - Available for enterprise
- Regular Security Audits - Quarterly penetration testing
- Role-Based Access Control - Granular permissions
We welcome contributions! Here's how you can help:
- Report Bugs - GitHub Issues
- Suggest Features - Feature Requests
- Improve Docs - Submit PRs to our docs repo
- Share Workflows - Contribute to the template marketplace
- Join Community - Discord Server
- Documentation: docs.r8r.ai
- Community Discord: discord.gg/r8r
- Email Support: support@r8r.ai
- Bug Reports: GitHub Issues
- System Status: status.r8r.ai
- Twitter: @r8r_ai
R8R is licensed under the MIT License. See LICENSE for details.
Built with ❤️ by the FlowForge AI team.
Special thanks to:
- Our amazing beta testers
- The open-source community
- Contributors and supporters
For Developers:
- Deploy RAG systems 90% faster
- No infrastructure management
- Production-ready from day one
- Pay only for what you use
For AI Teams:
- Focus on innovation, not plumbing
- A/B test retrieval strategies easily
- Rich analytics for optimization
- Reusable workflow templates
For Enterprises:
- Enterprise-grade security
- Predictable cost scaling
- 24/7 SLA-backed support
- On-premise deployment option
Get Started Free • View Demo • Read Docs
ยฉ 2025 R8R AI. All rights reserved.