Learn Retrieval-Augmented Generation (RAG) through practical examples: basic embeddings, a semantic-search knowledge base, grounding techniques, and adaptive knowledge management.
# Set Python version (requires Python 3.7+)
pyenv local 3.12
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
In this demo, we use OpenAI's text-embedding-3-small model for generating embeddings. Therefore, you need to provide an OpenAI API key to run the script.
Simple demonstration of generating text embeddings.
python embedding_demo.py
Build a semantic search knowledge base for team/service documentation.
python knowledge_base_demo.py
Compare weak vs strong grounding strategies to prevent hallucinations.
python grounding_demo.py
Shows 4 approaches:
- Weak grounding - Basic prompt (prone to hallucination)
- Strong grounding - Strict rules, low temperature
- Citation grounding - Requires source references
- Structured grounding - JSON with confidence validation
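The four approaches above differ mainly in how the prompt constrains the model and how the reply is checked. The following is a minimal sketch, not the code in grounding_demo.py: `build_prompt`, `validate_structured`, the rule wording, and the `min_confidence=0.5` cutoff are all hypothetical names and values chosen for illustration. The low temperature mentioned for strong grounding would be set on the model call itself, which is omitted here.

```python
import json


def build_prompt(question, sources, mode="strong"):
    """Build a prompt for one grounding mode (hypothetical helper)."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    if mode == "weak":
        # Weak grounding: context is offered, but nothing forbids guessing.
        return f"Context:\n{context}\n\nQuestion: {question}"
    rules = [
        "Answer ONLY from the numbered sources above.",
        "If the sources do not contain the answer, reply exactly: I don't know.",
    ]
    if mode == "citation":
        rules.append("Cite the source number, e.g. [1], after every claim.")
    if mode == "structured":
        rules.append(
            'Reply as JSON: {"answer": str, "sources": [int], "confidence": 0..1}.'
        )
    rule_text = "\n".join(f"- {r}" for r in rules)
    return f"Sources:\n{context}\n\nRules:\n{rule_text}\n\nQuestion: {question}"


def validate_structured(raw, num_sources, min_confidence=0.5):
    """Return the parsed reply, or None if it fails any validation check."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    conf = data.get("confidence")
    cited = data.get("sources")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not min_confidence <= conf <= 1:
        return None
    # Every cited number must point at a source that was actually supplied.
    if not isinstance(cited, list) or not cited:
        return None
    if not all(isinstance(i, int) and 1 <= i <= num_sources for i in cited):
        return None
    return data
```

Rejecting a reply that cites a source number outside the supplied range is a cheap check that catches one common form of fabricated citation.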
Production-ready KB with feedback loop and maintenance operations.
python adaptive_kb_demo.py
Full production-grade KB combining embeddings, cross-encoder reranking, and adaptive learning.
python production_kb.py
Features:
- Two-stage retrieval - Broad embedding search, then cross-encoder reranking for precision
- Cross-encoder reranking - LLM scores each query-document pair (0–10) for accurate relevance
- Strong grounding - Strict citation rules and temperature=0 to prevent hallucination
- Confidence filtering - Low-relevance results (<4/10) are surfaced as gaps, not guesses
- CRUD with versioning - Add, update (re-embeds), and soft-delete documents with timestamps
- Persistent storage - Saves full state (documents, embeddings, query log, gaps) to JSON
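Two-stage retrieval with confidence filtering can be sketched as below. This is an illustration under stated assumptions, not the code in production_kb.py: `two_stage_search`, `broad_k`, and `final_k` are hypothetical names, and `overlap_score` is a simple word-overlap stand-in for the LLM call that scores each query-document pair from 0 to 10. Only the cutoff of 4 out of 10 comes from the feature list above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query, text):
    """Stand-in for the LLM reranker: share of query words found, scaled to 0-10."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return round(10 * len(q & t) / len(q)) if q else 0


def two_stage_search(query, query_vec, docs, score_pair,
                     broad_k=10, final_k=3, min_score=4):
    """docs is a list of (text, vector); score_pair(query, text) returns 0..10."""
    # Stage 1: cheap, broad recall using embedding similarity.
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:broad_k]
    # Stage 2: score every surviving query-document pair for precision.
    scored = sorted(
        ((score_pair(query, text), text) for text, _ in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Confidence filter: anything under min_score is dropped, not guessed at.
    kept = [(score, text) for score, text in scored if score >= min_score][:final_k]
    return {"results": kept, "is_gap": not kept}
```

When `is_gap` is true, the caller would log the query as a knowledge gap and decline to answer instead of passing weak context to the generator.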
1. Initial Build
└─> Add documents → Generate embeddings
2. Production Usage
└─> Users ask questions
├─> Answered → Log success
└─> Unanswered → Log gap
3. Gap Analysis
└─> Review unanswered questions
└─> Identify missing topics
4. Content Evolution
├─> Add new documents (fill gaps)
├─> Update documents (fix errors, re-embed)
├─> Delete documents (remove obsolete)
└─> Merge duplicates
5. Repeat from step 2
RAG Pipeline:
- Embed documents → vector store
- Query → embed → similarity search
- Retrieve top-k contexts
- LLM generates grounded answer
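The pipeline steps above can be traced end to end with a toy example. This sketch assumes a bag-of-words `toy_embed` function in place of the real embedding model, so that it runs without an API key; `retrieve` and `build_grounded_prompt` are hypothetical helper names, and the final LLM call is left out.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, store, k=2):
    """Embed the query, rank stored documents by similarity, return the top k texts."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]


def build_grounded_prompt(query, contexts):
    """Assemble the prompt that would be sent to the LLM."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        f"Sources:\n{numbered}\n\n"
        "Answer only from the sources and cite them by number.\n\n"
        f"Question: {query}"
    )


# Step 1: embed documents into an in-memory "vector store".
documents = [
    "the payments service is owned by team atlas",
    "deploys run through the ci pipeline",
    "on-call rotations change every monday",
]
store = [{"text": d, "vector": toy_embed(d)} for d in documents]

# Steps 2-4: embed the query, retrieve top-k contexts, build the grounded prompt.
question = "who owns the payments service"
prompt = build_grounded_prompt(question, retrieve(question, store, k=1))
```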
Feedback Loop:
- Track unanswered/low-confidence queries
- Analyze patterns → identify gaps
- Add/update documentation
- Re-embed changed content
- Monitor improvement
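One possible shape for the tracking and analysis steps of the feedback loop is sketched below. `QueryLog`, its method names, and the stopword list are hypothetical and are not taken from adaptive_kb_demo.py; the idea is only that low-confidence queries are logged and their recurring terms hint at missing topics.

```python
from collections import Counter

# A small illustrative stopword list; a real one would be longer.
STOPWORDS = frozenset({"how", "do", "i", "where", "are", "the", "to", "a", "is"})


class QueryLog:
    """Records each query with its confidence and summarizes the unanswered ones."""

    def __init__(self, min_confidence=4):
        self.min_confidence = min_confidence
        self.entries = []

    def record(self, query, confidence):
        """Log a query; it counts as answered only at or above the threshold."""
        self.entries.append(
            {
                "query": query,
                "confidence": confidence,
                "answered": confidence >= self.min_confidence,
            }
        )

    def gaps(self):
        """Queries that were not answered with enough confidence."""
        return [e["query"] for e in self.entries if not e["answered"]]

    def gap_terms(self, top_n=3):
        """Most frequent non-stopword terms across gap queries."""
        counts = Counter(
            word
            for query in self.gaps()
            for word in set(query.lower().split())
            if word not in STOPWORDS
        )
        return counts.most_common(top_n)

    def answer_rate(self):
        """Share of queries answered; tracking it over time shows improvement."""
        if not self.entries:
            return 0.0
        return sum(e["answered"] for e in self.entries) / len(self.entries)
```

If several gap queries share terms such as "api" and "keys", that is a signal to add or update a document on that topic, re-embed it, and watch whether `answer_rate()` rises.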
- Adding: kb.add_document(text, metadata)
- Updating: kb.add_document(text, metadata, doc_id=5) (re-embeds)
- Deleting: kb.delete_document(doc_id=5) (soft delete)
- Finding duplicates: kb.find_duplicates(threshold=0.85)
- Gap analysis: kb.get_knowledge_gaps()
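To make the semantics of these calls concrete, here is a minimal in-memory sketch that mirrors the method names listed above. Everything else is an assumption: the class name `SketchKB`, the stored fields, the `log_gap` helper, and the bag-of-words embedding are illustrative and do not reproduce the repository's implementation.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def _toy_embed(text):
    """Stand-in for the real embedding call."""
    return Counter(text.lower().split())


def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SketchKB:
    def __init__(self, embed_fn=_toy_embed):
        self.embed_fn = embed_fn
        self.docs = {}
        self.gaps = []
        self._next_id = 1

    def add_document(self, text, metadata=None, doc_id=None):
        """Add a new document, or update and re-embed an existing one."""
        now = datetime.now(timezone.utc).isoformat()
        if doc_id is None:
            doc_id, version, created = self._next_id, 1, now
            self._next_id += 1
        else:
            old = self.docs[doc_id]  # raises KeyError for an unknown id
            version, created = old["version"] + 1, old["created_at"]
        self.docs[doc_id] = {
            "text": text,
            "metadata": metadata or {},
            "vector": self.embed_fn(text),  # recomputed on every update
            "version": version,
            "created_at": created,
            "updated_at": now,
            "deleted": False,
        }
        return doc_id

    def delete_document(self, doc_id):
        """Soft delete: keep the record, but exclude it from later operations."""
        self.docs[doc_id]["deleted"] = True
        self.docs[doc_id]["updated_at"] = datetime.now(timezone.utc).isoformat()

    def find_duplicates(self, threshold=0.85):
        """Return (id_a, id_b, similarity) for live pairs at or above threshold."""
        live = [(i, d) for i, d in self.docs.items() if not d["deleted"]]
        pairs = []
        for n, (id_a, a) in enumerate(live):
            for id_b, b in live[n + 1:]:
                sim = _cosine(a["vector"], b["vector"])
                if sim >= threshold:
                    pairs.append((id_a, id_b, round(sim, 3)))
        return pairs

    def log_gap(self, query):
        """Hypothetical helper: remember a query that could not be answered."""
        self.gaps.append(query)

    def get_knowledge_gaps(self):
        return list(self.gaps)
```

Comparing every pair of documents is quadratic in the number of documents, which is acceptable for a small demo but would need an index or batching for a large knowledge base.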
- Uses text-embedding-3-small (1536→512 dimensions)
- Cosine similarity for semantic matching
- Strong grounding with citations
- Version tracking for updates
- Persistent storage (JSON)
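The 1536→512 reduction deserves a short illustration. The text-embedding-3 models accept a `dimensions` parameter that returns a shortened vector directly; whether this repository uses that parameter or shortens vectors itself is not stated here. The sketch below shows the manual equivalent, truncating and then re-normalizing to unit length, with `shorten` and `dot` as hypothetical helper names.

```python
import math


def shorten(vec, dims=512):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def dot(a, b):
    """For unit-length vectors, the dot product equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

Re-normalizing after truncation matters: without it the shortened vectors no longer have unit length, and a plain dot product stops being a valid cosine similarity.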