Granite Switch supports a modular architecture by consolidating multiple LoRA adapters into a single checkpoint. The tutorials below cover adapter function invocation, multi-step pipelines with guardrails, and checkpoint composition.
Step-by-step walkthroughs covering adapter function invocation, pipeline construction, and model composition.
| Notebook | Topics | Duration | Colab |
|---|---|---|---|
| hello_mellea.ipynb | Mellea adapter functions intro with vLLM | 5 min | |
| rag_101.ipynb | RAG 101: build a vector corpus and run a basic answerability check | 15 min | |
| rag_flow.ipynb | Full RAG flow with guardian checks (harm + scope) | 30 min | |
| compose_granite_switch.ipynb | Compose a checkpoint from adapter libraries | 15 min | |
| alora_vs_lora_race.ipynb | aLoRA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline | 20 min | |
| hello_adapter.ipynb | Minimal adapter function invocation with HuggingFace | 5 min | |
| granite_switch_with_hf.ipynb | Compose + HuggingFace backend, adapter_name= invocation, Core + Guardian adapter functions in a multi-turn conversation | 10 min | |
| granite_speech_demo.ipynb | Real-time voice assistant: Granite Speech STT + Granite Switch LLM + Granite Libraries validation, orchestrated by Mellea over WebRTC | 10 min | |
| Guide | Description |
|---|---|
| Using Mellea with Granite Switch | Connect Mellea to a Granite Switch model |
| Bring Your Own Adapter | Train, compose, and use custom adapters |
| Compare Inference Throughput | Compare LoRA-based and aLoRA-based models in an inference race setup |
The following learning paths use pre-built models from HuggingFace. You can also compose your own model; see Path 3: Compose your own model.
Best for: All inference use cases — development through production
Mellea is the recommended way to invoke Granite Switch capabilities. It handles constrained decoding, prompt rewriting, and input/output processing automatically. It currently supports vLLM; HuggingFace support is coming soon.
Best for: Seeing how adapter functions compose into multi-step applications
- RAG 101 - corpus build + answerability check, the smallest end-to-end RAG demo
- Full RAG Flow with Guardians - rewrite, answerability, citations, harm + scope checks
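The stages listed for the Full RAG Flow (rewrite, answerability, citations, harm and scope checks) can be pictured as a gated pipeline. The sketch below shows only that control flow in plain Python. Every function is a hypothetical stub with placeholder keyword logic, not a Mellea or Granite Switch call, and the ordering of the checks is an assumption rather than what the notebook necessarily does.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All functions below are hypothetical stubs standing in for adapter calls.


def check_harm(text: str) -> bool:
    """Stub guardian check: flag text containing a placeholder keyword."""
    return "attack" in text.lower()


def check_scope(text: str, topic: str) -> bool:
    """Stub scope check: in scope if the topic string appears in the text."""
    return topic.lower() in text.lower()


def rewrite_query(query: str) -> str:
    """Stub query rewrite: collapse repeated whitespace."""
    return " ".join(query.split())


def matching_docs(query: str, docs: List[str]) -> List[int]:
    """Indices of documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [i for i, d in enumerate(docs) if words & set(d.lower().split())]


@dataclass
class RagResult:
    answer: Optional[str]
    citations: List[int] = field(default_factory=list)
    stopped_at: Optional[str] = None  # name of the gate that ended the flow


def rag_flow(query: str, docs: List[str], topic: str) -> RagResult:
    """Run the guard checks, then rewrite, answerability, and citation steps."""
    if check_harm(query):
        return RagResult(None, stopped_at="harm")
    if not check_scope(query, topic):
        return RagResult(None, stopped_at="scope")
    rewritten = rewrite_query(query)
    cited = matching_docs(rewritten, docs)
    if not cited:  # stub answerability: no supporting document found
        return RagResult(None, stopped_at="answerability")
    # Stub "generation": return the first supporting document as the answer.
    return RagResult(docs[cited[0]], citations=cited)
```

Each gate returns early with the name of the check that stopped the flow, so a caller can tell a refused query apart from an unanswerable one.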
Best for: Custom adapter function development
Best for: Understanding how Granite Switch works at the control-token level
HuggingFace inference examples demonstrate how adapter functions are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
- Prerequisites
- Hello Adapter — see control tokens in action
- Granite Switch with HuggingFace — detailed walkthrough
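As a rough mental model of activation via control tokens, the toy router below splits a prompt into segments and labels each with the adapter that would be active. It is an illustration only: the token strings (such as `<|guardian|>`), the adapter names, and the `route` helper are all invented here and are not the actual Granite Switch vocabulary or API. The string handling simply stands in for whatever the model does internally.

```python
import re
from typing import List, Tuple

# Hypothetical control tokens mapped to hypothetical adapter names.
CONTROL_TOKENS = {
    "<|core|>": "core",
    "<|rag|>": "rag",
    "<|guardian|>": "guardian",
}

# One regex that matches any of the control tokens literally.
TOKEN_PATTERN = re.compile("|".join(re.escape(t) for t in CONTROL_TOKENS))


def route(prompt: str, default: str = "base") -> List[Tuple[str, str]]:
    """Split a prompt into (adapter, text) segments at each control token."""
    segments = []
    active = default
    pos = 0
    for match in TOKEN_PATTERN.finditer(prompt):
        text = prompt[pos:match.start()]
        if text:
            segments.append((active, text))
        # Text after this token is handled by the adapter the token names.
        active = CONTROL_TOKENS[match.group()]
        pos = match.end()
    tail = prompt[pos:]
    if tail:
        segments.append((active, tail))
    return segments
```

For example, `route("Hello<|guardian|>Is this harmful?")` labels the text before the token with the default `"base"` and the text after it with `"guardian"`.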
Runnable scripts in scripts/ for common tasks:
| Script | Description |
|---|---|
| run_adapter_generation_direct.py | Direct adapter function invocation via control tokens |
| run_adapter_generation_mellea.py | Adapter function invocation through Mellea |
Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries. The three libraries below are featured throughout these tutorials:
| Adapter | Purpose | Where used in tutorials | HF repo |
|---|---|---|---|
| Core | Foundational post-generation adapter functions: certainty scoring, requirement checking, and response attribution. | granite_switch_with_hf, compose_granite_switch | ibm-granite/granitelib-core-r1.0 |
| RAG | Retrieval-augmented generation adapter functions: query rewrite, answerability, hallucination detection, and citation generation. | hello_mellea, rag_101, rag_flow, compose_granite_switch | ibm-granite/granitelib-rag-r1.0 |
| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | hello_adapter, hello_mellea, granite_switch_with_hf, rag_flow, compose_granite_switch | ibm-granite/granitelib-guardian-r1.0 |
| Resource | Description |
|---|---|
| Mellea | IBM's library for writing Generative Programs |
| Granite aLoRA Adapters | Official adapter libraries on HuggingFace |
| vLLM Documentation | High-performance inference |
| Granite Models | Base Granite models |
For technical details, see docs/:
- Supported Models — Model compatibility
- Git Workflow — Contribution guidelines