opea-project · arpannookala-12 · Apr 21, 2026 · May 5, 2026 · May 27, 2026
diff --git a/third_party/Dell/model-deployment/Granite-3.2-2b-Instruct/deployment.md b/third_party/Dell/model-deployment/Granite-3.2-2b-Instruct/deployment.md
@@ -0,0 +1,101 @@
+## Step 1: Prerequisites to Deploy Granite-3.2-2b-Instruct Model on Xeon with Keycloak
+
+Ensure the Enterprise Inference stack with Keycloak is already deployed before proceeding.
+
+Edit `core/scripts/generate-token.sh` and set your values before sourcing it:
+
+| Variable                  | Description                                                              |
+| ------------------------- | ------------------------------------------------------------------------ |
+| `BASE_URL`                | Hostname of your cluster (e.g. `api.example.com`), without `https://`   |
+| `KEYCLOAK_ADMIN_USERNAME` | Keycloak admin username                                                  |
+| `KEYCLOAK_PASSWORD`       | Keycloak admin password                                                  |
+| `KEYCLOAK_CLIENT_ID`      | Keycloak client ID configured during the Enterprise Inference (EI) deployment |
+
+Then run:
+
+```bash
+export HUGGING_FACE_HUB_TOKEN="your_token_here"
+
+cd ~/Enterprise-Inference
+source core/scripts/generate-token.sh
+```
+
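+Sourcing the script (rather than executing it) exports `BASE_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_CLIENT_SECRET`, and `TOKEN` into your current shell, where the following steps use them.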
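+
+Before continuing, you may want to confirm that every variable the later steps rely on is actually set. The sketch below is a hypothetical helper (`check_ei_env` and `_require_var` are not part of the Enterprise Inference scripts). It assumes the variable names listed above and also rejects a `BASE_URL` that still contains a scheme or path:
+
+```bash
+# Hypothetical helpers (not shipped with Enterprise Inference).
+# Print "missing: NAME" to stderr and fail if the given value is empty.
+_require_var() {  # $1 = variable name, $2 = its value
+  if [ -z "$2" ]; then
+    echo "missing: $1" >&2
+    return 1
+  fi
+}
+
+# Check every variable used in Steps 2-4; report all problems, not just the first.
+check_ei_env() {
+  local rc=0
+  _require_var HUGGING_FACE_HUB_TOKEN "${HUGGING_FACE_HUB_TOKEN:-}" || rc=1
+  _require_var BASE_URL "${BASE_URL:-}" || rc=1
+  _require_var KEYCLOAK_CLIENT_ID "${KEYCLOAK_CLIENT_ID:-}" || rc=1
+  _require_var KEYCLOAK_CLIENT_SECRET "${KEYCLOAK_CLIENT_SECRET:-}" || rc=1
+  _require_var TOKEN "${TOKEN:-}" || rc=1
+  # A slash means a scheme (https://) or a path was included in BASE_URL
+  case "${BASE_URL:-}" in
+    */*)
+      echo "BASE_URL must be a hostname only, e.g. api.example.com" >&2
+      rc=1
+      ;;
+  esac
+  return "$rc"
+}
+
+# Usage, in the shell where generate-token.sh was sourced:
+#   check_ei_env && echo "environment looks complete"
+```
+
+Run it in the same shell where you sourced `generate-token.sh`, since the check reads that shell's variables.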
+
+## Step 2: Deploy Granite-3.2-2b-Instruct Model
+
+```bash
+helm install vllm-granite-3-2-instruct ./core/helm-charts/vllm \
+  --values ./core/helm-charts/vllm/xeon-values.yaml \
+  --set LLM_MODEL_ID="ibm-granite/granite-3.2-2b-instruct" \
+  --set global.HUGGINGFACEHUB_API_TOKEN="$HUGGING_FACE_HUB_TOKEN" \
+  --set ingress.enabled=true \
+  --set ingress.secretname="${BASE_URL}" \
+  --set ingress.host="${BASE_URL}" \
+  --set oidc.client_id="$KEYCLOAK_CLIENT_ID" \
+  --set oidc.client_secret="$KEYCLOAK_CLIENT_SECRET" \
+  --set apisix.enabled=true \
+  --set tensor_parallel_size="1" \
+  --set pipeline_parallel_size="1"
+```
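+
+`helm install` returns once the release's resources are created. The vLLM pod then still has to download and load the model weights, which can take several minutes on CPU. Below is a minimal polling sketch that assumes a POSIX shell. `wait_until` is a hypothetical helper, and the Deployment name in the usage comment is inferred from the pod names shown in Step 3:
+
+```bash
+# Hypothetical helper: re-run a command every INTERVAL seconds until it
+# succeeds or roughly TIMEOUT seconds have elapsed.
+# Returns 0 on success, 1 on timeout.
+wait_until() {  # $1 = timeout (s), $2 = interval (s), rest = command to run
+  local timeout_s=$1 interval_s=$2 elapsed=0
+  shift 2
+  # "$@" is the readiness check; retry until it exits 0
+  until "$@"; do
+    if [ "$elapsed" -ge "$timeout_s" ]; then
+      echo "timed out after ${elapsed}s waiting for: $*" >&2
+      return 1
+    fi
+    sleep "$interval_s"
+    elapsed=$((elapsed + interval_s))  # approximate: counts sleep time only
+  done
+  return 0
+}
+
+# Usage (Deployment name assumed from the release name; adjust if yours differs):
+#   wait_until 1200 15 kubectl rollout status deployment/vllm-granite-3-2-instruct --timeout=10s
+```
+
+Alternatively, Helm's `--wait` and `--timeout` flags make `helm install` itself block until the release's resources report ready.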
+
+## Step 3: Verify the Deployment
+
+```bash
+kubectl get pods
+kubectl get apisixroutes
+```
+
+Expected output of `kubectl get pods`:
+
+```
+NAME                                          READY   STATUS    RESTARTS
+keycloak-0                                    1/1     Running   0
+keycloak-postgresql-0                         1/1     Running   0
+vllm-granite-3-2-instruct-<hash>-<hash>       1/1     Running   0
+```
+
+> Note: The pod name suffix `<hash>-<hash>` is auto-generated by Kubernetes and will differ on each deployment. Ensure all pods show `1/1 Running`.
+
+Expected output of `kubectl get apisixroutes`:
+
+```
+NAME                                    HOSTS
+vllm-granite-3-2-instruct-apisixroute   api.example.com
+```
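+
+To check readiness from a script rather than by eye, you can parse the `kubectl get pods` output. This is a minimal sketch that assumes the default column layout shown above (NAME, READY, STATUS, ...). `all_pods_ready` is a hypothetical helper name:
+
+```bash
+# Hypothetical helper: read `kubectl get pods` output on stdin and succeed only
+# if every pod is fully ready (READY x/x) and in the Running state.
+all_pods_ready() {
+  awk '
+    NR == 1 && $1 == "NAME" { next }     # skip the header row if present
+    NF == 0 { next }                     # ignore blank lines
+    {
+      total++
+      split($2, r, "/")                  # READY column, e.g. "1/1"
+      if (r[1] != r[2] || $3 != "Running") {
+        print "not ready: " $1 " (" $2 ", " $3 ")" > "/dev/stderr"
+        bad++
+      }
+    }
+    END { exit (total == 0 || bad > 0) } # no pods at all also counts as failure
+  '
+}
+
+# Usage:
+#   kubectl get pods | all_pods_ready && echo "all pods ready"
+```
+
+Pods from completed Jobs report `Completed` and would be flagged by this check. Filter them out first if your namespace contains any.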
+
+## Step 4: Test the Deployed Model
+
+```bash
+curl -k https://${BASE_URL}/granite-3.2-2b-instruct-vllmcpu/v1/completions \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $TOKEN" \
+  -d '{
+    "model": "ibm-granite/granite-3.2-2b-instruct",
+    "prompt": "What is Deep Learning?",
+    "max_tokens": 25,
+    "temperature": 0
+  }'
+```
+
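+If successful, the endpoint returns an OpenAI-style completion object in JSON, with the generated text in `choices[0].text`. The `-k` flag skips TLS certificate verification; drop it if the ingress certificate is issued by a CA your client trusts.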
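+
+If the request does not succeed, the HTTP status code usually narrows down the cause. The sketch below maps common codes to hints. The helper name `explain_status` and the hints themselves are assumptions based on typical APISIX/Keycloak and vLLM behavior; they are not messages produced by Enterprise Inference:
+
+```bash
+# Hypothetical helper: map the HTTP status of the Step 4 request to a hint.
+# Returns 0 only for HTTP 200.
+explain_status() {
+  case "$1" in
+    200) echo "ok: completion returned" ;;
+    401|403) echo "auth failed: TOKEN may be missing or expired; re-source core/scripts/generate-token.sh" ;;
+    404) echo "not found: check the URL path against 'kubectl get apisixroutes' and the model field against LLM_MODEL_ID" ;;
+    5??) echo "server error: check the vLLM pod logs with 'kubectl logs'" ;;
+    000) echo "no HTTP response: check that BASE_URL resolves and the ingress is reachable" ;;
+    *) echo "unexpected status: $1" ;;
+  esac
+  [ "$1" = "200" ]
+}
+
+# Usage: capture only the status code and keep the body in a file.
+#   code=$(curl -sk -o /tmp/granite-response.json -w '%{http_code}' \
+#     "https://${BASE_URL}/granite-3.2-2b-instruct-vllmcpu/v1/completions" \
+#     -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
+#     -d '{"model": "ibm-granite/granite-3.2-2b-instruct", "prompt": "What is Deep Learning?", "max_tokens": 25}')
+#   explain_status "$code"
+```
+
+With `-w '%{http_code}'`, curl writes only the status code to stdout. A value of `000` means no HTTP response was received at all.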
+
+## Undeploy the Model
+
+```bash
+helm uninstall vllm-granite-3-2-instruct
+```
+
+## Parameters
+
+| Parameter                                                    | Description                                                                                       |
+| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------- |
+| `--set LLM_MODEL_ID="ibm-granite/granite-3.2-2b-instruct"`  | Defines the target model from **Hugging Face** to deploy.                                         |
+| `--set global.HUGGINGFACEHUB_API_TOKEN="..."`                | Authenticates access to gated or private Hugging Face models. Replace with your own secure token. |
+| `--set ingress.enabled=true`                                 | Enables Kubernetes **Ingress** to expose the model service externally.                            |
+| `--set ingress.host="${BASE_URL}"`                           | Public hostname or FQDN for the inference endpoint (maps to your Ingress controller IP).          |
+| `--set ingress.secretname="${BASE_URL}"`                     | Name of the Kubernetes **TLS Secret** used for HTTPS termination at the ingress layer; here it is named after the hostname. |
+| `--set oidc.client_id="..."`                                 | Keycloak OIDC client ID used for token-based authentication.                                      |
+| `--set oidc.client_secret="..."`                             | Keycloak OIDC client secret corresponding to the client ID.                                       |
+| `--set apisix.enabled=true`                                  | Enables **APISIX** as the API gateway for routing and authentication.                             |
+| `--set tensor_parallel_size="1"`                             | Number of tensor-parallel workers the model is split across (typically GPUs, or CPU sockets/NUMA nodes on Xeon). `1` is sufficient for this 2B model on a single socket. |
+| `--set pipeline_parallel_size="1"`                           | Number of pipeline parallel stages. Typically `1` for single-node deployments.                   |
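+
+You can also keep the non-secret settings from this table in a values file, which leaves only credentials on the command line. This is a minimal sketch. The helper name `write_granite_values` and the file name `granite-values.yaml` are hypothetical, and the keys simply mirror the `--set` flags from Step 2:
+
+```bash
+# Hypothetical helper: write the non-secret --set values from Step 2 to a YAML
+# file. Credentials stay on the command line so they are not written to disk.
+write_granite_values() {  # $1 = output file (optional)
+  local out=${1:-granite-values.yaml}
+  if [ -z "${BASE_URL:-}" ]; then
+    echo "BASE_URL is not set" >&2
+    return 1
+  fi
+  cat > "$out" <<EOF
+LLM_MODEL_ID: ibm-granite/granite-3.2-2b-instruct
+ingress:
+  enabled: true
+  host: ${BASE_URL}
+  secretname: ${BASE_URL}
+apisix:
+  enabled: true
+tensor_parallel_size: 1
+pipeline_parallel_size: 1
+EOF
+}
+
+# Usage:
+#   write_granite_values granite-values.yaml
+#   helm install vllm-granite-3-2-instruct ./core/helm-charts/vllm \
+#     --values ./core/helm-charts/vllm/xeon-values.yaml \
+#     --values granite-values.yaml \
+#     --set global.HUGGINGFACEHUB_API_TOKEN="$HUGGING_FACE_HUB_TOKEN" \
+#     --set oidc.client_id="$KEYCLOAK_CLIENT_ID" \
+#     --set oidc.client_secret="$KEYCLOAK_CLIENT_SECRET"
+```
+
+Helm gives precedence to the right-most `--values` file, and `--set` overrides all values files. As a result, both the Granite-specific file and the command-line credentials take effect on top of `xeon-values.yaml`.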
diff --git a/third_party/Dell/model-deployment/Granite-3.2-2b-Instruct/model-card.md b/third_party/Dell/model-deployment/Granite-3.2-2b-Instruct/model-card.md
@@ -0,0 +1,66 @@
+# granite-3.2-2b-instruct
+
+This model uses ibm-granite/granite-3.2-2b-instruct, a modern, lightweight instruction-tuned large language model developed by the IBM Granite team. It is designed for efficient reasoning, instruction following, and enterprise-grade AI workloads such as summarization, problem solving, structured response generation, and conversational AI.
+
+For full details including model specifications, licensing, intended use, safety guidance, and example prompts, see the official Hugging Face page:
+
+https://huggingface.co/ibm-granite/granite-3.2-2b-instruct
+
+This deployment serves IBM’s open-weight Granite model, which is distributed under the Apache 2.0 license.
+
+### Model Attribution
+
+**Developer:** IBM (Granite Team)
+
+**Purpose:** General-purpose instruction following, reasoning, and enterprise AI workloads
+
+**Sizes/Variants:** Granite 3.2 family, with 2B and 8B instruct variants for different deployment scales
+
+**Modalities:** Text → Text (natural language, reasoning, structured responses, code-related logic)
+
+**Parameter Size:** ~2 billion parameters (dense)
+
+**Max Context:** Up to ~128K tokens (depending on backend and serving configuration)
+
+**License:** Apache 2.0 (open-weight, commercially usable)
+
+### Usage Notice
+
+**By using this model, you agree that:**
+
+- Inputs and outputs are processed by the IBM Granite 3.2 2B Instruct model.
+- You accept and comply with the Apache 2.0 License.
+- Generated outputs must be reviewed for accuracy, safety, and compliance prior to production use.
+- The model must not be used for malicious activities or in violation of applicable laws or policies.
+- Deployment in high-risk or regulated environments should include appropriate validation and guardrails.
+
+### Intended Applications
+
+- Conversational AI and enterprise assistants
+- Instruction-following automation
+- Reasoning and decision-support systems
+- Long-document summarization and analysis
+- Retrieval-Augmented Generation (RAG) systems
+- Classification, extraction, and knowledge workflows
+- Code-related reasoning and structured logic explanation
+- Multilingual AI applications
+
+### Limitations
+
+- May still produce factual inaccuracies or hallucinated responses
+- Performance may vary depending on prompt quality and domain complexity
+- Not a replacement for expert decision-making in regulated environments
+- Requires human validation in sensitive or critical applications
+- Its small size may reduce performance on highly complex reasoning tasks compared with much larger models
+
+### References
+
+IBM Granite 3.2 Model Documentation
+https://www.ibm.com/architectures/product-guides/granite-32
+
+Hugging Face Model Page – IBM Granite 3.2 2B Instruct
+https://huggingface.co/ibm-granite/granite-3.2-2b-instruct
+
+IBM Granite Announcement Blog
+https://www.ibm.com/new/announcements/ibm-granite-3-2-open-source-reasoning-and-vision
+
diff --git a/third_party/Dell/model-deployment/README.md b/third_party/Dell/model-deployment/README.md