
Commit dcde385

Merge pull request #53 from sauagarwa/main
Changed default model to ibm-granite/granite-3.1-8b-instruct
2 parents 3e7fd92 + e0355d4 commit dcde385

File tree

9 files changed: +25 -24 lines changed


README.md

Lines changed: 6 additions & 6 deletions
@@ -67,9 +67,9 @@ _Figure 4. Schematic diagram for Ingestion of data for RAG._
 _Figure 5. Schematic diagram for RAG demo augmented query._

-In Figure 5, we can see RAG augmented query. Community version of [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) model is used for language processing, LangChain to
+In Figure 5, we can see the RAG augmented query. The [IBM Granite 3.1-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model is used for language processing, LangChain to
 integrate different tools of the LLM-based application together and to process the PDF
-files and web pages, vector database provider such as EDB Postgres for Kubernetes or Redis, is used to store vectors, and [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to serve the [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) model, Gradio is used for user interface and object storage to store language model and other datasets.
+files and web pages, a vector database such as EDB Postgres for Kubernetes or Redis is used to store vectors, [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) serves the [IBM Granite 3.1-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model, Gradio provides the user interface, and object storage holds the language model and other datasets.
 Solution components are deployed as microservices in the Red Hat OpenShift cluster.

@@ -84,7 +84,7 @@ _Figure 6. Proposed demo architecture with OpenShift AI_

 ### Components deployed

-- **vLLM Text Generation Inference Server:** The pattern deploys a vLLM Inference Server. The server deploys and serves `mistral-community/Mistral-7B-Instruct-v0.3` model. The server will require a GPU node.
+- **vLLM Text Generation Inference Server:** The pattern deploys a vLLM Inference Server, which serves the `ibm-granite/granite-3.1-8b-instruct` model and requires a GPU node.
 - **EDB Postgres for Kubernetes / Redis Server:** A Vector Database server is deployed to store vector embeddings created from Red Hat product documentation.
 - **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
 - **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector db.

@@ -112,7 +112,7 @@ cd rag-llm-gitops

 ### Configuring model

-This pattern deploys community version of [Mistral-7B-Instruct](https://huggingface.co/mistral-community/Mistral-7B-Instruct-v0.3) out of box. Run the following command to configure vault with the model Id.
+This pattern deploys [IBM Granite 3.1-8B-Instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) out of the box. Run the following command to configure the vault with the model Id.

 ```sh
 # Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.

@@ -121,7 +121,7 @@ This pattern deploys community version of [Mistral-7B-Instruct](https://huggingf
 cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
 ```

-To deploy a non-community [Mistral-7b-Instruct](https://huggingface.co/mistralai/) model, grab the [Hugging Face token](https://huggingface.co/settings/tokens) and accept the terms and conditions on the model page. Edit ~/values-secret-rag-llm-gitops.yaml to replace the `model Id` and the `Hugging Face` token.
+To deploy a model that requires a Hugging Face token, grab the [Hugging Face token](https://huggingface.co/settings/tokens) and accept the terms and conditions on the model page. Edit ~/values-secret-rag-llm-gitops.yaml to set the `model Id` and the `Hugging Face` token.

 ```sh
 secrets:

@@ -130,7 +130,7 @@ secrets:
 - name: hftoken
 value: null
 - name: modelId
-value: "mistral-community/Mistral-7B-Instruct-v0.3"
+value: "ibm-granite/granite-3.1-8b-instruct"
 - name: minio
 fields:
 - name: MINIO_ROOT_USER

charts/all/llm-monitoring/kustomize/base/grafanadashboard/ai-llm-dashboard.yaml

Lines changed: 6 additions & 6 deletions
@@ -174,8 +174,8 @@ spec:
 "scopedVars": {
 "ModelID": {
 "selected": false,
-"text": "mistralai/Mistral-7B-Instruct-v0.1",
-"value": "mistralai/Mistral-7B-Instruct-v0.1"
+"text": "ibm-granite/granite-3.1-8b-instruct",
+"value": "ibm-granite/granite-3.1-8b-instruct"
 }
 },
 "targets": [

@@ -326,8 +326,8 @@ spec:
 "scopedVars": {
 "ModelID": {
 "selected": false,
-"text": "mistralai/Mistral-7B-Instruct-v0.1",
-"value": "mistralai/Mistral-7B-Instruct-v0.1"
+"text": "ibm-granite/granite-3.1-8b-instruct",
+"value": "ibm-granite/granite-3.1-8b-instruct"
 }
 },
 "targets": [

@@ -462,8 +462,8 @@ spec:
 "scopedVars": {
 "ModelID": {
 "selected": false,
-"text": "mistralai/Mistral-7B-Instruct-v0.1",
-"value": "mistralai/Mistral-7B-Instruct-v0.1"
+"text": "ibm-granite/granite-3.1-8b-instruct",
+"value": "ibm-granite/granite-3.1-8b-instruct"
 }
 },
 "targets": [

charts/all/llm-serving-service/templates/inference-service.yaml

Lines changed: 3 additions & 3 deletions
@@ -2,11 +2,11 @@ apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
 annotations:
-openshift.io/display-name: mistral-7b-instruct
+openshift.io/display-name: ibm-granite-instruct
 serving.knative.openshift.io/enablePassthrough: 'true'
 sidecar.istio.io/inject: 'true'
 sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-name: mistral-7b-instruct
+name: ibm-granite-instruct
 namespace: rag-llm
 labels:
 opendatahub.io/dashboard: 'true'

@@ -27,7 +27,7 @@ spec:
 cpu: '2'
 memory: 8Gi
 nvidia.com/gpu: '1'
-runtime: mistral-7b-instruct
+runtime: ibm-granite-instruct
 restartPolicy: OnFailure
 tolerations:
 - effect: NoSchedule

charts/all/llm-serving-service/templates/serving-runtime.yaml

Lines changed: 4 additions & 4 deletions
@@ -5,8 +5,8 @@ metadata:
 opendatahub.io/accelerator-name: nvidia-gpu
 opendatahub.io/apiProtocol: REST
 opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
-openshift.io/display-name: mistral-7b-instruct
-name: mistral-7b-instruct
+openshift.io/display-name: ibm-granite-instruct
+name: ibm-granite-instruct
 namespace: rag-llm
 labels:
 opendatahub.io/dashboard: 'true'

@@ -19,7 +19,7 @@ spec:
 - '--port=8080'
 - '--model=/cache/models'
 - '--distributed-executor-backend=mp'
-- '--served-model-name=mistral-7b-instruct'
+- '--served-model-name=ibm-granite-instruct'
 - '--max-model-len=4096'
 - '--dtype=half'
 - '--gpu-memory-utilization'

@@ -44,7 +44,7 @@ spec:
 name: huggingface-secret
 - name: HF_HUB_OFFLINE
 value: '0'
-image: 'quay.io/modh/vllm@sha256:b51fde66f162f1a78e8c027320dddf214732d5345953b1599a84fe0f0168c619'
+image: 'quay.io/modh/vllm@sha256:c86ff1e89c86bc9821b75d7f2bbc170b3c13e3ccf538bf543b1110f23e056316'
 name: kserve-container
 ports:
 - containerPort: 8080
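The `--max-model-len=4096` flag above caps the combined length of the prompt and the generated completion; vLLM rejects requests that exceed it. As a minimal sketch (not part of the pattern), the remaining prompt budget under these flags works out as follows:

```python
# Token budget implied by the ServingRuntime flags above: vLLM rejects any
# request whose prompt plus requested completion exceeds --max-model-len.
MAX_MODEL_LEN = 4096   # from '--max-model-len=4096' in the ServingRuntime
MAX_NEW_TOKENS = 1024  # the pattern's default 'max_new_tokens' in config.yaml

def max_prompt_tokens(max_new_tokens: int = MAX_NEW_TOKENS,
                      max_model_len: int = MAX_MODEL_LEN) -> int:
    """Largest prompt, in tokens, that still leaves room for the completion."""
    return max_model_len - max_new_tokens

# With the defaults, 3072 tokens remain for the RAG-augmented prompt,
# which bounds how much retrieved documentation can be stuffed into it.
```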

charts/all/rag-llm/files/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@ llm_providers:
 - name: "OpenShift AI (vLLM)"
 enabled: True
 models:
-- name: mistral-7b-instruct
+- name: ibm-granite-instruct
 weight: 1
 enabled: True
-url: https://mistral-7b-instruct-{{ .Values.llmui.namespace }}.{{ coalesce .Values.global.localClusterDomain .Values.global.hubClusterDomain }}/v1
+url: https://ibm-granite-instruct-{{ .Values.llmui.namespace }}.{{ coalesce .Values.global.localClusterDomain .Values.global.hubClusterDomain }}/v1
 params:
 - name: max_new_tokens
 value: 1024
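The `url` above ends in `/v1` because vLLM exposes an OpenAI-compatible REST API. A minimal client sketch (not part of the pattern; the host below is a hypothetical example, since the real one is templated from the namespace and cluster domain) might build a request like this:

```python
import json

# Hypothetical route host for illustration only; the real URL is templated
# in config.yaml as https://ibm-granite-instruct-<namespace>.<domain>/v1.
BASE_URL = "https://ibm-granite-instruct-rag-llm.apps.example.com/v1"

def build_chat_request(prompt: str, max_new_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat-completions payload for the vLLM endpoint."""
    return {
        # Must match vLLM's '--served-model-name', not the Hugging Face model ID.
        "model": "ibm-granite-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_new_tokens,
    }

payload = build_chat_request("Draft a one-paragraph project proposal summary.")
body = json.dumps(payload)
# POST `body` to f"{BASE_URL}/chat/completions" with any HTTP client.
```

Note the `model` field refers to the served-model name, which after this commit differs from the Hugging Face model ID (`ibm-granite/granite-3.1-8b-instruct`).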

values-global.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ global:
 type: EDB
 # Add for model ID
 model:
-modelId: mistral-community/Mistral-7B-Instruct-v0.3
+modelId: ibm-granite/granite-3.1-8b-instruct
 main:
 clusterGroupName: hub
 multiSourceConfig:

values-hub.yaml

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ clusterGroup:
 # We can use self-referential variables because the chart calls the tpl function with these variables defined
 sharedValueFiles:
 - '/overrides/values-{{ $.Values.global.clusterPlatform }}.yaml'
+- 'values-rag-llm-gitops.yaml'
 # sharedValueFiles is a flexible mechanism that will add the listed valuefiles to every app defined in the
 # applications section. We intend this to supplement and possibly even replace previous "magic" mechanisms, though
 # we do not at present have a target date for removal.

values-rag-llm-gitops.yaml

Whitespace-only changes.

values-secret.yaml.template

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ version: "2.0"
 # Ideally you NEVER COMMIT THESE VALUES TO GIT (although if all passwords are
 # automatically generated inside the vault this should not really matter)

-# In order to use a the standard verison of mistralai/Mistral-7B-Instruct-v0.3
+# In order to use the standard version of ibm-granite/granite-3.1-8b-instruct
 # you will need to do the following:
 # provide your token as a value for hftoken
 # NOTE: you need to add value in values-global.yaml as well

@@ -16,7 +16,7 @@ secrets:
 - name: hftoken
 value: null
 - name: modelId
-value: "mistral-community/Mistral-7B-Instruct-v0.3"
+value: "ibm-granite/granite-3.1-8b-instruct"
 - name: minio
 fields:
 - name: MINIO_ROOT_USER
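The template's NOTE says the model ID must be set both here and in values-global.yaml; this commit touches both files. A hypothetical sanity-check sketch (not part of the pattern; naive regexes, not a real YAML parser) could verify the two stay in sync:

```python
import re

# Hypothetical helper: confirm values-secret and values-global agree on the
# model ID, as the template's NOTE requires. Illustrative snippets below.
values_secret = '''
secrets:
- name: modelId
  value: "ibm-granite/granite-3.1-8b-instruct"
'''

values_global = '''
model:
  modelId: ibm-granite/granite-3.1-8b-instruct
'''

def secret_model_id(text: str):
    """Extract the modelId value from a values-secret snippet (naive regex)."""
    m = re.search(r'name:\s*modelId\s*\n\s*value:\s*"([^"]+)"', text)
    return m.group(1) if m else None

def global_model_id(text: str):
    """Extract model.modelId from a values-global snippet (naive regex)."""
    m = re.search(r'modelId:\s*(\S+)', text)
    return m.group(1) if m else None

assert secret_model_id(values_secret) == global_model_id(values_global)
```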
