Models deployed from Hugging Face using the inference-stack-deploy.sh script cannot be removed with the "Remove Model using deployment name" menu option. The removal process runs and reports success, but the model pod keeps running and the Deployment remains installed. A possible clue appears in the removal output: the check task looks for "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu", i.e. the entered deployment name with "-cpu" appended again, rather than the actual Deployment name "tinyllama-1-1b-chat-v1-0-cpu-vllm". The full process is shown below using the TinyLlama model as an example, starting from deploying the model:
user@master1:~/Enterprise-Inference/core$ ./inference-stack-deploy.sh
----------------------------------------------------------
| Intel AI for Enterprise Inference |
|---------------------------------------------------------|
| 1) Provision Enterprise Inference Cluster |
| 2) Decommission Existing Cluster |
| 3) Update Deployed Inference Cluster |
| 4) Brownfield Deployment of Enterprise Inference |
|---------------------------------------------------------|
Please choose an option (1, 2, 3 or 4):
> 3
-------------------------------------------------
| Update Existing Cluster |
|------------------------------------------------|
| 1) Manage Worker Nodes |
| 2) Manage LLM Models |
|------------------------------------------------|
Please choose an option (1 or 2):
> 2
-------------------------------------------------
| Manage LLM Models
|------------------------------------------------|
| 1) Deploy Model |
| 2) Undeploy Model |
| 3) List Installed Models |
| 4) Deploy Model from Hugging Face |
| 5) Remove Model using deployment name |
|------------------------------------------------|
Please choose an option (1, 2, 3, or 4):
> 4
-------------------------------------------------
| Deploy Model from Huggingface |
|------------------------------------------------|
Configuration file found, setting vars!
---------------------------------------
Metadata configuration file found, setting vars!
---------------------------------------
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
NRI CPU Balloon Policy automatically enabled for CPU deployment
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Some required arguments are missing. Prompting for input...
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
Proceeding with the setup of NRI CPU Balloon Policy: yes
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Enter the Huggingface Model ID: TinyLlama/TinyLlama-1.1B-Chat-v1.0
NOTICE: The model deployment name will be used as the release identifier for deployment. It must be unique, meaningful, and follow Kubernetes naming conventions — lowercase letters, numbers, and hyphens only. Capital letters or special characters are not allowed.
Enter Deployment Name for the Model: tinyllama-1-1b-chat-v1-0
NOTICE: Ensure the Tensor Parallel size value corresponds to the number of available Gaudi cards. Providing an incorrect value may result in the model being in a not ready state.
NOTICE: You are about to deploy a model directly from Hugging Face, which has not been pre-validated by our team. Do you wish to continue? (y/n) y
....
Inference LLM Model is deployed successfully.
-------------------------------------------------------------------------------------
| AI LLM Model Deployment Complete! |
| The model is transitioning to a state ready for Inference. |
| This may take some time depending on system resources and other factors. |
| Please standby... |
--------------------------------------------------------------------------------------
Accessing Deployed Models for Inference
https://github.com/opea-project/Enterprise-Inference/blob/main/docs/accessing-deployed-models.md
Please refer to this comprehensive guide for detailed instructions.
user@master1:~/Enterprise-Inference/core$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
auth-apisix auth-apisix-77784f9df6-8qn6z 1/1 Running 2 (114m ago) 8d
auth-apisix auth-apisix-etcd-0 1/1 Running 2 (114m ago) 8d
auth-apisix auth-apisix-ingress-controller-57889db898-99zhp 1/1 Running 2 (114m ago) 8d
default keycloak-0 1/1 Running 3 (114m ago) 8d
default keycloak-postgresql-0 1/1 Running 2 (114m ago) 8d
default tinyllama-1-1b-chat-v1-0-cpu-vllm-c946d8595-d9pg5 1/1 Running 0 80m
default vllm-llama-8b-67dbdf6549-hjwj8 1/1 Running 0 114m
default vllm-tei-d7645b56c-shcdn 1/1 Running 0 114m
habana-ai-operator habana-ai-device-plugin-ds-jhqsf 1/1 Running 2 (114m ago) 8d
habana-ai-operator habana-ai-driver-ubuntu-22-04-ds-bqpxh 1/1 Running 0 22m
habana-ai-operator habana-ai-feature-discovery-ds-44zd4 1/1 Running 2 (114m ago) 8d
habana-ai-operator habana-ai-metric-exporter-ds-kd57m 1/1 Running 2 (114m ago) 8d
habana-ai-operator habana-ai-operator-controller-manager-7845c6874c-c6c4g 2/2 Running 4 (114m ago) 8d
habana-ai-operator habana-ai-runtime-ds-7qg5t 1/1 Running 2 (114m ago) 8d
ingress-nginx ingress-nginx-controller-77674b4c66-dp9vz 1/1 Running 2 (114m ago) 8d
kube-system calico-kube-controllers-5db5978889-bgx4h 1/1 Running 2 (114m ago) 8d
kube-system calico-node-dwx86 1/1 Running 2 (114m ago) 8d
kube-system coredns-d665d669-fxvhp 1/1 Running 2 (114m ago) 8d
kube-system coredns-d665d669-wn6lw 0/1 Pending 0 8d
kube-system dns-autoscaler-597dccb9b9-8w7vs 1/1 Running 2 (114m ago) 8d
kube-system kube-apiserver-master1 1/1 Running 4 (114m ago) 8d
kube-system kube-controller-manager-master1 1/1 Running 4 (114m ago) 8d
kube-system kube-proxy-42mn9 1/1 Running 2 (114m ago) 8d
kube-system kube-scheduler-master1 1/1 Running 3 (114m ago) 8d
kube-system kubernetes-dashboard-7f4d4b895-455mw 1/1 Running 2 (114m ago) 8d
kube-system kubernetes-metrics-scraper-6d4c5d99f9-nvld5 1/1 Running 2 (114m ago) 8d
kube-system nodelocaldns-tkjmc 1/1 Running 4 (114m ago) 8d
kube-system registry-p98c4 1/1 Running 2 (114m ago) 8d
local-path-storage local-path-provisioner-68b545849f-f2jsp 1/1 Running 2 (114m ago) 8d
user@master1:~/Enterprise-Inference/core$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
tinyllama-1-1b-chat-v1-0-cpu-vllm 1/1 1 1 81m
vllm-llama-8b 1/1 1 1 2d
vllm-tei 1/1 1 1 8d
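As an aside, the naming rules in the NOTICE during deployment (lowercase letters, numbers, and hyphens only) correspond to the Kubernetes RFC 1123 label convention. A minimal pre-check, sketched here as a hypothetical helper that is not part of the script, could look like:

```shell
# Hypothetical helper: check a deployment name against RFC 1123 label rules
# (lowercase alphanumerics and hyphens; must not start or end with a hyphen).
is_valid_name() {
  case "$1" in
    ''|-*|*-|*[!a-z0-9-]*) return 1 ;;   # empty, leading/trailing hyphen, or disallowed char
    *) return 0 ;;
  esac
}

is_valid_name "tinyllama-1-1b-chat-v1-0" && echo "valid"            # prints "valid"
is_valid_name "TinyLlama/TinyLlama-1.1B-Chat-v1.0" || echo "invalid" # prints "invalid" (uppercase, '/', '.')
```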
user@master1:~/Enterprise-Inference/core$ ./inference-stack-deploy.sh
----------------------------------------------------------
| Intel AI for Enterprise Inference |
|---------------------------------------------------------|
| 1) Provision Enterprise Inference Cluster |
| 2) Decommission Existing Cluster |
| 3) Update Deployed Inference Cluster |
| 4) Brownfield Deployment of Enterprise Inference |
|---------------------------------------------------------|
Please choose an option (1, 2, 3 or 4):
> 3
-------------------------------------------------
| Update Existing Cluster |
|------------------------------------------------|
| 1) Manage Worker Nodes |
| 2) Manage LLM Models |
|------------------------------------------------|
Please choose an option (1 or 2):
> 2
-------------------------------------------------
| Manage LLM Models
|------------------------------------------------|
| 1) Deploy Model |
| 2) Undeploy Model |
| 3) List Installed Models |
| 4) Deploy Model from Hugging Face |
| 5) Remove Model using deployment name |
|------------------------------------------------|
Please choose an option (1, 2, 3, or 4):
> 5
-------------------------------------------------
| Removing Model using Deployment name |
|------------------------------------------------|
Configuration file found, setting vars!
---------------------------------------
Metadata configuration file found, setting vars!
---------------------------------------
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
NRI CPU Balloon Policy automatically enabled for CPU deployment
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Some required arguments are missing. Prompting for input...
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
Proceeding with the setup of NRI CPU Balloon Policy: yes
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss. This action is irreversible. Are you absolutely certain you want to proceed? (y/n) y
Enter the deployment name of the model you wish to deprovision: tinyllama-1-1b-chat-v1-0-cpu-vllm
....
TASK [Check if "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu" Model is deployed] ********************************************************************
changed: [master1]
....
TASK [List of Models to be Installed] ********************************************************************************************************
ok: [master1] => {
"model_name_list": ""
}
Thursday 11 December 2025 15:51:30 -0600 (0:00:00.090) 0:00:27.276 *****
TASK [Clean up remote dependencies directory] ************************************************************************************************
changed: [master1]
PLAY RECAP ***********************************************************************************************************************************
master1 : ok=17 changed=5 unreachable=0 failed=0 skipped=144 rescued=0 ignored=0
Thursday 11 December 2025 15:51:31 -0600 (0:00:00.504) 0:00:27.781 *****
===============================================================================
inference-tools : Install Deployment Client ------------------------------------------------------------------------------------------- 3.20s
inference-tools : Ensure Python pip module is installed ------------------------------------------------------------------------------- 2.95s
inference-tools : Ensure jq is installed ---------------------------------------------------------------------------------------------- 2.60s
inference-tools : Install Kubernetes Python SDK --------------------------------------------------------------------------------------- 1.90s
Create/Update Kubernetes Secret for Hugging Face Token -------------------------------------------------------------------------------- 1.52s
kubernetes-precheck : Check if kubectl is Available ----------------------------------------------------------------------------------- 0.92s
Sync dependency files to Deployment Nodes --------------------------------------------------------------------------------------------- 0.89s
Check if "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu" Model is deployed -------------------------------------------------------------------- 0.84s
kubernetes-precheck : Check Kubernetes API server connectivity ------------------------------------------------------------------------ 0.74s
Ensure Remote Directory Exists -------------------------------------------------------------------------------------------------------- 0.72s
Clean up remote dependencies directory ------------------------------------------------------------------------------------------------ 0.50s
kubernetes-precheck : Fail if Kubernetes API is not reachable ------------------------------------------------------------------------- 0.15s
Set proxy args if proxy is defined ---------------------------------------------------------------------------------------------------- 0.12s
Parse existing models and check if our model exists ----------------------------------------------------------------------------------- 0.11s
Display deployment configuration ------------------------------------------------------------------------------------------------------ 0.11s
Set model_exists to false if API call failed or no data ------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.10s
Model registration skipped (already exists) ------------------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.09s
Inference LLM Model is removed successfully.
---------------------------------------------------------------------
| LLM Model Being Removed from Intel AI for Enterprise Inference! |
---------------------------------------------------------------------
user@master1:~/Enterprise-Inference/core$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
tinyllama-1-1b-chat-v1-0-cpu-vllm 1/1 1 1 83m
vllm-llama-8b 1/1 1 1 2d
vllm-tei 1/1 1 1 8d
user@master1:~/Enterprise-Inference/core$ kubectl get pods
NAME READY STATUS RESTARTS AGE
keycloak-0 1/1 Running 3 (120m ago) 8d
keycloak-postgresql-0 1/1 Running 2 (120m ago) 8d
tinyllama-1-1b-chat-v1-0-cpu-vllm-c946d8595-d9pg5 1/1 Running 0 85m
vllm-llama-8b-67dbdf6549-hjwj8 1/1 Running 0 119m
vllm-tei-d7645b56c-shcdn 1/1 Running 0 119m
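Note that the removal playbook checks for "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu" even though the name entered at the prompt was "tinyllama-1-1b-chat-v1-0-cpu-vllm". A minimal sketch of that apparent suffix handling, assuming the script re-appends a "-cpu" suffix to whatever name is entered:

```shell
# Sketch of the apparent name handling during removal (assumption: the script
# appends "-cpu" to the entered name before checking for the deployment).
entered="tinyllama-1-1b-chat-v1-0-cpu-vllm"   # full deployment name, as entered at the prompt
checked="${entered}-cpu"                      # name the "Check if ... Model is deployed" task reports
actual="tinyllama-1-1b-chat-v1-0-cpu-vllm"    # name of the Deployment actually running

[ "$checked" = "$actual" ] || echo "mismatch: $checked != $actual"
# prints "mismatch: tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu != tinyllama-1-1b-chat-v1-0-cpu-vllm"
```

If that is the cause, the lookup can never match the installed Deployment, which would explain why the play reports success while removing nothing. In the meantime, since the deployment NOTICE states the deployment name is used as the release identifier, the stale release can presumably be removed manually with helm uninstall, though that workaround has not been verified here.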