
Models deployed from Hugging Face do not undeploy #40

@toddtomashek-c2l

Models deployed from Hugging Face using the inference-stack-deploy.sh script cannot be removed with the "Remove Model using deployment name" menu option. The removal process runs and reports success, but the model pod keeps running and the deployment remains installed. The full process is shown below using the TinyLlama model as an example, starting from deploying the model:

user@master1:~/Enterprise-Inference/core$ ./inference-stack-deploy.sh
----------------------------------------------------------
|  Intel AI for Enterprise Inference                      |
|---------------------------------------------------------|
| 1) Provision Enterprise Inference Cluster               |
| 2) Decommission Existing Cluster                        |
| 3) Update Deployed Inference Cluster                    |
| 4) Brownfield Deployment of Enterprise Inference        |
|---------------------------------------------------------|
Please choose an option (1, 2, 3 or 4):
> 3
-------------------------------------------------
|             Update Existing Cluster            |
|------------------------------------------------|
| 1) Manage Worker Nodes                         |
| 2) Manage LLM Models                           |
|------------------------------------------------|
Please choose an option (1 or 2):
> 2
-------------------------------------------------
| Manage LLM Models
|------------------------------------------------|
| 1) Deploy Model                                |
| 2) Undeploy Model                              |
| 3) List Installed Models                       |
| 4) Deploy Model from Hugging Face              |
| 5) Remove Model using deployment name          |
|------------------------------------------------|
Please choose an option (1, 2, 3, or 4):
> 4
-------------------------------------------------
|         Deploy Model from Huggingface          |
|------------------------------------------------|
Configuration file found, setting vars!
---------------------------------------
Metadata configuration file found, setting vars!
---------------------------------------
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
NRI CPU Balloon Policy automatically enabled for CPU deployment
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Some required arguments are missing. Prompting for input...
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
Proceeding with the setup of NRI CPU Balloon Policy: yes
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Enter the Huggingface Model ID: TinyLlama/TinyLlama-1.1B-Chat-v1.0
NOTICE: The model deployment name will be used as the release identifier for deployment. It must be unique, meaningful, and follow Kubernetes naming conventions — lowercase letters, numbers, and hyphens only. Capital letters or special characters are not allowed.
Enter Deployment Name for the Model: tinyllama-1-1b-chat-v1-0
NOTICE: Ensure the Tensor Parallel size value corresponds to the number of available Gaudi cards. Providing an incorrect value may result in the model being in a not ready state.
NOTICE: You are about to deploy a model directly from Hugging Face, which has not been pre-validated by our team. Do you wish to continue? (y/n) y

....

Inference LLM Model is deployed successfully.
-------------------------------------------------------------------------------------
|  AI LLM Model Deployment Complete!                                                |
|  The model is transitioning to a state ready for Inference.                       |
|  This may take some time depending on system resources and other factors.         |
|  Please standby...                                                                |
--------------------------------------------------------------------------------------

Accessing Deployed Models for Inference
https://github.com/opea-project/Enterprise-Inference/blob/main/docs/accessing-deployed-models.md

Please refer to this comprehensive guide for detailed instructions.

user@master1:~/Enterprise-Inference/core$ kubectl get pods -A
NAMESPACE            NAME                                                     READY   STATUS    RESTARTS       AGE
auth-apisix          auth-apisix-77784f9df6-8qn6z                             1/1     Running   2 (114m ago)   8d
auth-apisix          auth-apisix-etcd-0                                       1/1     Running   2 (114m ago)   8d
auth-apisix          auth-apisix-ingress-controller-57889db898-99zhp          1/1     Running   2 (114m ago)   8d
default              keycloak-0                                               1/1     Running   3 (114m ago)   8d
default              keycloak-postgresql-0                                    1/1     Running   2 (114m ago)   8d
default              tinyllama-1-1b-chat-v1-0-cpu-vllm-c946d8595-d9pg5        1/1     Running   0              80m
default              vllm-llama-8b-67dbdf6549-hjwj8                           1/1     Running   0              114m
default              vllm-tei-d7645b56c-shcdn                                 1/1     Running   0              114m
habana-ai-operator   habana-ai-device-plugin-ds-jhqsf                         1/1     Running   2 (114m ago)   8d
habana-ai-operator   habana-ai-driver-ubuntu-22-04-ds-bqpxh                   1/1     Running   0              22m
habana-ai-operator   habana-ai-feature-discovery-ds-44zd4                     1/1     Running   2 (114m ago)   8d
habana-ai-operator   habana-ai-metric-exporter-ds-kd57m                       1/1     Running   2 (114m ago)   8d
habana-ai-operator   habana-ai-operator-controller-manager-7845c6874c-c6c4g   2/2     Running   4 (114m ago)   8d
habana-ai-operator   habana-ai-runtime-ds-7qg5t                               1/1     Running   2 (114m ago)   8d
ingress-nginx        ingress-nginx-controller-77674b4c66-dp9vz                1/1     Running   2 (114m ago)   8d
kube-system          calico-kube-controllers-5db5978889-bgx4h                 1/1     Running   2 (114m ago)   8d
kube-system          calico-node-dwx86                                        1/1     Running   2 (114m ago)   8d
kube-system          coredns-d665d669-fxvhp                                   1/1     Running   2 (114m ago)   8d
kube-system          coredns-d665d669-wn6lw                                   0/1     Pending   0              8d
kube-system          dns-autoscaler-597dccb9b9-8w7vs                          1/1     Running   2 (114m ago)   8d
kube-system          kube-apiserver-master1                                   1/1     Running   4 (114m ago)   8d
kube-system          kube-controller-manager-master1                          1/1     Running   4 (114m ago)   8d
kube-system          kube-proxy-42mn9                                         1/1     Running   2 (114m ago)   8d
kube-system          kube-scheduler-master1                                   1/1     Running   3 (114m ago)   8d
kube-system          kubernetes-dashboard-7f4d4b895-455mw                     1/1     Running   2 (114m ago)   8d
kube-system          kubernetes-metrics-scraper-6d4c5d99f9-nvld5              1/1     Running   2 (114m ago)   8d
kube-system          nodelocaldns-tkjmc                                       1/1     Running   4 (114m ago)   8d
kube-system          registry-p98c4                                           1/1     Running   2 (114m ago)   8d
local-path-storage   local-path-provisioner-68b545849f-f2jsp                  1/1     Running   2 (114m ago)   8d

user@master1:~/Enterprise-Inference/core$ kubectl get deploy
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
tinyllama-1-1b-chat-v1-0-cpu-vllm   1/1     1            1           81m
vllm-llama-8b                       1/1     1            1           2d
vllm-tei                            1/1     1            1           8d

user@master1:~/Enterprise-Inference/core$ ./inference-stack-deploy.sh
----------------------------------------------------------
|  Intel AI for Enterprise Inference                      |
|---------------------------------------------------------|
| 1) Provision Enterprise Inference Cluster               |
| 2) Decommission Existing Cluster                        |
| 3) Update Deployed Inference Cluster                    |
| 4) Brownfield Deployment of Enterprise Inference        |
|---------------------------------------------------------|
Please choose an option (1, 2, 3 or 4):
> 3
-------------------------------------------------
|             Update Existing Cluster            |
|------------------------------------------------|
| 1) Manage Worker Nodes                         |
| 2) Manage LLM Models                           |
|------------------------------------------------|
Please choose an option (1 or 2):
> 2
-------------------------------------------------
| Manage LLM Models
|------------------------------------------------|
| 1) Deploy Model                                |
| 2) Undeploy Model                              |
| 3) List Installed Models                       |
| 4) Deploy Model from Hugging Face              |
| 5) Remove Model using deployment name          |
|------------------------------------------------|
Please choose an option (1, 2, 3, or 4):
> 5
-------------------------------------------------
|         Removing Model using Deployment name   |
|------------------------------------------------|
Configuration file found, setting vars!
---------------------------------------
Metadata configuration file found, setting vars!
---------------------------------------
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
NRI CPU Balloon Policy automatically enabled for CPU deployment
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
Some required arguments are missing. Prompting for input...
Proceeding with the setup of Fresh Kubernetes cluster: no
Proceeding with the setup of Habana AI Operator: no
Proceeding with the setup of Ingress Controller: yes
Proceeding with the setup of Keycloak : yes
Proceeding with the setup of Apisix: yes
Proceeding with the setup of GenAI Gateway: no
Proceeding with the setup of Observability: no
Proceeding with the setup of Ceph cluster: no
Proceeding with Ceph cluster uninstallation: no
Proceeding with the setup of Istio: no
Proceeding with the setup of NRI CPU Balloon Policy: yes
Using provided Huggingface token
Proceeding with the setup of Large Language Model (LLM): yes
----- Input -----
Using provided CLUSTER URL: api.example.com
Using provided certificate file: /home/user/certs/cert.pem
Using provided key file: /home/user/certs/key.pem
Using provided keycloak client id: api
Using provided Keycloak admin username: api-admin
Using provided Keycloak admin password
cpu_or_gpu is already set to c
CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss. This action is irreversible. Are you absolutely certain you want to proceed? (y/n) y

Enter the deployment name of the model you wish to deprovision: tinyllama-1-1b-chat-v1-0-cpu-vllm

....

TASK [Check if "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu" Model is deployed] ********************************************************************
changed: [master1]

....

TASK [List of Models to be Installed] ********************************************************************************************************
ok: [master1] => {
    "model_name_list": ""
}
Thursday 11 December 2025  15:51:30 -0600 (0:00:00.090)       0:00:27.276 *****

TASK [Clean up remote dependencies directory] ************************************************************************************************
changed: [master1]

PLAY RECAP ***********************************************************************************************************************************
master1                    : ok=17   changed=5    unreachable=0    failed=0    skipped=144  rescued=0    ignored=0

Thursday 11 December 2025  15:51:31 -0600 (0:00:00.504)       0:00:27.781 *****
===============================================================================
inference-tools : Install Deployment Client ------------------------------------------------------------------------------------------- 3.20s
inference-tools : Ensure Python pip module is installed ------------------------------------------------------------------------------- 2.95s
inference-tools : Ensure jq is installed ---------------------------------------------------------------------------------------------- 2.60s
inference-tools : Install Kubernetes Python SDK --------------------------------------------------------------------------------------- 1.90s
Create/Update Kubernetes Secret for Hugging Face Token -------------------------------------------------------------------------------- 1.52s
kubernetes-precheck : Check if kubectl is Available ----------------------------------------------------------------------------------- 0.92s
Sync dependency files to Deployment Nodes --------------------------------------------------------------------------------------------- 0.89s
Check if "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu" Model is deployed -------------------------------------------------------------------- 0.84s
kubernetes-precheck : Check Kubernetes API server connectivity ------------------------------------------------------------------------ 0.74s
Ensure Remote Directory Exists -------------------------------------------------------------------------------------------------------- 0.72s
Clean up remote dependencies directory ------------------------------------------------------------------------------------------------ 0.50s
kubernetes-precheck : Fail if Kubernetes API is not reachable ------------------------------------------------------------------------- 0.15s
Set proxy args if proxy is defined ---------------------------------------------------------------------------------------------------- 0.12s
Parse existing models and check if our model exists ----------------------------------------------------------------------------------- 0.11s
Display deployment configuration ------------------------------------------------------------------------------------------------------ 0.11s
Set model_exists to false if API call failed or no data ------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.10s
Model registration skipped (already exists) ------------------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.10s
Model registration completed ---------------------------------------------------------------------------------------------------------- 0.09s
Inference LLM Model is removed successfully.
---------------------------------------------------------------------
|     LLM Model Being Removed from Intel AI for Enterprise Inference! |
---------------------------------------------------------------------

user@master1:~/Enterprise-Inference/core$ kubectl get deploy
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
tinyllama-1-1b-chat-v1-0-cpu-vllm   1/1     1            1           83m
vllm-llama-8b                       1/1     1            1           2d
vllm-tei                            1/1     1            1           8d

user@master1:~/Enterprise-Inference/core$ kubectl get pods
NAME                                                READY   STATUS    RESTARTS       AGE
keycloak-0                                          1/1     Running   3 (120m ago)   8d
keycloak-postgresql-0                               1/1     Running   2 (120m ago)   8d
tinyllama-1-1b-chat-v1-0-cpu-vllm-c946d8595-d9pg5   1/1     Running   0              85m
vllm-llama-8b-67dbdf6549-hjwj8                      1/1     Running   0              119m
vllm-tei-d7645b56c-shcdn                            1/1     Running   0              119m
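For what it's worth, the play output above hints at why the removal is a no-op: the check task looks for "tinyllama-1-1b-chat-v1-0-cpu-vllm-cpu", i.e. the deployment name I entered with an extra "-cpu" appended, which never matches the deployment that actually exists. A minimal sketch of that apparent mismatch (the suffix behaviour is inferred from the task names in the log, not confirmed from the script source):

```shell
# Name typed at the "Enter the deployment name" prompt
entered="tinyllama-1-1b-chat-v1-0-cpu-vllm"
# Name the removal task appears to check for (entered name + "-cpu" suffix)
checked="${entered}-cpu"
# Deployment that actually exists in the cluster (from `kubectl get deploy`)
actual="tinyllama-1-1b-chat-v1-0-cpu-vllm"

if [ "$checked" = "$actual" ]; then
  echo "match: deployment would be removed"
else
  echo "no match: '$checked' != '$actual', nothing is removed"
fi
```

As a stopgap, the leftover resources can be deleted directly, e.g. `kubectl delete deployment tinyllama-1-1b-chat-v1-0-cpu-vllm`, though that bypasses whatever other cleanup (services, registration) the script normally performs.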
