diff --git a/third_party/Dell/model-deployment/Mistral-7B-v0.3/deployment.md b/third_party/Dell/model-deployment/Mistral-7B-v0.3/deployment.md
new file mode 100644
index 00000000..8dbb882f
--- /dev/null
+++ b/third_party/Dell/model-deployment/Mistral-7B-v0.3/deployment.md
@@ -0,0 +1,101 @@
+## Step 1: Prerequisites to Deploy Mistral-7B-v0.3 Model on Xeon with Keycloak
+
+Ensure the Enterprise Inference stack with Keycloak is already deployed before proceeding.
+
+Edit `core/scripts/generate-token.sh` and set your values before sourcing it:
+
+| Variable | Description |
+| ------------------------- | ------------------------------------------------------------------------ |
+| `BASE_URL` | Hostname of your cluster (e.g. `api.example.com`), without `https://` |
+| `KEYCLOAK_ADMIN_USERNAME` | Keycloak admin username |
+| `KEYCLOAK_PASSWORD` | Keycloak admin password |
+| `KEYCLOAK_CLIENT_ID` | Keycloak client ID configured during EI deployment |
+
+Then run:
+
+```bash
+export HUGGING_FACE_HUB_TOKEN="your_token_here"
+
+cd ~/Enterprise-Inference
+source core/scripts/generate-token.sh
+```
+
+This exports: `BASE_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_CLIENT_SECRET`, and `TOKEN`.
+
+## Step 2: Deploy Mistral-7B-v0.3 Model
+
+```bash
+helm install mistral-7b-v3 ./core/helm-charts/vllm \
+  --values ./core/helm-charts/vllm/xeon-values.yaml \
+  --set LLM_MODEL_ID="mistralai/Mistral-7B-v0.3" \
+  --set global.HUGGINGFACEHUB_API_TOKEN="$HUGGING_FACE_HUB_TOKEN" \
+  --set ingress.enabled=true \
+  --set ingress.secretname="${BASE_URL}" \
+  --set ingress.host="${BASE_URL}" \
+  --set oidc.client_id="$KEYCLOAK_CLIENT_ID" \
+  --set oidc.client_secret="$KEYCLOAK_CLIENT_SECRET" \
+  --set apisix.enabled=true \
+  --set tensor_parallel_size="1" \
+  --set pipeline_parallel_size="1"
+```
+
+## Step 3: Verify the Deployment
+
+```bash
+kubectl get pods
+kubectl get apisixroutes
+```
+
+Expected Output:
+
+```
+NAME                       READY   STATUS    RESTARTS
+keycloak-0                 1/1     Running   0
+keycloak-postgresql-0      1/1     Running   0
+mistral-7b-v3--            1/1     Running   0
+```
+
+> Note: The pod name suffix `-` is auto-generated by Kubernetes and will differ on each deployment. Ensure all pods show `1/1 Running`.
+
+```
+NAME                        HOSTS
+mistral-7b-v3-apisixroute   api.example.com
+```
+
+## Step 4: Test the Deployed Model
+
+```bash
+curl -k https://${BASE_URL}/Mistral-7B-v0.3-vllmcpu/v1/completions \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $TOKEN" \
+  -d '{
+    "model": "mistralai/Mistral-7B-v0.3",
+    "prompt": "What is Deep Learning?",
+    "max_tokens": 25,
+    "temperature": 0
+  }'
+```
+
+If successful, the model will return a completion response.
+
+## To undeploy the model
+
+```bash
+helm uninstall mistral-7b-v3
+```
+
+## Parameters
+
+| Parameter | Description |
+| ------------------------------------------------ | ------------------------------------------------------------------------------------------------- |
+| `--set LLM_MODEL_ID="mistralai/Mistral-7B-v0.3"` | Defines the target model from **Hugging Face** to deploy. |
+| `--set global.HUGGINGFACEHUB_API_TOKEN="..."` | Authenticates access to gated or private Hugging Face models. Replace with your own secure token. |
+| `--set ingress.enabled=true` | Enables Kubernetes **Ingress** to expose the model service externally. |
+| `--set ingress.host="${BASE_URL}"` | Public hostname or FQDN for the inference endpoint (maps to your Ingress controller IP). |
+| `--set ingress.secretname="${BASE_URL}"` | Kubernetes **TLS Secret** used for HTTPS termination at the ingress layer. |
+| `--set oidc.client_id="..."` | Keycloak OIDC client ID used for token-based authentication. |
+| `--set oidc.client_secret="..."` | Keycloak OIDC client secret corresponding to the client ID. |
+| `--set apisix.enabled=true` | Enables **APISIX** as the API gateway for routing and authentication. |
+| `--set tensor_parallel_size="1"` | Number of tensor parallel workers. Set to the number of available CPUs/GPUs per node. |
+| `--set pipeline_parallel_size="1"` | Number of pipeline parallel stages. Typically `1` for single-node deployments. |
diff --git a/third_party/Dell/model-deployment/Mistral-7B-v0.3/model-card.md b/third_party/Dell/model-deployment/Mistral-7B-v0.3/model-card.md
new file mode 100644
index 00000000..8b03ce8b
--- /dev/null
+++ b/third_party/Dell/model-deployment/Mistral-7B-v0.3/model-card.md
@@ -0,0 +1,76 @@
+## Mistral 7B v0.3
+
+This model uses **Mistral-7B v0.3**, a next-generation 7-billion-parameter transformer language model developed by **Mistral AI**. It represents a compact, efficient, and high-performance LLM architecture optimized for general-purpose text generation, research, and downstream fine-tuning. Compared to earlier releases, the v0.3 iteration integrates tokenizer improvements, extended context length support, and architectural refinements for stronger performance and interoperability in modern LLM ecosystems.
+
+For full details including model specifications, licensing, intended use, and technical documentation, please visit the official Hugging Face page: **Official Hugging Face Page**
+
+https://huggingface.co/mistralai/Mistral-7B-v0.3
+
+---
+
+### Model Attribution
+
+**Developer:** Mistral AI
+
+**Purpose:** Foundation model for general NLP tasks, downstream fine-tuning, and integration into custom pipelines
+
+**Sizes / Variants:**
+7B (≈ 7 billion parameters)
+
+**Modalities:**
+Text → Text (autoregressive language modeling)
+
+**Parameter Size:**
+~7 billion
+
+**Max Context:**
+Extended context window supported (exact length may depend on inference backend and configuration)
+
+**License:**
+Apache 2.0 (open-weight release)
+
+**Minimum Required PCIe Cards:**
+1–2 (varies by precision, quantization, and inference framework)
+
+---
+
+### Usage Notice
+
+By using this model, you agree that:
+
+- Inputs and outputs are processed via the Mistral-7B v0.3 model and you accept its licensing terms under Apache 2.0.
+- Model outputs must be reviewed for accuracy, suitability, and safety before use in commercial or production contexts.
+- This base model does not include alignment or instruction-fine-tuning, and therefore may produce literal, unfiltered, or undesired content without safety conditioning.
+- You remain responsible for monitoring, filtering, and enforcing compliance, especially in sensitive, regulated, or user-facing deployments.
+
+---
+
+### Intended Applications
+
+- Research in transformer and LLM architectures
+- Pre-training or continued training for domain-specific LLMs
+- Fine-tuning for instruction following, chat roles, code, or domain tasks
+- General-purpose text generation and language modeling
+- Embedding into autonomous or semi-autonomous agents with external alignment layers
+- Experimental or academic benchmarking on open-weight LLMs
+
+---
+
+### Limitations
+
+- As a base model, it lacks instruction tuning and safety alignment, making outputs potentially unstructured or unsafe without further processing.
+- May generate hallucinated, biased, or factually incorrect content; human validation is recommended.
+- Safety-critical and regulated use cases require external safeguards, filtering, or moderation systems.
+- Operational performance varies with context length, quantization, and hardware backend; optimization may be required for real-time workloads.
+
+---
+
+### References
+
+- Official Model Card on Hugging Face: https://huggingface.co/mistralai/Mistral-7B-v0.3
+
+- Open model documentation by Mistral AI.
+  https://docs.mistral.ai/getting-started/models
+
+- “Mistral 7B” announcement blog post.
+  https://mistral.ai/news/announcing-mistral-7b
diff --git a/third_party/Dell/model-deployment/README.md b/third_party/Dell/model-deployment/README.md
deleted file mode 100644
index 43d98118..00000000
--- a/third_party/Dell/model-deployment/README.md
+++ /dev/null
@@ -1 +0,0 @@
-# PLACEHOLDER
\ No newline at end of file
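
Possible pre-flight check for `deployment.md`, Steps 1 and 2: Step 1 states that sourcing `core/scripts/generate-token.sh` exports `BASE_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_CLIENT_SECRET`, and `TOKEN`, and Step 2 passes several of these to `helm install`. The sketch below is not part of this change. `missing_vars` is a hypothetical helper name, and the only assumption is a POSIX shell. It lists any named variable that is unset or empty, so the install can be stopped before Helm receives blank values.

```shell
# Hypothetical helper (not in the repository): print the names of any
# variables that are unset or empty, and return non-zero if there are any.
# The body runs in a subshell "( ... )" so its scratch variables do not
# leak into the caller's environment.
missing_vars() (
  missing=""
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-safe
    # indirection); ":-" keeps this from failing under "set -u".
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
  [ -z "$missing" ]
)

# Usage sketch, run after sourcing generate-token.sh and before helm install:
# missing_vars BASE_URL KEYCLOAK_CLIENT_ID KEYCLOAK_CLIENT_SECRET TOKEN \
#   HUGGING_FACE_HUB_TOKEN || exit 1
```

The function only reports names and never prints values, so secrets such as `KEYCLOAK_CLIENT_SECRET` and `TOKEN` stay out of the terminal and logs.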