101 changes: 101 additions & 0 deletions third_party/Dell/model-deployment/Mistral-7B-v0.3/deployment.md
@@ -0,0 +1,101 @@
## Step 1: Prerequisites for Deploying the Mistral-7B-v0.3 Model on Xeon with Keycloak

Ensure the Enterprise Inference stack with Keycloak is already deployed before proceeding.

Edit `core/scripts/generate-token.sh` and set your values before sourcing it:

| Variable | Description |
| ------------------------- | ------------------------------------------------------------------------ |
| `BASE_URL` | Hostname of your cluster (e.g. `api.example.com`), without `https://` |
| `KEYCLOAK_ADMIN_USERNAME` | Keycloak admin username |
| `KEYCLOAK_PASSWORD` | Keycloak admin password |
| `KEYCLOAK_CLIENT_ID` | Keycloak client ID configured during EI deployment |

Then run:

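```bash
# Hugging Face access token used to download the model weights
export HUGGING_FACE_HUB_TOKEN="your_token_here"

# Run from the repository root so the relative script path resolves
cd ~/Enterprise-Inference
# Source the script (rather than executing it) so its exported variables persist in this shell
source core/scripts/generate-token.sh
```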

The script exports `BASE_URL`, `KEYCLOAK_CLIENT_ID`, `KEYCLOAK_CLIENT_SECRET`, and `TOKEN` into the current shell.
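Before running Helm, a quick check can catch an unset variable early. The sketch below uses a hypothetical `check_ei_env` helper (it is not part of the repository); the variable list simply mirrors the names above plus `HUGGING_FACE_HUB_TOKEN`, and it is written in POSIX shell so it also runs under `sh`.

```bash
# Hypothetical helper (not shipped with Enterprise Inference): report every variable
# that Step 2 relies on but that is unset or empty in the current shell.
check_ei_env() {
  local var val missing=0
  for var in BASE_URL KEYCLOAK_CLIENT_ID KEYCLOAK_CLIENT_SECRET TOKEN HUGGING_FACE_HUB_TOKEN; do
    # Indirect lookup: read the value of the variable whose name is stored in $var
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "Missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_ei_env && echo OK` after sourcing the script; a non-zero exit status means at least one value still needs to be set.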

## Step 2: Deploy Mistral-7B-v0.3 Model

```bash
helm install mistral-7b-v3 ./core/helm-charts/vllm \
--values ./core/helm-charts/vllm/xeon-values.yaml \
--set LLM_MODEL_ID="mistralai/Mistral-7B-v0.3" \
--set global.HUGGINGFACEHUB_API_TOKEN="$HUGGING_FACE_HUB_TOKEN" \
--set ingress.enabled=true \
--set ingress.secretname="${BASE_URL}" \
--set ingress.host="${BASE_URL}" \
--set oidc.client_id="$KEYCLOAK_CLIENT_ID" \
--set oidc.client_secret="$KEYCLOAK_CLIENT_SECRET" \
--set apisix.enabled=true \
--set tensor_parallel_size="1" \
--set pipeline_parallel_size="1"
```
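If you prefer to keep the non-secret settings in a reviewable file, Helm accepts `--values` more than once (later files take precedence, and `--set` overrides both). The sketch below uses a hypothetical `write_mistral_values` helper to write the non-secret settings from the command above into a values file, leaving the two secrets on the command line so they are not written to disk. The nested key layout is an assumption derived from the `--set` paths; compare it with the chart's default values before relying on it.

```bash
# Hypothetical helper: write the non-secret settings from the Step 2 command to a
# Helm values file. Keys mirror the --set paths (ingress.host -> ingress: host:).
write_mistral_values() {
  local out_file="$1"
  # Unquoted EOF: ${BASE_URL} and ${KEYCLOAK_CLIENT_ID} expand when the file is written
  cat > "$out_file" <<EOF
LLM_MODEL_ID: "mistralai/Mistral-7B-v0.3"
ingress:
  enabled: true
  secretname: "${BASE_URL}"
  host: "${BASE_URL}"
oidc:
  client_id: "${KEYCLOAK_CLIENT_ID}"
apisix:
  enabled: true
tensor_parallel_size: 1
pipeline_parallel_size: 1
EOF
}

# Example usage (requires the cluster and the variables from Step 1):
#   write_mistral_values mistral-values.yaml
#   helm install mistral-7b-v3 ./core/helm-charts/vllm \
#     --values ./core/helm-charts/vllm/xeon-values.yaml \
#     --values mistral-values.yaml \
#     --set global.HUGGINGFACEHUB_API_TOKEN="$HUGGING_FACE_HUB_TOKEN" \
#     --set oidc.client_secret="$KEYCLOAK_CLIENT_SECRET"
```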

## Step 3: Verify the Deployment

```bash
kubectl get pods
kubectl get apisixroutes
```

Expected output of `kubectl get pods`:

```
NAME READY STATUS RESTARTS
keycloak-0 1/1 Running 0
keycloak-postgresql-0 1/1 Running 0
mistral-7b-v3-<hash>-<hash> 1/1 Running 0
```

> Note: The pod name suffix `<hash>-<hash>` is auto-generated by Kubernetes and will differ on each deployment. Ensure all pods show `1/1 Running`.

Expected output of `kubectl get apisixroutes`:

```
NAME HOSTS
mistral-7b-v3-apisixroute api.example.com
```
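Rather than checking the pod list by eye, you can filter it. This is a minimal sketch with a hypothetical `check_pods_ready` helper; it assumes the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...), and pods from completed Jobs would be reported as not ready.

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin,
# print every pod that is not fully ready or not Running, and return 1 if any.
check_pods_ready() {
  awk '
    {
      split($2, ready, "/")                  # READY column, e.g. "1/1"
      if (ready[1] != ready[2] || $3 != "Running") {
        print "Not ready: " $1
        bad = 1
      }
    }
    END { exit bad }                         # bad stays unset (0) when all pods are ready
  '
}

# Example usage:
#   kubectl get pods --no-headers | check_pods_ready && echo "All pods ready"
```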

## Step 4: Test the Deployed Model

```bash
curl -k "https://${BASE_URL}/Mistral-7B-v0.3-vllmcpu/v1/completions" \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"model": "mistralai/Mistral-7B-v0.3",
"prompt": "What is Deep Learning?",
"max_tokens": 25,
"temperature": 0
}'
```

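If the request succeeds, the endpoint returns a JSON completion response. The `-k` flag tells curl to skip TLS certificate verification, which helps with self-signed certificates; drop it if your ingress certificate is trusted.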
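Because `BASE_URL` must be a bare hostname, a stray `https://` or trailing slash produces a malformed endpoint. A minimal sketch with a hypothetical `completions_url` helper that normalizes the host before building the route used in the request above:

```bash
# Hypothetical helper: build the completions endpoint from a hostname,
# tolerating an accidental scheme prefix or a trailing slash.
completions_url() {
  local host="$1"
  host="${host#https://}"   # strip a leading https:// if present
  host="${host#http://}"    # or a leading http://
  host="${host%/}"          # strip one trailing slash
  printf 'https://%s/Mistral-7B-v0.3-vllmcpu/v1/completions\n' "$host"
}

# Example usage (other curl flags as in the request above):
#   curl -k "$(completions_url "$BASE_URL")" -X POST ...
```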

## Undeploy the Model

```bash
helm uninstall mistral-7b-v3
```

## Parameters

| Parameter | Description |
| ------------------------------------------------ | ------------------------------------------------------------------------------------------------- |
| `--set LLM_MODEL_ID="mistralai/Mistral-7B-v0.3"` | Defines the target model from **Hugging Face** to deploy. |
| `--set global.HUGGINGFACEHUB_API_TOKEN="..."`    | Authenticates access to gated or private Hugging Face models. Set from the `HUGGING_FACE_HUB_TOKEN` variable exported in Step 1. |
| `--set ingress.enabled=true` | Enables Kubernetes **Ingress** to expose the model service externally. |
| `--set ingress.host="${BASE_URL}"`               | Public hostname or FQDN for the inference endpoint; it must resolve to your Ingress controller IP. |
| `--set ingress.secretname="${BASE_URL}"`         | Name of the Kubernetes **TLS Secret** used for HTTPS termination at the ingress layer.            |
| `--set oidc.client_id="..."` | Keycloak OIDC client ID used for token-based authentication. |
| `--set oidc.client_secret="..."` | Keycloak OIDC client secret corresponding to the client ID. |
| `--set apisix.enabled=true` | Enables **APISIX** as the API gateway for routing and authentication. |
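| `--set tensor_parallel_size="1"`                 | Number of tensor-parallel workers the model is sharded across. This counts workers (for example, GPUs, or CPU sockets on a multi-socket Xeon host), not CPU cores; `1` runs a single worker. |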
| `--set pipeline_parallel_size="1"` | Number of pipeline parallel stages. Typically `1` for single-node deployments. |
76 changes: 76 additions & 0 deletions third_party/Dell/model-deployment/Mistral-7B-v0.3/model-card.md
@@ -0,0 +1,76 @@
## Mistral 7B v0.3

This model uses **Mistral-7B v0.3**, a 7-billion-parameter transformer language model developed by **Mistral AI**. It is a compact, efficient LLM suited to general-purpose text generation, research, and downstream fine-tuning. According to its official model card, v0.3 is Mistral-7B-v0.2 with an extended vocabulary of 32,768 tokens.

For full details, including model specifications, licensing, intended use, and technical documentation, see the official Hugging Face page:

https://huggingface.co/mistralai/Mistral-7B-v0.3

---

### Model Attribution

**Developer:** Mistral AI

**Purpose:** Foundation model for general NLP tasks, downstream fine-tuning, and integration into custom pipelines

**Sizes / Variants:**
7B (≈ 7 billion parameters)

**Modalities:**
Text → Text (autoregressive language modeling)

**Parameter Size:**
~7 billion

**Max Context:**
Depends on the inference backend and its configuration; see the model's Hugging Face page for the supported maximum.

**License:**
Apache 2.0 (open-weight release)

**Minimum Required PCIe Cards:**
1–2 (varies by precision, quantization, and inference framework)

---

### Usage Notice

By using this model, you agree that:

- Inputs and outputs are processed by the Mistral-7B v0.3 model, and you accept its Apache 2.0 license terms.
- Model outputs must be reviewed for accuracy, suitability, and safety before use in commercial or production contexts.
- This is a base model without alignment or instruction fine-tuning, so it may produce literal, unfiltered, or undesired content.
- You remain responsible for monitoring, filtering, and enforcing compliance, especially in sensitive, regulated, or user-facing deployments.

---

### Intended Applications

- Research in transformer and LLM architectures
- Pre-training or continued training for domain-specific LLMs
- Fine-tuning for instruction following, chat roles, code, or domain tasks
- General-purpose text generation and language modeling
- Embedding into autonomous or semi-autonomous agents with external alignment layers
- Experimental or academic benchmarking on open-weight LLMs

---

### Limitations

- As a base model, it lacks instruction tuning and safety alignment, making outputs potentially unstructured or unsafe without further processing.
- May generate hallucinated, biased, or factually incorrect content; human validation is recommended.
- Safety-critical and regulated use cases require external safeguards, filtering, or moderation systems.
- Operational performance varies with context length, quantization, and hardware backend; optimization may be required for real-time workloads.

---

### References

- Official model card on Hugging Face: https://huggingface.co/mistralai/Mistral-7B-v0.3
- Mistral AI model documentation: https://docs.mistral.ai/getting-started/models
- “Mistral 7B” announcement blog post: https://mistral.ai/news/announcing-mistral-7b
1 change: 0 additions & 1 deletion third_party/Dell/model-deployment/README.md

This file was deleted.