Cld2labs/airgap#100

Open
HarikaDev296 wants to merge 28 commits into opea-project:main from cld2labs:cld2labs/airgap

Conversation

@HarikaDev296
Contributor

No description provided.

Harika added 15 commits May 19, 2026 11:32
Enables full EI stack deployment (Kubernetes + LLM serving + GenAI Gateway)
on internet-blocked machines by routing all dependencies through a local
JFrog Artifactory instance.

Changes:
- Add airgap_enabled / jfrog_url / jfrog_username / jfrog_password vars
- Dual-task pattern in all playbooks (internet vs JFrog path)
- setup-env.sh: pip, kubespray, ansible collections, apt from JFrog
- prereq-check.sh: connectivity check against JFrog ping endpoint
- offline.yml: Kubespray binary URLs redirected to JFrog
- containerd mirror config for all 5 registries via JFrog
- Kubespray hosts.toml.j2 patched to not write skip_verify unless true
- inference-tools role: helm, pip, jq installs all JFrog-aware
- nri_cpu_balloons role: helm repo and airgap vars wired up
- JFrog setup script + README for offline bundle preparation
- Air-gap troubleshooting and deployment documentation

Signed-off-by: Harika <codewith3@gmail.com>
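
The JFrog connectivity check mentioned for prereq-check.sh could look roughly like the sketch below. It is not the PR's code. It assumes Artifactory's system ping REST endpoint (`/artifactory/api/system/ping`) and that `jfrog_url` is the base URL without the `/artifactory` suffix. The function names and the `JFROG_CURL` override hook are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a JFrog reachability check; names are illustrative.

# Build the ping URL from the base JFrog URL (a trailing slash is tolerated).
jfrog_ping_url() {
  printf '%s/artifactory/api/system/ping\n' "${1%/}"
}

# Return 0 when JFrog answers the ping, non-zero otherwise.
# Arguments: base URL, username, password.
# JFROG_CURL may be overridden (for example in tests); it defaults to curl.
check_jfrog() {
  local url
  url=$(jfrog_ping_url "$1")
  ${JFROG_CURL:-curl} --fail --silent --show-error --max-time 10 \
    --user "${2}:${3}" "$url" >/dev/null
}
```

A caller would typically run `check_jfrog "$jfrog_url" "$jfrog_username" "$jfrog_password"` before any download step and stop early on a non-zero status.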
Signed-off-by: Harika <codewith3@gmail.com>
- Remove step 3i (Meta-Llama-3.1-8B-Instruct)
- Renumber Llama-3.2-3B-Instruct as step 3i
- Add step 3j for Qwen/Qwen3.5-0.8B
- Add step 3k for Qwen/Qwen3.5-4B

Signed-off-by: Harika <codewith3@gmail.com>
- Replace Llama-3.1-8B with Qwen3.5-0.8B and Qwen3.5-4B
- Update HuggingFace credentials section with model table
- Update disk space requirement note
- Update --hf-token flag description and step-by-step table

Signed-off-by: Harika <codewith3@gmail.com>
- Rename Qwen3.5-0.8B -> Qwen3-0.6B and Qwen3.5-4B -> Qwen3-4B throughout
  (script, README, step headers, HuggingFace repo IDs, JFrog folder names)
- Fix SKIP_STEPS loop in should_run: drop erroneous `:-` default expansion
  that caused an empty-string iteration when no steps were skipped

Signed-off-by: Harika <codewith3@gmail.com>
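
The `SKIP_STEPS` fix above can be illustrated with a small sketch. It assumes `SKIP_STEPS` is a bash array of step IDs, which the commit message does not state, and the two counting helpers are hypothetical. In bash, `"${arr[@]:-}"` expands to a single empty string when the array is empty, so the loop body runs once with an empty value. Plain `"${arr[@]}"` expands to nothing.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; assumes SKIP_STEPS is a bash array of step IDs.
SKIP_STEPS=()

# Buggy form: the `:-` default yields one empty-string element
# when the array is empty, so the loop iterates once.
count_skips_buggy() {
  local n=0 s
  for s in "${SKIP_STEPS[@]:-}"; do n=$((n + 1)); done
  echo "$n"
}

# Fixed form: an empty array expands to zero words, so no iteration happens.
count_skips_fixed() {
  local n=0 s
  for s in "${SKIP_STEPS[@]}"; do n=$((n + 1)); done
  echo "$n"
}

# Return 0 when the step should run, 1 when it is listed in SKIP_STEPS.
should_run() {
  local step="$1" s
  for s in "${SKIP_STEPS[@]}"; do
    if [ "$s" = "$step" ]; then
      return 1
    fi
  done
  return 0
}
```

One common reason the `:-` form appears in scripts is that Bash releases before 4.4 treat an empty-array expansion as an unbound variable under `set -u`. Whether that was the motivation here is not stated in the PR.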
Signed-off-by: Harika <codewith3@gmail.com>
…install

apt-get install --download-only --reinstall can use apt's in-memory package
state and skip the network fetch entirely for already-installed packages like
python3-pip, so JFrog never caches the .deb. apt-get download always fetches
from the configured sources regardless of install state, reliably triggering
the JFrog remote proxy to cache the package.

Signed-off-by: Harika <codewith3@gmail.com>
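
A minimal sketch of the `apt-get download` approach described above follows. The helper name, the `DRY_RUN` switch, and the default destination directory are hypothetical. The sketch relies only on the fact that `apt-get download` fetches the `.deb` into the current directory from the configured sources.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; helper name, DRY_RUN switch, and default path are illustrative.

# Fetch a package's .deb through the configured apt sources (for example a
# JFrog remote repository) so the proxy caches it, regardless of whether the
# package is already installed locally.
cache_deb_via_proxy() {
  local pkg="$1" dest="${2:-/tmp/deb-cache}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print what would be executed.
    echo "cd $dest && apt-get download $pkg"
    return 0
  fi
  mkdir -p "$dest" && (cd "$dest" && apt-get download "$pkg")
}
```

For example, `cache_deb_via_proxy python3-pip` would request the package from the configured source even when python3-pip is already installed.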
grep returns rc=1 when it finds no matches, which Ansible treats as
a task failure. Allow rc=0 (matches found) and rc=1 (no matches) as
both valid; only fail on real errors like helm not being available.

Signed-off-by: Harika <codewith3@gmail.com>
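
The exit-code handling described above can be sketched in shell as follows. The wrapper name is hypothetical, and the PR applies the equivalent rule inside an Ansible task rather than in a shell function. `grep` exits 0 on a match, 1 on no match, and 2 or higher on a real error.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "rc 0 and rc 1 are both fine, anything else is an error".

# Run grep with the given arguments. Return 0 for a match (rc=0) or
# no match (rc=1); propagate any other exit status unchanged.
grep_tolerant() {
  local rc=0
  grep "$@" || rc=$?
  if [ "$rc" -gt 1 ]; then
    return "$rc"
  fi
  return 0
}
```

In an Ansible task the same idea is usually expressed with a `failed_when` condition on the registered result's `rc`; the exact condition used in this PR is not shown in the conversation.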
….yml

all.yml is copied for every deployment (airgap and non-airgap). Having
containerd_registries_mirrors with JFROG_HOST placeholders in all.yml
causes non-airgap deployments to fail — containerd tries to resolve
the literal string JFROG_HOST as a DNS name and image pulls fail.

offline.yml is only copied when airgap_enabled=yes, and setup-env.sh
substitutes JFROG_HOST with the real JFrog IP before Kubespray runs.
Moving mirrors, calico_version, and coredns_version there ensures:
- airgap=no: no registry mirrors configured, internet pulls work
- airgap=yes: mirrors point to JFrog with real IP substituted

Signed-off-by: Harika <codewith3@gmail.com>
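
The placeholder substitution that setup-env.sh performs on offline.yml might look like the sketch below. The function name is hypothetical, and the real script may use a different command. It assumes the host value contains no `|` or `&` characters, which would need escaping in the `sed` expression.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the actual substitution in setup-env.sh may differ.

# Replace every literal JFROG_HOST placeholder in a vars file with the real
# JFrog host. Writes to a temporary file first, then moves it into place.
substitute_jfrog_host() {
  local file="$1" host="$2"
  sed "s|JFROG_HOST|${host}|g" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}
```

Because the placeholder only lives in the airgap-only file, a non-airgap deployment never sees an unresolved `JFROG_HOST` string.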
If the system still has internet connectivity while airgap mode is
enabled, Docker images not cached in JFrog may silently fall through
to the internet, breaking the airgap guarantee. Detect this condition
early and exit with a clear message directing the user to disable
internet access before proceeding.

Signed-off-by: Harika <codewith3@gmail.com>
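
A rough sketch of such an early guard is shown below. The function names, the `INTERNET_PROBE` override, and the probe URL are assumptions for illustration. The PR does not show which endpoint it actually probes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; probe URL and names are illustrative, not from the PR.

# Return 0 when an outbound request succeeds. Any HTTP response counts as
# "reachable" because curl is run without --fail.
# INTERNET_PROBE can replace the default probe command (useful in tests).
internet_reachable() {
  ${INTERNET_PROBE:-curl --silent --max-time 5 --output /dev/null https://registry-1.docker.io/v2/}
}

# Argument: the airgap_enabled value ("yes" or "no").
# Returns 1 with a message when airgap mode is on but the internet is reachable.
assert_airgap_isolated() {
  [ "${1:-no}" = "yes" ] || return 0
  if internet_reachable; then
    echo "ERROR: airgap_enabled=yes but the internet is reachable; disable internet access and retry." >&2
    return 1
  fi
}
```

A deployment script could call `assert_airgap_isolated "$airgap_enabled" || exit 1` before any image pull is attempted.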
…ation

- Clarify VM2 network requirement: internet must be disabled before running
  EI with airgap_enabled=yes; deployment now exits with an error if not
- Update step 3f description to reflect apt-get download fix for reliable
  python3-pip caching in JFrog
- Add troubleshooting entry for the internet connectivity exit with
  instructions on how to disable internet access on VM2

Signed-off-by: Harika <codewith3@gmail.com>
Clearly document which models have been tested and validated end-to-end
in airgap mode: Llama-3.2-3B-Instruct, Qwen3-0.6B, Qwen3-1.7B, and
Qwen3-4B. Includes a note that other models are not supported without
manual JFrog uploads and have not been validated.

Signed-off-by: Harika <codewith3@gmail.com>
JFrog's bundled installer expects db5.3-util to be present on the system
but the package was missing from our prerequisites list, causing install.sh
to fail when trying to install it from the bundled .deb.

Signed-off-by: Harika <codewith3@gmail.com>
…sue in 7.146.10

Signed-off-by: Harika <codewith3@gmail.com>
Signed-off-by: Harika <codewith3@gmail.com>
@HarikaDev296 HarikaDev296 force-pushed the cld2labs/airgap branch 2 times, most recently from 6479a7d to fa7d211 Compare May 19, 2026 16:39
@alexsin368 alexsin368 requested review from alexsin368 and psurabh May 20, 2026 00:05
Comment thread third_party/Dell/air-gap/jfrog-setup/uninstall-jfrog.sh
@alexsin368
Collaborator

There appear to be some files that did not pick up the latest changes in the opea/main branch. I recommend rebasing this branch so the PR doesn't modify files it shouldn't.

Comment thread third_party/Dell/air-gap/EI/single-node/air-gap.md
Co-authored-by: alexsin368 <109180236+alexsin368@users.noreply.github.com>
Comment thread third_party/Dell/air-gap/EI/single-node/air-gap.md
Comment thread third_party/Dell/air-gap/EI/single-node/air-gap.md
Comment thread third_party/Dell/air-gap/EI/single-node/air-gap.md Outdated
Harika added 8 commits May 28, 2026 10:30
Documents the bug where entering a deployment name with the -cpu suffix
causes the remove-model script to silently do nothing and still report success.
…ments

- Set HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 for all CPU model
  deployments in airgap mode to prevent vLLM from contacting HuggingFace
- Fix vllm configmap template to merge per-model and default configMapValues
  so HF_HUB_OFFLINE correctly reaches containers with per-model xeon configs
- Add JFrog download tasks for Llama-3.2-3B and Qwen3-1.7B validated models;
  use local hostPath for LLM_MODEL_ID so vLLM loads weights without HF hub
- Guard helm repo update calls across ingress, keycloak, genai-gateway, NRI,
  observability, ceph, istio, and bastion playbooks to prevent internet
  connection attempts in airgap mode
- Guard internet binary downloads in setup-bastion.yml
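
The two airgap behaviours in this commit, forcing the HuggingFace libraries offline and skipping `helm repo update`, can be sketched in shell as below. The PR implements them in Helm templates and Ansible playbooks, so this is only an illustration. The function names and the `HELM_BIN` override are hypothetical. `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the variable names given in the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the PR applies these rules via configmaps and playbooks.

# Print the offline environment assignments when airgap mode is enabled.
# Argument: the airgap_enabled value ("yes" or "no").
offline_env() {
  if [ "${1:-no}" = "yes" ]; then
    printf 'HF_HUB_OFFLINE=1\nTRANSFORMERS_OFFLINE=1\n'
  fi
}

# Run `helm repo update` only outside airgap mode.
# HELM_BIN can replace the helm binary (useful in tests).
maybe_helm_repo_update() {
  if [ "${1:-no}" = "yes" ]; then
    echo "airgap mode: skipping helm repo update"
    return 0
  fi
  "${HELM_BIN:-helm}" repo update
}
```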
uninstall-model.sh did not pass airgap_enabled, jfrog_url, jfrog_username,
or jfrog_password to the Ansible playbook. This caused the inference-tools
role to run the internet pip install task instead of the airgap JFrog path,
failing with 404 errors on the JFrog debian repo.
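
One way to forward the four variables is to build the extra-vars arguments once and reuse them for every playbook call, as in the sketch below. The function name, the `PLAYBOOK_ARGS` array, and the playbook file name in the comment are hypothetical. Only the variable names and the `-e key=value` form of `ansible-playbook` are taken as given.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real uninstall-model.sh may pass the vars differently.

# Populate PLAYBOOK_ARGS with the airgap-related extra vars, taken from
# same-named shell variables (empty or "no" when unset).
build_playbook_args() {
  PLAYBOOK_ARGS=(
    -e "airgap_enabled=${airgap_enabled:-no}"
    -e "jfrog_url=${jfrog_url:-}"
    -e "jfrog_username=${jfrog_username:-}"
    -e "jfrog_password=${jfrog_password:-}"
  )
}

# Example call (playbook name is illustrative only):
#   build_playbook_args
#   ansible-playbook uninstall-model.yml "${PLAYBOOK_ARGS[@]}"
```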
In airgap mode, derive JFrog folder from model ID (strip org prefix),
download weights to /opt/ei-models/<model_id>, and use local path for
LLM_MODEL_ID so vLLM loads from disk without contacting HuggingFace.

Convention: JFrog folder = model name without org prefix (e.g.
Qwen/Qwen3-4B -> Qwen3-4B), matching the naming used in jfrog-setup.sh.
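
The naming convention above can be sketched with plain parameter expansion. The function names are hypothetical, and the local path helper reads `/opt/ei-models/<model_id>` literally, that is, with the full model ID including the org prefix. That reading is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the folder-naming convention; names are illustrative.

# Strip everything up to and including the last "/" from a model ID.
jfrog_folder_for_model() {
  printf '%s\n' "${1##*/}"
}

# Local directory the weights are downloaded to (assumes the full model ID
# is used as the sub-path, as the commit message is worded).
local_model_path() {
  printf '/opt/ei-models/%s\n' "$1"
}
```

With this, `jfrog_folder_for_model Qwen/Qwen3-4B` yields `Qwen3-4B`, and an ID without an org prefix is returned unchanged.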
2 participants