Skip to content

lab10: Istio sidecar injection, Loki observability addon, and deploy/cleanup refactor#8

Open
talorlik wants to merge 5 commits into
nirgeier:masterfrom
talorlik:master
Open

lab10: Istio sidecar injection, Loki observability addon, and deploy/cleanup refactor#8
talorlik wants to merge 5 commits into
nirgeier:masterfrom
talorlik:master

Conversation

@talorlik
Copy link
Copy Markdown

@talorlik talorlik commented Feb 25, 2026

Summary by CodeRabbit

  • New Features

    • Added Loki to the observability stack and exposed it via the Istio observability gateway; routing for all addons consolidated through gateway VirtualServices.
  • Bug Fixes

    • Deployment and verification are more tolerant (warnings instead of hard failures); readiness checks now include Loki.
    • Port-forwarding skips conflicting local ports to avoid collisions.
  • Chores

    • Updated docs and access instructions for gateway-centric workflow and Loki.
    • Hardened addon deployments (sidecar injection, security contexts) and improved cleanup to remove addon PVCs and cluster-wide resources robustly.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

Adds Loki to the observability stack, enables Istio sidecar injection for observability components, replaces nginx Ingress with Istio Gateway VirtualServices, and refactors scripts (LAB_DIR sourcing, improved cleanup, port-forward guards, tolerant verification) and manifests (security contexts, PVC cleanup).

Changes

Cohort / File(s) Summary
Top-level orchestration
Labs/10-Istio/demo.sh, Labs/10-Istio/scripts/common.sh
Introduce LAB_DIR sourcing; add colored logging helpers and prerequisites checks; centralize lab-path resolution and improve access/port-forward logic.
Istio installer & addons
Labs/10-Istio/scripts/01-install-istio.sh, Labs/10-Istio/scripts/02-install-addons.sh, Labs/10-Istio/scripts/05-verify.sh
Pre-install cleanup of stray Istio CRDs/webhooks/roles; add Loki to addon install sequence; apply observability VirtualServices; make verification tolerant (
Bookinfo & traffic gen
Labs/10-Istio/scripts/03-deploy-bookinfo.sh, Labs/10-Istio/scripts/04-traffic-generator.sh, Labs/10-Istio/manifests/traffic-generator.yaml
Make Bookinfo connectivity check non-fatal and conditional on productpage pod; rely on manifest-labeled traffic-gen namespace (remove explicit creation); add istio-injection label.
Demo helpers & demos
Labs/10-Istio/demo/01-basic-setup/demo.sh, Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh
Add helpers: is_port_listening, crd_exists, cleanup_port_forwards, cleanup_temp_dirs, improved start_port_forward and open_url; add port-forward guards and robust cleanup/traps.
Observability manifests (sidecar & security hardening)
Labs/10-Istio/manifests/addons/grafana.yaml, .../jaeger.yaml, .../kiali.yaml, .../prometheus.yaml
Enable Istio sidecar injection (sidecar.istio.io/inject: "true") and add pod/container securityContext hardening (fsGroup, runAsUser/group, readOnlyRootFilesystem, dropped capabilities, seccomp). Prometheus moved injection to label.
New Loki addon
Labs/10-Istio/manifests/addons/loki.yaml
Add full Loki manifest: ServiceAccount, ConfigMaps (config/runtime), ClusterRole/Binding, Services (memberlist/headless), StatefulSet with sidecar, 10Gi PVC template, volumes, security hardening, and affinity rules.
Routing / ingress changes
Labs/10-Istio/manifests/observability-routes.yaml, Labs/10-Istio/manifests/ingress.yaml, Labs/10-Istio/manifests/bookinfo-gateway.yaml
Remove Nginx Ingress resources; add observability-routes.yaml with VirtualServices for kiali/grafana/jaeger/prometheus/loki via bookinfo-gateway; tighten bookinfo VirtualService hosts.
Docs & monitor
Labs/10-Istio/README.md, Labs/10-Istio/monitor.sh
Document Loki integration and gateway-centric access instructions; update cleanup notes for Loki PVCs; extend monitor.sh to report Loki/addons, Istiod, sidecar status, prometheus metrics and multi-mode summaries.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant DNS as DNS/hosts
  participant Gateway as Istio Ingress Gateway
  participant VS as VirtualService (bookinfo-gateway)
  participant Service as Addon Service (loki)
  participant Pod as Loki Pod

  User->>DNS: resolve loki.local
  DNS-->>User: return GATEWAY_IP
  User->>Gateway: HTTP request for loki.local:3100
  Gateway->>VS: match host -> route to service
  VS->>Service: forward to loki:3100
  Service->>Pod: kube-proxy -> pod (StatefulSet)
  Pod-->>Service: response
  Service-->>VS: response
  VS-->>Gateway: response
  Gateway-->>User: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰
I hop where gateways gently hum,
Loki's logs like carrots come,
Sidecars snug in morning light,
I tidy PVCs through the night,
And nibble bugs till all is right.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title comprehensively captures the three main changes: enabling Istio sidecar injection across observability addons, integrating Loki as a new addon, and refactoring the deployment/cleanup logic. It directly aligns with the extensive changes throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Labs/10-Istio/manifests/addons/kiali.yaml (1)

144-175: ⚠️ Potential issue | 🟠 Major

Add explicit security context for the Kiali pod/container.

The deployment currently relies on default security context, which allows root-capable execution. Please harden the pod/container security context.

🔒 Proposed hardening patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       serviceAccountName: kiali
       containers:
         - name: kiali
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            capabilities:
+              drop: ["ALL"]
           image: quay.io/kiali/kiali:v1.86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/kiali.yaml` around lines 144 - 175, The Kiali
Deployment lacks explicit pod/container securityContext allowing root execution;
update the pod spec and container spec where serviceAccountName is "kiali" and
the container named "kiali" to harden it: add a podSecurityContext with fsGroup
(e.g., 1000) and runAsNonRoot true, and set the container securityContext to
runAsUser/runAsGroup (non-root UID/GID, e.g., 1000), runAsNonRoot: true,
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, drop all
capabilities, and set seccompProfile to RuntimeDefault to enforce OS-level
sandboxing.
Labs/10-Istio/manifests/addons/grafana.yaml (1)

72-100: ⚠️ Potential issue | 🟠 Major

Harden Grafana pod/container security context.

The deployment is still using default security context; this leaves the workload root-capable by default.

🔒 Proposed hardening patch
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: grafana
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            capabilities:
+              drop: ["ALL"]
           image: grafana/grafana:10.4.1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/grafana.yaml` around lines 72 - 100, The
Grafana Deployment leaves pods root-capable; add a hardened securityContext at
pod and container levels for the grafana container: in the pod spec
(spec.template.spec) set runAsNonRoot: true, runAsUser to a non-zero uid (e.g.
1000) and fsGroup to a non-zero gid; in the grafana container spec
(spec.template.spec.containers where name == "grafana") add securityContext with
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false,
capabilities.drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault" } (and
set runAsGroup/runAsUser here if you prefer container-level). Ensure these keys
are added alongside the existing env/resources entries so the pod cannot start
as root or escalate privileges.
🧹 Nitpick comments (5)
Labs/10-Istio/manifests/ingress.yaml (1)

95-116: Consider removing this Loki nginx Ingress if gateway routing is now canonical.

This PR shifts addon access to Istio gateway routes; keeping a parallel nginx Ingress path increases config drift and can create an unintended alternate exposure path.

♻️ If nginx ingress is no longer needed, remove this block
-# Loki Ingress
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: loki
-  namespace: istio-system
-  labels:
-    app.kubernetes.io/name: loki
-spec:
-  ingressClassName: nginx
-  rules:
-    - host: loki.local
-      http:
-        paths:
-          - path: /
-            pathType: Prefix
-            backend:
-              service:
-                name: loki
-                port:
-                  number: 3100
----
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/ingress.yaml` around lines 95 - 116, Remove the nginx
Ingress resource for Loki to avoid creating an alternate exposure path now that
Istio gateway routing is canonical: delete the Ingress manifest block for kind:
Ingress with metadata.name: loki (namespace: istio-system) that sets
ingressClassName: nginx and host: loki.local; if you must keep it temporarily,
add a clear comment above the resource explaining it's legacy and intended for
fallback only, but prefer removing the resource to prevent config drift with the
Istio Gateway/VirtualService routes that now serve Loki.
Labs/10-Istio/manifests/addons/prometheus.yaml (1)

137-139: Redundant annotation — only the label is needed for sidecar injection in modern Istio.

In modern Istio, setting sidecar.istio.io/inject: "true" as a label on the pod template spec triggers sidecar injection, whereas the same value as an annotation does not. The annotation is a legacy mechanism and is superseded by the label in Istio 1.10+. Keeping the annotation alongside the label is harmless for backward compatibility, but you may want to document this intent explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/prometheus.yaml` around lines 137 - 139, The
pod spec currently sets sidecar.istio.io/inject: "true" both as a label and as
an annotation; remove the redundant annotation entry (the annotations:
sidecar.istio.io/inject key) and keep only the pod template label
sidecar.istio.io/inject: "true" (or add a brief comment explaining it's
intentionally retained for legacy compatibility) so that only the label controls
Istio sidecar injection (refer to the sidecar.istio.io/inject key in the pod
template).
Labs/10-Istio/manifests/addons/loki.yaml (1)

265-265: Hardcoded checksum/config annotation will silently stale when the ConfigMap changes.

This annotation is typically generated at helm template/helm upgrade time. Since it's committed as a static value, any manual edit to the loki ConfigMap will not trigger a pod rollout. Either remove it or add a note that it must be updated when config.yaml changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/loki.yaml` at line 265, The committed static
checksum/config annotation will prevent pod rollouts when the referenced
ConfigMap changes; remove the hardcoded "checksum/config:
7beee9ce6dba8de1e670322d1f02ca6271acbff4e76efe12d443937c11f802f8" from the pod
template (or update it as part of your Helm pipeline) so that changes to the
Loki ConfigMap (config.yaml) trigger restarts; search for the literal annotation
key "checksum/config" in the Loki manifest and either delete that annotation
from the Deployment/StatefulSet podTemplate or replace it with a templated/value
that is updated when the ConfigMap changes.
Labs/10-Istio/scripts/01-install-istio.sh (2)

68-81: grep -i istio may inadvertently delete ClusterRoles owned by istio-base.

The condition release != "istiod" will delete any ClusterRole/ClusterRoleBinding containing istio in its name that isn't owned by the istiod Helm release — including those that might be owned by istio-base (release-name = "istio-base"). If istio-base creates any cluster-scoped roles in some versions, they'd be wiped here just before istiod install. Consider guarding with [ "$release" != "istiod" ] && [ "$release" != "istio-base" ].

🔧 Proposed fix
  for role in $(kubectl get clusterrole -o name 2>/dev/null | grep -i istio || true); do
    release=$(kubectl get "$role" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
-   if [ "$release" != "istiod" ]; then
+   if [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]; then
      print_info "Removing $role (not managed by Helm istiod)"
      kubectl delete "$role" --ignore-not-found 2>/dev/null || true
    fi
  done
  for binding in $(kubectl get clusterrolebinding -o name 2>/dev/null | grep -i istio || true); do
    release=$(kubectl get "$binding" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
-   if [ "$release" != "istiod" ]; then
+   if [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]; then
      print_info "Removing $binding (not managed by Helm istiod)"
      kubectl delete "$binding" --ignore-not-found 2>/dev/null || true
    fi
  done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 68 - 81, The cleanup
loops for ClusterRole/ClusterRoleBinding (variables role, binding and the
release extraction using kubectl jsonpath) currently delete any resource
containing "istio" unless its release equals "istiod"; update the conditional to
also skip resources owned by the "istio-base" release by ensuring the if-test
checks that release is neither "istiod" nor "istio-base" (i.e., change the
condition around release for both loops to require [ "$release" != "istiod" ] &&
[ "$release" != "istio-base" ] before calling kubectl delete), keeping the rest
of the logic (print_info and --ignore-not-found) unchanged.

23-48: --wait=false on CRD deletion followed by a fixed sleep 3 may not be sufficient.

CRDs with existing custom resources (from a prior install) go through garbage collection before being fully removed. Three seconds is an optimistic bound; the Helm install that immediately follows may race against an in-progress CRD deletion, resulting in "already exists" or "invalid ownership metadata" errors — the exact problem this cleanup aims to prevent.

Consider either:

  • Using --wait (drop --wait=false) and allowing Kubernetes to confirm removal before proceeding, or
  • Increasing the sleep and adding a kubectl wait poll for the CRD to disappear.
🔧 Proposed fix
  for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do
    release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
    if [ "$release" != "istio-base" ]; then
      print_info "Removing $crd (not managed by Helm istio-base)"
-     kubectl delete "$crd" --ignore-not-found --wait=false 2>/dev/null || true
+     kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
    fi
  done

- # Allow removal to complete before Helm install
- sleep 3
+ # Allow removal to complete before Helm install
+ sleep 5
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 23 - 48, The CRD
cleanup loop uses kubectl delete "$crd" --ignore-not-found --wait=false and then
a fixed sleep 3, which can race with Helm; change the CRD deletion to wait for
actual removal (e.g., drop --wait=false so kubectl waits, or keep delete
non-blocking but follow it with a kubectl wait --for=delete on the same resource
name) and remove or extend the fixed sleep; apply the same waiting logic after
the CRD loop (and optionally for the validating/mutating webhook deletions) so
the script only proceeds to Helm install once kubectl wait confirms the
resources are gone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 154-156: The cleanup step currently deletes addons and routes but
doesn't remove PVCs created by Loki's volumeClaimTemplates (e.g.,
storage-loki-0), leaving orphaned 10Gi volumes; after the existing kubectl
delete -f "${LAB_DIR}/manifests/addons/" command add an explicit PVC deletion
step that targets Loki's PVCs (for example by name "storage-loki-0" and/or by
label selector used by the Loki StatefulSet) or delete PVCs owned by the Loki
StatefulSet to ensure the persistent volume is removed after the StatefulSet is
gone; place this PVC deletion immediately after the addons delete and before the
print_success line so orphaned storage is cleaned up.

In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Around line 359-367: The anti-affinity currently uses
requiredDuringSchedulingIgnoredDuringExecution which can block scheduling on
single-node labs; change the pod anti-affinity to use
preferredDuringSchedulingIgnoredDuringExecution instead, replacing the required
block with a preferred entry (with a weight and a podAffinityTerm containing the
same labelSelector/matchLabels and topologyKey) so the scheduler treats it as a
soft preference; update the affinity stanza around affinity -> podAntiAffinity
and remove the requiredDuringSchedulingIgnoredDuringExecution key while reusing
the existing matchLabels (app.kubernetes.io/component,
app.kubernetes.io/instance, app.kubernetes.io/name) and topologyKey
kubernetes.io/hostname.
- Around line 120-153: The ClusterRole "loki-clusterrole" currently grants
cluster-wide read access to "secrets" which is over-privileged; modify the RBAC
so the rules for loki-clusterrole only include "configmaps" (remove "secrets")
or alternatively replace the ClusterRole/ClusterRoleBinding pair
(loki-clusterrole and loki-clusterrolebinding) with a namespaced Role and
RoleBinding bound to the loki ServiceAccount in istio-system so Loki cannot read
secrets cluster-wide.
- Around line 287-358: The loki and loki-sc-rules containers currently have no
resource limits (resources: {} or omitted), which can allow unbounded CPU/memory
use; add a resources block to both container specs (for container names "loki"
and "loki-sc-rules") including requests and limits for cpu and memory (e.g.,
requests: cpu 100m, memory 128Mi; limits: cpu 500m, memory 512Mi for loki and
smaller for the sidecar such as requests 50m/64Mi and limits 200m/256Mi),
ensuring the resources field is present under each container entry so the
cluster can enforce QoS and avoid OOM/CPU starvation.
- Around line 350-358: The loki-sc-rules sidecar (k8s-sidecar image) is marked
readOnlyRootFilesystem: true but lacks a writable /tmp mount; add a volumeMount
for mountPath "/tmp" to the loki-sc-rules container (e.g., name it "tmp") and
add a corresponding emptyDir volume entry in the Pod spec's volumes so the
sidecar has a writable /tmp; keep the existing sc-rules-volume mount and
securityContext unchanged except ensure the new /tmp mount provides write access
for temporary files.

In `@Labs/10-Istio/manifests/bookinfo-gateway.yaml`:
- Around line 27-29: The VirtualService host restriction 'bookinfo.local'
prevents matches for in-cluster requests sent to
istio-ingressgateway.istio-system.svc.cluster.local; update the VirtualService
(the hosts list in the VirtualService resource) to include the cluster-local
FQDN used by the traffic generator
(istio-ingressgateway.istio-system.svc.cluster.local) or replace the single host
with a broader match/wildcard so requests that don’t override the Host header
will also match the VirtualService.

---

Outside diff comments:
In `@Labs/10-Istio/manifests/addons/grafana.yaml`:
- Around line 72-100: The Grafana Deployment leaves pods root-capable; add a
hardened securityContext at pod and container levels for the grafana container:
in the pod spec (spec.template.spec) set runAsNonRoot: true, runAsUser to a
non-zero uid (e.g. 1000) and fsGroup to a non-zero gid; in the grafana container
spec (spec.template.spec.containers where name == "grafana") add securityContext
with readOnlyRootFilesystem: true, allowPrivilegeEscalation: false,
capabilities.drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault" } (and
set runAsGroup/runAsUser here if you prefer container-level). Ensure these keys
are added alongside the existing env/resources entries so the pod cannot start
as root or escalate privileges.

In `@Labs/10-Istio/manifests/addons/kiali.yaml`:
- Around line 144-175: The Kiali Deployment lacks explicit pod/container
securityContext allowing root execution; update the pod spec and container spec
where serviceAccountName is "kiali" and the container named "kiali" to harden
it: add a podSecurityContext with fsGroup (e.g., 1000) and runAsNonRoot true,
and set the container securityContext to runAsUser/runAsGroup (non-root UID/GID,
e.g., 1000), runAsNonRoot: true, readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, drop all capabilities, and set seccompProfile
to RuntimeDefault to enforce OS-level sandboxing.

---

Nitpick comments:
In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Line 265: The committed static checksum/config annotation will prevent pod
rollouts when the referenced ConfigMap changes; remove the hardcoded
"checksum/config:
7beee9ce6dba8de1e670322d1f02ca6271acbff4e76efe12d443937c11f802f8" from the pod
template (or update it as part of your Helm pipeline) so that changes to the
Loki ConfigMap (config.yaml) trigger restarts; search for the literal annotation
key "checksum/config" in the Loki manifest and either delete that annotation
from the Deployment/StatefulSet podTemplate or replace it with a templated/value
that is updated when the ConfigMap changes.

In `@Labs/10-Istio/manifests/addons/prometheus.yaml`:
- Around line 137-139: The pod spec currently sets sidecar.istio.io/inject:
"true" both as a label and as an annotation; remove the redundant annotation
entry (the annotations: sidecar.istio.io/inject key) and keep only the pod
template label sidecar.istio.io/inject: "true" (or add a brief comment
explaining it's intentionally retained for legacy compatibility) so that only
the label controls Istio sidecar injection (refer to the sidecar.istio.io/inject
key in the pod template).

In `@Labs/10-Istio/manifests/ingress.yaml`:
- Around line 95-116: Remove the nginx Ingress resource for Loki to avoid
creating an alternate exposure path now that Istio gateway routing is canonical:
delete the Ingress manifest block for kind: Ingress with metadata.name: loki
(namespace: istio-system) that sets ingressClassName: nginx and host:
loki.local; if you must keep it temporarily, add a clear comment above the
resource explaining it's legacy and intended for fallback only, but prefer
removing the resource to prevent config drift with the Istio
Gateway/VirtualService routes that now serve Loki.

In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 68-81: The cleanup loops for ClusterRole/ClusterRoleBinding
(variables role, binding and the release extraction using kubectl jsonpath)
currently delete any resource containing "istio" unless its release equals
"istiod"; update the conditional to also skip resources owned by the
"istio-base" release by ensuring the if-test checks that release is neither
"istiod" nor "istio-base" (i.e., change the condition around release for both
loops to require [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]
before calling kubectl delete), keeping the rest of the logic (print_info and
--ignore-not-found) unchanged.
- Around line 23-48: The CRD cleanup loop uses kubectl delete "$crd"
--ignore-not-found --wait=false and then a fixed sleep 3, which can race with
Helm; change the CRD deletion to wait for actual removal (e.g., drop
--wait=false so kubectl waits, or keep delete non-blocking but follow it with a
kubectl wait --for=delete on the same resource name) and remove or extend the
fixed sleep; apply the same waiting logic after the CRD loop (and optionally for
the validating/mutating webhook deletions) so the script only proceeds to Helm
install once kubectl wait confirms the resources are gone.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c33a2e2 and 7b9783b.

📒 Files selected for processing (14)
  • Labs/10-Istio/demo.sh
  • Labs/10-Istio/manifests/addons/grafana.yaml
  • Labs/10-Istio/manifests/addons/jaeger.yaml
  • Labs/10-Istio/manifests/addons/kiali.yaml
  • Labs/10-Istio/manifests/addons/loki.yaml
  • Labs/10-Istio/manifests/addons/prometheus.yaml
  • Labs/10-Istio/manifests/bookinfo-gateway.yaml
  • Labs/10-Istio/manifests/ingress.yaml
  • Labs/10-Istio/manifests/observability-routes.yaml
  • Labs/10-Istio/manifests/traffic-generator.yaml
  • Labs/10-Istio/scripts/01-install-istio.sh
  • Labs/10-Istio/scripts/02-install-addons.sh
  • Labs/10-Istio/scripts/03-deploy-bookinfo.sh
  • Labs/10-Istio/scripts/04-traffic-generator.sh

Comment thread Labs/10-Istio/demo.sh
Comment thread Labs/10-Istio/manifests/addons/loki.yaml
Comment thread Labs/10-Istio/manifests/addons/loki.yaml
Comment thread Labs/10-Istio/manifests/addons/loki.yaml
Comment thread Labs/10-Istio/manifests/addons/loki.yaml Outdated
Comment thread Labs/10-Istio/manifests/bookinfo-gateway.yaml
Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
…deploy/cleanup processes.

Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Labs/10-Istio/manifests/ingress.yaml (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Update the header comment to include loki.local.

The comment lists the hosts but omits the newly added loki.local.

📝 Proposed fix
-# Hosts: kiali.local, grafana.local, jaeger.local, prometheus.local, bookinfo.local
+# Hosts: kiali.local, grafana.local, jaeger.local, prometheus.local, loki.local, bookinfo.local
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/ingress.yaml` at line 4, Update the header comment
that lists hosts to include the newly added host by appending "loki.local" to
the existing comma-separated list (the comment line starting with "# Hosts:
kiali.local, grafana.local, jaeger.local, prometheus.local, bookinfo.local");
ensure formatting matches the existing comment (comma and space separation).
Labs/10-Istio/demo.sh (1)

184-184: ⚠️ Potential issue | 🟠 Major

set -u will abort with "unbound variable" when no argument is given.

Running ./demo.sh without arguments hits ${1} while nounset is active, producing a cryptic bash error instead of the friendly usage text on line 192.

Proposed fix
-case "${1}" in
+case "${1:-}" in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` at line 184, The script fails under set -u because
case "${1}" expands an unset positional and aborts; change the case to use a
safe default (e.g. case "${1:-}" in) or explicitly check $# -eq 0 before the
case and print the existing usage string, so the demo.sh entry point (the case
"${1}" in) won't trigger an "unbound variable" when no argument is provided.
♻️ Duplicate comments (8)
Labs/10-Istio/manifests/addons/loki.yaml (4)

120-153: ClusterRole grants cluster-wide secrets read access — already flagged.

The secrets resource in the ClusterRole is still present per the previous review. If secrets are removed from the ClusterRole, also update the loki-sc-rules container env var RESOURCE from "both" to "configmap" (line 343) to stay consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 120 - 153, Remove
"secrets" from the ClusterRole resources list in the ClusterRole resource named
"loki-clusterrole" (keep only "configmaps") and then update the loki-sc-rules
container environment variable RESOURCE from "both" to "configmap" so
permissions and the container config remain consistent; locate the ClusterRole
block with metadata.name: loki-clusterrole and the container definition for
loki-sc-rules that defines env var RESOURCE and change the values accordingly.

287-331: Missing resource limits for both containers — already flagged.

loki has resources: {} (line 330-331) and loki-sc-rules omits the field entirely. See previous review for suggested bounds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 287 - 331, The loki
and loki-sc-rules containers are missing resource constraints: update the
container specs (container name "loki" and container name "loki-sc-rules") to
include appropriate resources.requests and resources.limits blocks (e.g. CPU and
memory request/limit values consistent with the previous review bounds) instead
of resources: {} or omitting the field; add the same or adjusted values to both
containers' spec.volumeMounts sections so Kubernetes can schedule and limit them
reliably.

359-367: Hard anti-affinity blocks single-node lab clusters — already flagged.

requiredDuringSchedulingIgnoredDuringExecution is still in place. See previous review for the preferredDuringScheduling alternative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 359 - 367, The
affinity block uses hard pod anti-affinity
(requiredDuringSchedulingIgnoredDuringExecution), which prevents scheduling on
single-node labs; replace requiredDuringSchedulingIgnoredDuringExecution under
affinity.podAntiAffinity with preferredDuringSchedulingIgnoredDuringExecution
and convert the existing list item into a WeightedPodAffinityTerm (set weight
(e.g., 100) and move the current podAffinityTerm contents — labelSelector with
matchLabels and topologyKey — under podAffinityTerm) so anti-affinity becomes a
preference rather than a hard requirement (update the affinity/podAntiAffinity
structure where requiredDuringSchedulingIgnoredDuringExecution is declared).

350-358: loki-sc-rules sidecar missing /tmp mount — already flagged.

The sidecar has readOnlyRootFilesystem: true but no writable /tmp emptyDir, unlike the main loki container. See previous review for the fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 350 - 358, The sidecar
container "loki-sc-rules" is marked readOnlyRootFilesystem: true but lacks a
writable /tmp mount; add an emptyDir volume (e.g. name: sc-rules-tmp) to the Pod
spec's volumes and add a volumeMount in the loki-sc-rules container mounting
that volume at /tmp (mountPath: "/tmp") so the sidecar gets a writable tmpfs
while keeping other mounts read-only; ensure the new volume name is referenced
exactly in the loki-sc-rules volumeMount entry.
Labs/10-Istio/manifests/addons/grafana.yaml (1)

70-74: Sidecar injection annotation correctly enabled; same inert label note as Jaeger.

The annotation change from "false" to "true" is correct. The sidecar.istio.io/inject label on line 72 is the same pattern noted in the Jaeger review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/grafana.yaml` around lines 70 - 74, There is a
duplicate Istio sidecar injection marker — remove the sidecar.istio.io/inject
entry from the metadata.labels block (the labels: app: grafana /
sidecar.istio.io/inject) and keep the sidecar.istio.io/inject annotation in
annotations: so only the annotation controls injection; update the labels block
to only include app: grafana and ensure no duplicate label key remains.
Labs/10-Istio/manifests/addons/prometheus.yaml (1)

135-139: Same sidecar label/annotation pattern — annotation correctly enables injection.

Same note as the Jaeger review regarding the inert sidecar.istio.io/inject label.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/prometheus.yaml` around lines 135 - 139, The
labels block contains an inert sidecar.istio.io/inject label that should be
removed—keep only the annotation sidecar.istio.io/inject: "true" (in the
annotations block) to enable Istio sidecar injection; update the manifest by
deleting the labels entry sidecar.istio.io/inject and leaving app: prometheus
unchanged so only the annotation controls injection.
Labs/10-Istio/manifests/addons/kiali.yaml (1)

142-146: Same sidecar label/annotation pattern — annotation correctly enables injection.

Same note as the Jaeger review regarding the inert sidecar.istio.io/inject label.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/kiali.yaml` around lines 142 - 146, The
manifest sets sidecar.istio.io/inject as both a label and an annotation under
the kiali resource; the label is inert and should be removed to avoid
confusion—keep the annotation that actually enables injection. Edit the labels
block (where app: kiali appears) to remove the sidecar.istio.io/inject label and
ensure sidecar.istio.io/inject: "true" remains only under the annotations block
so injection behavior is controlled solely by the annotation.
Labs/10-Istio/demo.sh (1)

154-156: Loki PVCs from volumeClaimTemplates are still not cleaned up.

As noted in a prior review, kubectl delete -f addons/ removes the StatefulSet but leaves its PVCs (e.g., storage-loki-0) orphaned. This concern is still unaddressed.

Proposed fix — delete Loki PVCs after StatefulSet removal
  kubectl delete -f "${LAB_DIR}/manifests/addons/" 2>/dev/null || true
+ kubectl delete pvc -l app.kubernetes.io/name=loki -n istio-system 2>/dev/null || true
  print_success "Addons and gateway routes removed"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` around lines 154 - 156, After removing the addons
StatefulSet with kubectl delete -f "${LAB_DIR}/manifests/addons/", also
explicitly delete the orphaned Loki PVCs (e.g., storage-loki-0) before printing
success: run a kubectl delete pvc command that targets the Loki PVCs (by name
pattern or by label selector used by the Loki StatefulSet) in the same
namespace, ensure the command tolerates no-matches (suppress errors or use ||
true), then call print_success "Addons and gateway routes removed" as before;
update the block that currently contains kubectl delete -f
"${LAB_DIR}/manifests/addons/" and print_success to include this additional PVC
cleanup step.
🧹 Nitpick comments (3)
Labs/10-Istio/manifests/addons/jaeger.yaml (1)

77-81: sidecar.istio.io/inject is correctly set as a label — this is the canonical Istio control mechanism.

The label on line 79 is not inert; per official Istio documentation, sidecar.istio.io/inject as a label on the pod template is the reliable and explicit way to control sidecar injection at the pod level. The annotation on line 81 is redundant. Both work, but having both is unnecessary — keep the label and consider removing the annotation for clarity.

🧹 Remove redundant annotation
      labels:
        app: jaeger
        sidecar.istio.io/inject: "true"
      annotations:
-        sidecar.istio.io/inject: "true"
        prometheus.io/scrape: "true"
        prometheus.io/port: "14269"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/manifests/addons/jaeger.yaml` around lines 77 - 81, The pod
template currently sets sidecar.istio.io/inject both as a label and as an
annotation; keep the canonical label (labels: sidecar.istio.io/inject: "true")
and remove the redundant annotations block entry (annotations:
sidecar.istio.io/inject: "true") so only the label remains in the Jaeger
deployment/podTemplate (the entries around labels: app: jaeger and annotations:
sidecar.istio.io/inject).
Labs/10-Istio/demo.sh (2)

176-177: Consider --ignore-not-found to suppress noisy "not found" errors.

When some of these resources exist but others don't, kubectl delete emits error messages to stderr before || true swallows the exit code. Adding --ignore-not-found keeps the output clean and is consistent with the CRD cleanup loop on line 172.

Suggested diff
-  kubectl delete clusterrole istio-prometheus kiali loki-clusterrole 2>/dev/null || true
-  kubectl delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding 2>/dev/null || true
+  kubectl delete clusterrole istio-prometheus kiali loki-clusterrole --ignore-not-found 2>/dev/null || true
+  kubectl delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding --ignore-not-found 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` around lines 176 - 177, Replace the noisy kubectl
delete invocations that pipe stderr to /dev/null with versions that use the
--ignore-not-found flag: update the two commands that delete clusterroles
("kubectl delete clusterrole istio-prometheus kiali loki-clusterrole") and
clusterrolebindings ("kubectl delete clusterrolebinding istio-prometheus kiali
loki-clusterrolebinding") to include --ignore-not-found and remove the stderr
redirection/|| true so missed resources do not print errors.

149-150: Minor: PeerAuthentication cleanup runs after the bookinfo namespace is already deleted.

Lines 141–144 delete the bookinfo namespace (which removes all namespaced resources within it), so the kubectl delete peerauthentication --all -n bookinfo on line 149 is effectively a no-op. Consider moving these lines before the namespace deletion, or removing line 149 entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` around lines 149 - 150, The PeerAuthentication cleanup
for the bookinfo namespace is redundant because the script deletes the bookinfo
namespace earlier; either remove the redundant "kubectl delete
peerauthentication --all -n bookinfo" line or move that "kubectl delete
peerauthentication --all -n bookinfo" invocation to run before the "kubectl
delete namespace bookinfo" call so the namespaced resource is actually deleted;
keep the "kubectl delete peerauthentication --all -n istio-system" line as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 59-62: The assignments to GATEWAY_IP use command substitution with
kubectl which can return non-zero and abort the script under set -e; update both
kubectl calls that reference istio-ingressgateway (the jsonpath for
.status.loadBalancer.ingress[0].ip and the hostname fallback) to guard the
command substitution from propagating failure by appending || true to each
kubectl command so a missing/unready service won’t terminate the script and
GATEWAY_IP will simply be empty if not found.

---

Outside diff comments:
In `@Labs/10-Istio/demo.sh`:
- Line 184: The script fails under set -u because case "${1}" expands an unset
positional and aborts; change the case to use a safe default (e.g. case "${1:-}"
in) or explicitly check $# -eq 0 before the case and print the existing usage
string, so the demo.sh entry point (the case "${1}" in) won't trigger an
"unbound variable" when no argument is provided.

In `@Labs/10-Istio/manifests/ingress.yaml`:
- Line 4: Update the header comment that lists hosts to include the newly added
host by appending "loki.local" to the existing comma-separated list (the comment
line starting with "# Hosts: kiali.local, grafana.local, jaeger.local,
prometheus.local, bookinfo.local"); ensure formatting matches the existing
comment (comma and space separation).

---

Duplicate comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 154-156: After removing the addons StatefulSet with kubectl delete
-f "${LAB_DIR}/manifests/addons/", also explicitly delete the orphaned Loki PVCs
(e.g., storage-loki-0) before printing success: run a kubectl delete pvc command
that targets the Loki PVCs (by name pattern or by label selector used by the
Loki StatefulSet) in the same namespace, ensure the command tolerates no-matches
(suppress errors or use || true), then call print_success "Addons and gateway
routes removed" as before; update the block that currently contains kubectl
delete -f "${LAB_DIR}/manifests/addons/" and print_success to include this
additional PVC cleanup step.

In `@Labs/10-Istio/manifests/addons/grafana.yaml`:
- Around line 70-74: There is a duplicate Istio sidecar injection marker —
remove the sidecar.istio.io/inject entry from the metadata.labels block (the
labels: app: grafana / sidecar.istio.io/inject) and keep the
sidecar.istio.io/inject annotation in annotations: so only the annotation
controls injection; update the labels block to only include app: grafana and
ensure no duplicate label key remains.

In `@Labs/10-Istio/manifests/addons/kiali.yaml`:
- Around line 142-146: The manifest sets sidecar.istio.io/inject as both a label
and an annotation under the kiali resource; the label is inert and should be
removed to avoid confusion—keep the annotation that actually enables injection.
Edit the labels block (where app: kiali appears) to remove the
sidecar.istio.io/inject label and ensure sidecar.istio.io/inject: "true" remains
only under the annotations block so injection behavior is controlled solely by
the annotation.

In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Around line 120-153: Remove "secrets" from the ClusterRole resources list in
the ClusterRole resource named "loki-clusterrole" (keep only "configmaps") and
then update the loki-sc-rules container environment variable RESOURCE from
"both" to "configmap" so permissions and the container config remain consistent;
locate the ClusterRole block with metadata.name: loki-clusterrole and the
container definition for loki-sc-rules that defines env var RESOURCE and change
the values accordingly.
- Around line 287-331: The loki and loki-sc-rules containers are missing
resource constraints: update the container specs (container name "loki" and
container name "loki-sc-rules") to include appropriate resources.requests and
resources.limits blocks (e.g. CPU and memory request/limit values consistent
with the previous review bounds) instead of resources: {} or omitting the field;
add the same or adjusted values to both containers' spec.volumeMounts sections
so Kubernetes can schedule and limit them reliably.
- Around line 359-367: The affinity block uses hard pod anti-affinity
(requiredDuringSchedulingIgnoredDuringExecution), which prevents scheduling on
single-node labs; replace requiredDuringSchedulingIgnoredDuringExecution under
affinity.podAntiAffinity with preferredDuringSchedulingIgnoredDuringExecution
and convert the existing list item into a WeightedPodAffinityTerm (set weight
(e.g., 100) and move the current podAffinityTerm contents — labelSelector with
matchLabels and topologyKey — under podAffinityTerm) so anti-affinity becomes a
preference rather than a hard requirement (update the affinity/podAntiAffinity
structure where requiredDuringSchedulingIgnoredDuringExecution is declared).
- Around line 350-358: The sidecar container "loki-sc-rules" is marked
readOnlyRootFilesystem: true but lacks a writable /tmp mount; add an emptyDir
volume (e.g. name: sc-rules-tmp) to the Pod spec's volumes and add a volumeMount
in the loki-sc-rules container mounting that volume at /tmp (mountPath: "/tmp")
so the sidecar gets a writable tmpfs while keeping other mounts read-only;
ensure the new volume name is referenced exactly in the loki-sc-rules
volumeMount entry.

In `@Labs/10-Istio/manifests/addons/prometheus.yaml`:
- Around line 135-139: The labels block contains an inert
sidecar.istio.io/inject label that should be removed—keep only the annotation
sidecar.istio.io/inject: "true" (in the annotations block) to enable Istio
sidecar injection; update the manifest by deleting the labels entry
sidecar.istio.io/inject and leaving app: prometheus unchanged so only the
annotation controls injection.

---

Nitpick comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 176-177: Replace the noisy kubectl delete invocations that pipe
stderr to /dev/null with versions that use the --ignore-not-found flag: update
the two commands that delete clusterroles ("kubectl delete clusterrole
istio-prometheus kiali loki-clusterrole") and clusterrolebindings ("kubectl
delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding") to
include --ignore-not-found and remove the stderr redirection/|| true so missed
resources do not print errors.
- Around line 149-150: The PeerAuthentication cleanup for the bookinfo namespace
is redundant because the script deletes the bookinfo namespace earlier; either
remove the redundant "kubectl delete peerauthentication --all -n bookinfo" line
or move that "kubectl delete peerauthentication --all -n bookinfo" invocation to
run before the "kubectl delete namespace bookinfo" call so the namespaced
resource is actually deleted; keep the "kubectl delete peerauthentication --all
-n istio-system" line as-is.

In `@Labs/10-Istio/manifests/addons/jaeger.yaml`:
- Around line 77-81: The pod template currently sets sidecar.istio.io/inject
both as a label and as an annotation; keep the canonical label (labels:
sidecar.istio.io/inject: "true") and remove the redundant annotations block
entry (annotations: sidecar.istio.io/inject: "true") so only the label remains
in the Jaeger deployment/podTemplate (the entries around labels: app: jaeger and
annotations: sidecar.istio.io/inject).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b9783b and ac2eeaa.

📒 Files selected for processing (14)
  • Labs/10-Istio/demo.sh
  • Labs/10-Istio/manifests/addons/grafana.yaml
  • Labs/10-Istio/manifests/addons/jaeger.yaml
  • Labs/10-Istio/manifests/addons/kiali.yaml
  • Labs/10-Istio/manifests/addons/loki.yaml
  • Labs/10-Istio/manifests/addons/prometheus.yaml
  • Labs/10-Istio/manifests/bookinfo-gateway.yaml
  • Labs/10-Istio/manifests/ingress.yaml
  • Labs/10-Istio/manifests/observability-routes.yaml
  • Labs/10-Istio/manifests/traffic-generator.yaml
  • Labs/10-Istio/scripts/01-install-istio.sh
  • Labs/10-Istio/scripts/02-install-addons.sh
  • Labs/10-Istio/scripts/03-deploy-bookinfo.sh
  • Labs/10-Istio/scripts/04-traffic-generator.sh
🚧 Files skipped from review as they are similar to previous changes (5)
  • Labs/10-Istio/scripts/02-install-addons.sh
  • Labs/10-Istio/scripts/01-install-istio.sh
  • Labs/10-Istio/manifests/observability-routes.yaml
  • Labs/10-Istio/manifests/traffic-generator.yaml
  • Labs/10-Istio/manifests/bookinfo-gateway.yaml

Comment thread Labs/10-Istio/demo.sh Outdated
@talorlik talorlik changed the title Corrected deploy and cleanup plus addons lab10: Istio sidecar injection, Loki observability addon, and deploy/cleanup refactor Feb 25, 2026
… updated documentation.

Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh (3)

18-22: ⚠️ Potential issue | 🟡 Minor

lsof-absent fallback silently allows duplicate port-forwards.

When lsof is unavailable, is_port_listening always returns 1 ("not listening"), so start_port_forward unconditionally launches a new kubectl port-forward. Because the process is backgrounded, set -e never fires — the conflict silently surfaces only in the per-session log file under $RUNTIME_DIR, with no warning to the user.

Consider falling back to ss or netstat before giving up, or at least log a warning when lsof is absent so the user knows the guard is inactive:

🛡️ Proposed fix – warn when guard is disabled
 is_port_listening() {
   local port="$1"
   if command -v lsof >/dev/null 2>&1; then
     lsof -nP -iTCP:"${port}" -sTCP:LISTEN >/dev/null 2>&1
+  elif command -v ss >/dev/null 2>&1; then
+    ss -tlnH "sport = :${port}" | grep -q .
   else
+    echo "Warning: neither lsof nor ss found; cannot check port ${port}." >&2
     return 1
   fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 18 - 22,
The is_port_listening guard currently treats missing lsof as "not listening",
causing start_port_forward to launch duplicate kubectl port-forwards; update
is_port_listening to try ss and then netstat when lsof is absent (checking for
their commands and using appropriate flags to test LISTEN on "${port}"), and if
none are present log a clear warning/error stating the port-check guard is
disabled so the user knows duplicate port-forwards may occur; ensure
start_port_forward still respects the boolean result from is_port_listening.

111-112: ⚠️ Potential issue | 🟡 Minor

touch does not truncate — stale PIDs accumulate across script runs.

touch "$PF_PID_FILE" creates the file if absent but leaves existing content intact. On a second run, old (possibly recycled) PIDs pile up, and xargs kill < $PF_PID_FILE on line 131 may then kill unrelated processes.

Replace touch with a truncating open (or >) to start fresh each run:

🐛 Proposed fix
-touch "$PF_PID_FILE"
+>"$PF_PID_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 111 -
112, The script currently uses touch "$PF_PID_FILE" which preserves old PIDs;
change this to truncate/create PF_PID_FILE before starting port forwards so
stale PIDs don't accumulate and get killed by xargs kill < $PF_PID_FILE; update
the block around the touch and start_port_forward calls (referencing PF_PID_FILE
and start_port_forward) to ensure PF_PID_FILE is emptied/created fresh at
runtime (use a truncating open/redirect) before any PIDs are appended.

138-138: ⚠️ Potential issue | 🟡 Minor

Cleanup command uses bare relative paths — breaks when run from any other directory.

kubectl delete -f manifests/... relies on the user's CWD being $WORK_DIR. Printing absolute paths makes the hint copy-paste safe regardless of where the user's shell is.

🐛 Proposed fix
-echo "Cleanup resources: kubectl delete -f manifests/02-istio-routing.yaml; kubectl delete -f manifests/01-app.yaml"
+echo "Cleanup resources: kubectl delete -f ${WORK_DIR}/manifests/02-istio-routing.yaml; kubectl delete -f ${WORK_DIR}/manifests/01-app.yaml"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` at line 138, The
cleanup echo currently prints relative paths ("kubectl delete -f manifests/...")
which fails if the user isn't in the script's working dir; update the echo in
demo.sh that prints the cleanup command so it uses absolute paths (e.g. expand
to "${WORK_DIR}/manifests/02‑istio‑routing.yaml" and
"${WORK_DIR}/manifests/01‑app.yaml" or compute the script dir via a variable and
use that) instead of bare "manifests/..." so the printed hint is copy‑paste safe
from any CWD.
Labs/10-Istio/demo/01-basic-setup/demo.sh (5)

90-97: ⚠️ Potential issue | 🟡 Minor

rm -f "$PF_PID_FILE" is missing the || true guard (inconsistent with the rest of the file).

Every other rm call in the script appends >/dev/null 2>&1 || true. Under set -euo pipefail, a failure here (e.g., permission issue) would abort the script.

🐛 Proposed fix
-    rm -f "$PF_PID_FILE"
+    rm -f "$PF_PID_FILE" >/dev/null 2>&1 || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 90 - 97, The rm
invocation that removes PF_PID_FILE inside the conditional block can fail under
set -euo pipefail and currently lacks the same defensive redirect-and-|| true
pattern used elsewhere; update the rm call that references PF_PID_FILE in the
block (after the while loop that reads and kills PIDs) to append >/dev/null 2>&1
|| true so the removal failure won't abort the script while keeping behavior
consistent with other rm calls.

342-345: ⚠️ Potential issue | 🟡 Minor

Log file paths in the error hints are wrong — users won't be able to find them.

start_port_forward names its log files $RUNTIME_DIR/port-forward-<ns>-<svc>-<port>.log (no dot prefix, stored under $RUNTIME_DIR). The hardcoded strings in the inline hints omit $RUNTIME_DIR and add a spurious leading dot, making the hint misleading.

🐛 Proposed fix
-  curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check .port-forward-istio-system-kiali-20001.log)"
-  curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check .port-forward-istio-system-istio-ingressgateway-8080.log)"
+  curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check $RUNTIME_DIR/port-forward-istio-system-kiali-20001.log)"
+  curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check $RUNTIME_DIR/port-forward-istio-system-istio-ingressgateway-8080.log)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 342 - 345, The inline
hints printed by the echo statements incorrectly reference log files as
".port-forward-...log" and omit the RUNTIME_DIR where start_port_forward
actually writes logs; update the two echo messages in demo.sh (the lines that
output "Kiali: not reachable yet (...)" and "Bookinfo: not reachable yet (...)")
to reference the correct runtime path by including $RUNTIME_DIR and removing the
leading dot so they match the naming used by start_port_forward
(port-forward-<ns>-<svc>-<port>.log).

228-231: ⚠️ Potential issue | 🟠 Major

Injecting Istio into ALL non-system namespaces is too broad and destructive on shared clusters.

This loop labels every namespace that isn't kube-system, kube-public, kube-node-lease, or istio-system. On any shared or pre-existing cluster this silently schedules sidecar injection for all workloads in all user namespaces (including unrelated ones) on their next restart — and the labels are not removed by the cleanup function.

Only the namespaces that actually need injection (default for Bookinfo, demo for the demo workloads) should be labelled.

🐛 Proposed targeted approach
-for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
-  case "$ns" in kube-system | kube-public | kube-node-lease | istio-system) continue ;; esac
-  kubectl label namespace "$ns" istio-injection=enabled --overwrite
-done
+for ns in default demo; do
+  if kubectl get ns "$ns" >/dev/null 2>&1; then
+    kubectl label namespace "$ns" istio-injection=enabled --overwrite
+  fi
+done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 228 - 231, Replace
the broad namespace loop that labels every non-system namespace (the for ...
kubectl label namespace "$ns" istio-injection=enabled --overwrite block) with
explicit labeling of only the namespaces needed for the demo (e.g., run kubectl
label namespace default istio-injection=enabled --overwrite and kubectl label
namespace demo istio-injection=enabled --overwrite) and update the cleanup logic
to remove those labels (use kubectl label namespace <name> istio-injection- to
unset) so you don’t affect unrelated user namespaces.

323-323: ⚠️ Potential issue | 🟡 Minor

Cleanup hint omits reversing istio-injection labels.

The istioctl uninstall --purge + namespace deletion hint is correct, but the istio-injection=enabled labels added to default (and potentially other namespaces when the broader loop is used) are not cleaned up. Pods in the default namespace will continue to have sidecars injected after Istio is gone, causing CrashLoopBackOff errors until the label is removed.

Consider adding to the cleanup hint:

kubectl label namespace default istio-injection-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 323, The cleanup hint in
demo.sh currently suggests only running istioctl uninstall --purge -y and
kubectl delete ns istio-system demo but omits removing the istio-injection
label(s); update the echo string to instruct removing the label (e.g., kubectl
label namespace default istio-injection-) and mention removing the label from
any other namespaces you labeled in the loop so pods stop receiving sidecars
after Istio is uninstalled.

265-294: ⚠️ Potential issue | 🟡 Minor

Step numbers are non-sequential — steps 10, 12, and 13 are missing from the comments.

The numbering jumps from Step 09 → Step 11 → Step 14. This likely reflects removed steps whose numbers were never renumbered. Keeping sequential step comments makes it easier to cross-reference with lab documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 265 - 294, The step
comments in demo.sh are out of sequence (e.g., "Step 11: Creating demo
VirtualService for reviews service...", then "Step 14: Creating demo
namespace...", "Step 15: Deploying Nginx...", "Step 16: Deploying HTTPD...",
"Step 17: Verifying demo deployment..."); update these echo step numbers to be
sequential to match the lab flow (renumber them to fill the missing 10,12,13 as
appropriate) so the messages that reference reviews-vs.yaml,
demo-namespace.yaml, nginx-demo.yaml, httpd-demo.yaml and the final verification
reflect a continuous sequence; ensure all "Step X:" prefixes in the apply/wait
blocks are renumbered consistently and any cross-references in the script or
docs are adjusted to the new numbering.
Labs/10-Istio/scripts/05-verify.sh (1)

19-19: ⚠️ Potential issue | 🔴 Critical

Guard kubectl command substitutions to prevent premature script exit.

The script uses set -e (line 2), and multiple kubectl command substitutions lack guards. If any of these commands fail (e.g., cluster inaccessible, permission denied), the script will terminate before reaching the pass/fail summary (lines 94–98), preventing users from seeing verification results.

Apply || true to the following lines to ensure the script always completes:

  • Line 19: ISTIOD_STATUS=$(kubectl get pods ... 2>/dev/null || true)
  • Line 28: GW_STATUS=$(kubectl get pods ... 2>/dev/null || true)
  • Line 33: GW_STATUS=$(kubectl get pods ... 2>/dev/null || true)
  • Line 45: STATUS=$(kubectl get pods ... 2>/dev/null || true) (in loop)
  • Line 57: STATUS=$(kubectl get pods ... 2>/dev/null || true) (in loop)
  • Line 60: CONTAINERS=$(kubectl get pods ... 2>/dev/null || true) (also unguarded)
  • Line 74: CRONJOB=$(kubectl get cronjob ... 2>/dev/null || true)
  • Line 83: PROM_POD=$(kubectl get pods ... 2>/dev/null || true)
  • Line 85: The pipeline with kubectl exec also needs guarding: METRIC_COUNT=$(kubectl exec ... 2>/dev/null || true | ...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/05-verify.sh` at line 19, Several kubectl command
substitutions (e.g., ISTIOD_STATUS, GW_STATUS, STATUS, CONTAINERS, CRONJOB,
PROM_POD, METRIC_COUNT) are unguarded and will cause the script to exit under
set -e if kubectl fails; update each substitution to append a fallback "|| true"
so the command returns success on failure (for example change the assignment for
ISTIOD_STATUS, both GW_STATUS occurrences, the loop STATUS assignments,
CONTAINERS, CRONJOB, PROM_POD, and the METRIC_COUNT pipeline involving kubectl
exec) to ensure the script continues to the final pass/fail summary even when
kubectl errors occur.
Labs/10-Istio/monitor.sh (3)

119-130: ⚠️ Potential issue | 🟠 Major

Loki is missing from addon health/test/summary coverage (Line 119).

PR objective includes Loki observability, but addon checks and status views only track Prometheus/Grafana/Jaeger/Kiali. This leaves a monitoring blind spot.

💡 Proposed fix
-  for addon in prometheus grafana jaeger kiali; do
+  for addon in prometheus grafana jaeger kiali loki; do
@@
-  kubectl get svc -n istio-system -l 'app in (prometheus,grafana,kiali,jaeger)' 2>/dev/null ||
-    kubectl get svc -n istio-system 2>/dev/null | grep -E '(NAME|prometheus|grafana|kiali|tracing|jaeger|zipkin)'
+  kubectl get svc -n istio-system -l 'app in (prometheus,grafana,kiali,jaeger,loki)' 2>/dev/null ||
+    kubectl get svc -n istio-system 2>/dev/null | grep -E '(NAME|prometheus|grafana|kiali|tracing|jaeger|zipkin|loki)'
@@
-  for addon in prometheus grafana kiali jaeger; do
+  for addon in prometheus grafana kiali jaeger loki; do
@@
-  for addon in kiali prometheus grafana jaeger; do
+  for addon in kiali prometheus grafana jaeger loki; do

Also applies to: 261-270, 324-327

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 119 - 130, The addon health and
service-summary blocks miss Loki; update the monitoring checks in monitor.sh to
include "loki" alongside prometheus,grafana,jaeger,kiali: add "loki" to the for
loop that sets STATUS (used with kubectl get pods and variables
STATUS/print_info/print_error), add "loki" to the kubectl service label selector
('app in (...)') and to the fallback grep pattern (the grep -E list), and apply
the same change to the other similar blocks referenced (the repeated kubectl get
svc / grep sections). Ensure all occurrences of the app list and grep patterns
consistently include "loki" so pod status, svc listing, and summary output cover
Loki.

373-393: ⚠️ Potential issue | 🔴 Critical

Handle missing CLI args safely under strict mode (Lines 373–390).

With set -u enabled, the script exits with an unbound variable error when called without arguments, preventing interactive mode from running. The $1 parameter is accessed directly in three if-conditions without checking if it's set.

Proposed fix
 main() {
   clear
   print_header "ISTIO + KIALI LAB MONITORING"
   check_prerequisites
+  local mode="${1:-}"

-  if [ "$1" = "full" ] || [ "$1" = "-f" ] || [ "$1" = "--full" ]; then
+  if [ "$mode" = "full" ] || [ "$mode" = "-f" ] || [ "$mode" = "--full" ]; then
     show_summary
     check_istio
     check_pods
@@
-  if [ "$1" = "test" ] || [ "$1" = "-t" ] || [ "$1" = "--test" ]; then
+  if [ "$mode" = "test" ] || [ "$mode" = "-t" ] || [ "$mode" = "--test" ]; then
     test_pipeline
     exit 0
   fi

-  if [ "$1" = "summary" ] || [ "$1" = "-s" ] || [ "$1" = "--summary" ]; then
+  if [ "$mode" = "summary" ] || [ "$mode" = "-s" ] || [ "$mode" = "--summary" ]; then
     show_summary
     exit 0
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 373 - 393, The script accesses
positional parameter $1 directly in the three conditional blocks (the branches
that call show_summary, test_pipeline, etc.), which breaks under set -u when no
args are provided; update those conditionals to safely handle an absent argument
by using a default expansion (e.g., ${1:-}) or by first checking $# -eq 0 for
interactive behavior, and then compare that safe value against
"full"/"-f"/"--full", "test"/"-t"/"--test", and "summary"/"-s"/"--summary" so
show_summary, check_istio, check_pods, check_addons, check_bookinfo,
check_traffic, check_mtls, and test_pipeline are only invoked when a valid
argument is present.

238-275: ⚠️ Potential issue | 🔴 Critical

Add || true to all grep -c pipelines to prevent script termination with pipefail.

The script uses set -euo pipefail (line 2), which causes any command returning exit code 1 to terminate execution. When grep -c finds zero matches, it returns exit code 1, causing the pipeline to fail and the script to exit before your pass/fail logic runs.

This affects 8 locations:

  • Lines 148, 238, 248, 250, 262, 274, 329, 332
💡 Proposed fix pattern
-  HAS_SIDECAR=$(echo "$line" | grep -c "istio-proxy")
+  HAS_SIDECAR=$(echo "$line" | grep -c "istio-proxy" || true)

-  ISTIOD=$(kubectl get pods -n istio-system -l app=istiod --no-headers 2>/dev/null | grep -c Running)
+  ISTIOD=$(kubectl get pods -n istio-system -l app=istiod --no-headers 2>/dev/null | grep -c Running || true)

-  GW=$(kubectl get pods -n istio-system -l istio=ingressgateway --no-headers 2>/dev/null | grep -c Running)
+  GW=$(kubectl get pods -n istio-system -l istio=ingressgateway --no-headers 2>/dev/null | grep -c Running || true)

-    RUNNING=$(kubectl get pods -n istio-system -l app=$addon --no-headers 2>/dev/null | grep -c Running)
+    RUNNING=$(kubectl get pods -n istio-system -l app=$addon --no-headers 2>/dev/null | grep -c Running || true)

-  BOOKINFO_PODS=$(kubectl get pods -n bookinfo --no-headers 2>/dev/null | grep -c Running)
+  BOOKINFO_PODS=$(kubectl get pods -n bookinfo --no-headers 2>/dev/null | grep -c Running || true)

-  SIDECAR_PODS=$(kubectl get pods -n bookinfo -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}' 2>/dev/null | grep -c istio-proxy)
+  SIDECAR_PODS=$(kubectl get pods -n bookinfo -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}' 2>/dev/null | grep -c istio-proxy || true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 238 - 275, The pipeline commands that
assign ISTIOD, GW, and RUNNING use "grep -c" which can return exit code 1 and,
under set -euo pipefail, abort the script; update each pipeline that ends with
"| grep -c Running" or "| grep -c Running)" (the assignments to ISTIOD, the
fallback GW assignment, the RUNNING inside the observability addons loop, and
the RUNNING inside the Bookinfo app loop) to append "|| true" so the pipeline
never fails on zero matches and your subsequent pass/fail logic can run.
♻️ Duplicate comments (1)
Labs/10-Istio/demo.sh (1)

60-63: ⚠️ Potential issue | 🟠 Major

Guard gateway lookups from set -e exits.

Line 60 and Line 62 can still abort the script if kubectl get svc returns non-zero (e.g., pending/missing LB), so access info may not print.

🔧 Proposed fix
-  GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
+  GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
   if [ -z "$GATEWAY_IP" ]; then
-    GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)
+    GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null || true)
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` around lines 60 - 63, The kubectl calls that populate
GATEWAY_IP can cause the whole script to exit under set -e when the
service/LoadBalancer isn't ready; update the demo.sh logic around the GATEWAY_IP
assignment (the lines that run kubectl get svc istio-ingressgateway -n
istio-system ...) so they are protected from errexit—either temporarily disable
errexit around those commands (set +e / set -e) or ensure each kubectl
invocation is made safe by appending a non-failing fallback (e.g., using ||
true) and then test GATEWAY_IP for emptiness before continuing; keep the
variable name GATEWAY_IP and the exact kubectl get svc istio-ingressgateway
command when applying the change.
🧹 Nitpick comments (6)
Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh (2)

49-56: open_url implicitly exits with code 1 when no browser is found.

When neither open nor xdg-open is present, the last executed command is the elif command -v xdg-open test, which exits 1. All current call sites guard with || true, so this is not a problem today — but it is fragile if a future call site omits the guard. Adding an explicit return 0 makes the no-op intent self-documenting.

♻️ Proposed fix
 open_url() {
   local url="$1"
   if command -v open >/dev/null 2>&1; then
     open "${url}" >/dev/null 2>&1 || true
   elif command -v xdg-open >/dev/null 2>&1; then
     xdg-open "${url}" >/dev/null 2>&1 || true
   fi
+  return 0
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 49 - 56,
The open_url function can return a non-zero status when neither 'open' nor
'xdg-open' are found because the last executed test exits 1; update the open_url
function to explicitly return 0 in the no-op path so it always exits success
when it intentionally does nothing (ensure open_url has an explicit success
return for the branch where neither command exists and that all other branches
also end with an explicit successful return to make the behavior deterministic).

1-144: No EXIT trap — port-forwards are left running silently on any script failure.

With set -e active, any prerequisite check failure (lines 64–78), kubectl apply, or kubectl wait can exit the script before the user ever sees the "Stop port-forward" hint on line 131. Port-forwards already started are then orphaned with no record of their PIDs.

Consider registering an EXIT trap early so cleanup always runs:

♻️ Proposed fix – register a cleanup trap
+cleanup() {
+  if [[ -s "$PF_PID_FILE" ]]; then
+    echo "Stopping port-forwards..."
+    xargs kill < "$PF_PID_FILE" 2>/dev/null || true
+    rm -f "$PF_PID_FILE"
+  fi
+}
+trap cleanup EXIT
+
 mkdir -p "$RUNTIME_DIR" >/dev/null 2>&1 || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 1 - 144,
Add a cleanup function (e.g., cleanup_port_forwards) and register it with a trap
(trap cleanup_port_forwards EXIT INT TERM) early in the script (after
PF_PID_FILE is defined) so port-forwards are always terminated on script exit;
the cleanup should check PF_PID_FILE exists and non-empty, kill the PIDs listed
(xargs kill or loop kill) while ignoring already-dead PIDs, and remove the
PF_PID_FILE; ensure start_port_forward continues to append PIDs to PF_PID_FILE
so the trap can find them.
Labs/10-Istio/demo/01-basic-setup/demo.sh (3)

144-144: bash -lc unnecessarily sources login profile scripts.

A login shell (-l) loads /etc/profile, ~/.bash_profile, and similar files. For a simple curl loop this can cause unintended side-effects (proxy settings, PATH overrides, aliases). Use bash -c instead.

♻️ Proposed fix
-  nohup bash -lc 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &
+  nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 144, Replace the login
shell invocation in the background traffic loop so it doesn't source profile
scripts: in the nohup line that writes to "$TRAFFIC_LOG_FILE" (the command
starting with nohup bash -lc 'while true; do curl -fsS ...'), change the shell
flag from -l to -c (i.e., use bash -c) to avoid loading login profiles and
unintended side-effects while keeping the same curl loop and redirection
behavior.

68-77: is_port_listening silently skips detection on systems without lsof.

The else return 1 fallback treats the port as not listening whenever lsof is absent (common on minimal Linux/Alpine/WSL images). If the port is actually occupied, kubectl port-forward will fail silently — the error lands in the log file rather than on the terminal. ss (iproute2, standard on modern Linux) or netstat are widely available alternatives.

♻️ Proposed fallback chain
 is_port_listening() {
   local port="$1"
   if command -v lsof >/dev/null 2>&1; then
     lsof -nP -iTCP:"${port}" -sTCP:LISTEN >/dev/null 2>&1
+  elif command -v ss >/dev/null 2>&1; then
+    ss -lnt "sport = :${port}" 2>/dev/null | grep -q ":${port}"
+  elif command -v netstat >/dev/null 2>&1; then
+    netstat -lnt 2>/dev/null | grep -q ":${port}[[:space:]]"
   else
     return 1
   fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 68 - 77, The
is_port_listening function currently returns 1 if lsof is missing which silently
skips detection; update is_port_listening to try alternatives: first use ss -ltn
"(or ss -nlt sport = :PORT)" to check TCP LISTEN, then fall back to netstat -tnl
if ss is unavailable, and only then return 1; keep the function signature and
exit codes the same (0 if listening, non‑zero otherwise) and ensure you
quote/expand the port variable (local port="$1") when invoking the chosen
command so kubectl port-forward failures are detected correctly.

104-122: start_port_forward assumes RUNTIME_DIR already exists.

Line 121 appends to $PF_PID_FILE without ensuring $RUNTIME_DIR exists. This is safe in the current call order (both cleanup_previous_demo and the explicit stop_previous_port_forwards at line 330 create the directory first), but the function is fragile if called in isolation.

🛡️ Proposed defensive guard
 start_port_forward() {
   local namespace="$1"
   local svc="$2"
   local local_port="$3"
   local remote_port="$4"
   local log_file="$RUNTIME_DIR/port-forward-${namespace}-${svc}-${local_port}.log"
+
+  mkdir -p "$RUNTIME_DIR" >/dev/null 2>&1 || true
 
   if is_port_listening "${local_port}"; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 104 - 122, The
start_port_forward function assumes RUNTIME_DIR exists before creating log files
and appending to PF_PID_FILE; make it defensive by ensuring the directory exists
(e.g., mkdir -p "$RUNTIME_DIR" and check for success) before computing log_file
or writing the PID, and handle failure (return non-zero and log an error) so
standalone calls to start_port_forward don't fail silently; update
start_port_forward (and any code that writes PF_PID_FILE) rather than relying on
cleanup_previous_demo / stop_previous_port_forwards to create RUNTIME_DIR.
Labs/10-Istio/scripts/01-install-istio.sh (1)

25-31: Scope cluster-wide pruning more defensively.

The current name-based matching (grep -i istio) can delete unrelated Istio resources on shared clusters when Helm annotations are missing or different.

🔧 Proposed safety guard
+  # Optional: only allow broad prune when explicitly enabled.
+  : "${ALLOW_BROAD_ISTIO_PRUNE:=false}"
+  if [[ "${ALLOW_BROAD_ISTIO_PRUNE}" != "true" ]]; then
+    print_warning "Skipping broad Istio resource prune (set ALLOW_BROAD_ISTIO_PRUNE=true to enable)."
+  else
   for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do
     release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
     if [ "$release" != "istio-base" ]; then
       print_info "Removing $crd (not managed by Helm istio-base)"
       kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
     fi
   done
+  fi

Also applies to: 33-39, 41-47, 67-73, 74-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 25 - 31, The loop
that prunes CRDs uses name-based matching and will delete resources lacking Helm
annotations; change the logic in the block that iterates over crd so it only
considers CRDs that explicitly carry the Helm release annotation metadata
(metadata.annotations["meta.helm.sh/release-name"]) — skip any CRD with no such
annotation instead of deleting by name; then delete only when release exists and
is not "istio-base" (use the existing release variable) and retain the current
--ignore-not-found safety flags; apply the same defensive change to the other
similar loops/blocks that use crd/name-based grep so they all require the Helm
annotation before deletion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Labs/10-Istio/monitor.sh`:
- Line 369: The unguarded clear call in main() can fail in non-interactive/CI
environments (with set -e), so wrap the clear invocation in a guard that only
runs it when a TTY/terminal is available; update main() so it checks for a
terminal (e.g., using -t on stdout/stderr or a non-empty TERM) before calling
clear, and skip clear otherwise to avoid script exit in automated runs.

In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 10-11: The addon-health check loop currently omits Loki; update
the addons list used by the verification loop (e.g., the ADDONS or addons array)
to include "loki" and ensure the same health-check function (e.g.,
verify_addons, check_addon, or the loop that inspects kubectl/helm status)
validates Loki's Deployment/DaemonSet/StatefulSet and corresponding Service/Pod
readiness. Add "loki" to any switch/case or conditional that maps addon names to
their k8s resource checks so a failing Loki deployment will cause the script to
mark the addons step as failed.

In `@Labs/10-Istio/scripts/common.sh`:
- Around line 24-25: The print_error function currently uses echo which writes
to stdout; change its implementation so the formatted error string is written to
stderr (e.g., use echo -e "…"$RED…"$NC …" >&2 or use printf "…\n" >&2) so the
behavior matches the comment and other logging functions; update the function
body of print_error accordingly.

---

Outside diff comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 90-97: The rm invocation that removes PF_PID_FILE inside the
conditional block can fail under set -euo pipefail and currently lacks the same
defensive redirect-and-|| true pattern used elsewhere; update the rm call that
references PF_PID_FILE in the block (after the while loop that reads and kills
PIDs) to append >/dev/null 2>&1 || true so the removal failure won't abort the
script while keeping behavior consistent with other rm calls.
- Around line 342-345: The inline hints printed by the echo statements
incorrectly reference log files as ".port-forward-...log" and omit the
RUNTIME_DIR where start_port_forward actually writes logs; update the two echo
messages in demo.sh (the lines that output "Kiali: not reachable yet (...)" and
"Bookinfo: not reachable yet (...)") to reference the correct runtime path by
including $RUNTIME_DIR and removing the leading dot so they match the naming
used by start_port_forward (port-forward-<ns>-<svc>-<port>.log).
- Around line 228-231: Replace the broad namespace loop that labels every
non-system namespace (the for ... kubectl label namespace "$ns"
istio-injection=enabled --overwrite block) with explicit labeling of only the
namespaces needed for the demo (e.g., run kubectl label namespace default
istio-injection=enabled --overwrite and kubectl label namespace demo
istio-injection=enabled --overwrite) and update the cleanup logic to remove
those labels (use kubectl label namespace <name> istio-injection- to unset) so
you don’t affect unrelated user namespaces.
- Line 323: The cleanup hint in demo.sh currently suggests only running istioctl
uninstall --purge -y and kubectl delete ns istio-system demo but omits removing
the istio-injection label(s); update the echo string to instruct removing the
label (e.g., kubectl label namespace default istio-injection-) and mention
removing the label from any other namespaces you labeled in the loop so pods
stop receiving sidecars after Istio is uninstalled.
- Around line 265-294: The step comments in demo.sh are out of sequence (e.g.,
"Step 11: Creating demo VirtualService for reviews service...", then "Step 14:
Creating demo namespace...", "Step 15: Deploying Nginx...", "Step 16: Deploying
HTTPD...", "Step 17: Verifying demo deployment..."); update these echo step
numbers to be sequential to match the lab flow (renumber them to fill the
missing 10,12,13 as appropriate) so the messages that reference reviews-vs.yaml,
demo-namespace.yaml, nginx-demo.yaml, httpd-demo.yaml and the final verification
reflect a continuous sequence; ensure all "Step X:" prefixes in the apply/wait
blocks are renumbered consistently and any cross-references in the script or
docs are adjusted to the new numbering.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 18-22: The is_port_listening guard currently treats missing lsof
as "not listening", causing start_port_forward to launch duplicate kubectl
port-forwards; update is_port_listening to try ss and then netstat when lsof is
absent (checking for their commands and using appropriate flags to test LISTEN
on "${port}"), and if none are present log a clear warning/error stating the
port-check guard is disabled so the user knows duplicate port-forwards may
occur; ensure start_port_forward still respects the boolean result from
is_port_listening.
- Around line 111-112: The script currently uses touch "$PF_PID_FILE" which
preserves old PIDs; change this to truncate/create PF_PID_FILE before starting
port forwards so stale PIDs don't accumulate and get killed by xargs kill <
$PF_PID_FILE; update the block around the touch and start_port_forward calls
(referencing PF_PID_FILE and start_port_forward) to ensure PF_PID_FILE is
emptied/created fresh at runtime (use a truncating open/redirect) before any
PIDs are appended.
- Line 138: The cleanup echo currently prints relative paths ("kubectl delete -f
manifests/...") which fails if the user isn't in the script's working dir;
update the echo in demo.sh that prints the cleanup command so it uses absolute
paths (e.g. expand to "${WORK_DIR}/manifests/02‑istio‑routing.yaml" and
"${WORK_DIR}/manifests/01‑app.yaml" or compute the script dir via a variable and
use that) instead of bare "manifests/..." so the printed hint is copy‑paste safe
from any CWD.

In `@Labs/10-Istio/monitor.sh`:
- Around line 119-130: The addon health and service-summary blocks miss Loki;
update the monitoring checks in monitor.sh to include "loki" alongside
prometheus,grafana,jaeger,kiali: add "loki" to the for loop that sets STATUS
(used with kubectl get pods and variables STATUS/print_info/print_error), add
"loki" to the kubectl service label selector ('app in (...)') and to the
fallback grep pattern (the grep -E list), and apply the same change to the other
similar blocks referenced (the repeated kubectl get svc / grep sections). Ensure
all occurrences of the app list and grep patterns consistently include "loki" so
pod status, svc listing, and summary output cover Loki.
- Around line 373-393: The script accesses positional parameter $1 directly in
the three conditional blocks (the branches that call show_summary,
test_pipeline, etc.), which breaks under set -u when no args are provided;
update those conditionals to safely handle an absent argument by using a default
expansion (e.g., ${1:-}) or by first checking $# -eq 0 for interactive behavior,
and then compare that safe value against "full"/"-f"/"--full",
"test"/"-t"/"--test", and "summary"/"-s"/"--summary" so show_summary,
check_istio, check_pods, check_addons, check_bookinfo, check_traffic,
check_mtls, and test_pipeline are only invoked when a valid argument is present.
- Around line 238-275: The pipeline commands that assign ISTIOD, GW, and RUNNING
use "grep -c" which can return exit code 1 and, under set -euo pipefail, abort
the script; update each pipeline that ends with "| grep -c Running" or "| grep
-c Running)" (the assignments to ISTIOD, the fallback GW assignment, the RUNNING
inside the observability addons loop, and the RUNNING inside the Bookinfo app
loop) to append "|| true" so the pipeline never fails on zero matches and your
subsequent pass/fail logic can run.

In `@Labs/10-Istio/scripts/05-verify.sh`:
- Line 19: Several kubectl command substitutions (e.g., ISTIOD_STATUS,
GW_STATUS, STATUS, CONTAINERS, CRONJOB, PROM_POD, METRIC_COUNT) are unguarded
and will cause the script to exit under set -e if kubectl fails; update each
substitution to append a fallback "|| true" so the command returns success on
failure (for example change the assignment for ISTIOD_STATUS, both GW_STATUS
occurrences, the loop STATUS assignments, CONTAINERS, CRONJOB, PROM_POD, and the
METRIC_COUNT pipeline involving kubectl exec) to ensure the script continues to
the final pass/fail summary even when kubectl errors occur.

---

Duplicate comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 60-63: The kubectl calls that populate GATEWAY_IP can cause the
whole script to exit under set -e when the service/LoadBalancer isn't ready;
update the demo.sh logic around the GATEWAY_IP assignment (the lines that run
kubectl get svc istio-ingressgateway -n istio-system ...) so they are protected
from errexit—either temporarily disable errexit around those commands (set +e /
set -e) or ensure each kubectl invocation is made safe by appending a
non-failing fallback (e.g., using || true) and then test GATEWAY_IP for
emptiness before continuing; keep the variable name GATEWAY_IP and the exact
kubectl get svc istio-ingressgateway command when applying the change.

---

Nitpick comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Line 144: Replace the login shell invocation in the background traffic loop so
it doesn't source profile scripts: in the nohup line that writes to
"$TRAFFIC_LOG_FILE" (the command starting with nohup bash -lc 'while true; do
curl -fsS ...'), change the shell flag from -l to -c (i.e., use bash -c) to
avoid loading login profiles and unintended side-effects while keeping the same
curl loop and redirection behavior.
- Around line 68-77: The is_port_listening function currently returns 1 if lsof
is missing which silently skips detection; update is_port_listening to try
alternatives: first use ss -ltn "(or ss -nlt sport = :PORT)" to check TCP
LISTEN, then fall back to netstat -tnl if ss is unavailable, and only then
return 1; keep the function signature and exit codes the same (0 if listening,
non‑zero otherwise) and ensure you quote/expand the port variable (local
port="$1") when invoking the chosen command so kubectl port-forward failures are
detected correctly.
- Around line 104-122: The start_port_forward function assumes RUNTIME_DIR
exists before creating log files and appending to PF_PID_FILE; make it defensive
by ensuring the directory exists (e.g., mkdir -p "$RUNTIME_DIR" and check for
success) before computing log_file or writing the PID, and handle failure
(return non-zero and log an error) so standalone calls to start_port_forward
don't fail silently; update start_port_forward (and any code that writes
PF_PID_FILE) rather than relying on cleanup_previous_demo /
stop_previous_port_forwards to create RUNTIME_DIR.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 49-56: The open_url function can return a non-zero status when
neither 'open' nor 'xdg-open' are found because the last executed test exits 1;
update the open_url function to explicitly return 0 in the no-op path so it
always exits success when it intentionally does nothing (ensure open_url has an
explicit success return for the branch where neither command exists and that all
other branches also end with an explicit successful return to make the behavior
deterministic).
- Around line 1-144: Add a cleanup function (e.g., cleanup_port_forwards) and
register it with a trap (trap cleanup_port_forwards EXIT INT TERM) early in the
script (after PF_PID_FILE is defined) so port-forwards are always terminated on
script exit; the cleanup should check PF_PID_FILE exists and non-empty, kill the
PIDs listed (xargs kill or loop kill) while ignoring already-dead PIDs, and
remove the PF_PID_FILE; ensure start_port_forward continues to append PIDs to
PF_PID_FILE so the trap can find them.

In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 25-31: The loop that prunes CRDs uses name-based matching and will
delete resources lacking Helm annotations; change the logic in the block that
iterates over crd so it only considers CRDs that explicitly carry the Helm
release annotation metadata (metadata.annotations["meta.helm.sh/release-name"])
— skip any CRD with no such annotation instead of deleting by name; then delete
only when release exists and is not "istio-base" (use the existing release
variable) and retain the current --ignore-not-found safety flags; apply the same
defensive change to the other similar loops/blocks that use crd/name-based grep
so they all require the Helm annotation before deletion.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac2eeaa and dca7c98.

📒 Files selected for processing (17)
  • Labs/10-Istio/README.md
  • Labs/10-Istio/demo.sh
  • Labs/10-Istio/demo/01-basic-setup/demo.sh
  • Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh
  • Labs/10-Istio/manifests/addons/grafana.yaml
  • Labs/10-Istio/manifests/addons/kiali.yaml
  • Labs/10-Istio/manifests/addons/loki.yaml
  • Labs/10-Istio/manifests/addons/prometheus.yaml
  • Labs/10-Istio/manifests/bookinfo-gateway.yaml
  • Labs/10-Istio/manifests/ingress.yaml
  • Labs/10-Istio/monitor.sh
  • Labs/10-Istio/scripts/01-install-istio.sh
  • Labs/10-Istio/scripts/02-install-addons.sh
  • Labs/10-Istio/scripts/03-deploy-bookinfo.sh
  • Labs/10-Istio/scripts/04-traffic-generator.sh
  • Labs/10-Istio/scripts/05-verify.sh
  • Labs/10-Istio/scripts/common.sh
💤 Files with no reviewable changes (1)
  • Labs/10-Istio/manifests/ingress.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • Labs/10-Istio/manifests/addons/grafana.yaml
  • Labs/10-Istio/scripts/04-traffic-generator.sh
  • Labs/10-Istio/manifests/addons/loki.yaml
  • Labs/10-Istio/scripts/03-deploy-bookinfo.sh

Comment thread Labs/10-Istio/monitor.sh
# Entry point: run full/test/summary report or start interactive menu.
# Args: $@ - Optional mode: full, test, summary, or none for menu.
main() {
clear
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the file size and context
wc -l Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 95


🏁 Script executed:

# Examine the script around line 369 and check for 'set -e'
sed -n '1,30p' Labs/10-Istio/monitor.sh
sed -n '360,375p' Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 1395


🏁 Script executed:

# Search for set -e in the script
grep -n 'set -e' Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 86


🏁 Script executed:

# Get exact context around line 369
sed -n '365,375p' Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 395


🏁 Script executed:

# Check if clear is called in any conditional or if-statements
grep -n -B2 -A2 'clear' Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 333


🏁 Script executed:

# Check the main function execution flow to understand when clear is called
sed -n '367,400p' Labs/10-Istio/monitor.sh

Repository: nirgeier/KubernetesLabs

Length of output: 769


Guard clear at line 369 to prevent script failure in non-interactive environments.

The script has set -e enabled (line 2) and calls clear unconditionally in the main() function (line 369). When invoked with non-interactive modes (./monitor.sh full, ./monitor.sh test, ./monitor.sh summary) in CI environments or containers where TERM is unset, clear fails. With set -e active, this failure causes the entire script to exit before any monitoring checks run, making the script non-functional in automated/non-TTY contexts.

Proposed fix
-  clear
+  if [ -t 1 ] && [ -n "${TERM:-}" ]; then
+    clear
+  fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clear
if [ -t 1 ] && [ -n "${TERM:-}" ]; then
clear
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` at line 369, The unguarded clear call in main() can
fail in non-interactive/CI environments (with set -e), so wrap the clear
invocation in a guard that only runs it when a TTY/terminal is available; update
main() so it checks for a terminal (e.g., using -t on stdout/stderr or a
non-empty TERM) before calling clear, and skip clear otherwise to avoid script
exit in automated runs.

Comment on lines +10 to +11
# Verify Istio control plane, ingress gateway, addons, Bookinfo, traffic
# generator, and Prometheus metrics. Prints pass/fail summary.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include Loki in the addon health checks.

The verification scope now includes observability addons, but Loki is still not validated in the addon loop. A failed Loki deployment can currently pass verification.

🔧 Proposed fix
-  for addon in prometheus grafana kiali jaeger; do
+  for addon in prometheus grafana kiali jaeger; do
     STATUS=$(kubectl get pods -n istio-system -l app=$addon -o jsonpath='{.items[0].status.phase}' 2>/dev/null)
     if [ "$STATUS" = "Running" ]; then
       print_success "$addon: Running"
     else
       print_error "$addon: ${STATUS:-Not Found}"
       errors=$((errors + 1))
     fi
   done
+
+  LOKI_STATUS=$(kubectl get pods -n istio-system -l app.kubernetes.io/name=loki -o jsonpath='{.items[0].status.phase}' 2>/dev/null || true)
+  if [ "$LOKI_STATUS" = "Running" ]; then
+    print_success "loki: Running"
+  else
+    print_error "loki: ${LOKI_STATUS:-Not Found}"
+    errors=$((errors + 1))
+  fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/05-verify.sh` around lines 10 - 11, The addon-health
check loop currently omits Loki; update the addons list used by the verification
loop (e.g., the ADDONS or addons array) to include "loki" and ensure the same
health-check function (e.g., verify_addons, check_addon, or the loop that
inspects kubectl/helm status) validates Loki's Deployment/DaemonSet/StatefulSet
and corresponding Service/Pod readiness. Add "loki" to any switch/case or
conditional that maps addon names to their k8s resource checks so a failing Loki
deployment will cause the script to mark the addons step as failed.

Comment on lines +24 to 25
# Print an error message to stderr.
print_error() { echo -e "${RED}[ERROR]${NC} $1"; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "common.sh" -path "*/Istio/*" 2>/dev/null | head -20

Repository: nirgeier/KubernetesLabs

Length of output: 49


🏁 Script executed:

if [ -f "Labs/10-Istio/scripts/common.sh" ]; then
  echo "=== File found at Labs/10-Istio/scripts/common.sh ===" 
  cat -n "Labs/10-Istio/scripts/common.sh" | head -30
else
  echo "File not found at expected path"
fi

Repository: nirgeier/KubernetesLabs

Length of output: 1203


🏁 Script executed:

# Search for all usages of print_error in the repository
rg "print_error" --type sh -B 2 -A 2

Repository: nirgeier/KubernetesLabs

Length of output: 23887


print_error outputs to stdout instead of stderr.

The function comment declares it should print to stderr, but the implementation uses echo without redirection, which outputs to stdout. This should be fixed to align the implementation with the documented behavior:

Fix
-print_error() { echo -e "${RED}[ERROR]${NC} $1"; }
+print_error() { echo -e "${RED}[ERROR]${NC} $1" >&2; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Print an error message to stderr.
print_error() { echo -e "${RED}[ERROR]${NC} $1"; }
# Print an error message to stderr.
print_error() { echo -e "${RED}[ERROR]${NC} $1" >&2; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/common.sh` around lines 24 - 25, The print_error
function currently uses echo which writes to stdout; change its implementation
so the formatted error string is written to stderr (e.g., use echo -e
"…"$RED…"$NC …" >&2 or use printf "…\n" >&2) so the behavior matches the comment
and other logging functions; update the function body of print_error
accordingly.

…ripts will run on both Mac and Ubuntu.

Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
Labs/10-Istio/scripts/05-verify.sh (1)

61-65: ⚠️ Potential issue | 🟠 Major

Missing sidecar should fail verification, not just warn.

A pod running without istio-proxy currently does not increment errors, so the final status can still pass with broken injection.

🔧 Proposed fix
       if echo "$CONTAINERS" | grep -q "istio-proxy"; then
         print_success "$app: Running (sidecar injected)"
       else
-        print_warning "$app: Running (NO sidecar!)"
+        print_error "$app: Running (NO sidecar!)"
+        errors=$((errors + 1))
       fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/05-verify.sh` around lines 61 - 65, The verification
should treat missing Istio sidecars as an error, not just a warning: in the
block that checks CONTAINERS for "istio-proxy" (the if/else using
print_success/print_warning), change the else branch to increment the global
error counter (errors) and call print_error (or at minimum still call
print_warning but also do errors=$((errors+1))) so missing sidecars are counted
toward failure; update the else branch where print_warning "$app: Running (NO
sidecar!)" is emitted to also increment errors so the final status fails when
injection is broken.
Labs/10-Istio/demo/01-basic-setup/demo.sh (1)

378-378: ⚠️ Potential issue | 🟡 Minor

$(cat $TRAFFIC_PID_FILE ...) is eagerly expanded at echo time, baking in a stale PID.

The resulting printed command contains a literal integer (e.g., kill "12345") rather than a deferred file reference. If copied by the user after the process is long gone — or if the traffic generator was never started — the command is either stale or degrades to kill "" (which exits non-zero).

🐛 Proposed fix — use single quotes to defer evaluation
-echo "To stop them: xargs kill < $PF_PID_FILE; kill \"$(cat $TRAFFIC_PID_FILE 2>/dev/null)\""
+echo "To stop them: xargs kill < $PF_PID_FILE; [ -f \"$TRAFFIC_PID_FILE\" ] && kill \"\$(cat \"$TRAFFIC_PID_FILE\")\""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 378, The echo currently
expands $(cat $TRAFFIC_PID_FILE 2>/dev/null) at echo-time, producing a stale or
empty PID; change the echo so the command substitution is not evaluated (e.g.,
print the literal command by using single quotes or escaping the $/() sequence)
so the printed instruction references $PF_PID_FILE and $TRAFFIC_PID_FILE
verbatim; update the echo that contains PF_PID_FILE and TRAFFIC_PID_FILE to
quote/escape the substitution to defer evaluation until the user runs the
printed command.
Labs/10-Istio/monitor.sh (2)

373-394: ⚠️ Potential issue | 🟡 Minor

Handle unknown CLI modes explicitly instead of entering interactive mode.

For non-empty invalid args, current flow drops into read prompts, which can hang automation unexpectedly.

Proposed fix
   if [ "$mode" = "summary" ] || [ "$mode" = "-s" ] || [ "$mode" = "--summary" ]; then
     show_summary
     exit 0
   fi
+
+  if [ -n "$mode" ]; then
+    print_error "Unknown mode: $mode"
+    echo "Usage: $0 [full|test|summary]"
+    exit 1
+  fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 373 - 394, The script currently treats
any unrecognized non-empty CLI argument as empty and falls through into
interactive prompts; update the top-level mode handling (the local variable
"mode" and the branches that call show_summary, test_pipeline, etc.) to
explicitly detect invalid non-empty modes: after the existing "full", "test",
and "summary" checks add an else-if that checks if "$mode" is non-empty and not
one of the recognized flags, then print a brief error/usage message (e.g.,
"Invalid mode: $mode. Valid modes: full,test,summary") to stderr and exit with a
non-zero code to prevent entering interactive read prompts. Ensure the
conditional references the same "mode" variable and returns non-zero so
automation won't hang.

199-201: ⚠️ Potential issue | 🟠 Major

CronJob probe failures will terminate the script under strict mode.

The script runs with set -euo pipefail (line 2), which causes immediate exit on any command failure. Lines 199 and 301 execute kubectl get cronjob without a fallback mechanism. Although stderr is suppressed via 2>/dev/null, the exit code is not—so when the CronJob doesn't exist, the command fails with a non-zero exit code, terminating the script before the fallback logic at lines 203 and 305 can execute.

Add || true to both commands to allow the script to continue and properly handle missing CronJobs:

Proposed fix
-  CRONJOB=$(kubectl get cronjob -n traffic-gen traffic-generator -o jsonpath='{.spec.schedule}' 2>/dev/null)
+  CRONJOB=$(kubectl get cronjob -n traffic-gen traffic-generator -o jsonpath='{.spec.schedule}' 2>/dev/null || true)
@@
-  TG=$(kubectl get cronjob -n traffic-gen traffic-generator --no-headers 2>/dev/null | wc -l | tr -d ' ')
+  TG=$(kubectl get cronjob -n traffic-gen traffic-generator --no-headers 2>/dev/null | wc -l | tr -d ' ' || true)
+  TG="${TG:-0}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 199 - 201, The CronJob probe commands
that populate CRONJOB and the later cronjob lookup for LAST_JOB call kubectl and
can exit non-zero under set -euo pipefail, which will abort the script when the
CronJob is absent; update the two kubectl invocations that assign CRONJOB and
the later cronjob check to append "|| true" so the command returns success on
missing resources and the subsequent fallback logic (variables CRONJOB and
LAST_JOB and the surrounding if checks) can handle absence gracefully.
♻️ Duplicate comments (2)
Labs/10-Istio/scripts/05-verify.sh (1)

44-52: ⚠️ Potential issue | 🟠 Major

Loki is still omitted from addon health verification.

The addon loop still excludes Loki, so this step can report healthy while Loki failed to deploy.

🔧 Proposed fix
   for addon in prometheus grafana kiali jaeger; do
     STATUS=$(kubectl get pods -n istio-system -l app=$addon -o jsonpath='{.items[0].status.phase}' 2>/dev/null) || true
     if [ "$STATUS" = "Running" ]; then
       print_success "$addon: Running"
     else
       print_error "$addon: ${STATUS:-Not Found}"
       errors=$((errors + 1))
     fi
   done
+
+  LOKI_STATUS=$(kubectl get pods -n istio-system -l app.kubernetes.io/name=loki -o jsonpath='{.items[0].status.phase}' 2>/dev/null) || true
+  if [ "$LOKI_STATUS" = "Running" ]; then
+    print_success "loki: Running"
+  else
+    print_error "loki: ${LOKI_STATUS:-Not Found}"
+    errors=$((errors + 1))
+  fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/05-verify.sh` around lines 44 - 52, The health-check
loop for addons (for addon in prometheus grafana kiali jaeger) omits Loki, so
update the loop to include "loki" (e.g., for addon in prometheus grafana kiali
jaeger loki) so that the existing check that uses STATUS, print_success,
print_error and increments errors covers Loki as well; ensure the same jsonpath
and namespace logic is used for the new "loki" entry.
Labs/10-Istio/monitor.sh (1)

369-369: ⚠️ Potential issue | 🟠 Major

Guard clear in main() to avoid non-interactive aborts.

With strict mode enabled, Line 369 can fail in CI/non-TTY environments (e.g., missing TERM) and exit before full/test/summary execution.

Proposed fix
-  clear
+  if [ -t 1 ] && [ -n "${TERM:-}" ]; then
+    clear
+  fi
@@
-    clear
+    if [ -t 1 ] && [ -n "${TERM:-}" ]; then
+      clear
+    fi
#!/bin/bash
# Verify strict mode + unguarded clear call sites.
rg -n -C2 'set -euo pipefail|^\s*clear\s*$' Labs/10-Istio/monitor.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` at line 369, The unguarded clear call in main() can
fail in non-interactive/CI environments and abort the script; wrap the clear
invocation (the plain clear call in main()) with a condition that ensures it
only runs in an interactive TTY (e.g., test -t 1) and/or when TERM is set, so
the script continues to execute full/test/summary in CI; update main() to
perform this guard check before calling clear.
🧹 Nitpick comments (4)
Labs/10-Istio/demo/01-basic-setup/demo.sh (3)

158-167: Inconsistency: start_traffic_generator calls nohup unconditionally, unlike start_port_forward.

start_port_forward (lines 137–141) guards nohup with command -v nohup and falls back to a bare background process. start_traffic_generator does not, so a missing nohup would abort the script under set -euo pipefail.

♻️ Proposed fix
-  nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &
+  if command -v nohup >/dev/null 2>&1; then
+    nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &
+  else
+    bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &
+  fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 158 - 167,
start_traffic_generator currently uses nohup unconditionally which will cause
the script to abort under set -euo pipefail if nohup is missing; modify
start_traffic_generator to mirror start_port_forward by checking for nohup via
command -v nohup and using nohup ... & only if present, otherwise fall back to
launching the same bash -c loop in the background without nohup, keeping the
same redirection to TRAFFIC_LOG_FILE and saving the background PID to
TRAFFIC_PID_FILE; reference the start_traffic_generator function and the
variables TRAFFIC_LOG_FILE and TRAFFIC_PID_FILE when making the change.

100-104: Optional: cache TRAFFIC_PID_FILE contents in a local variable to avoid the double cat.

cat is called twice on the same file within the TOCTOU window of the kill -0 check and the kill itself.

♻️ Proposed refactor
   if [ -f "$TRAFFIC_PID_FILE" ]; then
-    if kill -0 "$(cat "$TRAFFIC_PID_FILE" 2>/dev/null)" 2>/dev/null; then
-      kill "$(cat "$TRAFFIC_PID_FILE" 2>/dev/null)" 2>/dev/null || true
+    local traffic_pid
+    traffic_pid=$(cat "$TRAFFIC_PID_FILE" 2>/dev/null)
+    if kill -0 "${traffic_pid}" 2>/dev/null; then
+      kill "${traffic_pid}" 2>/dev/null || true
     fi
     rm -f "$TRAFFIC_PID_FILE" "$TRAFFIC_LOG_FILE" >/dev/null 2>&1 || true
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 100 - 104, The script
uses cat twice on "$TRAFFIC_PID_FILE" between the kill -0 check and the kill
call, creating a TOCTOU race and redundant IO; cache the PID by reading it once
into a local variable (e.g., pid) before the kill -0 test, use that variable for
both the kill -0 and kill calls, and still handle failures gracefully and remove
TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE as before; update the block that
references TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE accordingly.

246-251: Redundant label: demo namespace labeled at Step 05 before it exists, then again at Step 11.

Line 249 silently no-ops (namespace absent), while line 291 is the real effective label application. Drop line 249's demo labeling to reduce noise.

♻️ Proposed fix
 echo "Step 05: Enabling Istio injection in default and demo namespaces..."
 kubectl label namespace default istio-injection=enabled --overwrite
-kubectl label namespace demo istio-injection=enabled --overwrite 2>/dev/null || true
 echo "Injection enabled."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 246 - 251, Remove the
redundant labeling of the demo namespace in demo.sh: delete the kubectl command
that runs "kubectl label namespace demo istio-injection=enabled --overwrite
2>/dev/null || true" in the Step 05 block (the earlier kubectl label call for
namespace demo), since the demo namespace is created later and the actual label
is applied in Step 11; keep the default namespace labeling and the later demo
labeling in Step 11.
Labs/10-Istio/monitor.sh (1)

25-63: Consider sourcing shared helpers from scripts/common.sh to avoid drift.

print_header() and check_prerequisites() are duplicated here while similar implementations already exist in Labs/10-Istio/scripts/common.sh (including broader prereq checks). Consolidating would reduce maintenance divergence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/monitor.sh` around lines 25 - 63, Replace the duplicated helper
implementations by sourcing the shared common helpers and removing the local
defs: add a single source of the shared common.sh at the top of this script and
delete the local print_header() and check_prerequisites() implementations;
ensure the script relies on the shared print_header and check_prerequisites
symbols, adjust any call sites if the shared check_prerequisites has a broader
signature, and run a quick smoke test (and shellcheck) to confirm no
regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 77-80: In the for-loop that builds HOSTS_NEEDED, replace the plain
grep check that can match regexes and commented lines with a fixed-string,
non-commented-line check; for example, change the condition that uses grep -q
"$host" /etc/hosts to an awk invocation that ignores lines starting with # and
searches for the literal host (e.g., use awk -v h="$host" '!/^[[:space:]]*#/ &&
index($0,h){exit 0} END{exit 1}' /etc/hosts) so HOSTS_NEEDED is only left
unchanged when an actual, non-commented hosts entry for the literal host exists.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 361-366: The single short sleep (sleep 2) before probing
port-forward readiness is unreliable; replace it with a lightweight retry loop
that attempts the two curl probes (the Kiali check to
http://127.0.0.1:20001/kiali/ and the Bookinfo check to
http://127.0.0.1:8080/productpage) a few times with small waits (e.g., 5–10s)
and a max retry count, exiting the loop early when each service becomes
reachable and printing the existing "reachable" or "not reachable yet (check
${RUNTIME_DIR}/...log)" messages; update the block in demo.sh that currently
contains sleep 2 and the curl checks to use this retry logic so port-forward has
time to bind before giving up.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 18-21: When backgrounding kubectl port-forward, don't write $! to
PF_PID_FILE unconditionally; capture the background PID (pid=$!) then verify the
process is still the port-forward (alive) before persisting it by using a
liveliness check (e.g., kill -0 "$pid" >/dev/null 2>&1 or a brief sleep+check)
and only append to PF_PID_FILE if the check succeeds; apply the same change to
both places where port-forward PIDs are recorded and ensure the cleanup loop
that reads PF_PID_FILE still skips empty lines and only kills recorded PIDs (as
it already does).

In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 25-50: The current guard '[ -n "$release" ] || continue' skips
resources that lack a Helm release annotation, leaving stale Istio objects;
update each cleanup loop (the for crd in $(kubectl get crd ...), for webhook in
$(kubectl get validatingwebhookconfiguration ...), for webhook in $(kubectl get
mutatingwebhookconfiguration ...), and the clusterrole/clusterrolebinding
cleanup loops) to delete resources when the release is empty or not equal to
"istio-base" (e.g. replace the guard with a conditional like: if [ -z "$release"
] || [ "$release" != "istio-base" ]; then print_info ...; kubectl delete ...;
fi) so unannotated conflicting resources are removed.

In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 85-86: The current check stores METRIC_COUNT by grepping for
`"value"` in the Prometheus HTTP response (involving PROM_POD and the query
count(istio_requests_total)) which yields false positives even when the metric
value is 0; update the logic to parse the JSON response and examine the actual
numeric result instead of counting `"value"` strings — call the Prometheus API
in the same block that defines PROM_POD, capture the raw JSON, extract the
numeric value from .data.result[0].value[1] (or treat result length 0 as zero),
set METRIC_COUNT to that numeric value (or 0) and then test if METRIC_COUNT -gt
0 to decide the if-branch. Ensure you use the same kubectl exec to fetch the
JSON and avoid relying on grep/wc for presence checks.

---

Outside diff comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Line 378: The echo currently expands $(cat $TRAFFIC_PID_FILE 2>/dev/null) at
echo-time, producing a stale or empty PID; change the echo so the command
substitution is not evaluated (e.g., print the literal command by using single
quotes or escaping the $/() sequence) so the printed instruction references
$PF_PID_FILE and $TRAFFIC_PID_FILE verbatim; update the echo that contains
PF_PID_FILE and TRAFFIC_PID_FILE to quote/escape the substitution to defer
evaluation until the user runs the printed command.

In `@Labs/10-Istio/monitor.sh`:
- Around line 373-394: The script currently treats any unrecognized non-empty
CLI argument as empty and falls through into interactive prompts; update the
top-level mode handling (the local variable "mode" and the branches that call
show_summary, test_pipeline, etc.) to explicitly detect invalid non-empty modes:
after the existing "full", "test", and "summary" checks add an else-if that
checks if "$mode" is non-empty and not one of the recognized flags, then print a
brief error/usage message (e.g., "Invalid mode: $mode. Valid modes:
full,test,summary") to stderr and exit with a non-zero code to prevent entering
interactive read prompts. Ensure the conditional references the same "mode"
variable and returns non-zero so automation won't hang.
- Around line 199-201: The CronJob probe commands that populate CRONJOB and the
later cronjob lookup for LAST_JOB call kubectl and can exit non-zero under set
-euo pipefail, which will abort the script when the CronJob is absent; update
the two kubectl invocations that assign CRONJOB and the later cronjob check to
append "|| true" so the command returns success on missing resources and the
subsequent fallback logic (variables CRONJOB and LAST_JOB and the surrounding if
checks) can handle absence gracefully.

In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 61-65: The verification should treat missing Istio sidecars as an
error, not just a warning: in the block that checks CONTAINERS for "istio-proxy"
(the if/else using print_success/print_warning), change the else branch to
increment the global error counter (errors) and call print_error (or at minimum
still call print_warning but also do errors=$((errors+1))) so missing sidecars
are counted toward failure; update the else branch where print_warning "$app:
Running (NO sidecar!)" is emitted to also increment errors so the final status
fails when injection is broken.

---

Duplicate comments:
In `@Labs/10-Istio/monitor.sh`:
- Line 369: The unguarded clear call in main() can fail in non-interactive/CI
environments and abort the script; wrap the clear invocation (the plain clear
call in main()) with a condition that ensures it only runs in an interactive TTY
(e.g., test -t 1) and/or when TERM is set, so the script continues to execute
full/test/summary in CI; update main() to perform this guard check before
calling clear.

In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 44-52: The health-check loop for addons (for addon in prometheus
grafana kiali jaeger) omits Loki, so update the loop to include "loki" (e.g.,
for addon in prometheus grafana kiali jaeger loki) so that the existing check
that uses STATUS, print_success, print_error and increments errors covers Loki
as well; ensure the same jsonpath and namespace logic is used for the new "loki"
entry.

---

Nitpick comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 158-167: start_traffic_generator currently uses nohup
unconditionally which will cause the script to abort under set -euo pipefail if
nohup is missing; modify start_traffic_generator to mirror start_port_forward by
checking for nohup via command -v nohup and using nohup ... & only if present,
otherwise fall back to launching the same bash -c loop in the background without
nohup, keeping the same redirection to TRAFFIC_LOG_FILE and saving the
background PID to TRAFFIC_PID_FILE; reference the start_traffic_generator
function and the variables TRAFFIC_LOG_FILE and TRAFFIC_PID_FILE when making the
change.
- Around line 100-104: The script uses cat twice on "$TRAFFIC_PID_FILE" between
the kill -0 check and the kill call, creating a TOCTOU race and redundant IO;
cache the PID by reading it once into a local variable (e.g., pid) before the
kill -0 test, use that variable for both the kill -0 and kill calls, and still
handle failures gracefully and remove TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE as
before; update the block that references TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE
accordingly.
- Around line 246-251: Remove the redundant labeling of the demo namespace in
demo.sh: delete the kubectl command that runs "kubectl label namespace demo
istio-injection=enabled --overwrite 2>/dev/null || true" in the Step 05 block
(the earlier kubectl label call for namespace demo), since the demo namespace is
created later and the actual label is applied in Step 11; keep the default
namespace labeling and the later demo labeling in Step 11.

In `@Labs/10-Istio/monitor.sh`:
- Around line 25-63: Replace the duplicated helper implementations by sourcing
the shared common helpers and removing the local defs: add a single source of
the shared common.sh at the top of this script and delete the local
print_header() and check_prerequisites() implementations; ensure the script
relies on the shared print_header and check_prerequisites symbols, adjust any
call sites if the shared check_prerequisites has a broader signature, and run a
quick smoke test (and shellcheck) to confirm no regressions.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dca7c98 and c788987.

📒 Files selected for processing (6)
  • Labs/10-Istio/demo.sh
  • Labs/10-Istio/demo/01-basic-setup/demo.sh
  • Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh
  • Labs/10-Istio/monitor.sh
  • Labs/10-Istio/scripts/01-install-istio.sh
  • Labs/10-Istio/scripts/05-verify.sh

Comment thread Labs/10-Istio/demo.sh
Comment on lines +77 to 80
for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do
if ! grep -q "$host" /etc/hosts 2>/dev/null; then
HOSTS_NEEDED="$HOSTS_NEEDED $host"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n Labs/10-Istio/demo.sh | sed -n '75,85p'

Repository: nirgeier/KubernetesLabs

Length of output: 511


Use fixed-string matching for /etc/hosts host checks to avoid regex interpretation and commented entries.

The current grep -q "$host" has two issues:

  1. The . in .local is treated as a regex wildcard matching any character, potentially causing false positives
  2. More importantly, commented entries like # 127.0.0.1 kiali.local would incorrectly match, causing the script to skip adding the host even though it's not actually configured
🔧 Proposed fix
   for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do
-    if ! grep -q "$host" /etc/hosts 2>/dev/null; then
+    if ! grep -vE '^[[:space:]]*#' /etc/hosts 2>/dev/null | grep -qwF "$host"; then
       HOSTS_NEEDED="$HOSTS_NEEDED $host"
     fi
   done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do
if ! grep -q "$host" /etc/hosts 2>/dev/null; then
HOSTS_NEEDED="$HOSTS_NEEDED $host"
fi
for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do
if ! grep -vE '^[[:space:]]*#' /etc/hosts 2>/dev/null | grep -qwF "$host"; then
HOSTS_NEEDED="$HOSTS_NEEDED $host"
fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo.sh` around lines 77 - 80, In the for-loop that builds
HOSTS_NEEDED, replace the plain grep check that can match regexes and commented
lines with a fixed-string, non-commented-line check; for example, change the
condition that uses grep -q "$host" /etc/hosts to an awk invocation that ignores
lines starting with # and searches for the literal host (e.g., use awk -v
h="$host" '!/^[[:space:]]*#/ && index($0,h){exit 0} END{exit 1}' /etc/hosts) so
HOSTS_NEEDED is only left unchanged when an actual, non-commented hosts entry
for the literal host exists.

Comment on lines 361 to 366
sleep 2

if command -v curl >/dev/null 2>&1; then
curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check .port-forward-istio-system-kiali-20001.log)"
curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check .port-forward-istio-system-istio-ingressgateway-8080.log)"
curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)"
curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

sleep 2 is likely too short before probing port-forward readiness.

kubectl port-forward must complete its API-server handshake and bind the local socket; 2 seconds is frequently insufficient on slow clusters or in CI, producing false "not reachable yet" messages. A short retry loop or a longer sleep would improve reliability.

🛡️ Proposed fix — lightweight retry loop
-sleep 2
-
-if command -v curl >/dev/null 2>&1; then
-  curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)"
-  curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)"
-fi
+if command -v curl >/dev/null 2>&1; then
+  for attempt in $(seq 1 10); do
+    sleep 2
+    curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && { echo "Kiali: reachable on http://localhost:20001/kiali/"; break; }
+    [ "${attempt}" -eq 10 ] && echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)"
+  done
+  curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 361 - 366, The single
short sleep (sleep 2) before probing port-forward readiness is unreliable;
replace it with a lightweight retry loop that attempts the two curl probes (the
Kiali check to http://127.0.0.1:20001/kiali/ and the Bookinfo check to
http://127.0.0.1:8080/productpage) a few times with small waits (e.g., 5–10s)
and a max retry count, exiting the loop early when each service becomes
reachable and printing the existing "reachable" or "not reachable yet (check
${RUNTIME_DIR}/...log)" messages; update the block in demo.sh that currently
contains sleep 2 and the curl checks to use this retry logic so port-forward has
time to bind before giving up.

Comment on lines +18 to +21
while IFS= read -r pid; do
[ -n "$pid" ] || continue
kill "$pid" 2>/dev/null || true
done <"$PF_PID_FILE"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh | head -100

Repository: nirgeier/KubernetesLabs

Length of output: 4132


🏁 Script executed:

# Check the cleanup function and port-forward recording logic
grep -n "cleanup_port_forwards\|PF_PID_FILE\|echo.*>>" Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh | head -20

Repository: nirgeier/KubernetesLabs

Length of output: 705


🏁 Script executed:

# Run the verification script provided in the comment
rg -n -C2 'echo \$! >>"\$PF_PID_FILE"|kill -0 "\$pf_pid"|^\s*\[\[\s*"\$pid"\s*=~\s*\^\[0-9\]\+\$\s*\]\]' \
  Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh

Repository: nirgeier/KubernetesLabs

Length of output: 225


Record only live port-forward PIDs before persisting them.

The script writes $! immediately after backgrounding kubectl port-forward without verifying the process survives. If the command exits quickly (e.g., missing service, authentication failure), that PID can be reused later and the cleanup function will kill an unrelated process.

🔧 Proposed fix
 cleanup_port_forwards() {
   [ -f "$PF_PID_FILE" ] || return 0
   [ -s "$PF_PID_FILE" ] || return 0
   while IFS= read -r pid; do
     [ -n "$pid" ] || continue
+    [[ "$pid" =~ ^[0-9]+$ ]] || continue
     kill "$pid" 2>/dev/null || true
   done <"$PF_PID_FILE"
   rm -f "$PF_PID_FILE" >/dev/null 2>&1 || true
 }
@@
   if command -v nohup >/dev/null 2>&1; then
     nohup kubectl -n "${namespace}" port-forward "svc/${svc}" "${local_port}:${remote_port}" >"${log_file}" 2>&1 &
   else
     kubectl -n "${namespace}" port-forward "svc/${svc}" "${local_port}:${remote_port}" >"${log_file}" 2>&1 &
   fi
-  echo $! >>"$PF_PID_FILE"
+  local pf_pid="$!"
+  sleep 1
+  if kill -0 "$pf_pid" 2>/dev/null; then
+    printf '%s\n' "$pf_pid" >>"$PF_PID_FILE"
+  else
+    echo "Warning: port-forward failed for ${namespace}/svc/${svc}. Check ${log_file}" >&2
+  fi
 }

Applies to lines 18–21 and 69–75.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 18 - 21,
When backgrounding kubectl port-forward, don't write $! to PF_PID_FILE
unconditionally; capture the background PID (pid=$!) then verify the process is
still the port-forward (alive) before persisting it by using a liveliness check
(e.g., kill -0 "$pid" >/dev/null 2>&1 or a brief sleep+check) and only append to
PF_PID_FILE if the check succeeds; apply the same change to both places where
port-forward PIDs are recorded and ensure the cleanup loop that reads
PF_PID_FILE still skips empty lines and only kills recorded PIDs (as it already
does).

Comment on lines +25 to +50
for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do
release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
[ -n "$release" ] || continue
if [ "$release" != "istio-base" ]; then
print_info "Removing $crd (not managed by Helm istio-base)"
kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
fi
done

for webhook in $(kubectl get validatingwebhookconfiguration -o name 2>/dev/null | grep -i istio || true); do
release=$(kubectl get "$webhook" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
[ -n "$release" ] || continue
if [ "$release" != "istio-base" ]; then
print_info "Removing $webhook (not managed by Helm istio-base)"
kubectl delete "$webhook" --ignore-not-found 2>/dev/null || true
fi
done

for webhook in $(kubectl get mutatingwebhookconfiguration -o name 2>/dev/null | grep -i istio || true); do
release=$(kubectl get "$webhook" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true)
[ -n "$release" ] || continue
if [ "$release" != "istio-base" ]; then
print_info "Removing $webhook (not managed by Helm istio-base)"
kubectl delete "$webhook" --ignore-not-found 2>/dev/null || true
fi
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Unannotated conflicting resources are skipped instead of cleaned.

The guard "[ -n "$release" ] || continue" prevents deletion of resources with no Helm release annotation, so stale Istio resources from prior installs can still block Helm with ownership conflicts.

🔧 Proposed fix
-    [ -n "$release" ] || continue
-    if [ "$release" != "istio-base" ]; then
+    if [ -z "$release" ] || [ "$release" != "istio-base" ]; then
       print_info "Removing $crd (not managed by Helm istio-base)"
       kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
     fi

Apply the same condition pattern to the validating webhook, mutating webhook, clusterrole, and clusterrolebinding loops.

Also applies to: 70-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 25 - 50, The current
guard '[ -n "$release" ] || continue' skips resources that lack a Helm release
annotation, leaving stale Istio objects; update each cleanup loop (the for crd
in $(kubectl get crd ...), for webhook in $(kubectl get
validatingwebhookconfiguration ...), for webhook in $(kubectl get
mutatingwebhookconfiguration ...), and the clusterrole/clusterrolebinding
cleanup loops) to delete resources when the release is empty or not equal to
"istio-base" (e.g. replace the guard with a conditional like: if [ -z "$release"
] || [ "$release" != "istio-base" ]; then print_info ...; kubectl delete ...;
fi) so unannotated conflicting resources are removed.

Comment on lines +85 to +86
METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true
if [ "${METRIC_COUNT:-0}" -gt 0 ]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Istio metrics check uses a false-positive condition.

Counting "value" keys does not validate metric value; it can pass even when count(istio_requests_total) is 0.

🔧 Proposed fix
-    METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true
-    if [ "${METRIC_COUNT:-0}" -gt 0 ]; then
+    METRIC_VALUE=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | awk -F'"' '/"value":\[/ {print $(NF-1); exit}') || true
+    if awk "BEGIN {exit !(${METRIC_VALUE:-0} > 0)}"; then
       print_success "Istio metrics available in Prometheus"
     else
       print_warning "No Istio metrics yet (traffic may need more time)"
     fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true
if [ "${METRIC_COUNT:-0}" -gt 0 ]; then
METRIC_VALUE=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | awk -F'"' '/"value":\[/ {print $(NF-1); exit}') || true
if awk "BEGIN {exit !(${METRIC_VALUE:-0} > 0)}"; then
print_success "Istio metrics available in Prometheus"
else
print_warning "No Istio metrics yet (traffic may need more time)"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Labs/10-Istio/scripts/05-verify.sh` around lines 85 - 86, The current check
stores METRIC_COUNT by grepping for `"value"` in the Prometheus HTTP response
(involving PROM_POD and the query count(istio_requests_total)) which yields
false positives even when the metric value is 0; update the logic to parse the
JSON response and examine the actual numeric result instead of counting
`"value"` strings — call the Prometheus API in the same block that defines
PROM_POD, capture the raw JSON, extract the numeric value from
.data.result[0].value[1] (or treat result length 0 as zero), set METRIC_COUNT to
that numeric value (or 0) and then test if METRIC_COUNT -gt 0 to decide the
if-branch. Ensure you use the same kubectl exec to fetch the JSON and avoid
relying on grep/wc for presence checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant