lab10: Istio sidecar injection, Loki observability addon, and deploy/cleanup refactor#8
lab10: Istio sidecar injection, Loki observability addon, and deploy/cleanup refactor#8talorlik wants to merge 5 commits into
Conversation
📝 WalkthroughWalkthroughAdds Loki to the observability stack, enables Istio sidecar injection for observability components, replaces nginx Ingress with Istio Gateway VirtualServices, and refactors scripts (LAB_DIR sourcing, improved cleanup, port-forward guards, tolerant verification) and manifests (security contexts, PVC cleanup). Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DNS as DNS/hosts
participant Gateway as Istio Ingress Gateway
participant VS as VirtualService (bookinfo-gateway)
participant Service as Addon Service (loki)
participant Pod as Loki Pod
User->>DNS: resolve loki.local
DNS-->>User: return GATEWAY_IP
User->>Gateway: HTTP request for loki.local:3100
Gateway->>VS: match host -> route to service
VS->>Service: forward to loki:3100
Service->>Pod: kube-proxy -> pod (StatefulSet)
Pod-->>Service: response
Service-->>VS: response
VS-->>Gateway: response
Gateway-->>User: response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Labs/10-Istio/manifests/addons/kiali.yaml (1)
144-175:⚠️ Potential issue | 🟠 MajorAdd explicit security context for the Kiali pod/container.
The deployment currently relies on default security context, which allows root-capable execution. Please harden the pod/container security context.
🔒 Proposed hardening patch
spec: + securityContext: + seccompProfile: + type: RuntimeDefault serviceAccountName: kiali containers: - name: kiali + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + capabilities: + drop: ["ALL"] image: quay.io/kiali/kiali:v1.86🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/kiali.yaml` around lines 144 - 175, The Kiali Deployment lacks explicit pod/container securityContext allowing root execution; update the pod spec and container spec where serviceAccountName is "kiali" and the container named "kiali" to harden it: add a podSecurityContext with fsGroup (e.g., 1000) and runAsNonRoot true, and set the container securityContext to runAsUser/runAsGroup (non-root UID/GID, e.g., 1000), runAsNonRoot: true, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, drop all capabilities, and set seccompProfile to RuntimeDefault to enforce OS-level sandboxing.Labs/10-Istio/manifests/addons/grafana.yaml (1)
72-100:⚠️ Potential issue | 🟠 MajorHarden Grafana pod/container security context.
The deployment is still using default security context; this leaves the workload root-capable by default.
🔒 Proposed hardening patch
spec: + securityContext: + seccompProfile: + type: RuntimeDefault containers: - name: grafana + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + capabilities: + drop: ["ALL"] image: grafana/grafana:10.4.1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/grafana.yaml` around lines 72 - 100, The Grafana Deployment leaves pods root-capable; add a hardened securityContext at pod and container levels for the grafana container: in the pod spec (spec.template.spec) set runAsNonRoot: true, runAsUser to a non-zero uid (e.g. 1000) and fsGroup to a non-zero gid; in the grafana container spec (spec.template.spec.containers where name == "grafana") add securityContext with readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities.drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault" } (and set runAsGroup/runAsUser here if you prefer container-level). Ensure these keys are added alongside the existing env/resources entries so the pod cannot start as root or escalate privileges.
🧹 Nitpick comments (5)
Labs/10-Istio/manifests/ingress.yaml (1)
95-116: Consider removing this Loki nginx Ingress if gateway routing is now canonical.This PR shifts addon access to Istio gateway routes; keeping a parallel nginx Ingress path increases config drift and can create an unintended alternate exposure path.
♻️ If nginx ingress is no longer needed, remove this block
-# Loki Ingress -apiVersion: networking.k8s.io/v1 -kind: Ingress -metadata: - name: loki - namespace: istio-system - labels: - app.kubernetes.io/name: loki -spec: - ingressClassName: nginx - rules: - - host: loki.local - http: - paths: - - path: / - pathType: Prefix - backend: - service: - name: loki - port: - number: 3100 ----🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/ingress.yaml` around lines 95 - 116, Remove the nginx Ingress resource for Loki to avoid creating an alternate exposure path now that Istio gateway routing is canonical: delete the Ingress manifest block for kind: Ingress with metadata.name: loki (namespace: istio-system) that sets ingressClassName: nginx and host: loki.local; if you must keep it temporarily, add a clear comment above the resource explaining it's legacy and intended for fallback only, but prefer removing the resource to prevent config drift with the Istio Gateway/VirtualService routes that now serve Loki.Labs/10-Istio/manifests/addons/prometheus.yaml (1)
137-139: Redundant annotation — only the label is needed for sidecar injection in modern Istio.In modern Istio, setting
sidecar.istio.io/inject: "true"as a label on the pod template spec triggers sidecar injection, whereas the same value as an annotation does not. The annotation is a legacy mechanism and is superseded by the label in Istio 1.10+. Keeping the annotation alongside the label is harmless for backward compatibility, but you may want to document this intent explicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/prometheus.yaml` around lines 137 - 139, The pod spec currently sets sidecar.istio.io/inject: "true" both as a label and as an annotation; remove the redundant annotation entry (the annotations: sidecar.istio.io/inject key) and keep only the pod template label sidecar.istio.io/inject: "true" (or add a brief comment explaining it's intentionally retained for legacy compatibility) so that only the label controls Istio sidecar injection (refer to the sidecar.istio.io/inject key in the pod template).Labs/10-Istio/manifests/addons/loki.yaml (1)
265-265: Hardcodedchecksum/configannotation will silently stale when the ConfigMap changes.This annotation is typically generated at
helm template/helm upgradetime. Since it's committed as a static value, any manual edit to thelokiConfigMap will not trigger a pod rollout. Either remove it or add a note that it must be updated whenconfig.yamlchanges.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/loki.yaml` at line 265, The committed static checksum/config annotation will prevent pod rollouts when the referenced ConfigMap changes; remove the hardcoded "checksum/config: 7beee9ce6dba8de1e670322d1f02ca6271acbff4e76efe12d443937c11f802f8" from the pod template (or update it as part of your Helm pipeline) so that changes to the Loki ConfigMap (config.yaml) trigger restarts; search for the literal annotation key "checksum/config" in the Loki manifest and either delete that annotation from the Deployment/StatefulSet podTemplate or replace it with a templated/value that is updated when the ConfigMap changes.Labs/10-Istio/scripts/01-install-istio.sh (2)
68-81:grep -i istiomay inadvertently delete ClusterRoles owned byistio-base.The condition
release != "istiod"will delete any ClusterRole/ClusterRoleBinding containingistioin its name that isn't owned by theistiodHelm release — including those that might be owned byistio-base(release-name ="istio-base"). Ifistio-basecreates any cluster-scoped roles in some versions, they'd be wiped here just beforeistiodinstall. Consider guarding with[ "$release" != "istiod" ] && [ "$release" != "istio-base" ].🔧 Proposed fix
for role in $(kubectl get clusterrole -o name 2>/dev/null | grep -i istio || true); do release=$(kubectl get "$role" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) - if [ "$release" != "istiod" ]; then + if [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]; then print_info "Removing $role (not managed by Helm istiod)" kubectl delete "$role" --ignore-not-found 2>/dev/null || true fi done for binding in $(kubectl get clusterrolebinding -o name 2>/dev/null | grep -i istio || true); do release=$(kubectl get "$binding" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) - if [ "$release" != "istiod" ]; then + if [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]; then print_info "Removing $binding (not managed by Helm istiod)" kubectl delete "$binding" --ignore-not-found 2>/dev/null || true fi done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 68 - 81, The cleanup loops for ClusterRole/ClusterRoleBinding (variables role, binding and the release extraction using kubectl jsonpath) currently delete any resource containing "istio" unless its release equals "istiod"; update the conditional to also skip resources owned by the "istio-base" release by ensuring the if-test checks that release is neither "istiod" nor "istio-base" (i.e., change the condition around release for both loops to require [ "$release" != "istiod" ] && [ "$release" != "istio-base" ] before calling kubectl delete), keeping the rest of the logic (print_info and --ignore-not-found) unchanged.
23-48:--wait=falseon CRD deletion followed by a fixedsleep 3may not be sufficient.CRDs with existing custom resources (from a prior install) go through garbage collection before being fully removed. Three seconds is an optimistic bound; the Helm install that immediately follows may race against an in-progress CRD deletion, resulting in
"already exists"or"invalid ownership metadata"errors — the exact problem this cleanup aims to prevent.Consider either:
- Using
--wait(drop--wait=false) and allowing Kubernetes to confirm removal before proceeding, or- Increasing the sleep and adding a
kubectl waitpoll for the CRD to disappear.🔧 Proposed fix
for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) if [ "$release" != "istio-base" ]; then print_info "Removing $crd (not managed by Helm istio-base)" - kubectl delete "$crd" --ignore-not-found --wait=false 2>/dev/null || true + kubectl delete "$crd" --ignore-not-found 2>/dev/null || true fi done - # Allow removal to complete before Helm install - sleep 3 + # Allow removal to complete before Helm install + sleep 5🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 23 - 48, The CRD cleanup loop uses kubectl delete "$crd" --ignore-not-found --wait=false and then a fixed sleep 3, which can race with Helm; change the CRD deletion to wait for actual removal (e.g., drop --wait=false so kubectl waits, or keep delete non-blocking but follow it with a kubectl wait --for=delete on the same resource name) and remove or extend the fixed sleep; apply the same waiting logic after the CRD loop (and optionally for the validating/mutating webhook deletions) so the script only proceeds to Helm install once kubectl wait confirms the resources are gone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 154-156: The cleanup step currently deletes addons and routes but
doesn't remove PVCs created by Loki's volumeClaimTemplates (e.g.,
storage-loki-0), leaving orphaned 10Gi volumes; after the existing kubectl
delete -f "${LAB_DIR}/manifests/addons/" command add an explicit PVC deletion
step that targets Loki's PVCs (for example by name "storage-loki-0" and/or by
label selector used by the Loki StatefulSet) or delete PVCs owned by the Loki
StatefulSet to ensure the persistent volume is removed after the StatefulSet is
gone; place this PVC deletion immediately after the addons delete and before the
print_success line so orphaned storage is cleaned up.
In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Around line 359-367: The anti-affinity currently uses
requiredDuringSchedulingIgnoredDuringExecution which can block scheduling on
single-node labs; change the pod anti-affinity to use
preferredDuringSchedulingIgnoredDuringExecution instead, replacing the required
block with a preferred entry (with a weight and a podAffinityTerm containing the
same labelSelector/matchLabels and topologyKey) so the scheduler treats it as a
soft preference; update the affinity stanza around affinity -> podAntiAffinity
and remove the requiredDuringSchedulingIgnoredDuringExecution key while reusing
the existing matchLabels (app.kubernetes.io/component,
app.kubernetes.io/instance, app.kubernetes.io/name) and topologyKey
kubernetes.io/hostname.
- Around line 120-153: The ClusterRole "loki-clusterrole" currently grants
cluster-wide read access to "secrets" which is over-privileged; modify the RBAC
so the rules for loki-clusterrole only include "configmaps" (remove "secrets")
or alternatively replace the ClusterRole/ClusterRoleBinding pair
(loki-clusterrole and loki-clusterrolebinding) with a namespaced Role and
RoleBinding bound to the loki ServiceAccount in istio-system so Loki cannot read
secrets cluster-wide.
- Around line 287-358: The loki and loki-sc-rules containers currently have no
resource limits (resources: {} or omitted), which can allow unbounded CPU/memory
use; add a resources block to both container specs (for container names "loki"
and "loki-sc-rules") including requests and limits for cpu and memory (e.g.,
requests: cpu 100m, memory 128Mi; limits: cpu 500m, memory 512Mi for loki and
smaller for the sidecar such as requests 50m/64Mi and limits 200m/256Mi),
ensuring the resources field is present under each container entry so the
cluster can enforce QoS and avoid OOM/CPU starvation.
- Around line 350-358: The loki-sc-rules sidecar (k8s-sidecar image) is marked
readOnlyRootFilesystem: true but lacks a writable /tmp mount; add a volumeMount
for mountPath "/tmp" to the loki-sc-rules container (e.g., name it "tmp") and
add a corresponding emptyDir volume entry in the Pod spec's volumes so the
sidecar has a writable /tmp; keep the existing sc-rules-volume mount and
securityContext unchanged except ensure the new /tmp mount provides write access
for temporary files.
In `@Labs/10-Istio/manifests/bookinfo-gateway.yaml`:
- Around line 27-29: The VirtualService host restriction 'bookinfo.local'
prevents matches for in-cluster requests sent to
istio-ingressgateway.istio-system.svc.cluster.local; update the VirtualService
(the hosts list in the VirtualService resource) to include the cluster-local
FQDN used by the traffic generator
(istio-ingressgateway.istio-system.svc.cluster.local) or replace the single host
with a broader match/wildcard so requests that don’t override the Host header
will also match the VirtualService.
---
Outside diff comments:
In `@Labs/10-Istio/manifests/addons/grafana.yaml`:
- Around line 72-100: The Grafana Deployment leaves pods root-capable; add a
hardened securityContext at pod and container levels for the grafana container:
in the pod spec (spec.template.spec) set runAsNonRoot: true, runAsUser to a
non-zero uid (e.g. 1000) and fsGroup to a non-zero gid; in the grafana container
spec (spec.template.spec.containers where name == "grafana") add securityContext
with readOnlyRootFilesystem: true, allowPrivilegeEscalation: false,
capabilities.drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault" } (and
set runAsGroup/runAsUser here if you prefer container-level). Ensure these keys
are added alongside the existing env/resources entries so the pod cannot start
as root or escalate privileges.
In `@Labs/10-Istio/manifests/addons/kiali.yaml`:
- Around line 144-175: The Kiali Deployment lacks explicit pod/container
securityContext allowing root execution; update the pod spec and container spec
where serviceAccountName is "kiali" and the container named "kiali" to harden
it: add a podSecurityContext with fsGroup (e.g., 1000) and runAsNonRoot true,
and set the container securityContext to runAsUser/runAsGroup (non-root UID/GID,
e.g., 1000), runAsNonRoot: true, readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, drop all capabilities, and set seccompProfile
to RuntimeDefault to enforce OS-level sandboxing.
---
Nitpick comments:
In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Line 265: The committed static checksum/config annotation will prevent pod
rollouts when the referenced ConfigMap changes; remove the hardcoded
"checksum/config:
7beee9ce6dba8de1e670322d1f02ca6271acbff4e76efe12d443937c11f802f8" from the pod
template (or update it as part of your Helm pipeline) so that changes to the
Loki ConfigMap (config.yaml) trigger restarts; search for the literal annotation
key "checksum/config" in the Loki manifest and either delete that annotation
from the Deployment/StatefulSet podTemplate or replace it with a templated/value
that is updated when the ConfigMap changes.
In `@Labs/10-Istio/manifests/addons/prometheus.yaml`:
- Around line 137-139: The pod spec currently sets sidecar.istio.io/inject:
"true" both as a label and as an annotation; remove the redundant annotation
entry (the annotations: sidecar.istio.io/inject key) and keep only the pod
template label sidecar.istio.io/inject: "true" (or add a brief comment
explaining it's intentionally retained for legacy compatibility) so that only
the label controls Istio sidecar injection (refer to the sidecar.istio.io/inject
key in the pod template).
In `@Labs/10-Istio/manifests/ingress.yaml`:
- Around line 95-116: Remove the nginx Ingress resource for Loki to avoid
creating an alternate exposure path now that Istio gateway routing is canonical:
delete the Ingress manifest block for kind: Ingress with metadata.name: loki
(namespace: istio-system) that sets ingressClassName: nginx and host:
loki.local; if you must keep it temporarily, add a clear comment above the
resource explaining it's legacy and intended for fallback only, but prefer
removing the resource to prevent config drift with the Istio
Gateway/VirtualService routes that now serve Loki.
In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 68-81: The cleanup loops for ClusterRole/ClusterRoleBinding
(variables role, binding and the release extraction using kubectl jsonpath)
currently delete any resource containing "istio" unless its release equals
"istiod"; update the conditional to also skip resources owned by the
"istio-base" release by ensuring the if-test checks that release is neither
"istiod" nor "istio-base" (i.e., change the condition around release for both
loops to require [ "$release" != "istiod" ] && [ "$release" != "istio-base" ]
before calling kubectl delete), keeping the rest of the logic (print_info and
--ignore-not-found) unchanged.
- Around line 23-48: The CRD cleanup loop uses kubectl delete "$crd"
--ignore-not-found --wait=false and then a fixed sleep 3, which can race with
Helm; change the CRD deletion to wait for actual removal (e.g., drop
--wait=false so kubectl waits, or keep delete non-blocking but follow it with a
kubectl wait --for=delete on the same resource name) and remove or extend the
fixed sleep; apply the same waiting logic after the CRD loop (and optionally for
the validating/mutating webhook deletions) so the script only proceeds to Helm
install once kubectl wait confirms the resources are gone.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
Labs/10-Istio/demo.shLabs/10-Istio/manifests/addons/grafana.yamlLabs/10-Istio/manifests/addons/jaeger.yamlLabs/10-Istio/manifests/addons/kiali.yamlLabs/10-Istio/manifests/addons/loki.yamlLabs/10-Istio/manifests/addons/prometheus.yamlLabs/10-Istio/manifests/bookinfo-gateway.yamlLabs/10-Istio/manifests/ingress.yamlLabs/10-Istio/manifests/observability-routes.yamlLabs/10-Istio/manifests/traffic-generator.yamlLabs/10-Istio/scripts/01-install-istio.shLabs/10-Istio/scripts/02-install-addons.shLabs/10-Istio/scripts/03-deploy-bookinfo.shLabs/10-Istio/scripts/04-traffic-generator.sh
Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
…deploy/cleanup processes. Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Labs/10-Istio/manifests/ingress.yaml (1)
4-4:⚠️ Potential issue | 🟡 MinorUpdate the header comment to include
loki.local.The comment lists the hosts but omits the newly added
loki.local.📝 Proposed fix
-# Hosts: kiali.local, grafana.local, jaeger.local, prometheus.local, bookinfo.local +# Hosts: kiali.local, grafana.local, jaeger.local, prometheus.local, loki.local, bookinfo.local🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/ingress.yaml` at line 4, Update the header comment that lists hosts to include the newly added host by appending "loki.local" to the existing comma-separated list (the comment line starting with "# Hosts: kiali.local, grafana.local, jaeger.local, prometheus.local, bookinfo.local"); ensure formatting matches the existing comment (comma and space separation).Labs/10-Istio/demo.sh (1)
184-184:⚠️ Potential issue | 🟠 Major
set -uwill abort with "unbound variable" when no argument is given.Running
./demo.shwithout arguments hits${1}whilenounsetis active, producing a cryptic bash error instead of the friendly usage text on line 192.Proposed fix
-case "${1}" in +case "${1:-}" in🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo.sh` at line 184, The script fails under set -u because case "${1}" expands an unset positional and aborts; change the case to use a safe default (e.g. case "${1:-}" in) or explicitly check $# -eq 0 before the case and print the existing usage string, so the demo.sh entry point (the case "${1}" in) won't trigger an "unbound variable" when no argument is provided.
♻️ Duplicate comments (8)
Labs/10-Istio/manifests/addons/loki.yaml (4)
120-153: ClusterRole grants cluster-wide secrets read access — already flagged.The
secretsresource in the ClusterRole is still present per the previous review. If secrets are removed from the ClusterRole, also update theloki-sc-rulescontainer env varRESOURCEfrom"both"to"configmap"(line 343) to stay consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 120 - 153, Remove "secrets" from the ClusterRole resources list in the ClusterRole resource named "loki-clusterrole" (keep only "configmaps") and then update the loki-sc-rules container environment variable RESOURCE from "both" to "configmap" so permissions and the container config remain consistent; locate the ClusterRole block with metadata.name: loki-clusterrole and the container definition for loki-sc-rules that defines env var RESOURCE and change the values accordingly.
287-331: Missing resource limits for both containers — already flagged.
lokihasresources: {}(line 330-331) andloki-sc-rulesomits the field entirely. See previous review for suggested bounds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 287 - 331, The loki and loki-sc-rules containers are missing resource constraints: update the container specs (container name "loki" and container name "loki-sc-rules") to include appropriate resources.requests and resources.limits blocks (e.g. CPU and memory request/limit values consistent with the previous review bounds) instead of resources: {} or omitting the field; add the same or adjusted values to both containers' spec.volumeMounts sections so Kubernetes can schedule and limit them reliably.
359-367: Hard anti-affinity blocks single-node lab clusters — already flagged.
requiredDuringSchedulingIgnoredDuringExecutionis still in place. See previous review for thepreferredDuringSchedulingalternative.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 359 - 367, The affinity block uses hard pod anti-affinity (requiredDuringSchedulingIgnoredDuringExecution), which prevents scheduling on single-node labs; replace requiredDuringSchedulingIgnoredDuringExecution under affinity.podAntiAffinity with preferredDuringSchedulingIgnoredDuringExecution and convert the existing list item into a WeightedPodAffinityTerm (set weight (e.g., 100) and move the current podAffinityTerm contents — labelSelector with matchLabels and topologyKey — under podAffinityTerm) so anti-affinity becomes a preference rather than a hard requirement (update the affinity/podAntiAffinity structure where requiredDuringSchedulingIgnoredDuringExecution is declared).
350-358:loki-sc-rulessidecar missing/tmpmount — already flagged.The sidecar has
readOnlyRootFilesystem: truebut no writable/tmpemptyDir, unlike the mainlokicontainer. See previous review for the fix.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/loki.yaml` around lines 350 - 358, The sidecar container "loki-sc-rules" is marked readOnlyRootFilesystem: true but lacks a writable /tmp mount; add an emptyDir volume (e.g. name: sc-rules-tmp) to the Pod spec's volumes and add a volumeMount in the loki-sc-rules container mounting that volume at /tmp (mountPath: "/tmp") so the sidecar gets a writable tmpfs while keeping other mounts read-only; ensure the new volume name is referenced exactly in the loki-sc-rules volumeMount entry.Labs/10-Istio/manifests/addons/grafana.yaml (1)
70-74: Sidecar injection annotation correctly enabled; same inert label note as Jaeger.The annotation change from
"false"to"true"is correct. Thesidecar.istio.io/injectlabel on line 72 is the same pattern noted in the Jaeger review.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/grafana.yaml` around lines 70 - 74, There is a duplicate Istio sidecar injection marker — remove the sidecar.istio.io/inject entry from the metadata.labels block (the labels: app: grafana / sidecar.istio.io/inject) and keep the sidecar.istio.io/inject annotation in annotations: so only the annotation controls injection; update the labels block to only include app: grafana and ensure no duplicate label key remains.Labs/10-Istio/manifests/addons/prometheus.yaml (1)
135-139: Same sidecar label/annotation pattern — annotation correctly enables injection.Same note as the Jaeger review regarding the inert
sidecar.istio.io/injectlabel.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/prometheus.yaml` around lines 135 - 139, The labels block contains an inert sidecar.istio.io/inject label that should be removed—keep only the annotation sidecar.istio.io/inject: "true" (in the annotations block) to enable Istio sidecar injection; update the manifest by deleting the labels entry sidecar.istio.io/inject and leaving app: prometheus unchanged so only the annotation controls injection.Labs/10-Istio/manifests/addons/kiali.yaml (1)
142-146: Same sidecar label/annotation pattern — annotation correctly enables injection.Same note as the Jaeger review regarding the inert
sidecar.istio.io/injectlabel.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/kiali.yaml` around lines 142 - 146, The manifest sets sidecar.istio.io/inject as both a label and an annotation under the kiali resource; the label is inert and should be removed to avoid confusion—keep the annotation that actually enables injection. Edit the labels block (where app: kiali appears) to remove the sidecar.istio.io/inject label and ensure sidecar.istio.io/inject: "true" remains only under the annotations block so injection behavior is controlled solely by the annotation.Labs/10-Istio/demo.sh (1)
154-156: Loki PVCs fromvolumeClaimTemplatesare still not cleaned up.As noted in a prior review,
kubectl delete -f addons/removes the StatefulSet but leaves its PVCs (e.g.,storage-loki-0) orphaned. This concern is still unaddressed.Proposed fix — delete Loki PVCs after StatefulSet removal
kubectl delete -f "${LAB_DIR}/manifests/addons/" 2>/dev/null || true + kubectl delete pvc -l app.kubernetes.io/name=loki -n istio-system 2>/dev/null || true print_success "Addons and gateway routes removed"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo.sh` around lines 154 - 156, After removing the addons StatefulSet with kubectl delete -f "${LAB_DIR}/manifests/addons/", also explicitly delete the orphaned Loki PVCs (e.g., storage-loki-0) before printing success: run a kubectl delete pvc command that targets the Loki PVCs (by name pattern or by label selector used by the Loki StatefulSet) in the same namespace, ensure the command tolerates no-matches (suppress errors or use || true), then call print_success "Addons and gateway routes removed" as before; update the block that currently contains kubectl delete -f "${LAB_DIR}/manifests/addons/" and print_success to include this additional PVC cleanup step.
🧹 Nitpick comments (3)
Labs/10-Istio/manifests/addons/jaeger.yaml (1)
77-81:sidecar.istio.io/injectis correctly set as a label — this is the canonical Istio control mechanism.The label on line 79 is not inert; per official Istio documentation,
sidecar.istio.io/injectas a label on the pod template is the reliable and explicit way to control sidecar injection at the pod level. The annotation on line 81 is redundant. Both work, but having both is unnecessary — keep the label and consider removing the annotation for clarity.🧹 Remove redundant annotation
labels: app: jaeger sidecar.istio.io/inject: "true" annotations: - sidecar.istio.io/inject: "true" prometheus.io/scrape: "true" prometheus.io/port: "14269"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/manifests/addons/jaeger.yaml` around lines 77 - 81, The pod template currently sets sidecar.istio.io/inject both as a label and as an annotation; keep the canonical label (labels: sidecar.istio.io/inject: "true") and remove the redundant annotations block entry (annotations: sidecar.istio.io/inject: "true") so only the label remains in the Jaeger deployment/podTemplate (the entries around labels: app: jaeger and annotations: sidecar.istio.io/inject).Labs/10-Istio/demo.sh (2)
176-177: Consider--ignore-not-foundto suppress noisy "not found" errors.When some of these resources exist but others don't,
kubectl deleteemits error messages to stderr before|| trueswallows the exit code. Adding--ignore-not-foundkeeps the output clean and is consistent with the CRD cleanup loop on line 172.Suggested diff
- kubectl delete clusterrole istio-prometheus kiali loki-clusterrole 2>/dev/null || true - kubectl delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding 2>/dev/null || true + kubectl delete clusterrole istio-prometheus kiali loki-clusterrole --ignore-not-found 2>/dev/null || true + kubectl delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding --ignore-not-found 2>/dev/null || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo.sh` around lines 176 - 177, Replace the noisy kubectl delete invocations that pipe stderr to /dev/null with versions that use the --ignore-not-found flag: update the two commands that delete clusterroles ("kubectl delete clusterrole istio-prometheus kiali loki-clusterrole") and clusterrolebindings ("kubectl delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding") to include --ignore-not-found and remove the stderr redirection/|| true so missed resources do not print errors.
149-150: Minor: PeerAuthentication cleanup runs after thebookinfonamespace is already deleted.Lines 141–144 delete the
bookinfonamespace (which removes all namespaced resources within it), so thekubectl delete peerauthentication --all -n bookinfoon line 149 is effectively a no-op. Consider moving these lines before the namespace deletion, or removing line 149 entirely.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo.sh` around lines 149 - 150, The PeerAuthentication cleanup for the bookinfo namespace is redundant because the script deletes the bookinfo namespace earlier; either remove the redundant "kubectl delete peerauthentication --all -n bookinfo" line or move that "kubectl delete peerauthentication --all -n bookinfo" invocation to run before the "kubectl delete namespace bookinfo" call so the namespaced resource is actually deleted; keep the "kubectl delete peerauthentication --all -n istio-system" line as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 59-62: The assignments to GATEWAY_IP use command substitution with
kubectl which can return non-zero and abort the script under set -e; update both
kubectl calls that reference istio-ingressgateway (the jsonpath for
.status.loadBalancer.ingress[0].ip and the hostname fallback) to guard the
command substitution from propagating failure by appending || true to each
kubectl command so a missing/unready service won’t terminate the script and
GATEWAY_IP will simply be empty if not found.
---
Outside diff comments:
In `@Labs/10-Istio/demo.sh`:
- Line 184: The script fails under set -u because case "${1}" expands an unset
positional and aborts; change the case to use a safe default (e.g. case "${1:-}"
in) or explicitly check $# -eq 0 before the case and print the existing usage
string, so the demo.sh entry point (the case "${1}" in) won't trigger an
"unbound variable" when no argument is provided.
In `@Labs/10-Istio/manifests/ingress.yaml`:
- Line 4: Update the header comment that lists hosts to include the newly added
host by appending "loki.local" to the existing comma-separated list (the comment
line starting with "# Hosts: kiali.local, grafana.local, jaeger.local,
prometheus.local, bookinfo.local"); ensure formatting matches the existing
comment (comma and space separation).
---
Duplicate comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 154-156: After removing the addons StatefulSet with kubectl delete
-f "${LAB_DIR}/manifests/addons/", also explicitly delete the orphaned Loki PVCs
(e.g., storage-loki-0) before printing success: run a kubectl delete pvc command
that targets the Loki PVCs (by name pattern or by label selector used by the
Loki StatefulSet) in the same namespace, ensure the command tolerates no-matches
(suppress errors or use || true), then call print_success "Addons and gateway
routes removed" as before; update the block that currently contains kubectl
delete -f "${LAB_DIR}/manifests/addons/" and print_success to include this
additional PVC cleanup step.
In `@Labs/10-Istio/manifests/addons/grafana.yaml`:
- Around line 70-74: There is a duplicate Istio sidecar injection marker —
remove the sidecar.istio.io/inject entry from the metadata.labels block (the
labels: app: grafana / sidecar.istio.io/inject) and keep the
sidecar.istio.io/inject annotation in annotations: so only the annotation
controls injection; update the labels block to only include app: grafana and
ensure no duplicate label key remains.
In `@Labs/10-Istio/manifests/addons/kiali.yaml`:
- Around line 142-146: The manifest sets sidecar.istio.io/inject as both a label
and an annotation under the kiali resource; the label is inert and should be
removed to avoid confusion—keep the annotation that actually enables injection.
Edit the labels block (where app: kiali appears) to remove the
sidecar.istio.io/inject label and ensure sidecar.istio.io/inject: "true" remains
only under the annotations block so injection behavior is controlled solely by
the annotation.
In `@Labs/10-Istio/manifests/addons/loki.yaml`:
- Around line 120-153: Remove "secrets" from the ClusterRole resources list in
the ClusterRole resource named "loki-clusterrole" (keep only "configmaps") and
then update the loki-sc-rules container environment variable RESOURCE from
"both" to "configmap" so permissions and the container config remain consistent;
locate the ClusterRole block with metadata.name: loki-clusterrole and the
container definition for loki-sc-rules that defines env var RESOURCE and change
the values accordingly.
- Around line 287-331: The loki and loki-sc-rules containers are missing
resource constraints: update the container specs (container name "loki" and
container name "loki-sc-rules") to include appropriate resources.requests and
resources.limits blocks (e.g. CPU and memory request/limit values consistent
with the previous review bounds) instead of resources: {} or omitting the field;
add the same or adjusted values to both containers' spec.volumeMounts sections
so Kubernetes can schedule and limit them reliably.
- Around line 359-367: The affinity block uses hard pod anti-affinity
(requiredDuringSchedulingIgnoredDuringExecution), which prevents scheduling on
single-node labs; replace requiredDuringSchedulingIgnoredDuringExecution under
affinity.podAntiAffinity with preferredDuringSchedulingIgnoredDuringExecution
and convert the existing list item into a WeightedPodAffinityTerm (set weight
(e.g., 100) and move the current podAffinityTerm contents — labelSelector with
matchLabels and topologyKey — under podAffinityTerm) so anti-affinity becomes a
preference rather than a hard requirement (update the affinity/podAntiAffinity
structure where requiredDuringSchedulingIgnoredDuringExecution is declared).
- Around line 350-358: The sidecar container "loki-sc-rules" is marked
readOnlyRootFilesystem: true but lacks a writable /tmp mount; add an emptyDir
volume (e.g. name: sc-rules-tmp) to the Pod spec's volumes and add a volumeMount
in the loki-sc-rules container mounting that volume at /tmp (mountPath: "/tmp")
so the sidecar gets a writable tmpfs while keeping other mounts read-only;
ensure the new volume name is referenced exactly in the loki-sc-rules
volumeMount entry.
In `@Labs/10-Istio/manifests/addons/prometheus.yaml`:
- Around line 135-139: The labels block contains an inert
sidecar.istio.io/inject label that should be removed—keep only the annotation
sidecar.istio.io/inject: "true" (in the annotations block) to enable Istio
sidecar injection; update the manifest by deleting the labels entry
sidecar.istio.io/inject and leaving app: prometheus unchanged so only the
annotation controls injection.
---
Nitpick comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 176-177: Replace the noisy kubectl delete invocations that pipe
stderr to /dev/null with versions that use the --ignore-not-found flag: update
the two commands that delete clusterroles ("kubectl delete clusterrole
istio-prometheus kiali loki-clusterrole") and clusterrolebindings ("kubectl
delete clusterrolebinding istio-prometheus kiali loki-clusterrolebinding") to
include --ignore-not-found and remove the stderr redirection/|| true so missed
resources do not print errors.
- Around line 149-150: The PeerAuthentication cleanup for the bookinfo namespace
is redundant because the script deletes the bookinfo namespace earlier; either
remove the redundant "kubectl delete peerauthentication --all -n bookinfo" line
or move that "kubectl delete peerauthentication --all -n bookinfo" invocation to
run before the "kubectl delete namespace bookinfo" call so the namespaced
resource is actually deleted; keep the "kubectl delete peerauthentication --all
-n istio-system" line as-is.
In `@Labs/10-Istio/manifests/addons/jaeger.yaml`:
- Around line 77-81: The pod template currently sets sidecar.istio.io/inject
both as a label and as an annotation; keep the canonical label (labels:
sidecar.istio.io/inject: "true") and remove the redundant annotations block
entry (annotations: sidecar.istio.io/inject: "true") so only the label remains
in the Jaeger deployment/podTemplate (the entries around labels: app: jaeger and
annotations: sidecar.istio.io/inject).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
Labs/10-Istio/demo.shLabs/10-Istio/manifests/addons/grafana.yamlLabs/10-Istio/manifests/addons/jaeger.yamlLabs/10-Istio/manifests/addons/kiali.yamlLabs/10-Istio/manifests/addons/loki.yamlLabs/10-Istio/manifests/addons/prometheus.yamlLabs/10-Istio/manifests/bookinfo-gateway.yamlLabs/10-Istio/manifests/ingress.yamlLabs/10-Istio/manifests/observability-routes.yamlLabs/10-Istio/manifests/traffic-generator.yamlLabs/10-Istio/scripts/01-install-istio.shLabs/10-Istio/scripts/02-install-addons.shLabs/10-Istio/scripts/03-deploy-bookinfo.shLabs/10-Istio/scripts/04-traffic-generator.sh
🚧 Files skipped from review as they are similar to previous changes (5)
- Labs/10-Istio/scripts/02-install-addons.sh
- Labs/10-Istio/scripts/01-install-istio.sh
- Labs/10-Istio/manifests/observability-routes.yaml
- Labs/10-Istio/manifests/traffic-generator.yaml
- Labs/10-Istio/manifests/bookinfo-gateway.yaml
… updated documentation. Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (12)
Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh (3)
18-22:⚠️ Potential issue | 🟡 Minor
lsof-absent fallback silently allows duplicate port-forwards.When
lsofis unavailable,is_port_listeningalways returns 1 ("not listening"), sostart_port_forwardunconditionally launches a newkubectl port-forward. Because the process is backgrounded,set -enever fires — the conflict silently surfaces only in the per-session log file under$RUNTIME_DIR, with no warning to the user.Consider falling back to
ssornetstatbefore giving up, or at least log a warning whenlsofis absent so the user knows the guard is inactive:🛡️ Proposed fix – warn when guard is disabled
is_port_listening() { local port="$1" if command -v lsof >/dev/null 2>&1; then lsof -nP -iTCP:"${port}" -sTCP:LISTEN >/dev/null 2>&1 + elif command -v ss >/dev/null 2>&1; then + ss -tlnH "sport = :${port}" | grep -q . else + echo "Warning: neither lsof nor ss found; cannot check port ${port}." >&2 return 1 fi }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 18 - 22, The is_port_listening guard currently treats missing lsof as "not listening", causing start_port_forward to launch duplicate kubectl port-forwards; update is_port_listening to try ss and then netstat when lsof is absent (checking for their commands and using appropriate flags to test LISTEN on "${port}"), and if none are present log a clear warning/error stating the port-check guard is disabled so the user knows duplicate port-forwards may occur; ensure start_port_forward still respects the boolean result from is_port_listening.
111-112:⚠️ Potential issue | 🟡 Minor
touchdoes not truncate — stale PIDs accumulate across script runs.
touch "$PF_PID_FILE"creates the file if absent but leaves existing content intact. On a second run, old (possibly recycled) PIDs pile up, andxargs kill < $PF_PID_FILEon line 131 may then kill unrelated processes.Replace
touchwith a truncating open (or>) to start fresh each run:🐛 Proposed fix
-touch "$PF_PID_FILE" +>"$PF_PID_FILE"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 111 - 112, The script currently uses touch "$PF_PID_FILE" which preserves old PIDs; change this to truncate/create PF_PID_FILE before starting port forwards so stale PIDs don't accumulate and get killed by xargs kill < $PF_PID_FILE; update the block around the touch and start_port_forward calls (referencing PF_PID_FILE and start_port_forward) to ensure PF_PID_FILE is emptied/created fresh at runtime (use a truncating open/redirect) before any PIDs are appended.
138-138:⚠️ Potential issue | 🟡 MinorCleanup command uses bare relative paths — breaks when run from any other directory.
kubectl delete -f manifests/...relies on the user's CWD being$WORK_DIR. Printing absolute paths makes the hint copy-paste safe regardless of where the user's shell is.🐛 Proposed fix
-echo "Cleanup resources: kubectl delete -f manifests/02-istio-routing.yaml; kubectl delete -f manifests/01-app.yaml" +echo "Cleanup resources: kubectl delete -f ${WORK_DIR}/manifests/02-istio-routing.yaml; kubectl delete -f ${WORK_DIR}/manifests/01-app.yaml"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` at line 138, The cleanup echo currently prints relative paths ("kubectl delete -f manifests/...") which fails if the user isn't in the script's working dir; update the echo in demo.sh that prints the cleanup command so it uses absolute paths (e.g. expand to "${WORK_DIR}/manifests/02‑istio‑routing.yaml" and "${WORK_DIR}/manifests/01‑app.yaml" or compute the script dir via a variable and use that) instead of bare "manifests/..." so the printed hint is copy‑paste safe from any CWD.Labs/10-Istio/demo/01-basic-setup/demo.sh (5)
90-97:⚠️ Potential issue | 🟡 Minor
rm -f "$PF_PID_FILE"is missing the|| trueguard (inconsistent with the rest of the file).Every other
rmcall in the script appends>/dev/null 2>&1 || true. Underset -euo pipefail, a failure here (e.g., permission issue) would abort the script.🐛 Proposed fix
- rm -f "$PF_PID_FILE" + rm -f "$PF_PID_FILE" >/dev/null 2>&1 || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 90 - 97, The rm invocation that removes PF_PID_FILE inside the conditional block can fail under set -euo pipefail and currently lacks the same defensive redirect-and-|| true pattern used elsewhere; update the rm call that references PF_PID_FILE in the block (after the while loop that reads and kills PIDs) to append >/dev/null 2>&1 || true so the removal failure won't abort the script while keeping behavior consistent with other rm calls.
342-345:⚠️ Potential issue | 🟡 MinorLog file paths in the error hints are wrong — users won't be able to find them.
start_port_forwardnames its log files$RUNTIME_DIR/port-forward-<ns>-<svc>-<port>.log(no dot prefix, stored under$RUNTIME_DIR). The hardcoded strings in the inline hints omit$RUNTIME_DIRand add a spurious leading dot, making the hint misleading.🐛 Proposed fix
- curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check .port-forward-istio-system-kiali-20001.log)" - curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check .port-forward-istio-system-istio-ingressgateway-8080.log)" + curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check $RUNTIME_DIR/port-forward-istio-system-kiali-20001.log)" + curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check $RUNTIME_DIR/port-forward-istio-system-istio-ingressgateway-8080.log)"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 342 - 345, The inline hints printed by the echo statements incorrectly reference log files as ".port-forward-...log" and omit the RUNTIME_DIR where start_port_forward actually writes logs; update the two echo messages in demo.sh (the lines that output "Kiali: not reachable yet (...)" and "Bookinfo: not reachable yet (...)") to reference the correct runtime path by including $RUNTIME_DIR and removing the leading dot so they match the naming used by start_port_forward (port-forward-<ns>-<svc>-<port>.log).
228-231:⚠️ Potential issue | 🟠 MajorInjecting Istio into ALL non-system namespaces is too broad and destructive on shared clusters.
This loop labels every namespace that isn't
kube-system,kube-public,kube-node-lease, oristio-system. On any shared or pre-existing cluster this silently schedules sidecar injection for all workloads in all user namespaces (including unrelated ones) on their next restart — and the labels are not removed by the cleanup function.Only the namespaces that actually need injection (
defaultfor Bookinfo,demofor the demo workloads) should be labelled.🐛 Proposed targeted approach
-for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do - case "$ns" in kube-system | kube-public | kube-node-lease | istio-system) continue ;; esac - kubectl label namespace "$ns" istio-injection=enabled --overwrite -done +for ns in default demo; do + if kubectl get ns "$ns" >/dev/null 2>&1; then + kubectl label namespace "$ns" istio-injection=enabled --overwrite + fi +done🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 228 - 231, Replace the broad namespace loop that labels every non-system namespace (the for ... kubectl label namespace "$ns" istio-injection=enabled --overwrite block) with explicit labeling of only the namespaces needed for the demo (e.g., run kubectl label namespace default istio-injection=enabled --overwrite and kubectl label namespace demo istio-injection=enabled --overwrite) and update the cleanup logic to remove those labels (use kubectl label namespace <name> istio-injection- to unset) so you don’t affect unrelated user namespaces.
323-323:⚠️ Potential issue | 🟡 MinorCleanup hint omits reversing
istio-injectionlabels.The
istioctl uninstall --purge+ namespace deletion hint is correct, but theistio-injection=enabledlabels added todefault(and potentially other namespaces when the broader loop is used) are not cleaned up. Pods in thedefaultnamespace will continue to have sidecars injected after Istio is gone, causingCrashLoopBackOfferrors until the label is removed.Consider adding to the cleanup hint:
kubectl label namespace default istio-injection-🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 323, The cleanup hint in demo.sh currently suggests only running istioctl uninstall --purge -y and kubectl delete ns istio-system demo but omits removing the istio-injection label(s); update the echo string to instruct removing the label (e.g., kubectl label namespace default istio-injection-) and mention removing the label from any other namespaces you labeled in the loop so pods stop receiving sidecars after Istio is uninstalled.
265-294:⚠️ Potential issue | 🟡 MinorStep numbers are non-sequential — steps 10, 12, and 13 are missing from the comments.
The numbering jumps from Step 09 → Step 11 → Step 14. This likely reflects removed steps whose numbers were never renumbered. Keeping sequential step comments makes it easier to cross-reference with lab documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 265 - 294, The step comments in demo.sh are out of sequence (e.g., "Step 11: Creating demo VirtualService for reviews service...", then "Step 14: Creating demo namespace...", "Step 15: Deploying Nginx...", "Step 16: Deploying HTTPD...", "Step 17: Verifying demo deployment..."); update these echo step numbers to be sequential to match the lab flow (renumber them to fill the missing 10,12,13 as appropriate) so the messages that reference reviews-vs.yaml, demo-namespace.yaml, nginx-demo.yaml, httpd-demo.yaml and the final verification reflect a continuous sequence; ensure all "Step X:" prefixes in the apply/wait blocks are renumbered consistently and any cross-references in the script or docs are adjusted to the new numbering.Labs/10-Istio/scripts/05-verify.sh (1)
19-19:⚠️ Potential issue | 🔴 CriticalGuard
kubectlcommand substitutions to prevent premature script exit.The script uses
set -e(line 2), and multiplekubectlcommand substitutions lack guards. If any of these commands fail (e.g., cluster inaccessible, permission denied), the script will terminate before reaching the pass/fail summary (lines 94–98), preventing users from seeing verification results.Apply
|| trueto the following lines to ensure the script always completes:
- Line 19:
ISTIOD_STATUS=$(kubectl get pods ... 2>/dev/null || true)- Line 28:
GW_STATUS=$(kubectl get pods ... 2>/dev/null || true)- Line 33:
GW_STATUS=$(kubectl get pods ... 2>/dev/null || true)- Line 45:
STATUS=$(kubectl get pods ... 2>/dev/null || true)(in loop)- Line 57:
STATUS=$(kubectl get pods ... 2>/dev/null || true)(in loop)- Line 60:
CONTAINERS=$(kubectl get pods ... 2>/dev/null || true)(also unguarded)- Line 74:
CRONJOB=$(kubectl get cronjob ... 2>/dev/null || true)- Line 83:
PROM_POD=$(kubectl get pods ... 2>/dev/null || true)- Line 85: The pipeline with
kubectl execalso needs guarding:METRIC_COUNT=$(kubectl exec ... 2>/dev/null || true | ...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/05-verify.sh` at line 19, Several kubectl command substitutions (e.g., ISTIOD_STATUS, GW_STATUS, STATUS, CONTAINERS, CRONJOB, PROM_POD, METRIC_COUNT) are unguarded and will cause the script to exit under set -e if kubectl fails; update each substitution to append a fallback "|| true" so the command returns success on failure (for example change the assignment for ISTIOD_STATUS, both GW_STATUS occurrences, the loop STATUS assignments, CONTAINERS, CRONJOB, PROM_POD, and the METRIC_COUNT pipeline involving kubectl exec) to ensure the script continues to the final pass/fail summary even when kubectl errors occur.Labs/10-Istio/monitor.sh (3)
119-130:⚠️ Potential issue | 🟠 MajorLoki is missing from addon health/test/summary coverage (Line 119).
PR objective includes Loki observability, but addon checks and status views only track Prometheus/Grafana/Jaeger/Kiali. This leaves a monitoring blind spot.
💡 Proposed fix
- for addon in prometheus grafana jaeger kiali; do + for addon in prometheus grafana jaeger kiali loki; do @@ - kubectl get svc -n istio-system -l 'app in (prometheus,grafana,kiali,jaeger)' 2>/dev/null || - kubectl get svc -n istio-system 2>/dev/null | grep -E '(NAME|prometheus|grafana|kiali|tracing|jaeger|zipkin)' + kubectl get svc -n istio-system -l 'app in (prometheus,grafana,kiali,jaeger,loki)' 2>/dev/null || + kubectl get svc -n istio-system 2>/dev/null | grep -E '(NAME|prometheus|grafana|kiali|tracing|jaeger|zipkin|loki)' @@ - for addon in prometheus grafana kiali jaeger; do + for addon in prometheus grafana kiali jaeger loki; do @@ - for addon in kiali prometheus grafana jaeger; do + for addon in kiali prometheus grafana jaeger loki; doAlso applies to: 261-270, 324-327
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 119 - 130, The addon health and service-summary blocks miss Loki; update the monitoring checks in monitor.sh to include "loki" alongside prometheus,grafana,jaeger,kiali: add "loki" to the for loop that sets STATUS (used with kubectl get pods and variables STATUS/print_info/print_error), add "loki" to the kubectl service label selector ('app in (...)') and to the fallback grep pattern (the grep -E list), and apply the same change to the other similar blocks referenced (the repeated kubectl get svc / grep sections). Ensure all occurrences of the app list and grep patterns consistently include "loki" so pod status, svc listing, and summary output cover Loki.
373-393:⚠️ Potential issue | 🔴 CriticalHandle missing CLI args safely under strict mode (Lines 373–390).
With
set -uenabled, the script exits with an unbound variable error when called without arguments, preventing interactive mode from running. The$1parameter is accessed directly in three if-conditions without checking if it's set.Proposed fix
main() { clear print_header "ISTIO + KIALI LAB MONITORING" check_prerequisites + local mode="${1:-}" - if [ "$1" = "full" ] || [ "$1" = "-f" ] || [ "$1" = "--full" ]; then + if [ "$mode" = "full" ] || [ "$mode" = "-f" ] || [ "$mode" = "--full" ]; then show_summary check_istio check_pods @@ - if [ "$1" = "test" ] || [ "$1" = "-t" ] || [ "$1" = "--test" ]; then + if [ "$mode" = "test" ] || [ "$mode" = "-t" ] || [ "$mode" = "--test" ]; then test_pipeline exit 0 fi - if [ "$1" = "summary" ] || [ "$1" = "-s" ] || [ "$1" = "--summary" ]; then + if [ "$mode" = "summary" ] || [ "$mode" = "-s" ] || [ "$mode" = "--summary" ]; then show_summary exit 0 fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 373 - 393, The script accesses positional parameter $1 directly in the three conditional blocks (the branches that call show_summary, test_pipeline, etc.), which breaks under set -u when no args are provided; update those conditionals to safely handle an absent argument by using a default expansion (e.g., ${1:-}) or by first checking $# -eq 0 for interactive behavior, and then compare that safe value against "full"/"-f"/"--full", "test"/"-t"/"--test", and "summary"/"-s"/"--summary" so show_summary, check_istio, check_pods, check_addons, check_bookinfo, check_traffic, check_mtls, and test_pipeline are only invoked when a valid argument is present.
238-275:⚠️ Potential issue | 🔴 CriticalAdd
|| trueto allgrep -cpipelines to prevent script termination withpipefail.The script uses
set -euo pipefail(line 2), which causes any command returning exit code 1 to terminate execution. Whengrep -cfinds zero matches, it returns exit code 1, causing the pipeline to fail and the script to exit before your pass/fail logic runs.This affects 8 locations:
- Lines 148, 238, 248, 250, 262, 274, 329, 332
💡 Proposed fix pattern
- HAS_SIDECAR=$(echo "$line" | grep -c "istio-proxy") + HAS_SIDECAR=$(echo "$line" | grep -c "istio-proxy" || true) - ISTIOD=$(kubectl get pods -n istio-system -l app=istiod --no-headers 2>/dev/null | grep -c Running) + ISTIOD=$(kubectl get pods -n istio-system -l app=istiod --no-headers 2>/dev/null | grep -c Running || true) - GW=$(kubectl get pods -n istio-system -l istio=ingressgateway --no-headers 2>/dev/null | grep -c Running) + GW=$(kubectl get pods -n istio-system -l istio=ingressgateway --no-headers 2>/dev/null | grep -c Running || true) - RUNNING=$(kubectl get pods -n istio-system -l app=$addon --no-headers 2>/dev/null | grep -c Running) + RUNNING=$(kubectl get pods -n istio-system -l app=$addon --no-headers 2>/dev/null | grep -c Running || true) - BOOKINFO_PODS=$(kubectl get pods -n bookinfo --no-headers 2>/dev/null | grep -c Running) + BOOKINFO_PODS=$(kubectl get pods -n bookinfo --no-headers 2>/dev/null | grep -c Running || true) - SIDECAR_PODS=$(kubectl get pods -n bookinfo -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}' 2>/dev/null | grep -c istio-proxy) + SIDECAR_PODS=$(kubectl get pods -n bookinfo -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}' 2>/dev/null | grep -c istio-proxy || true)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 238 - 275, The pipeline commands that assign ISTIOD, GW, and RUNNING use "grep -c" which can return exit code 1 and, under set -euo pipefail, abort the script; update each pipeline that ends with "| grep -c Running" or "| grep -c Running)" (the assignments to ISTIOD, the fallback GW assignment, the RUNNING inside the observability addons loop, and the RUNNING inside the Bookinfo app loop) to append "|| true" so the pipeline never fails on zero matches and your subsequent pass/fail logic can run.
♻️ Duplicate comments (1)
Labs/10-Istio/demo.sh (1)
60-63:⚠️ Potential issue | 🟠 MajorGuard gateway lookups from
set -eexits.Line 60 and Line 62 can still abort the script if
kubectl get svcreturns non-zero (e.g., pending/missing LB), so access info may not print.🔧 Proposed fix
- GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) + GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true) if [ -z "$GATEWAY_IP" ]; then - GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null) + GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null || true) fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo.sh` around lines 60 - 63, The kubectl calls that populate GATEWAY_IP can cause the whole script to exit under set -e when the service/LoadBalancer isn't ready; update the demo.sh logic around the GATEWAY_IP assignment (the lines that run kubectl get svc istio-ingressgateway -n istio-system ...) so they are protected from errexit—either temporarily disable errexit around those commands (set +e / set -e) or ensure each kubectl invocation is made safe by appending a non-failing fallback (e.g., using || true) and then test GATEWAY_IP for emptiness before continuing; keep the variable name GATEWAY_IP and the exact kubectl get svc istio-ingressgateway command when applying the change.
🧹 Nitpick comments (6)
Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh (2)
49-56:open_urlimplicitly exits with code 1 when no browser is found.When neither
opennorxdg-openis present, the last executed command is theelif command -v xdg-opentest, which exits 1. All current call sites guard with|| true, so this is not a problem today — but it is fragile if a future call site omits the guard. Adding an explicitreturn 0makes the no-op intent self-documenting.♻️ Proposed fix
open_url() { local url="$1" if command -v open >/dev/null 2>&1; then open "${url}" >/dev/null 2>&1 || true elif command -v xdg-open >/dev/null 2>&1; then xdg-open "${url}" >/dev/null 2>&1 || true fi + return 0 }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 49 - 56, The open_url function can return a non-zero status when neither 'open' nor 'xdg-open' are found because the last executed test exits 1; update the open_url function to explicitly return 0 in the no-op path so it always exits success when it intentionally does nothing (ensure open_url has an explicit success return for the branch where neither command exists and that all other branches also end with an explicit successful return to make the behavior deterministic).
1-144: NoEXITtrap — port-forwards are left running silently on any script failure.With
set -eactive, any prerequisite check failure (lines 64–78),kubectl apply, orkubectl waitcan exit the script before the user ever sees the "Stop port-forward" hint on line 131. Port-forwards already started are then orphaned with no record of their PIDs.Consider registering an
EXITtrap early so cleanup always runs:♻️ Proposed fix – register a cleanup trap
+cleanup() { + if [[ -s "$PF_PID_FILE" ]]; then + echo "Stopping port-forwards..." + xargs kill < "$PF_PID_FILE" 2>/dev/null || true + rm -f "$PF_PID_FILE" + fi +} +trap cleanup EXIT + mkdir -p "$RUNTIME_DIR" >/dev/null 2>&1 || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 1 - 144, Add a cleanup function (e.g., cleanup_port_forwards) and register it with a trap (trap cleanup_port_forwards EXIT INT TERM) early in the script (after PF_PID_FILE is defined) so port-forwards are always terminated on script exit; the cleanup should check PF_PID_FILE exists and non-empty, kill the PIDs listed (xargs kill or loop kill) while ignoring already-dead PIDs, and remove the PF_PID_FILE; ensure start_port_forward continues to append PIDs to PF_PID_FILE so the trap can find them.Labs/10-Istio/demo/01-basic-setup/demo.sh (3)
144-144:bash -lcunnecessarily sources login profile scripts.A login shell (
-l) loads/etc/profile,~/.bash_profile, and similar files. For a simplecurlloop this can cause unintended side-effects (proxy settings,PATHoverrides, aliases). Usebash -cinstead.♻️ Proposed fix
- nohup bash -lc 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 & + nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 &🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 144, Replace the login shell invocation in the background traffic loop so it doesn't source profile scripts: in the nohup line that writes to "$TRAFFIC_LOG_FILE" (the command starting with nohup bash -lc 'while true; do curl -fsS ...'), change the shell flag from -l to -c (i.e., use bash -c) to avoid loading login profiles and unintended side-effects while keeping the same curl loop and redirection behavior.
68-77:is_port_listeningsilently skips detection on systems withoutlsof.The
else return 1fallback treats the port as not listening wheneverlsofis absent (common on minimal Linux/Alpine/WSL images). If the port is actually occupied,kubectl port-forwardwill fail silently — the error lands in the log file rather than on the terminal.ss(iproute2, standard on modern Linux) ornetstatare widely available alternatives.♻️ Proposed fallback chain
is_port_listening() { local port="$1" if command -v lsof >/dev/null 2>&1; then lsof -nP -iTCP:"${port}" -sTCP:LISTEN >/dev/null 2>&1 + elif command -v ss >/dev/null 2>&1; then + ss -lnt "sport = :${port}" 2>/dev/null | grep -q ":${port}" + elif command -v netstat >/dev/null 2>&1; then + netstat -lnt 2>/dev/null | grep -q ":${port}[[:space:]]" else return 1 fi }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 68 - 77, The is_port_listening function currently returns 1 if lsof is missing which silently skips detection; update is_port_listening to try alternatives: first use ss -ltn "(or ss -nlt sport = :PORT)" to check TCP LISTEN, then fall back to netstat -tnl if ss is unavailable, and only then return 1; keep the function signature and exit codes the same (0 if listening, non‑zero otherwise) and ensure you quote/expand the port variable (local port="$1") when invoking the chosen command so kubectl port-forward failures are detected correctly.
104-122:start_port_forwardassumesRUNTIME_DIRalready exists.Line 121 appends to
$PF_PID_FILEwithout ensuring$RUNTIME_DIRexists. This is safe in the current call order (bothcleanup_previous_demoand the explicitstop_previous_port_forwardsat line 330 create the directory first), but the function is fragile if called in isolation.🛡️ Proposed defensive guard
start_port_forward() { local namespace="$1" local svc="$2" local local_port="$3" local remote_port="$4" local log_file="$RUNTIME_DIR/port-forward-${namespace}-${svc}-${local_port}.log" + + mkdir -p "$RUNTIME_DIR" >/dev/null 2>&1 || true if is_port_listening "${local_port}"; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 104 - 122, The start_port_forward function assumes RUNTIME_DIR exists before creating log files and appending to PF_PID_FILE; make it defensive by ensuring the directory exists (e.g., mkdir -p "$RUNTIME_DIR" and check for success) before computing log_file or writing the PID, and handle failure (return non-zero and log an error) so standalone calls to start_port_forward don't fail silently; update start_port_forward (and any code that writes PF_PID_FILE) rather than relying on cleanup_previous_demo / stop_previous_port_forwards to create RUNTIME_DIR.Labs/10-Istio/scripts/01-install-istio.sh (1)
25-31: Scope cluster-wide pruning more defensively.The current name-based matching (
grep -i istio) can delete unrelated Istio resources on shared clusters when Helm annotations are missing or different.🔧 Proposed safety guard
+ # Optional: only allow broad prune when explicitly enabled. + : "${ALLOW_BROAD_ISTIO_PRUNE:=false}" + if [[ "${ALLOW_BROAD_ISTIO_PRUNE}" != "true" ]]; then + print_warning "Skipping broad Istio resource prune (set ALLOW_BROAD_ISTIO_PRUNE=true to enable)." + else for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) if [ "$release" != "istio-base" ]; then print_info "Removing $crd (not managed by Helm istio-base)" kubectl delete "$crd" --ignore-not-found 2>/dev/null || true fi done + fiAlso applies to: 33-39, 41-47, 67-73, 74-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 25 - 31, The loop that prunes CRDs uses name-based matching and will delete resources lacking Helm annotations; change the logic in the block that iterates over crd so it only considers CRDs that explicitly carry the Helm release annotation metadata (metadata.annotations["meta.helm.sh/release-name"]) — skip any CRD with no such annotation instead of deleting by name; then delete only when release exists and is not "istio-base" (use the existing release variable) and retain the current --ignore-not-found safety flags; apply the same defensive change to the other similar loops/blocks that use crd/name-based grep so they all require the Helm annotation before deletion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Labs/10-Istio/monitor.sh`:
- Line 369: The unguarded clear call in main() can fail in non-interactive/CI
environments (with set -e), so wrap the clear invocation in a guard that only
runs it when a TTY/terminal is available; update main() so it checks for a
terminal (e.g., using -t on stdout/stderr or a non-empty TERM) before calling
clear, and skip clear otherwise to avoid script exit in automated runs.
In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 10-11: The addon-health check loop currently omits Loki; update
the addons list used by the verification loop (e.g., the ADDONS or addons array)
to include "loki" and ensure the same health-check function (e.g.,
verify_addons, check_addon, or the loop that inspects kubectl/helm status)
validates Loki's Deployment/DaemonSet/StatefulSet and corresponding Service/Pod
readiness. Add "loki" to any switch/case or conditional that maps addon names to
their k8s resource checks so a failing Loki deployment will cause the script to
mark the addons step as failed.
In `@Labs/10-Istio/scripts/common.sh`:
- Around line 24-25: The print_error function currently uses echo which writes
to stdout; change its implementation so the formatted error string is written to
stderr (e.g., use echo -e "…"$RED…"$NC …" >&2 or use printf "…\n" >&2) so the
behavior matches the comment and other logging functions; update the function
body of print_error accordingly.
---
Outside diff comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 90-97: The rm invocation that removes PF_PID_FILE inside the
conditional block can fail under set -euo pipefail and currently lacks the same
defensive redirect-and-|| true pattern used elsewhere; update the rm call that
references PF_PID_FILE in the block (after the while loop that reads and kills
PIDs) to append >/dev/null 2>&1 || true so the removal failure won't abort the
script while keeping behavior consistent with other rm calls.
- Around line 342-345: The inline hints printed by the echo statements
incorrectly reference log files as ".port-forward-...log" and omit the
RUNTIME_DIR where start_port_forward actually writes logs; update the two echo
messages in demo.sh (the lines that output "Kiali: not reachable yet (...)" and
"Bookinfo: not reachable yet (...)") to reference the correct runtime path by
including $RUNTIME_DIR and removing the leading dot so they match the naming
used by start_port_forward (port-forward-<ns>-<svc>-<port>.log).
- Around line 228-231: Replace the broad namespace loop that labels every
non-system namespace (the for ... kubectl label namespace "$ns"
istio-injection=enabled --overwrite block) with explicit labeling of only the
namespaces needed for the demo (e.g., run kubectl label namespace default
istio-injection=enabled --overwrite and kubectl label namespace demo
istio-injection=enabled --overwrite) and update the cleanup logic to remove
those labels (use kubectl label namespace <name> istio-injection- to unset) so
you don’t affect unrelated user namespaces.
- Line 323: The cleanup hint in demo.sh currently suggests only running istioctl
uninstall --purge -y and kubectl delete ns istio-system demo but omits removing
the istio-injection label(s); update the echo string to instruct removing the
label (e.g., kubectl label namespace default istio-injection-) and mention
removing the label from any other namespaces you labeled in the loop so pods
stop receiving sidecars after Istio is uninstalled.
- Around line 265-294: The step comments in demo.sh are out of sequence (e.g.,
"Step 11: Creating demo VirtualService for reviews service...", then "Step 14:
Creating demo namespace...", "Step 15: Deploying Nginx...", "Step 16: Deploying
HTTPD...", "Step 17: Verifying demo deployment..."); update these echo step
numbers to be sequential to match the lab flow (renumber them to fill the
missing 10,12,13 as appropriate) so the messages that reference reviews-vs.yaml,
demo-namespace.yaml, nginx-demo.yaml, httpd-demo.yaml and the final verification
reflect a continuous sequence; ensure all "Step X:" prefixes in the apply/wait
blocks are renumbered consistently and any cross-references in the script or
docs are adjusted to the new numbering.
In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 18-22: The is_port_listening guard currently treats missing lsof
as "not listening", causing start_port_forward to launch duplicate kubectl
port-forwards; update is_port_listening to try ss and then netstat when lsof is
absent (checking for their commands and using appropriate flags to test LISTEN
on "${port}"), and if none are present log a clear warning/error stating the
port-check guard is disabled so the user knows duplicate port-forwards may
occur; ensure start_port_forward still respects the boolean result from
is_port_listening.
- Around line 111-112: The script currently uses touch "$PF_PID_FILE" which
preserves old PIDs; change this to truncate/create PF_PID_FILE before starting
port forwards so stale PIDs don't accumulate and get killed by xargs kill <
$PF_PID_FILE; update the block around the touch and start_port_forward calls
(referencing PF_PID_FILE and start_port_forward) to ensure PF_PID_FILE is
emptied/created fresh at runtime (use a truncating open/redirect) before any
PIDs are appended.
- Line 138: The cleanup echo currently prints relative paths ("kubectl delete -f
manifests/...") which fails if the user isn't in the script's working dir;
update the echo in demo.sh that prints the cleanup command so it uses absolute
paths (e.g. expand to "${WORK_DIR}/manifests/02‑istio‑routing.yaml" and
"${WORK_DIR}/manifests/01‑app.yaml" or compute the script dir via a variable and
use that) instead of bare "manifests/..." so the printed hint is copy‑paste safe
from any CWD.
In `@Labs/10-Istio/monitor.sh`:
- Around line 119-130: The addon health and service-summary blocks miss Loki;
update the monitoring checks in monitor.sh to include "loki" alongside
prometheus,grafana,jaeger,kiali: add "loki" to the for loop that sets STATUS
(used with kubectl get pods and variables STATUS/print_info/print_error), add
"loki" to the kubectl service label selector ('app in (...)') and to the
fallback grep pattern (the grep -E list), and apply the same change to the other
similar blocks referenced (the repeated kubectl get svc / grep sections). Ensure
all occurrences of the app list and grep patterns consistently include "loki" so
pod status, svc listing, and summary output cover Loki.
- Around line 373-393: The script accesses positional parameter $1 directly in
the three conditional blocks (the branches that call show_summary,
test_pipeline, etc.), which breaks under set -u when no args are provided;
update those conditionals to safely handle an absent argument by using a default
expansion (e.g., ${1:-}) or by first checking $# -eq 0 for interactive behavior,
and then compare that safe value against "full"/"-f"/"--full",
"test"/"-t"/"--test", and "summary"/"-s"/"--summary" so show_summary,
check_istio, check_pods, check_addons, check_bookinfo, check_traffic,
check_mtls, and test_pipeline are only invoked when a valid argument is present.
- Around line 238-275: The pipeline commands that assign ISTIOD, GW, and RUNNING
use "grep -c" which can return exit code 1 and, under set -euo pipefail, abort
the script; update each pipeline that ends with "| grep -c Running" or "| grep
-c Running)" (the assignments to ISTIOD, the fallback GW assignment, the RUNNING
inside the observability addons loop, and the RUNNING inside the Bookinfo app
loop) to append "|| true" so the pipeline never fails on zero matches and your
subsequent pass/fail logic can run.
In `@Labs/10-Istio/scripts/05-verify.sh`:
- Line 19: Several kubectl command substitutions (e.g., ISTIOD_STATUS,
GW_STATUS, STATUS, CONTAINERS, CRONJOB, PROM_POD, METRIC_COUNT) are unguarded
and will cause the script to exit under set -e if kubectl fails; update each
substitution to append a fallback "|| true" so the command returns success on
failure (for example change the assignment for ISTIOD_STATUS, both GW_STATUS
occurrences, the loop STATUS assignments, CONTAINERS, CRONJOB, PROM_POD, and the
METRIC_COUNT pipeline involving kubectl exec) to ensure the script continues to
the final pass/fail summary even when kubectl errors occur.
---
Duplicate comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 60-63: The kubectl calls that populate GATEWAY_IP can cause the
whole script to exit under set -e when the service/LoadBalancer isn't ready;
update the demo.sh logic around the GATEWAY_IP assignment (the lines that run
kubectl get svc istio-ingressgateway -n istio-system ...) so they are protected
from errexit—either temporarily disable errexit around those commands (set +e /
set -e) or ensure each kubectl invocation is made safe by appending a
non-failing fallback (e.g., using || true) and then test GATEWAY_IP for
emptiness before continuing; keep the variable name GATEWAY_IP and the exact
kubectl get svc istio-ingressgateway command when applying the change.
---
Nitpick comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Line 144: Replace the login shell invocation in the background traffic loop so
it doesn't source profile scripts: in the nohup line that writes to
"$TRAFFIC_LOG_FILE" (the command starting with nohup bash -lc 'while true; do
curl -fsS ...'), change the shell flag from -l to -c (i.e., use bash -c) to
avoid loading login profiles and unintended side-effects while keeping the same
curl loop and redirection behavior.
- Around line 68-77: The is_port_listening function currently returns 1 if lsof
is missing which silently skips detection; update is_port_listening to try
alternatives: first use ss -ltn "(or ss -nlt sport = :PORT)" to check TCP
LISTEN, then fall back to netstat -tnl if ss is unavailable, and only then
return 1; keep the function signature and exit codes the same (0 if listening,
non‑zero otherwise) and ensure you quote/expand the port variable (local
port="$1") when invoking the chosen command so kubectl port-forward failures are
detected correctly.
- Around line 104-122: The start_port_forward function assumes RUNTIME_DIR
exists before creating log files and appending to PF_PID_FILE; make it defensive
by ensuring the directory exists (e.g., mkdir -p "$RUNTIME_DIR" and check for
success) before computing log_file or writing the PID, and handle failure
(return non-zero and log an error) so standalone calls to start_port_forward
don't fail silently; update start_port_forward (and any code that writes
PF_PID_FILE) rather than relying on cleanup_previous_demo /
stop_previous_port_forwards to create RUNTIME_DIR.
In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 49-56: The open_url function can return a non-zero status when
neither 'open' nor 'xdg-open' are found because the last executed test exits 1;
update the open_url function to explicitly return 0 in the no-op path so it
always exits success when it intentionally does nothing (ensure open_url has an
explicit success return for the branch where neither command exists and that all
other branches also end with an explicit successful return to make the behavior
deterministic).
- Around line 1-144: Add a cleanup function (e.g., cleanup_port_forwards) and
register it with a trap (trap cleanup_port_forwards EXIT INT TERM) early in the
script (after PF_PID_FILE is defined) so port-forwards are always terminated on
script exit; the cleanup should check PF_PID_FILE exists and non-empty, kill the
PIDs listed (xargs kill or loop kill) while ignoring already-dead PIDs, and
remove the PF_PID_FILE; ensure start_port_forward continues to append PIDs to
PF_PID_FILE so the trap can find them.
In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 25-31: The loop that prunes CRDs uses name-based matching and will
delete resources lacking Helm annotations; change the logic in the block that
iterates over crd so it only considers CRDs that explicitly carry the Helm
release annotation metadata (metadata.annotations["meta.helm.sh/release-name"])
— skip any CRD with no such annotation instead of deleting by name; then delete
only when release exists and is not "istio-base" (use the existing release
variable) and retain the current --ignore-not-found safety flags; apply the same
defensive change to the other similar loops/blocks that use crd/name-based grep
so they all require the Helm annotation before deletion.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
Labs/10-Istio/README.mdLabs/10-Istio/demo.shLabs/10-Istio/demo/01-basic-setup/demo.shLabs/10-Istio/demo/02-traffic-splitting-3pods/demo.shLabs/10-Istio/manifests/addons/grafana.yamlLabs/10-Istio/manifests/addons/kiali.yamlLabs/10-Istio/manifests/addons/loki.yamlLabs/10-Istio/manifests/addons/prometheus.yamlLabs/10-Istio/manifests/bookinfo-gateway.yamlLabs/10-Istio/manifests/ingress.yamlLabs/10-Istio/monitor.shLabs/10-Istio/scripts/01-install-istio.shLabs/10-Istio/scripts/02-install-addons.shLabs/10-Istio/scripts/03-deploy-bookinfo.shLabs/10-Istio/scripts/04-traffic-generator.shLabs/10-Istio/scripts/05-verify.shLabs/10-Istio/scripts/common.sh
💤 Files with no reviewable changes (1)
- Labs/10-Istio/manifests/ingress.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
- Labs/10-Istio/manifests/addons/grafana.yaml
- Labs/10-Istio/scripts/04-traffic-generator.sh
- Labs/10-Istio/manifests/addons/loki.yaml
- Labs/10-Istio/scripts/03-deploy-bookinfo.sh
| # Entry point: run full/test/summary report or start interactive menu. | ||
| # Args: $@ - Optional mode: full, test, summary, or none for menu. | ||
| main() { | ||
| clear |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the file size and context
wc -l Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 95
🏁 Script executed:
# Examine the script around line 369 and check for 'set -e'
sed -n '1,30p' Labs/10-Istio/monitor.sh
sed -n '360,375p' Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 1395
🏁 Script executed:
# Search for set -e in the script
grep -n 'set -e' Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 86
🏁 Script executed:
# Get exact context around line 369
sed -n '365,375p' Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 395
🏁 Script executed:
# Check if clear is called in any conditional or if-statements
grep -n -B2 -A2 'clear' Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 333
🏁 Script executed:
# Check the main function execution flow to understand when clear is called
sed -n '367,400p' Labs/10-Istio/monitor.shRepository: nirgeier/KubernetesLabs
Length of output: 769
Guard clear at line 369 to prevent script failure in non-interactive environments.
The script has set -e enabled (line 2) and calls clear unconditionally in the main() function (line 369). When invoked with non-interactive modes (./monitor.sh full, ./monitor.sh test, ./monitor.sh summary) in CI environments or containers where TERM is unset, clear fails. With set -e active, this failure causes the entire script to exit before any monitoring checks run, making the script non-functional in automated/non-TTY contexts.
Proposed fix
- clear
+ if [ -t 1 ] && [ -n "${TERM:-}" ]; then
+ clear
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| clear | |
| if [ -t 1 ] && [ -n "${TERM:-}" ]; then | |
| clear | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/monitor.sh` at line 369, The unguarded clear call in main() can
fail in non-interactive/CI environments (with set -e), so wrap the clear
invocation in a guard that only runs it when a TTY/terminal is available; update
main() so it checks for a terminal (e.g., using -t on stdout/stderr or a
non-empty TERM) before calling clear, and skip clear otherwise to avoid script
exit in automated runs.
| # Verify Istio control plane, ingress gateway, addons, Bookinfo, traffic | ||
| # generator, and Prometheus metrics. Prints pass/fail summary. |
There was a problem hiding this comment.
Include Loki in the addon health checks.
The verification scope now includes observability addons, but Loki is still not validated in the addon loop. A failed Loki deployment can currently pass verification.
🔧 Proposed fix
- for addon in prometheus grafana kiali jaeger; do
+ for addon in prometheus grafana kiali jaeger; do
STATUS=$(kubectl get pods -n istio-system -l app=$addon -o jsonpath='{.items[0].status.phase}' 2>/dev/null)
if [ "$STATUS" = "Running" ]; then
print_success "$addon: Running"
else
print_error "$addon: ${STATUS:-Not Found}"
errors=$((errors + 1))
fi
done
+
+ LOKI_STATUS=$(kubectl get pods -n istio-system -l app.kubernetes.io/name=loki -o jsonpath='{.items[0].status.phase}' 2>/dev/null || true)
+ if [ "$LOKI_STATUS" = "Running" ]; then
+ print_success "loki: Running"
+ else
+ print_error "loki: ${LOKI_STATUS:-Not Found}"
+ errors=$((errors + 1))
+ fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/scripts/05-verify.sh` around lines 10 - 11, The addon-health
check loop currently omits Loki; update the addons list used by the verification
loop (e.g., the ADDONS or addons array) to include "loki" and ensure the same
health-check function (e.g., verify_addons, check_addon, or the loop that
inspects kubectl/helm status) validates Loki's Deployment/DaemonSet/StatefulSet
and corresponding Service/Pod readiness. Add "loki" to any switch/case or
conditional that maps addon names to their k8s resource checks so a failing Loki
deployment will cause the script to mark the addons step as failed.
| # Print an error message to stderr. | ||
| print_error() { echo -e "${RED}[ERROR]${NC} $1"; } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "common.sh" -path "*/Istio/*" 2>/dev/null | head -20Repository: nirgeier/KubernetesLabs
Length of output: 49
🏁 Script executed:
if [ -f "Labs/10-Istio/scripts/common.sh" ]; then
echo "=== File found at Labs/10-Istio/scripts/common.sh ==="
cat -n "Labs/10-Istio/scripts/common.sh" | head -30
else
echo "File not found at expected path"
fiRepository: nirgeier/KubernetesLabs
Length of output: 1203
🏁 Script executed:
# Search for all usages of print_error in the repository
rg "print_error" --type sh -B 2 -A 2Repository: nirgeier/KubernetesLabs
Length of output: 23887
print_error outputs to stdout instead of stderr.
The function comment declares it should print to stderr, but the implementation uses echo without redirection, which outputs to stdout. This should be fixed to align the implementation with the documented behavior:
Fix
-print_error() { echo -e "${RED}[ERROR]${NC} $1"; }
+print_error() { echo -e "${RED}[ERROR]${NC} $1" >&2; }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Print an error message to stderr. | |
| print_error() { echo -e "${RED}[ERROR]${NC} $1"; } | |
| # Print an error message to stderr. | |
| print_error() { echo -e "${RED}[ERROR]${NC} $1" >&2; } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/scripts/common.sh` around lines 24 - 25, The print_error
function currently uses echo which writes to stdout; change its implementation
so the formatted error string is written to stderr (e.g., use echo -e
"…"$RED…"$NC …" >&2 or use printf "…\n" >&2) so the behavior matches the comment
and other logging functions; update the function body of print_error
accordingly.
…ripts will run on both Mac and Ubuntu. Signed-off-by: “Tal Orlik” <talorlik@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
Labs/10-Istio/scripts/05-verify.sh (1)
61-65:⚠️ Potential issue | 🟠 MajorMissing sidecar should fail verification, not just warn.
A pod running without
istio-proxycurrently does not incrementerrors, so the final status can still pass with broken injection.🔧 Proposed fix
if echo "$CONTAINERS" | grep -q "istio-proxy"; then print_success "$app: Running (sidecar injected)" else - print_warning "$app: Running (NO sidecar!)" + print_error "$app: Running (NO sidecar!)" + errors=$((errors + 1)) fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/05-verify.sh` around lines 61 - 65, The verification should treat missing Istio sidecars as an error, not just a warning: in the block that checks CONTAINERS for "istio-proxy" (the if/else using print_success/print_warning), change the else branch to increment the global error counter (errors) and call print_error (or at minimum still call print_warning but also do errors=$((errors+1))) so missing sidecars are counted toward failure; update the else branch where print_warning "$app: Running (NO sidecar!)" is emitted to also increment errors so the final status fails when injection is broken.Labs/10-Istio/demo/01-basic-setup/demo.sh (1)
378-378:⚠️ Potential issue | 🟡 Minor
$(cat $TRAFFIC_PID_FILE ...)is eagerly expanded atechotime, baking in a stale PID.The resulting printed command contains a literal integer (e.g.,
kill "12345") rather than a deferred file reference. If copied by the user after the process is long gone — or if the traffic generator was never started — the command is either stale or degrades tokill ""(which exits non-zero).🐛 Proposed fix — use single quotes to defer evaluation
-echo "To stop them: xargs kill < $PF_PID_FILE; kill \"$(cat $TRAFFIC_PID_FILE 2>/dev/null)\"" +echo "To stop them: xargs kill < $PF_PID_FILE; [ -f \"$TRAFFIC_PID_FILE\" ] && kill \"\$(cat \"$TRAFFIC_PID_FILE\")\""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` at line 378, The echo currently expands $(cat $TRAFFIC_PID_FILE 2>/dev/null) at echo-time, producing a stale or empty PID; change the echo so the command substitution is not evaluated (e.g., print the literal command by using single quotes or escaping the $/() sequence) so the printed instruction references $PF_PID_FILE and $TRAFFIC_PID_FILE verbatim; update the echo that contains PF_PID_FILE and TRAFFIC_PID_FILE to quote/escape the substitution to defer evaluation until the user runs the printed command.Labs/10-Istio/monitor.sh (2)
373-394:⚠️ Potential issue | 🟡 MinorHandle unknown CLI modes explicitly instead of entering interactive mode.
For non-empty invalid args, current flow drops into
readprompts, which can hang automation unexpectedly.Proposed fix
if [ "$mode" = "summary" ] || [ "$mode" = "-s" ] || [ "$mode" = "--summary" ]; then show_summary exit 0 fi + + if [ -n "$mode" ]; then + print_error "Unknown mode: $mode" + echo "Usage: $0 [full|test|summary]" + exit 1 + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 373 - 394, The script currently treats any unrecognized non-empty CLI argument as empty and falls through into interactive prompts; update the top-level mode handling (the local variable "mode" and the branches that call show_summary, test_pipeline, etc.) to explicitly detect invalid non-empty modes: after the existing "full", "test", and "summary" checks add an else-if that checks if "$mode" is non-empty and not one of the recognized flags, then print a brief error/usage message (e.g., "Invalid mode: $mode. Valid modes: full,test,summary") to stderr and exit with a non-zero code to prevent entering interactive read prompts. Ensure the conditional references the same "mode" variable and returns non-zero so automation won't hang.
199-201:⚠️ Potential issue | 🟠 MajorCronJob probe failures will terminate the script under strict mode.
The script runs with
set -euo pipefail(line 2), which causes immediate exit on any command failure. Lines 199 and 301 executekubectl get cronjobwithout a fallback mechanism. Although stderr is suppressed via2>/dev/null, the exit code is not—so when the CronJob doesn't exist, the command fails with a non-zero exit code, terminating the script before the fallback logic at lines 203 and 305 can execute.Add
|| trueto both commands to allow the script to continue and properly handle missing CronJobs:Proposed fix
- CRONJOB=$(kubectl get cronjob -n traffic-gen traffic-generator -o jsonpath='{.spec.schedule}' 2>/dev/null) + CRONJOB=$(kubectl get cronjob -n traffic-gen traffic-generator -o jsonpath='{.spec.schedule}' 2>/dev/null || true) @@ - TG=$(kubectl get cronjob -n traffic-gen traffic-generator --no-headers 2>/dev/null | wc -l | tr -d ' ') + TG=$(kubectl get cronjob -n traffic-gen traffic-generator --no-headers 2>/dev/null | wc -l | tr -d ' ' || true) + TG="${TG:-0}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 199 - 201, The CronJob probe commands that populate CRONJOB and the later cronjob lookup for LAST_JOB call kubectl and can exit non-zero under set -euo pipefail, which will abort the script when the CronJob is absent; update the two kubectl invocations that assign CRONJOB and the later cronjob check to append "|| true" so the command returns success on missing resources and the subsequent fallback logic (variables CRONJOB and LAST_JOB and the surrounding if checks) can handle absence gracefully.
♻️ Duplicate comments (2)
Labs/10-Istio/scripts/05-verify.sh (1)
44-52:⚠️ Potential issue | 🟠 MajorLoki is still omitted from addon health verification.
The addon loop still excludes Loki, so this step can report healthy while Loki failed to deploy.
🔧 Proposed fix
for addon in prometheus grafana kiali jaeger; do STATUS=$(kubectl get pods -n istio-system -l app=$addon -o jsonpath='{.items[0].status.phase}' 2>/dev/null) || true if [ "$STATUS" = "Running" ]; then print_success "$addon: Running" else print_error "$addon: ${STATUS:-Not Found}" errors=$((errors + 1)) fi done + + LOKI_STATUS=$(kubectl get pods -n istio-system -l app.kubernetes.io/name=loki -o jsonpath='{.items[0].status.phase}' 2>/dev/null) || true + if [ "$LOKI_STATUS" = "Running" ]; then + print_success "loki: Running" + else + print_error "loki: ${LOKI_STATUS:-Not Found}" + errors=$((errors + 1)) + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/scripts/05-verify.sh` around lines 44 - 52, The health-check loop for addons (for addon in prometheus grafana kiali jaeger) omits Loki, so update the loop to include "loki" (e.g., for addon in prometheus grafana kiali jaeger loki) so that the existing check that uses STATUS, print_success, print_error and increments errors covers Loki as well; ensure the same jsonpath and namespace logic is used for the new "loki" entry.Labs/10-Istio/monitor.sh (1)
369-369:⚠️ Potential issue | 🟠 MajorGuard
clearinmain()to avoid non-interactive aborts.With strict mode enabled, Line 369 can fail in CI/non-TTY environments (e.g., missing
TERM) and exit beforefull/test/summaryexecution.Proposed fix
- clear + if [ -t 1 ] && [ -n "${TERM:-}" ]; then + clear + fi @@ - clear + if [ -t 1 ] && [ -n "${TERM:-}" ]; then + clear + fi#!/bin/bash # Verify strict mode + unguarded clear call sites. rg -n -C2 'set -euo pipefail|^\s*clear\s*$' Labs/10-Istio/monitor.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` at line 369, The unguarded clear call in main() can fail in non-interactive/CI environments and abort the script; wrap the clear invocation (the plain clear call in main()) with a condition that ensures it only runs in an interactive TTY (e.g., test -t 1) and/or when TERM is set, so the script continues to execute full/test/summary in CI; update main() to perform this guard check before calling clear.
🧹 Nitpick comments (4)
Labs/10-Istio/demo/01-basic-setup/demo.sh (3)
158-167: Inconsistency:start_traffic_generatorcallsnohupunconditionally, unlikestart_port_forward.
start_port_forward(lines 137–141) guardsnohupwithcommand -v nohupand falls back to a bare background process.start_traffic_generatordoes not, so a missingnohupwould abort the script underset -euo pipefail.♻️ Proposed fix
- nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 & + if command -v nohup >/dev/null 2>&1; then + nohup bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 & + else + bash -c 'while true; do curl -fsS http://127.0.0.1:8080/productpage >/dev/null; curl -fsS http://127.0.0.1:8080/api/v1/products/0/reviews >/dev/null; sleep 1; done' >"$TRAFFIC_LOG_FILE" 2>&1 & + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 158 - 167, start_traffic_generator currently uses nohup unconditionally which will cause the script to abort under set -euo pipefail if nohup is missing; modify start_traffic_generator to mirror start_port_forward by checking for nohup via command -v nohup and using nohup ... & only if present, otherwise fall back to launching the same bash -c loop in the background without nohup, keeping the same redirection to TRAFFIC_LOG_FILE and saving the background PID to TRAFFIC_PID_FILE; reference the start_traffic_generator function and the variables TRAFFIC_LOG_FILE and TRAFFIC_PID_FILE when making the change.
100-104: Optional: cacheTRAFFIC_PID_FILEcontents in a local variable to avoid the doublecat.
catis called twice on the same file within the TOCTOU window of thekill -0check and thekillitself.♻️ Proposed refactor
if [ -f "$TRAFFIC_PID_FILE" ]; then - if kill -0 "$(cat "$TRAFFIC_PID_FILE" 2>/dev/null)" 2>/dev/null; then - kill "$(cat "$TRAFFIC_PID_FILE" 2>/dev/null)" 2>/dev/null || true + local traffic_pid + traffic_pid=$(cat "$TRAFFIC_PID_FILE" 2>/dev/null) + if kill -0 "${traffic_pid}" 2>/dev/null; then + kill "${traffic_pid}" 2>/dev/null || true fi rm -f "$TRAFFIC_PID_FILE" "$TRAFFIC_LOG_FILE" >/dev/null 2>&1 || true fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 100 - 104, The script uses cat twice on "$TRAFFIC_PID_FILE" between the kill -0 check and the kill call, creating a TOCTOU race and redundant IO; cache the PID by reading it once into a local variable (e.g., pid) before the kill -0 test, use that variable for both the kill -0 and kill calls, and still handle failures gracefully and remove TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE as before; update the block that references TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE accordingly.
246-251: Redundant label:demonamespace labeled at Step 05 before it exists, then again at Step 11.Line 249 silently no-ops (namespace absent), while line 291 is the real effective label application. Drop line 249's
demolabeling to reduce noise.♻️ Proposed fix
echo "Step 05: Enabling Istio injection in default and demo namespaces..." kubectl label namespace default istio-injection=enabled --overwrite -kubectl label namespace demo istio-injection=enabled --overwrite 2>/dev/null || true echo "Injection enabled."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 246 - 251, Remove the redundant labeling of the demo namespace in demo.sh: delete the kubectl command that runs "kubectl label namespace demo istio-injection=enabled --overwrite 2>/dev/null || true" in the Step 05 block (the earlier kubectl label call for namespace demo), since the demo namespace is created later and the actual label is applied in Step 11; keep the default namespace labeling and the later demo labeling in Step 11.Labs/10-Istio/monitor.sh (1)
25-63: Consider sourcing shared helpers fromscripts/common.shto avoid drift.
print_header()andcheck_prerequisites()are duplicated here while similar implementations already exist inLabs/10-Istio/scripts/common.sh(including broader prereq checks). Consolidating would reduce maintenance divergence.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Labs/10-Istio/monitor.sh` around lines 25 - 63, Replace the duplicated helper implementations by sourcing the shared common helpers and removing the local defs: add a single source of the shared common.sh at the top of this script and delete the local print_header() and check_prerequisites() implementations; ensure the script relies on the shared print_header and check_prerequisites symbols, adjust any call sites if the shared check_prerequisites has a broader signature, and run a quick smoke test (and shellcheck) to confirm no regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Labs/10-Istio/demo.sh`:
- Around line 77-80: In the for-loop that builds HOSTS_NEEDED, replace the plain
grep check that can match regexes and commented lines with a fixed-string,
non-commented-line check; for example, change the condition that uses grep -q
"$host" /etc/hosts to an awk invocation that ignores lines starting with # and
searches for the literal host (e.g., use awk -v h="$host" '!/^[[:space:]]*#/ &&
index($0,h){exit 0} END{exit 1}' /etc/hosts) so HOSTS_NEEDED is only left
unchanged when an actual, non-commented hosts entry for the literal host exists.
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 361-366: The single short sleep (sleep 2) before probing
port-forward readiness is unreliable; replace it with a lightweight retry loop
that attempts the two curl probes (the Kiali check to
http://127.0.0.1:20001/kiali/ and the Bookinfo check to
http://127.0.0.1:8080/productpage) a few times with small waits (e.g., 5–10s)
and a max retry count, exiting the loop early when each service becomes
reachable and printing the existing "reachable" or "not reachable yet (check
${RUNTIME_DIR}/...log)" messages; update the block in demo.sh that currently
contains sleep 2 and the curl checks to use this retry logic so port-forward has
time to bind before giving up.
In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh`:
- Around line 18-21: When backgrounding kubectl port-forward, don't write $! to
PF_PID_FILE unconditionally; capture the background PID (pid=$!) then verify the
process is still the port-forward (alive) before persisting it by using a
liveliness check (e.g., kill -0 "$pid" >/dev/null 2>&1 or a brief sleep+check)
and only append to PF_PID_FILE if the check succeeds; apply the same change to
both places where port-forward PIDs are recorded and ensure the cleanup loop
that reads PF_PID_FILE still skips empty lines and only kills recorded PIDs (as
it already does).
In `@Labs/10-Istio/scripts/01-install-istio.sh`:
- Around line 25-50: The current guard '[ -n "$release" ] || continue' skips
resources that lack a Helm release annotation, leaving stale Istio objects;
update each cleanup loop (the for crd in $(kubectl get crd ...), for webhook in
$(kubectl get validatingwebhookconfiguration ...), for webhook in $(kubectl get
mutatingwebhookconfiguration ...), and the clusterrole/clusterrolebinding
cleanup loops) to delete resources when the release is empty or not equal to
"istio-base" (e.g. replace the guard with a conditional like: if [ -z "$release"
] || [ "$release" != "istio-base" ]; then print_info ...; kubectl delete ...;
fi) so unannotated conflicting resources are removed.
In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 85-86: The current check stores METRIC_COUNT by grepping for
`"value"` in the Prometheus HTTP response (involving PROM_POD and the query
count(istio_requests_total)) which yields false positives even when the metric
value is 0; update the logic to parse the JSON response and examine the actual
numeric result instead of counting `"value"` strings — call the Prometheus API
in the same block that defines PROM_POD, capture the raw JSON, extract the
numeric value from .data.result[0].value[1] (or treat result length 0 as zero),
set METRIC_COUNT to that numeric value (or 0) and then test if METRIC_COUNT -gt
0 to decide the if-branch. Ensure you use the same kubectl exec to fetch the
JSON and avoid relying on grep/wc for presence checks.
---
Outside diff comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Line 378: The echo currently expands $(cat $TRAFFIC_PID_FILE 2>/dev/null) at
echo-time, producing a stale or empty PID; change the echo so the command
substitution is not evaluated (e.g., print the literal command by using single
quotes or escaping the $/() sequence) so the printed instruction references
$PF_PID_FILE and $TRAFFIC_PID_FILE verbatim; update the echo that contains
PF_PID_FILE and TRAFFIC_PID_FILE to quote/escape the substitution to defer
evaluation until the user runs the printed command.
In `@Labs/10-Istio/monitor.sh`:
- Around line 373-394: The script currently treats any unrecognized non-empty
CLI argument as empty and falls through into interactive prompts; update the
top-level mode handling (the local variable "mode" and the branches that call
show_summary, test_pipeline, etc.) to explicitly detect invalid non-empty modes:
after the existing "full", "test", and "summary" checks add an else-if that
checks if "$mode" is non-empty and not one of the recognized flags, then print a
brief error/usage message (e.g., "Invalid mode: $mode. Valid modes:
full,test,summary") to stderr and exit with a non-zero code to prevent entering
interactive read prompts. Ensure the conditional references the same "mode"
variable and returns non-zero so automation won't hang.
- Around line 199-201: The CronJob probe commands that populate CRONJOB and the
later cronjob lookup for LAST_JOB call kubectl and can exit non-zero under set
-euo pipefail, which will abort the script when the CronJob is absent; update
the two kubectl invocations that assign CRONJOB and the later cronjob check to
append "|| true" so the command returns success on missing resources and the
subsequent fallback logic (variables CRONJOB and LAST_JOB and the surrounding if
checks) can handle absence gracefully.
In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 61-65: The verification should treat missing Istio sidecars as an
error, not just a warning: in the block that checks CONTAINERS for "istio-proxy"
(the if/else using print_success/print_warning), change the else branch to
increment the global error counter (errors) and call print_error (or at minimum
still call print_warning but also do errors=$((errors+1))) so missing sidecars
are counted toward failure; update the else branch where print_warning "$app:
Running (NO sidecar!)" is emitted to also increment errors so the final status
fails when injection is broken.
---
Duplicate comments:
In `@Labs/10-Istio/monitor.sh`:
- Line 369: The unguarded clear call in main() can fail in non-interactive/CI
environments and abort the script; wrap the clear invocation (the plain clear
call in main()) with a condition that ensures it only runs in an interactive TTY
(e.g., test -t 1) and/or when TERM is set, so the script continues to execute
full/test/summary in CI; update main() to perform this guard check before
calling clear.
In `@Labs/10-Istio/scripts/05-verify.sh`:
- Around line 44-52: The health-check loop for addons (for addon in prometheus
grafana kiali jaeger) omits Loki, so update the loop to include "loki" (e.g.,
for addon in prometheus grafana kiali jaeger loki) so that the existing check
that uses STATUS, print_success, print_error and increments errors covers Loki
as well; ensure the same jsonpath and namespace logic is used for the new "loki"
entry.
---
Nitpick comments:
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh`:
- Around line 158-167: start_traffic_generator currently uses nohup
unconditionally which will cause the script to abort under set -euo pipefail if
nohup is missing; modify start_traffic_generator to mirror start_port_forward by
checking for nohup via command -v nohup and using nohup ... & only if present,
otherwise fall back to launching the same bash -c loop in the background without
nohup, keeping the same redirection to TRAFFIC_LOG_FILE and saving the
background PID to TRAFFIC_PID_FILE; reference the start_traffic_generator
function and the variables TRAFFIC_LOG_FILE and TRAFFIC_PID_FILE when making the
change.
- Around line 100-104: The script uses cat twice on "$TRAFFIC_PID_FILE" between
the kill -0 check and the kill call, creating a TOCTOU race and redundant IO;
cache the PID by reading it once into a local variable (e.g., pid) before the
kill -0 test, use that variable for both the kill -0 and kill calls, and still
handle failures gracefully and remove TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE as
before; update the block that references TRAFFIC_PID_FILE and TRAFFIC_LOG_FILE
accordingly.
- Around line 246-251: Remove the redundant labeling of the demo namespace in
demo.sh: delete the kubectl command that runs "kubectl label namespace demo
istio-injection=enabled --overwrite 2>/dev/null || true" in the Step 05 block
(the earlier kubectl label call for namespace demo), since the demo namespace is
created later and the actual label is applied in Step 11; keep the default
namespace labeling and the later demo labeling in Step 11.
In `@Labs/10-Istio/monitor.sh`:
- Around line 25-63: Replace the duplicated helper implementations by sourcing
the shared common helpers and removing the local defs: add a single source of
the shared common.sh at the top of this script and delete the local
print_header() and check_prerequisites() implementations; ensure the script
relies on the shared print_header and check_prerequisites symbols, adjust any
call sites if the shared check_prerequisites has a broader signature, and run a
quick smoke test (and shellcheck) to confirm no regressions.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
Labs/10-Istio/demo.shLabs/10-Istio/demo/01-basic-setup/demo.shLabs/10-Istio/demo/02-traffic-splitting-3pods/demo.shLabs/10-Istio/monitor.shLabs/10-Istio/scripts/01-install-istio.shLabs/10-Istio/scripts/05-verify.sh
| for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do | ||
| if ! grep -q "$host" /etc/hosts 2>/dev/null; then | ||
| HOSTS_NEEDED="$HOSTS_NEEDED $host" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n Labs/10-Istio/demo.sh | sed -n '75,85p'Repository: nirgeier/KubernetesLabs
Length of output: 511
Use fixed-string matching for /etc/hosts host checks to avoid regex interpretation and commented entries.
The current grep -q "$host" has two issues:
- The
.in.localis treated as a regex wildcard matching any character, potentially causing false positives - More importantly, commented entries like
# 127.0.0.1 kiali.localwould incorrectly match, causing the script to skip adding the host even though it's not actually configured
🔧 Proposed fix
for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do
- if ! grep -q "$host" /etc/hosts 2>/dev/null; then
+ if ! grep -vE '^[[:space:]]*#' /etc/hosts 2>/dev/null | grep -qwF "$host"; then
HOSTS_NEEDED="$HOSTS_NEEDED $host"
fi
done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do | |
| if ! grep -q "$host" /etc/hosts 2>/dev/null; then | |
| HOSTS_NEEDED="$HOSTS_NEEDED $host" | |
| fi | |
| for host in kiali.local grafana.local jaeger.local prometheus.local loki.local bookinfo.local; do | |
| if ! grep -vE '^[[:space:]]*#' /etc/hosts 2>/dev/null | grep -qwF "$host"; then | |
| HOSTS_NEEDED="$HOSTS_NEEDED $host" | |
| fi | |
| done |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/demo.sh` around lines 77 - 80, In the for-loop that builds
HOSTS_NEEDED, replace the plain grep check that can match regexes and commented
lines with a fixed-string, non-commented-line check; for example, change the
condition that uses grep -q "$host" /etc/hosts to an awk invocation that ignores
lines starting with # and searches for the literal host (e.g., use awk -v
h="$host" '!/^[[:space:]]*#/ && index($0,h){exit 0} END{exit 1}' /etc/hosts) so
HOSTS_NEEDED is only left unchanged when an actual, non-commented hosts entry
for the literal host exists.
| sleep 2 | ||
|
|
||
| if command -v curl >/dev/null 2>&1; then | ||
| curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check .port-forward-istio-system-kiali-20001.log)" | ||
| curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check .port-forward-istio-system-istio-ingressgateway-8080.log)" | ||
| curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)" | ||
| curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)" | ||
| fi |
There was a problem hiding this comment.
sleep 2 is likely too short before probing port-forward readiness.
kubectl port-forward must complete its API-server handshake and bind the local socket; 2 seconds is frequently insufficient on slow clusters or in CI, producing false "not reachable yet" messages. A short retry loop or a longer sleep would improve reliability.
🛡️ Proposed fix — lightweight retry loop
-sleep 2
-
-if command -v curl >/dev/null 2>&1; then
- curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && echo "Kiali: reachable on http://localhost:20001/kiali/" || echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)"
- curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)"
-fi
+if command -v curl >/dev/null 2>&1; then
+ for attempt in $(seq 1 10); do
+ sleep 2
+ curl -fsSI http://127.0.0.1:20001/kiali/ >/dev/null 2>&1 && { echo "Kiali: reachable on http://localhost:20001/kiali/"; break; }
+ [ "${attempt}" -eq 10 ] && echo "Kiali: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-kiali-20001.log)"
+ done
+ curl -fsS http://127.0.0.1:8080/productpage >/dev/null 2>&1 && echo "Bookinfo: reachable on http://localhost:8080/productpage" || echo "Bookinfo: not reachable yet (check ${RUNTIME_DIR}/port-forward-istio-system-istio-ingressgateway-8080.log)"
+fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/demo/01-basic-setup/demo.sh` around lines 361 - 366, The single
short sleep (sleep 2) before probing port-forward readiness is unreliable;
replace it with a lightweight retry loop that attempts the two curl probes (the
Kiali check to http://127.0.0.1:20001/kiali/ and the Bookinfo check to
http://127.0.0.1:8080/productpage) a few times with small waits (e.g., 5–10s)
and a max retry count, exiting the loop early when each service becomes
reachable and printing the existing "reachable" or "not reachable yet (check
${RUNTIME_DIR}/...log)" messages; update the block in demo.sh that currently
contains sleep 2 and the curl checks to use this retry logic so port-forward has
time to bind before giving up.
| while IFS= read -r pid; do | ||
| [ -n "$pid" ] || continue | ||
| kill "$pid" 2>/dev/null || true | ||
| done <"$PF_PID_FILE" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh | head -100Repository: nirgeier/KubernetesLabs
Length of output: 4132
🏁 Script executed:
# Check the cleanup function and port-forward recording logic
grep -n "cleanup_port_forwards\|PF_PID_FILE\|echo.*>>" Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh | head -20Repository: nirgeier/KubernetesLabs
Length of output: 705
🏁 Script executed:
# Run the verification script provided in the comment
rg -n -C2 'echo \$! >>"\$PF_PID_FILE"|kill -0 "\$pf_pid"|^\s*\[\[\s*"\$pid"\s*=~\s*\^\[0-9\]\+\$\s*\]\]' \
Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.shRepository: nirgeier/KubernetesLabs
Length of output: 225
Record only live port-forward PIDs before persisting them.
The script writes $! immediately after backgrounding kubectl port-forward without verifying the process survives. If the command exits quickly (e.g., missing service, authentication failure), that PID can be reused later and the cleanup function will kill an unrelated process.
🔧 Proposed fix
cleanup_port_forwards() {
[ -f "$PF_PID_FILE" ] || return 0
[ -s "$PF_PID_FILE" ] || return 0
while IFS= read -r pid; do
[ -n "$pid" ] || continue
+ [[ "$pid" =~ ^[0-9]+$ ]] || continue
kill "$pid" 2>/dev/null || true
done <"$PF_PID_FILE"
rm -f "$PF_PID_FILE" >/dev/null 2>&1 || true
}
@@
if command -v nohup >/dev/null 2>&1; then
nohup kubectl -n "${namespace}" port-forward "svc/${svc}" "${local_port}:${remote_port}" >"${log_file}" 2>&1 &
else
kubectl -n "${namespace}" port-forward "svc/${svc}" "${local_port}:${remote_port}" >"${log_file}" 2>&1 &
fi
- echo $! >>"$PF_PID_FILE"
+ local pf_pid="$!"
+ sleep 1
+ if kill -0 "$pf_pid" 2>/dev/null; then
+ printf '%s\n' "$pf_pid" >>"$PF_PID_FILE"
+ else
+ echo "Warning: port-forward failed for ${namespace}/svc/${svc}. Check ${log_file}" >&2
+ fi
}Applies to lines 18–21 and 69–75.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/demo/02-traffic-splitting-3pods/demo.sh` around lines 18 - 21,
When backgrounding kubectl port-forward, don't write $! to PF_PID_FILE
unconditionally; capture the background PID (pid=$!) then verify the process is
still the port-forward (alive) before persisting it by using a liveliness check
(e.g., kill -0 "$pid" >/dev/null 2>&1 or a brief sleep+check) and only append to
PF_PID_FILE if the check succeeds; apply the same change to both places where
port-forward PIDs are recorded and ensure the cleanup loop that reads
PF_PID_FILE still skips empty lines and only kills recorded PIDs (as it already
does).
| for crd in $(kubectl get crd -o name 2>/dev/null | grep istio.io || true); do | ||
| release=$(kubectl get "$crd" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) | ||
| [ -n "$release" ] || continue | ||
| if [ "$release" != "istio-base" ]; then | ||
| print_info "Removing $crd (not managed by Helm istio-base)" | ||
| kubectl delete "$crd" --ignore-not-found 2>/dev/null || true | ||
| fi | ||
| done | ||
|
|
||
| for webhook in $(kubectl get validatingwebhookconfiguration -o name 2>/dev/null | grep -i istio || true); do | ||
| release=$(kubectl get "$webhook" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) | ||
| [ -n "$release" ] || continue | ||
| if [ "$release" != "istio-base" ]; then | ||
| print_info "Removing $webhook (not managed by Helm istio-base)" | ||
| kubectl delete "$webhook" --ignore-not-found 2>/dev/null || true | ||
| fi | ||
| done | ||
|
|
||
| for webhook in $(kubectl get mutatingwebhookconfiguration -o name 2>/dev/null | grep -i istio || true); do | ||
| release=$(kubectl get "$webhook" -o jsonpath='{.metadata.annotations["meta.helm.sh/release-name"]}' 2>/dev/null || true) | ||
| [ -n "$release" ] || continue | ||
| if [ "$release" != "istio-base" ]; then | ||
| print_info "Removing $webhook (not managed by Helm istio-base)" | ||
| kubectl delete "$webhook" --ignore-not-found 2>/dev/null || true | ||
| fi | ||
| done |
There was a problem hiding this comment.
Unannotated conflicting resources are skipped instead of cleaned.
The guard "[ -n "$release" ] || continue" prevents deletion of resources with no Helm release annotation, so stale Istio resources from prior installs can still block Helm with ownership conflicts.
🔧 Proposed fix
- [ -n "$release" ] || continue
- if [ "$release" != "istio-base" ]; then
+ if [ -z "$release" ] || [ "$release" != "istio-base" ]; then
print_info "Removing $crd (not managed by Helm istio-base)"
kubectl delete "$crd" --ignore-not-found 2>/dev/null || true
fiApply the same condition pattern to the validating webhook, mutating webhook, clusterrole, and clusterrolebinding loops.
Also applies to: 70-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/scripts/01-install-istio.sh` around lines 25 - 50, The current
guard '[ -n "$release" ] || continue' skips resources that lack a Helm release
annotation, leaving stale Istio objects; update each cleanup loop (the for crd
in $(kubectl get crd ...), for webhook in $(kubectl get
validatingwebhookconfiguration ...), for webhook in $(kubectl get
mutatingwebhookconfiguration ...), and the clusterrole/clusterrolebinding
cleanup loops) to delete resources when the release is empty or not equal to
"istio-base" (e.g. replace the guard with a conditional like: if [ -z "$release"
] || [ "$release" != "istio-base" ]; then print_info ...; kubectl delete ...;
fi) so unannotated conflicting resources are removed.
| METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true | ||
| if [ "${METRIC_COUNT:-0}" -gt 0 ]; then |
There was a problem hiding this comment.
Istio metrics check uses a false-positive condition.
Counting "value" keys does not validate metric value; it can pass even when count(istio_requests_total) is 0.
🔧 Proposed fix
- METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true
- if [ "${METRIC_COUNT:-0}" -gt 0 ]; then
+ METRIC_VALUE=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | awk -F'"' '/"value":\[/ {print $(NF-1); exit}') || true
+ if awk "BEGIN {exit !(${METRIC_VALUE:-0} > 0)}"; then
print_success "Istio metrics available in Prometheus"
else
print_warning "No Istio metrics yet (traffic may need more time)"
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| METRIC_COUNT=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | grep -o '"value"' | wc -l | tr -d ' ') || true | |
| if [ "${METRIC_COUNT:-0}" -gt 0 ]; then | |
| METRIC_VALUE=$(kubectl exec -n istio-system "$PROM_POD" -- wget -qO- 'http://localhost:9090/api/v1/query?query=count(istio_requests_total)' 2>/dev/null | awk -F'"' '/"value":\[/ {print $(NF-1); exit}') || true | |
| if awk "BEGIN {exit !(${METRIC_VALUE:-0} > 0)}"; then | |
| print_success "Istio metrics available in Prometheus" | |
| else | |
| print_warning "No Istio metrics yet (traffic may need more time)" | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Labs/10-Istio/scripts/05-verify.sh` around lines 85 - 86, The current check
stores METRIC_COUNT by grepping for `"value"` in the Prometheus HTTP response
(involving PROM_POD and the query count(istio_requests_total)) which yields
false positives even when the metric value is 0; update the logic to parse the
JSON response and examine the actual numeric result instead of counting
`"value"` strings — call the Prometheus API in the same block that defines
PROM_POD, capture the raw JSON, extract the numeric value from
.data.result[0].value[1] (or treat result length 0 as zero), set METRIC_COUNT to
that numeric value (or 0) and then test if METRIC_COUNT -gt 0 to decide the
if-branch. Ensure you use the same kubectl exec to fetch the JSON and avoid
relying on grep/wc for presence checks.
Summary by CodeRabbit
New Features
Bug Fixes
Chores