🎥 Watch Loom Walkthrough (Webcam + Audio)
🎥 Watch Short Version (No Webcam)
This project implements Google’s Microservices Demo (Online Boutique) deployed on Amazon EKS with a full observability pipeline — combining metrics, logs, and live traffic simulation.
The goal was to replicate a real production-grade microservices environment, monitor it end-to-end, and learn how observability, scalability, and debugging actually work in the cloud.
| Layer | Tool | Purpose |
|---|---|---|
| Kubernetes Cluster | AWS EKS | Managed orchestration platform |
| Metrics | Prometheus | Scrapes and stores cluster/service metrics |
| Logs | Loki + Promtail | Centralized log aggregation |
| Visualization | Grafana | Dashboards combining metrics & logs |
| Load Testing | Locust | Simulates real user traffic |
| Storage Add-On | AWS EBS CSI Driver | Enables dynamic persistent volumes for stateful services |
| Deployment Tool | Helm | Chart-based automation for all services |

                     ┌──────────────────────────┐
                     │         Grafana          │
                     │   (Metrics + Logs UI)    │
                     └───────────┬──────────────┘
                                 │
             ┌───────────────────┼───────────────────┐
             ▼                                       ▼
      ┌──────────────┐                      ┌────────────────┐
      │  Prometheus  │                      │      Loki      │
      │ (Metrics DB) │                      │ (Logs Storage) │
      └──────┬───────┘                      └────────┬───────┘
             │                                       │
    ┌────────▼──────────┐                 ┌──────────▼──────────┐
    │   Microservices   │                 │      Promtail       │
    │ (Frontend + 9 svc)│                 │   (Log Shippers)    │
    └────────┬──────────┘                 └──────────┬──────────┘
             │                                       │
             └─────────────────────────────┬─────────┘
                                           ▼
                                   ┌───────────────┐
                                   │    Locust     │
                                   │  (Load Gen)   │
                                   └───────────────┘

Key AWS EKS Components:

| Resource | Status | Purpose |
|---|---|---|
| `eksctl-microservices-demo-cluster` | ✅ CREATE_COMPLETE | Main EKS control plane |
| `eksctl-microservices-demo-nodegroup-managed-ng` | ✅ | EC2 worker nodes |
| `eksctl-microservices-demo-addon-aws-ebs-csi-driver` | ✅ | Persistent volume provisioning |
| `eksctl-microservices-demo-addon-vpc-cni` | ✅ | Pod networking (CNI) |
# 🧩 PART 1 — Project Story / Journey
*(Full Technical Reflection by Aditya Raj)*
---
## 🧭 Phase 1: Goal & Setup
I began with the goal of deploying **Google’s Microservices Demo (Online Boutique)** on **Amazon EKS**, and integrating a **complete observability stack** — **Grafana**, **Prometheus**, and **Loki**, along with **Locust** for realistic load testing.
At first, I experimented with several observability tools:
- Tried **SigNoz**, but encountered service dependency failures due to a missing **EBS CSI driver** in EKS.
- Pivoted to **Grafana + Prometheus**, only to see Prometheus pods crash for the same underlying reason.
- After deeper investigation, I identified the root cause: PersistentVolumeClaims could not be bound because the **EBS CSI driver wasn’t installed**, so no volumes were ever provisioned for the stateful pods.
- Once the EBS driver add-on was deployed, both SigNoz and Prometheus stabilized.
🧠 **Lesson:**
> When stateful components fail on EKS, always check the storage layer first — especially the EBS CSI driver and PV/PVC bindings.
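
The storage check from this lesson can be sketched as a small shell helper. This is a minimal sketch under assumptions, not the exact commands used during the project: the cluster name and region are taken from the cleanup command in this README, while the function name `check_storage_layer` and the `monitoring` namespace default are hypothetical. Depending on the setup, installing the add-on may also need an IAM role for the driver.

```shell
# Hypothetical helper: inspect the storage layer when stateful pods crash or stay Pending.
# Usage: check_storage_layer <namespace>
check_storage_layer() {
  ns="${1:-monitoring}"   # assumed namespace; pass your own as the first argument

  # Is the EBS CSI driver registered as an EKS add-on?
  eksctl get addons --cluster microservices-demo --region us-east-1

  # Are the driver pods present in kube-system?
  kubectl get pods -n kube-system | grep ebs-csi || echo "no ebs-csi pods found"

  # Is there a StorageClass, and are the claims Bound or stuck Pending?
  kubectl get storageclass
  kubectl get pvc -n "$ns"

  # Recent events usually name the provisioning failure explicitly.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp | tail -n 20
}

# If the add-on is missing, it can be installed with eksctl:
# eksctl create addon --name aws-ebs-csi-driver --cluster microservices-demo --region us-east-1
```
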
---
## ⚙️ Phase 2: Metrics and Dashboards
With Prometheus stable, I focused on metrics visualization:
- Exposed **Grafana** via a `LoadBalancer` service to make dashboards public.
- Added **Prometheus** as a data source to visualize CPU, memory, request rates, and service health.
- Introduced **Loki** for logs, but installs with early Helm chart versions failed the chart's **storage backend validation**.
- After debugging the `deploymentMode`, `filesystem` storage, and `auth_enabled` settings, I deployed a working **single-binary Loki** instance (no Promtail, since logs weren’t critical for the demo scope).
🧠 **Lesson:**
> Helm charts are *version-sensitive*. Always cross-check the chart version and deployment mode before applying values.
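
For illustration, a single-binary Loki values file might look like the sketch below. It assumes the `grafana/loki` chart in its 6.x key layout; other chart versions name these keys differently, which is exactly the version sensitivity described above. The release name `loki`, the `monitoring` namespace default, the schema start date, and the `install_loki` wrapper are all hypothetical and not taken from the original deployment.

```shell
# Write a minimal values file for a single-binary Loki (assumed chart: grafana/loki 6.x).
cat > loki-values.yaml <<'EOF'
deploymentMode: SingleBinary
loki:
  auth_enabled: false          # single-tenant demo, no X-Scope-OrgID header needed
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem           # local disk backed by a PVC, no object store
  schemaConfig:
    configs:
      - from: "2024-04-01"     # placeholder start date
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
singleBinary:
  replicas: 1
# Zero out the other deployment modes so chart validation accepts SingleBinary.
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
EOF

# Hypothetical wrapper; add --version <chart version> to pin the release the values target.
# Usage: install_loki <namespace>
install_loki() {
  helm repo add grafana https://grafana.github.io/helm-charts
  helm repo update
  helm upgrade --install loki grafana/loki \
    -n "${1:-monitoring}" --create-namespace -f loki-values.yaml
}
```
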
---
## 🚦 Phase 3: Load Testing with Locust
To simulate real user behavior:
- Deployed **Locust** using the `deliveryhero/locust` Helm chart.
- Initially hit DNS issues and a `CrashLoopBackOff` caused by a missing **`locustfile` ConfigMap**.
- Resolved the crash by manually creating a `locustfile-config` ConfigMap that defines the load patterns.
- Patched **Frontend** & **Grafana** services to `LoadBalancer` type for public URLs.
- Verified Locust UI at its ELB endpoint, targeting the frontend’s ELB URL — traffic appeared instantly in Grafana metrics and Loki logs.
🧠 **Lesson:**
> Distributed systems rarely fail silently — read Helm templates, inspect ConfigMaps, and validate mounts and volume claims line by line.
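
The ConfigMap fix and the service patch above could look roughly like this. It is a sketch under assumptions: the locustfile contents, the `default` namespace, and the helper names `deploy_locust` and `expose_service` are hypothetical, and the `loadtest.*` keys are the `deliveryhero/locust` chart values as I understand them, so check them against the chart version in use. Only the ConfigMap name `locustfile-config` comes from the project itself.

```shell
# A small illustrative locustfile; the routes assume the Online Boutique frontend.
cat > locustfile.py <<'EOF'
from locust import HttpUser, task, between


class BoutiqueUser(HttpUser):
    wait_time = between(1, 5)  # think time in seconds between tasks

    @task(3)
    def browse_home(self):
        self.client.get("/")

    @task(1)
    def view_cart(self):
        self.client.get("/cart")
EOF

# Usage: deploy_locust <frontend-url> <namespace>
deploy_locust() {
  frontend_url="$1"
  ns="${2:-default}"

  # Create or update the ConfigMap the chart mounts as the locustfile.
  kubectl create configmap locustfile-config \
    --from-file=main.py=locustfile.py -n "$ns" \
    --dry-run=client -o yaml | kubectl apply -f -

  helm repo add deliveryhero https://charts.deliveryhero.io/
  helm upgrade --install locust deliveryhero/locust -n "$ns" \
    --set loadtest.locust_locustfile_configmap=locustfile-config \
    --set loadtest.locust_locustfile=main.py \
    --set loadtest.locust_host="$frontend_url"
}

# Usage: expose_service <service-name> <namespace>
expose_service() {
  # Switch a Service to type LoadBalancer so AWS provisions a public ELB.
  kubectl patch svc "$1" -n "${2:-default}" -p '{"spec":{"type":"LoadBalancer"}}'
}
```
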
---
## 📊 Phase 4: Final Integration & Verification
Once all services were healthy:
- Combined **Prometheus (metrics)** and **Loki (logs)** into unified Grafana dashboards.
- Validated:
  - ✅ All pods healthy via `kubectl get pods -A`
  - ✅ Application accessible via frontend ELB
  - ✅ Locust load visible in Grafana dashboards
- Invited **siddarth@drdroid.io** as a Grafana viewer and shared public URLs for validation.
🧠 **Lesson:**
> Don’t just deploy — *validate observability end-to-end*. The system isn’t complete until insights are visible.
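
The first two checks in the list above can be scripted along these lines. This is a sketch, assuming the frontend Service is named `frontend`, lives in the `default` namespace, and has already been switched to type `LoadBalancer`; the function name `verify_stack` is hypothetical. The Grafana dashboard check still has to be done by eye.

```shell
# Hypothetical end-to-end check: pod status, then an HTTP request through the frontend ELB.
# Usage: verify_stack <namespace>
verify_stack() {
  ns="${1:-default}"

  # Pod status across all namespaces.
  kubectl get pods -A

  # Read the ELB hostname that AWS assigned to the frontend Service.
  frontend_host=$(kubectl get svc frontend -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

  if [ -z "$frontend_host" ]; then
    echo "frontend has no external hostname yet" >&2
    return 1
  fi

  # Print only the HTTP status code returned by the storefront.
  curl -s -o /dev/null -w 'frontend HTTP %{http_code}\n' "http://$frontend_host/"
}
```
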
---
## 🧠 Core Concepts Applied
- **Kubernetes fundamentals:** Deployments, Services, StatefulSets, Helm
- **Load balancing:** EKS `LoadBalancer` Services + AWS ELB
- **Monitoring stack:** Prometheus scraping + Grafana visualization
- **Logging stack:** Loki (single-binary deployment)
- **Load testing:** Locust (distributed mode)
- **Debugging:** Helm charts and EBS CSI driver issues
- **Resource management:** PVC cleanup, namespace control, finalizers
- **Cloud cost discipline:** Cluster deletion post-validation
---
## ⚡ Major Problems Solved
| Problem | Root Cause | Solution |
|----------|------------|-----------|
| SigNoz pods failing | Missing EBS CSI driver | Installed AWS EBS CSI add-on |
| Elasticsearch health check stuck | SSL / auth mismatch | Disabled xpack security for demo |
| Loki Helm validation errors | `deploymentMode` misconfigured | Switched to `SingleBinary` mode |
| Locust CrashLoopBackOff | Missing `locustfile-config` ConfigMap | Created ConfigMap manually |
| LoadBalancer no response | Internal DNS target used | Patched to external ELB URLs |
---
## 💬 Final Reflection
This project wasn’t just a deployment — it was a **full-stack debugging journey**.
I broke, fixed, and understood every layer: Helm chart quirks, AWS EKS storage, observability pipelines, and Kubernetes networking.
The end result:
> A **production-like microservices environment** with real traffic, metrics, and logs — fully observable, entirely open-source, and completely torn down after validation.
---
## 🧹 Cluster Cleanup
After recording the Loom walkthrough, I deleted the EKS cluster to save AWS costs.
```bash
eksctl delete cluster --name microservices-demo --region us-east-1