Aditya870907/eks-microservices-observability-demo

🧩 Google Microservices Demo on Kubernetes (EKS) — Full Observability Stack

👨‍💻 Built by: Aditya Raj

🎥 Watch Loom Walkthrough (Webcam + Audio)
🎥 Watch Short Version (No Webcam)


🚀 Project Overview

This project deploys Google’s Microservices Demo (Online Boutique) on Amazon EKS with a full observability pipeline combining metrics, logs, and live traffic simulation.

The goal was to replicate a real production-grade microservices environment, monitor it end-to-end, and learn how observability, scalability, and debugging actually work in the cloud.

🧰 Stack Overview

| Layer | Tool | Purpose |
|-------|------|---------|
| Kubernetes Cluster | AWS EKS | Managed orchestration platform |
| Metrics | Prometheus | Scrapes and stores cluster/service metrics |
| Logs | Loki + Promtail | Centralized log aggregation |
| Visualization | Grafana | Dashboards combining metrics & logs |
| Load Testing | Locust | Simulates real user traffic |
| Storage Add-On | AWS EBS CSI Driver | Enables dynamic persistent volumes for stateful services |
| Deployment Tool | Helm | Chart-based automation for all services |

🧱 Architecture Diagram

                     ┌──────────────────────────┐
                     │        Grafana           │
                     │ (Metrics + Logs UI)      │
                     └───────────┬──────────────┘
                                 │
             ┌───────────────────┼───────────────────┐
             ▼                                       ▼
      ┌──────────────┐                        ┌─────────────────┐
      │ Prometheus   │                        │      Loki       │
      │ (Metrics DB) │                        │ (Logs Storage)  │
      └──────┬───────┘                        └────────┬────────┘
            │                                         │
   ┌────────▼──────────┐                   ┌──────────▼──────────┐
   │  Microservices    │                   │     Promtail        │
   │ (Frontend + 9 svc)│                   │ (Log Shippers)      │
   └────────┬──────────┘                   └─────────┬───────────┘
            │                                         │
            └─────────────────────────────┬──────────┘
                                          ▼
                                   ┌───────────────┐
                                   │    Locust     │
                                   │ (Load Gen)    │
                                   └───────────────┘
Key AWS EKS Components:

| Resource | Status | Purpose |
|----------|--------|---------|
| eksctl-microservices-demo-cluster | ✅ CREATE_COMPLETE | Main EKS control plane |
| eksctl-microservices-demo-nodegroup-managed-ng | ✅ | EC2 worker nodes |
| eksctl-microservices-demo-addon-aws-ebs-csi-driver | ✅ | Persistent volume provisioning |
| eksctl-microservices-demo-addon-vpc-cni | ✅ | Pod networking (CNI) |
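
The cluster and node group listed above were created with eksctl. A minimal provisioning sketch (the node count and instance type are illustrative assumptions, not taken from the original setup; the cluster name and region match the cleanup command later in this README):

```shell
# Create an EKS cluster with a managed node group.
# Names mirror the CloudFormation stacks above; sizing is illustrative.
eksctl create cluster \
  --name microservices-demo \
  --region us-east-1 \
  --nodegroup-name managed-ng \
  --managed \
  --nodes 3 \
  --node-type t3.large
```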

# 🧩 PART 1 — Project Story / Journey  
*(Full Technical Reflection by Aditya Raj)*  

---

## 🧭 Phase 1: Goal & Setup  

I began with the goal of deploying **Google’s Microservices Demo (Online Boutique)** on **Amazon EKS**, and integrating a **complete observability stack** — **Grafana**, **Prometheus**, and **Loki**, along with **Locust** for realistic load testing.

At first, I experimented with several observability tools:

- Tried **SigNoz**, but encountered service dependency failures due to a missing **EBS CSI driver** in EKS.  
- Pivoted to **Grafana + Prometheus**, only to see Prometheus pods crash for the same underlying reason.  
- After deeper investigation, I identified the root cause: persistent volume claims couldn’t attach because **EBS CSI wasn’t installed**.  
- Once the EBS driver add-on was deployed, both SigNoz and Prometheus stabilized.
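
Installing the EBS CSI driver as an EKS add-on can be sketched as follows. This assumes an OIDC provider is associated with the cluster for IRSA, and the role name and account ID placeholder are hypothetical examples:

```shell
# IAM role for the CSI controller (requires an associated OIDC provider)
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster microservices-demo \
  --region us-east-1 \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole   # hypothetical role name

# Install the add-on itself, bound to that role
eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster microservices-demo \
  --region us-east-1 \
  --service-account-role-arn arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole \
  --force
```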

🧠 **Lesson:**  
> When stateful components fail on EKS, always check the storage layer first — especially the EBS CSI driver and PV/PVC bindings.
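
In the spirit of that lesson, a quick storage-layer triage might look like this (PVC and namespace names are placeholders):

```shell
# Are any claims stuck in Pending?
kubectl get pvc -A

# Events usually name the missing provisioner or the failed attach
kubectl describe pvc <pvc-name> -n <namespace>

# Is the EBS CSI driver actually running?
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver

# Does a default StorageClass exist?
kubectl get storageclass
```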

---

## ⚙️ Phase 2: Metrics and Dashboards  

With Prometheus stable, I focused on metrics visualization:

- Exposed **Grafana** via a `LoadBalancer` service to make dashboards public.  
- Added **Prometheus** as a data source to visualize CPU, memory, request rates, and service health.  
- Introduced **Loki** for logs, but early Helm chart versions failed due to **storage backend validation**.  
- After debugging `deploymentMode`, `filesystem`, and `auth_enabled` parameters, I deployed a working **single-binary Loki** instance (no Promtail, since logs weren’t critical for demo scope).
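
A single-binary Loki install along those lines might look like the sketch below. The `monitoring` namespace is an assumption, and the exact `--set` value paths vary by grafana/loki chart version, so verify them against the chart you pin:

```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Single-binary mode, local filesystem storage, auth disabled for the demo
helm upgrade --install loki grafana/loki -n monitoring --create-namespace \
  --set deploymentMode=SingleBinary \
  --set singleBinary.replicas=1 \
  --set loki.auth_enabled=false \
  --set loki.commonConfig.replication_factor=1 \
  --set loki.storage.type=filesystem
```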

🧠 **Lesson:**  
> Helm charts are *version-sensitive*. Always cross-check the chart version and deployment mode before applying values.

---

## 🚦 Phase 3: Load Testing with Locust  

To simulate real user behavior:

- Deployed **Locust** using the `deliveryhero/locust` Helm chart.  
- Initially faced DNS issues and `CrashLoopBackOff` due to missing **`locustfile` ConfigMap**.  
- Resolved it by manually creating a `locustfile-config` ConfigMap defining load patterns.  
- Patched **Frontend** & **Grafana** services to `LoadBalancer` type for public URLs.  
- Verified Locust UI at its ELB endpoint, targeting the frontend’s ELB URL — traffic appeared instantly in Grafana metrics and Loki logs.
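
The Locust fix and the service patches can be sketched as below. The ConfigMap key (`main.py`), the release name, and the frontend hostname are assumptions based on the deliveryhero/locust chart's defaults:

```shell
# Locustfile ConfigMap (the chart expects the script under the key main.py)
kubectl create configmap locustfile-config --from-file=main.py=./locustfile.py

helm repo add deliveryhero https://charts.deliveryhero.io/
helm upgrade --install locust deliveryhero/locust \
  --set loadtest.name=boutique-load \
  --set loadtest.locust_locustfile_configmap=locustfile-config \
  --set loadtest.locust_host=http://<frontend-elb-hostname>

# Expose the frontend and Grafana through AWS ELBs
kubectl patch svc frontend -p '{"spec":{"type":"LoadBalancer"}}'
kubectl patch svc grafana -n monitoring -p '{"spec":{"type":"LoadBalancer"}}'
```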

🧠 **Lesson:**  
> Distributed systems rarely fail silently — read Helm templates, inspect ConfigMaps, and validate mounts and volume claims line by line.

---

## 📊 Phase 4: Final Integration & Verification  

Once all services were healthy:

- Combined **Prometheus (metrics)** and **Loki (logs)** into unified Grafana dashboards.  
- Validated:
  - ✅ All pods healthy via `kubectl get pods -A`
  - ✅ Application accessible via frontend ELB
  - ✅ Locust load visible in Grafana dashboards  
- Invited **siddarth@drdroid.io** as a Grafana viewer and shared public URLs for validation.
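
The validation steps above map to a few commands (the frontend hostname is a placeholder):

```shell
# All pods across namespaces should be Running/Completed
kubectl get pods -A

# Grab the frontend's public ELB hostname
kubectl get svc frontend \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

# Sanity-check that the app responds (expect HTTP 200)
curl -s -o /dev/null -w "%{http_code}\n" http://<frontend-elb-hostname>/
```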

🧠 **Lesson:**  
> Don’t just deploy — *validate observability end-to-end*. The system isn’t complete until insights are visible.

---

## 🧠 Core Concepts Applied  

- **Kubernetes fundamentals:** Deployments, Services, StatefulSets, Helm  
- **Load balancing:** EKS `LoadBalancer` Services + AWS ELB  
- **Monitoring stack:** Prometheus scraping + Grafana visualization  
- **Logging stack:** Loki (single-binary deployment)  
- **Load testing:** Locust (distributed mode)  
- **Debugging:** Helm charts and EBS CSI driver issues  
- **Resource management:** PVC cleanup, namespace control, finalizers  
- **Cloud cost discipline:** Cluster deletion post-validation  
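
For the PVC cleanup and finalizer handling mentioned above, a hedged sketch (the namespace and claim names are placeholders; forcibly clearing finalizers should be a last resort for claims stuck in `Terminating`):

```shell
# Delete claims so the backing EBS volumes are released before teardown
kubectl delete pvc --all -n monitoring

# Last resort for a claim stuck in Terminating: clear its finalizers
kubectl patch pvc <stuck-pvc> -n monitoring \
  -p '{"metadata":{"finalizers":null}}'
```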

---

## ⚡ Major Problems Solved  

| Problem | Root Cause | Solution |
|----------|------------|-----------|
| SigNoz pods failing | Missing EBS CSI driver | Installed AWS EBS CSI add-on |
| Elasticsearch health check stuck | SSL / auth mismatch | Disabled xpack security for demo |
| Loki Helm validation errors | DeploymentMode misconfigured | Switched to SingleBinary mode |
| Locust CrashLoopBackOff | Missing `locustfile-config` map | Created ConfigMap manually |
| LoadBalancer no response | Internal DNS target used | Patched to external ELB URLs |

---

## 💬 Final Reflection  

This project wasn’t just a deployment — it was a **full-stack debugging journey**.  
I broke, fixed, and understood every layer: Helm chart quirks, AWS EKS storage, observability pipelines, and Kubernetes networking.

The end result:  
> A **production-like microservices environment** with real traffic, metrics, and logs — fully observable, entirely open-source, and completely torn down after validation.

---

## 🧹 Cluster Cleanup  

After recording the Loom walkthrough, I deleted the EKS cluster to save AWS costs.

```bash
eksctl delete cluster --name microservices-demo --region us-east-1
```