y0s3ph/k8scope

k8scope

Opinionated observability stack for Kubernetes.

Deploy Prometheus, Grafana, Loki, Alertmanager, and OpenTelemetry Collector with battle-tested defaults in one command. Stop spending days configuring YAML and start observing your cluster in minutes.

Warning

k8scope is in early development. APIs and configuration may change.

🔥 The Problem

Setting up observability on Kubernetes means:

  • Configuring kube-prometheus-stack (4000+ lines of values.yaml)
  • Adding Loki separately with its own Helm chart
  • Setting up OpenTelemetry Collector pipelines for app telemetry
  • Connecting datasources in Grafana manually
  • Importing dashboards that may or may not work
  • Writing alerting rules from scratch (or living with 200 noisy defaults)
  • Figuring out retention, storage, ingress, and auth

This takes 2-5 days for an experienced SRE. k8scope reduces it to one command.


🏗️ Architecture

k8scope deploys a hybrid observability architecture where Prometheus handles Kubernetes infrastructure metrics via scraping, while OpenTelemetry Collector acts as the unified ingestion layer for application telemetry via OTLP:

flowchart LR
    Apps["🖥️ Applications"]
    K8s["☸️ K8s Infrastructure"]
    OTel["OpenTelemetry Collector"]
    Prom["Prometheus"]
    Loki["Loki"]
    Tempo["Tempo"]
    Grafana["📊 Grafana"]
    AM["Alertmanager"]
    Notify["📢 Slack / PagerDuty / Email"]

    Apps -- "OTLP" --> OTel
    OTel -- "remote write" --> Prom
    OTel -- "OTLP/HTTP" --> Loki
    OTel -. "traces" .-> Tempo

    K8s -- "scrape" --> Prom

    Prom -- "query" --> Grafana
    Loki -- "query" --> Grafana
    Tempo -. "query" .-> Grafana

    Prom -- "alerts" --> AM
    AM --> Notify

Signal flow:

  1. Metrics (infrastructure): Prometheus scrapes kubelet, kube-state-metrics, and node-exporter endpoints.
  2. Metrics (application): Apps send OTLP metrics to the OTel Collector, which writes them to Prometheus via remote write.
  3. Logs: The OTel Collector collects container logs from each node and receives application logs via OTLP. All logs are forwarded to Loki.
  4. Alerts: Prometheus evaluates k8scope's curated alerting rules and sends notifications through Alertmanager.
  5. Visualization: Grafana is pre-wired to both the Prometheus and Loki datasources, with 5 curated dashboards ready out of the box.

Traces are not part of the flow yet: the dashed Tempo edges in the diagram correspond to the Tempo integration listed on the roadmap for v1.0.0.

🎯 Deployment Modes

k8scope provides opinionated defaults for every stage of your infrastructure:

| Mode | Target | Replicas | Storage | Retention | OTel Collector | Auth |
|---|---|---|---|---|---|---|
| dev | Local testing | 1 | Ephemeral | Session | No | None |
| startup | Small clusters (≤10 nodes) | 1 | 10Gi PVC | 7 days | DaemonSet | Basic |
| production | Growing teams (10-50 nodes) | 2-3 (HA) | 50Gi PVC | 30 days | DaemonSet + Gateway | Basic |
| enterprise | Large orgs (50+ nodes) | 2-3 (HA) | External (S3/GCS) | 90 days | Gateway (multi-tenant) | OIDC/SSO |

How to choose:

  • startup: You're a small team, want reliable metrics and logs with minimal overhead, and don't need HA. This is the default and the recommended starting point.
  • production: You need high availability, longer retention, and your team is growing. Alertmanager routes to real notification channels.
  • enterprise: You need SSO, multi-tenant isolation, and external durable storage for compliance or cost optimization.
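
The selection guidance above can be sketched as a small decision function. This is only an illustration: the `Requirements` struct, its field names, and the exact boundaries at 10 and 50 nodes (where the table's ranges overlap) are assumptions, not part of k8scope.

```go
package main

import "fmt"

// Requirements captures the inputs the "How to choose" list talks about.
// The struct and its field names are hypothetical, not a k8scope type.
type Requirements struct {
	Nodes       int
	NeedsHA     bool
	NeedsSSO    bool
	MultiTenant bool
}

// recommendMode maps requirements onto the mode names from the table.
// Boundary handling (10 -> startup, 50 -> production) is an assumption.
func recommendMode(r Requirements) string {
	switch {
	case r.NeedsSSO || r.MultiTenant || r.Nodes > 50:
		return "enterprise"
	case r.NeedsHA || r.Nodes > 10:
		return "production"
	default:
		return "startup"
	}
}

func main() {
	fmt.Println(recommendMode(Requirements{Nodes: 6}))
	fmt.Println(recommendMode(Requirements{Nodes: 25, NeedsHA: true}))
	fmt.Println(recommendMode(Requirements{Nodes: 8, NeedsSSO: true}))
}
```

The returned string is meant to be passed to `k8scope install --mode`.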

⚡ Quick Start

Installation

Binary (macOS / Linux / Windows):

Download the latest release from the releases page:

# Resolve the latest release version (the tag without its leading "v")
VERSION=$(curl -s https://api.github.com/repos/y0s3ph/k8scope/releases/latest | grep tag_name | cut -d '"' -f4 | sed 's/^v//')

# macOS (Apple Silicon)
curl -sL "https://github.com/y0s3ph/k8scope/releases/latest/download/k8scope_${VERSION}_darwin_arm64.tar.gz" | tar xz
sudo mv k8scope /usr/local/bin/

# Linux (amd64)
curl -sL "https://github.com/y0s3ph/k8scope/releases/latest/download/k8scope_${VERSION}_linux_amd64.tar.gz" | tar xz
sudo mv k8scope /usr/local/bin/

Container image:

docker pull ghcr.io/y0s3ph/k8scope:latest

Basic Usage

# Install the lightweight stack for a small cluster
k8scope install --mode startup

# Preview what would be installed (no changes applied)
k8scope install --mode production --dry-run

# Install in a custom namespace
k8scope install --mode startup --namespace monitoring

# Check stack health
k8scope status

# Remove everything
k8scope uninstall

📦 What Gets Installed

Core Components

| Component | Chart | Purpose | Modes |
|---|---|---|---|
| Prometheus | kube-prometheus-stack | Metrics collection, storage, and alerting engine | All |
| Grafana | grafana | Dashboards and visualization with pre-wired datasources | All |
| Loki | loki | Log aggregation and querying | All |
| Alertmanager | Bundled in kube-prometheus-stack | Alert routing, grouping, and deduplication | startup+ |
| OTel Collector | opentelemetry-collector | Unified telemetry pipeline (OTLP ingest, log collection) | startup+ |
| Node Exporter | Bundled in kube-prometheus-stack | Host-level metrics (CPU, memory, disk, network) | All |
| kube-state-metrics | Bundled in kube-prometheus-stack | Kubernetes object metrics (deployments, pods, nodes) | All |

All Helm charts are embedded in the k8scope binary, so no internet connection or Helm repositories are required at install time.

OpenTelemetry Collector Modes

The OTel Collector deployment topology scales with your needs:

| Mode | Deployment | Role |
|---|---|---|
| startup | DaemonSet | Collects node logs, receives app OTLP data, exports to Prometheus and Loki |
| production | DaemonSet + Gateway | DaemonSet collects per-node data, Gateway centralizes processing and routing |
| enterprise | Gateway (HA) | Tenant-aware routing, sampling, rate limiting, and filtering |

Applications instrumented with OpenTelemetry SDKs can send metrics, logs, and traces to the Collector's OTLP endpoint out of the box, with no additional configuration needed.


📊 Curated Dashboards

k8scope ships with 5 focused dashboards instead of dozens of generic ones. Each dashboard includes template variables for namespace filtering and auto-configured datasource references:

| Dashboard | Description | Key Panels |
|---|---|---|
| Cluster Overview | Bird's-eye view of the entire cluster | Node count, pod status, CPU/memory usage, top namespaces by resource, cluster events |
| Node Resources | Per-node hardware utilization | CPU usage per core, memory usage vs available, disk I/O, network bandwidth per interface |
| Pod Resources | Per-pod resource consumption | CPU/memory requests vs actual usage, restart count, container status, OOMKill history |
| Networking | Network traffic and errors | Bytes received/transmitted per pod, packet drop rate, DNS lookup latency, connection states |
| Logs Overview | Centralized log exploration | Log volume by namespace/pod, error rate over time, log stream browser, full-text search |

All dashboards are stored as JSON files and provisioned automatically via Grafana's sidecar mechanism with the k8scope-dashboard label.


🚨 Curated Alerting Rules

k8scope ships with 17 alerting rules organized by severity. Each alert includes a descriptive summary, detailed description, and a runbook_url for remediation guidance. All rules carry the k8scope: "true" label for easy identification.

Critical Alerts

These alerts require immediate action and should be routed to paging systems (PagerDuty, Opsgenie, phone calls):

| Alert | Condition | for |
|---|---|---|
| KubeNodeNotReady | A node has been in NotReady state | 5m |
| KubePodCrashLooping | Pod has restarted more than 5 times in 15 minutes | 0m |
| KubePVCFillingUp | PVC is >90% full and predicted to fill in 4 hours | 5m |
| PrometheusDown | A Prometheus target is unreachable | 3m |
| ClusterCPUOvercommit | CPU requests exceed 90% of total allocatable | 10m |
| ClusterMemoryOvercommit | Memory requests exceed 90% of total allocatable | 10m |
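
To make a compound condition such as KubePVCFillingUp concrete, here is a rough sketch of the arithmetic behind "more than 90% full and predicted to fill in 4 hours". It uses a simple two-point linear extrapolation, which is an assumption for illustration; the actual rule is a Prometheus expression and may compute its prediction differently. The function name and parameters are hypothetical.

```go
package main

import "fmt"

// pvcFillingUp reports whether a volume is above 90% usage and, extrapolating
// the growth observed over the last elapsedHours, would be full within 4 hours.
// capacity, usedThen, and usedNow share one unit (bytes, GiB, ...).
func pvcFillingUp(capacity, usedThen, usedNow, elapsedHours float64) bool {
	if capacity <= 0 || elapsedHours <= 0 {
		return false
	}
	if usedNow/capacity <= 0.90 {
		return false // not yet above the 90% threshold
	}
	growthPerHour := (usedNow - usedThen) / elapsedHours
	if growthPerHour <= 0 {
		return false // usage is flat or shrinking, so no fill-up is predicted
	}
	hoursToFull := (capacity - usedNow) / growthPerHour
	return hoursToFull <= 4
}

func main() {
	// A 10 GiB volume that grew from 9.0 to 9.3 GiB over the last hour.
	fmt.Println(pvcFillingUp(10, 9.0, 9.3, 1))
}
```

Both conditions must hold, so a volume that is nearly full but no longer growing does not trigger under this sketch.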

Warning Alerts

These alerts require attention on the next business day and should be routed to Slack/email:

| Alert | Condition | for |
|---|---|---|
| KubePodNotReady | Pod has been not-ready for an extended period | 15m |
| KubeDeploymentReplicasMismatch | Available replicas don't match desired count | 15m |
| KubeContainerOOMKilled | Container was OOMKilled in the last hour | 0m |
| KubeHPAMaxedOut | HPA has been at max replicas for 30 minutes | 30m |
| NodeHighCPU | Node CPU usage above 80% | 30m |
| NodeHighMemory | Node memory usage above 85% | 30m |
| NodeDiskPressure | Root filesystem usage above 80% | 15m |
| LokiIngestionErrors | Loki is experiencing append failures | 10m |

Info Alerts

These alerts are informational. They are useful for audit trails and dashboards but don't require notification:

| Alert | Condition | for |
|---|---|---|
| KubeDeploymentRollingUpdate | A deployment rollout is in progress | 0m |
| KubeNamespaceTerminating | A namespace has been terminating (possibly stuck finalizers) | 5m |
| PrometheusRuleFailures | Prometheus is failing to evaluate some rules | 10m |

⚙️ Configuration

k8scope accepts configuration via CLI flags, a YAML config file, or both. When both are provided, CLI flags take precedence over values in the config file.
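
The precedence rule can be illustrated with a layered merge. This is a minimal sketch, not k8scope's implementation: the `Settings` struct and `merge` function are hypothetical, and treating an empty string as "not set" is an assumption.

```go
package main

import "fmt"

// Settings holds two of the options described in this section.
// An empty string means "not set" in this sketch.
type Settings struct {
	Mode      string
	Namespace string
}

// merge starts from the defaults, applies config-file values, then applies
// CLI flag values, so flags win whenever both layers set the same field.
func merge(defaults, file, flags Settings) Settings {
	out := defaults
	for _, layer := range []Settings{file, flags} {
		if layer.Mode != "" {
			out.Mode = layer.Mode
		}
		if layer.Namespace != "" {
			out.Namespace = layer.Namespace
		}
	}
	return out
}

func main() {
	defaults := Settings{Mode: "startup", Namespace: "k8scope"}
	file := Settings{Mode: "production", Namespace: "monitoring"}
	flags := Settings{Mode: "startup"} // e.g. --mode startup on the command line
	fmt.Printf("%+v\n", merge(defaults, file, flags))
}
```

Here the flag overrides the file's mode, while the namespace still comes from the file because no flag set it.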

CLI Flags

k8scope install [flags]

Flags:
  --mode string        Deployment mode: startup, production, enterprise (default "startup")
  --namespace string   Target Kubernetes namespace (default "k8scope")
  --kubeconfig string  Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config)
  --dry-run            Preview installation plan without applying changes
  --config string      Path to k8scope configuration file

Configuration File

Config file auto-discovery order: .k8scope.yaml in the current directory, then $HOME/.k8scope.yaml.

mode: production
namespace: monitoring
kubeconfig: ~/.kube/config

components:
  prometheus:
    replicas: 2
    storage: 50Gi
    retention: 30d
  grafana:
    replicas: 2
  loki:
    replicas: 3
    storage: 50Gi
    retention: 30d
  otelCollector:
    mode: daemonset+gateway

gateway:
  enabled: true
  className: traefik  # any Gateway API-compatible controller (traefik, envoy, cilium, istio)
  domain: observability.company.com
  tls:
    enabled: true
    source: cert-manager
    issuer: letsencrypt-prod
  hosts:
    grafana: grafana.observability.company.com
    prometheus: prometheus.observability.company.com   # optional
    alertmanager: alertmanager.observability.company.com  # optional
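
The auto-discovery order for this file (current directory first, then the home directory) could be implemented roughly as follows. This is a sketch under assumptions: `discoverConfig` and `fileExists` are hypothetical names, and the existence check is passed in as a function only to keep the lookup easy to exercise without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// discoverConfig returns the first existing .k8scope.yaml, checking the
// current directory before the home directory, or "" if neither exists.
func discoverConfig(cwd, home string, exists func(string) bool) string {
	for _, dir := range []string{cwd, home} {
		candidate := filepath.Join(dir, ".k8scope.yaml")
		if exists(candidate) {
			return candidate
		}
	}
	return ""
}

// fileExists reports whether path names a regular file or other non-directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	cwd, _ := os.Getwd()
	home, _ := os.UserHomeDir()
	if path := discoverConfig(cwd, home, fileExists); path != "" {
		fmt.Println("using config:", path)
	} else {
		fmt.Println("no config file found; using flags and defaults")
	}
}
```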

Preflight Checks

Before installing, k8scope automatically verifies:

| Check | Description |
|---|---|
| Cluster connectivity | Validates the kubeconfig and confirms the API server is reachable |
| Kubernetes version | Ensures the cluster runs a supported version (≥1.25) |
| Resource availability | Checks that the cluster has enough CPU and memory to run the selected mode |
| Storage class | Verifies that a default StorageClass exists for PVC provisioning |
| Namespace conflict | Warns if the target namespace already contains observability components |

If any check fails, k8scope reports the issue and suggests remediation steps before proceeding.
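
As an illustration of how such checks might be structured, the sketch below implements the version check (at least 1.25) and a tiny runner that collects failures. Everything here is hypothetical: the `check` type, `runPreflight`, `supportedVersion`, and the hard-coded server version are stand-ins, and a real implementation would query the API server instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportedVersion reports whether a version such as "v1.27.3" is >= 1.25.
func supportedVersion(v string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", v, err)
	}
	return major > 1 || (major == 1 && minor >= 25), nil
}

// check pairs a human-readable name with a function that returns nil on success.
type check struct {
	name string
	run  func() error
}

// runPreflight runs every check and returns one message per failure.
func runPreflight(checks []check) []string {
	var failures []string
	for _, c := range checks {
		if err := c.run(); err != nil {
			failures = append(failures, fmt.Sprintf("%s: %v", c.name, err))
		}
	}
	return failures
}

func main() {
	serverVersion := "v1.24.9" // stand-in for the version reported by the API server
	checks := []check{
		{"Kubernetes version", func() error {
			ok, err := supportedVersion(serverVersion)
			if err != nil {
				return err
			}
			if !ok {
				return fmt.Errorf("%s is older than 1.25; upgrade the cluster first", serverVersion)
			}
			return nil
		}},
		{"Storage class", func() error { return nil }}, // placeholder that always passes
	}
	for _, f := range runPreflight(checks) {
		fmt.Println("preflight failed:", f)
	}
}
```

Running all checks before reporting, rather than stopping at the first failure, lets the user fix several problems in one pass.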


🔌 Connecting Your Applications

Once k8scope is installed, your applications can send telemetry using the OpenTelemetry Protocol (OTLP). The OTel Collector exposes endpoints inside the cluster automatically.

Sending Metrics via OTLP

Configure your OpenTelemetry SDK to export metrics to the collector:

# Environment variables for any OTel SDK
OTEL_EXPORTER_OTLP_ENDPOINT: "http://k8scope-otel-opentelemetry-collector.k8scope.svc:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"

Metrics are forwarded from the OTel Collector to Prometheus via remote write and become queryable in Grafana shortly afterwards, once the Collector has flushed its current batch.

Sending Logs via OTLP

Application logs sent via OTLP are automatically forwarded to Loki:

OTEL_EXPORTER_OTLP_ENDPOINT: "http://k8scope-otel-opentelemetry-collector.k8scope.svc:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
OTEL_LOGS_EXPORTER: "otlp"

Container logs (stdout/stderr) are collected automatically by the OTel Collector DaemonSet, so no application changes are needed for basic log collection.
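
The endpoints shown above use the default `k8scope` namespace. If the stack is installed elsewhere (for example with `--namespace monitoring`), the namespace part of the service DNS name changes. The helper below sketches how the URL is composed; `otlpEndpoint` is a hypothetical function, and it assumes the Collector service name stays the same across namespaces. The ports are the gRPC and HTTP receiver ports listed for the Collector.

```go
package main

import "fmt"

// collectorService is the service name used in the examples in this section.
const collectorService = "k8scope-otel-opentelemetry-collector"

// otlpEndpoint builds the in-cluster URL for the Collector's OTLP receiver.
// protocol takes an OTEL_EXPORTER_OTLP_PROTOCOL value: "grpc" or "http/protobuf".
func otlpEndpoint(namespace, protocol string) (string, error) {
	ports := map[string]int{"grpc": 4317, "http/protobuf": 4318}
	port, ok := ports[protocol]
	if !ok {
		return "", fmt.Errorf("unsupported OTLP protocol %q", protocol)
	}
	return fmt.Sprintf("http://%s.%s.svc:%d", collectorService, namespace, port), nil
}

func main() {
	for _, ns := range []string{"k8scope", "monitoring"} {
		endpoint, err := otlpEndpoint(ns, "grpc")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("OTEL_EXPORTER_OTLP_ENDPOINT=%s\n", endpoint)
	}
}
```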


🔧 Component Defaults by Mode

Below are the concrete values k8scope applies for the startup mode. Production and enterprise modes build upon these with additional replicas, storage, and features.

Prometheus

| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 250m / 1000m |
| Memory request / limit | 512Mi / 2Gi |
| Storage | 10Gi PVC (ReadWriteOnce) |
| Retention | 7 days |
| Service name | k8scope-prom-prometheus |

Prometheus is configured with serviceMonitorSelectorNilUsesHelmValues: false, meaning it will scrape all ServiceMonitors in the cluster regardless of labels.

Grafana

| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 256Mi / 512Mi |
| Storage | 5Gi PVC |
| Admin password | Auto-generated (retrieve via kubectl get secret) |
| Pre-configured datasources | Prometheus, Loki |
| Dashboard provisioning | Sidecar with k8scope-dashboard label |

Loki

| Setting | Startup |
|---|---|
| Deployment mode | SingleBinary |
| Replicas | 1 |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 256Mi / 1Gi |
| Storage | 10Gi filesystem PVC |
| Retention | 7 days (168h) |
| Auth | Disabled (single-tenant) |
| Schema | v13 with TSDB store |

Alertmanager

| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 50m / 200m |
| Memory request / limit | 64Mi / 128Mi |
| Storage | 1Gi PVC |
| Group by | alertname, namespace, severity |
| Group wait / interval | 30s / 5m |
| Repeat interval | 4h (critical: 1h) |

Alertmanager is deployed as part of kube-prometheus-stack with three receivers pre-configured: default, critical, and null (for silencing Watchdog alerts). Inhibition rules suppress notifications for warning alerts while a matching critical alert is active.
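
The inhibition behavior can be pictured with a small filter. This sketch is not Alertmanager's code, and which labels must be equal for a critical alert to "match" a warning is an assumption here (the example uses `namespace`); the `Alert` type and `applyInhibition` function are hypothetical.

```go
package main

import "fmt"

// Alert is a simplified alert carrying only its label set.
type Alert struct {
	Labels map[string]string
}

// sameOn reports whether two alerts agree on every label listed in equal.
func sameOn(a, b Alert, equal []string) bool {
	for _, key := range equal {
		if a.Labels[key] != b.Labels[key] {
			return false
		}
	}
	return true
}

// applyInhibition drops warning alerts that have an active critical alert
// with equal values for the given labels; all other alerts pass through.
func applyInhibition(active []Alert, equal []string) []Alert {
	var out []Alert
	for _, a := range active {
		muted := false
		if a.Labels["severity"] == "warning" {
			for _, src := range active {
				if src.Labels["severity"] == "critical" && sameOn(a, src, equal) {
					muted = true
					break
				}
			}
		}
		if !muted {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	active := []Alert{
		{Labels: map[string]string{"alertname": "KubePodCrashLooping", "severity": "critical", "namespace": "shop"}},
		{Labels: map[string]string{"alertname": "KubePodNotReady", "severity": "warning", "namespace": "shop"}},
		{Labels: map[string]string{"alertname": "KubePodNotReady", "severity": "warning", "namespace": "billing"}},
	}
	for _, a := range applyInhibition(active, []string{"namespace"}) {
		fmt.Println(a.Labels["alertname"], a.Labels["namespace"])
	}
}
```

In this example the warning in `shop` is muted by the critical alert in the same namespace, while the warning in `billing` still goes out.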

OpenTelemetry Collector

| Setting | Startup |
|---|---|
| Deployment mode | DaemonSet |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 128Mi / 512Mi |
| Memory limiter | 400 MiB (spike: 100 MiB) |
| Receivers | OTLP (gRPC :4317, HTTP :4318) |
| Exporters | Prometheus remote write, Loki OTLP/HTTP |
| Presets | Log collection, Kubernetes attributes |
| Batch size / timeout | 1024 / 5s |

🚢 For GitOps Users

k8scope also publishes its Helm chart for direct use with ArgoCD, Flux, or plain Helm:

helm install k8scope oci://ghcr.io/y0s3ph/k8scope --values custom-values.yaml

This is useful if you prefer declarative GitOps workflows over imperative CLI commands, or if you want to manage the observability stack alongside your other Helm releases.


🛠️ Building from Source

# Clone the repository
git clone https://github.com/y0s3ph/k8scope.git
cd k8scope

# Build the binary
go build -o k8scope ./cmd/k8scope

# Run tests
go test ./...

# Build with version info (as goreleaser does)
go build -ldflags "-X github.com/y0s3ph/k8scope/internal/cli.version=dev" -o k8scope ./cmd/k8scope

Requirements:

  • Go 1.25+
  • Access to a Kubernetes cluster (for E2E tests)

🧪 Testing

k8scope has two levels of testing:

Unit tests (fast, no cluster required):

go test ./internal/... ./embed/...

End-to-End tests (require a Kind cluster):

# Create a Kind cluster
kind create cluster --config test/e2e/testdata/kind-config.yaml --name k8scope-e2e

# Run E2E tests
go test -v -tags e2e -timeout 12m ./test/e2e/...

# Clean up
kind delete cluster --name k8scope-e2e

E2E tests validate the full installation lifecycle: binary build, install --mode startup, pod readiness, --dry-run behavior, and error handling for invalid modes.


🗺️ Roadmap

v0.1.0: Core Engine ✅

  • CLI scaffolding with mode-based installation plans
  • Helm SDK installer engine (install, upgrade, uninstall, status)
  • Preflight checks (cluster, version, resources, storage class)
  • Configuration file loading with CLI flag override and merge semantics
  • CI/CD pipeline with cross-platform builds and container images

v0.2.0: Startup MVP ✅

  • Embed and deploy Prometheus (kube-prometheus-stack)
  • Embed and deploy Grafana with auto-configured Prometheus and Loki datasources
  • Embed and deploy Loki for log aggregation (SingleBinary mode)
  • Enable Alertmanager with basic routing and severity-based receivers
  • Deploy OTel Collector DaemonSet with log collection and OTLP receivers
  • Create 5 curated Grafana dashboards (Cluster, Node, Pod, Networking, Logs)
  • Create 17 curated alerting rules by severity (Critical, Warning, Info)
  • E2E test suite with Kind for startup mode validation

v0.3.0: Production Mode

  • High availability configuration (multi-replica Prometheus, Grafana, Alertmanager)
  • Retention policies and storage management (50Gi, 30-day retention)
  • OTel Collector Gateway deployment for centralized processing

v0.4.0: Developer Experience

  • status command with component health checks
  • upgrade command with safe rollouts and rollback
  • uninstall command with cleanup confirmation
  • dev mode with Docker Compose for local development

v0.5.0: Enterprise Mode

  • OIDC/SSO authentication for Grafana
  • External storage backends (S3, GCS, Azure Blob)
  • Multi-tenant isolation with per-tenant OTel pipelines

v1.0.0: Distribution & Ecosystem

  • Gateway API integration with TLS for all components (replaces deprecated ingress-nginx)
  • Helm chart publication to OCI registry
  • Tempo integration for distributed tracing
  • Dedicated documentation site

📄 License

Apache License 2.0. See LICENSE for details.
