Opinionated observability stack for Kubernetes.
Deploy Prometheus, Grafana, Loki, Alertmanager, and OpenTelemetry Collector with battle-tested defaults in one command. Stop spending days configuring YAML and start observing your cluster in minutes.
> **Warning**
> k8scope is in early development. APIs and configuration may change.
Setting up observability on Kubernetes means:
- Configuring `kube-prometheus-stack` (4000+ lines of values.yaml)
- Adding Loki separately with its own Helm chart
- Setting up OpenTelemetry Collector pipelines for app telemetry
- Connecting datasources in Grafana manually
- Importing dashboards that may or may not work
- Writing alerting rules from scratch (or living with 200 noisy defaults)
- Figuring out retention, storage, ingress, and auth
This takes 2-5 days for an experienced SRE. k8scope reduces it to one command.
k8scope deploys a hybrid observability architecture where Prometheus handles Kubernetes infrastructure metrics via scraping, while OpenTelemetry Collector acts as the unified ingestion layer for application telemetry via OTLP:
```mermaid
flowchart LR
    Apps["Applications"]
    K8s["K8s Infrastructure"]
    OTel["OpenTelemetry Collector"]
    Prom["Prometheus"]
    Loki["Loki"]
    Tempo["Tempo"]
    Grafana["Grafana"]
    AM["Alertmanager"]
    Notify["Slack / PagerDuty / Email"]
    Apps -- "OTLP" --> OTel
    OTel -- "remote write" --> Prom
    OTel -- "OTLP/HTTP" --> Loki
    OTel -. "traces" .-> Tempo
    K8s -- "scrape" --> Prom
    Prom -- "query" --> Grafana
    Loki -- "query" --> Grafana
    Tempo -. "query" .-> Grafana
    Prom -- "alerts" --> AM
    AM --> Notify
```
Signal flow:
- **Metrics (infrastructure)** → Prometheus scrapes kubelet, kube-state-metrics, and node-exporter endpoints.
- **Metrics (application)** → Apps send OTLP metrics to the OTel Collector, which writes them to Prometheus via remote write.
- **Logs** → The OTel Collector collects container logs from each node and receives application logs via OTLP. All logs are forwarded to Loki.
- **Alerts** → Prometheus evaluates k8scope's curated alerting rules and sends notifications through Alertmanager.
- **Visualization** → Grafana comes pre-wired to both the Prometheus and Loki datasources, with 5 curated dashboards ready out of the box.
k8scope provides opinionated defaults for every stage of your infrastructure's growth:
| Mode | Target | Replicas | Storage | Retention | OTel Collector | Auth |
|---|---|---|---|---|---|---|
| `dev` | Local testing | 1 | Ephemeral | Session | No | None |
| `startup` | Small clusters (≤10 nodes) | 1 | 10Gi PVC | 7 days | DaemonSet | Basic |
| `production` | Growing teams (10-50 nodes) | 2-3 (HA) | 50Gi PVC | 30 days | DaemonSet + Gateway | Basic |
| `enterprise` | Large orgs (50+ nodes) | 2-3 (HA) | External (S3/GCS) | 90 days | Gateway (multi-tenant) | OIDC/SSO |
How to choose:
- `startup` – You're a small team, want reliable metrics and logs with minimal overhead, and don't need HA. This is the default and the recommended starting point.
- `production` – You need high availability, longer retention, and your team is growing. Alertmanager routes to real notification channels.
- `enterprise` – You need SSO, multi-tenant isolation, and external durable storage for compliance or cost optimization.
Binary (macOS / Linux / Windows):

Download the latest release from the releases page:

```shell
# macOS (Apple Silicon)
curl -sL https://github.com/y0s3ph/k8scope/releases/latest/download/k8scope_$(curl -s https://api.github.com/repos/y0s3ph/k8scope/releases/latest | grep tag_name | cut -d '"' -f4 | sed 's/v//')_darwin_arm64.tar.gz | tar xz
sudo mv k8scope /usr/local/bin/

# Linux (amd64)
curl -sL https://github.com/y0s3ph/k8scope/releases/latest/download/k8scope_$(curl -s https://api.github.com/repos/y0s3ph/k8scope/releases/latest | grep tag_name | cut -d '"' -f4 | sed 's/v//')_linux_amd64.tar.gz | tar xz
sudo mv k8scope /usr/local/bin/
```

Container image:

```shell
docker pull ghcr.io/y0s3ph/k8scope:latest
```

```shell
# Install the lightweight stack for a small cluster
k8scope install --mode startup

# Preview what would be installed (no changes applied)
k8scope install --mode production --dry-run

# Install in a custom namespace
k8scope install --mode startup --namespace monitoring

# Check stack health
k8scope status

# Remove everything
k8scope uninstall
```

| Component | Chart | Purpose | Modes |
|---|---|---|---|
| Prometheus | `kube-prometheus-stack` | Metrics collection, storage, and alerting engine | All |
| Grafana | `grafana` | Dashboards and visualization with pre-wired datasources | All |
| Loki | `loki` | Log aggregation and querying | All |
| Alertmanager | Bundled in `kube-prometheus-stack` | Alert routing, grouping, and deduplication | `startup`+ |
| OTel Collector | `opentelemetry-collector` | Unified telemetry pipeline (OTLP ingest, log collection) | `startup`+ |
| Node Exporter | Bundled in `kube-prometheus-stack` | Host-level metrics (CPU, memory, disk, network) | All |
| kube-state-metrics | Bundled in `kube-prometheus-stack` | Kubernetes object metrics (deployments, pods, nodes) | All |
All Helm charts are embedded in the k8scope binary: no internet connection or Helm repositories are required at install time.
The OTel Collector deployment topology scales with your needs:
| Mode | Deployment | Role |
|---|---|---|
| `startup` | DaemonSet | Collects node logs, receives app OTLP data, exports to Prometheus and Loki |
| `production` | DaemonSet + Gateway | DaemonSet collects per-node data, Gateway centralizes processing and routing |
| `enterprise` | Gateway (HA) | Tenant-aware routing, sampling, rate limiting, and filtering |
Applications instrumented with OpenTelemetry SDKs can send metrics, logs, and traces to the Collector's OTLP endpoint out of the box; no additional configuration is needed.
k8scope ships with 5 focused dashboards instead of dozens of generic ones. Each dashboard includes template variables for namespace filtering and auto-configured datasource references:
| Dashboard | Description | Key Panels |
|---|---|---|
| Cluster Overview | Bird's-eye view of the entire cluster | Node count, pod status, CPU/memory usage, top namespaces by resource, cluster events |
| Node Resources | Per-node hardware utilization | CPU usage per core, memory usage vs available, disk I/O, network bandwidth per interface |
| Pod Resources | Per-pod resource consumption | CPU/memory requests vs actual usage, restart count, container status, OOMKill history |
| Networking | Network traffic and errors | Bytes received/transmitted per pod, packet drop rate, DNS lookup latency, connection states |
| Logs Overview | Centralized log exploration | Log volume by namespace/pod, error rate over time, log stream browser, full-text search |
All dashboards are stored as JSON files and provisioned automatically via Grafana's sidecar mechanism using the `k8scope-dashboard` label.
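Because the sidecar discovers dashboards by label, you can ship your own dashboards alongside the curated ones. A minimal sketch (the label value, ConfigMap name, and namespace below are assumptions, not documented k8scope defaults):

```yaml
# Hypothetical example: provision a custom dashboard via the sidecar.
# Assumes the sidecar matches on the k8scope-dashboard label key and
# that the stack runs in the default "k8scope" namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-team-dashboard
  namespace: k8scope
  labels:
    k8scope-dashboard: "1"   # assumed value; the label key comes from the docs above
data:
  my-team.json: |
    { "title": "My Team Dashboard", "panels": [] }
```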
k8scope ships with 17 alerting rules organized by severity. Each alert includes a descriptive summary, a detailed description, and a `runbook_url` for remediation guidance. All rules carry the `k8scope: "true"` label for easy identification.
These alerts require immediate action and should be routed to paging systems (PagerDuty, Opsgenie, phone calls):
| Alert | Condition | `for` |
|---|---|---|
| `KubeNodeNotReady` | A node has been in NotReady state | 5m |
| `KubePodCrashLooping` | Pod has restarted more than 5 times in 15 minutes | 0m |
| `KubePVCFillingUp` | PVC is >90% full and predicted to fill in 4 hours | 5m |
| `PrometheusDown` | A Prometheus target is unreachable | 3m |
| `ClusterCPUOvercommit` | CPU requests exceed 90% of total allocatable | 10m |
| `ClusterMemoryOvercommit` | Memory requests exceed 90% of total allocatable | 10m |
These alerts require attention on the next business day and should be routed to Slack/email:
| Alert | Condition | `for` |
|---|---|---|
| `KubePodNotReady` | Pod has been not-ready for an extended period | 15m |
| `KubeDeploymentReplicasMismatch` | Available replicas don't match desired count | 15m |
| `KubeContainerOOMKilled` | Container was OOMKilled in the last hour | 0m |
| `KubeHPAMaxedOut` | HPA has been at max replicas for 30 minutes | 30m |
| `NodeHighCPU` | Node CPU usage above 80% | 30m |
| `NodeHighMemory` | Node memory usage above 85% | 30m |
| `NodeDiskPressure` | Root filesystem usage above 80% | 15m |
| `LokiIngestionErrors` | Loki is experiencing append failures | 10m |
These alerts are informational; they are useful for audit trails and dashboards but don't require notification:
| Alert | Condition | `for` |
|---|---|---|
| `KubeDeploymentRollingUpdate` | A deployment rollout is in progress | 0m |
| `KubeNamespaceTerminating` | A namespace has been terminating (possibly stuck finalizers) | 5m |
| `PrometheusRuleFailures` | Prometheus is failing to evaluate some rules | 10m |
k8scope accepts configuration via CLI flags, a YAML config file, or both. When both are provided, CLI flags take precedence over values in the config file.
```text
k8scope install [flags]

Flags:
  --mode string         Deployment mode: startup, production, enterprise (default "startup")
  --namespace string    Target Kubernetes namespace (default "k8scope")
  --kubeconfig string   Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config)
  --dry-run             Preview installation plan without applying changes
  --config string       Path to k8scope configuration file
```

Config file auto-discovery order: `.k8scope.yaml` (current dir) > `$HOME/.k8scope.yaml`.
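For example, a config file can pin the namespace while a CLI flag overrides the mode (hypothetical values; precedence as described above):

```shell
# .k8scope.yaml sets both mode and namespace...
cat > .k8scope.yaml <<'EOF'
mode: startup
namespace: monitoring
EOF

# ...but --mode on the command line wins, so this would install
# production mode into the "monitoring" namespace.
k8scope install --config .k8scope.yaml --mode production
```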
```yaml
mode: production
namespace: monitoring
kubeconfig: ~/.kube/config

components:
  prometheus:
    replicas: 2
    storage: 50Gi
    retention: 30d
  grafana:
    replicas: 2
  loki:
    replicas: 3
    storage: 50Gi
    retention: 30d
  otelCollector:
    mode: daemonset+gateway

gateway:
  enabled: true
  className: traefik # any Gateway API-compatible controller (traefik, envoy, cilium, istio)
  domain: observability.company.com
  tls:
    enabled: true
    source: cert-manager
    issuer: letsencrypt-prod
  hosts:
    grafana: grafana.observability.company.com
    prometheus: prometheus.observability.company.com # optional
    alertmanager: alertmanager.observability.company.com # optional
```

Before installing, k8scope automatically verifies:
| Check | Description |
|---|---|
| Cluster connectivity | Validates the kubeconfig and confirms the API server is reachable |
| Kubernetes version | Ensures the cluster runs a supported version (≥1.25) |
| Resource availability | Checks that the cluster has enough CPU and memory to run the selected mode |
| Storage class | Verifies that a default StorageClass exists for PVC provisioning |
| Namespace conflict | Warns if the target namespace already contains observability components |
If any check fails, k8scope reports the issue and suggests remediation steps before proceeding.
Once k8scope is installed, your applications can send telemetry using the OpenTelemetry Protocol (OTLP). The OTel Collector exposes endpoints inside the cluster automatically.
Configure your OpenTelemetry SDK to export metrics to the collector:
```yaml
# Environment variables for any OTel SDK
OTEL_EXPORTER_OTLP_ENDPOINT: "http://k8scope-otel-opentelemetry-collector.k8scope.svc:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
```

Metrics are forwarded from the OTel Collector to Prometheus via remote write and become queryable in Grafana immediately.
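In a Kubernetes Deployment, the same variables can be injected into the pod spec. A sketch of the relevant fragment (the app name and image are placeholders):

```yaml
# Hypothetical Deployment fragment wiring an app to the collector's OTLP endpoint.
# Only the container env section is shown; selector, labels, etc. are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: ghcr.io/example/my-app:latest
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://k8scope-otel-opentelemetry-collector.k8scope.svc:4317"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
```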
Application logs sent via OTLP are automatically forwarded to Loki:
```yaml
OTEL_EXPORTER_OTLP_ENDPOINT: "http://k8scope-otel-opentelemetry-collector.k8scope.svc:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
OTEL_LOGS_EXPORTER: "otlp"
```

Container logs (stdout/stderr) are collected automatically by the OTel Collector DaemonSet; no application changes are needed for basic log collection.
Below are the concrete values k8scope applies for the startup mode. Production and enterprise modes build upon these with additional replicas, storage, and features.
| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 250m / 1000m |
| Memory request / limit | 512Mi / 2Gi |
| Storage | 10Gi PVC (ReadWriteOnce) |
| Retention | 7 days |
| Service name | k8scope-prom-prometheus |
Prometheus is configured with `serviceMonitorSelectorNilUsesHelmValues: false`, meaning it will scrape all ServiceMonitors in the cluster regardless of labels.
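In practice that means a ServiceMonitor you create for your own app is picked up without extra wiring. A minimal sketch (the names, namespace, and port are placeholders for your own service):

```yaml
# Hypothetical ServiceMonitor; scraped because the nil-selector override
# makes kube-prometheus-stack watch ServiceMonitors cluster-wide.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-app        # must match your Service's labels
  endpoints:
    - port: http-metrics # named port on the Service exposing /metrics
      interval: 30s
```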
| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 256Mi / 512Mi |
| Storage | 5Gi PVC |
| Admin password | Auto-generated (retrieve via kubectl get secret) |
| Pre-configured datasources | Prometheus, Loki |
| Dashboard provisioning | Sidecar with k8scope-dashboard label |
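The auto-generated admin password can be read back from the secret. The secret and key names below are assumptions based on common Grafana chart conventions; adjust them to what `kubectl get secrets -n k8scope` actually shows:

```shell
# Assumed secret name "k8scope-grafana" and key "admin-password".
kubectl get secret k8scope-grafana -n k8scope \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```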
| Setting | Startup |
|---|---|
| Deployment mode | SingleBinary |
| Replicas | 1 |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 256Mi / 1Gi |
| Storage | 10Gi filesystem PVC |
| Retention | 7 days (168h) |
| Auth | Disabled (single-tenant) |
| Schema | v13 with TSDB store |
| Setting | Startup |
|---|---|
| Replicas | 1 |
| CPU request / limit | 50m / 200m |
| Memory request / limit | 64Mi / 128Mi |
| Storage | 1Gi PVC |
| Group by | alertname, namespace, severity |
| Group wait / interval | 30s / 5m |
| Repeat interval | 4h (critical: 1h) |
Alertmanager is deployed as part of kube-prometheus-stack with three receivers pre-configured: default, critical, and null (for silencing Watchdog alerts). Inhibition rules prevent warning alerts from firing when a matching critical alert is already active.
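The inhibition pattern described above generally looks like this in Alertmanager configuration (a sketch of the mechanism, not necessarily k8scope's exact rule):

```yaml
inhibit_rules:
  - source_matchers: ['severity = critical']  # while a critical alert fires...
    target_matchers: ['severity = warning']   # ...suppress matching warnings
    equal: ['alertname', 'namespace']         # only when these labels match
```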
| Setting | Startup |
|---|---|
| Deployment mode | DaemonSet |
| CPU request / limit | 100m / 500m |
| Memory request / limit | 128Mi / 512Mi |
| Memory limiter | 400 MiB (spike: 100 MiB) |
| Receivers | OTLP (gRPC :4317, HTTP :4318) |
| Exporters | Prometheus remote write, Loki OTLP/HTTP |
| Presets | Log collection, Kubernetes attributes |
| Batch size / timeout | 1024 / 5s |
k8scope also publishes its Helm chart for direct use with ArgoCD, Flux, or plain Helm:
```shell
helm install k8scope oci://ghcr.io/y0s3ph/k8scope --values custom-values.yaml
```

This is useful if you prefer declarative GitOps workflows over imperative CLI commands, or if you want to manage the observability stack alongside your other Helm releases.
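With ArgoCD, for example, the OCI chart can be referenced from an Application manifest. A sketch under assumptions (the target revision, values, and ArgoCD's OCI repository setup are not documented here):

```yaml
# Hypothetical ArgoCD Application pulling the k8scope chart from the OCI registry.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k8scope
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ghcr.io/y0s3ph   # OCI registry hosting the chart
    chart: k8scope
    targetRevision: "*"       # pin to a real chart version in practice
    helm:
      valuesObject:
        mode: production
  destination:
    server: https://kubernetes.default.svc
    namespace: k8scope
```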
```shell
# Clone the repository
git clone https://github.com/y0s3ph/k8scope.git
cd k8scope

# Build the binary
go build -o k8scope ./cmd/k8scope

# Run tests
go test ./...

# Build with version info (as goreleaser does)
go build -ldflags "-X github.com/y0s3ph/k8scope/internal/cli.version=dev" -o k8scope ./cmd/k8scope
```

Requirements:
- Go 1.25+
- Access to a Kubernetes cluster (for E2E tests)
k8scope has two levels of testing:
Unit tests – fast, no cluster required:

```shell
go test ./internal/... ./embed/...
```

End-to-End tests – require a Kind cluster:
```shell
# Create a Kind cluster
kind create cluster --config test/e2e/testdata/kind-config.yaml --name k8scope-e2e

# Run E2E tests
go test -v -tags e2e -timeout 12m ./test/e2e/...

# Clean up
kind delete cluster --name k8scope-e2e
```

E2E tests validate the full installation lifecycle: binary build, `install --mode startup`, pod readiness, `--dry-run` behavior, and error handling for invalid modes.
- CLI scaffolding with mode-based installation plans
- Helm SDK installer engine (install, upgrade, uninstall, status)
- Preflight checks (cluster, version, resources, storage class)
- Configuration file loading with CLI flag override and merge semantics
- CI/CD pipeline with cross-platform builds and container images
- Embed and deploy Prometheus (`kube-prometheus-stack`)
- Embed and deploy Grafana with auto-configured Prometheus and Loki datasources
- Embed and deploy Loki for log aggregation (SingleBinary mode)
- Enable Alertmanager with basic routing and severity-based receivers
- Deploy OTel Collector DaemonSet with log collection and OTLP receivers
- Create 5 curated Grafana dashboards (Cluster, Node, Pod, Networking, Logs)
- Create 17 curated alerting rules by severity (Critical, Warning, Info)
- E2E test suite with Kind for startup mode validation
- High availability configuration (multi-replica Prometheus, Grafana, Alertmanager)
- Retention policies and storage management (50Gi, 30-day retention)
- OTel Collector Gateway deployment for centralized processing
- `status` command with component health checks
- `upgrade` command with safe rollouts and rollback
- `uninstall` command with cleanup confirmation
- `dev` mode with Docker Compose for local development
- OIDC/SSO authentication for Grafana
- External storage backends (S3, GCS, Azure Blob)
- Multi-tenant isolation with per-tenant OTel pipelines
- Gateway API integration with TLS for all components (replaces deprecated ingress-nginx)
- Helm chart publication to OCI registry
- Tempo integration for distributed tracing
- Dedicated documentation site
Apache License 2.0; see LICENSE for details.