feat: add helm charts kubernetes #704

Merged
matiwinnetou merged 50 commits into develop from feat/add-helm-charts-kubernetes
Mar 24, 2026

@Sotatek-DucPhung Sotatek-DucPhung (Collaborator) commented Feb 24, 2026

#694

What this PR does

Introduces a production-ready Kubernetes deployment for Cardano Rosetta Java using Helm
charts. The project previously shipped only a Docker Compose stack. This PR adds an
umbrella Helm chart with 5 subcharts, 3 pre-built value overlays (mainnet, preprod, K3s),
and full operator documentation — enabling deployment on K3s single-host servers and
managed cloud clusters (EKS, GKE, AKS).

All chart defaults are derived from the Docker Compose stack (.env.docker-compose*) as
the canonical source of truth. The only intentional deviations from Compose behavior are
technically required by Kubernetes (documented below).


Chart Structure

helm/cardano-rosetta-java/              ← umbrella chart
├── Chart.yaml
├── values.yaml                         ← mainnet defaults (matches .env.docker-compose)
├── values-preprod.yaml                 ← preprod overrides (matches .env.docker-compose-preprod)
├── values-k3s.yaml                     ← single-host K3s overrides
├── templates/
│   ├── namespace.yaml                  ← creates cardano namespace with Helm labels
│   ├── serviceaccount.yaml             ← shared SA for all pods
│   ├── role.yaml + rolebinding.yaml    ← read-only access to pods/jobs (for init containers)
│   ├── mithril-job.yaml                ← one-shot Mithril snapshot download
│   ├── index-applier-job.yaml          ← DB index builder (automatic or hook mode)
│   ├── configmap-db-indexes.yaml       ← db-indexes.yaml config
│   ├── configmap-grafana-dashboards.yaml
│   ├── configmap-load-tests.yaml       ← k6 smoke/load test scripts
│   ├── stress-test-job.yaml            ← optional k6 load test job
│   └── helm-test-pod.yaml              ← helm test: validates API endpoints
└── charts/
    ├── cardano-node/                   ← StatefulSet: node + socat sidecar + submit-api
    ├── postgresql/                     ← StatefulSet: PostgreSQL with blockchain tuning
    ├── yaci-indexer/                   ← Deployment: blockchain data indexer
    ├── rosetta-api/                    ← Deployment: Rosetta HTTP API
    └── monitoring/                     ← Deployment: Prometheus + Grafana + exporters

Startup / Dependency Chain

Mirrors the Docker Compose depends_on chain via init containers:

Docker Compose                               Kubernetes
──────────────────────────────────────────────────────────────────────
mithril (service_completed_successfully)  →  Mithril Job (job.succeeded)
cardano-node (after mithril)              →  cardano-node (wait-for-mithril initContainer)
cardano-sync-waiter (wait for 100% sync)  →  postgresql (wait-for-node-sync initContainer)
db (service_healthy)                      →  postgresql readinessProbe (pg_isready)
yaci-indexer (depends on db healthy)      →  yaci-indexer (wait-for-postgres initContainer)
api (depends on db + yaci started)        →  rosetta-api (wait-for-postgres + wait-for-indexer)
index-applier (after api healthy)         →  index-applier Job (wait-for-api initContainer)

The cardano-node pod runs three containers: the Cardano node process, a socat
sidecar (TCP port 3002 → UNIX socket bridge), and cardano-submit-api. READY 3/3 means
all three are up.
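Each arrow in the table is an init container that blocks until its upstream dependency is reachable. A minimal sketch of the `wait-for-postgres` step for yaci-indexer, assuming a PostgreSQL Service named `my-release-postgresql` on port 5432 and the upstream `postgres:16` image as a source of `pg_isready`; the Service name, image, and polling interval are illustrative, not the chart's actual values.

```yaml
# Hypothetical fragment of the yaci-indexer pod spec (not the chart's template).
initContainers:
  - name: wait-for-postgres
    image: postgres:16            # assumption: any image that ships pg_isready
    command:
      - sh
      - -c
      - |
        # Poll until PostgreSQL accepts connections; the main container
        # starts only after this init container exits 0.
        until pg_isready -h my-release-postgresql -p 5432; do
          echo "waiting for postgresql..."
          sleep 5
        done
```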


Key Design Decisions

UNIX socket bridging via socat

Docker Compose shares the node's UNIX socket between containers via volume mounts. A UNIX
socket cannot be shared across pod boundaries in Kubernetes, so a socat sidecar inside the
cardano-node pod exposes the socket as a TCP service on port 3002. Downstream pods
(yaci-indexer, postgresql init container) connect via TCP and use the n2c-socat Spring profile.
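The sidecar pattern can be sketched as follows: the node and socat containers share an `emptyDir` volume holding the socket, and socat forwards every TCP connection on port 3002 to it. The `alpine/socat` image, the `/ipc/node.socket` path, and the `node-ipc` volume name are assumptions for illustration.

```yaml
# Hypothetical fragment of the cardano-node pod spec.
containers:
  - name: cardano-node
    image: example.org/cardano-node:10.5.4   # placeholder image reference
    volumeMounts:
      - name: node-ipc
        mountPath: /ipc                      # assumption: node writes its socket to /ipc/node.socket
  - name: socat
    image: alpine/socat                      # assumption: image entrypoint is socat
    args:
      # Listen on TCP 3002, fork one child per client, relay to the UNIX socket.
      - TCP-LISTEN:3002,fork,reuseaddr
      - UNIX-CONNECT:/ipc/node.socket
    ports:
      - containerPort: 3002
    volumeMounts:
      - name: node-ipc
        mountPath: /ipc
volumes:
  - name: node-ipc
    emptyDir: {}
```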

Node sync wait — script reuse

The postgresql wait-for-node-sync init container sets up a reverse socat bridge
(TCP → /tmp/node.socket), then delegates to /sbin/wait-for-node-sync.sh from the
cardano-node image — the same script used by Docker Compose's cardano-sync-waiter
service. This ensures identical sync detection logic (progress bar output, slot
calculation from genesis files).
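The reverse bridge might look like this: socat listens on a local UNIX socket and relays to the node's TCP port, after which the sync script runs unchanged. This sketch assumes the cardano-node image includes socat, that the script locates the socket through `CARDANO_NODE_SOCKET_PATH` (the variable cardano-cli reads), and that the node Service is named `my-release-cardano-node`; all three are illustrative.

```yaml
# Hypothetical fragment of the postgresql pod spec.
initContainers:
  - name: wait-for-node-sync
    image: example.org/cardano-node:10.5.4   # placeholder; must provide socat and the script
    env:
      - name: CARDANO_NODE_SOCKET_PATH
        value: /tmp/node.socket
    command:
      - sh
      - -c
      - |
        # Reverse bridge: local UNIX socket -> socat sidecar on the node pod over TCP.
        socat UNIX-LISTEN:/tmp/node.socket,fork TCP:my-release-cardano-node:3002 &
        # Give socat a moment to create the socket file before the script uses it.
        sleep 2
        exec /sbin/wait-for-node-sync.sh
```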

Automatic index-applier

The index-applier runs as a plain Kubernetes Job by default (indexApplier.mode: automatic).
It is created with the release and waits (via an init container) until the API is ready.
ttlSecondsAfterFinished: 86400 deletes the Job 24 hours after it completes, so a later
upgrade can recreate it instead of failing on Job immutability errors.

Operators who need explicit control can set indexApplier.mode: hook to revert to the
Helm post-install/post-upgrade hook behavior.
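A sketch of the automatic-mode Job, under stated assumptions: the Job and Service names, the `curlimages/curl` image, and API port 8082 are illustrative. `/network/list` is a standard Rosetta endpoint, used here as a readiness check; the chart's actual wait condition may differ.

```yaml
# Hypothetical sketch of the index-applier Job in automatic mode.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-release-index-applier
  # In indexApplier.mode: hook, Helm hook annotations would be added instead, e.g.
  #   annotations:
  #     "helm.sh/hook": post-install,post-upgrade
spec:
  ttlSecondsAfterFinished: 86400   # Job object is deleted 24 h after it finishes
  template:
    spec:
      restartPolicy: OnFailure
      initContainers:
        - name: wait-for-api
          image: curlimages/curl        # assumption: any image with sh and curl
          command:
            - sh
            - -c
            - |
              # Retry until the Rosetta API answers a POST to /network/list.
              until curl -sf -X POST -H 'Content-Type: application/json' \
                  -d '{"metadata":{}}' http://my-release-rosetta-api:8082/network/list; do
                echo "waiting for rosetta-api..."
                sleep 10
              done
      containers:
        - name: index-applier
          image: example.org/index-applier:latest   # placeholder image
```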

Hardware profiles

Three built-in profiles scale all resources proportionally:

| Profile | Use case | Total RAM | Total vCPU |
|---|---|---|---|
| entry | Preprod / single-host K3s | 32 GB | 4 cores |
| mid | Mainnet production (default) | 48 GB | 8 cores |
| advanced | High-throughput production | 94 GB | 16 cores |

Each profile configures CPU/memory for all 4 workloads, PostgreSQL tuning parameters
(shared_buffers, work_mem, max_connections, etc.), and HikariCP connection pool sizes.
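Selecting a profile is a single value override. A minimal sketch, assuming an operator-maintained file named `my-values.yaml` (hypothetical) layered on top of the chart defaults:

```yaml
# my-values.yaml (hypothetical operator override file)
global:
  profile: advanced   # entry | mid (default) | advanced
```

Passed to `helm install` or `helm upgrade` with `-f my-values.yaml`, this one key would switch the CPU/memory, PostgreSQL, and HikariCP settings together.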

Data persistence

PVCs for cardano-node-data and postgresql-data carry helm.sh/resource-policy: keep
so they survive helm uninstall. The Mithril Job also carries this annotation to avoid
re-downloading the full snapshot on reinstall. The index-applier Job does not — it
recreates cleanly on each deploy.
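The retention mechanism is a standard Helm annotation on the PVC object. A sketch, with the storage size left as an assumption since this section does not state it:

```yaml
# Hypothetical PVC manifest illustrating the keep policy.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cardano-node-data
  annotations:
    # Helm does not delete this resource on `helm uninstall`, so chain data survives.
    helm.sh/resource-policy: keep
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi   # assumption: size not specified here
```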


Configuration Defaults vs Docker Compose

All values in values.yaml are derived from .env.docker-compose:

| Helm value | Default | Docker Compose source |
|---|---|---|
| `global.releaseVersion` | "2.1.0" | RELEASE_VERSION |
| `global.cardanoNodeVersion` | "10.5.4" | CARDANO_NODE_VERSION |
| `global.protocolMagic` | "764824073" | PROTOCOL_MAGIC (quoted to prevent scientific notation) |
| `global.network` | mainnet | NETWORK |
| `global.sync` | true | SYNC |
| `global.mithrilVersion` | 2543.1-hotfix | MITHRIL_VERSION |
| `yaci-indexer.env.searchLimit` | 100 | SEARCH_LIMIT |
| `yaci-indexer.env.logLevel` | error | LOG |
| `yaci-indexer.env.removeSpentUtxos` | true | REMOVE_SPENT_UTXOS |
| `rosetta-api.env.syncGraceSlotsCount` | 100 | SYNC_GRACE_SLOTS_COUNT |
| `rosetta-api.env.tokenRegistryEnabled` | false | TOKEN_REGISTRY_ENABLED |
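Nested as a values file, the defaults table corresponds roughly to the fragment below. The grouping under `yaci-indexer` and `rosetta-api` follows the dotted names in the table; whether the real values.yaml nests them exactly this way is an assumption.

```yaml
# Sketch of the mainnet defaults, nested as Helm values.
global:
  releaseVersion: "2.1.0"
  cardanoNodeVersion: "10.5.4"
  # Quoted on purpose: Helm parses unquoted integers as float64, so a large
  # number like this can render as 7.64824073e+08 in templates.
  protocolMagic: "764824073"
  network: mainnet
  sync: true
  mithrilVersion: "2543.1-hotfix"
yaci-indexer:
  env:
    searchLimit: 100
    logLevel: error
    removeSpentUtxos: true
rosetta-api:
  env:
    syncGraceSlotsCount: 100
    tokenRegistryEnabled: false
```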

Preprod overrides in values-preprod.yaml mirror .env.docker-compose-preprod:

| Helm value | Preprod override | Docker Compose preprod source |
|---|---|---|
| `global.network` | preprod | NETWORK=preprod |
| `global.protocolMagic` | "1" | PROTOCOL_MAGIC=1 |
| `global.profile` | entry | |
| `yaci-indexer.env.removeSpentUtxos` | false | REMOVE_SPENT_UTXOS=false |
| `yaci-indexer.env.peerDiscovery` | true | PEER_DISCOVERY=true |
| `yaci-indexer.env.logLevel` | INFO | LOG=INFO |
| `rosetta-api.env.tokenRegistryLogoFetch` | true | TOKEN_REGISTRY_LOGO_FETCH=true |
| `rosetta-api.env.tokenRegistryCacheTtlHours` | 1 | TOKEN_REGISTRY_CACHE_TTL_HOURS=1 |
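The preprod table translates into an overlay like the following sketch (same nesting assumption as the mainnet defaults):

```yaml
# Sketch of values-preprod.yaml built from the preprod override table.
global:
  network: preprod
  protocolMagic: "1"
  profile: entry
yaci-indexer:
  env:
    removeSpentUtxos: false
    peerDiscovery: true
    logLevel: INFO
rosetta-api:
  env:
    tokenRegistryLogoFetch: true
    tokenRegistryCacheTtlHours: 1
```

Helm always starts from the chart's own values.yaml and applies `-f` files left to right, so passing `-f values-preprod.yaml` overrides only these keys and leaves every other mainnet default in place.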

Intentional K8s-only differences (kept, not aligned)

| Value | K8s | Docker Compose | Reason |
|---|---|---|---|
| YACI_SPRING_PROFILES | postgres,n2c-socat | postgres,n2c-socket | UNIX sockets can't cross pod boundaries |
| BLOCK_TRANSACTION_API_TIMEOUT_SECS | 120 | 5 | K8s inter-pod network latency |
| HikariCP pool config | per-profile | not set | K8s-specific DB connection management |
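As container environment entries, the first two differences might look like this. Which container receives each variable is not stated in the table, so the fragment shows only the entries themselves; the quotes matter because Kubernetes env values must be strings.

```yaml
# Hypothetical env entries; in the chart they may sit on different containers.
- name: YACI_SPRING_PROFILES
  value: "postgres,n2c-socat"          # Compose: postgres,n2c-socket
- name: BLOCK_TRANSACTION_API_TIMEOUT_SECS
  value: "120"                         # Compose: 5
```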

Monitoring Stack

The optional monitoring subchart deploys:

  • Prometheus — scrapes cardano-node (port 12798), yaci-indexer (/actuator/prometheus),
    rosetta-api (/actuator/prometheus), postgres-exporter, and node-exporter
  • Grafana — pre-provisioned dashboards for Rosetta API metrics and server health
  • postgres-exporter — PostgreSQL stats including WAL and system metrics
    (pg_monitor role granted automatically)
  • node-exporter — host-level CPU/RAM/disk metrics via DaemonSet

Enable with monitoring.enabled: true (default). Access via kubectl port-forward.
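The scrape targets listed above map onto a Prometheus configuration along these lines. Service names and the yaci-indexer and rosetta-api ports are assumptions; 12798 comes from the list above, and 9187 and 9100 are the upstream default ports of postgres_exporter and node_exporter.

```yaml
# Hypothetical prometheus.yml fragment.
scrape_configs:
  - job_name: cardano-node
    static_configs:
      - targets: ["my-release-cardano-node:12798"]
  - job_name: yaci-indexer
    metrics_path: /actuator/prometheus              # Spring Boot Actuator endpoint
    static_configs:
      - targets: ["my-release-yaci-indexer:9095"]   # port assumed
  - job_name: rosetta-api
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["my-release-rosetta-api:8082"]    # port assumed
  - job_name: postgres-exporter
    static_configs:
      - targets: ["my-release-postgres-exporter:9187"]
  - job_name: node-exporter
    # A single Service target reaches one DaemonSet pod at a time; on multi-node
    # clusters, kubernetes_sd_configs is the usual way to scrape every node.
    static_configs:
      - targets: ["my-release-node-exporter:9100"]
```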

Files Changed

New: Helm chart

| Path | Description |
|---|---|
| `helm/cardano-rosetta-java/Chart.yaml` | Umbrella chart definition, 5 subchart dependencies |
| `helm/cardano-rosetta-java/values.yaml` | Mainnet defaults, hardware profiles, storage config |
| `helm/cardano-rosetta-java/values-preprod.yaml` | Preprod network overrides |
| `helm/cardano-rosetta-java/values-k3s.yaml` | Single-host K3s overrides (hostPath storage) |
| `helm/cardano-rosetta-java/templates/` | 12 templates: namespace, RBAC, Mithril Job, index-applier, monitoring configmaps, Helm test |
| `helm/cardano-rosetta-java/charts/cardano-node/` | StatefulSet, Services, PVC |
| `helm/cardano-rosetta-java/charts/postgresql/` | StatefulSet, Services, PVC, Secret |
| `helm/cardano-rosetta-java/charts/yaci-indexer/` | Deployment, Service |
| `helm/cardano-rosetta-java/charts/rosetta-api/` | Deployment, Service, Ingress |
| `helm/cardano-rosetta-java/charts/monitoring/` | Prometheus, Grafana, postgres-exporter, node-exporter |
| `helm/cardano-rosetta-java/files/` | db-indexes.yaml, Grafana dashboard JSONs, k6 load test scripts |
| `helm/cardano-rosetta-java/.helmignore` | Excludes docs, tests from packaged chart |

New: Documentation

| Path | Description |
|---|---|
| `docs/docs/install-and-deploy/kubernetes/overview.md` | Architecture, startup chain, hardware profiles, prerequisites |
| `docs/docs/install-and-deploy/kubernetes/deployment.md` | Step-by-step runbook: preprod and mainnet deploy, all operational phases, common operations, troubleshooting, disaster recovery |
| `docs/docs/install-and-deploy/kubernetes/helm-values.md` | Full values reference with Docker Compose equivalents for every configurable parameter |
| `docs/docs/install-and-deploy/kubernetes/_category_.json` | Docusaurus sidebar category config |

Modified

| Path | Description |
|---|---|
| `.gitignore` | Added `helm/**/charts/*.tgz` and `helm/**/Chart.lock` to prevent build artifacts from being committed |
| `docker/dockerfiles/postgres/entrypoint.sh` | Minor fix required for K8s compatibility |
| `docs/docs/development/monitoring_setup_guide.md` | Updated to reference K8s Grafana dashboards |

@Sotatek-DucPhung Sotatek-DucPhung marked this pull request as draft February 24, 2026 10:32
@github-actions
Contributor

💥 Preprod Tests: DEPLOYMENT FAILED

🔗 Action Run #214

Tests run against preprod network with live blockchain data

@Sotatek-DucPhung Sotatek-DucPhung changed the base branch from main to develop February 24, 2026 10:38
@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #214

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #215

Tests run against preprod network with live blockchain data

- Added `global.configHostPath` to specify host directory for network-specific config files.
- Updated deployment documentation to pre-create the namespace and use `--no-hooks` during Helm upgrades.
- Removed unnecessary config map template for node configuration files.
- Deleted outdated genesis and configuration files from Helm chart.
@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #216

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #217

Tests run against preprod network with live blockchain data

@github-actions
Contributor

❌ Preprod Tests: FAILED

📊 View Detailed Test Report

🔗 Action Run #218

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #219

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #220

Tests run against preprod network with live blockchain data

@linconvidal linconvidal moved this from Backlog to In progress in Rosetta Java Kanban Feb 26, 2026
@linconvidal linconvidal added this to the 2.2.0 milestone Feb 26, 2026
Sotatek-DucPhung and others added 5 commits March 12, 2026 17:58
- activeDeadlineSeconds: 86400 (24h) -> 259200 (72h) for index-applier Job
- cardano-node startupProbe failureThreshold default: 720 (3h) -> 1920 (8h)

The 24h deadline caused the Job controller to terminate the index-applier pod before sync completed. The 3h startup probe was insufficient for ImmutableDB replay on mainnet.
…#716)

## Summary

- Index-applier Job `activeDeadlineSeconds`: 24h -> 72h
- Cardano-node startup probe `failureThreshold` default: 720 (3h) ->
1920 (8h)

The 24h deadline caused the index-applier pod to be terminated by the
Job controller before sync completed. The 3h startup probe was
insufficient for ImmutableDB replay on mainnet.
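Both thresholds can be read as simple arithmetic. The startup probe numbers imply a 15-second period (720 × 15 s = 3 h, 1920 × 15 s = 8 h), and 259200 s = 72 h. A sketch under that assumption; the probe handler and socket path are illustrative, since the actual check is not shown in this PR.

```yaml
# Hypothetical cardano-node container fragment.
startupProbe:
  exec:
    command: ["test", "-S", "/ipc/node.socket"]   # illustrative check: socket exists
  periodSeconds: 15         # inferred from 720 failures = 3 h
  failureThreshold: 1920    # 1920 × 15 s = 28,800 s = 8 h before the kubelet restarts the container
---
# Hypothetical index-applier Job spec fragment.
activeDeadlineSeconds: 259200   # 72 h; when exceeded, the Job fails with reason DeadlineExceeded
```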
@github-actions
Contributor

❌ Preprod Tests: FAILED

📊 View Detailed Test Report

🔗 Action Run #242

Tests run against preprod network with live blockchain data

@github-actions
Contributor

❌ Preprod Tests: FAILED

📊 View Detailed Test Report

🔗 Action Run #243

Tests run against preprod network with live blockchain data


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ printf "%s-rosetta-api" .Release.Name }}

name: {{ include "rosetta-api.fullname" . }}

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ printf "%s-load-tests" .Release.Name }}

Ensure the name stays within the 63-character limit.

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ printf "%s-index-applier" .Release.Name }}

Ensure the name stays within the 63-character limit.


I think it would be best to remove this file completely to avoid any confusion.

    component: mithril
spec:
  restartPolicy: OnFailure
  serviceAccountName: {{ include "cardano-rosetta-java.saName" . }}

I don't think that the job needs a ServiceAccount since it is not accessing the Kubernetes API.

Comment thread helm/cardano-rosetta-java/charts/postgresql/templates/secret.yaml
Comment thread helm/cardano-rosetta-java/values.yaml Outdated
## Global configuration shared across all subcharts
## -----------------------------------------------------------------------
global:
  namespace: cardano

Not used anymore, and we don't want to predefine a namespace.

- name: node-data
  persistentVolumeClaim:
    claimName: {{ include "cardano-node.pvcName" . }}
- name: node-config

Node config needs to be a ConfigMap; hostPath cannot be used reliably in multi-node setups.


volumes:
  - name: node-config
    hostPath:

Node config needs to be a ConfigMap; hostPath cannot be used reliably in multi-node setups.
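One way to follow this suggestion is to render the node configuration and genesis files into a ConfigMap with Helm's `.Files.Glob(...).AsConfig` and mount it in place of the hostPath volume. The `files/<network>/` layout and the ConfigMap name below are hypothetical; note that a ConfigMap's total data is capped at 1 MiB, so the combined files must fit.

```yaml
# Hypothetical template, e.g. templates/configmap-node-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ printf "%s-node-config" .Release.Name | trunc 63 | trimSuffix "-" }}
data:
{{ (.Files.Glob (printf "files/%s/*" .Values.global.network)).AsConfig | indent 2 }}
# The pod would then mount it instead of hostPath:
#   volumes:
#     - name: node-config
#       configMap:
#         name: <release>-node-config
```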

@github-actions
Contributor

💥 Preprod Tests: DEPLOYMENT FAILED

🔗 Action Run #243

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #243

Tests run against preprod network with live blockchain data

…emplates

Use include calls (with trunc 63) instead of inline printf patterns to
avoid code duplication and prevent failures when release names are long.
Added missing helpers: rosettaApiName, nodeDataPvcName, pgDataPvcName,
roleName, rolebindingName, testConnectionName.
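The helper pattern can be sketched as below. The helper name `cardano-rosetta-java.indexApplierName` is hypothetical (the commit lists helpers such as rosettaApiName and roleName, but their definitions are not shown); `trunc` and `trimSuffix` are standard Sprig functions available in Helm templates. The Job spec is omitted.

```yaml
{{- /* Hypothetical helper; the chart's real helpers live in _helpers.tpl. */ -}}
{{- define "cardano-rosetta-java.indexApplierName" -}}
{{- printf "%s-index-applier" .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
apiVersion: batch/v1
kind: Job
metadata:
  # 63 is the DNS label length limit; trimSuffix drops a dangling "-" left by truncation.
  name: {{ include "cardano-rosetta-java.indexApplierName" . }}
```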
Replace standalone Mithril Job + wait-for-mithril K8s API polling with a
direct mithril-download init container on cardano-node. This eliminates
the need for ServiceAccount, Role, and RoleBinding since no pod accesses
the Kubernetes API anymore.
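A sketch of the init-container approach, under stated assumptions: the image reference and the download wrapper script are hypothetical, and the guard checks for the `immutable` directory that a cardano-node database contains, so pod restarts skip the download when chain data already exists on the volume. Nothing here calls the Kubernetes API, which is why no ServiceAccount or RBAC is involved.

```yaml
# Hypothetical fragment of the cardano-node pod spec.
initContainers:
  - name: mithril-download
    image: example.org/mithril-client:2543.1-hotfix   # placeholder image reference
    command:
      - sh
      - -c
      - |
        # Skip the download when the volume already holds a chain database.
        if [ -d /data/db/immutable ]; then
          echo "existing chain data found, skipping Mithril download"
          exit 0
        fi
        /scripts/mithril-download.sh /data/db   # hypothetical wrapper script
    volumeMounts:
      - name: node-data
        mountPath: /data
```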
Replace standalone Helm-managed PVCs with native Kubernetes
volumeClaimTemplates on both cardano-node and postgresql StatefulSets.
This lets K8s manage PVC lifecycle and eliminates the need for
helm.sh/resource-policy annotations. Also removed unused
global.namespace from values.yaml.
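A minimal sketch of the volumeClaimTemplates pattern for postgresql. By default Kubernetes does not delete PVCs created from a StatefulSet's volumeClaimTemplates when the StatefulSet is deleted, and those PVCs are not Helm-managed objects, so `helm uninstall` leaves them in place without any annotation. Names, image, and storage size are assumptions.

```yaml
# Hypothetical postgresql StatefulSet (trimmed to the storage-relevant parts).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-release-postgresql
spec:
  serviceName: my-release-postgresql
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
        - name: postgresql
          image: postgres:16        # placeholder; a real deployment also needs credentials
          env:
            - name: PGDATA          # subdirectory avoids lost+found on fresh volumes
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgresql-data
              mountPath: /var/lib/postgresql/data
  # The controller creates one PVC per replica, named postgresql-data-<pod-name>,
  # and keeps it when the StatefulSet is deleted.
  volumeClaimTemplates:
    - metadata:
        name: postgresql-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 500Gi          # assumption
```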
@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #244

Tests run against preprod network with live blockchain data

@github-actions
Contributor

✅ Preprod Tests: PASSED

📊 View Detailed Test Report

🔗 Action Run #245

Tests run against preprod network with live blockchain data

@flowftw flowftw left a comment

LGTM ✨ Thanks!

@matiwinnetou matiwinnetou changed the title Feat/add helm charts kubernetes feat: add helm charts kubernetes Mar 24, 2026
@matiwinnetou matiwinnetou merged commit f1f76bb into develop Mar 24, 2026
3 checks passed
@github-project-automation github-project-automation Bot moved this from In review to QA (next release) in Rosetta Java Kanban Mar 24, 2026
@matiwinnetou matiwinnetou deleted the feat/add-helm-charts-kubernetes branch March 24, 2026 13:47
@linconvidal linconvidal moved this from QA (next release) to Done in Rosetta Java Kanban Apr 10, 2026

Projects

Status: Done

4 participants