mhjacks/ramendr-analysis

Towards Ramen as a More Appealing Kubernetes Community Standard

Rationale

Ramen is currently the premier community Disaster Recovery orchestrator for Kubernetes. It is, however, clearly slanted towards the Red Hat/IBM "ecosystem" within Kubernetes, with significant dependencies on both Ceph/ODF and OCM/RHACM. We believe the whole Kubernetes community, especially the parts that do not run OpenShift and Red Hat Advanced Cluster Management, would benefit from standards for Disaster Recovery, and Ramen is the current incumbent in that space. At the same time, other orchestrators can make use of a standardized set of replication objects and compete with Ramen in this space as desired. The landscape is especially challenging for storage array vendors: their products feature sophisticated replication capabilities, but Kubernetes currently lacks the standardized mechanisms needed to use them. One of the main goals of this effort is to create standard API objects that storage array vendors can provide in their drivers; Ramen can then operate on these objects directly.

Goals

We propose to proceed in two phases. The first phase introduces abstractions into Ramen that allow an alternative path besides OCM. Collectively, these abstractions are the "Inter-Cluster State API" referred to in this diagram. With them in place, anyone with sufficient motivation can implement an Object Transport System as an alternative to OCM.

Meanwhile, the Inter-Cluster State API will be contributed to Ramen, and the CSI Replication Add-Ons (aka CSI Repl API) will be proposed and standardized upstream in Kubernetes.

Phase 1 - Create API for OCM Abstraction

Phase 2 - Motivated 3rd Party Implements API for OCM Abstraction

  1. Standardize Ramen's CSI Add-ons as part of the Kubernetes CSI specification. Today, Kubernetes has no standardized way of describing replication status between clusters. A prerequisite for a standardized orchestrator is a vendor-independent way of operating on storage objects, which Ramen has defined. As of February 2026, this effort is underway in the Kubernetes Storage SIG and Data Protection Working Group. Previous attempts to standardize these add-ons failed because there was no quorum or agreement among storage solution vendors and implementors; this proposal has in part been prompted by a higher level of interest in standardization from different storage solution providers.

  2. Weaken (but do not eliminate) Ramen's dependency on ACM/OCM for orchestration. Clearly a significant part of the Kubernetes community (especially those who use ACM/OCM today) is well served by ACM-style orchestration, which Ramen currently uses. Other customers and users have said they do not want to be required to use ACM, but want to be able to use it if needed. This has significant architectural ramifications for the project, which will be discussed below. This proposal can and should be characterized as "ACM First, but not ACM only." The work consists of documenting Ramen's existing ACM/OCM dependencies and providing concrete advice and a proposal on how to create space for an alternative orchestration implementation to control Ramen and the DR process.

  3. Strengthen Ramen Upstream. We believe that creating this architectural space within the Ramen project will make Ramen more appealing to a broader audience, especially storage solution providers, and in particular any who initially evaluated Ramen and rejected it for being too Red Hat or IBM-specific. Historically, the Ramen technology stack has focused on Red Hat and OpenShift-centric technologies (specifically ACM and ODF), though the Ramen upstream tests use OCM as opposed to ACM, minikube as opposed to OpenShift, and MinIO as opposed to Ceph. Still, without a path to standardization of the CSI add-ons (which would show other users a path to adopting the orchestration part of Ramen), and without any expressed interest from the project in accepting work towards making the ACM dependency optional, potential contributors might have considered other options before trying to contribute to Ramen.
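To make the first goal more concrete, here is a minimal sketch of what a vendor-independent replication contract could look like from an orchestrator's point of view. Everything in it is hypothetical and chosen only for illustration: the names `ReplicationState`, `VolumeReplicator`, `InMemoryReplicator`, and `fail_over` are not taken from Ramen, the CSI Add-ons, or any Kubernetes API, and the in-memory class merely stands in for a real storage driver.

```python
from abc import ABC, abstractmethod
from enum import Enum


class ReplicationState(Enum):
    """Illustrative replication roles; not an actual API enumeration."""
    PRIMARY = "primary"
    SECONDARY = "secondary"


class VolumeReplicator(ABC):
    """Hypothetical vendor-neutral contract a storage driver would implement."""

    @abstractmethod
    def set_state(self, volume: str, state: ReplicationState) -> None:
        """Request that a volume take on the given replication role."""

    @abstractmethod
    def get_state(self, volume: str) -> ReplicationState:
        """Report the replication role a volume currently has."""


class InMemoryReplicator(VolumeReplicator):
    """Stand-in for a vendor driver; keeps state in a dictionary."""

    def __init__(self):
        self._states = {}

    def set_state(self, volume, state):
        self._states[volume] = state

    def get_state(self, volume):
        # Assumption for this sketch: unknown volumes report as secondary.
        return self._states.get(volume, ReplicationState.SECONDARY)


def fail_over(volume, old_primary, new_primary):
    """Orchestrator-side logic that touches only the neutral interface.

    A real failover must also cope with an unreachable old primary;
    this sketch ignores that case.
    """
    old_primary.set_state(volume, ReplicationState.SECONDARY)
    new_primary.set_state(volume, ReplicationState.PRIMARY)
```

The point of the sketch is that `fail_over` never needs to know which vendor sits behind each `VolumeReplicator`, which is the property a standardized set of replication objects is meant to provide.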

Non-Goals

  1. GitOps integration/Subscription applications are out of scope. Ramen's current GitOps/Subscription Applications feature depends on some very specific integrations OCM has with ArgoCD/OpenShift GitOps. As a practical matter, anecdotes suggest that this is not the way most users are doing Disaster Recovery, especially with Virtual Machines.

  2. UI/UX Presentation is out of scope. It is absolutely the intention of this proposal to provide all the plumbing and status reporting necessary to implement a UI/UX for Disaster Recovery. However, the framework or location for that UI/UX is out of scope of this proposal. Ramen's current UI/UX layer is implemented as an ACM extension, which runs in the OpenShift Console.

  3. Secure transport of secrets is out of scope. This proposal will not specify how to transport secrets; either how to retrieve them from managed clusters or how to create or manage them on DR-protected clusters.

  4. Management of ManagedClusterAddOns is out of scope. Ramen can install and configure plugins like volsync for OADP/Velero. This depends on ManagedClusterAddOns, an ACM/OCM feature. As such, users of Ramen without ACM/OCM will need to orchestrate the installation of those components another way.

  5. Determination of where to run the control plane functions for an alternative orchestrator is out of scope. Today, ACM/OCM requires a hub separate from the managed DR clusters to run the control plane. It is up to the alternative orchestrator (referred to as Object Transport System in these documents) to decide where to run the control plane functions.

The proposal contains more details, including specific APIs. Abstraction APIs will be provided that wrap the functionality of critical ACM/OCM APIs (mainly ManifestWork and ManagedClusterView). Most interaction with RamenDR itself will be done through the DisasterRecoveryPlacementControl object as it currently works. Any interested party can then provide (outside of ACM/OCM) functionality to securely transport configuration and secrets between clusters.
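As a rough illustration of the abstraction described above, the sketch below models the two capabilities that would need wrapping: delivering an object to a cluster (the role ManifestWork plays today) and reading an object back from a cluster (the role ManagedClusterView plays today). The names `ObjectTransport`, `InMemoryTransport`, `apply`, and `view` are hypothetical and are not the APIs defined in the proposal; the in-memory implementation only shows that callers depend on the interface rather than on OCM.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ObjectTransport(ABC):
    """Hypothetical contract covering the two wrapped capabilities."""

    @abstractmethod
    def apply(self, cluster: str, name: str, manifest: dict) -> None:
        """Deliver a manifest to a cluster (ManifestWork-like role)."""

    @abstractmethod
    def view(self, cluster: str, name: str) -> Optional[dict]:
        """Read an object from a cluster (ManagedClusterView-like role)."""


class InMemoryTransport(ObjectTransport):
    """Toy backend; a real one would talk to OCM or an alternative system."""

    def __init__(self):
        # Maps cluster name -> {object name -> manifest}.
        self._clusters = {}

    def apply(self, cluster, name, manifest):
        # Store a copy so later changes by the caller do not leak in.
        self._clusters.setdefault(cluster, {})[name] = dict(manifest)

    def view(self, cluster, name):
        obj = self._clusters.get(cluster, {}).get(name)
        return dict(obj) if obj is not None else None
```

A caller would use it along these lines:

```python
transport = InMemoryTransport()
transport.apply("east", "vrg-app1", {"kind": "VolumeReplicationGroup"})
print(transport.view("east", "vrg-app1"))
print(transport.view("west", "vrg-app1"))
```

An OCM-backed implementation and a third-party Object Transport System would both satisfy the same interface, which is the sense in which the proposal is "ACM First, but not ACM only."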

Supporting Documents and Diagrams

These documents were prepared with the assistance of Cursor. The diagrams are in Mermaid format; Graphviz "dot" files are provided as well.

This document describes the APIs that Ramen itself provides.

This document describes how Ramen interacts with the CSI Replication Addons.

Architecture and Workflows GraphViz dot files

This document describes Ramen's failover and state machine operations in detail.

Diagrams GraphViz dot files

These documents list the actual dependencies on ACM/OCM in Ramen.

Analysis Diagrams Isolation "Straw Man" Proposal GraphViz dot files
