48 changes: 48 additions & 0 deletions content/blog/lfx-chaos-testing-yash-agarwal/index.md
@@ -0,0 +1,48 @@
---
title: "Chaos testing the CloudNativePG project"
date: 2025-12-16
draft: false
image:
url: yash.jpeg
attribution:
authors:
- fdrees
tags:
- lfx
- mentorship
- kubernetes
- postgresql
- litmus
- devops
summary: "Meet the mentee: Yash Agarwal worked with the project maintainers on adding chaos testing to CloudNativePG, as part of the LFX mentorship program."
---

In the summer, we wrote about how CloudNativePG was back for the September-October-November LFX term with [several projects for mentoring](https://cloudnative-pg.io/blog/2025-term3-lfx-cncf-mentorship/). One of them focused on chaos testing.

Yash Agarwal worked with mentors and CloudNativePG maintainers Gabriele Bartolini, Marco Nenciarini, Francesco Canovai, and Jonathan Gonzalez to enhance the project's test coverage. Introducing LitmusChaos, a comprehensive chaos testing framework, the team designed automated chaos experiments for common failure scenarios, integrated them into CI/CD workflows, and collected observability metrics like failover time and data consistency. I had a chat with Yash about his work and about how he got into tech in the first place.

## Start at the beginning

Yash's venture into programming started when he was introduced to Python in 11th grade. He was always fascinated by technology, and whenever he met his cousin Amit (now a software developer as well), Yash asked him a lot of questions "about everything".

Review comment: Instead of this, you could write something like "I was inspired by my cousin brother Amit, who is a software developer."


Today Yash is a full stack developer intern at Seeqlo, where he, among other things, focuses on streamlining cloud operations and optimizing performance. Based in Bengaluru, India, Yash is a member of Point Blank, a student-run tech community dedicated to learning together.

He looks back on working with the CloudNativePG team as a "great learning experience". They met twice a week for 30 minutes to discuss the progress of the project. One thing that Yash says he learned from Jonathan is to have more patience. When he was ready to give up on gaining access to the LitmusChaos Slack workspace, Jonathan walked him through the process.

Review comment: I think this can be removed; it was not like that, but he was very patient and helpful throughout the three months.


## Chaos testing

The new [chaos-testing repository](https://github.com/cloudnative-pg/chaos-testing) Yash worked on provides automated tools to validate PostgreSQL cluster resilience under failure conditions. It combines two testing approaches:

* Jepsen Consistency Testing - Uses the famous Jepsen framework to perform mathematical proofs of database consistency. It continuously runs database operations (50 ops/sec) and validates that no data is lost or corrupted during failures.
* LitmusChaos Fault Injection - Uses LitmusChaos to simulate real-world failures by repeatedly deleting the PostgreSQL primary pod (every 60-180 seconds), forcing CloudNativePG to perform automatic failover.
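
To make the idea behind the consistency check concrete, here is a minimal sketch in Python. It is not the Jepsen checker and not code from the chaos-testing repository: the function name `check_history` and the sample histories are hypothetical, and the only assumption is that a client records every write the database acknowledged and then reads everything back once the fault injection has finished.

```python
from collections import Counter


def check_history(acknowledged, final_read):
    """Compare acknowledged writes with the rows read back after the test.

    Returns (lost, duplicated): values the client was told were committed
    but are missing, and values that appear more than once.
    """
    counts = Counter(final_read)
    lost = sorted(set(acknowledged) - set(counts))
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    return lost, duplicated


# Hypothetical histories: the values are made up for illustration.
healthy = check_history([1, 2, 3, 4], [1, 2, 3, 4])
broken = check_history([1, 2, 3, 4], [1, 3, 3, 4])
print(healthy)
print(broken)
```

A run in which every acknowledged write survives the failovers yields two empty lists; anything else points at lost or duplicated data that would need investigating.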

You can read more about the project in the repository's [README](https://github.com/cloudnative-pg/chaos-testing/blob/main/README.md). And, in case you're curious, here's Yash's PR: https://github.com/cloudnative-pg/chaos-testing/pull/3


## Contributing to Litmus itself

Yash couldn't work out how to get the chaos engine to target the primary pods, since the appKind CloudNativePG uses isn't natively supported by Litmus. "I tried many things, but when I tried appKind as 'Cluster' with a capital C, it worked! I read the Litmus code and found that there were some validations which prevented 'cluster' from working. This behavior was not described in Litmus' documentation, which meant I could submit a PR and prevent the next person from running into the same issue!"

Review comment: It was the other way around; the lowercase "c" worked.


## What's next?

Now in the second half of his third year, Yash is exploring opportunities in backend development and DevOps. "I will surely try to contribute more towards CloudNativePG when time permits!" You can follow Yash's work on [GitHub](https://github.com/XploY04).