Fix audit log writing errors for rollover-enabled alias indices by pCastq · Pull Request #5900 · opensearch-project/security

pCastq · 2026-01-12T00:51:05Z

Description

[Bug fix, Enhancement, Test fix]

The audit log cannot be written to an index that already exists when an alias with a rollover policy is used. This causes errors and prevents audit events from being stored.

Old behavior:
When an audit log index already exists under an alias, OpenSearch throws an error and does not insert audit events.
New behavior:
The audit log now detects if the target index already exists and inserts new events into it instead of failing.
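
The behavior change described above can be pictured with a minimal sketch. This is not the code from this PR: `AuditTargetSketch`, `ClusterView`, `WriteAction`, and `decide` are hypothetical names, the index and alias names are invented, and the real sink works against OpenSearch cluster metadata rather than two in-memory sets.

```java
import java.util.Set;

/** Hypothetical sketch of the target check; not the plugin's actual classes. */
final class AuditTargetSketch {

    /** What the sink should do before indexing an audit event. */
    enum WriteAction { CREATE_INDEX_THEN_WRITE, WRITE_TO_EXISTING_INDEX, WRITE_THROUGH_ALIAS }

    /** Stand-in for cluster metadata: the names of known indices and aliases. */
    static final class ClusterView {
        private final Set<String> indices;
        private final Set<String> aliases;

        ClusterView(Set<String> indices, Set<String> aliases) {
            this.indices = indices;
            this.aliases = aliases;
        }

        boolean hasIndex(String name) { return indices.contains(name); }

        boolean hasAlias(String name) { return aliases.contains(name); }
    }

    static WriteAction decide(ClusterView view, String target) {
        if (view.hasAlias(target)) {
            // Rollover case: the alias already resolves to an index, so write
            // through it instead of trying to create an index under that name.
            return WriteAction.WRITE_THROUGH_ALIAS;
        }
        if (view.hasIndex(target)) {
            return WriteAction.WRITE_TO_EXISTING_INDEX;
        }
        // Nothing exists yet under this name: create the index, then write.
        return WriteAction.CREATE_INDEX_THEN_WRITE;
    }

    public static void main(String[] args) {
        // Invented names, used only for illustration.
        ClusterView view = new ClusterView(Set.of("audit-000001"), Set.of("audit-write"));
        System.out.println(decide(view, "audit-write"));
        System.out.println(decide(view, "audit-000001"));
        System.out.println(decide(view, "audit-2026.01.12"));
    }
}
```

The point of the sketch is only the order of the checks: an existing alias or index leads to a write, and creation happens only when neither exists.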

Issues Resolved

[#5878 + integration test]

Testing

Integration tests confirm that rollover policies are respected and that no errors occur when writing to an existing index.

Check List

New functionality includes testing
New functionality has been documented
New Roles/Permissions have a corresponding security dashboards plugin PR
API changes companion pull request created
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

pCastq · 2026-01-12T02:24:26Z

The integration test part was too hard.
(I spent three days racking my brains trying to figure out the right parts to use.)

We need separate test classes because they test distinct code paths that require different cluster configurations.
InternalOpenSearchSinkIntegrationTest -> validates the default date-based index creation (metadata.hasIndex branch), while
InternalOpenSearchSinkIntegrationTest_AuditAlias -> validates write alias support (metadata.hasAlias branch).
These configurations are mutually exclusive and cannot coexist in a single @ClassRule cluster.

Cleaning indices between tests would require cluster restarts or expensive index deletion, significantly slowing CI/CD pipelines. Instead, we use delta-based assertions (before/after counts) that ensure test isolation without cleanup overhead.
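
As a rough illustration of the delta-based assertion idea, here is a minimal sketch. It is not the test code from this PR: the shared list is an in-memory stand-in for the audit index, and `countAuditDocs`, `generateAuditEvent`, and `deltaAfter` are simplified, hypothetical helpers that do not talk to a cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of before/after counting; no cluster involved. */
final class DeltaAssertionSketch {

    /** Stand-in for an audit index shared by every test in the class. */
    static final List<String> SHARED_AUDIT_STORE = new ArrayList<>();

    static long countAuditDocs() {
        return SHARED_AUDIT_STORE.size();
    }

    static void generateAuditEvent(String path) {
        SHARED_AUDIT_STORE.add(path);
    }

    /** Returns how many documents the action added, regardless of earlier leftovers. */
    static long deltaAfter(Runnable action) {
        long before = countAuditDocs();
        action.run();
        return countAuditDocs() - before;
    }

    public static void main(String[] args) {
        // Leftover from an "earlier test" that was never cleaned up.
        generateAuditEvent("_cluster/settings");

        long delta = deltaAfter(() -> generateAuditEvent("_cluster/health"));
        if (delta != 1) {
            throw new AssertionError("expected exactly one new audit document, got " + delta);
        }
        System.out.println("new documents: " + delta);
    }
}
```

Because the assertion is on the difference rather than the absolute count, documents left behind by earlier tests do not affect the result.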

There are also other things that need a careful look through the objects involved.
For example, when a LocalCluster is configured through the public Builder audit(AuditConfiguration auditConfiguration) method, TestRuleAuditLogSink is always used, along with some preset or fixed settings...

There are a lot of comments and Javadoc.

nibix

Thank you for this. I left a couple of comments.

Could you also please:

Check the CI errors regarding code hygiene? Maybe it is just a missing spotlessApply
Write a changelog entry
Change the name of this PR to something more descriptive

Thank you :)

nibix · 2026-01-14T14:14:49Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

+     */
+    @ClassRule
+    public static final LocalCluster cluster = new LocalCluster.Builder().clusterManager(ClusterManager.SINGLENODE)
+        .anonymousAuth(true)


do we need that here?

I am still not quite sure if we need anonymous auth?

pCastq · 2026-01-16T00:38:12Z

I intentionally did not unify these tests with InternalAuditLogTest because they validate different responsibilities and operate under fundamentally different assumptions.

InternalAuditLogTest is a cluster-level smoke test: it verifies that, in a fully secured single-node cluster, the audit index is created, shards are allocated, and the index reaches green health. Its purpose is infrastructure readiness, not application logic.

The tests in this PR instead focus on InternalOpenSearchSink behavior: index creation when absent, alias detection, and write routing for date-based vs alias-based configurations. They explicitly cover sink-level code paths that InternalAuditLogTest does not exercise at all.

Unifying the tests would also force a much heavier configuration (admin users, HTTP Basic auth, compliance logging, transport events). This would introduce unnecessary overhead, extra audit noise, and additional failure modes, without increasing coverage for the sink logic being tested.

Keeping these tests separate preserves clarity, focus, and maintainability. Each test suite remains aligned with its specific goal, and failures remain easy to diagnose. Combining them would add complexity without providing real value, in my opinion.

However, as @nibix suggested, I now use Awaitility instead of Thread.sleep(). Because audit events are generated asynchronously, this lets the test wait until a meaningful functional condition is met (i.e. a new audit event becomes visible) and fail deterministically after a bounded timeout.

This approach makes the test more robust, avoids flakiness in CI, and ensures that we only proceed once the audit data has actually been indexed and refreshed.
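
The difference between a fixed sleep and waiting on a condition can be sketched with plain Java. This is a standard-library stand-in written for illustration, not Awaitility and not the test code from this PR; `BoundedPollingSketch` and `waitUntil` are hypothetical names, and the counter simulates an audit document becoming visible asynchronously.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Hypothetical stdlib sketch of bounded polling; not Awaitility itself. */
final class BoundedPollingSketch {

    /** Polls until the condition holds or the timeout elapses; returns whether it held. */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // proceed as soon as the condition is met
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // bounded: give up once the timeout has passed
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong auditDocs = new AtomicLong();
        long before = auditDocs.get();

        // Simulates an audit event that is indexed some time after the request.
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            auditDocs.incrementAndGet();
        });

        boolean seen = waitUntil(() -> auditDocs.get() > before, Duration.ofSeconds(2), Duration.ofMillis(10));
        if (!seen) {
            throw new AssertionError("audit event did not become visible in time");
        }
        System.out.println("audit event visible");
    }
}
```

A fixed sleep either wastes time when the event arrives early or fails when it arrives late; polling against a deadline avoids both.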

nibix · 2026-01-20T10:23:17Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

+            generateAuditEvent("_cluster/health");
+
+            await().atMost(3, SECONDS).pollInterval(100, MILLISECONDS).untilAsserted(() -> {
+                refreshAuditIndices(client); // refresh prima di verificare


Please prefer the use of English :-)

nibix · 2026-01-20T10:30:26Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

+     * and midnight rollover timing.</p>
+     */
+    @Test
+    public void testMultipleRequestTypesGenerateAuditEvents() {


Could you elaborate on why this is related to the OpenSearch sink? Isn't this behavior that occurs at a higher level?

nibix · 2026-01-20T10:34:33Z

...a/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTestAuditAlias.java

+     * test created a new event.</p>
+     */
+    @Test
+    public void testAuditDocumentsViaAliasContainMandatoryFields() {


Is this a test that covers the alias logic? It seems to me that it covers logic from a higher level.

pCastq · 2026-01-21T22:49:18Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

-     * previous tests. Uses delta assertion to ensure a new event was created.</p>
     */
    @Test
    public void testAuditDocumentContainsMandatoryFields() {


This test is intentionally at the integration level because it verifies the real effect of the original bug: before the fix, writing via an alias failed and no document was created.
Even though it doesn’t call metadata.hasAlias() directly, it confirms that alias writes behave like direct index writes, including all required fields, and it protects against future regressions, for example if the audit log writing logic is refactored.
Removing it would remove this critical check.

nibix · 2026-01-30T11:11:13Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

+
+            generateAuditEvent("_cluster/health");
+
+            await().atMost(10, SECONDS).pollInterval(100, MILLISECONDS).untilAsserted(() -> {


10 seconds is actually the default timeout constraint, so this can be dropped. Same holds for the poll interval.

nibix · 2026-01-30T11:12:40Z

...a/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTestAuditAlias.java

+            long before = countAuditDocs(client);
+            generateAuditEvent("_cluster/health");
+
+            await().atMost(3, SECONDS).pollInterval(100, MILLISECONDS).until(() -> countAuditDocs(client) > before);


Can we also use the default timeout and poll interval here?

nibix · 2026-01-30T11:18:10Z

...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java

+            assertThat("Missing mandatory field: audit_request_origin", auditDoc.containsKey("audit_request_origin"), is(true));
+            assertThat("Missing mandatory field: @timestamp", auditDoc.containsKey("@timestamp"), is(true));
+            assertThat("Missing REST field: audit_rest_request_method", auditDoc.containsKey("audit_rest_request_method"), is(true));
+            assertThat("Missing REST field: audit_rest_request_path", auditDoc.containsKey("audit_rest_request_path"), is(true));


The hamcrest way of doing such assertions would be

    assertThat(auditDoc, hasKey("@timestamp"));
    assertThat(auditDoc, hasKey("audit_rest_request_method"));
    assertThat(auditDoc, hasKey("audit_rest_request_path"));

This will yield much richer assertion errors in the case of test failures.

nibix · 2026-01-30T11:18:53Z

...a/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTestAuditAlias.java

+            new SearchRequest(AUDIT_ALIAS).source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0))
+        ).actionGet();
+
+        return Objects.requireNonNull(response.getHits().getTotalHits()).value();


This Objects.requireNonNull is redundant: with or without it, you will get an NPE if getTotalHits() returns null.
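
The redundancy can be checked with a small standalone sketch. `TotalHits` here is a hypothetical stand-in class written for this example, not the search library's type; only `Objects.requireNonNull` is the real JDK method.

```java
import java.util.Objects;

/** Hypothetical sketch: both variants fail with a NullPointerException on null input. */
final class RequireNonNullSketch {

    /** Stand-in for a hit-count holder; not the real search API type. */
    static final class TotalHits {
        private final long value;

        TotalHits(long value) {
            this.value = value;
        }

        long value() {
            return value;
        }
    }

    static long withGuard(TotalHits hits) {
        // requireNonNull throws NullPointerException when hits is null.
        return Objects.requireNonNull(hits).value();
    }

    static long withoutGuard(TotalHits hits) {
        // Dereferencing null throws NullPointerException as well.
        return hits.value();
    }

    static boolean throwsNpe(Runnable body) {
        try {
            body.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(() -> withGuard(null)));
        System.out.println(throwsNpe(() -> withoutGuard(null)));
        System.out.println(withGuard(new TotalHits(3)) == withoutGuard(new TotalHits(3)));
    }
}
```

An explicit guard would only add value if it carried a message explaining which value was unexpectedly null.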

nibix · 2026-01-30T11:22:09Z

Thank you, looks much better! I have added a few minor comments. Also there are still a few leftover comments from the previous round. Could you please have a look at these?

pCastq requested review from DarshitChanpura, RyanL1997, cwperks, derek-ho, nibix, reta, shikharj05 and willyborankin as code owners January 12, 2026 00:51

nibix reviewed Jan 14, 2026

pCastq and others added 5 commits January 16, 2026 00:07

Fix bug opensearch-project#5878

edeb979

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>

Clean up comments in InternalOpenSearchSinkIntegrationTest

ecad5b3

Removed unnecessary comments from audit log integration tests. Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>

Refactor comments in InternalOpenSearchSinkIntegrationTest

a78fee4

Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>

Fix audit log writing errors for rollover-enabled alias indices

06dc41e

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>

update changelog for bug fix opensearch-project#5878

537a974

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>

pCastq force-pushed the bug-#5878 branch from 85932bc to 537a974 on January 15, 2026 23:16

nibix changed the title from "Fix bug #5878" to "Fix audit log writing errors for rollover-enabled alias indices" Jan 20, 2026

nibix reviewed Jan 20, 2026

cwperks reviewed Jan 20, 2026

refactoring integration tests, removing redundant test cases

9e75f4e

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>

pCastq commented Jan 21, 2026

checkstyle refact

ac2cf33

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>

nibix reviewed Jan 30, 2026

3 participants
