Fix audit log writing errors for rollover-enabled alias indices#5900
Fix audit log writing errors for rollover-enabled alias indices#5900pCastq wants to merge 7 commits intoopensearch-project:mainfrom
Conversation
|
The part of integration test is to hard. We need separate test classes because they test distinct code paths that require different cluster configurations. Cleaning indices between tests would require cluster restarts or expensive index deletion, significantly slowing CI/CD pipelines. Instead, we use delta-based assertions (before/after counts) that ensure test isolation without cleanup overhead. And there are other stuff that need to look carefully throught the object There are a lot of comment and javadoc. |
nibix
left a comment
There was a problem hiding this comment.
Thank you for this. I left a couple of comments.
Could you also please:
- Check the CI errors regarding code hygiene? maybe it is just a missing spotlessApply
- Write a changelog entry
- Change the name of this PR to something more descriptive
Thank you :)
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
| */ | ||
| @ClassRule | ||
| public static final LocalCluster cluster = new LocalCluster.Builder().clusterManager(ClusterManager.SINGLENODE) | ||
| .anonymousAuth(true) |
There was a problem hiding this comment.
I am still not quite sure if we need anonymous auth?
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Show resolved
Hide resolved
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Removed unnecessary comments from audit log integration tests. Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>
Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
|
I intentionally did not unify these tests with InternalAuditLogTest because they validate different responsibilities and operate under fundamentally different assumptions. InternalAuditLogTest is a cluster-level smoke test: it verifies that, in a fully secured single-node cluster, the audit index is created, shards are allocated, and the index reaches green health. Its purpose is infrastructure readiness, not application logic. The tests in this PR instead focus on InternalOpenSearchSink behavior: index creation when absent, alias detection, and write routing for date-based vs alias-based configurations. They explicitly cover sink-level code paths that InternalAuditLogTest does not exercise at all. Unifying the tests would also force a much heavier configuration (admin users, HTTP Basic auth, compliance logging, transport events). This would introduce unnecessary overhead, extra audit noise, and additional failure modes, without increasing coverage for the sink logic being tested. Keeping these tests separate preserves clarity, focus, and maintainability. Each test suite remains aligned with its specific goal, and failures remain easy to diagnose. Combining them would add complexity without providing real value, in my opinion. However , as @nibix suggest, I use Awaitility instead of Thread.sleep() because audit events are generated asynchronously allows the test to wait until a meaningful functional condition is met (i.e. a new audit event becomes visible) and to fail deterministically after a bounded timeout. This approach makes the test more robust, avoids flakiness in CI, and ensures that we only proceed once the audit data has actually been indexed and refreshed. |
| generateAuditEvent("_cluster/health"); | ||
|
|
||
| await().atMost(3, SECONDS).pollInterval(100, MILLISECONDS).untilAsserted(() -> { | ||
| refreshAuditIndices(client); // refresh prima di verificare |
There was a problem hiding this comment.
si prega di preferire l'uso dell'inglese :-)
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
| * and midnight rollover timing.</p> | ||
| */ | ||
| @Test | ||
| public void testMultipleRequestTypesGenerateAuditEvents() { |
There was a problem hiding this comment.
Could you elaborate why this is something related to the OpenSearch sink? Isnt this a behavior that occurs at a higher level?
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Outdated
Show resolved
Hide resolved
| * test created a new event.</p> | ||
| */ | ||
| @Test | ||
| public void testAuditDocumentsViaAliasContainMandatoryFields() { |
There was a problem hiding this comment.
Is this a test which covers the alias logic? It seems to be this covers logic from a higher level.
...onTest/java/org/opensearch/security/auditlog/sink/InternalOpenSearchSinkIntegrationTest.java
Show resolved
Hide resolved
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
| * previous tests. Uses delta assertion to ensure a new event was created.</p> | ||
| */ | ||
| @Test | ||
| public void testAuditDocumentContainsMandatoryFields() { |
There was a problem hiding this comment.
This test is intentionally at the integration level because it verifies the real effect of the original bug: before the fix, writing via an alias failed and no document was created.
Even though it doesn’t call metadata.hasAlias() directly, it confirms that alias writes behave like direct index writes, including all required fields, and it protects against future regressions, for example if the audit log writing logic is refactored.
Removing it would remove this critical check.
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
|
|
||
| generateAuditEvent("_cluster/health"); | ||
|
|
||
| await().atMost(10, SECONDS).pollInterval(100, MILLISECONDS).untilAsserted(() -> { |
There was a problem hiding this comment.
10 seconds is actually the default timeout constraint, so this can be dropped. Same holds for the poll interval.
| long before = countAuditDocs(client); | ||
| generateAuditEvent("_cluster/health"); | ||
|
|
||
| await().atMost(3, SECONDS).pollInterval(100, MILLISECONDS).until(() -> countAuditDocs(client) > before); |
There was a problem hiding this comment.
can we also use default timeout and poll interval here?
| assertThat("Missing mandatory field: audit_request_origin", auditDoc.containsKey("audit_request_origin"), is(true)); | ||
| assertThat("Missing mandatory field: @timestamp", auditDoc.containsKey("@timestamp"), is(true)); | ||
| assertThat("Missing REST field: audit_rest_request_method", auditDoc.containsKey("audit_rest_request_method"), is(true)); | ||
| assertThat("Missing REST field: audit_rest_request_path", auditDoc.containsKey("audit_rest_request_path"), is(true)); |
There was a problem hiding this comment.
The hamcrest way of doing such assertions would be
assertThat(auditDoc, hasKey("@timestamp"));
assertThat(auditDoc, hasKey("audit_rest_request_method"));
assertThat(auditDoc, hasKey("audit_rest_request_path"));
This will yield much richer assertion errors in the case of test failures.
| new SearchRequest(AUDIT_ALIAS).source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(0)) | ||
| ).actionGet(); | ||
|
|
||
| return Objects.requireNonNull(response.getHits().getTotalHits()).value(); |
There was a problem hiding this comment.
This Objects.requireNonNull is redundant, with our without you will get a NPE in case getTotalHits() returns null.
|
Thank you, looks much better! I have added a few minor comments. Also there are still a few leftover comments from the previous round. Could you please have a look at these? |
Description
[Bug fix, Enhancement , Test fix]
the audit log cannot be written to an index that already exists when using an alias with a rollover policy. This causes errors and prevents audit events from being stored.
When an audit log index already exists under an alias, OpenSearch throws an error and does not insert audit events.
The audit log now detects if the target index already exists and inserts new events into it instead of failing.
Issues Resolved
[#5878 + integration test]
Testing
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.