
OBSDA-1383: Make Splunk output sourcetype configurable in CLF #3251

Open
marpears wants to merge 10 commits into openshift:master from marpears:splunk-sourcetype

Conversation

@marpears

@marpears marpears commented Apr 13, 2026

Description

This PR allows for configuration of the Splunk output source type in the CLF using a new sourceType field. This is so that we can support users who have defined custom source types in Splunk.

The sourceType field can be used only when payloadKey is defined, and allows for a templated value so that it can be retrieved from a pod label. If sourceType is not defined, the current logic is preserved where it defaults to _json. If using payloadKey without sourceType, the source type used will be either _json or generic_single_line, depending on the structure of the final event payload.

CLF configuration example:

  outputs:
    - name: splunk-app-team-1
      splunk:
        authentication:
          token:
            key: hecToken
            secretName: splunk-app-team-1
        sourceType: '{.kubernetes.labels.splunk/sourcetype||"generic_single_line"}'
        url: 'https://splunk.customer.com:8088'
      type: splunk
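The fallback in the templated value above amounts to a field lookup with a default. As a minimal sketch of those semantics only (the helper name and event layout are hypothetical, not operator code):

```python
# Illustrative sketch of the templated sourceType fallback semantics.
# resolve_sourcetype and the event layout are hypothetical, not operator code.
def resolve_sourcetype(record, default="generic_single_line"):
    labels = record.get("kubernetes", {}).get("labels", {})
    return labels.get("splunk/sourcetype") or default

event = {"kubernetes": {"labels": {"splunk/sourcetype": "my_app_logs"}}}
print(resolve_sourcetype(event))               # my_app_logs
print(resolve_sourcetype({"kubernetes": {}}))  # generic_single_line
```

A record carrying the `splunk/sourcetype` pod label resolves to that label's value; anything else falls back to the quoted default.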

/cc @Clee2691 @cahartma
/assign @jcantrill

Links

Summary by CodeRabbit

Release Notes

  • New Features

    • Added sourceType configuration option for Splunk outputs, allowing explicit specification of the Splunk HEC source type. Supports static values and dynamic templating from Kubernetes labels.
    • Added validation requiring sourceType to be set only when payloadKey is also defined.
  • Documentation

    • Enhanced Splunk forwarding documentation with sourceType examples, default behavior guidance, and best practices.

@jcantrill
Contributor

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
Contributor

@jcantrill jcantrill left a comment


There is no functional test to support this addition. Also note that we do not backport features to earlier releases.

Comment thread api/observability/v1/output_types.go Outdated
Contributor

@jcantrill jcantrill left a comment


Please add e2e validation tests and at least one functional test

Comment thread api/observability/v1/output_types.go
Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc Outdated
Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc
Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc
Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc Outdated
@marpears changed the title from "Make Splunk output sourcetype configurable in CLF" to "WIP - Make Splunk output sourcetype configurable in CLF" on Apr 17, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2026
@jcantrill
Contributor

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 20, 2026
@jcantrill changed the title from "WIP - Make Splunk output sourcetype configurable in CLF" to "OBSDA-1383: Make Splunk output sourcetype configurable in CLF" on Apr 20, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 20, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 20, 2026

@marpears: This pull request references OBSDA-1383 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.8.0" version, but no target version was set.

Details

In response to this:

Description

This PR allows for configuration of the Splunk output source type in the CLF using a new sourceType field. This is so that we can support users who have defined custom source types in Splunk.

If sourceType is not defined, then the current behavior is preserved where _json is the default and can be overridden if payloadKey is set based on the type of the final event payload.

CLF configuration example:

 outputs:
   - name: splunk-app-team-1
     splunk:
       authentication:
         token:
           key: hecToken
           secretName: splunk-app-team-1
       sourceType: 'my:custom:sourcetype'
       url: 'https://splunk.customer.com:8088'
     type: splunk

/cc @Clee2691 @cahartma
/assign @jcantrill

/cherrypick release-6.4
/cherrypick release-6.5

Links

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 27, 2026
@marpears
Author

/retest

Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc Outdated
Comment thread internal/generator/vector/output/splunk/splunk.go
Comment thread test/e2e/collection/apivalidations/api_validations_test.go
@jcantrill
Contributor

@marpears You will need to rebase and force push.

@marpears force-pushed the splunk-sourcetype branch 2 times, most recently from 1fc8d94 to b04ac06, on April 29, 2026 at 11:44
@vparfonov
Contributor

/retest

@jcantrill
Contributor

/label tide/squash-merge-method

@openshift-ci
Contributor

openshift-ci Bot commented Apr 29, 2026

@jcantrill: The label(s) /label tide/squash-merge-method cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, ux-approved, no-qe, rebase/manual, cluster-config-api-changed, run-integration-tests, verified, ready-for-human-review, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/skip-dependent-bug-check, jira/valid-bug, ok-to-test, stability-fix-approved, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

Details

In response to this:

/label tide/squash-merge-method

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jcantrill
Contributor

/label tide/merge-method-squash

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 29, 2026
Comment thread bundle/manifests/cluster-logging.clusterserviceversion.yaml Outdated
@marpears force-pushed the splunk-sourcetype branch from b04ac06 to c55ba44 on April 30, 2026 at 08:29
@jcantrill
Contributor

/approve

@openshift-ci
Contributor

openshift-ci Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, marpears

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026
@marpears force-pushed the splunk-sourcetype branch from c55ba44 to 48c7394 on May 5, 2026 at 13:51
@coderabbitai

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@marpears has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 8 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8b89a8d9-7a44-4163-8111-b86ab14e0d7a

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9d504 and 22bf7fe.

📒 Files selected for processing (1)
  • docs/features/logforwarding/outputs/splunk-forwarding.adoc

Walkthrough

This PR adds support for configurable sourceType in Splunk HTTP Event Collector outputs within ClusterLogForwarder, including cross-field validation that sourceType can only be set when payloadKey is defined, alongside updated CRD schemas, Vector configuration templates, documentation, and comprehensive test coverage.

Changes

Splunk SourceType Support

Layer / File(s) Summary
API type definition and schema validation
api/observability/v1/output_types.go, bundle/manifests/cluster-logging.clusterserviceversion.yaml, bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml, config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml, config/manifests/bases/cluster-logging.clusterserviceversion.yaml
Splunk struct gains a new optional SourceType field with a Kubebuilder validation rule ensuring sourceType can only be set when payloadKey is also defined. CRD schemas across all manifests are updated to include the new field, pattern constraints for templating, and cross-field validation rules; splunk.source pattern is broadened to allow : characters.
Vector configuration generation
internal/generator/vector/output/splunk/splunk.go
Generator logic branches on whether PayloadKey and SourceType are set: when both are configured, it emits payloadKeysourceTypeTmpl (a new VRL remap template that sets ._internal.splunk.sourcetype to the user-provided value); when only PayloadKey is set, it uses the existing payloadKeyTmpl; otherwise, it defaults ._internal.splunk.sourcetype to "_json".
Vector sink configuration templates
internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml, internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml, internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
New TOML templates define remap transforms for timestamp parsing and metadata generation, including conditional logic to derive ._internal.splunk.sourcetype from Kubernetes labels (when dynamic) or static values (when user-provided), and Splunk HEC sinks that template the source and sourcetype fields from ._internal.splunk.* while excluding internal fields from the payload.
Test case definitions
internal/generator/vector/output/splunk/splunk_test.go, test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
Generator test suite adds two new test entries validating generated config against the new payloadKey + sourceType templates (both static and dynamic label-based). Functional metadata test adds a new "with user defined payloadKey and sourcetype" DescribeTable that forces payloadKey = ".message", validates the Splunk event contains only the message payload and the correct sourcetype value, and updates the "with user defined source" test to use : separator syntax.
E2E validation fixtures and test cases
test/e2e/collection/apivalidations/api_validations_test.go, test/e2e/collection/apivalidations/splunk-payloadkey.yaml, test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml, test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-*.yaml, test/e2e/collection/apivalidations/splunk-source.yaml, test/e2e/collection/apivalidations/splunk-sourcetype.yaml, test/e2e/collection/apivalidations/splunk-templated-source-*.yaml, test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
E2E API validation test is extended with new table entries validating successful creation when payloadKey is present (with or without sourceType) and failure when sourceType is set without payloadKey; the deployment method is updated to use oc create -f - with stdin piping. New fixture YAMLs define ClusterLogForwarder resources covering: payloadKey only, static sourceType only, payloadKey + static sourceType, payloadKey + templated sourceType (with label fallbacks), templated source only, and combined configurations for comprehensive validation coverage.
Documentation and examples
docs/features/logforwarding/outputs/splunk-forwarding.adoc
Splunk forwarding documentation is updated with a new YAML example showing sourceType templated from a pod label with a fallback, and detailed property descriptions covering default behavior (_json when sourceType is omitted, or automatic selection based on the payload type when payloadKey is set without sourceType), template/static syntax support, guidance on the payloadKey dependency, and warnings about matching the administrator's Splunk configuration.

Sequence Diagram

sequenceDiagram
    participant User as User/CRD
    participant Gen as Generator
    participant Vec as Vector
    participant Splunk as Splunk HEC

    User->>Gen: ClusterLogForwarder with sourceType + payloadKey
    Gen->>Gen: Evaluate config: sourceType and payloadKey both set?
    alt Both sourceType and payloadKey set
        Gen->>Vec: Generate payloadKeysourceTypeTmpl remap
    else Only payloadKey set
        Gen->>Vec: Generate payloadKeyTmpl remap
    else Neither set
        Gen->>Vec: Generate default sourcetype="_json"
    end
    
    Vec->>Vec: Apply transforms:<br/>1. Parse timestamp<br/>2. Resolve sourceType (label/static)<br/>3. Extract payloadKey to top-level<br/>4. Set ._internal.splunk.sourcetype
    Vec->>Splunk: JSON event with templated<br/>source/sourcetype from ._internal
    Splunk->>Splunk: HEC receives sourcetype field
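As a rough, non-authoritative sketch of what the diagram describes, the generated Vector config might take a shape like the following. The transform and sink names, label key, and token wiring are illustrative, not the operator's actual output:

```toml
# Illustrative sketch only -- not the operator's generated config.
[transforms.splunk_meta]
type = "remap"
inputs = ["pipeline"]
source = '''
# Resolve sourcetype from a pod label, falling back to a static default.
._internal.splunk.sourcetype = string(.kubernetes.labels."splunk/sourcetype") ?? "generic_single_line"
'''

[sinks.splunk_app_team_1]
type = "splunk_hec_logs"
inputs = ["splunk_meta"]
endpoint = "https://splunk.customer.com:8088"
default_token = "SECRET[hecToken]"
sourcetype = "{{ _internal.splunk.sourcetype }}"
encoding.codec = "json"
```

The key point mirrored from the walkthrough is that the remap stage writes `._internal.splunk.sourcetype`, and the HEC sink consumes it via Vector's `{{ field }}` template syntax rather than a hard-coded value.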

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Microshift Test Compatibility ⚠️ Warning New Ginkgo e2e tests use observability.openshift.io ClusterLogForwarder API, unavailable on MicroShift. Tests lack protective markers. Add [apigroup:observability.openshift.io] tags to test entries or [Skipped:MicroShift] labels. MicroShift only supports route.openshift.io and security.openshift.io APIs per official documentation.
✅ Passed checks (10 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main change: making Splunk output sourcetype configurable in the CLF, which aligns with the changeset's primary objective.
Description check ✅ Passed The PR description is comprehensive and well-structured. It explains the intent, provides context (custom source types in Splunk), includes a configuration example, addresses the mandatory review/assignment requirements, and provides relevant JIRA links.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names use static string literals with no dynamic content (generated suffixes, timestamps, UUIDs, or IPs). Test data is properly in test bodies, not titles.
Test Structure And Quality ✅ Passed Tests follow all quality requirements: single responsibility per Entry, proper BeforeEach/AfterEach cleanup, meaningful assertion messages, appropriate timeouts, and consistent with existing patterns.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Tests do not make multi-node assumptions. API validation tests are schema-only validation. Functional test adds metadata handling validation without topology dependencies.
Topology-Aware Scheduling Compatibility ✅ Passed No operator deployment manifests, pod affinity rules, or topology assumptions changed. PR only adds sourceType field to Splunk API, CRD schemas, and config templates.
Ote Binary Stdout Contract ✅ Passed No process-level code writes to stdout. All changes are API definitions, YAML configs, documentation, TOML, or test code with stdout confined to test blocks.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No IPv4 assumptions or external connectivity requirements detected. Tests use cluster-internal DNS names and url.URL struct (IPv6-safe). All tests compatible with IPv6-only and disconnected networks.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6


Inline comments:
In `@api/observability/v1/output_types.go`:
- Around line 1359-1362: The kubebuilder validation pattern on the SourceType
field (the struct tag on SourceType string `json:"sourceType,omitempty"`) omits
':' in the static character class causing valid colon-separated sourcetypes to
be rejected; update the +kubebuilder:validation:Pattern annotation to include
':' inside the static class (e.g. add : to `[a-zA-Z0-9-_.\/]`) so values like
"my:custom:sourcetype" validate, and also correct the doc comment/example that
shows `"log4j"` (remove the stray quotes) so examples match the intended valid
values.

In `@bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml`:
- Line 3912: The example value currently shows a quoted string `"log4j"` which
violates the declared regex expecting an unquoted literal; update the example
list item to use an unquoted entry (log4j) so it matches the validation pattern
and remove the surrounding quotes from the value shown in the manifest example
for the ClusterLogForwarder configuration.
- Line 3919: The sourceType validation pattern is too restrictive and disallows
colons (:) causing valid Splunk sourcetypes like "my:custom:sourcetype" to be
rejected; update the regex used for the sourceType field (the pattern shown) to
include ":" in the allowed character class (i.e., add ":" to the character class
that currently contains a-zA-Z0-9-_.\/) so static values with colons pass API
validation while preserving existing escaping/grouping logic.

In `@config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml`:
- Around line 3910-3913: The example value for the sourceType field contains
quotes which violate the field's validation regex; update the example in the
ClusterLogForwarder CRD (the `sourceType` example lines shown as `2. "log4j"`)
to remove the quotes so it matches the pattern (e.g., change `2. "log4j"` to `2.
log4j`) and ensure any other static-literal examples in that `sourceType`
example block follow the same unquoted format.

In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Around line 57-60: Update the splunk-forwarding docs to clarify that
sourceType defaults to `_json` only when payloadKey is not set and otherwise
resolves to `generic_single_line` (adjust text near `sourceType` and the
conditional description around `payloadKey`), standardize the naming to use
`splunk/sourcetype` everywhere (replace `splunk_sourcetype` occurrences), and
fix the grammar in the `compression` line to read “available are: `none`,
`gzip`.” Ensure references to `payloadKey`, `sourceType`, and `compression` are
consistent across the other referenced sections (lines 76–78, 97–98, 245–249).

In `@test/e2e/collection/apivalidations/splunk-sourcetype.yaml`:
- Around line 22-23: The fixture in splunk-sourcetype.yaml sets sourceType
(sourceType: log4j) but omits the required cross-field payloadKey, causing CRD
validation to fail; update the manifest to include a payloadKey entry paired
with the existing sourceType (i.e., add a payloadKey field alongside sourceType)
so it satisfies the Splunk cross-field validation rule.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 03c4129b-d332-454e-b2ff-da63fd0e1a1f

📥 Commits

Reviewing files that changed from the base of the PR and between ff06c7a and 48c7394.

📒 Files selected for processing (18)
  • api/observability/v1/output_types.go
  • bundle/manifests/cluster-logging.clusterserviceversion.yaml
  • bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
  • config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
  • config/manifests/bases/cluster-logging.clusterserviceversion.yaml
  • docs/features/logforwarding/outputs/splunk-forwarding.adoc
  • internal/generator/vector/output/splunk/splunk.go
  • internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
  • internal/generator/vector/output/splunk/splunk_test.go
  • test/e2e/collection/apivalidations/api_validations_test.go
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey.yaml
  • test/e2e/collection/apivalidations/splunk-sourcetype.yaml
  • test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
  • test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
  • internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml

Comment on lines +1359 to +1362
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`

@coderabbitai coderabbitai Bot May 5, 2026


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SourceType pattern validation rejects colon — blocks the primary Splunk naming convention.

The pattern's static character class [a-zA-Z0-9-_.\/] does not include : (colon). The old Splunk style uses underscore separators (e.g., access_combined) while the new style uses colon separators (e.g., ibm:ldap:audit), and the tradition is to use a single colon to denote the hierarchical levels from least specific to most specific — the software product is listed first, then the specific component of the product. Real-world examples like cisco:esa:textmail, zeek:conn:json, and the PR description's own example my:custom:sourcetype would all be rejected by the current pattern.

A user wanting a static colon-separated sourcetype has no clean workaround: my:custom:sourcetype fails validation, and the template syntax cannot encode a bare static string with colons either.

🐛 Proposed fix — add : to the static character class
-	// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
+	// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`

Also note that example 2 in the doc comment ("log4j" with literal double-quotes) would not match the pattern's static portion either. The intended value is simply log4j (no quotes), which does pass. The comment is misleading and should be corrected.

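The colon gap is easy to reproduce outside the cluster. A quick sketch in Python: the template sub-pattern is copied verbatim from the annotation above, and only the static character class differs between the merged pattern and the proposed fix (sample values are illustrative):

```python
import re

# Template sub-pattern copied from the kubebuilder annotation.
TEMPLATE = r'(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})'

# Merged pattern: static class without ':'; proposed fix: ':' added.
OLD = re.compile(r'^(([a-zA-Z0-9-_.\/])*' + TEMPLATE + r'*)*$')
NEW = re.compile(r'^(([a-zA-Z0-9-_.:\/])*' + TEMPLATE + r'*)*$')

assert OLD.match('log4j')                     # old underscore style passes
assert not OLD.match('my:custom:sourcetype')  # colon-separated style is rejected
assert NEW.match('my:custom:sourcetype')      # passes once ':' is in the class
assert NEW.match('{.log_type||"unknown"}')    # templated values still validate
```

This confirms the review finding: static colon-separated sourcetypes fail the merged pattern, while a one-character addition to the class admits them without loosening the template grammar.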

✅ Addressed in commit 17bcba0

Contributor


@vparfonov @Clee2691 This pattern matcher is restricted by VRL. Do we have any idea whether VRL would allow us to add a colon for this type? Given our template utilities, it may complicate the other fields, and we may need to think about how we would accept this suggestion.



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment thread bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml Outdated
Comment thread bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml Outdated
Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc Outdated
Comment thread test/e2e/collection/apivalidations/splunk-sourcetype.yaml Outdated
@marpears
Author

marpears commented May 6, 2026

/retest

@marpears
Author

marpears commented May 6, 2026

Hi @jcantrill, I think it would be worth extending this feature with a special option, sourceType: none, which does not set sourcetype in the Splunk sink config in Vector. This would handle the case where a user wishes for the source type to be enforced on the Splunk side.

According to the Vector docs, sourcetype is not mandatory. If it is not set, Splunk will default the sourcetype to httpevent, so using sourceType: none would come with a warning that the user must configure this suitably on the Splunk side.

Do you agree?

@jcantrill
Contributor

Hi @jcantrill, I think it would be worth extending this feature with a special option of sourceType: none which does not set sourcetype in the Splunk sink config in Vector. This is to handle a case where a user may wish for a source type to be enforced at the Splunk side.

What does it mean to enforce it on the receiver side? What is the receiver behavior? Does it still accept the logs forwarded by the collector?

@marpears
Author

marpears commented May 7, 2026

Hi @jcantrill, I think it would be worth extending this feature with a special option of sourceType: none which does not set sourcetype in the Splunk sink config in Vector. This is to handle a case where a user may wish for a source type to be enforced at the Splunk side.

What does it mean to enforce it on the the receiver side? What is the receiver behavior? Does it still accept the logs forwarded by the collector?

Hi @jcantrill, I've done a test which confirmed that when the collector doesn't define a source type, Splunk accepts the log event and uses a default source type of httpevent.

But on reflection, I think the complexities and use case of a special option to not set the source type in the collector need some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea.

@jcantrill
Contributor

Hi @jcantrill, I think it would be worth extending this feature with a special option of sourceType: none which does not set sourcetype in the Splunk sink config in Vector. This is to handle a case where a user may wish for a source type to be enforced at the Splunk side.

But on reflection, I think the complexities and use case of a special option to not set the source type in the collector needs some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea.

We discussed this in a team meeting and I also don't like the idea of a "magic" word. My suggested alternative was to test an empty string. What does that mean to Splunk? I am assuming that Vector will send an empty string if configured, which is different from configuring nothing and getting 'httpevent'.

@marpears
Author

marpears commented May 8, 2026

Hi @jcantrill, I think it would be worth extending this feature with a special option of sourceType: none which does not set sourcetype in the Splunk sink config in Vector. This is to handle a case where a user may wish for a source type to be enforced at the Splunk side.

But on reflection, I think the complexities and use case of a special option to not set the source type in the collector needs some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea.

We discussed this in a team meeting and I also don't like the idea of a "magic" word. My suggested alternative was to test an empty string. What does that mean to Splunk? I am assuming that Vector will send an empty string if configured, which is different from configuring nothing and getting 'httpevent'.

I've given that scenario a try, with the source type set to an empty string in the collector, it resulted in Splunk using a source type of an empty string.

I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the source type of httpevent. I may repeat that using a log4j style log message to see if Splunk's automatic source type matching assigns it to its pretrained log4j source type rather than using httpevent.
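The two behaviors under test can be sketched in the generated Vector sink config. This is a rough, hypothetical sketch: the sink names, input names, endpoint, and token placeholder are illustrative, not the operator's actual generated output.

```toml
# Case 1: sourcetype omitted entirely from the sink.
# Splunk then assigns its own source type (httpevent for a simple
# unstructured event, per the test above).
[sinks.splunk_no_sourcetype]
type = "splunk_hec_logs"
inputs = ["pipeline_app_logs"]               # illustrative input name
endpoint = "https://splunk.customer.com:8088"
default_token = "HEC_TOKEN_PLACEHOLDER"      # stands in for the real secret
# (no `sourcetype` key at all)

# Case 2: sourcetype set to an empty string.
# Vector sends "", and Splunk records the event with an empty source type.
[sinks.splunk_empty_sourcetype]
type = "splunk_hec_logs"
inputs = ["pipeline_app_logs"]
endpoint = "https://splunk.customer.com:8088"
default_token = "HEC_TOKEN_PLACEHOLDER"
sourcetype = ""
```

The difference is purely whether the key is present: only complete omission lets Splunk's own source type assignment kick in.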

@jcantrill
Contributor

I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the source type of httpevent. I may repeat that using a log4j style log message to see if Splunk's automatic source type matching assigns it to its pretrained log4j source type rather than using httpevent.

I believe I interpret your test as:

  • exclude sourceType config and vector sends 'httpEvent'
  • config sourceType to "" and vector sends ""

I am having difficulty finding reference documentation that explicitly states whether sourceType is required, but I did find some documentation that advises using the enterprise interface to modify the receiver config. My interpretation is that an admin can fix it after the fact. Given my opinions regarding the need to set this value in the context of various workloads on a Kubernetes cluster, I would lean towards disregarding the 'none' collector configuration you proposed. Admins have several options with the existing ClusterLogForwarder solution that should meet the broader needs:

  • Set it to an explicit value
  • Set it to a templated value that is populated based on a pod label
  • Set it to "" and figure out how to adjust it on the receiver side

The more urgent issue is determining whether we can allow : in the sourceType pattern, since this seems to be a common pattern for Splunk. I believe we can, but it requires testing to ensure that it is allowed using VRL. It may also require adjustment to some of our helper functions.
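The three options above can be sketched as CLF output snippets. This is an illustrative sketch only: names and URLs are made up, and per this PR's validation sourceType also requires payloadKey to be defined, which is omitted here for brevity.

```yaml
outputs:
  # 1. Explicit static value
  - name: splunk-static
    type: splunk
    splunk:
      url: 'https://splunk.customer.com:8088'
      authentication:
        token:
          key: hecToken
          secretName: splunk-static
      sourceType: 'my_custom_sourcetype'
  # 2. Templated value populated from a pod label, with a fallback
  - name: splunk-templated
    type: splunk
    splunk:
      url: 'https://splunk.customer.com:8088'
      authentication:
        token:
          key: hecToken
          secretName: splunk-templated
      sourceType: '{.kubernetes.labels.splunk/sourcetype||"generic_single_line"}'
  # 3. Empty string; the source type is then adjusted on the receiver side
  - name: splunk-empty
    type: splunk
    splunk:
      url: 'https://splunk.customer.com:8088'
      authentication:
        token:
          key: hecToken
          secretName: splunk-empty
      sourceType: ''
```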

@marpears marpears force-pushed the splunk-sourcetype branch from 4e7fcdf to 0b9d504 Compare May 13, 2026 16:26

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Line 76: Fix the typo "stuctured" to "structured" in the sentence that
describes the `sourceType` field being defined in the CLF as a templated field
(the line mentioning "The `sourceType` field can be defined in the CLF as a
templated field to get its value from the stuctured log event, such as a pod
label."). Update that single word so the sentence reads "structured log event".

In `@test/e2e/collection/apivalidations/splunk-templated-source-1.yaml`:
- Line 22: Update the test fixture to use the new Splunk output field name:
replace the templated key "source" with "sourceType" (i.e., change source:
'{.kubernetes.labels."splunk/source"||"generic_single_line"}' to use sourceType)
and update the label reference to match the intended field ("splunk/sourcetype")
so the fixture validates the templated sourceType behavior.

📥 Commits

Reviewing files that changed from the base of the PR and between 48c7394 and 0b9d504.

📒 Files selected for processing (22)
  • api/observability/v1/output_types.go
  • bundle/manifests/cluster-logging.clusterserviceversion.yaml
  • bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
  • config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
  • config/manifests/bases/cluster-logging.clusterserviceversion.yaml
  • docs/features/logforwarding/outputs/splunk-forwarding.adoc
  • internal/generator/vector/output/splunk/splunk.go
  • internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
  • internal/generator/vector/output/splunk/splunk_test.go
  • test/e2e/collection/apivalidations/api_validations_test.go
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-1.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-2.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey.yaml
  • test/e2e/collection/apivalidations/splunk-source.yaml
  • test/e2e/collection/apivalidations/splunk-sourcetype.yaml
  • test/e2e/collection/apivalidations/splunk-templated-source-1.yaml
  • test/e2e/collection/apivalidations/splunk-templated-source-2.yaml
  • test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
  • test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
  • internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
🚧 Files skipped from review as they are similar to previous changes (12)
  • internal/generator/vector/output/splunk/splunk_test.go
  • bundle/manifests/cluster-logging.clusterserviceversion.yaml
  • test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
  • config/manifests/bases/cluster-logging.clusterserviceversion.yaml
  • test/e2e/collection/apivalidations/api_validations_test.go
  • api/observability/v1/output_types.go
  • test/e2e/collection/apivalidations/splunk-payloadkey.yaml
  • internal/generator/vector/output/splunk/splunk.go
  • test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
  • internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
  • bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml

Comment thread docs/features/logforwarding/outputs/splunk-forwarding.adoc Outdated
token:
key: hecToken
secretName: splunk-myapp
source: '{.kubernetes.labels."splunk/source"||"generic_single_line"}'

@coderabbitai coderabbitai Bot May 13, 2026


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Incorrect field name: should be sourceType not source.

The PR introduces a new sourceType field for Splunk outputs, but this test fixture uses source instead. In Splunk terminology, source (event source/origin) and sourceType (data format type) are distinct fields. Since this test fixture is meant to validate the new templated sourceType functionality, the field name must be corrected.

🔧 Proposed fix
-        source: '{.kubernetes.labels."splunk/source"||"generic_single_line"}'
+        sourceType: '{.kubernetes.labels."splunk/sourcetype"||"generic_single_line"}'

Also consider updating the label reference from "splunk/source" to "splunk/sourcetype" to match the intended field being tested.


Author


This is a new api validation test to fill a gap where the current "source" field didn't have any api validation tests.


@marpears, thanks for the clarification! You're right — this fixture is intentionally covering the existing source field with API validation tests that were previously missing. My comment was incorrect; the use of source here is valid and the test serves a distinct purpose from the sourceType fixtures. Apologies for the confusion!

@marpears
Author

I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the source type of httpevent. I may repeat that using a log4j style log message to see if Splunk's automatic source type matching assigns it to its pretrained log4j source type rather than using httpevent.

I believe I interpret your test as:

  • exclude sourceType config and vector sends 'httpEvent'
  • config sourceType to "" and vector sends ""

I am having difficulty finding reference documentation that may explicitly state whether sourceType is required but did find some documentation that advises to use the enterprise interface to modify the receiver config. My interpretation is an admin can fix it after the fact. Given my opinions regarding the need to set this value in context of various workloads on a kubernetes cluster, I would lean towards disregarding the collector configuration you proposed regarding 'none'. Admins have several options with the existing ClusterLogForwarder solution that should meet the broader needs:

  • Set it to an explicit value
  • Set it to a templated value that is populated based on a pod label
  • Set it to "" and figure out how to adjust it on the receiver side

The more urgent issue is determining if we can allow : in the sourceType pattern since this seems to be a pattern that is common for splunk. I believe we can but it requires testing to ensure that is allowed using VRL. It may also require adjustment for some of our helper functions.

Hi @jcantrill,

Just to clarify the previous test:

  • exclude sourcetype config and Splunk defaults to 'httpevent'. According to the Vector docs, Splunk defaults to 'httpevent' if sourcetype is not defined by Vector, rather than Vector sending 'httpevent'.
  • config sourcetype to "" and Vector sends ""

That aside, I have made further changes to allow : in sourceType and also in source, so a consistent separator pattern is permitted in both fields. Along with this I have created/updated API validation and functional tests. I didn't encounter any VRL issues with colon characters when exercising the change with the functional tests, which cover a range of scenarios. Kubernetes does not permit : in pod label values, so any instances of : have to be set directly in the CLF within the sourceType or source fields. I've added a note in the docs.
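A colon-separated source type therefore has to be written literally in the forwarder. A hypothetical sketch (output name, URL, and the 'kube:apps:custom' value are illustrative):

```yaml
outputs:
  - name: splunk-colon
    type: splunk
    splunk:
      url: 'https://splunk.customer.com:8088'
      authentication:
        token:
          key: hecToken
          secretName: splunk-colon
      # A colon-separated pattern common in Splunk. It must be set
      # literally here, because Kubernetes does not allow ':' in pod
      # label values, so it cannot come from a templated label lookup.
      sourceType: 'kube:apps:custom'
```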

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@marpears: all tests passed!



Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.
