OBSDA-1383: Make Splunk output sourcetype configurable in CLF#3251
marpears wants to merge 10 commits into
Conversation
/hold
jcantrill
left a comment
There is no functional test to support this addition. Also note we do not backport features to earlier releases
jcantrill
left a comment
Please add e2e validation tests and at least one functional test
/ok-to-test

@marpears: This pull request references OBSDA-1383, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.8.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

/retest

@marpears You will need to rebase and force push.
Force-pushed from 1fc8d94 to b04ac06
/retest

/label tide/squash-merge-method

@jcantrill: The label(s) could not be applied.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/label tide/merge-method-squash
Force-pushed from b04ac06 to c55ba44
/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jcantrill, marpears. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from c55ba44 to 48c7394
Walkthrough

This PR adds support for a configurable Splunk `sourceType`.

Changes

Splunk SourceType Support
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/CRD
    participant Gen as Generator
    participant Vec as Vector
    participant Splunk as Splunk HEC
    User->>Gen: ClusterLogForwarder with sourceType + payloadKey
    Gen->>Gen: Evaluate config: sourceType and payloadKey both set?
    alt Both sourceType and payloadKey set
        Gen->>Vec: Generate payloadKeysourceTypeTmpl remap
    else Only payloadKey set
        Gen->>Vec: Generate payloadKeyTmpl remap
    else Neither set
        Gen->>Vec: Generate default sourcetype="_json"
    end
    Vec->>Vec: Apply transforms:<br/>1. Parse timestamp<br/>2. Resolve sourceType (label/static)<br/>3. Extract payloadKey to top-level<br/>4. Set ._internal.splunk.sourcetype
    Vec->>Splunk: JSON event with templated<br/>source/sourcetype from ._internal
    Splunk->>Splunk: HEC receives sourcetype field
```
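The transform steps in the diagram might translate to Vector config along the following lines. This is an illustrative sketch only, not the generator's actual output: the transform/sink names, the VRL expression, and the template syntax for the internal metadata path are all assumptions.

```toml
# Illustrative sketch of the generated config shape (not this PR's exact output).
[transforms.splunk_sourcetype]
type = "remap"
inputs = ["pipeline_to_splunk"]
source = '''
# Resolve the templated sourceType: pod label first, then the static fallback.
._internal.splunk.sourcetype = .kubernetes.labels."splunk/sourcetype" || "generic_single_line"
'''

[sinks.splunk_hec]
type = "splunk_hec_logs"
inputs = ["splunk_sourcetype"]
endpoint = "https://splunk.example.com:8088"
# The sink reads the resolved value back out of the event's internal metadata.
sourcetype = "{{ _internal.splunk.sourcetype }}"
```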
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 10 passed
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/observability/v1/output_types.go`:
- Around line 1359-1362: The kubebuilder validation pattern on the SourceType
field (the struct tag on SourceType string `json:"sourceType,omitempty"`) omits
':' in the static character class causing valid colon-separated sourcetypes to
be rejected; update the +kubebuilder:validation:Pattern annotation to include
':' inside the static class (e.g. add : to `[a-zA-Z0-9-_.\/]`) so values like
"my:custom:sourcetype" validate, and also correct the doc comment/example that
shows `"log4j"` (remove the stray quotes) so examples match the intended valid
values.
In `@bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml`:
- Line 3912: The example value currently shows a quoted string `"log4j"` which
violates the declared regex expecting an unquoted literal; update the example
list item to use an unquoted entry (log4j) so it matches the validation pattern
and remove the surrounding quotes from the value shown in the manifest example
for the ClusterLogForwarder configuration.
- Line 3919: The sourceType validation pattern is too restrictive and disallows
colons (:) causing valid Splunk sourcetypes like "my:custom:sourcetype" to be
rejected; update the regex used for the sourceType field (the pattern shown) to
include ":" in the allowed character class (i.e., add ":" to the character class
that currently contains a-zA-Z0-9-_.\/) so static values with colons pass API
validation while preserving existing escaping/grouping logic.
In `@config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml`:
- Around line 3910-3913: The example value for the sourceType field contains
quotes which violate the field's validation regex; update the example in the
ClusterLogForwarder CRD (the `sourceType` example lines shown as `2. "log4j"`)
to remove the quotes so it matches the pattern (e.g., change `2. "log4j"` to `2.
log4j`) and ensure any other static-literal examples in that `sourceType`
example block follow the same unquoted format.
In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Around line 57-60: Update the splunk-forwarding docs to clarify that
sourceType defaults to `_json` only when payloadKey is not set and otherwise
resolves to `generic_single_line` (adjust text near `sourceType` and the
conditional description around `payloadKey`), standardize the naming to use
`splunk/sourcetype` everywhere (replace `splunk_sourcetype` occurrences), and
fix the grammar in the `compression` line to read “available are: `none`,
`gzip`.” Ensure references to `payloadKey`, `sourceType`, and `compression` are
consistent across the other referenced sections (lines 76–78, 97–98, 245–249).
In `@test/e2e/collection/apivalidations/splunk-sourcetype.yaml`:
- Around line 22-23: The fixture in splunk-sourcetype.yaml sets sourceType
(sourceType: log4j) but omits the required cross-field payloadKey, causing CRD
validation to fail; update the manifest to include a payloadKey entry paired
with the existing sourceType (i.e., add a payloadKey field alongside sourceType)
so it satisfies the Splunk cross-field validation rule.
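A minimal sketch of what the last comment asks for — `sourceType` paired with a `payloadKey` in the fixture. The secret name and the `payloadKey` template value here are illustrative, not taken from the actual fixture:

```yaml
splunk:
  authentication:
    token:
      key: hecToken
      secretName: splunk-secret
  payloadKey: '{.message}'   # required alongside sourceType by the cross-field rule
  sourceType: log4j
```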
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 03c4129b-d332-454e-b2ff-da63fd0e1a1f
📒 Files selected for processing (18)
- api/observability/v1/output_types.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
- config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
- config/manifests/bases/cluster-logging.clusterserviceversion.yaml
- docs/features/logforwarding/outputs/splunk-forwarding.adoc
- internal/generator/vector/output/splunk/splunk.go
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_test.go
- test/e2e/collection/apivalidations/api_validations_test.go
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- test/e2e/collection/apivalidations/splunk-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
```go
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
```
SourceType pattern validation rejects colon — blocks the primary Splunk naming convention.
The pattern's static character class [a-zA-Z0-9-_.\/] does not include : (colon). The old Splunk style uses underscore separators (e.g., access_combined) while the new style uses colon separators (e.g., ibm:ldap:audit), and the tradition is to use a single colon to denote the hierarchical levels from least specific to most specific — the software product is listed first, then the specific component of the product. Real-world examples like cisco:esa:textmail, zeek:conn:json, and the PR description's own example my:custom:sourcetype would all be rejected by the current pattern.
A user wanting a static colon-separated sourcetype has no clean workaround: my:custom:sourcetype fails validation, and the template syntax cannot encode a bare static string with colons either.
🐛 Proposed fix — add `:` to the static character class

```diff
- // +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
+ // +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
```

Also note that example 2 in the doc comment (`"log4j"` with literal double quotes) would not match the pattern's static portion either. The intended value is simply `log4j` (no quotes), which does pass. The comment is misleading and should be corrected.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern:=`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
```
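The effect of the proposed change can be checked directly with a small standalone sketch that compiles both patterns (copied from the annotation above) and tests them against the values discussed in this review:

```go
package main

import (
	"fmt"
	"regexp"
)

// Current pattern from the kubebuilder annotation: ':' is missing from the
// static character class.
var current = regexp.MustCompile(`^(([a-zA-Z0-9-_.\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`)

// Proposed pattern with ':' added to the static class.
var proposed = regexp.MustCompile(`^(([a-zA-Z0-9-_.:\/])*(\{(\.[a-zA-Z0-9_]+|\."[^"]+")+((\|\|)(\.[a-zA-Z0-9_]+|\.?"[^"]+")+)*\|\|"[^"]*"\})*)*$`)

func main() {
	values := []string{
		"log4j",                // static, old underscore style: passes both
		`"log4j"`,              // the doc comment's quoted example: fails both
		"my:custom:sourcetype", // colon-separated: fails current, passes proposed
		`{.kubernetes.labels."splunk/sourcetype"||"generic_single_line"}`, // template: passes both
	}
	for _, v := range values {
		fmt.Printf("%-68q current=%-5v proposed=%v\n", v, current.MatchString(v), proposed.MatchString(v))
	}
}
```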
✅ Addressed in commit 17bcba0
@vparfonov @Clee2691 This pattern matcher is restricted by VRL. Do we have any idea if VRL would allow us to add a colon for this type? Given our template utilities it may complicate the other fields, and we may need to think about how we would accept this suggestion.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
/retest

Hi @jcantrill, I think it would be worth extending this feature with a special option of `none` to explicitly skip setting a source type. According to the Vector docs, it does not mandate sourcetype. Splunk will default the sourcetype to httpevent if it is not set, so using `none` would let Splunk apply its own default. Do you agree?
What does it mean to enforce it on the receiver side? What is the receiver behavior? Does it still accept the logs forwarded by the collector?
Hi @jcantrill, I've done a test which confirmed that when the collector doesn't define a source type, Splunk accepts the log event and uses a default source type of httpevent. But on reflection, I think the complexities and use case of a special option to not set the source type in the collector need some further thought. I do not want to detract from the core objective of this PR, which was to allow for a user-defined source type in the collector, so I'll park this idea.
We discussed this in a team meeting and I also don't like the idea of a "magic" word. My suggested alternative was to test an empty string. What does that mean to Splunk? I am assuming that Vector will send an empty string if configured, which is different from configuring nothing and getting 'httpevent'.
I've given that scenario a try: with the source type set to an empty string in the collector, Splunk used an empty-string source type. I did another test to confirm the source type needs to be completely omitted from the sink output config in the collector for Splunk to assign a source type. My test was done with a simple unstructured log message, and Splunk assigned the source type of httpevent.
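The two cases compared in this test, sketched as minimal sink fragments. This is illustrative only: the sink names and endpoint are placeholders, and the fragments omit authentication.

```toml
# Case 1: sourcetype omitted entirely -> Splunk assigns its receiver default
# (httpevent in this test).
[sinks.splunk_omitted]
type = "splunk_hec_logs"
inputs = ["logs"]
endpoint = "https://splunk.example.com:8088"

# Case 2: sourcetype set to an empty string -> Splunk stores an empty sourcetype.
[sinks.splunk_empty]
type = "splunk_hec_logs"
inputs = ["logs"]
endpoint = "https://splunk.example.com:8088"
sourcetype = ""
```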
I interpret your test as:
I am having difficulty finding reference documentation that explicitly states whether sourceType is required, but I did find some documentation that advises using the enterprise interface to modify the receiver config. My interpretation is that an admin can fix it after the fact. Given my opinions regarding the need to set this value in the context of various workloads on a Kubernetes cluster, I would lean towards disregarding the collector configuration you proposed regarding 'none'. Admins have several options with the existing ClusterLogForwarder solution that should meet the broader needs:

The more urgent issue is determining if we can allow `:` in the sourceType pattern.
Commits (truncated subjects):
- … of sourceType use to docs
- …e defined. Docs updated. Added api validation, e2e and functional tests.
- …abel. Added an API validation test where sourceType is templated, and fixed single quote bug in API validation tests during oc apply.
- …api validation and functional tests
Force-pushed from 4e7fcdf to 0b9d504
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/features/logforwarding/outputs/splunk-forwarding.adoc`:
- Line 76: Fix the typo "stuctured" to "structured" in the sentence that
describes the `sourceType` field being defined in the CLF as a templated field
(the line mentioning "The `sourceType` field can be defined in the CLF as a
templated field to get its value from the stuctured log event, such as a pod
label."). Update that single word so the sentence reads "structured log event".
In `@test/e2e/collection/apivalidations/splunk-templated-source-1.yaml`:
- Line 22: Update the test fixture to use the new Splunk output field name:
replace the templated key "source" with "sourceType" (i.e., change source:
'{.kubernetes.labels."splunk/source"||"generic_single_line"}' to use sourceType)
and update the label reference to match the intended field ("splunk/sourcetype")
so the fixture validates the templated sourceType behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 12db8fff-31b0-44fb-ac08-6209b61c0ac0
📒 Files selected for processing (22)
- api/observability/v1/output_types.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
- config/crd/bases/observability.openshift.io_clusterlogforwarders.yaml
- config/manifests/bases/cluster-logging.clusterserviceversion.yaml
- docs/features/logforwarding/outputs/splunk-forwarding.adoc
- internal/generator/vector/output/splunk/splunk.go
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_test.go
- test/e2e/collection/apivalidations/api_validations_test.go
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-1.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-templated-sourcetype-2.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- test/e2e/collection/apivalidations/splunk-source.yaml
- test/e2e/collection/apivalidations/splunk-sourcetype.yaml
- test/e2e/collection/apivalidations/splunk-templated-source-1.yaml
- test/e2e/collection/apivalidations/splunk-templated-source-2.yaml
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- test/functional/outputs/splunk/forward_to_splunk_metadata_test.go
💤 Files with no reviewable changes (1)
- internal/generator/vector/output/splunk/splunk_sink_payloadkey.toml
🚧 Files skipped from review as they are similar to previous changes (12)
- internal/generator/vector/output/splunk/splunk_test.go
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- test/e2e/collection/apivalidations/splunk-payloadkey-and-sourcetype.yaml
- config/manifests/bases/cluster-logging.clusterserviceversion.yaml
- test/e2e/collection/apivalidations/api_validations_test.go
- api/observability/v1/output_types.go
- test/e2e/collection/apivalidations/splunk-payloadkey.yaml
- internal/generator/vector/output/splunk/splunk.go
- test/e2e/collection/apivalidations/splunk-templated-sourcetype.yaml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_static_sourcetype.toml
- internal/generator/vector/output/splunk/splunk_sink_with_payloadkey_and_sourcetype.toml
- bundle/manifests/observability.openshift.io_clusterlogforwarders.yaml
```yaml
token:
  key: hecToken
  secretName: splunk-myapp
source: '{.kubernetes.labels."splunk/source"||"generic_single_line"}'
```
Incorrect field name: should be sourceType not source.
The PR introduces a new sourceType field for Splunk outputs, but this test fixture uses source instead. In Splunk terminology, source (event source/origin) and sourceType (data format type) are distinct fields. Since this test fixture is meant to validate the new templated sourceType functionality, the field name must be corrected.
🔧 Proposed fix

```diff
- source: '{.kubernetes.labels."splunk/source"||"generic_single_line"}'
+ sourceType: '{.kubernetes.labels."splunk/sourcetype"||"generic_single_line"}'
```

Also consider updating the label reference from "splunk/source" to "splunk/sourcetype" to match the intended field being tested.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
sourceType: '{.kubernetes.labels."splunk/sourcetype"||"generic_single_line"}'
```
This is a new api validation test to fill a gap where the current "source" field didn't have any api validation tests.
@marpears, thanks for the clarification! You're right — this fixture is intentionally covering the existing source field with API validation tests that were previously missing. My comment was incorrect; the use of source here is valid and the test serves a distinct purpose from the sourceType fixtures. Apologies for the confusion!
Hi @jcantrill, Just to clarify the previous test:
That aside, I have made further changes to allow …
@marpears: all tests passed! Full PR test history. Your PR dashboard.
Description
This PR allows for configuration of the Splunk output source type in the CLF using a new `sourceType` field. This is so that we can support users who have defined custom source types in Splunk.

The `sourceType` field can be used only when `payloadKey` is defined, and allows for a templated value so that it can be retrieved from a pod label. If `sourceType` is not defined, the current logic is preserved where it defaults to `_json`. If using `payloadKey` without `sourceType`, the source type used will be either `_json` or `generic_single_line`, depending on the structure of the final event payload.

CLF configuration example:
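The example block itself did not survive here, so the following is an illustrative reconstruction based on the field descriptions above and the test fixtures in this PR. The resource names, URL, secret name, and `payloadKey` value are placeholders:

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logcollector
  outputs:
  - name: splunk-receiver
    type: splunk
    splunk:
      url: https://splunk.example.com:8088
      authentication:
        token:
          key: hecToken
          secretName: splunk-secret
      payloadKey: '{.message}'
      # Templated: read the source type from a pod label, with a static fallback.
      sourceType: '{.kubernetes.labels."splunk/sourcetype"||"generic_single_line"}'
  pipelines:
  - name: to-splunk
    inputRefs:
    - application
    outputRefs:
    - splunk-receiver
```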
/cc @Clee2691 @cahartma
/assign @jcantrill
Links
Summary by CodeRabbit
Release Notes
New Features
- New `sourceType` configuration option for Splunk outputs, allowing explicit specification of the Splunk HEC source type. Supports static values and dynamic templating from Kubernetes labels.
- Validation allows `sourceType` to be set only when `payloadKey` is also defined.

Documentation

- Updated Splunk forwarding documentation with `sourceType` examples, default behavior guidance, and best practices.