Conversation

@franciscovalentecastro (Contributor) commented Jan 21, 2026

Description

Create otel.OTelLogs exporter to have a separate configuration for Otel Logging.

Details

Why do we need a separate exporter?

  • The sending_queue configurations set in this PR and in follow-up changes for "Log Buffering" (b/394636880) are incompatible with metrics exporting (for example, we can't send points out of order from the queue).

Why put the BatchProcessor at the end of the pipeline?

  • We are moving all "logs exporter" specific details out of the "ReceiverPipeline" logic. This lets us swap the exporter without having to configure ReceiverPipelines separately.

Why do we set timeout = 60s on the OTelLogs exporter?

  • To give the GCL Go SDK a longer window to perform its internal retries (see [1] and [2]).

Why choose grpc_pool_size = 10, num_consumers = 20 and send_batch_max_size = 1000?

  • I chose values similar to those in the "Performance Tuning" guide [3], while also validating that the exporter reaches 40K/s throughput. We can discuss further.
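
For context, a minimal sketch of the exporter configuration these values map to (hypothetical; the shape is inferred from the config snippets quoted in the review below, not the merged code):

// Hypothetical sketch of the OTelLogs exporter config under discussion.
// The keys mirror the googlecloud exporter options quoted later in this
// review; the tuning values are the ones proposed above and were later
// reduced to just "timeout" before merging.
func googleCloudLoggingExporterSketch() otel.Component {
	return otel.Component{
		Type: "googlecloud",
		Config: map[string]any{
			"log": map[string]any{
				"grpc_pool_size": 10,
			},
			"sending_queue": map[string]any{
				"enabled":       true,
				"num_consumers": 20,
			},
			// Longer timeout so the GCL Go SDK can retry internally [1][2].
			"timeout": "60s",
		},
	}
}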

[1] https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/exporter/collector/logs.go#L385

[2] https://github.com/googleapis/google-cloud-go/blob/main/logging/logging.go#L294

[3] https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/tree/main/exporter/collector#performance-tuning

Related issue

b/477325045

How has this been tested?

Existing confgenerator, transformation and integration tests.

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

@franciscovalentecastro franciscovalentecastro changed the title [confgenerator] Create otel.Logs exporter to have a separate configuration for logs. [confgenerator] Create otel.Logs exporter to configure separately for Otel Logging. Jan 21, 2026
@franciscovalentecastro franciscovalentecastro changed the title [confgenerator] Create otel.Logs exporter to configure separately for Otel Logging. [confgenerator] Create otel.OTelLogs exporter to configure separately for Otel Logging. Jan 21, 2026
@franciscovalentecastro franciscovalentecastro changed the title [confgenerator] Create otel.OTelLogs exporter to configure separately for Otel Logging. [confgenerator] Create otel.OTelLogs exporter for Otel Logging. Jan 21, 2026
}

-func Batch() Component {
+func BatchLogsExporter() Component {
Member

Why is this called BatchLogsExporter instead of BatchLogsProcessor? (Ditto for the other function)

Contributor Author

I think BatchLogsProcessor is better; I'll change it. I guess I was just overly focused on this processor being related to the Logs Exporter.

Contributor Author

I created only one BatchProcessor(sendBatchSize, sendBatchMaxSize int, timeout string) now.

@@ -158,10 +168,11 @@ func (uc *UnifiedConfig) GenerateOtelConfig(ctx context.Context, outDir string)
Pipelines: pipelines,
Extensions: extensions,
Exporters: map[otel.ExporterType]otel.Component{
Member

I think this is where we should be putting the processors, instead of adding magic conditionals inside Generate. That will also make it easy to apply different batching to different signal types. Maybe we can use a type like:

type ExporterComponents struct {
  ProcessorsByType map[string][]otel.Component
  Exporter otel.Component
}

Contributor Author

I created the following type and updated ModularConfig accordingly.

type ExporterComponent struct {
	Exporter   Component
	Processors []Component
}

Member

I was suggesting the type be called ExporterComponents (because it is the set of components needed to implement the exporter), not that the map be renamed to ExporterComponents.

Also, you have a single slice of processors instead of a different slice of processors for each type. This means you can't apply different batching for metrics, logs, and traces.

Contributor Author

Thank you for adding more details; the goal of your original ExporterComponents proposal is now clear to me. This is the implemented ExporterComponents:

type ExporterComponents struct {
	Exporter         Component
	ProcessorsByType map[string][]Component
}

Comment on lines 55 to 64
if logsExporter {
	config["log"] = map[string]any{
		"grpc_pool_size": 20,
	}
	config["sending_queue"] = map[string]any{
		"enabled":       true,
		"num_consumers": 40,
	}
	config["timeout"] = "60s"
}
Member

The performance tuning guide says "There is no guaranteed one-size-fits-all combination of settings, and for the majority of use cases the defaults will provide more than sufficient throughput."

Why are the defaults insufficient?

Contributor Author

The default config for the exporter only reaches 15K/s log throughput in our Soak Tests, while fluent-bit reaches 40K/s. The settings grpc_pool_size = 20 and num_consumers = 40 achieve that goal; I used the values suggested in the "Performance Tuning" guide.

timeout = 60s is to give the GCL Go SDK a longer timeout to do retries. More details in the PR description.

Contributor Author (@franciscovalentecastro, Jan 27, 2026)

I updated the parameters to grpc_pool_size = 10 and num_consumers = 20, since those also achieve 40K/s throughput.


Member

I know you can get higher throughput out of both fluent-bit and otelopscol by changing the number of workers; why isn't 15K/s already sufficient? When we looked at this in April 2024, we concluded that both fluent-bit and otelopscol were constrained by worker count/batch size/API latency, but that both 15K/s and 40K/s were well above the target throughput, so we didn't need to make any changes here.

Contributor Author (@franciscovalentecastro, Jan 29, 2026)

@quentinmit After syncing offline, we decided the target throughput will be 10K/s logs, which can be achieved with the current defaults. I removed the grpc_pool_size and num_consumers config and only kept timeout.

As we discussed, if we want a higher default throughput we can change the defaults upstream (see footnote 1).

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.144.0/exporter/googlecloudexporter/README.md?plain=1#L214

"resource_filters": []map[string]interface{}{},
},
func googleCloudExporter(userAgent string, instrumentationLabels, serviceResourceLabels, logsExporter bool) otel.Component {
config := map[string]interface{}{
Member

Having two boolean parameters was already confusing enough; since this is purely additive config, can we just use a separate function that calls googleCloudExporter and then adds the additional config?

Contributor Author

I realized the "Logging Exporter" config is not purely additive, and keeping the "metrics" and "userAgent" parts is unnecessary. I created googleCloudLoggingExporter as a standalone function; writing it as a "derivative" function resulted in more complicated code.

Contributor Author

Further simplified the googleCloudLoggingExporter implementation to only set timeout = 60s.
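
For reference, a minimal sketch of what the simplified standalone function might look like (hypothetical; the "user_agent" key and exact shape are assumptions based on the signature, not the merged code):

// Hypothetical sketch; the real implementation in the PR may differ.
func googleCloudLoggingExporter(userAgent string) otel.Component {
	return otel.Component{
		Type: "googlecloud",
		Config: map[string]any{
			// Assumed key, inferred from the function signature.
			"user_agent": userAgent,
			// Longer timeout so the GCL Go SDK can retry internally.
			"timeout": "60s",
		},
	}
}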

otel.OTel: googleCloudExporter(userAgent, true, true, false),
otel.OTelLogs: googleCloudExporter(userAgent, true, true, true),
otel.GMP: googleManagedPrometheusExporter(userAgent),
otel.OTLP: otlpExporter(userAgent),
Member

Don't you also need to do this for the otel.OTLP exporter type?

Contributor Author (@franciscovalentecastro, Jan 23, 2026)

Yes, we do.

@XuechunHou is focusing on the OTLP logs exporter. We've synced, and we will probably do a very similar set of configs for OTLPLogs, which needs to happen separately from OTLPMetrics. That distinction will probably land in a separate PR.

Contributor Author

I added separate batching for OTLP "logs" and "metrics" in this PR. Done!

exporterTypeProcessors := map[ExporterType]Component{
	OTelLogs: BatchLogsExporter(),
	// The OTLP exporter doesn't batch by default like the googlecloud.* exporters. We need this to avoid the API point limits.
	OTLP: BatchOLTPMetricExporter(),
Member

The OTLP exporter type can get metrics, traces, and logs. We shouldn't be applying metrics-tuned batching to traces or logs.

Contributor Author (@franciscovalentecastro, Jan 27, 2026)

@XuechunHou is focusing on the OTLP logs exporter. We've synced, and we will probably do a very similar set of configs for OTLPLogs, which needs to happen separately from OTLPMetrics. That distinction will probably land in a separate PR.

Member

Since you're introducing the ExporterComponents struct, you are touching the code and you can fix this while you're there. :)

Contributor Author

You're right :) I added separate batching for OTLP "logs" and "metrics" in this PR. Done!
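
For illustration, a hedged sketch of how per-signal batching might look with the ExporterComponents shape from this PR (only the metrics values (200, 200, "200ms") are quoted in this review; the logs values and the exact wiring are assumptions):

// Hypothetical per-signal batching for the OTLP exporter type.
otel.OTLP: otel.ExporterComponents{
	Exporter: otlpExporter(userAgent),
	ProcessorsByType: map[string][]otel.Component{
		// Metrics-tuned batching to stay under the API point limits.
		"metrics": {otel.BatchProcessor(200, 200, "200ms")},
		// Logs batched separately so metrics tuning doesn't apply to them.
		"logs": {otel.BatchProcessor(1000, 1000, "2s")},
	},
},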

@XuechunHou XuechunHou self-requested a review January 22, 2026 20:56
},
otel.OTelLogs: {
	Exporter:   googleCloudLoggingExporter(userAgent),
	Processors: otelLogsExporterProcessors(),
Contributor

I think inlining the processor list provides better readability; creating a separate function tends to disrupt the reading flow.

Member

I agree, although I would probably resolve it by pushing it up into these functions and having them return the ExporterComponent struct directly, so you can then keep this map where it was in GenerateOTelConfig below.

Contributor Author

I moved everything to GenerateOTelConfig after the ExporterComponents struct update. PTAL.

Comment on lines 260 to 267
// Processors specific to an exporter type.
if exporterComponents, ok := c.ExporterComponents[exporterType]; ok {
	for i, processor := range exporterComponents.Processors {
		name := processor.name(fmt.Sprintf("%s_%d", prefix, i))
		processorNames = append(processorNames, name)
		processors[name] = processor.Config
	}
}
Member

These processors should be added after the resource detection processors, not before. And they shouldn't have prefix in their names; they need to be shared across all pipelines that use the exporter (so that e.g. logs and metrics can be batched across receivers). You can do both of these by adding the processors roughly like this:

exporter := c.ExporterComponents[exporterType]
if _, ok := exporterNames[exporterType]; !ok {
	name := exporter.Exporter.name(exporterType.Name())
	exporterNames[exporterType] = name
	exporters[name] = exporter.Exporter.Config
}
for i, processor := range exporter.Processors[pipeline.Type] {
	name := processor.name(fmt.Sprintf("%s_%s_%d", exporterNames[exporterType], pipeline.Type, i))
	processorNames = append(processorNames, name)
	if _, ok := processors[name]; !ok {
		processors[name] = processor.Config
	}
}

Contributor

+1 to adding it after the resourcedetection. They should be appended to the end of the pipeline.

Contributor Author

Done!

}

-func Batch() Component {
+func BatchLogsProcessor() Component {
Member

These only ever get used once, no? Why don't we have both of their definitions inline in modular.go instead of defining new functions?

Contributor Author

I created the following processor that is reused in several places with different arguments.

func BatchProcessor(sendBatchSize, sendBatchMaxSize int, timeout string) Component {
	return Component{
		Type: "batch",
		Config: map[string]any{
			"send_batch_size":     sendBatchSize,
			"send_batch_max_size": sendBatchMaxSize,
			"timeout":             timeout,
		},
	}
}
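
Illustrative call sites (the values are the ones quoted in this thread; their pairing with signals is an assumption, not the merged code):

logsBatch := BatchProcessor(1000, 1000, "2s")     // logs: 2s timeout to cover the multiline flush period
metricsBatch := BatchProcessor(200, 200, "200ms") // OTLP metrics batching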

@franciscovalentecastro franciscovalentecastro changed the title [confgenerator] Create otel.OTelLogs exporter for Otel Logging. [confgenerator] Create otel.Logging exporter for Otel Logging. Jan 29, 2026
@franciscovalentecastro franciscovalentecastro added the kokoro:force-run Forces kokoro to run integration tests on a CL label Jan 30, 2026
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run Forces kokoro to run integration tests on a CL label Jan 30, 2026
@quentinmit (Member) left a comment

Marking as approved assuming none of the remaining comments need another back-and-forth.

Config: map[string]any{
	"project": "my-project",
	"sending_queue": map[string]any{
		"enabled": false,
Member

Why are we disabling the sending queue? Won't this result in effectively num_workers = 1?

Contributor Author (@franciscovalentecastro, Feb 2, 2026)

Why are we disabling the sending queue?

I didn't have a reason, since this was implemented in the initial Otel Logging support PR. I was mostly copying the exporter config to the new format.

One of my hypotheses is that "multiple workers" could send the log entries to the mock server in a different order, causing the "goldens" in our transformation tests to change.

Won't this result in effectively num_workers = 1?

I think that is true.
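
(For reference, a hedged sketch of what re-enabling the queue might look like; the num_consumers value here is illustrative, matching the collector's documented default rather than anything in this PR:)

"sending_queue": map[string]any{
	"enabled":       true,
	"num_consumers": 10, // illustrative default
},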

Exporters: map[otel.ExporterType]otel.ExporterComponents{
	otel.Logging: {
		ProcessorsByType: map[string][]otel.Component{
			// Batch with 2s timeout to accommodate the multiline parsing flush period.
Member

I don't understand this comment. multiline's flush period should be entirely separate from the batch processor (it's not going to pull partially-parsed entries from within the multiline processor, regardless of what the timeout is set to).

Contributor Author (@franciscovalentecastro, Feb 2, 2026)

The goal here is to keep all test log lines in the same logging request. Currently the "multiline parser" flush_timeout is set to 1s (which is configured the same way as the fluent-bit flush argument).

// Use the log file path to disambiguate if present.
"source_identifier": `attributes.__source_identifier`,
// Set time interval (same as fluent-bit "flush_timeout") to wait for secondary logs to be appended.
"force_flush_period": "1000ms",
},
{

This processor batches the "last log entries" from a "multiline parser" that may still be waiting (until the flush timeout) for a possible next entry to be appended. These log lines would otherwise be sent in a separate log request.

This is mostly for presentation purposes, but it can avoid "race condition" test failures/flakiness with the transformation test goldens.

I will improve the comment.

// The OTLP exporter doesn't batch by default like the googlecloud.* exporters.
// We need this to avoid the API point limits.
"metrics": {
	otel.BatchProcessor(200, 200, "200ms"),
Member

Isn't this also going to result in waiting until there are 200 points before we send our first batch? I think that might cause some write errors if we include multiple points for the same time series in the same batch as a result. If I understand the config right, I think 1, 200 would be closer to the behavior of the existing googlecloudexporter.

(Feel free to punt this discussion since the problem already exists.)

Contributor Author

Isn't this also going to result in waiting until there are 200 points before we send our first batch?

Yes, send_batch_size = send_batch_max_size = 200 will wait until 200 points have been added to the batch or 200ms have passed before the batch is sent.

If I understand the config right, I think 1, 200 would be closer to the behavior of the existing googlecloudexporter.

I'm not fully sure of the exporter's current behaviour without batching. send_batch_size = 1 triggers the batch to be sent as soon as it contains a point (though more points may still be added during the flush window), while send_batch_max_size = 200 sets a hard upper limit.

Feel free to punt this discussion later since the problem already exists.

Sure, I'll punt this discussion; I'll sync with @rafaelwestphal about it.

@franciscovalentecastro franciscovalentecastro added the kokoro:force-run Forces kokoro to run integration tests on a CL label Jan 31, 2026
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run Forces kokoro to run integration tests on a CL label Jan 31, 2026