[confgenerator] Create otel.Logging exporter for Otel Logging.
#2185
base: master
Conversation
…xporter from pipeline.
otel.Logs exporter to have a separate configuration for logs. → otel.Logs exporter to configure separately for Otel Logging.
otel.Logs exporter to configure separately for Otel Logging. → otel.OTelLogs exporter to configure separately for Otel Logging.
otel.OTelLogs exporter to configure separately for Otel Logging. → otel.OTelLogs exporter for Otel Logging.
confgenerator/otel/processors.go
Outdated
	}

-func Batch() Component {
+func BatchLogsExporter() Component {
Why is this called BatchLogsExporter instead of BatchLogsProcessor? (Ditto for the other function)
I think BatchLogsProcessor is better, I'll change it. I guess I was just overly focused on this processor being related to the Logs Exporter.
I created only one BatchProcessor(sendBatchSize, sendBatchMaxSize int, timeout string) now.
confgenerator/confgenerator.go
Outdated
@@ -158,10 +168,11 @@ func (uc *UnifiedConfig) GenerateOtelConfig(ctx context.Context, outDir string)
		Pipelines:  pipelines,
		Extensions: extensions,
		Exporters: map[otel.ExporterType]otel.Component{
I think this is where we should be putting the processors, instead of adding magic conditionals inside Generate. That will also make it easy to apply different batching to different signal types. Maybe we can use a type like:
type ExporterComponents struct {
	ProcessorsByType map[string][]otel.Component
	Exporter         otel.Component
}
I created the following type and updated ModularConfig accordingly.
type ExporterComponent struct {
	Exporter   Component
	Processors []Component
}
I was suggesting the type be called ExporterComponents (because it is the set of components needed to implement the exporter), not that the map be renamed to ExporterComponents.
Also, you have a single slice of processors instead of a different slice of processors for each type. This means you can't apply different batching for metrics, logs, and traces.
Thank you for adding more details; the goal of your original ExporterComponents proposal is now clear to me. This is the implemented ExporterComponents:
type ExporterComponents struct {
	Exporter         Component
	ProcessorsByType map[string][]Component
}
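To make the shape of this concrete, here is a rough sketch of how the Exporters map could be populated with this struct; the helper names and batch values are taken from (or assumed from) other parts of this thread, not the exact merged code:
// Illustrative sketch only; exporter helpers and batch values are assumptions
// based on this review thread. Keying processors by pipeline type ("logs",
// "metrics") lets each signal get its own batching.
exporters := map[otel.ExporterType]otel.ExporterComponents{
	otel.Logging: {
		Exporter: googleCloudLoggingExporter(userAgent),
		ProcessorsByType: map[string][]otel.Component{
			"logs": {otel.BatchProcessor(1024, 1024, "2s")}, // sizes are illustrative
		},
	},
	otel.OTLP: {
		Exporter: otlpExporter(userAgent),
		ProcessorsByType: map[string][]otel.Component{
			"metrics": {otel.BatchProcessor(200, 200, "200ms")},
		},
	},
}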
confgenerator/confgenerator.go
Outdated
	if logsExporter {
		config["log"] = map[string]any{
			"grpc_pool_size": 20,
		}
		config["sending_queue"] = map[string]any{
			"enabled":       true,
			"num_consumers": 40,
		}
		config["timeout"] = "60s"
	}
The performance tuning guide says "There is no guaranteed one-size-fits-all combination of settings, and for the majority of use cases the defaults will provide more than sufficient throughput."
Why are the defaults insufficient?
The default config for the exporter only reaches 15K/s logs throughput in our soak tests, while fluent-bit reaches 40K/s. The grpc_pool_size = 20 and num_consumers = 40 settings close that gap; I used the values suggested in the "Performance tuning guide".
timeout = 60s gives the GCL Go SDK a longer window to do retries. More details in the PR description.
I know you can get higher throughput out of both fluent-bit and otelopscol by changing the number of workers; why isn't 15K/s already sufficient? When we looked at this in April 2024 we concluded that both fluent-bit and otelopscol were constrained by worker count/batch size/API latency, but that both 15K/s and 40K/s were well above the target throughput, so we didn't need to make any changes here.
@quentinmit After syncing offline we decided the target throughput will be 10K/s logs, which can be achieved with the current defaults. I removed the grpc_pool_size and num_consumers config and only kept timeout.
As we discussed, if we want a higher default throughput we can configure the defaults upstream.
confgenerator/confgenerator.go
Outdated
| "resource_filters": []map[string]interface{}{}, | ||
| }, | ||
| func googleCloudExporter(userAgent string, instrumentationLabels, serviceResourceLabels, logsExporter bool) otel.Component { | ||
| config := map[string]interface{}{ |
Having two boolean parameters was already confusing enough; since this is purely additive config, can we just use a separate function that calls googleCloudExporter and then adds the additional config?
I realized the "Logging Exporter" config is not purely additive: keeping the "metrics" and "userAgent" settings is unnecessary. I created googleCloudLoggingExporter as a standalone function; writing it as a "derivative" function resulted in more complicated code.
Further simplified the googleCloudLoggingExporter implementation to only set timeout = 60s.
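For reference, a minimal sketch of what a standalone googleCloudLoggingExporter along those lines might look like; apart from the 60s timeout discussed here, the field names and values are assumptions rather than the merged code:
// Sketch only: a dedicated googlecloud exporter component for logs that keeps
// the defaults and only raises the timeout so the Cloud Logging Go SDK has
// time to retry. The user_agent wiring is assumed for illustration.
func googleCloudLoggingExporter(userAgent string) otel.Component {
	return otel.Component{
		Type: "googlecloud",
		Config: map[string]any{
			"user_agent": userAgent,
			"timeout":    "60s",
		},
	}
}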
confgenerator/confgenerator.go
Outdated
			otel.OTel:     googleCloudExporter(userAgent, true, true, false),
			otel.OTelLogs: googleCloudExporter(userAgent, true, true, true),
			otel.GMP:      googleManagedPrometheusExporter(userAgent),
			otel.OTLP:     otlpExporter(userAgent),
Don't you also need to do this for the otel.OTLP exporter type?
Yes, we do.
@XuechunHou is focusing on the OTLP logs exporter. We've synced and we are probably going to do a very similar set of configs for OTLPLogs, which need to happen separately from OTLPMetrics. This distinction will probably happen in a separate PR.
I added separate batching for OTLP "logs" and "metrics" in this PR. Done!
confgenerator/otel/modular.go
Outdated
	exporterTypeProcessors := map[ExporterType]Component{
		OTelLogs: BatchLogsExporter(),
		// The OTLP exporter doesn't batch by default like the googlecloud.* exporters. We need this to avoid the API point limits.
		OTLP: BatchOLTPMetricExporter(),
The OTLP exporter type can get metrics, traces, and logs. We shouldn't be applying metrics-tuned batching to traces or logs.
@XuechunHou is focusing on the OTLP logs exporter. We've synced and we are probably going to do a very similar set of configs for OTLPLogs, which need to happen separately from OTLPMetrics. This distinction will probably happen in a separate PR.
Since you're introducing the ExporterComponents struct, you are touching the code and you can fix this while you're there. :)
You're right :) I added separate batching for OTLP "logs" and "metrics" in this PR. Done!
…cific `ExporterType`.
confgenerator/confgenerator.go
Outdated
			},
			otel.OTelLogs: {
				Exporter:   googleCloudLoggingExporter(userAgent),
				Processors: otelLogsExporterProcessors(),
I think inlining the processor list provides better readability; creating a separate function tends to disrupt the reading flow.
I agree, although I would probably resolve it by pushing it up into these functions and having them return the ExporterComponent struct directly, so you can then keep this map where it was in GenerateOTelConfig below.
I moved everything to GenerateOTelConfig after the ExporterComponents struct update. PTAL.
confgenerator/otel/modular.go
Outdated
	// Processors specific to an exporter type.
	if exporterComponents, ok := c.ExporterComponents[exporterType]; ok {
		for i, processor := range exporterComponents.Processors {
			name := processor.name(fmt.Sprintf("%s_%d", prefix, i))
			processorNames = append(processorNames, name)
			processors[name] = processor.Config
		}
	}
These processors should be added after the resource detection processors, not before. And they shouldn't have prefix in their names; they need to be shared across all pipelines that use the exporter (so that e.g. logs and metrics can be batched across receivers). You can do both of these by adding the processors like ~this:
exporter := c.ExporterComponents[exporterType]
if _, ok := exporterNames[exporterType]; !ok {
	name := exporter.Exporter.name(exporterType.Name())
	exporterNames[exporterType] = name
	exporters[name] = exporter.Config
}
for i, processor := range exporter.Processors[pipeline.Type] {
	name := processor.name(fmt.Sprintf("%s_%s_%d", exporterNames[exporterType], pipeline.Type, i))
	processorNames = append(processorNames, name)
	if _, ok := processors[name]; !ok {
		processors[name] = processor.Config
	}
}
+1 to adding it after the resourcedetection. They should be appended to the end of the pipeline.
Done!
confgenerator/otel/processors.go
Outdated
	}

-func Batch() Component {
+func BatchLogsProcessor() Component {
These only ever get used once, no? Why don't we have both of their definitions inline in modular.go instead of defining new functions?
I created the following processor that is reused in several places with different arguments.
func BatchProcessor(sendBatchSize, sendBatchMaxSize int, timeout string) Component {
	return Component{
		Type: "batch",
		Config: map[string]any{
			"send_batch_size":     sendBatchSize,
			"send_batch_max_size": sendBatchMaxSize,
			"timeout":             timeout,
		},
	}
}
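For instance, the OTLP metrics batching shown later in this thread becomes a single call to this helper:
// Batches of up to 200 points, flushed at least every 200ms
// (the values used for the OTLP "metrics" pipeline in this PR).
otel.BatchProcessor(200, 200, "200ms")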
otel.OTelLogs exporter for Otel Logging. → otel.Logging exporter for Otel Logging.
quentinmit left a comment
Marking as approved assuming none of the remaining comments need another back-and-forth.
	Config: map[string]any{
		"project": "my-project",
		"sending_queue": map[string]any{
			"enabled": false,
Why are we disabling the sending queue? Won't this result in effectively num_workers = 1?
> Why are we disabling the sending queue?

I didn't have a reason; this was implemented in the initial Otel Logging support PR, and I was mostly copying the exporter config to the new format.
One of my hypotheses is that "multiple workers" could potentially send the log entries to the mock server in a different order, causing the "goldens" in our transformation tests to change.
> Won't this result in effectively num_workers = 1?

I think that is true.
			Exporters: map[otel.ExporterType]otel.ExporterComponents{
				otel.Logging: {
					ProcessorsByType: map[string][]otel.Component{
						// Batch with 2s timeout to accomodate for multiline parsing flush period.
I don't understand this comment. multiline's flush period should be entirely separate from the batch processor (it's not going to pull partially-parsed entries from within the multiline processor, regardless of what the timeout is set to).
The goal here is to keep all test log lines in the same logging request. Currently the "multiline parser" flush_timeout is set to 1s (which is configured the same way as the fluent-bit flush argument).
ops-agent/confgenerator/logging_processors.go
Lines 487 to 492 in a6937e1
				// Use the log file path to disambiguate if present.
				"source_identifier": `attributes.__source_identifier`,
				// Set time interval (same as fluent-bit "flush_timeout") to wait for secondary logs to be appended.
				"force_flush_period": "1000ms",
			},
			{
This processor batches all the "last log entries" of a "multiline parser" that may be waiting (until the flush timeout) for a possible new entry to be appended. These log lines are otherwise usually sent in a separate log request.
This is mostly for presentation purposes, but it can also avoid "race condition" test failures/flakiness with the transformation test goldens.
I will improve the comment.
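To illustrate the intent (the batch sizes here are assumptions; only the 2s-vs-1s relationship comes from this thread):
// The multiline (recombine) stage waits up to force_flush_period (1s) for a
// possible continuation line before emitting the last entry of a file, so the
// logs batch timeout is set above that (2s) to keep those late entries in the
// same Cloud Logging request.
otel.BatchProcessor(1024, 1024, "2s") // sizes are illustrative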
						// The OTLP exporter doesn't batch by default like the googlecloud.* exporters.
						// We need this to avoid the API point limits.
						"metrics": {
							otel.BatchProcessor(200, 200, "200ms"),
Isn't this also going to result in waiting until there are 200 points before we send our first batch? I think that might cause some write errors if we include multiple points for the same time series in the same batch as a result. If I understand the config right, I think 1, 200 would be closer to the existing behavior of the existing googlecloudexporter.
(Feel free to punt this discussion later since the problem already exists.)
> Isn't this also going to result in waiting until there are 200 points before we send our first batch?

Yes, send_batch_size = send_batch_max_size = 200 will wait until 200 points are added to the batch or 200ms have passed before the batch is sent.

> If I understand the config right, I think 1, 200 would be closer to the existing behavior of the existing googlecloudexporter.

I'm not fully sure of the current behaviour of the exporter without batching. send_batch_size = 1 will trigger the batch to be sent (but possibly add more points during a time window), while send_batch_max_size = 200 sets a hard limit.

> Feel free to punt this discussion later since the problem already exists.

Sure, I'll punt this discussion; I'll sync with @rafaelwestphal about this.
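A side-by-side sketch of the two configurations being compared (semantics as described above; whether 1/200 exactly reproduces the unbatched googlecloudexporter behaviour is left open here, as in the thread):
// send_batch_size is the threshold that triggers a send, send_batch_max_size
// is a hard cap on batch size, and the timeout forces a flush regardless.
otel.BatchProcessor(200, 200, "200ms") // current: wait for 200 points or 200ms
otel.BatchProcessor(1, 200, "200ms")   // suggested: send as soon as points arrive, capped at 200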

Description
Create otel.OTelLogs exporter to have a separate configuration for Otel Logging.

Details

Why do we need a separate exporter?
The sending_queue configurations set in this PR and the followup changes for "Log Buffering" (b/394636880) are incompatible with metrics exporting (for example, we can't send points out of order from the queue).

Why put the BatchProcessor at the end of the pipeline?

Why do we set timeout = 60s on the OTelLogs exporter?
The googlecloud exporters removed retry_on_failure (Remove retry_on_failure from the googlecloud exporters, open-telemetry/opentelemetry-collector-contrib#25900), so the only "retry" logic we can use lives within the Go Cloud SDK Log libraries. The timeout set here is used as a Context Timeout [1] for the underlying log request retry logic [2], which is configured with a 60s max time limit for retries.

Why choose grpc_pool_size = 10, num_consumers = 20 and send_batch_max_size = 1000?
I chose similar values from the "Performance Tuning" guide in [3] while also validating the exporter will reach 40K/s throughput. We can discuss further.

[1] https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/exporter/collector/logs.go#L385
[2] https://github.com/googleapis/google-cloud-go/blob/main/logging/logging.go#L294
[3] https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/tree/main/exporter/collector#performance-tuning
Related issue
b/477325045
How has this been tested?
Existing confgenerator, transformation and integration tests.
Checklist: