Sync issues with d1_log_aggregation service
The d1 log aggregation service running on the MNs periodically syncs the event logs to CN Solr. We push these raw event logs to the ELK stack (running on the logproc machine) to compute metrics for our services.
There seems to be an issue with the event log syncing logic. Whenever this process encounters a transient error (network outage, etc.), the log aggregation service retries the sync a couple of times and, after all attempts fail, automatically turns off the d1NodeAggregateLogs flag.
At this point, the MN no longer syncs the event logs with the CN until the d1NodeAggregateLogs flag is turned back on manually.
This is possibly the logic that needs refactoring to solve this issue. (code block)
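One possible direction for the refactoring, sketched here in Python purely for illustration (this is not the service's actual code): when all retries fail on a transient error, leave the d1NodeAggregateLogs flag on and defer the next attempt with a growing cooldown instead of disabling syncing. Everything except the flag's meaning is an assumption: `NodeSyncState`, `TransientSyncError`, `run_sync_cycle`, and the retry and delay values are hypothetical names and numbers.

```python
import time
from dataclasses import dataclass
from typing import Callable


class TransientSyncError(Exception):
    """Hypothetical marker for recoverable failures such as a network outage."""


@dataclass
class NodeSyncState:
    # Stands in for the d1NodeAggregateLogs flag; the other fields are hypothetical.
    aggregate_logs: bool = True
    consecutive_failures: int = 0
    next_attempt_at: float = 0.0  # timestamp (seconds) before which no sync is tried


def run_sync_cycle(
    state: NodeSyncState,
    sync_once: Callable[[], None],
    now: float,
    max_retries: int = 3,
    retry_delay: float = 5.0,
    base_cooldown: float = 60.0,
    max_cooldown: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run one scheduled sync. Returns True if the sync succeeded."""
    # Skip nodes that were disabled on purpose or are still cooling down.
    if not state.aggregate_logs or now < state.next_attempt_at:
        return False

    for attempt in range(max_retries):
        try:
            sync_once()
        except TransientSyncError:
            # Wait briefly before the next retry, but not after the last one.
            if attempt < max_retries - 1:
                sleep(retry_delay)
            continue
        # Success: clear any failure bookkeeping.
        state.consecutive_failures = 0
        state.next_attempt_at = 0.0
        return True

    # All retries failed. The flag stays on; the next attempt is only
    # postponed, with a cooldown that doubles per failed cycle up to a cap.
    state.consecutive_failures += 1
    cooldown = min(base_cooldown * 2 ** (state.consecutive_failures - 1), max_cooldown)
    state.next_attempt_at = now + cooldown
    return False
```

With this shape, an MN that hits a network outage resumes syncing on its own once the CN is reachable again, and the flag is only ever turned off by an operator.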