Commit e4fac1c

committed
redraft
1 parent fef44c0 commit e4fac1c

1 file changed

Lines changed: 100 additions & 80 deletions

docs/389ds/design/replication-lag-report-design.md

@@ -1,113 +1,133 @@
11
---
2-
title: "Replication Lag Report Design"
2+
title: "Replication Log Analyzer Tool"
33
---
44

5-
# Replication Lag Report Design
6-
7-
{% include toc.md %}
5+
# Directory Server Replication Lag Analyzer Tool
86

97
## Document Version
108

11-
0.1
9+
1.0
1210

1311
## Revision History
1412

1513
| Version | Date | Description of Change |
1614
|---------|------------|-----------------------|
17-
| 1.0 | 10-15-2024 | First version |
15+
| 1.0 | 2025-01-22 | Initial design document |
1816

1917
## Executive Summary
2018

21-
The `ReplicationLagReport` class will consolidate the functionality of the existing `LogParser`, `ReplLag`, and `LagInfo` classes into a single, efficient, and reusable component. This design will allow for easy integration with both Ansible modules and standalone CLI tools, providing a comprehensive solution for analyzing and visualizing replication lag in the 389 Directory Server.
19+
The Directory Server Replication Lag Analyzer Tool analyzes replication performance in 389 Directory Server deployments. It processes access logs from multiple directory servers, calculates replication lag times, and generates reports in several formats (CSV, HTML, PNG, JSON). The system focuses on two key metrics:
20+
1. Global Replication Lag: Time difference between the earliest and latest appearance of a CSN across all servers
21+
2. Hop-by-Hop Replication Lag: Time delays between individual server pairs in the replication topology
2222

2323
## Architecture Overview
2424

25-
The `ReplicationLagReport` class will be the central component, handling log parsing, data processing, and report generation. It will encapsulate all necessary functionality without relying on additional helper classes or modules.
25+
The system consists of three main components:
26+
1. `DSLogParser`: Parses directory server access logs
27+
2. `ReplicationLogAnalyzer`: Coordinates log analysis and report generation
28+
3. `VisualizationHelper`: Handles data visualization and report formatting
29+
30+
## Replication Lag Calculation Technical Details
31+
32+
### Global Replication Lag
33+
- For each CSN (Change Sequence Number):
34+
1. Track timestamp of first appearance across all servers
35+
2. Track timestamp of last appearance across all servers
36+
3. Global lag = latest_timestamp - earliest_timestamp
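
The three steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `global_lag`, the `(csn, server, timestamp)` tuple layout, and the sample CSN strings are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def global_lag(appearances):
    """Return {csn: lag in seconds} from (csn, server, timestamp) tuples.

    Hypothetical helper: the tuple layout is an assumption, not the tool's API.
    """
    first = {}  # earliest timestamp seen per CSN
    last = {}   # latest timestamp seen per CSN
    for csn, _server, ts in appearances:
        if csn not in first or ts < first[csn]:
            first[csn] = ts
        if csn not in last or ts > last[csn]:
            last[csn] = ts
    # Global lag = latest_timestamp - earliest_timestamp
    return {csn: (last[csn] - first[csn]).total_seconds() for csn in first}


# Made-up sample data: one change seen on three servers, one seen on a single server.
T0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
events = [
    ("csn-a", "server1", T0),
    ("csn-a", "server2", T0 + timedelta(seconds=1.5)),
    ("csn-a", "server3", T0 + timedelta(seconds=4)),
    ("csn-b", "server1", T0 + timedelta(seconds=10)),
]
lags = global_lag(events)
```

A CSN that appears on only one server gets a lag of zero under this definition, so a real report would probably need to distinguish "not yet replicated" from "replicated instantly".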
37+
38+
### Hop Replication Lag
39+
- For each CSN:
40+
1. Sort server appearances by timestamp
41+
2. For consecutive server pairs (supplier → consumer):
42+
- Hop lag = consumer_timestamp - supplier_timestamp
43+
3. Track individual hop lags to identify bottlenecks
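
A possible sketch of the hop calculation for a single CSN is shown below. The name `hop_lags` and the `(server, timestamp)` input shape are hypothetical. Note that pairing servers by timestamp order, as described above, approximates the supplier → consumer path; it does not consult the configured replication agreements.

```python
from datetime import datetime, timedelta, timezone


def hop_lags(appearances):
    """Return hop lags for one CSN from (server, timestamp) pairs.

    Hypothetical helper: consecutive servers in time order are treated
    as supplier -> consumer, which is an approximation.
    """
    ordered = sorted(appearances, key=lambda item: item[1])
    hops = []
    for (supplier, sent), (consumer, received) in zip(ordered, ordered[1:]):
        hops.append({
            "supplier": supplier,
            "consumer": consumer,
            "lag": (received - sent).total_seconds(),
        })
    return hops


# Made-up sample data, deliberately out of order.
T0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
seen = [
    ("server2", T0 + timedelta(seconds=0.5)),
    ("server1", T0),
    ("server3", T0 + timedelta(seconds=2.5)),
]
hops = hop_lags(seen)
# The largest hop points at the likely bottleneck.
slowest = max(hops, key=lambda hop: hop["lag"])
```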
44+
45+
### Input Parameters
46+
1. Log Directories:
47+
List of paths to server log directories. Each directory represents one server in the topology.
48+
49+
2. Filtering Parameters:
50+
- `suffixes`: List of DN suffixes to analyze
51+
- `time_range`: Optional start/end datetime range
52+
- `lag_time_lowest`: Minimum lag threshold
53+
- `etime_lowest`: Minimum operation execution time
54+
- `repl_lag_threshold`: Alert threshold for lag times
55+
56+
3. Analysis Options:
57+
- `anonymous`: Hide server names in reports
58+
- `only_fully_replicated`: Show only changes reaching all servers
59+
- `only_not_replicated`: Show only incomplete replication
60+
- `utc_offset`: Timezone handling
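
To illustrate how the `only_fully_replicated` and `only_not_replicated` options might behave, here is a small sketch. The function `filter_csns` and the `{csn: set_of_servers}` input are hypothetical, and treating the two options as mutually exclusive is an assumption of this example, not something the design states.

```python
def filter_csns(seen_on, all_servers,
                only_fully_replicated=False, only_not_replicated=False):
    """Filter a {csn: set_of_servers} mapping by replication completeness.

    Hypothetical helper; the mutual exclusion check is an assumption.
    """
    if only_fully_replicated and only_not_replicated:
        raise ValueError("options are mutually exclusive")
    expected = set(all_servers)
    result = {}
    for csn, servers in seen_on.items():
        complete = expected <= set(servers)  # seen on every server?
        if only_fully_replicated and not complete:
            continue
        if only_not_replicated and complete:
            continue
        result[csn] = servers
    return result


# Made-up sample data for a three-server topology.
servers = ["server1", "server2", "server3"]
seen_on = {
    "csn-a": {"server1", "server2", "server3"},
    "csn-b": {"server1"},
}
full = filter_csns(seen_on, servers, only_fully_replicated=True)
partial = filter_csns(seen_on, servers, only_not_replicated=True)
```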
61+
62+
### Output Parameters
63+
1. Reports:
64+
- CSV: Detailed event log with global and hop lags
65+
- HTML: Interactive visualization with Plotly
66+
- PNG: Static visualization with matplotlib
67+
- JSON: Summary statistics and analysis
68+
69+
2. Metrics:
70+
- Global lag statistics (min/max/avg)
71+
- Hop lag statistics (min/max/avg)
72+
- Per-suffix update counts
73+
- Total updates processed
74+
- Server participation statistics
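
The min/max/avg statistics listed above could be produced along these lines. The helper name `summarize` and the JSON key names are assumptions made for illustration; the actual report schema is not specified here.

```python
import json
from statistics import mean


def summarize(lags):
    """Return count/min/max/avg for a list of lag values in seconds.

    Hypothetical helper; the key names are not taken from the tool.
    """
    if not lags:
        return {"count": 0, "min": None, "max": None, "avg": None}
    return {
        "count": len(lags),
        "min": min(lags),
        "max": max(lags),
        "avg": mean(lags),
    }


# Made-up lag values in seconds.
summary = {
    "global_lag": summarize([0.5, 2.0, 3.5]),
    "hop_lag": summarize([]),
}
report_json = json.dumps(summary, indent=2)
```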
2675

2776
## Component Details
2877

29-
### ReplicationLagReport (Main Class)
30-
31-
- **Responsibilities**:
32-
- Log file parsing and data extraction
33-
- Data processing and analysis
34-
- Report generation (CSV, PNG, HTML)
35-
- **Key Methods**:
36-
- `__init__(self, config: Dict)`
37-
- `parse_logs(self)`
38-
- `process_data(self)`
39-
- `generate_report(self, report_type: str)`
78+
### DSLogParser
79+
- Purpose: Efficient log file parsing
80+
- Key Features:
81+
- Batch processing for memory efficiency
82+
- Timezone-aware timestamp handling
83+
- Regular expression-based log parsing
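
As a rough sketch of regular-expression parsing with timezone-aware timestamps, the example below extracts a CSN and a UTC timestamp from one line. The sample line layout (bracketed timestamp with nanoseconds and a numeric UTC offset, followed by a `csn=` field) is an assumption about the access log format, and `LINE_RE` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Assumed line layout: "[DD/Mon/YYYY:HH:MM:SS.nnnnnnnnn +ZZZZ] ... csn=<hex>"
LINE_RE = re.compile(
    r"^\[(?P<stamp>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?\s(?P<offset>[+-]\d{4})\]"
    r".*?\bcsn=(?P<csn>[0-9a-fA-F]+)"
)


def parse_line(line):
    """Return (csn, utc_timestamp) or None if the line carries no CSN."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # %b is locale-dependent; English month abbreviations are assumed.
    stamp = datetime.strptime(
        f"{m['stamp']} {m['offset']}", "%d/%b/%Y:%H:%M:%S %z"
    )
    # datetime stores microseconds only, so truncate any extra digits.
    micros = int((m["frac"] or "0")[:6].ljust(6, "0"))
    return m["csn"], stamp.replace(microsecond=micros).astimezone(timezone.utc)


# Made-up sample lines.
sample = (
    "[22/Jan/2025:10:00:00.123456789 +0100] conn=5 op=3 RESULT err=0 "
    "tag=103 nentries=0 etime=0.000300 csn=6790c1a0000000010000"
)
parsed = parse_line(sample)
skipped = parse_line("[22/Jan/2025:10:00:01.000000000 +0100] conn=5 op=4 UNBIND")
```

Converting every timestamp to UTC at parse time means later lag arithmetic never has to reason about per-server offsets.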
84+
85+
### ReplicationLogAnalyzer
86+
- Purpose: Analysis coordination and report generation
87+
- Key Features:
88+
- Multi-server log correlation
89+
- Flexible filtering options
90+
- Multiple report format support
91+
92+
### VisualizationHelper
93+
- Purpose: Data visualization
94+
- Key Features:
95+
- Interactive Plotly charts
96+
- Static matplotlib exports
97+
- Consistent color schemes
4098

4199
## Data Flow
42100

43-
1. `ReplicationLagReport` is initialized with configuration parameters.
44-
2. `parse_logs()` reads and processes input log files.
45-
3. `process_data()` analyzes the collected data.
46-
4. `generate_report()` creates the requested output (CSV, PNG, or HTML).
47-
48-
## API Definitions
49-
50-
### ReplicationLagReport
51-
52-
- `__init__(self, config: Dict)`
53-
- **Parameters**:
54-
- `input_files`: `List[str]`
55-
- `filters`: `Dict`
56-
- `timezone`: `str`
57-
- `parse_logs(self) -> None`
58-
- `process_data(self) -> None`
59-
- `generate_report(self, report_type: str) -> None`
60-
61-
## Database Changes
62-
63-
No database changes are required for this implementation.
64-
65-
## Performance Considerations
101+
1. Log Collection:
102+
```
103+
Server Logs → DSLogParser → Parsed Events
104+
```
66105

67-
- Implement lazy loading for log files to reduce memory usage.
68-
- Use generators for processing large log files.
106+
2. Analysis:
107+
```
108+
Parsed Events → ReplicationLogAnalyzer → Lag Calculations
109+
```
69110

70-
## Security Measures
71-
72-
- Implement input validation for all user-provided data.
73-
- Sanitize data before generating reports to prevent XSS attacks in HTML output.
111+
3. Reporting:
112+
```
113+
Lag Calculations → VisualizationHelper → Reports (CSV/HTML/PNG)
114+
```
74115

75116
## Challenges and Mitigations
76117

77-
- **Challenge**: Processing large log files
78-
**Mitigation**: Implement streaming processing and use generators.
79-
- **Challenge**: Maintaining compatibility with existing systems
80-
**Mitigation**: Design the API to be easily adaptable for Ansible modules and CLI tools.
81-
- **Challenge**: Ensuring accuracy of time-based calculations across different timezones
82-
**Mitigation**: Implement robust timezone handling using the `datetime` library.
83-
84-
## Implementation Roadmap
85-
86-
### Phase 1: Port Existing Code to lib389
87-
88-
1. Port existing Python code to use the `lib389` library.
89-
2. Implement tests for `lib389` code.
90-
91-
### Phase 2: Develop Command-Line Interface (CLI) in dsconf Tool
92-
93-
3. Design and develop the CLI.
94-
4. Implement tests for CLI features.
95-
5. Enhance dsconf CLI to consume logs and generate the report.
96-
5a. Add support for `.dsrc` files.
97-
98-
### Phase 3: Develop Web User Interface (WebUI) in Replication Monitoring Tab
99-
100-
6. Develop WebUI using CLI code.
101-
7. Add special reports in WebUI using Cockpit functionality.
102-
8. Implement tests for WebUI features (?)
118+
1. Large Log Files:
119+
- Challenge: Memory consumption
120+
- Mitigation: Batch processing, generators
103121

104-
### Phase 4: Finalization and Deployment
122+
2. Time Zone Handling:
123+
- Challenge: Accurate timestamp comparison
124+
- Mitigation: Consistent UTC conversion
105125

106-
9. Documentation.
107-
10. Feedback and iteration.
126+
3. Visualization Performance:
127+
- Challenge: Large datasets
128+
- Mitigation: Data sampling, efficient plotting
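
Two of the mitigations above, batch processing with generators and data sampling, can be sketched briefly. Both helpers (`in_batches`, `downsample`) are hypothetical, and stride-based sampling is only one possible strategy: it can drop short lag spikes, so a real implementation might prefer a method that preserves extremes.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def downsample(points, max_points):
    """Keep every n-th point so that at most max_points remain."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if len(points) <= max_points:
        return list(points)
    step = -(-len(points) // max_points)  # ceiling division
    return points[::step]


batches = list(in_batches(range(7), 3))
sampled = downsample(list(range(10)), 4)
```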
108129

130+
## Authors
109131

110-
Authors
111-
=======
132+
Simon Pichugin (@droideck)
112133

113-
Simon Pichugin (@droideck)
