 ---
-title: "Replication Lag Report Design"
+title: "Replication Log Analyzer Tool"
 ---

-# Replication Lag Report Design
-
-{% include toc.md %}
+# Directory Server Replication Lag Analyzer Tool

 ## Document Version

-0.1
+1.0

 ## Revision History

 | Version | Date | Description of Change |
 |---------|------------|-----------------------|
-| 1.0 | 10-15-2024 | First version |
+| 1.0 | 2025-01-22 | Initial design document |

 ## Executive Summary

-The `ReplicationLagReport` class will consolidate the functionality of the existing `LogParser`, `ReplLag`, and `LagInfo` classes into a single, efficient, and reusable component. This design will allow for easy integration with both Ansible modules and standalone CLI tools, providing a comprehensive solution for analyzing and visualizing replication lag in the 389 Directory Server.
+The Directory Server Replication Lag Analyzer Tool is designed to analyze replication performance in 389 Directory Server deployments. It processes access logs from multiple directory servers, calculates replication lag times, and generates comprehensive reports in various formats (CSV, HTML, PNG). The system focuses on two key metrics:
+1. Global Replication Lag: Time difference between the earliest and latest appearance of a CSN across all servers
+2. Hop-by-Hop Replication Lag: Time delays between individual server pairs in the replication topology

 ## Architecture Overview

-The `ReplicationLagReport` class will be the central component, handling log parsing, data processing, and report generation. It will encapsulate all necessary functionality without relying on additional helper classes or modules.
+The system consists of three main components:
+1. `DSLogParser`: Parses directory server access logs
+2. `ReplicationLogAnalyzer`: Coordinates log analysis and report generation
+3. `VisualizationHelper`: Handles data visualization and report formatting
+
+## Replication Lag Calculation Technical Details
+
+### Global Replication Lag
+- For each CSN (Change Sequence Number):
+  1. Track timestamp of first appearance across all servers
+  2. Track timestamp of last appearance across all servers
+  3. Global lag = latest_timestamp - earliest_timestamp
+
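The three global-lag steps added here amount to a min/max over the timestamps at which each server logged a CSN. A minimal Python sketch, assuming the timestamps have already been normalized to UTC and grouped per CSN; the function names and the placeholder CSN label `csn-a` are illustrative and not part of lib389.

```python
from datetime import datetime, timezone
from typing import Dict


def global_lag_seconds(appearances: Dict[str, datetime]) -> float:
    """Global lag for one CSN: latest appearance minus earliest appearance.

    `appearances` maps a server label to the UTC timestamp at which that
    server logged the CSN.
    """
    if not appearances:
        raise ValueError("CSN has no recorded appearances")
    earliest = min(appearances.values())  # step 1: first appearance
    latest = max(appearances.values())    # step 2: last appearance
    return (latest - earliest).total_seconds()  # step 3: the difference


def global_lags(per_csn: Dict[str, Dict[str, datetime]]) -> Dict[str, float]:
    """Apply the per-CSN calculation to every CSN seen in the logs."""
    return {csn: global_lag_seconds(seen) for csn, seen in per_csn.items()}


if __name__ == "__main__":
    seen = {
        "supplier1": datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc),
        "consumer2": datetime(2025, 1, 22, 10, 0, 1, tzinfo=timezone.utc),
        "consumer1": datetime(2025, 1, 22, 10, 0, 2, 500000, tzinfo=timezone.utc),
    }
    print(global_lags({"csn-a": seen}))  # {'csn-a': 2.5}
```

Taking the min and max of the values makes the result independent of the order in which the server logs are read.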
+### Hop Replication Lag
+- For each CSN:
+  1. Sort server appearances by timestamp
+  2. For consecutive server pairs (supplier → consumer):
+     - Hop lag = consumer_timestamp - supplier_timestamp
+  3. Track individual hop lags to identify bottlenecks
+
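The hop-lag steps can be sketched the same way: sort one CSN's appearances, then take the difference between neighbouring timestamps. The helper names `hop_lags` and `slowest_hops` are hypothetical, and step 3 is interpreted here as keeping the worst lag per server pair, which is only one possible way to surface bottlenecks.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Hop = Tuple[str, str, float]  # (supplier, consumer, lag in seconds)


def hop_lags(appearances: Dict[str, datetime]) -> List[Hop]:
    """Hop lags for one CSN, pairing servers in order of appearance."""
    # Step 1: sort the servers by the time they logged the CSN.
    ordered = sorted(appearances.items(), key=lambda item: item[1])
    # Step 2: each consecutive pair is treated as supplier -> consumer.
    return [
        (supplier, consumer, (c_ts - s_ts).total_seconds())
        for (supplier, s_ts), (consumer, c_ts) in zip(ordered, ordered[1:])
    ]


def slowest_hops(per_csn: Dict[str, Dict[str, datetime]]) -> Dict[Tuple[str, str], float]:
    """Step 3 (one interpretation): keep the worst lag seen for each pair."""
    worst: Dict[Tuple[str, str], float] = {}
    for appearances in per_csn.values():
        for supplier, consumer, lag in hop_lags(appearances):
            pair = (supplier, consumer)
            worst[pair] = max(lag, worst.get(pair, lag))
    return worst


if __name__ == "__main__":
    seen = {
        "supplier1": datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc),
        "consumer2": datetime(2025, 1, 22, 10, 0, 1, tzinfo=timezone.utc),
        "consumer1": datetime(2025, 1, 22, 10, 0, 2, 500000, tzinfo=timezone.utc),
    }
    hops = hop_lags(seen)
    print(hops)  # [('supplier1', 'consumer2', 1.0), ('consumer2', 'consumer1', 1.5)]
    print(sum(lag for _, _, lag in hops))  # 2.5
```

Because consecutive differences telescope, the hop lags of a CSN always sum to its global lag. In this sketch pairs are inferred purely from appearance order, so a pair such as `consumer2 → consumer1` need not correspond to an actual replication agreement.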
+### Input Parameters
+1. Log Directories:
+   List of paths to server log directories. Each directory represents one server in the topology.
+
+2. Filtering Parameters:
+   - `suffixes`: List of DN suffixes to analyze
+   - `time_range`: Optional start/end datetime range
+   - `lag_time_lowest`: Minimum lag threshold
+   - `etime_lowest`: Minimum operation execution time
+   - `repl_lag_threshold`: Alert threshold for lag times
+
+3. Analysis Options:
+   - `anonymous`: Hide server names in reports
+   - `only_fully_replicated`: Show only changes reaching all servers
+   - `only_not_replicated`: Show only incomplete replication
+   - `utc_offset`: Timezone handling
+
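One way to model a subset of these parameters is a dataclass plus a single predicate applied to each CSN. This is a sketch under stated assumptions: the class and function names are hypothetical, `lag_time_lowest` is taken to be in seconds, each CSN's suffix is assumed to be resolved already, the two `only_*` switches are assumed to be mutually exclusive, and `etime_lowest`, `repl_lag_threshold`, `anonymous`, and `utc_offset` are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class AnalyzerOptions:
    """Hypothetical options container; field names follow the list above."""
    suffixes: List[str] = field(default_factory=list)
    time_range: Optional[Tuple[datetime, datetime]] = None
    lag_time_lowest: Optional[float] = None  # seconds (assumed unit)
    only_fully_replicated: bool = False
    only_not_replicated: bool = False

    def __post_init__(self) -> None:
        # Assumption: both "only_*" switches together would select nothing.
        if self.only_fully_replicated and self.only_not_replicated:
            raise ValueError("only_fully_replicated and only_not_replicated are exclusive")
        if self.time_range and self.time_range[0] > self.time_range[1]:
            raise ValueError("time_range start must not be after its end")


def keep_csn(opts: AnalyzerOptions, suffix: str,
             appearances: Dict[str, datetime], all_servers: Sequence[str]) -> bool:
    """Return True if one CSN passes every enabled filter."""
    if opts.suffixes and suffix not in opts.suffixes:
        return False
    first, last = min(appearances.values()), max(appearances.values())
    if opts.time_range and not (opts.time_range[0] <= first <= opts.time_range[1]):
        return False
    fully_replicated = set(all_servers) <= set(appearances)
    if opts.only_fully_replicated and not fully_replicated:
        return False
    if opts.only_not_replicated and fully_replicated:
        return False
    lag = (last - first).total_seconds()
    if opts.lag_time_lowest is not None and lag < opts.lag_time_lowest:
        return False
    return True


if __name__ == "__main__":
    t0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
    seen = {"supplier1": t0, "consumer1": t0.replace(second=4)}
    opts = AnalyzerOptions(suffixes=["dc=example,dc=com"], lag_time_lowest=1.0)
    print(keep_csn(opts, "dc=example,dc=com", seen, ["supplier1", "consumer1"]))  # True
```

Validating option combinations once, in `__post_init__`, keeps the per-CSN predicate free of error handling.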
+### Output Parameters
+1. Reports:
+   - CSV: Detailed event log with global and hop lags
+   - HTML: Interactive visualization with Plotly
+   - PNG: Static visualization with matplotlib
+   - JSON: Summary statistics and analysis
+
+2. Metrics:
+   - Global lag statistics (min/max/avg)
+   - Hop lag statistics (min/max/avg)
+   - Per-suffix update counts
+   - Total updates processed
+   - Server participation statistics

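The listed metrics can be derived in one pass over the per-CSN timestamps and serialized for the JSON report. A minimal sketch; the JSON key names and the `summary_json` function are assumptions, "server participation" is read as the number of CSNs each server logged, and hop lags are computed from appearance order as described in the hop-lag section.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Dict, List, Optional


def _stats(values: List[float]) -> Dict[str, Optional[float]]:
    """min/max/avg, or nulls when there is nothing to summarize."""
    if not values:
        return {"min": None, "max": None, "avg": None}
    return {"min": min(values), "max": max(values), "avg": mean(values)}


def summary_json(per_csn: Dict[str, Dict[str, datetime]],
                 suffix_of: Dict[str, str]) -> str:
    """Summarize lag statistics and counts as a JSON document."""
    global_lags: List[float] = []
    hop_lags: List[float] = []
    participation: Counter = Counter()
    for appearances in per_csn.values():
        times = sorted(appearances.values())
        global_lags.append((times[-1] - times[0]).total_seconds())
        hop_lags.extend((b - a).total_seconds() for a, b in zip(times, times[1:]))
        participation.update(appearances.keys())  # CSNs logged per server
    summary = {
        "total_updates": len(per_csn),
        "global_lag_s": _stats(global_lags),
        "hop_lag_s": _stats(hop_lags),
        "updates_per_suffix": dict(Counter(suffix_of[csn] for csn in per_csn)),
        "server_participation": dict(participation),
    }
    return json.dumps(summary, indent=2, sort_keys=True)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
    per_csn = {
        "csn-a": {"supplier1": t0, "consumer1": t0 + timedelta(seconds=2)},
        "csn-b": {"supplier1": t0 + timedelta(seconds=5)},
    }
    print(summary_json(per_csn, {"csn-a": "dc=example,dc=com",
                                 "csn-b": "dc=example,dc=com"}))
```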
 ## Component Details

-### ReplicationLagReport (Main Class)
-
-- **Responsibilities**:
-  - Log file parsing and data extraction
-  - Data processing and analysis
-  - Report generation (CSV, PNG, HTML)
-- **Key Methods**:
-  - `__init__(self, config: Dict)`
-  - `parse_logs(self)`
-  - `process_data(self)`
-  - `generate_report(self, report_type: str)`
+### DSLogParser
+- Purpose: Efficient log file parsing
+- Key Features:
+  - Batch processing for memory efficiency
+  - Timezone-aware timestamp handling
+  - Regular expression-based log parsing
+
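A sketch of the three DSLogParser features using only the standard library: a regular expression selects RESULT lines that carry a `csn=` field, timestamps are parsed as timezone-aware values and converted to UTC, and a generator groups events into fixed-size batches so a large log is never held in memory at once. The line layout is an assumption modeled on 389 DS access logs with high-resolution timestamps, the sample lines are invented, and the regex, function names, and batch size are illustrative rather than DSLogParser's actual API.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

# Assumed line shape:
# [22/Jan/2025:10:15:30.123456789 -0500] conn=12 op=3 RESULT err=0 ... csn=<hex>
RESULT_WITH_CSN = re.compile(
    r"^\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})(?:\.(?P<frac>\d+))?\s(?P<tz>[+-]\d{4})\]"
    r".*?\bRESULT\b.*?\bcsn=(?P<csn>[0-9A-Fa-f]+)"
)


def parse_timestamp(ts: str, frac: str, tz: str) -> datetime:
    """Build a timezone-aware datetime and normalize it to UTC."""
    dt = datetime.strptime(ts + tz, "%d/%b/%Y:%H:%M:%S%z")
    # datetime stores microseconds, so nanosecond digits beyond 6 are dropped.
    micro = int((frac or "0")[:6].ljust(6, "0"))
    return dt.replace(microsecond=micro).astimezone(timezone.utc)


def csn_events(lines: Iterable[str]) -> Iterator[Tuple[str, datetime]]:
    """Yield (csn, utc_timestamp) for every RESULT line that carries a CSN."""
    for line in lines:
        match = RESULT_WITH_CSN.match(line)
        if match:
            yield match["csn"], parse_timestamp(match["ts"], match["frac"], match["tz"])


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a stream into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [
        "[22/Jan/2025:10:15:30.123456789 -0500] conn=12 op=3 RESULT err=0 tag=103 csn=6790d0e2000000010000",
        "[22/Jan/2025:10:15:31.000000000 -0500] conn=12 op=4 SRCH base=\"dc=example,dc=com\" scope=2",
    ]
    for batch in batched(csn_events(sample), 1000):
        print(batch)
```

Because `csn_events` and `batched` are generators, passing an open file object instead of a list streams the log line by line.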
+### ReplicationLogAnalyzer
+- Purpose: Analysis coordination and report generation
+- Key Features:
+  - Multi-server log correlation
+  - Flexible filtering options
+  - Multiple report format support
+
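Multi-server log correlation can be reduced to building an index from each CSN to the time each server first logged it, given per-server streams of `(csn, utc_timestamp)` pairs such as the DSLogParser stage would produce. The `correlate` name and the rule of keeping the earliest timestamp when a server logs the same CSN more than once are assumptions for this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Mapping, Tuple

Event = Tuple[str, datetime]  # (csn, UTC timestamp)


def correlate(streams: Mapping[str, Iterable[Event]]) -> Dict[str, Dict[str, datetime]]:
    """Map each CSN to {server: earliest timestamp at which that server logged it}."""
    index: Dict[str, Dict[str, datetime]] = defaultdict(dict)
    for server, events in streams.items():
        for csn, ts in events:
            previous = index[csn].get(server)
            # Assumption: if a server logs the same CSN more than once,
            # its first appearance is the one that matters for lag.
            if previous is None or ts < previous:
                index[csn][server] = ts
    return dict(index)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
    streams = {
        "supplier1": [("csn-a", t0)],
        "consumer1": [("csn-a", t0 + timedelta(seconds=2)),
                      ("csn-a", t0 + timedelta(seconds=9))],
    }
    print(correlate(streams))
```

The events of each server are consumed lazily, so the streams can be generators; only the resulting index, one entry per CSN and server, is kept in memory.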
+### VisualizationHelper
+- Purpose: Data visualization
+- Key Features:
+  - Interactive Plotly charts
+  - Static matplotlib exports
+  - Consistent color schemes

 ## Data Flow

-1. `ReplicationLagReport` is initialized with configuration parameters.
-2. `parse_logs()` reads and processes input log files.
-3. `process_data()` analyzes the collected data.
-4. `generate_report()` creates the requested output (CSV, PNG, or HTML).
-
-## API Definitions
-
-### ReplicationLagReport
-
-- `__init__(self, config: Dict)`
-  - **Parameters**:
-    - `input_files`: `List[str]`
-    - `filters`: `Dict`
-    - `timezone`: `str`
-- `parse_logs(self) -> None`
-- `process_data(self) -> None`
-- `generate_report(self, report_type: str) -> None`
-
-## Database Changes
-
-No database changes are required for this implementation.
-
-## Performance Considerations
+1. Log Collection:
+   ```
+   Server Logs → DSLogParser → Parsed Events
+   ```

-- Implement lazy loading for log files to reduce memory usage.
-- Use generators for processing large log files.
+2. Analysis:
+   ```
+   Parsed Events → ReplicationLogAnalyzer → Lag Calculations
+   ```

-## Security Measures
-
-- Implement input validation for all user-provided data.
-- Sanitize data before generating reports to prevent XSS attacks in HTML output.
+3. Reporting:
+   ```
+   Lag Calculations → VisualizationHelper → Reports (CSV/HTML/PNG)
+   ```

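For the CSV output of the reporting stage, the standard `csv` module is sufficient. This sketch writes one row per server appearance, carrying the CSN's global lag and the hop lag from the previous appearance; the column names and the `lag_rows_to_csv` function are hypothetical, and the HTML and PNG outputs (Plotly, matplotlib) are not covered.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Dict


def lag_rows_to_csv(per_csn: Dict[str, Dict[str, datetime]]) -> str:
    """One CSV row per (CSN, server) appearance, ordered by time within each CSN."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["csn", "server", "timestamp_utc", "global_lag_s", "hop_lag_s"])
    for csn in sorted(per_csn):
        ordered = sorted(per_csn[csn].items(), key=lambda item: item[1])
        global_lag = (ordered[-1][1] - ordered[0][1]).total_seconds()
        previous = None
        for server, ts in ordered:
            # The first appearance has no upstream hop, so its hop cell is empty.
            hop = "" if previous is None else (ts - previous).total_seconds()
            writer.writerow([csn, server, ts.isoformat(), global_lag, hop])
            previous = ts
    return out.getvalue()


if __name__ == "__main__":
    t0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
    print(lag_rows_to_csv({"csn-a": {"consumer1": t0 + timedelta(seconds=2.5),
                                     "supplier1": t0}}), end="")
```

When writing to a file rather than a string, open it with `newline=""`, as the `csv` module documentation recommends.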
 ## Challenges and Mitigations

-- **Challenge**: Processing large log files
-  **Mitigation**: Implement streaming processing and use generators.
-- **Challenge**: Maintaining compatibility with existing systems
-  **Mitigation**: Design the API to be easily adaptable for Ansible modules and CLI tools.
-- **Challenge**: Ensuring accuracy of time-based calculations across different timezones
-  **Mitigation**: Implement robust timezone handling using the `datetime` library.
-
-## Implementation Roadmap
-
-### Phase 1: Port Existing Code to lib389
-
-1. Port existing Python code to use the `lib389` library.
-2. Implement tests for `lib389` code.
-
-### Phase 2: Develop Command-Line Interface (CLI) in dsconf Tool
-
-3. Design and develop the CLI.
-4. Implement tests for CLI features.
-5. Enhance dsconf CLI to consume logs and generate the report.
-5a. Add support for `.dsrc` files.
-
-### Phase 3: Develop Web User Interface (WebUI) in Replication Monitoring Tab
-
-6. Develop WebUI using CLI code.
-7. Add special reports in WebUI using Cockpit functionality.
-8. Implement tests for WebUI features (?)
+1. Large Log Files:
+   - Challenge: Memory consumption
+   - Mitigation: Batch processing, generators

-### Phase 4: Finalization and Deployment
+2. Time Zone Handling:
+   - Challenge: Accurate timestamp comparison
+   - Mitigation: Consistent UTC conversion

-9. Documentation.
-10. Feedback and iteration.
+3. Visualization Performance:
+   - Challenge: Large datasets
+   - Mitigation: Data sampling, efficient plotting

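The data-sampling mitigation does not name a method. One possible approach, shown here as an assumption rather than the tool's behavior, is bucketed max downsampling: keep only the largest lag in each fixed time window, which reduces the number of plotted points while preserving the spikes a lag chart is meant to reveal. The `downsample_max` name and the one-minute bucket width are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Point = Tuple[datetime, float]  # (timestamp, lag in seconds)


def downsample_max(points: List[Point], bucket: timedelta) -> List[Point]:
    """Keep the point with the largest lag in each fixed-width time bucket."""
    if bucket <= timedelta(0):
        raise ValueError("bucket must be positive")
    if not points:
        return []
    ordered = sorted(points)
    origin = ordered[0][0]
    best: Dict[int, Point] = {}
    for ts, lag in ordered:
        key = (ts - origin) // bucket  # bucket index, counted from the first point
        if key not in best or lag > best[key][1]:
            best[key] = (ts, lag)
    return [best[k] for k in sorted(best)]


if __name__ == "__main__":
    t0 = datetime(2025, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
    raw = [(t0 + timedelta(seconds=s), float(s % 7)) for s in range(600)]
    print(len(downsample_max(raw, timedelta(minutes=1))))  # 10
```

Unlike random sampling, this never drops the worst lag in a window, at the cost of hiding how many fast updates surrounded it.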
+## Authors

-Authors
-=======
+Simon Pichugin (@droideck)

-Simon Pichugin (@droideck)