Optimization: Parallelize GXS message deserialization using OpenMP (4x speedup) #246
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Code by Antigravity
This PR requires RetroShare pr/3136
Description
This PR significantly optimizes the loading performance of GXS services (Channels, Forums, Boards) by parallelizing the message deserialization process.
Profiling identified
RsGenExchange::getMsgDataas a major bottleneck during the loading phase, where deserialization was performed sequentially on a single thread. This PR introduces OpenMP to parallelize this workload across available CPU cores.Changes
Enabled
-fopenmpcompiler and linker flags globally for Linux and Windows (MSYS2) builds. This ensures consistent OpenMP support acrosslibretroshareand the GUI executable.libretroshare): Refactored the main loop inRsGenExchange::getMsgDatato use#pragma omp parallel for. This allows concurrent deserialization ofRsGxsMsgItemobjects.Performance Results
Benchmarks performed on an Intel Xeon E3-1230 v6 (4 cores / 8 threads) on Ubuntu 24.04, and on an Intel 4790K (4 cores / 8 threads) on Windows 10
Example: Deserialization time on Xeon:
| Dataset Size | Serial (Before) | Parallel (After) | Speedup |
| Large (~6000 items) | ~1900 ms | ~440 ms | 4.3x
| Medium (~3000 items) | ~265 ms | ~65 ms | 4.0x
| Small (~500 items) | ~388 ms | ~80 ms | 4.8x
Impact
Notes
std::sortoperations from the UI thread to a worker thread was tested but yielded negligible gains (~50ms) compared to Qt's rendering cost. Therefore, UI-specific changes were reverted to maintain code simplicity.