|
3243 | 3243 | </ul> |
3244 | 3244 | </nav> |
3245 | 3245 |
|
| 3246 | +</li> |
| 3247 | + |
| 3248 | + <li class="md-nav__item"> |
| 3249 | + <a href="#link-time-optimization-lto" class="md-nav__link"> |
| 3250 | + <span class="md-ellipsis"> |
| 3251 | + |
| 3252 | + Link Time Optimization (LTO) |
| 3253 | + |
| 3254 | + </span> |
| 3255 | + </a> |
| 3256 | + |
| 3257 | + <nav class="md-nav" aria-label="Link Time Optimization (LTO)"> |
| 3258 | + <ul class="md-nav__list"> |
| 3259 | + |
| 3260 | + <li class="md-nav__item"> |
| 3261 | + <a href="#why-lto-matters-for-rice" class="md-nav__link"> |
| 3262 | + <span class="md-ellipsis"> |
| 3263 | + |
| 3264 | + Why LTO Matters for Rice |
| 3265 | + |
| 3266 | + </span> |
| 3267 | + </a> |
| 3268 | + |
| 3269 | +</li> |
| 3270 | + |
| 3271 | + <li class="md-nav__item"> |
| 3272 | + <a href="#build-size-comparison" class="md-nav__link"> |
| 3273 | + <span class="md-ellipsis"> |
| 3274 | + |
| 3275 | + Build Size Comparison |
| 3276 | + |
| 3277 | + </span> |
| 3278 | + </a> |
| 3279 | + |
| 3280 | +</li> |
| 3281 | + |
| 3282 | + <li class="md-nav__item"> |
| 3283 | + <a href="#debug-symbol-splitting-gccclang" class="md-nav__link"> |
| 3284 | + <span class="md-ellipsis"> |
| 3285 | + |
| 3286 | + Debug Symbol Splitting (GCC/Clang) |
| 3287 | + |
| 3288 | + </span> |
| 3289 | + </a> |
| 3290 | + |
| 3291 | +</li> |
| 3292 | + |
| 3293 | + </ul> |
| 3294 | + </nav> |
| 3295 | + |
3246 | 3296 | </li> |
3247 | 3297 |
|
3248 | 3298 | </ul> |
@@ -3683,6 +3733,62 @@ <h3 id="linux">Linux<a class="headerlink" href="#linux" title="Permanent link">& |
3683 | 3733 | <p>This instructs the linker to ignore all unresolved symbols during linking. The symbols will be resolved when the Ruby interpreter loads the shared library.</p> |
3684 | 3734 | <h3 id="windows">Windows<a class="headerlink" href="#windows" title="Permanent link">¶</a></h3> |
3685 | 3735 | <p>On Windows, no special linker flags are required. The extension links directly against the Ruby library (e.g., <code>x64-vcruntime140-ruby320.lib</code>), which provides the necessary symbols at link time.</p> |
| 3736 | +<h2 id="link-time-optimization-lto">Link Time Optimization (LTO)<a class="headerlink" href="#link-time-optimization-lto" title="Permanent link">¶</a></h2> |
| 3737 | +<p>Link Time Optimization is <strong>highly recommended</strong> for Rice extensions, especially for release builds. Rice makes extensive use of C++ templates, which can result in significant code duplication across translation units. LTO allows the linker to deduplicate template instantiations and perform whole-program optimizations.</p> |
| 3738 | +<h3 id="why-lto-matters-for-rice">Why LTO Matters for Rice<a class="headerlink" href="#why-lto-matters-for-rice" title="Permanent link">¶</a></h3> |
| 3739 | +<p>Rice's template-heavy design means that each <code>.cpp</code> file that uses Rice generates its own copies of template instantiations. Without LTO, these duplicate instantiations remain in the final binary, dramatically increasing its size. LTO enables the linker to:</p> |
| 3740 | +<ul> |
| 3741 | +<li>Deduplicate identical template instantiations across object files</li> |
| 3742 | +<li>Inline functions across translation unit boundaries</li> |
| 3743 | +<li>Remove dead code more effectively</li> |
| 3744 | +</ul> |
| 3745 | +<h3 id="build-size-comparison">Build Size Comparison<a class="headerlink" href="#build-size-comparison" title="Permanent link">¶</a></h3> |
| 3746 | +<p>The following table shows the impact of LTO on a real-world Rice extension (opencv-ruby bindings):</p> |
| 3747 | +<table> |
| 3748 | +<thead> |
| 3749 | +<tr> |
| 3750 | +<th>Platform</th> |
| 3751 | +<th>Debug</th> |
| 3752 | +<th>Release (with LTO)</th> |
| 3753 | +<th>Size Reduction</th> |
| 3754 | +</tr> |
| 3755 | +</thead> |
| 3756 | +<tbody> |
| 3757 | +<tr> |
| 3758 | +<td>MSVC</td> |
| 3759 | +<td>140 MB</td> |
| 3760 | +<td>70 MB</td> |
| 3761 | +<td>50%</td> |
| 3762 | +</tr> |
| 3763 | +<tr> |
| 3764 | +<td>macOS</td> |
| 3765 | +<td>324 MB</td> |
| 3766 | +<td>164 MB</td> |
| 3767 | +<td>50%</td> |
| 3768 | +</tr> |
| 3769 | +<tr> |
| 3770 | +<td>MinGW</td> |
| 3771 | +<td>1.4 GB</td> |
| 3772 | +<td>200 MB</td> |
| 3773 | +<td>86%</td> |
| 3774 | +</tr> |
| 3775 | +</tbody> |
| 3776 | +</table> |
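| 3777 | +<p>Note that the table compares a debug build against a release build with LTO enabled, so part of the reduction may come from ordinary release settings rather than from LTO alone. Even so, the reductions are substantial on every platform, and MinGW benefits the most.</p> |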
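<p>As a quick check of the arithmetic, the "Size Reduction" column can be recomputed from the two size columns. The sketch below is illustrative only: the <code>SIZES_MB</code> constant and the <code>size_reduction_percent</code> helper are hypothetical names, and 1.4 GB is assumed to mean 1400 MB. With whole-number rounding, the macOS row works out to roughly 49%, which the table rounds to 50%.</p>

```ruby
# Sizes in MB copied from the table above.
# Assumption: 1.4 GB is treated as 1400 MB.
SIZES_MB = {
  "MSVC"  => { debug: 140,  release_lto: 70 },
  "macOS" => { debug: 324,  release_lto: 164 },
  "MinGW" => { debug: 1400, release_lto: 200 }
}

# Hypothetical helper: percentage by which the release build is smaller
# than the debug build, rounded to the nearest whole number.
def size_reduction_percent(debug_mb, release_mb)
  ((debug_mb - release_mb) * 100.0 / debug_mb).round
end

SIZES_MB.each do |platform, sizes|
  percent = size_reduction_percent(sizes[:debug], sizes[:release_lto])
  puts "#{platform}: #{percent}%"
end
```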
| 3778 | +<p>Rice's <code>CMakePresets.json</code> enables LTO automatically by setting <code>CMAKE_INTERPROCEDURAL_OPTIMIZATION</code> to <code>ON</code>.</p> |
| 3779 | +<p>If you build with <code>extconf.rb</code> (mkmf) instead, add the flags yourself. For GCC or Clang:</p> |
| 3780 | +<div class="highlight"><pre><span></span><code><a id="__codelineno-5-1" name="__codelineno-5-1" href="#__codelineno-5-1"></a><span class="vg">$CXXFLAGS</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s2">" -flto"</span> |
| 3781 | +<a id="__codelineno-5-2" name="__codelineno-5-2" href="#__codelineno-5-2"></a><span class="vg">$LDFLAGS</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s2">" -flto"</span> |
| 3782 | +</code></pre></div> |
| 3783 | +<p>For MSVC:</p> |
| 3784 | +<div class="highlight"><pre><span></span><code><a id="__codelineno-6-1" name="__codelineno-6-1" href="#__codelineno-6-1"></a><span class="vg">$CXXFLAGS</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s2">" /GL"</span> |
| 3785 | +<a id="__codelineno-6-2" name="__codelineno-6-2" href="#__codelineno-6-2"></a><span class="vg">$LDFLAGS</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s2">" /LTCG"</span> |
| 3786 | +</code></pre></div> |
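<p>If a single <code>extconf.rb</code> has to serve both toolchains, the two snippets above could be combined along these lines. This is a minimal sketch, not part of Rice: <code>lto_flags</code> is a hypothetical helper, and it assumes that MSVC builds of Ruby can be recognised by <code>mswin</code> in <code>RUBY_PLATFORM</code>, while every other platform (including MinGW) uses GCC or Clang.</p>

```ruby
# Hypothetical helper: pick LTO flags from a RUBY_PLATFORM-style string.
# Assumption: MSVC-built Rubies contain "mswin" in RUBY_PLATFORM;
# every other platform is treated as GCC/Clang.
def lto_flags(platform)
  if platform =~ /mswin/
    { cxxflags: " /GL", ldflags: " /LTCG" }
  else
    { cxxflags: " -flto", ldflags: " -flto" }
  end
end

flags = lto_flags(RUBY_PLATFORM)

# In a real extconf.rb, after `require "mkmf"`, the result would be
# appended to the mkmf globals:
#   $CXXFLAGS << flags[:cxxflags]
#   $LDFLAGS  << flags[:ldflags]
puts "CXXFLAGS +=#{flags[:cxxflags]}"
puts "LDFLAGS  +=#{flags[:ldflags]}"
```

<p>Keeping the selection in a small function makes it easy to check without running a build. A project that needs to tell GCC from Clang, or to detect an unusual toolchain, would have to inspect the configured compiler rather than the platform string.</p>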
| 3787 | +<h3 id="debug-symbol-splitting-gccclang">Debug Symbol Splitting (GCC/Clang)<a class="headerlink" href="#debug-symbol-splitting-gccclang" title="Permanent link">¶</a></h3> |
| 3788 | +<p>For debug builds with GCC or Clang, consider using <code>-gsplit-dwarf</code> to separate debug information into <code>.dwo</code> files. This keeps the main binary smaller while preserving full debug capability:</p> |
| 3789 | +<div class="highlight"><pre><span></span><code><a id="__codelineno-7-1" name="__codelineno-7-1" href="#__codelineno-7-1"></a><span class="nb">set</span><span class="p">(</span><span class="s">CMAKE_CXX_FLAGS_DEBUG</span><span class="w"> </span><span class="s2">"-g -gsplit-dwarf"</span><span class="p">)</span> |
| 3790 | +</code></pre></div> |
| 3791 | +<p>This is particularly useful with g++, where debug builds can exceed 1 GB without it.</p> |
3686 | 3792 |
|
3687 | 3793 |
|
3688 | 3794 |
|
|