There exists increasing pressure for software to operate more efficiently with existing resources. Improvements in the compiler can yield big performance gains, which lead to big reductions in cost. However, the release process of the current solution, feedback directed optimization (FDO), is too difficult to maintain, because the traditional FDO follows a fixed three-step pattern, which does not fit well with the rapidly changing performance-critical portions of the data center applications.
AutoFDO aims to simplify real-world deployment of feedback-directed optimization (FDO) by sampling hardware performance monitors on production machines and applying the profiles to the next release process. Furthermore, to enable AutoFDO only requires some new flags on the compiler program. At Google, AutoFDO has largely increased the number of FDO users and has doubled the number of cycles spent in FDO-optimized binaries.
- The most performance-critical portions are often rapidly changing.
- For AutoFDO, imprecise profiles from real workloads are as good as precise profiles from training inputs.
- Instead of using
{file_name, line_number, discriminator}to represent the source location, AutoFDO uses{function_name, line_offset, discriminator}, which can eliminate the negative effects caused by any changes to the source codes. - From the point of view of software engineering, AutoFDO provides compiler supports and considers the practical scenario of deployment in a production environment. All of these design considerations greatly improve usability, which eliminates the limitations of the traditional FDO.
- AutoFDO can still expose latent bugs in applications because FDO can result in some substantially different optimization decisions for the compiler.
- AutoFDO does not address some instability problems because the collected profile may contain some wrong information, which can lead to cascading overload of the program.
- In the majority of cases, AutoFDO can achieve >90% of the FDO speedup. However, due to the inaccurate debug info, AutoFDO tends to be less effective where loop nesting is tight and complicated.
- While applying iterative AutoFDO, most application performance fluctuates every other iteration. After integrating indirect call promotion into the profile preparation, the iterative AutoFDO performance becomes much smoother.
- Using offset to the function start line to represent source location in profile works much better than using absolute line number. Besides, for different kinds of applications (i.e., stable project and rapid-changing project), using old profile incurs the different performance penalty. For a rapid-changing project, using old profile can still provide 50% of the speedup obtained from using fresh profile.
To further enhance the performance, researchers should improve profile quality. How to improve the profile quality is an interesting question. As well, better profile quality can also address the instability problems.