diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md index b4fc50980a..2187c9270e 100644 --- a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/_index.md @@ -1,5 +1,5 @@ --- -title: Find CPU Cycle Hotspots with Arm Performix +title: Find Code Hotspots with Arm Performix draft: true cascade: @@ -7,10 +7,10 @@ cascade: minutes_to_complete: 30 -who_is_this_for: Software developers and performance engineers who want to identify CPU cycle hotspots in applications running on Arm Linux systems. +who_is_this_for: Software developers and performance engineers who want to identify code hotspots in applications running on Arm Linux systems. learning_objectives: - - Run the CPU Cycle Hotspot recipe in Arm Performix + - Run the Code Hotspots recipe in Arm Performix - Identify which functions consume the most CPU cycles and target them for optimization prerequisites: diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/code-hotspots-config.png b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/code-hotspots-config.png new file mode 100644 index 0000000000..c47b9908bd Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/code-hotspots-config.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg deleted file mode 100644 index dab941e333..0000000000 Binary files a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/hotspot-config.jpg and /dev/null differ diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md 
b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md index 7fb18e685f..5c32687c14 100644 --- a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md +++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-1.md @@ -26,4 +26,4 @@ This Learning Path does not cover flame graphs in depth. To learn more, see [Bre On Linux, flame graphs are commonly generated from samples collected with `perf`. perf periodically interrupts the running program and records a stack trace, then the collected stacks are converted into a folded format and rendered as the graph. Sampling frequency is important. If the frequency is too low you may miss short-lived hotspots, and if it is too high you may introduce overhead or skew the results. To make the output informative, compile with debug symbols and preserve frame pointers so stacks resolve to meaningful function names and unwind reliably. A typical build uses `-g` and `-fno-omit-frame-pointer`. -Arm has built a tool, Arm Performix that simplifies this workflow through the CPU Cycle hotspot recipe, making it easier to configure collection, run captures, and explore the resulting call hierarchies without manually stitching together the individual steps. This is the tooling solution you will use in this Learning Path. \ No newline at end of file +Arm has built a tool, Arm Performix, that simplifies this workflow through the Code Hotspots recipe, making it easier to configure collection, run captures, and explore the resulting call hierarchies without manually stitching together the individual steps. This is the tool you will use in this Learning Path. 
\ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md index 25702f8bcd..7cee5fd77c 100644 --- a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md +++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-3.md @@ -6,12 +6,12 @@ weight: 4 layout: learningpathall --- -## Run CPU Cycle Hotspot Recipe +## Run Code Hotspots Recipe -As shown in the `main.cpp` file below, the program generates a 1920×1080 bitmap image of the fractal. To identify performance bottlenecks, run the CPU Cycle Hotspot recipe in Arm Performix (APX). APX uses sampling to estimate where the CPU spends most of its time, allowing it to highlight the hottest functions—especially useful in larger applications where it isn't obvious ahead of time which functions will dominate runtime. +As shown in the `main.cpp` file below, the program generates a 1920×1080 bitmap image of the fractal. To identify performance bottlenecks, run the Code Hotspots recipe in Arm Performix (APX). APX uses sampling to estimate where the CPU spends most of its time, allowing it to highlight the hottest functions—especially useful in larger applications where it isn't obvious ahead of time which functions will dominate runtime. {{% notice Note %}} -The `myplot.draw()` call uses a relative path (`./images/green.bmp`). When APX launches the binary, it runs it from `/tmp/atperf/tools/atperf-agent`, so the image would be written there rather than to your project directory. Replace the first string argument with the absolute path to your `images` folder (for example, `/home/ec2-user/Mandelbrot-Example/green.bmp`) and rebuild the application before continuing. +The `myplot.draw()` call uses a relative path (`./images/green.bmp`). 
When APX launches the binary, it runs it from a temporary location, so the image would be written there rather than to your project directory. To ensure the output is saved where you expect it, update the first string argument in `main.cpp` to the absolute path of the output file, for example `/home/ec2-user/Mandelbrot-Example/images/green.bmp`. {{% /notice %}} ```cpp @@ -23,15 +23,20 @@ using namespace std; int main(){ Mandelbrot::Mandelbrot myplot(1920, 1080); - myplot.draw("./images/green.bmp", Mandelbrot::Mandelbrot::GREEN); + myplot.draw("/home/ec2-user/Mandelbrot-Example/images/green.bmp", Mandelbrot::Mandelbrot::GREEN); return 0; } ``` + Rebuild the application before continuing: -Open APX from the host machine. Select the **CPU Cycle Hotspot** recipe. If this is the first time running the recipe on this target machine you may need to select the install tools button. + + ```bash + ./build.sh + ``` -![The Arm Performix recipe selection screen with the CPU Cycle Hotspot recipe highlighted#center](./install-tools.jpg "Selecting the CPU Cycle Hotspot recipe") +Open APX from the host machine. Select the **Code Hotspots** recipe. If this is the first time running the recipe on this target machine, you may need to select the install tools button. + +![The Arm Performix recipe selection screen with the Code Hotspots recipe highlighted#center](./install-tools.jpg "Selecting the Code Hotspots recipe") Configure the recipe to launch a new process. APX will automatically start collecting metrics when the program starts and stop when the program exits. @@ -39,7 +44,7 @@ Provide the absolute path to the binary built in the previous step: `/home/ec2-u Use the default sampling rate of **Normal**. If your application is short-running, consider a higher sample rate, at the cost of more data to store and process. 
-![The Arm Performix CPU Cycle Hotspot recipe configuration screen showing launch settings, binary path, and sampling rate fields#center](./hotspot-config.jpg "CPU Cycle Hotspot recipe configuration") +![The Arm Performix Code Hotspots recipe configuration screen showing launch settings, binary path, and sampling rate fields#center](./code-hotspots-config.png "Code Hotspots recipe configuration") ## Analyse Results @@ -47,11 +52,11 @@ A flame graph is generated once the run completes. The default colour mode label ![A flame graph showing single-threaded Mandelbrot profiling results with __complex_abs__ as the dominant hotspot#center](./single-thread-flame-graph.jpg "Single-threaded flame graph showing __complex_abs__ as the hottest function") -To investigate further, you can map source code lines to the functions in the flame graph. Right-click on a specific function and select **View Source Code**. At the time of writing (ATP Engine 0.44.0), you may need to copy the source code onto your host machine. +To investigate further, you can map source code lines to the functions in the flame graph. Right-click on a specific function and select **View Source Code**. You may need to copy the source code onto your host machine to use this feature. ![The Arm Performix flame graph view showing source code annotations mapped to the selected hot function#center](./view-with-source-code.jpg "Flame graph with source code view") -Finally, check your `images` directory for the generated bitmap fractal. 
+Finally, check your `images` directory for the generated bitmap fractal `green.bmp`. ![A rendered Mandelbrot set fractal in green, generated from the single-threaded build at maximum iterations#center](./plot-1-thread-max-iterations.jpg "Mandelbrot fractal output from single-threaded build") diff --git a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md index 92edd8cd96..5e1ced249a 100644 --- a/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md +++ b/content/learning-paths/servers-and-cloud-computing/cpu_hotspot_performix/how-to-4.md @@ -33,7 +33,7 @@ public: ... ``` -On the remote server, reduce `MAX_ITERATIONS` in `Mandelbrot.h` to `(1<<9)` (512), update the output filename in `main.cpp` to a different path so you can compare the output with the baseline image, then rebuild: +On the remote server, reduce `MAX_ITERATIONS` in `Mandelbrot.h` to `(1<<9)` (512), update the output image filename in `main.cpp` to a different name (for example, `green-512.bmp`) so you can compare the output with the baseline image, then rebuild: ```bash ./build.sh @@ -49,13 +49,20 @@ There is negligible difference in perceived image quality when halving `MAX_ITER The loop in `Mandelbrot::getIterations` has no loop-carried dependencies — each iteration's result is independent of any other. This means you can parallelize the hot function across multiple threads if your CPU has multiple cores. -The repository contains a parallel version in the `main` branch: +The repository contains a parallel version in the `main` branch. First, stash your local changes, then switch to the `main` branch: ```bash +git stash git checkout main ``` -This branch parallelizes the `Mandelbrot::draw` function, which is an earlier function in the call stack that eventually calls `__hypot`. Build the example. 
This creates a binary `./builds/mandelbrot-parallel` that takes a single numeric command-line argument to set the number of threads. +This branch parallelizes the `Mandelbrot::draw` function, which is an earlier function in the call stack that eventually calls `__hypot`. Before building, update the `myplot.draw()` call in `main.cpp` to use an absolute output path: + +```cpp +myplot.draw("/home/ec2-user/Mandelbrot-Example/images/Green-Parallel-512.bmp", Mandelbrot::Mandelbrot::GREEN); +``` + +Build the example. This creates a binary `./builds/mandelbrot-parallel` that takes a single numeric command-line argument to set the number of threads. ```bash ./build.sh @@ -84,6 +91,6 @@ The build script uses the `-O0` flag, which disables all compiler optimizations. In this Learning Path, you reduced the runtime of the Mandelbrot example by focusing on the hottest code paths—cutting execution time from around 1 minute to ~1 second through targeted optimization and parallelization. While this example is relatively simple and the optimizations are more obvious, the same principle applies to real-world workloads: optimize what matters most first, based on measurement. -The CPU Cycle Hotspot recipe is designed to quickly identify an application's most CPU-time-dominant functions, giving you a clear, evidence-based starting point for performance work. By surfacing where execution time is actually spent, it ensures your optimizations target the parts of the code most likely to deliver the largest gains. +The Code Hotspots recipe is designed to quickly identify an application's most CPU-time-dominant functions, giving you a clear, evidence-based starting point for performance work. By surfacing where execution time is actually spent, it ensures your optimizations target the parts of the code most likely to deliver the largest gains. 
This is often one of the first profiling steps to run when assessing an application's performance — especially to determine which functions dominate runtime and should be prioritized. Once hotspots are identified, you can follow up with deeper function-specific analysis, such as memory investigations or top-down studies, and build microbenchmarks around hot functions to explore lower-level bottlenecks and uncover additional optimization opportunities. \ No newline at end of file