A Practical Guide to Using `perf` on Linux

perf is a Linux performance analysis tool used to measure and profile the behavior of programs and systems. It can collect both high-level statistics and detailed sampling data, which makes it useful for performance tuning, debugging, and computer architecture analysis.

What `perf` Can Help You Study

CPU cycles
Instructions retired
Cache behavior
Branch prediction behavior
Function-level hotspots
System-wide performance issues

Environment on Our Server

On our server, perf has already been installed and can be used directly.

perf --version

Note: Restricted access to kernel symbols or kernel profiling scope is normal on this server. For our benchmark work, this is not a problem. We only need to analyze the user-space benchmark code, so limited kernel-space visibility does not affect the main results.

Basic Syntax

perf <subcommand> [options] [command]

Common Subcommands

Subcommand	Purpose
`perf list`	Show available events
`perf stat`	Collect summary statistics
`perf record`	Record sampling data
`perf report`	Explore recorded samples
`perf top`	Show live hotspots
`perf annotate`	Inspect hot source lines or assembly

Listing Available Events

To see all events supported by your system:

perf list

This usually includes:

Hardware events
Software events
Cache events
Tracepoints
Architecture-specific PMU events

Example:

perf list | less

Using `perf stat`

perf stat provides summary statistics for a command.

Example:

perf stat ls

Common metrics include:

cycles
instructions
cache-references
cache-misses
branches
branch-misses
task-clock
elapsed time

Useful patterns:

# Measure a program
perf stat ./my_program [args...]

# Measure selected events
perf stat -e cycles,instructions,cache-misses,branch-misses ./my_program [args...]

# Collect system-wide statistics for 5 seconds
perf stat -a sleep 5

If your benchmark takes command-line parameters, place them after ./my_program. For example, ./my_program <input_size> <num_threads>.

Using `perf record`

perf record collects sampling data while a program runs.

Example:

perf record ./my_program [args...]

This creates a file named perf.data.

More useful variants:

# Record call graphs
perf record -g ./my_program [args...]

# Record a specific event
perf record -e cycles -g ./my_program [args...]

Call graphs are especially useful for understanding which call paths consume the most time.

Using `perf report`

After recording data, use:

perf report

This opens an interactive report that shows:

Hot functions
Symbol names
Shared libraries
Sample percentages
Call relationships

Typical workflow:

perf record -g ./my_program [args...]
perf report

Profiling an Existing Process

If a process is already running, you can attach perf to it.

# Collect statistics from a running process
perf stat -p <pid> sleep 5

# Record samples from a running process
perf record -p <pid> -g sleep 10

Then view the results:

perf report

Important Events

Some frequently used events are:

Event	Meaning
`cycles`	Total CPU cycles
`instructions`	Total retired instructions
`cache-references`	Cache accesses
`cache-misses`	Cache misses
`branches`	Branch instructions
`branch-misses`	Branch mispredictions
`context-switches`	Context switch count
`page-faults`	Page fault count
`task-clock`	CPU time consumed

Example:

perf stat -e cycles,instructions,branches,branch-misses ./my_program [args...]

Cache Analysis

To study cache behavior:

perf stat -e cache-references,cache-misses ./my_program [args...]

A high cache miss rate may indicate:

Poor locality
Inefficient data layout
Large working sets
Memory-bound behavior

For more detailed analysis, use architecture-specific cache events from perf list.

Branch Prediction Analysis

To analyze branch behavior:

perf stat -e branches,branch-misses ./my_program [args...]

A high branch miss rate may suggest:

Unpredictable control flow
Many conditional branches
Poor branch locality

This is especially useful in performance-sensitive and architecture-focused work.

Source and Assembly Annotation

To inspect which instructions or source lines are hot:

perf annotate

This can help you understand:

Which instructions consume the most samples
Whether the compiler generated efficient code
Where bottlenecks appear in the instruction stream

You can also annotate a specific function:

perf annotate <function_name>

Typical Workflow

1. Get overall statistics

perf stat ./my_program [args...]

The result would be like:

2. Measure key architecture-related metrics

perf stat -e cycles,instructions,branch-misses ./my_program [args...]

3. Record detailed samples

perf record -g -F 999 ./my_program [args...]

If perf reports limited access to kernel symbols or kernel profiling data here, that is expected on our server. We can continue and focus on user-space samples from the benchmark itself.

4. View hotspots

perf report

5. Drill down into a specific function

We can use perf annotate to see the details of a certain function we are interested as long as we provide its symbol. But due to the possibility of function overload, its symbol could be so long. We can use the following command to find the complete symbol name firstly:

nm -C ./single_bench | grep <funcName(>

Here the complete name is csr_spmm(CSRMatrix const&, std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> >&), then we have:

perf annotate --stdio --source -l -s "csr_spmm(CSRMatrix const&, std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> >&)"

And it shows the location of hottest source code and corresponding assembly code:

Useful Tips

Compile with -g -fno-omit-frame-pointer for better symbols
Use realistic workloads
Run tests multiple times to reduce noise
Minimize background activity
Use architecture-specific events when needed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A Practical Guide to Using `perf` on Linux

What `perf` Can Help You Study

Environment on Our Server

Basic Syntax

Common Subcommands

Listing Available Events

Using `perf stat`

Using `perf record`

Using `perf report`

Profiling an Existing Process

Important Events

Cache Analysis

Branch Prediction Analysis

Source and Assembly Annotation

Typical Workflow

1. Get overall statistics

2. Measure key architecture-related metrics

3. Record detailed samples

4. View hotspots

5. Drill down into a specific function

Useful Tips

FilesExpand file tree

perfUsage.md

Latest commit

History

perfUsage.md

File metadata and controls

A Practical Guide to Using perf on Linux

What perf Can Help You Study

Environment on Our Server

Basic Syntax

Common Subcommands

Listing Available Events

Using perf stat

Using perf record

Using perf report

Profiling an Existing Process

Important Events

Cache Analysis

Branch Prediction Analysis

Source and Assembly Annotation

Typical Workflow

1. Get overall statistics

2. Measure key architecture-related metrics

3. Record detailed samples

4. View hotspots

5. Drill down into a specific function

Useful Tips

A Practical Guide to Using `perf` on Linux

What `perf` Can Help You Study

Using `perf stat`

Using `perf record`

Using `perf report`