From 10bdd53531b8394ea2765d0740bf24349ec69958 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 12:51:02 +0100 Subject: [PATCH 1/9] Add missing `portability` mention --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3f90771..2ae6797 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ The Open Catalog includes designed to demonstrate: - No performance degradation when implementing the correctness, modernization, - and security recommendations. + security, and portability recommendations. - Potential performance enhancements achievable through the optimization recommendations. From 0587e7800b3aa1a29e0b42c2258fe995b7ea8f2c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 13:09:27 +0100 Subject: [PATCH 2/9] Fix grammar mistakes --- Checks/PWD006/README.md | 8 ++++---- Checks/PWR009/README.md | 2 +- Checks/PWR019/README.md | 2 +- Checks/PWR020/README.md | 4 ++-- Checks/PWR021/README.md | 2 +- Checks/PWR031/README.md | 2 +- Checks/PWR048/README.md | 4 ++-- Checks/PWR050/README.md | 2 +- Checks/PWR051/README.md | 2 +- Checks/PWR052/README.md | 2 +- Checks/PWR055/README.md | 2 +- Checks/PWR056/README.md | 2 +- Checks/PWR057/README.md | 2 +- Checks/PWR060/README.md | 4 ++-- Checks/PWR068/README.md | 2 +- Checks/PWR069/README.md | 2 +- Checks/PWR070/README.md | 2 +- Checks/PWR075/README.md | 2 +- Checks/PWR080/README.md | 2 +- Deprecated/PWR033/README.md | 2 +- Glossary/Locality-of-reference.md | 2 +- Glossary/Loop-tiling.md | 2 +- Glossary/Memory-access-pattern.md | 11 ++++++----- Glossary/Multithreading.md | 4 +++- .../Recurrence.md | 2 +- 25 files changed, 38 insertions(+), 35 deletions(-) diff --git a/Checks/PWD006/README.md b/Checks/PWD006/README.md index 74e74f2..b69666e 100644 --- a/Checks/PWD006/README.md +++ b/Checks/PWD006/README.md @@ -13,9 +13,9 @@ memory segments are copied to the memory of the accelerator device. ### Relevance -The data of non-scalar variables might be spread across memory, laid out in non- -contiguous regions. One classical example is a dynamically-allocated two- -dimensional array in C/C++, which consists of a contiguous array of pointers +The data of non-scalar variables might be spread across memory, laid out in +non-contiguous regions. One classical example is a dynamically-allocated +two-dimensional array in C/C++, which consists of a contiguous array of pointers pointing to separate contiguous arrays that contain the actual data. Note that the elements of each individual array are contiguous in memory but the different arrays are scattered in the memory. This also holds for dynamically-allocated @@ -85,7 +85,7 @@ void foo(int **A) { } ``` -The *enter/exit data* statements ressemble how the dynamic bi-dimensional memory +The *enter/exit data* statements resemble how the dynamic bi-dimensional memory is allocated in the CPU. An array of pointers is allocated first, followed by the allocation of all the separate arrays that contain the actual data. Each allocation constitutes a contiguous memory segment and must be transferred diff --git a/Checks/PWR009/README.md b/Checks/PWR009/README.md index bda3521..4f881ce 100644 --- a/Checks/PWR009/README.md +++ b/Checks/PWR009/README.md @@ -20,7 +20,7 @@ specific setup in order to better exploit its capabilities. The OpenMP `parallel` construct specifies a parallel region of the code that will be executed by a team of threads. 
It is normally accompanied by a worksharing construct so that each thread of the team takes care of part of the -work (e.g the `for` construct assigns a subset of the loop iterations to each +work (e.g., the `for` construct assigns a subset of the loop iterations to each thread). This attains a single level of parallelism since all work is distributed across a team of threads. This works well for multi-core CPUs but GPUs are composed of a high number of processing units organized into groups diff --git a/Checks/PWR019/README.md b/Checks/PWR019/README.md index 1ea6cd1..79bed3b 100644 --- a/Checks/PWR019/README.md +++ b/Checks/PWR019/README.md @@ -12,7 +12,7 @@ innermost loop. ### Relevance -Vectorization takes advantage of having as high a trip count (ie. number of +Vectorization takes advantage of having as high a trip count (i.e., number of iterations) as possible. When loops are [perfectly nested](../../Glossary/Perfect-loop-nesting.md) and they can be safely interchanged, making the loop with the highest trip count the innermost should diff --git a/Checks/PWR020/README.md b/Checks/PWR020/README.md index e3fb094..a34e734 100644 --- a/Checks/PWR020/README.md +++ b/Checks/PWR020/README.md @@ -14,7 +14,7 @@ statements in a first loop and the non-vectorizable statements in a second loop. [vectorization](../../Glossary/Vectorization.md) is one of the most important ways to speed up the computation of a loop. In practice, loops may contain a mix of -computations where only a part of the loop body introduces loop-carrie +computations where only a part of the loop body introduces loop-carried dependencies that prevent vectorization. Different types of compute patterns make explicit the loop-carried dependencies present in the loop. On the one hand, the @@ -25,7 +25,7 @@ vectorized: * The [sparse reduction compute pattern](../../Glossary/Patterns-for-performance-optimization/Sparse-reduction.md) - e.g. -the reduction variable has an read-write indirect memory access pattern which +the reduction variable has a read-write indirect memory access pattern which does not allow to determine the dependencies between the loop iterations at compile-time. diff --git a/Checks/PWR021/README.md b/Checks/PWR021/README.md index 2ecc1b5..922ef23 100644 --- a/Checks/PWR021/README.md +++ b/Checks/PWR021/README.md @@ -28,7 +28,7 @@ vectorized: * The [sparse reduction compute pattern](../../Glossary/Patterns-for-performance-optimization/Sparse-reduction.md) - e.g. -the reduction variable has an read-write indirect memory access pattern which +the reduction variable has a read-write indirect memory access pattern which does not allow to determine the dependencies between the loop iterations at compile-time. diff --git a/Checks/PWR031/README.md b/Checks/PWR031/README.md index 6eaabaf..d8ab668 100644 --- a/Checks/PWR031/README.md +++ b/Checks/PWR031/README.md @@ -22,7 +22,7 @@ or square roots. > [!NOTE] > Some compilers under some circumstances (e.g. relaxed IEEE 754 semantics) can > do this optimization automatically. However, doing it manually will guarantee -> best performance across all the compilers. +> the best performance across all the compilers. 
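As a quick preview of this kind of manual rewrite (a minimal sketch with
hypothetical function names; the full example follows below):

```c
#include <math.h>

// Before: a generic power call for a small integer exponent.
double cube_pow(double x) {
  return pow(x, 3.0);
}

// After: the same value computed with explicit multiplications,
// avoiding the cost of the generic pow routine.
double cube_mul(double x) {
  return x * x * x;
}
```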
### Code example diff --git a/Checks/PWR048/README.md b/Checks/PWR048/README.md index e83cd0a..f1f9029 100644 --- a/Checks/PWR048/README.md +++ b/Checks/PWR048/README.md @@ -34,8 +34,8 @@ __attribute__((const)) double example(double a, double b, double c) { } ``` -In the above example, the expression `a + b * c` is effectively a FMA operation -and it can be replaced with a call to `fma`: +In the above example, the expression `a + b * c` is effectively an FMA +operation and it can be replaced with a call to `fma`: ```c #include diff --git a/Checks/PWR050/README.md b/Checks/PWR050/README.md index 13d0740..2a7c5f7 100644 --- a/Checks/PWR050/README.md +++ b/Checks/PWR050/README.md @@ -19,7 +19,7 @@ multithreaded code is not straightforward. Essentially, the programmer must explicitly specify how to execute the loop in vector mode on the hardware, as well as add the appropriate synchronization to avoid race conditions at runtime. Typically, minimizing the computational overhead of multithreading is the -biggest challenge to speedup the code. +biggest challenge to speed up the code. > [!NOTE] > Executing forall loops using multithreading incurs less overhead than in the diff --git a/Checks/PWR051/README.md b/Checks/PWR051/README.md index 199bdc2..e05e476 100644 --- a/Checks/PWR051/README.md +++ b/Checks/PWR051/README.md @@ -19,7 +19,7 @@ multithreaded code is not straightforward. Essentially, the programmer must explicitly specify how to execute the loop in vector mode on the hardware, as well as add the appropriate synchronization to avoid race conditions at runtime. Typically, minimizing the computational overhead of multithreading is the -biggest challenge to speedup the code. +biggest challenge to speed up the code. > [!NOTE] > Executing scalar reduction loops using multithreading incurs an overhead due to diff --git a/Checks/PWR052/README.md b/Checks/PWR052/README.md index 2cccb71..c4e64e4 100644 --- a/Checks/PWR052/README.md +++ b/Checks/PWR052/README.md @@ -19,7 +19,7 @@ computers, but writing multithreaded code is not straightforward. Essentially, the programmer must explicitly specify how to execute the loop in vector mode on the hardware, as well as add the appropriate synchronization to avoid race conditions at runtime. Typically, minimizing the computational overhead of -multithreading is the biggest challenge to speedup the code. +multithreading is the biggest challenge to speed up the code. > [!NOTE] > Executing sparse reduction loops using multithreading incurs an overhead due to diff --git a/Checks/PWR055/README.md b/Checks/PWR055/README.md index 31db357..b7e3a05 100644 --- a/Checks/PWR055/README.md +++ b/Checks/PWR055/README.md @@ -19,7 +19,7 @@ is not straightforward. Essentially, the programmer must explicitly manage the data transfers between the host and the accelerator, specify how to execute the loop in parallel on the accelerator, as well as add the appropriate synchronization to avoid race conditions at runtime. Typically, minimizing the -computational overhead of offloading is the biggest challenge to speedup the +computational overhead of offloading is the biggest challenge to speed up the code using accelerators. > [!NOTE] diff --git a/Checks/PWR056/README.md b/Checks/PWR056/README.md index 24e7af5..5811c03 100644 --- a/Checks/PWR056/README.md +++ b/Checks/PWR056/README.md @@ -23,7 +23,7 @@ loop in parallel on the accelerator, as well as add the appropriate synchronization to avoid race conditions at runtime. 
Typically, **minimizing the computational overhead of offloading is the biggest -challenge to speedup the code using accelerators**. +challenge to speed up the code using accelerators**. > [!NOTE] > Offloading scalar reduction loops incurs an overhead due to the synchronization diff --git a/Checks/PWR057/README.md b/Checks/PWR057/README.md index 0878fe9..732db6c 100644 --- a/Checks/PWR057/README.md +++ b/Checks/PWR057/README.md @@ -21,7 +21,7 @@ is not straightforward. Essentially, the programmer must explicitly manage the data transfers between the host and the accelerator, specify how to execute the loop in parallel on the accelerator, as well as add the appropriate synchronization to avoid race conditions at runtime. Typically, minimizing the -computational overhead of offloading is the biggest challenge to speedup the +computational overhead of offloading is the biggest challenge to speed up the code using accelerators. > [!NOTE] diff --git a/Checks/PWR060/README.md b/Checks/PWR060/README.md index 79b6fec..05d398c 100644 --- a/Checks/PWR060/README.md +++ b/Checks/PWR060/README.md @@ -17,8 +17,8 @@ written in the first loop and read in the second loop. Vectorization is one of the most important ways to speed up computation in the loop. In practice, loops may contain vectorizable statements, but vectorization -may be either inhibited or inefficient due to the usage of data stored in non- -consecutive memory locations. Programs exhibit different types of +may be either inhibited or inefficient due to the usage of data stored in +non-consecutive memory locations. Programs exhibit different types of [memory access patterns](../../Glossary/Memory-access-pattern.md) that lead to non-consecutive memory access, e.g. strided, indirect, random accesses. diff --git a/Checks/PWR068/README.md b/Checks/PWR068/README.md index 13c6ea2..9346112 100644 --- a/Checks/PWR068/README.md +++ b/Checks/PWR068/README.md @@ -206,7 +206,7 @@ Factorial of 5 is 120 > [!TIP] > When interoperating between Fortran and C/C++, it's necessary to manually > define explicit interfaces for the C/C++ procedures to call. Although this is -> not a perfect solution, since the are no guarantees that these interfaces +> not a perfect solution, since there are no guarantees that these interfaces > will match the actual C/C++ procedures, it's still best to make the > interfaces as explicit as possible. This includes specifying details such as > argument intents, to help the Fortran compiler catch early as many issues as diff --git a/Checks/PWR069/README.md b/Checks/PWR069/README.md index a7b06f7..86dbffe 100644 --- a/Checks/PWR069/README.md +++ b/Checks/PWR069/README.md @@ -22,7 +22,7 @@ code. In procedures with implicit typing enabled, an `use` without the `only` specification can also easily lead to errors. If the imported module is later expanded with new members, these are automatically imported into the procedure -and might inadvertedly shadow existing and implicitly typed variables, +and might inadvertently shadow existing and implicitly typed variables, potentially leading to difficult-to-diagnose bugs. By leveraging the `only` keyword, the programmer restricts the visibility to diff --git a/Checks/PWR070/README.md b/Checks/PWR070/README.md index 21387e2..b8a1121 100644 --- a/Checks/PWR070/README.md +++ b/Checks/PWR070/README.md @@ -36,7 +36,7 @@ and more efficient: - In general, they lack compile-time checks for consistency between the provided and the expected array. 
-Aditionally, explicit-shape and assumed-size dummy arguments require contiguous +Additionally, explicit-shape and assumed-size dummy arguments require contiguous memory. This forces the creation of intermediate data copies when working with array slices or strided accesses. In contrast, assumed-shape arrays can handle these scenarios directly, leading to enhanced performance. diff --git a/Checks/PWR075/README.md b/Checks/PWR075/README.md index 0ef60ad..2e08de5 100644 --- a/Checks/PWR075/README.md +++ b/Checks/PWR075/README.md @@ -254,7 +254,7 @@ included in this PWR075 documentation. | Function to extract the imaginary part of a complex number: `IMAGPART` | Use the generic intrinsic function `AIMAG(Z)` that returns the imaginary part of a complex number | | Types to convert values to integers of different precisions: `INT2`, `INT8` | Use the generic intrinsic function `INT(A, KIND)` along with standard kind type parameters (e.g. `C_INT16_T` or `C_INT64_T`) | | Mathematical functions to compute the natural logarithm of the Gamma function: `LGAMMA`, `ALGAMA`, `DLGAMA` | Use the generic intrinsic function `LOG_GAMMA` | -| Function to find the last non-blanck character in a string: `LNBLNK` | Use the generic intrinsic function `LEN_TRIM(STRING [, KIND])` | +| Function to find the last non-blank character in a string: `LNBLNK` | Use the generic intrinsic function `LEN_TRIM(STRING [, KIND])` | | Functions for generating random numbers: `RAND`, `RAN`,`IRAND`, `SRAND` | Use the generic intrinsic to generate pseudorandom numbers: `RANDOM_NUMBER` | | Function to extract the real part of a complex number: `REALPART` | Use the generic intrinsic function `REAL(A [, KIND])` or `DBLE(A)` if double precision is required to obtain the real part | | Function to execute a system command from Fortran: `SYSTEM` | Use the generic intrinsic subroutine `EXECUTE_COMMAND_LINE` | diff --git a/Checks/PWR080/README.md b/Checks/PWR080/README.md index 2efe98f..68a08b2 100644 --- a/Checks/PWR080/README.md +++ b/Checks/PWR080/README.md @@ -11,7 +11,7 @@ undefined behavior due to its indeterminate value. To prevent bugs in the code, ensure the problematic variable is initialized in all possible code paths. It may help to add explicit `else` or `default` branches in control-flow blocks, or even set a default initial value -inmediately after declaring the variable. +immediately after declaring the variable. ### Relevance diff --git a/Deprecated/PWR033/README.md b/Deprecated/PWR033/README.md index 50d374e..aa524d2 100644 --- a/Deprecated/PWR033/README.md +++ b/Deprecated/PWR033/README.md @@ -31,7 +31,7 @@ may be larger but it should also become faster. > This optimization is called [loop unswitching](../../Glossary/Loop-unswitching.md) > and the compilers can do it automatically in simple cases. However, in more > complex cases, the compiler will omit this optimization and therefore it is -> beneficial to do it manually.. +> beneficial to do it manually. ### Code example diff --git a/Glossary/Locality-of-reference.md b/Glossary/Locality-of-reference.md index 662f0d9..24ba42e 100644 --- a/Glossary/Locality-of-reference.md +++ b/Glossary/Locality-of-reference.md @@ -47,7 +47,7 @@ brings performance gain. Writing code that makes efficient use of vectorization is essential to write performant code for modern hardware. For example, loop fission enables splitting -an non-vectorizable loop into two or more loops. The goal of the fission is to +a non-vectorizable loop into two or more loops. 
The goal of the fission is to isolate the statements preventing the vectorization into a dedicated loop. By doing this, we enable vectorization in the rest of the loop, which can lead to speed improvements. Note loop fission introduces overheads (e.g. loop control diff --git a/Glossary/Loop-tiling.md b/Glossary/Loop-tiling.md index 0e6e8a3..c7dc15d 100644 --- a/Glossary/Loop-tiling.md +++ b/Glossary/Loop-tiling.md @@ -62,7 +62,7 @@ for (int jj = 0; jj < m; jj += TILE_SIZE) { The careful reader might notice that after this intervention, the values for the array `a` will be read `m / TILE_SIZE` times from the memory. If the size of array `a` is large, then it can be useful to perform loop tiling on the loop -over `i` a as well, like this: +over `i` as well, like this: ```c for (int ii = 0; ii < n; ii += TILE_SIZE_I) { diff --git a/Glossary/Memory-access-pattern.md b/Glossary/Memory-access-pattern.md index 85f1398..a41e987 100644 --- a/Glossary/Memory-access-pattern.md +++ b/Glossary/Memory-access-pattern.md @@ -48,12 +48,13 @@ follows: * Access to `d[i]` is constant. It doesn't depend on the value of `j` and it has the same value inside the innermost loop. -* Access to `a[j]` is sequential. Everytime the iterator variable `j` increases by -1, the loop is accessing the next neighboring element. The same applies to the -access to `index[j]`. +* Access to `a[j]` is sequential. Every time the iterator variable `j` +increases by 1, the loop is accessing the next neighboring element. The same +applies to the access to `index[j]`. -* Access to `b[j * n]` is strided. Everytime the iterator variable `j` increases -by 1, the loop is accessing the element of the array `b` increased by `n`. +* Access to `b[j * n]` is strided. Every time the iterator variable `j` +increases by 1, the loop is accessing the element of the array `b` increased by +`n`. * Access to `c[index[j]]` is random. The value accessed when the iterator variable `j` increases its value is not known and it is considered random. diff --git a/Glossary/Multithreading.md b/Glossary/Multithreading.md index 7fa0036..31da756 100644 --- a/Glossary/Multithreading.md +++ b/Glossary/Multithreading.md @@ -38,4 +38,6 @@ The two biggest challenges with multithreading are: 1. [Deciding which data should be thread-private and which should be shared](Variable-scoping-in-the-context-of-OpenMP.md), -2. and thread synchronization and possible data races. Without it the parallelization either doesn't pay off in term of performance or gives the wrong results. +2. and thread synchronization and possible data races. Without it the + parallelization either doesn't pay off in terms of performance or gives the + wrong results. diff --git a/Glossary/Patterns-for-performance-optimization/Recurrence.md b/Glossary/Patterns-for-performance-optimization/Recurrence.md index f89ced5..319bf9b 100644 --- a/Glossary/Patterns-for-performance-optimization/Recurrence.md +++ b/Glossary/Patterns-for-performance-optimization/Recurrence.md @@ -13,7 +13,7 @@ A more formal definition is that a recurrence is a computation `a(s) = e`, where `e` contains a set of occurrences `a(s1), ..., a(sm)` so that, in the general case, the subscripts `s, s1, ..., sm` are different. Note that in the classical sense, a recurrence satisfies the additional constraint that at least one -subscript is symbolically different than `s`, and thus dependencies between +subscript is symbolically different from `s`, and thus dependencies between different loop iterations are introduced. 
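Concretely, the formal definition maps onto loops like the following minimal
C sketch (hypothetical arrays; fuller examples follow below), where
`a(s) = e` takes the form `a[i] = a[i - 1] + b[i]` and the subscripts `i` and
`i - 1` are symbolically different:

```c
// Each iteration reads the value written by the previous one, so the
// recurrence introduces a loop-carried dependency.
void recurrence(int n, double *a, const double *b) {
  for (int i = 1; i < n; i++) {
    a[i] = a[i - 1] + b[i];
  }
}
```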
### Code examples From 49cbed43621cfc1a905c5cd8ab76b1ac771549d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 12:52:15 +0100 Subject: [PATCH 3/9] Prefer US English spelling for consistency --- Checks/PWD005/README.md | 2 +- Checks/PWR035/README.md | 2 +- Checks/PWR063/README.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Checks/PWD005/README.md b/Checks/PWD005/README.md index ea87c4e..27773cd 100644 --- a/Checks/PWD005/README.md +++ b/Checks/PWD005/README.md @@ -11,7 +11,7 @@ Update the copied array range to match the actual array usage in the code. ### Relevance -Minimising data transfers is one of the main optimization points when offloading +Minimizing data transfers is one of the main optimization points when offloading computations to the GPU. An opportunity for such optimization occurs whenever only part of an array is required in a computation. In such cases, only a part of the array may be transferred to or from the GPU. However, the developer must diff --git a/Checks/PWR035/README.md b/Checks/PWR035/README.md index a32baaf..39443f4 100644 --- a/Checks/PWR035/README.md +++ b/Checks/PWR035/README.md @@ -15,7 +15,7 @@ or changing the data layout to avoid non-consecutive access in hot loops. ### Relevance Accessing an array in a non-consecutive order is less efficient than accessing -consecutive positions because the latter maximises +consecutive positions because the latter maximizes [locality of reference](../../Glossary/Locality-of-reference.md). ### Code example diff --git a/Checks/PWR063/README.md b/Checks/PWR063/README.md index e69e8ce..fd92ef8 100644 --- a/Checks/PWR063/README.md +++ b/Checks/PWR063/README.md @@ -118,7 +118,7 @@ arithmetic `if` statement: ``` Although it is a simple program, using an arithmetic `if` to drive the flow of -the loop makes the behaviour of the program less explicit than modern loop +the loop makes the behavior of the program less explicit than modern loop construct. We may improve the readability, intent, and maintainability of the code if we From 0f6db7d5cc9619c14405d3601700dbb9627e5088 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 13:04:14 +0100 Subject: [PATCH 4/9] Avoid redundancy --- Checks/PWR002/README.md | 6 +++--- Checks/PWR003/README.md | 10 +++++----- Checks/PWR022/README.md | 6 +++--- Checks/PWR040/README.md | 4 ++-- Checks/PWR045/README.md | 4 ++-- Checks/PWR049/README.md | 10 +++++----- Checks/RMK012/README.md | 2 +- Deprecated/PWR033/README.md | 4 ++-- Glossary/Loop-sectioning.md | 2 +- Glossary/Loop-unswitching.md | 4 ++-- Glossary/Multithreading.md | 6 +++--- .../Recurrence.md | 2 +- Glossary/Scalar-to-vector-promotion.md | 2 +- 13 files changed, 31 insertions(+), 31 deletions(-) diff --git a/Checks/PWR002/README.md b/Checks/PWR002/README.md index b2d7f95..d5d9100 100644 --- a/Checks/PWR002/README.md +++ b/Checks/PWR002/README.md @@ -6,7 +6,7 @@ A scalar variable should be declared in the smallest [scope](../../Glossary/Variable-scope.md) possible. In computer programming, the term scope of a variable usually refers to the part of the code where the variable can be used (e.g. a function, a loop). During the execution of a program, a -variable cannot be accessed from outside of its scope. This effectively limits +variable cannot be accessed from outside its scope. This effectively limits the visibility of the variable, which prevents its value from being read or written in other parts of the code. 
@@ -40,7 +40,7 @@ incompatible purposes, making code testing significantly easier. In the following code, the function `example` declares a variable `t` used in each iteration of the loop to hold a value that is then assigned to the array -`result`. The variable `t` is not used outside of the loop. +`result`. The variable `t` is not used outside the loop. ```c void example() { @@ -96,7 +96,7 @@ code within larger programs by grouping sections together. Conveniently, In the following code, the subroutine `example` declares a variable `t` used in each iteration of the loop to hold a value that is then assigned to the array -`result`. The variable `t` is not used outside of the loop. +`result`. The variable `t` is not used outside the loop. ```fortran subroutine example() diff --git a/Checks/PWR003/README.md b/Checks/PWR003/README.md index b31a4e2..ea1ed81 100644 --- a/Checks/PWR003/README.md +++ b/Checks/PWR003/README.md @@ -77,19 +77,19 @@ int example_impure(int a) { * `const` function: * Depends only on `a` and `b`. If successive calls are made with the same `a` and `b` values, the output will not change. - * Returns a value without modifying any data outside of the function. + * Returns a value without modifying any data outside the function. * `pure` function: * Depends on `c`, a global variable whose value can be modified between successive calls to the function by other parts of the program. Even if successive calls are made with the same `a` value, the output can differ depending on the state of `c`. - * Returns a value without modifying any data outside of the function. + * Returns a value without modifying any data outside the function. * "Normal" function: * Depends on `c`, a global variable. This restricts the function to be `pure`, at most. - * However, the function also modifies `c`, memory outside of its scope, thus + * However, the function also modifies `c`, memory outside its scope, thus leading to a "normal" function. In the case of the `pure` and "normal" functions, it is equivalent that they @@ -129,12 +129,12 @@ end module example_module successive calls to the function by other parts of the program. Even if successive calls are made with the same `a` value, the output can be different depending on the state of `c`. - * Returns a value without modifying any data outside of the function. + * Returns a value without modifying any data outside the function. * "Normal" function: * Depends on `c`, a public variable. This restricts the function to be `pure`, at most. - * However, the function also modifies `c`, memory outside of its scope, thus + * However, the function also modifies `c`, memory outside its scope, thus leading to a "normal" function. >[!WARNING] diff --git a/Checks/PWR022/README.md b/Checks/PWR022/README.md index f0f2a56..009c60a 100644 --- a/Checks/PWR022/README.md +++ b/Checks/PWR022/README.md @@ -3,18 +3,18 @@ ### Issue Conditional evaluates to the same value for all loop iterations and can be -[moved outside of the loop](../../Glossary/Loop-unswitching.md) to favor +[moved outside the loop](../../Glossary/Loop-unswitching.md) to favor [vectorization](../../Glossary/Vectorization.md). ### Actions -Move the invariant conditional outside of the loop by duplicating the loop body. +Move the invariant conditional outside the loop by duplicating the loop body. ### Relevance Classical vectorization requirements do not allow branching inside the loop body, which would mean no `if` and `switch` statements inside the loop body are -allowed. 
However, loop invariant conditionals can be extracted outside of the +allowed. However, loop invariant conditionals can be extracted outside the loop to facilitate vectorization. Therefore, it is often good to extract invariant conditional statements out of vectorizable loops to increase performance. A conditional whose expression evaluates to the same value for all diff --git a/Checks/PWR040/README.md b/Checks/PWR040/README.md index bb46704..06ea2bd 100644 --- a/Checks/PWR040/README.md +++ b/Checks/PWR040/README.md @@ -19,8 +19,8 @@ for low performance on modern computer systems. Matrices are Iterating over them column-wise (in C) and row-wise (in Fortran) is inefficient, because it uses the memory subsystem suboptimally. -Nested loops that iterate over matrices in an inefficient manner can be -optimized by applying [loop tiling](../../Glossary/Loop-tiling.md). In contrast to +Nested loops that iterate over matrices inefficiently can be optimized by +applying [loop tiling](../../Glossary/Loop-tiling.md). In contrast to [loop interchange](../../Glossary/Loop-interchange.md), loop tiling doesn't remove the inefficient memory access, but instead breaks the problem into smaller subproblems. Smaller subproblems have a much better diff --git a/Checks/PWR045/README.md b/Checks/PWR045/README.md index 94559c2..8283e89 100644 --- a/Checks/PWR045/README.md +++ b/Checks/PWR045/README.md @@ -8,7 +8,7 @@ boost. ### Actions -Calculate the reciprocal outside of the loop and replace the division with +Calculate the reciprocal outside the loop and replace the division with multiplication with a reciprocal ### Relevance @@ -18,7 +18,7 @@ performing the division in each iteration of the loop, one could do the following: * For the expression `A / B`, calculate the reciprocal of the denominator -(`RECIP_B = 1.0 / B`) and put it outside of the loop. +(`RECIP_B = 1.0 / B`) and put it outside the loop. * Replace the expression `A / B`, use `A * RECIP_B`. diff --git a/Checks/PWR049/README.md b/Checks/PWR049/README.md index 6cb5ce4..e361dd0 100644 --- a/Checks/PWR049/README.md +++ b/Checks/PWR049/README.md @@ -2,12 +2,12 @@ ### Issue -A condition that depends only on the iterator variable can be moved outside of -the loop. +A condition that depends only on the iterator variable can be moved outside the +loop. ### Actions -Move iterator-dependent condition outside of the loop. +Move iterator-dependent condition outside the loop. ### Relevance @@ -15,13 +15,13 @@ A condition that depends only on the iterator is predictable: we know exactly at which iteration of the loop it is going to be true. Nevertheless, it is evaluated in each iteration of the loop. -Moving the iterator-dependent condition outside of the loop will result in fewer +Moving the iterator-dependent condition outside the loop will result in fewer instructions executed in the loop. This transformation can occasionally enable vectorization, and for the loops that are already vectorized, it can increase vectorization efficiency. > [!NOTE] -> Moving an iterator-dependent condition outside of the loop is a creative +> Moving an iterator-dependent condition outside the loop is a creative > process. Depending on the type of condition, it can involve loop peeling, > [loop fission](../../Glossary/Loop-fission.md) or loop unrolling. diff --git a/Checks/RMK012/README.md b/Checks/RMK012/README.md index 52236be..9b4ed92 100644 --- a/Checks/RMK012/README.md +++ b/Checks/RMK012/README.md @@ -33,7 +33,7 @@ vectorization. 
* If the condition in the loop is always evaluated to a loop-invariant value (i.e. its value is either true or false across the execution of the loop), this -condition can be moved outside of the loop (see +condition can be moved outside the loop (see [loop unswitching](../../Glossary/Loop-unswitching.md)). * If the condition in the loop depends on iterator variables only, the conditions diff --git a/Deprecated/PWR033/README.md b/Deprecated/PWR033/README.md index aa524d2..018bce9 100644 --- a/Deprecated/PWR033/README.md +++ b/Deprecated/PWR033/README.md @@ -51,7 +51,7 @@ void example(int addTwo) { In each iteration, the increment statement evaluates the argument to decide how much to increment. However, this value is fixed for the whole execution of the -function and thus, the conditional can be moved outside of the loop. The +function and thus, the conditional can be moved outside the loop. The resulting code is as follows: ```c @@ -87,7 +87,7 @@ end subroutine In each iteration, the increment statement evaluates the argument to decide how much to increment. However, this value is fixed for the whole execution of the -function and thus, the conditional can be moved outside of the loop. The +function and thus, the conditional can be moved outside the loop. The resulting code is as follows: ```fortran diff --git a/Glossary/Loop-sectioning.md b/Glossary/Loop-sectioning.md index 502f8bd..110a859 100644 --- a/Glossary/Loop-sectioning.md +++ b/Glossary/Loop-sectioning.md @@ -5,7 +5,7 @@ efficiency of vectorization by splitting the loop execution into several sections. Instead of iterating from `0` to `N`, the loop iterates in sections which are -smaller in size, e.g. `0` to `S`, from `S` to `2S - 1`, etc. +smaller, e.g. `0` to `S`, from `S` to `2S - 1`, etc. There are two distinct use cases for loop sectioning: diff --git a/Glossary/Loop-unswitching.md b/Glossary/Loop-unswitching.md index 389c9a4..75ae9f8 100644 --- a/Glossary/Loop-unswitching.md +++ b/Glossary/Loop-unswitching.md @@ -2,7 +2,7 @@ **Loop unswitching** is a program optimization technique, where invariant conditions inside loops (i.e. conditions whose value is always the same inside -the loop) can be taken outside of the loop by creating copies of the loop. +the loop) can be taken outside the loop by creating copies of the loop. To illustrate loop unswitching, consider the following example: @@ -22,7 +22,7 @@ in case `a[i]` is negative and we are debugging, we want to log an error. The condition `if (debug)` is loop invariant, since the variable `debug` never changes its value. By doing loop unswitching and moving this condition outside -of the loop, the loop becomes faster. Here is the same loop after loop +the loop, the loop becomes faster. Here is the same loop after loop unswitching: ```c diff --git a/Glossary/Multithreading.md b/Glossary/Multithreading.md index 31da756..841bd16 100644 --- a/Glossary/Multithreading.md +++ b/Glossary/Multithreading.md @@ -8,9 +8,9 @@ to several CPU cores in order to speed up its execution. The crucial underlying concept of multithreading is **thread**. The simplest way to imagine a thread is as an independent worker, which has its own code that it -is executing. Some of the data used by the thread is local to the thread, and -some of it is shared among all threads. An important aspect of multithreading is -that all the threads in principle have access to the same address space. +is executing. Some data used by the thread is local to the thread, and some of +it is shared among all threads. 
An important aspect of multithreading is that +all the threads in principle have access to the same address space. Although the user can create as many logical threads as they want, for optimum performance the number of threads should correspond to the number of CPU cores. diff --git a/Glossary/Patterns-for-performance-optimization/Recurrence.md b/Glossary/Patterns-for-performance-optimization/Recurrence.md index 319bf9b..81cafe8 100644 --- a/Glossary/Patterns-for-performance-optimization/Recurrence.md +++ b/Glossary/Patterns-for-performance-optimization/Recurrence.md @@ -39,7 +39,7 @@ end do ### Parallelizing recurrences with OpenMP and OpenACC In general, codes containing a recurrence pattern are difficult to parallelize -in an efficient manner, and may even not be parallelizable at all. An example of +efficiently, and may even not be parallelizable at all. An example of parallelizable recurrence is the computation of a cumulative sum, which can be computed efficiently in parallel through parallel prefix sum operations. This is usually known as scan operation and it is supported in OpenMP since version 5.0. diff --git a/Glossary/Scalar-to-vector-promotion.md b/Glossary/Scalar-to-vector-promotion.md index fbf1257..1799898 100644 --- a/Glossary/Scalar-to-vector-promotion.md +++ b/Glossary/Scalar-to-vector-promotion.md @@ -7,7 +7,7 @@ optimization techniques, notably [loop fission](Loop-fission.md). In this technique, a temporary scalar is converted to a vector whose value is preserved between loop iterations, with the goal to enable loop fission needed to extract the statements preventing -optimizations outside of the critical loop. +optimizations outside the critical loop. ### Loop interchange From cd59074e6bbf2521158907f10ffb90bdc2d234bc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 13:11:14 +0100 Subject: [PATCH 5/9] Leverage `,` to improve readability --- Checks/PWD009/README.md | 2 +- Checks/PWR006/README.md | 2 +- Checks/PWR022/README.md | 2 +- Checks/PWR023/README.md | 2 +- Checks/PWR024/README.md | 4 ++-- .../Patterns-for-performance-optimization/Scalar-reduction.md | 2 +- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Checks/PWD009/README.md b/Checks/PWD009/README.md index 82591a8..0cbd654 100644 --- a/Checks/PWD009/README.md +++ b/Checks/PWD009/README.md @@ -12,7 +12,7 @@ Change the data scope of the variable from private to shared. Specifying an invalid scope for a variable may introduce race conditions and produce incorrect results. For instance, when a variable must be shared among -threads but it is privatized instead. +threads, but it is privatized instead. ### Code example diff --git a/Checks/PWR006/README.md b/Checks/PWR006/README.md index 8ea4c34..a6040c6 100644 --- a/Checks/PWR006/README.md +++ b/Checks/PWR006/README.md @@ -13,7 +13,7 @@ Set the scope of the read-only variable to shared. Since a read-only variable is never written to, it can be safely shared without any risk of race conditions. **Sharing variables is more efficient than -privatizing** them from a memory perspective so it should be favored whenever +privatizing** them from a memory perspective, so it should be favored whenever possible. ### Code example diff --git a/Checks/PWR022/README.md b/Checks/PWR022/README.md index 009c60a..c762194 100644 --- a/Checks/PWR022/README.md +++ b/Checks/PWR022/README.md @@ -25,7 +25,7 @@ it will always be either true or false. 
> This optimization is called > [loop unswitching](../../Glossary/Loop-unswitching.md) and the compilers can do > it automatically in simple cases. However, in more complex cases, the compiler -> will omit this optimization and therefore it is beneficial to do it manually. +> will omit this optimization and, therefore, it is beneficial to do it manually. ### Code example diff --git a/Checks/PWR023/README.md b/Checks/PWR023/README.md index 18eb1c2..8f68068 100644 --- a/Checks/PWR023/README.md +++ b/Checks/PWR023/README.md @@ -18,7 +18,7 @@ guarantee that the pointers do not alias one another, i.e. no memory address is accessible through two different pointers. The developer can use the `restrict` C keyword to inform the compiler that the specified block of memory is not aliased by any other block. Providing this information can help the compiler -generate more efficient code or vectorize the loop. Therefore it is always +generate more efficient code or vectorize the loop. Therefore, it is always recommended to use `restrict` whenever possible so that the compiler has as much information as possible to perform optimizations such as vectorization. diff --git a/Checks/PWR024/README.md b/Checks/PWR024/README.md index a849023..427f328 100644 --- a/Checks/PWR024/README.md +++ b/Checks/PWR024/README.md @@ -3,8 +3,8 @@ ### Issue The loop is currently not in -[OpenMP canonical](../../Glossary/OpenMP-canonical-form.md) form but it can be made -OpenMP compliant through refactoring. +[OpenMP canonical](../../Glossary/OpenMP-canonical-form.md) form, but it can be +made OpenMP compliant through refactoring. ### Actions diff --git a/Glossary/Patterns-for-performance-optimization/Scalar-reduction.md b/Glossary/Patterns-for-performance-optimization/Scalar-reduction.md index 6a56abc..c2ad5c8 100644 --- a/Glossary/Patterns-for-performance-optimization/Scalar-reduction.md +++ b/Glossary/Patterns-for-performance-optimization/Scalar-reduction.md @@ -35,7 +35,7 @@ end do ### Parallelizing scalar reductions with OpenMP and OpenACC The computation of the scalar reduction has concurrent read-write accesses to -the scalar reduction variable. Therefore a scalar reduction can be computed in +the scalar reduction variable. Therefore, a scalar reduction can be computed in parallel safely only if additional synchronization is inserted in order to avoid race conditions associated to the reduction variable. 
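A minimal C sketch of that synchronization, using OpenMP's `reduction` clause
(hypothetical function name):

```c
// The reduction clause gives every thread a private partial sum and
// combines the partial sums safely once the loop finishes, avoiding a
// race condition on the shared variable sum.
double scalar_reduction(int n, const double *a) {
  double sum = 0.0;
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i++) {
    sum += a[i];
  }
  return sum;
}
```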
From cfd4adb1921db62409d9c1a68d745ca968c68c53 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 13:11:34 +0100 Subject: [PATCH 6/9] Capitalize titles --- Checks/PWR034/README.md | 2 +- Deprecated/RMK001/README.md | 2 +- Deprecated/RMK003/README.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Checks/PWR034/README.md b/Checks/PWR034/README.md index 481c797..a92fb64 100644 --- a/Checks/PWR034/README.md +++ b/Checks/PWR034/README.md @@ -1,4 +1,4 @@ -# PWR034: avoid strided array access to improve performance +# PWR034: Avoid strided array access to improve performance ### Issue diff --git a/Deprecated/RMK001/README.md b/Deprecated/RMK001/README.md index d34929f..89affda 100644 --- a/Deprecated/RMK001/README.md +++ b/Deprecated/RMK001/README.md @@ -1,4 +1,4 @@ -# RMK001: loop nesting that might benefit from hybrid parallelization using multithreading and SIMD +# RMK001: Loop nesting that might benefit from hybrid parallelization using multithreading and SIMD > [!WARNING] > This check was deprecated in favor of [PWR050](../../Checks/PWR050/README.md), diff --git a/Deprecated/RMK003/README.md b/Deprecated/RMK003/README.md index 0bebbc2..304bc4f 100644 --- a/Deprecated/RMK003/README.md +++ b/Deprecated/RMK003/README.md @@ -1,4 +1,4 @@ -# RMK003: potential temporary variable for the loop which might be privatizable, thus enabling the loop parallelization +# RMK003: Potential temporary variable for the loop which might be privatizable, thus enabling the loop parallelization > [!WARNING] > This check was deprecated due to the lack of actionable guidance and examples. From 2209a9b4793574ec13781f8668463782f074548a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 12:58:17 +0100 Subject: [PATCH 7/9] Adjust Markdown style markers --- Checks/PWD006/README.md | 2 +- Checks/PWR032/README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Checks/PWD006/README.md b/Checks/PWD006/README.md index b69666e..c2b170e 100644 --- a/Checks/PWD006/README.md +++ b/Checks/PWD006/README.md @@ -90,7 +90,7 @@ is allocated in the CPU. An array of pointers is allocated first, followed by the allocation of all the separate arrays that contain the actual data. Each allocation constitutes a contiguous memory segment and must be transferred individually using *enter data*. The deallocation takes place in the inverted -order and the same happens with the *exit *data statements. +order and the same happens with the *exit* data statements. ### Related resources diff --git a/Checks/PWR032/README.md b/Checks/PWR032/README.md index 684630b..ef545f3 100644 --- a/Checks/PWR032/README.md +++ b/Checks/PWR032/README.md @@ -17,7 +17,7 @@ In C, there are several versions of the same mathematical function for different types. For example, the square root function is available for floats, doubles and long doubles through `sqrtf`, `sqrt` and `sqrtl`, respectively. Oftentimes, the developer who is not careful will not use the function matching the data -type. For instance, most developers will just use "sqrt" for any data type, +type. For instance, most developers will just use `sqrt` for any data type, instead of using `sqrtf` when the argument is float. 
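For example (a minimal sketch):

```c
#include <math.h>

float example(float x) {
  // Mismatched: sqrt expects and returns double, so x is implicitly
  // converted to double and the result back to float on every call.
  float a = sqrt(x);

  // Matched: sqrtf operates directly on float.
  float b = sqrtf(x);

  return a + b;
}
```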
The type mismatch does not cause a compiler error because of the implicit type From a3aa62e24157d086295b07ec631d45c259c5619e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 12:57:24 +0100 Subject: [PATCH 8/9] Remove repeated whitespaces --- Checks/PWD006/README.md | 4 ++-- Checks/PWD007/README.md | 2 +- Checks/PWR005/README.md | 2 +- Checks/PWR012/README.md | 2 +- Checks/PWR042/README.md | 4 ++-- Checks/PWR075/README.md | 2 +- Checks/RMK015/README.md | 2 +- Deprecated/PWR010/README.md | 2 +- Glossary/Locality-of-reference.md | 2 +- Glossary/Patterns-for-performance-optimization/Recurrence.md | 2 +- 10 files changed, 12 insertions(+), 12 deletions(-) diff --git a/Checks/PWD006/README.md b/Checks/PWD006/README.md index c2b170e..0cd2618 100644 --- a/Checks/PWD006/README.md +++ b/Checks/PWD006/README.md @@ -3,7 +3,7 @@ ### Issue The copy of a non-scalar variable to an accelerator device has been requested -but none or only a part of its data will be transferred because it is laid out +but none or only a part of its data will be transferred because it is laid out non-contiguously in memory. ### Actions @@ -25,7 +25,7 @@ In order to offload such non-scalar variables to an accelerator device using OpenMP or OpenACC, it is not enough to add it to a data movement clause. This is known as deep copy and currently is not automatically supported by either OpenMP or OpenACC. To overcome this limitation, all the non-contiguous memory segments -must be explicitly transferred by the programmer. In OpenMP 4.5, this can be +must be explicitly transferred by the programmer. In OpenMP 4.5, this can be achieved through the *enter/exit data* execution statements. Alternatively, the code could be refactored so that it uses variables with contiguous data layouts (eg. flatten an array of arrays). diff --git a/Checks/PWD007/README.md b/Checks/PWD007/README.md index 3be428a..9ee00e8 100644 --- a/Checks/PWD007/README.md +++ b/Checks/PWD007/README.md @@ -12,7 +12,7 @@ Protect the recurrence or execute the code sequentially if that is not possible. ### Relevance The recurrence computation pattern occurs when the same memory position is read -and written to, at least once, in different iterations of a loop. It englobes +and written to, at least once, in different iterations of a loop. It englobes both true dependencies (read-after-write) and anti-dependencies (write-after- read) across loop iterations. Sometimes the term "loop-carried dependencies" is also used. If a loop with a recurrence computation pattern is parallelized diff --git a/Checks/PWR005/README.md b/Checks/PWR005/README.md index 737bed8..0b32d48 100644 --- a/Checks/PWR005/README.md +++ b/Checks/PWR005/README.md @@ -15,7 +15,7 @@ Add `default(none)` to disable default OpenMP scoping. When the scope for a variable is not specified in an [OpenMP](../../Glossary/OpenMP.md) `parallel` directive, a default scope is assigned to it. Even when set explicitly, using a default scope is considered a bad -practice since it can lead to wrong data scopes inadvertently being applied to +practice since it can lead to wrong data scopes inadvertently being applied to variables. Thus, it is recommended to explicitly set the scope for each variable. diff --git a/Checks/PWR012/README.md b/Checks/PWR012/README.md index 211e0c1..01257e1 100644 --- a/Checks/PWR012/README.md +++ b/Checks/PWR012/README.md @@ -24,7 +24,7 @@ variable modifications, and also contributes to improve compiler and static analyzer code coverage. 
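By way of illustration, a minimal C sketch of the idea (hypothetical
`point_t` type and function names):

```c
typedef struct {
  double x, y, z;
  int id;
} point_t;

// The callee receives only the fields it actually needs, rather than
// the whole derived type.
double norm_squared(double x, double y, double z) {
  return x * x + y * y + z * z;
}

// The caller extracts the required fields at the call site.
double example(const point_t *p) {
  return norm_squared(p->x, p->y, p->z);
}
```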
In parallel programming, derived data types are often discouraged when -offloading to the GPU because they may inhibit compiler analyses and +offloading to the GPU because they may inhibit compiler analyses and optimizations due to [pointer aliasing](../../Glossary/Pointer-aliasing.md). Also, it can cause unnecessary data movements impacting performance or incorrect data movements impacting correctness and even crashes impacting code quality. diff --git a/Checks/PWR042/README.md b/Checks/PWR042/README.md index b090369..6a934ab 100644 --- a/Checks/PWR042/README.md +++ b/Checks/PWR042/README.md @@ -104,7 +104,7 @@ first and the third loops are single non-nested loops, so let's focus on the second loop nest as it will have a higher impact on performance. Note that this loop nest is perfectly nested, making loop interchange -applicable. This optimization will turn the `ij` order into `ji`, improving +applicable. This optimization will turn the `ij` order into `ji`, improving the locality of reference: ```c @@ -187,7 +187,7 @@ first and the third loops are single non-nested loops, so let's focus on the second loop nest as it will have a higher impact on performance. Note that this loop nest is perfectly nested, making loop interchange -applicable. This optimization will turn the `ij` order into `ji`, improving +applicable. This optimization will turn the `ij` order into `ji`, improving the locality of reference: ```fortran diff --git a/Checks/PWR075/README.md b/Checks/PWR075/README.md index 2e08de5..ce56e7c 100644 --- a/Checks/PWR075/README.md +++ b/Checks/PWR075/README.md @@ -238,7 +238,7 @@ included in this PWR075 documentation. | Non-standard double precision hyperbolic trigonometric functions: `DACOSH`, `DASINH`, `DATANH` | Use the generic intrinsic procedures: `ACOSH`, `ASINH`, `ATANH` | | Mathematical function to compute the Gamma function for double precision arguments: `DGAMMA` | Use the generic `GAMMA` that also accepts double precision arguments | | Mathematical function for double precision complementary error function: `DERFC` | Use the generic intrinsic function for the complementary error function: `ERFC` | -| Functions for processor time measurements: `DTIME`, `SECOND` | Use the generic intrinsic subroutine `CPU_TIME(TIME)` | +| Functions for processor time measurements: `DTIME`, `SECOND` | Use the generic intrinsic subroutine `CPU_TIME(TIME)` | | Functions to retrieve date and time information: `FDATE`, `IDATE`, `ITIME`, `CTIME`, `LTIME`, `GMTIME` | Use the generic intrinsic subroutine `DATE_AND_TIME([DATE, TIME, ZONE, VALUES])` | | Functions for low-level file input: `FGET`, `FGETC` | Use `READ` or C interoperability | | Functions to indicate integers of different precisions: `FLOATI`, `FLOATJ`, `FLOATK` | Use the generic `REAL(A)` function or `DBLE(A)` function if double precision is required. | diff --git a/Checks/RMK015/README.md b/Checks/RMK015/README.md index 132cc9e..cee6662 100644 --- a/Checks/RMK015/README.md +++ b/Checks/RMK015/README.md @@ -16,7 +16,7 @@ debugging tools. Compilers are designed to **convert source code into efficient executable code for the target hardware**, **reducing the cost of the compilation process** and -**facilitating the debugging  process** by the programmer. Compilers provide +**facilitating the debugging process** by the programmer. Compilers provide optimization flags to improve performance, as well as optimization flags for reducing the size of the executable code. 
Typical compiler optimization flags for performance are `-O0`, `-O1`, `-O2`, `-O3` and `-Ofast`. On the other hand, diff --git a/Deprecated/PWR010/README.md b/Deprecated/PWR010/README.md index 52feef3..7bbd9cc 100644 --- a/Deprecated/PWR010/README.md +++ b/Deprecated/PWR010/README.md @@ -11,7 +11,7 @@ ### Issue -In the C and C++ programming languages, matrices are stored in a +In the C and C++ programming languages, matrices are stored in a [row-major layout](../../Glossary/Row-major-and-column-major-order.md); thus, iterating the matrix column-wise is non-optimal and should be avoided if possible. diff --git a/Glossary/Locality-of-reference.md b/Glossary/Locality-of-reference.md index 24ba42e..5659ff4 100644 --- a/Glossary/Locality-of-reference.md +++ b/Glossary/Locality-of-reference.md @@ -14,7 +14,7 @@ ways: * **Temporal locality**: If the CPU has accessed a certain memory location, there is a high probability that it will access it again in the near future. Using the -same values in different loop iterations is an example of temporal locality. +same values in different loop iterations is an example of temporal locality. * **Spatial locality**: If the CPU has accessed a certain memory location, there is a high probability that it will access its neighboring locations in the near diff --git a/Glossary/Patterns-for-performance-optimization/Recurrence.md b/Glossary/Patterns-for-performance-optimization/Recurrence.md index 81cafe8..4ade0f6 100644 --- a/Glossary/Patterns-for-performance-optimization/Recurrence.md +++ b/Glossary/Patterns-for-performance-optimization/Recurrence.md @@ -14,7 +14,7 @@ A more formal definition is that a recurrence is a computation `a(s) = e`, where case, the subscripts `s, s1, ..., sm` are different. Note that in the classical sense, a recurrence satisfies the additional constraint that at least one subscript is symbolically different from `s`, and thus dependencies between -different loop iterations are introduced. +different loop iterations are introduced. ### Code examples From f4a61cb73e042fb4e3c66372186c97609bc8afc0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20G=2E=20Dieste?= Date: Fri, 23 Jan 2026 13:13:03 +0100 Subject: [PATCH 9/9] Replace Cyrillic letter --- Checks/PWR042/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Checks/PWR042/README.md b/Checks/PWR042/README.md index 6a934ab..71a7923 100644 --- a/Checks/PWR042/README.md +++ b/Checks/PWR042/README.md @@ -30,7 +30,7 @@ efficient one. In order to perform the loop interchange, the loops need to be [perfectly nested](../../Glossary/Perfect-loop-nesting.md), i.e. all the statements need to be inside the innermost loop. However, due to the initialization of a -reduction variablе, loop interchange is not directly applicable. +reduction variable, loop interchange is not directly applicable. > [!NOTE] > Often, loop interchange enables vectorization of the innermost loop which