Commit ca79be4
Optimize modulo in clock-sweep algorithm
Improve performance by replacing the modulo operator, which compiles into
a division instruction that can be slow on certain architectures. When
the size of the clock (NBuffers) is a power of two, we can simply mask
off the low bits with a bitwise AND to get the modulo (4 instructions,
~3-4 cycles). When it isn't, we can replace the modulo with a 64-bit
multiplication by the inverse of the clock size and a right shift, as
described in the paper "Division by Invariant Integers using
Multiplication" [1] (4 instructions, ~8-12 cycles). In both cases branch
prediction should be nearly 100% accurate, given that NBuffers never
changes at runtime. In comparison, a modulo operation translates into
IDIV, which takes ~26-90 cycles. The invariant method relies on common
ALU operations that don't block the pipeline and should allow better
instruction-level parallelism.
[1] https://gmplib.org/~tege/divcnst-pldi94.pdf
1 parent e9649ad commit ca79be4
1 file changed
Lines changed: 55 additions & 2 deletions
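The two cases in the commit message could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `FastMod` struct and the `fastmod_init` and `fastmod` helpers are hypothetical names, and the sketch assumes a 32-bit dividend and a non-zero 32-bit divisor. The non-power-of-two path follows the unsigned division scheme from Figure 4.1 of the cited paper (one 32x32 -> 64-bit multiply plus shifts), then recovers the remainder as n - q * d.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; precomputed once for an invariant divisor. */
typedef struct FastMod
{
    uint32_t divisor;   /* invariant divisor d, 1 <= d < 2^32 */
    uint32_t mask;      /* d - 1, only meaningful when d is a power of two */
    uint32_t magic;     /* m' from Figure 4.1 of the paper */
    uint8_t  sh1;       /* min(l, 1), where l = ceil(log2(d)) */
    uint8_t  sh2;       /* max(l - 1, 0) */
    uint8_t  is_pow2;   /* nonzero when d is a power of two */
} FastMod;

/* One-time setup; the divisor must not change afterwards. */
static void
fastmod_init(FastMod *fm, uint32_t d)
{
    int l = 0;

    assert(d != 0);

    /* l = ceil(log2(d)) */
    while (((uint64_t) 1 << l) < d)
        l++;

    fm->divisor = d;
    fm->is_pow2 = (d & (d - 1)) == 0;
    fm->mask = d - 1;
    /* m' = floor(2^32 * (2^l - d) / d) + 1; the product fits in 64 bits */
    fm->magic = (uint32_t) ((((uint64_t) 1 << 32) *
                             (((uint64_t) 1 << l) - d)) / d + 1);
    fm->sh1 = (uint8_t) (l < 1 ? l : 1);
    fm->sh2 = (uint8_t) (l > 1 ? l - 1 : 0);
}

/* Computes n % d without a division instruction. */
static inline uint32_t
fastmod(const FastMod *fm, uint32_t n)
{
    uint32_t t1;
    uint32_t q;

    /* Power of two: keep only the low bits. */
    if (fm->is_pow2)
        return n & fm->mask;

    /* High 32 bits of the 64-bit product m' * n. */
    t1 = (uint32_t) (((uint64_t) fm->magic * n) >> 32);
    /* q = floor(n / d); t1 <= n, so the sum cannot overflow. */
    q = (t1 + ((n - t1) >> fm->sh1)) >> fm->sh2;
    /* Remainder: n - q * d. */
    return n - q * fm->divisor;
}
```

In a clock-sweep setting, one would presumably call `fastmod_init` once with the buffer count at startup and then call `fastmod` on the advancing counter. The branch on `is_pow2` always goes the same way for a given divisor, which is why it should predict well.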