kaloffl.github.io/0010.html at main · Kaloffl/kaloffl.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="alternate" type="application/atom+xml"  title="Atom Feed" href="feed.xml">
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>But not that volatile - Lars' Website</title>
  </head>

  <body>

    <header>
      <nav>
        <h1><a href="index.html">Lars' website</a></h1>
        <a href="archive.html" title="archive">Archive</a>
        <a href="feed.xml" title="feed">Feed</a>
      </nav>
    </header>

    <article>
      <header>
        <h2>But not that volatile</h2>
        <div>First published: <time>2022-08-27</time></div>
      </header>

      <section>
        <p>
A recent <a target="_blank" href="https://nitter.net/AllenWebster4th/status/1562894505174790144">discussion on twitter</a> inspired me to look through the uses of <code>volatile</code> in my code.
And while a search for this keyword in my desktop code didn't turn up anything meaningful, there were quite a few hits in code for embedded micro-controllers.
After looking through this code I came to the conclusion that even there the use of this keyword is not ideal.
In this post I'll pick out one example to show why.
        </p>
      </section>

      <section>
        <h3>Why use <code>volatile</code> in the first place?</h3>
        <p>
In the uses I found, the <code>volatile</code> keyword is most often placed as a type qualifier in front of global variables.
The goal is to force the compiler to never cache these variables in registers and re-read them from memory whenever they're used.
This might seem very inefficient but has a good reason:
in code that makes use of interrupts you have to life in constant fear that at any point during the execution, the program might stop whatever it is doing, jump to a a different function (the ISR), modify variables, and then return to what it was doing before.
Of course we know what variables might get modified, since we can look into the ISR, so really we just have to be careful when interacting with those variables.
And one easy solution is to tell the compiler to always read the freshest state of those variables from memory, problem solved!
Keep in mind though that volatile doesn't make those reads atomic, so if you're on a 8-bit processor and read a 16-bit value you'll need additional protection.
        </p>
      </section>

      <section>
        <h3>The example</h3>
        <p>
The example I've chosen is from a sensor that measures rotational speed and direction via a <a target="_blank" href="https://en.wikipedia.org/wiki/Incremental_encoder#Quadrature_outputs">quadrature encoder</a>.
The encoder is connected to a 8-bit micro-controller where it triggers an interrupt whenever it detects a certain fixed increment of rotation.
This means, the faster the rotation, the more interrupts we get per second, until at some point the micro-controller can't keep up and misses interrupts.
So it is important to figure out the highest reasonable rotational speed and design around that.
The interrupt increments a counter, the value of which we want to send out roughly every 10ms over some bus to another chip.
With the highest possible speed the counter would get thousands of increments in that time, so a 16-bit variable would be necessary.
To keep the sensor-ISR as fast as possible, we implemented a 2-step approach: the sensor-ISR uses a single byte counter, and a timer-ISR periodically comes by, adds the count to a 16-bit variable and clears the counter.
To deal with the atomicity problem mentioned above, a ping-pong-buffer is used: one buffer for reading one for writing, switching around every 10ms.
Here is the relevant code, you can also check it out in the <a target="_blank" href="https://godbolt.org/z/a4b5sG5EM">Compiler Explorer</a>:
        </p>
        <code>
          <pre>
<span>#include &lt;avr/io.h&gt;</span>
<span>#include &lt;avr/interrupt.h&gt;</span>
<span></span>
<span>// Global variables</span>
<span>volatile int16_t ticks_accumulator[2] = {};</span>
<span>volatile int8_t  ticks = {};</span>
<span>volatile uint8_t timer_counter = 0;</span>
<span>volatile uint8_t send_ticks = 0;</span>
<span></span>
<span>// Pin interrupt triggered by external sensor</span>
<span>ISR(INT0_vect) {</span>
<span>    int8_t sign = -1;</span>
<span>    if ((PIND >> PD0) & 1) sign = -sign;</span>
<span>    if ((PIND >> PD1) & 1) sign = -sign;</span>
<span>    ticks += sign;</span>
<span>}</span>
<span></span>
<span>// Timer interrupt triggered every 0.624ms</span>
<span>ISR(TIMER2_COMP_vect) {</span>
<span>    uint8_t index = (timer_counter & 0x10) ? 1 : 0;</span>
<span>    ticks_accumulator[index] += ticks;</span>
<span>    ticks = 0;</span>
<span>    </span>
<span>    // every 16 times (9.984ms)</span>
<span>    if ((timer_counter & 0x0F) == 0x0F) {</span>
<span>        send_ticks = 1;</span>
<span>    }</span>
<span>    ++timer_counter;</span>
<span>}</span>
<span></span>
<span>// unimportant</span>
<span>void setup();</span>
<span>void send(int16_t *data);</span>
<span></span>
<span>// Main entry point into code</span>
<span>int main() {</span>
<span>    setup();</span>
<span>    while (1) {</span>
<span>        if (send_ticks) {</span>
<span>            int16_t values;</span>
<span>            send_ticks = 0;</span>
<span></span>
<span>            uint8_t index = (timer_counter & 0x10) ? 1 : 0;</span>
<span>            values = ticks_accumulator[index];</span>
<span>            ticks_accumulator[index] = 0;</span>
<span>            send(&values);</span>
<span>        }</span>
<span>    }</span>
<span>}</span>
          </pre>
        </code>
        <p>
As a side-note: operations like <code>+=</code> and <code>++</code> on <code>volatile</code> variables have been deprecated in C++20, hence the warnings in Compiler Explorer.
        </p>
        <p>
A good example why putting <code>volatile</code> directly on the variable is not ideal can be seen with <code>timer_counter</code> in the timer ISR.
Here is the assembly:
        </p>
        <code>
          <pre>
__vector_9:                     // ISR(TIMER2_COMP_vect) {
        in   r18,   __RAMPZ__
        push r18
        push r24
        push r25
        push r30
        push r31
        <b>lds  r24,   timer_counter</b>
        lds  r18,   ticks
        bst  r24,   4              // timer_counter & 0x10
        clr  r30
        bld  r30,   0              // ? 1 : 0
        ldi  r31,   0
        lsl  r30
        rol  r31
        subi r30,   lo8(-(ticks_accumulator)) //
        sbci r31,   hi8(-(ticks_accumulator)) // ticks_accumulator[index]
        ld   r24,   Z
        ldd  r25,   Z+1
        add  r24,   r18           //
        adc  r25,   __zero_reg__  // +=
        sbrc r18,   7
        dec  r25
        std  Z+1,   r25
        st   Z,     r24
        sts  ticks, __zero_reg__  // ticks = 0;
        <b>lds  r24,   timer_counter</b>  //
        andi r24,   lo8(15)        // if ((timer_counter & 0x0F) == 0x0F) {
        cpi  r24,   lo8(15)        //
        breq .L7
.L6:
        <b>lds  r24,           timer_counter</b>  //
        subi r24,           lo8(-(1))      // ++timer_counter;
        sts  timer_counter, r24            //
        pop  r31
        pop  r30
        pop  r25
        pop  r24
        pop  r18
        out  __RAMPZ__,     r18
        reti
.L7:
        ldi  r24,        lo8(1) //
        sts  send_ticks, r24    // send_ticks = 1;
        rjmp .L6
          </pre>
        </code>
        <p>
Great, for some reason the compiler refuses to use the <code>bst</code> instruction to directly write a 0 or 2 byte offset, instead it writes 0 or 1 and then does a 16-bit shift.
Compilers love leaving jokes like these around the machine-code, so you should definitely read through it from time to time to have a laugh!
        </p>
        <p>
But back on topic.
What I actually wanted to discuss is the fact that this code loads <code>timer_counter</code> three times.
No other function in this program modifies this variable, so it can only be changed here at the end of the interrupt.
But since the variable itself is marked as <code>volatile</code>, the compiler dutifully generated loads for every place the variable is used.
        </p>
        <p>
The reason the variable was marked as <code>volatile</code> in the first place is to make sure that <code>main</code> doesn't cache it.
But the heavy-handed <code>volatile</code> applies everywhere, causing unnecessary overhead.
The only place that such a catch-all <code>volatile</code> makes sense is for variables that are changed not by code but by the hardware itself:
for example the <code>PIND</code> register used in the sensor interrupt.
This register changes depending on the voltage on a hardware pin, which truly could change at any time during execution.
        </p>
      </section>
      <section>
        <h3>The alternative</h3>
        <p>
So what we really want is not a volatile type, but a way to force the compiler to generate load instructions.
A way to achieve this is to cast the variable in question to be volatile in those places where we want to guarantee that it is loaded from memory.
All we need is a simple macro:
        </p>
        <code>
          <pre>
#define force_load(x) (*(volatile __typeof(x)*)&(x))
          </pre>
        </code>
        <p>
It takes the address of the thing we want to read, casts it to be volatile and then dereferences it.
Simple as that.
        </p>
        <p>
And with that we can remove the volatile declarations from the global variables and instead use <code>force_load</code> in all the places that need it.
        </p>
        <code>
          <pre>
<span>#include &lt;avr/io.h&gt;</span>
<span>#include &lt;avr/interrupt.h&gt;</span>
<span></span>
<span>#define force_load(x) (*(volatile __typeof(x)*)&(x))</span>
<span></span>
<span>// Global variables</span>
<span>int16_t ticks_accumulator[2] = {};</span>
<span>int8_t  ticks = {};</span>
<span>uint8_t timer_counter = 0;</span>
<span>uint8_t send_ticks = 0;</span>
<span></span>
<span>// Pin interrupt triggered by external sensor</span>
<span>ISR(INT0_vect) {</span>
<span>    int8_t sign = -1;</span>
<span>    if ((PIND >> PD0) & 1) sign = -sign;</span>
<span>    if ((PIND >> PD1) & 1) sign = -sign;</span>
<span>    ticks += sign;</span>
<span>}</span>
<span></span>
<span>// Timer interrupt triggered every 0.624ms</span>
<span>ISR(TIMER2_COMP_vect) {</span>
<span>    uint8_t index = (timer_counter & 0x10) ? 1 : 0;</span>
<span>    ticks_accumulator[index] += ticks;</span>
<span>    ticks = 0;</span>
<span>    </span>
<span>    // every 16 times (9.984ms)</span>
<span>    if ((timer_counter & 0x0F) == 0x0F) {</span>
<span>        send_ticks = 1;</span>
<span>    }</span>
<span>    timer_counter += 1;</span>
<span>}</span>
<span></span>
<span>// unimportant</span>
<span>void setup();</span>
<span>void send(int16_t *data);</span>
<span></span>
<span>// Main entry point into code</span>
<span>int main() {</span>
<span>    setup();</span>
<span>    while (1) {</span>
<span>        if (force_load(send_ticks)) {</span>
<span>            int16_t values;</span>
<span>            send_ticks = 0;</span>
<span></span>
<span>            uint8_t index = (force_load(timer_counter) & 0x10) ? 1 : 0;</span>
<span>            values = force_load(ticks_accumulator[index]);</span>
<span>            ticks_accumulator[index] = 0;</span>
<span>            send(&values);</span>
<span>        }</span>
<span>    }</span>
<span>}</span>
          </pre>
        </code>
        <p>
You might have noticed, that only the <code>main</code> function makes use of this new macro.
The reason is, that the interrupts have no way to cache variables long-term, so they have to reload the values every time they are called anyways.
And since this chip doesn't support nested interrupts, there is no way that for example the sensor ISR could be called while the timer ISR is running.
        </p>
        <p>
And to prove that the use of this macro is necessary, here is part of the main function with and without the macro:
        </p>
        <code>
          <pre>

main:                                       │  main:
    ...                                     │          ...
    ldi r16,lo8(send_ticks)                 │  .L10:
    ldi r17,hi8(send_ticks)                 │          lds r24,send_ticks
    ...                                     │  .L9:
.L9:                                        │          tst r24
    movw r30,r16                            │          breq .L9
    ld r24,Z      /* Z = ((r31&lt;&lt;8)|r30) */  │          sts send_ticks,__zero_reg__
    tst r24                                 │          ...
    breq .L9                                │          rjmp .L10
    sts send_ticks,__zero_reg__             │
    ...                                     │
    rjmp .L9                                │
          </pre>
        </code>
        <p>
Yup, without the forced load the compiler decided that if the variable is 0 once, it'll always be 0, no need to check again (though the <code>tst</code> instruction is repeated every time).
        </p>
        <p>
Oh and one last thing: the timer ISR that we looked at earlier now indeed keeps the value of <code>timer_counter</code> in a register after loading it just once.
As a consequence though, one more register is used in the ISR, which meas one more <code>push</code> and <code>pop</code> pair, which are just as expensive as the memory loads.
In fact, the function is now one instruction longer.
If it is slower or not I can't say for sure, for that I would have to either measure it on real hardware or at least disassemble the actual binary after the linker got to play with it.
Instead I'll leave you with the final verdict about volatile:
It's coarse and rough and irritating and it gets everywhere.
        </p>
      </section>

      <section>
        <h2>Change-log</h2>
        <ul>
          <li>2022-08-27 - Initial post.</li>
        </ul>
      </section>

    </article>

    <footer>
      <a href="changelog.html">Change-log</a>
    </footer>
  </body>
</html>