forked from Foundations-of-Computer-Vision/visionbook
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathspatial_filter_sets.qmd
More file actions
645 lines (491 loc) · 36.8 KB
/
spatial_filter_sets.qmd
File metadata and controls
645 lines (491 loc) · 36.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
# Filter Banks {#sec-filter_banks}
## Introduction
Although linear filters can perform a wide range of operations as we
have seen in previous chapters, image understanding requires nonlinear
operators. In this chapter we will study how sets of filters can be used
to extract information about images. We will focus on some traditional
families of filters that have been used to build image representations
in the early days of computer vision and that help to understand part of
the representational power of modern deep neural networks.
We will show how simple nonlinearities applied to the output of linear
filters (such as the squaring nonlinearity) can be used to build useful
image representations.
## Gabor Filters
In @sec-fourier_analysis we discussed several useful image representations: representing an image in the frequency domain and decomposing it into amplitude and phase, and also the analysis of image content across different scales and orientations. The Fourier transform is a tool that allows us to extract that information, but only globally. For this information to be useful it needs to be localized. For instance, the analysis of orientations of local image structures can be done using image gradients (@sec-image_derivatives), which are localized in space. In fact, we made use of image derivatives in @sec-simplesystem to recover the three-dimensional (3D) structure of a scene. Here we will discuss another family of filters that has a long history of being used for image analysis: Gabor filters.
@fig-1D_gabor_function (a) shows a one-dimensional (1D) sine function. This function has infinite support. When used as a representation, the Fourier coefficients are obtained as the scalar product between the sine wave and the input image. Each Fourier coefficient involves a point-wise multiplication between the wave and the input image and then the sum of the result. As a consequence of the infinite support of the wave, one single Fourier coefficient does not tell us anything about what happens locally inside the analyzed image.
A good start for a localized image analysis is to restrict the spatial support of a sinusoidal basis function. One function that has a local support (a lot of nice properties as we discussed in @sec-blur_filters) is the Gaussian function (@fig-1D_gabor_function [b]). The product of the Gaussian and the sine wave (or a cosine wave) gives the function with the profile shown in @fig-1D_gabor_function (c) called a Gabor filter. Gabor functions were introduced by Dennis Gabor in 1946 @Gabor1946TheoryOC.
{#fig-1D_gabor_function width="100%"}
Gabor functions, originally proposed in 1D, were made popular as an
operator for image analysis by Goesta Granlund in his paper *In Search
of a General Picture Processing Operator* @GRANLUND1978155.
:::{.column-margin}
Gabor functions are localized in both the spatial and
frequency domain. One remarkable property of the Gabor function is that
it is the optimal function that achieves the best simultaneous
localization in space and frequency.
:::
Gabor functions can be extended to two dimensions by using 2D Gaussians and waves. We can also use complex waves (@sec-fourier_analysis) as they will give us a more general form of the Gabor function.
We can multiply a complex Fourier
basis function, $\exp{ \left(j \left(u_0 x + v_0 y \right) \right)}$,
by a spatially localized 2D Gaussian window,
$\exp{\left(-\frac{x^2 + y^2}{2 \sigma^2} \right) }$ to obtain a Gabor
function, $\psi(x,y)$:
$$\psi(x,y;u_0,v_0) = \frac{1}{2\pi \sigma^2} \exp{\left(-\frac{x^2 + y^2}{2 \sigma^2} \right)} \exp{ \left( j \left(u_0 x + v_0 y \right) \right)}
$${#eq-gaborcomplexfilter}
A Gabor function defined as in equation (@eq-gaborcomplexfilter) is a
complex-valued function of location and frequency. We can get the cosine
and sine waves by looking at the real and imaginary components,
$\psi(x,y;u_0,v_0) = \psi_r(x,y;u_0,v_0) + j \psi_i(x,y;u_0,v_0)$ with
$$\psi_r(x,y;u_0,v_0) = \frac{1}{2\pi \sigma^2} \exp{\left(-\frac{x^2 + y^2}{2 \sigma^2} \right)} \cos{ \left(u_0 x + v_0 y \right) }$$
$$\psi_i(x,y;u_0,v_0) = \frac{1}{2\pi \sigma^2} \exp{\left(-\frac{x^2 + y^2}{2 \sigma^2} \right)} \sin{ \left(u_0 x + v_0 y \right) }$$
Where $\psi_r(x,y;u_0,v_0)$ and $\psi_i(x,y;u_0,v_0)$ are the cosine and
sine phase Gabor filters. @fig-gabors shows the Gaussian window and the
real and imaginary parts of the Gabor function.
{#fig-gabors width="100%"}
The Fourier transform of a complex Gabor function is a Gaussian in the
Fourier domain shifted with respect to the origin:
$$\Psi(u,v; u_0,v_0) = \exp{ \left( - \left((u-u_0)^2 + (v-v_0)^2 \right) \frac{\sigma^2}{2} \right) }$$
:::{.column-margin}
The Gabor filter is an oriented band-pass filter.
:::
@fig-gabor_ft shows the magnitude of the Fourier transform of the Gabor filters. @fig-gabor_ft (a) shows the cosine phase Gabor function, $\psi_r(x,y;u_0,v_0)$, and @fig-gabor_ft (b) shows the sine phase Gabor function, $\psi_i(x,y;u_0,v_0)$. @fig-gabor_ft (c) shows the Fourier transform of the cosine phase Gabor function, $\Psi_r(u,v;u_0,v_0)$, and @fig-gabor_ft (d) shows the Fourier transform of the complex Gabor function, $\Psi(u,v;u_0,v_0)$.
{width=100% #fig-gabor_ft}
@fig-gabor_ex_ft shows how the Gabor function changes when modifying
its parameters (central frequency, $(u_0,v_0)$, and width, $\sigma$).
The value of $\sigma$ sets the locality of the window of analysis and
the values of $(u_0,v_0)$ adjust the orientation of the Gabor function
and frequency. A large $\sigma$ makes the spatial extend large and the
frequency extend small. A small value of $\sigma$ has the opposite
behavior. Analyzing the image with a set of Gabor functions with a large
$\sigma$ is like computing the Fourier transform of the image.
{width=100% #fig-gabor_ex_ft}
The sine phase Gabor function is zero mean. However, the cosine phase
Gabor function is not zero mean. The Gaussian width $\sigma$ has to be
sufficiently large so that the Gabor function behaves like a zero mean
filter. All the previous definitions are given in the continuous domain.
The discrete version of the Gabor function is obtained by sampling the
continuous functions.
One important characteristic of Gabor filters is that they are very
similar to the shape of some cortical receptive fields found in the
mammalian visual system. This provides a hint that we're on the right
track with these filters to build image representations.
:::{.column-margin}
Remember the description of simple and complex cells in the visual system from @sec-challenge_of_vision.
:::
The convolution of the Gabor function with an image, $\ell(x,y)$ results
in an output image that depends on both space, $(x,y)$, and frequency,
$(u,v)$: $$
\ell_{\psi} \left(x,y,u,v \right) = \ell\left(x,y\right) \circ \psi(x,y;u,v)$${#eq-gaborconv}
@fig-gabor_zebra shows the result of the convolution between a picture
and the cosine and sine phase Gabor functions at three different scales
($\sigma = 2,4,8$). In this example, the Gabor filters are tuned to
detect vertical edges. Gabor filters are useful for analyzing line or
edge phase structures in images. But they have many other benefits when
we combine them together in quadrature pairs.
{width=100% #fig-gabor_zebra}
### Quadrature Pairs and the Hilbert Transform
Pairs of filters can be in a relationship to each other that is called
**quadrature phase**. This relationship is useful for many low-level
visual processing tasks. When in quadrature phase, pairs of oriented
filters can measure what is called local oriented energy, can identify
contours, independently of the phase of the contour, and can measure
positional changes very accurately. Let's start with the mathematical
definition of quadrature.
For notational simplicity, we'll describe quadrature pair filters in the
time domain, but they extend naturally to two or more dimensions.
Consider an even symmetric zero-mean signal, $\ell(t)$, and with Fourier
transform $\mathscr{L}(\omega)$, with $\mathscr{L}(0)=0$ because the
signal is zero-mean. In the Fourier domain, two functions,
$\mathscr{L}(\omega)$ and $\mathscr{L}_h(\omega)$, are said to be
Hilbert transform pairs if: $$\mathscr{L}_h(\omega) = \begin{cases}
~ - j \mathscr{L}(\omega) & \quad \text{if } \omega >0 \\
j \mathscr{L}(\omega) & \quad \text{if } \omega <0 \\
\end{cases}$$We now define the complex signal: $$\ell_a(t) = \ell(t) + j \ell_h(t)
$${#eq-analyticsignal}
where $\ell_h(t)$ is the Hilbert transform of
$\ell(t)$. The signal $\ell_a(t)$ is called the **analytic signal** and
its Fourier transform has no negative frequency components.
It is easy to show that
$$\mathscr{L}_a(\omega) = \begin{cases}
~ 2\mathscr{L}(\omega) & \quad \text{if } \omega >0 \\
0 & \quad \text{if } \omega <0 \\
\end{cases}$$
It is interesting to write the complex signal $\ell_a(t)$
in polar form: $$\ell_a(t) = a(t) \exp \left( j \, \theta (t) \right)
$${#eq-analyticsignalpolar} where $a(t)$ is the instantaneous amplitude
(**local amplitude**) and $\theta (t)$ is the instantaneous phase
(**local phase**). This particular representation is common in
communications theory, and it has been used to build image
representations invariant to certain image structures as we will discuss
in the following sections.
The extension of the Hilbert transform to images can be done in several
ways. The most common approach in image processing is to define one
direction, $n$, in the frequency space and then the Hilbert transform is
as follows:
$$
\mathscr{L}_h(\omega_x,\omega_y) = \begin{cases}
~ - j \mathscr{L}(\omega_x,\omega_y) & \quad \text{if } n^T \cdot (\omega_x,\omega_y) > 0 \\
j \mathscr{L}(\omega_x,\omega_y) & \quad \text{if } n^T \cdot (\omega_x,\omega_y) <0 \\
\end{cases}
$$
Two band-pass filters are said to be in quadrature if the impulse
responses are Hilbert transform pairs along some orientation in the
frequency domain. Sine and cosine functions of the same frequency are
Hilbert transform pairs, as are sine and cosine phase Gabor functions,
of the same frequency and Gaussian envelope parameters. Thus, these
filter pairs are also quadrature pairs. When convolving two filters in
quadrature with a signal (or image) the two outputs are also in
quadrature.
The quadrature in the Gabor functions is an approximation that only
holds for $\sigma$ sufficiently large. For small $\sigma$ the filter
does not form a quadrature pair. For large $\sigma$, the real and
imaginary parts of the Gabor filter (cosine and sine phase local
filters) are filters in quadrature with the vector $n=(u_0,v_0)$
pointing in the direction of the central frequency of the Gabor
function.
### Local Amplitude
Let $h(x,y)$ and $q(x,y)$ be band-pass filters in quadrature, and let
$\ell_h(x,y)$ and $\ell_q(x,y)$ be the result of convolving the signal
$\ell(x,y)$ with $h(x,y)$ and $q(x,y)$, respectively. Then, the squared
local amplitude is a measure of the image power within the frequency
bandwidth of the filters in the local neighborhood of $(x,y)$:
$$a^2(x,y) = \ell_h ^2 (x,y) + \ell_q ^2 (x,y)$$ @fig-quad2 shows the steps
to compute the local amplitude of an input image using Gabor filters.
This is like a two-layer convolutional neural network with two channels
in the first layer, a squaring nonlinearity and then followed by a sum
in the second layer. This very simple system can perform a number of
interesting operations.
{width=50% #fig-quad2}
To see how useful the local amplitude is, let's start by computing it
for some simple images. If the input image is a delta,
$\ell(x,y) = \delta(x,y)$ (this is an image with a single bright dot on
it), and the filters $h$ and $q$ are the cosine and sine phase Gabor
filters, then, the local amplitude image is:
$$\begin{split}
a(x,y) & = \sqrt{\ell_h ^2 (x,y) + \ell_q ^2 (x,y)} \\
& = \sqrt{\psi^2_r(x,y;u_0,v_0) + \psi^2_i(x,y;u_0,v_0)} \\
& = \frac{1}{2\pi \sigma^2} \exp{\left(-\frac{x^2 + y^2}{2 \sigma^2} \right)}
\end{split}
$$
The amplitude of the delta function is the Gaussian envelope of the Gabor function. This result is independent of the contrast of the input image. For instance, if the input is $\ell(x,y) = -\delta(x,y)$, the amplitude image does not change (it is **sign invariant**). @fig-quad3 (a) shows an image with two impulses of opposite signs. @fig-quad3 (b) shows the output of the cosine and sine filters, and @fig-quad3 (c) shows the local amplitude $a(x,y)$. The local amplitude is two Gaussians centered on each impulse. Note that the sign of each impulse does not affect the sign or shape of the local amplitude. Therefore, this system is sign invariant.
The sign invariance property is especially useful when using the local amplitude to localize edges in images. @fig-quad3 (d) shows an image composed of several squares. Each square is defined by different polarities with respect to the background. Two of the squares are solid while the other two are only defined by lines. The local amplitude, @fig-quad3 (f), provides a detector for the square boundaries that is invariant to all those changes. The differences between all the squares are encoded in the local phase image as we will discuss in the following section. @fig-gabor_zebra shows the local amplitude signal computed on real images.
{width=100% #fig-quad3}
One property to notice in all these examples, is that, although the
images $\ell_h(x,y)$ and $\ell_q(x,y)$ are band-pass, the amplitude
$a (x,y)$ is a low-pass image.
### Local Phase
The local phase is a measure of angle between the real and imaginary
components (cosine and sine in the case of complex Gabor filters):
$$\theta(x,y) = \angle \left[ \ell_h(x,y) + j \ell_q(x,y) \right]$$
where $\theta (x,y)$ is the instantaneous phase (local phase). This is
another interesting nonlinearity although it is not commonly used in
neural networks. Local image phase has been used to estimate motion
@Fleet89. This information is invariant to local changes of contrast and
it is only sensitive to the local image structure.
For oriented, spatial filters, cycling through the phase of the
quadrature pair of filters can generate motion along the direction of
the phase change @Freeman91b.
### Gabor Filter Bank
Another useful concept is the notion of **filter bank**. A filter banks
is a collection of filters, each tuned to extract a different image
feature, and used to build an image representation.
As shown in @fig-gabor_ft, 2D Gabor filters are selective in spatial
frequency. It is very useful to work with sets of Gabor filters, each
selective to a different spatial frequency so that they cover the full
space of spatial frequencies.
@fig-gabor_rectandpolar_tiles shows two different arrangements of Gabor
filters. @fig-gabor_rectandpolar_tiles (a) shows a set of Gabor filters
sampling the frequency domain using a rectangular grid.
@fig-gabor_rectandpolar_tiles (c) shows the corresponding (cosine) Gabor
kernels. All the functions have the same $\sigma$.
@fig-gabor_rectandpolar_tiles (b) shows a polar arrangement of Gabor
functions, and @fig-gabor_rectandpolar_tiles (d) the spatial kernels.
Here, the Gaussian width $\sigma$ is proportional to the distance
between the central frequency and the origin. This produces filters that
are rotated and scaled versions of each other.
{width=100% #fig-gabor_rectandpolar_tiles}
If we write in detail the convolution from equation (@eq-gaborconv) we
can see that the convolution with a Gabor function is like doing a
Fourier transform of the image after applying to it a Gaussian window:
$$\begin{split}
\ell_{\psi} \left(x,y,u,v \right) & = \iint \ell\left(x',y'\right) \psi(x-x',y-y';u,v) \, dx' dy' = \\
& = \iint \ell\left(x',y'\right) g(x-x',y-y') \exp \left( j \left(u (x-x') + v (y-y') \right) \right) dx' dy' = \\
\end{split}$$ If we extract the term that does not depend on $(x',y')$,
and we also use the symmetry of the Gaussian window, we get the
following expression:
$$\begin{split}
& = \exp \left( j \left(u x + v y \right) \right) \iint \ell\left(x',y'\right) g(x'-x,y'-y) \exp \left(- j \left(u x' + v y' \right) \right) dx' dy'
\end{split}$$
The analogous to the Fourier transform is obtained when we
vary the central frequency of the Gabor function $(u,v)$. This is
usually called the Gabor transform. Note that the Gabor transform is not
self-inverting.
## Steerable Filters and Orientation Analysis
One question that arises with oriented filters is how to change their
orientation. More precisely, given a filter $h(x,y)$, we want to
transform it into a continuous function $h(x,y, \theta)$ of angle
$\theta$. The angle $\theta$ specifies the rotation of the original
filter $h(x,y)$.
In the case of the Gabor filters, equation (@eq-gaborcomplexfilter),
each orientation requires convolving the image with a Gabor function
tuned to that orientation. But do we need to create a new Gabor function
for each orientation, or can we interpolate between a fixed number of
predefined oriented filter outputs? How many orientations need to be
sampled?
:::{.column-margin}
In this section we will study a generalization of the
Nyquist sampling theorem but applied to dimensions different than
space.
:::
We'd like an analog in orientation for the Nyquist sampling theorem in
space: given a certain number of discrete samples in orientation (or
space), can one interpolate between the samples and synthesize what
would have been found from having a filter (or a spatial sample) at some
arbitrary, intermediate orientation? The answer is yes, and the number
of filter samples needed to interpolate depends on the form of the
filter.
In @sec-image_derivatives, we described the simplest example of an oriented filter: a Gaussian derivative. As we discussed in @eq-steerable_derivative_filter, we can synthesize a directional derivative in any direction as a linear combination of derivatives in the horizontal and vertical directions. By the linearity of convolution, that applies to the derivative applied to any filter or image, as well. It can be seen that the **steering equation** for the first derivative of a Gaussian filter is
$$g_{x}(x,y,\theta) = \cos(\theta) g_x(x,y) + \sin(\theta) g_y(x,y)$$
where $g_x$ and $g_y$ are the Gaussian derivatives along $x$ and $y$,
and $g_{x}(x,y,\theta)$ is the derivative along the direction defined by
the angle $\theta$. The interpolation functions are
$k_1(\theta) = \cos(\theta)$ and $k_2(\theta) = \sin(\theta)$.
In fact, all higher order Gaussian derivatives have the same property
but the number of basis filters changes. For instance, for second-order
Gaussian derivatives, the steering equation is: $$\begin{split}
g_{x,x}(x,y,\theta) & = g_{xx}(\cos(\theta) x - \sin (\theta) y, \sin(\theta) x + \cos (\theta)y) = \\
&= \cos^2(\theta) g_{xx}(x,y) + \sin^2(\theta) g_{yy}(x,y) - 2 \cos(\theta) \sin(\theta) g_{xy}(x,y)
\end{split}$$ To interpolate the derivative along any orientation
requires three basis filters and the interpolation functions are
$k_1(\theta) = \cos^2(\theta)$, $k_2(\theta) = \sin^2(\theta)$, and
$k_3(\theta) = -2\cos(\theta)\sin(\theta)$. The minus sign in $k_3$ is
due to the positive direction of $\theta$ to be counter-clockwise.
@fig-steer1 shows examples for the first- and second-order Gaussian
derivatives.
{width=100% #fig-steer1}
This leads to an architecture for processing images with oriented
*steerable* filters shown in @fig-steer1arc. The input images pass
through a set of **basis filters**, then the outputs of those filters
are modulated with a set of **gain maps** (which can be different at
each pixel). Those gain maps adjust the linear combinations of the basis
filters to allow the input filter to be steered to the desired
orientations at each position.
{width=60% #fig-steer1arc}
In the case of oriented Gabor filters it is not possible to reconstruct
exactly any filter orientation by interpolating filter responses. It is
interesting to study the conditions in which interpolation gives exact
results.
### Steering Theorem
Let's make the previous observation more precise and general. How many
basis filters does it take to steer any given filter? You could imagine
that will depend on how sharply oriented the filter is. A circularly
symmetric filter takes just one basis function to synthesize all other
orientation responses, and a very narrow filter will take quite a few.
This is quantified by steering theorems.
Let's consider a filter with impulse response $h(x,y)$. For convenience,
it is better to write the filter response in polar coordinates,
$h(r,\phi)$ The **steering condition** is the requirement that the
rotated filter, $h(r,\phi - \theta)$, be a linear combination of a set
of basis filters that are rotated versions of itself,
$h(r,\phi - \theta_m)$ with $m \in (1,M)$. The steering condition is:
$$h(r,\phi - \theta) = \sum_{m=1}^{M} k_{m}(\theta) h(r,\phi - \theta_m) .
$${#eq-steeringcondition}
If we express the filter to be steered as a Fourier series in angle
(using complex exponentials for notational convenience), we have
$$h(r,\phi) = \sum_{n=-N}^{N} a_n(r) \exp \left( j n \phi \right)
$${#eq-fourierseriesinangle}
Substituting equation (@eq-fourierseriesinangle) into equation
(@eq-steeringcondition), we have an equation for the interpolation
functions, $k_{j}(\theta)$. The steering condition, equation
(@eq-steeringcondition), holds for functions expandable in the form of
equation (@eq-fourierseriesinangle) if and only if the interpolation
functions $k_{j}(\theta)$ are solutions of:
$$\left[
\begin{array}{c}
1 \\
\exp (j \theta) \\
\ldots \\
\exp (j N \theta) \\
\end{array}
\right]
=
\left[
\begin{array}{cccc}
1 & 1 & \ldots & 1 \\
\exp (j \theta_{1}) & \exp (j \theta_{2}) & \ldots & \exp (j \theta_{M}) \\
\vdots & \vdots & \vdots & \vdots \\
\exp (j N \theta_{1}) & \exp (j N \theta_{2}) & \ldots & \exp (j N \theta_{M})
\end{array}
\right]
\left[
\begin{array}{c}
k_{1}(\theta) \\ k_{2}(\theta) \\ \vdots \\ k_{M}(\theta)
\end{array}
\right].
$${#eq-theorembigmatrix}
Let's check this for a simple example. Our derivative of a Gaussian
filter (equation \[@eq-derivate1gauss2dcont\]) is $x$ times a Gaussian.
When changing to polar coordinates $x=r \cos( \theta)$, which gives an
angular distribution times a radially symmetric Gaussian when written in
polar coordinates. This requires two complex exponentials to write (to
create the $\cos (\theta)$ from complex exponentials) and thus requires
two basis functions to steer.
Sometimes its more convenient to think of the filters as polynomials
times radially symmetric window functions (this is the case for
high-order Gaussian derivatives). Then you can show that for an $N$-th
order polynomial with even or odd symmetry $N+1$ basis functions are
sufficient @Freeman91.
Although everything has been derived in the continuous domain,
steerability is a property that still holds after sampling the filter
function. This is because spatial sampling and steerability are
interchangeable: the weighted sum of spatially sampled function is equal
to the spatial sampling of the same weighted sum of continuous basis
functions.
For computational efficiency, it's more convenient to have the basis
filters all be $x-y$ separable functions. In many cases, it's
straightforward to find such basis functions, and where it's not, there
are simple numerical methods to find the best fitting $x-y$ separable
basis set.
### Steerable Quadrature Pairs
As in the case of Gabor filters, it is useful to build quadrature pairs
of steerable filters. Steerable-quadrature filters allow for arbitrary
shifts both in orientation and in phase. We can design such filters. For
instance, let's consider the second-order derivatives of a Gaussian with
$\sigma^2=1/2$ and normalized so that the integral over all the space of
its squared magnitude equals 1: $$\begin{split}
g_{xx}(x,y) &= 0.9213(2x^2-1) \exp \left(-(x^2+y^2) \right) \\
g_{xy}(x,y) &= 1.843 (x y) \exp \left(-(x^2+y^2) \right) \\
g_{yy}(x,y) &= 0.9213(2y^2-1) \exp \left(-(x^2+y^2) \right)
\end{split}$$ It is possible to get a good approximation to its Hilbert
transform using a Gaussian times a third-order odd polynomial. The
approximation to the Hilbert transform of $g_{xx}(x,y)$ is:
$$h_{xx}(x,y) = -0.9780 (-2.254 x+x^3) \exp \left(-(x^2+y^2) \right)$$
where $g_{xx}(x,y)$ and its Hilbert transform $h_{xx}(x,y)$ have the
same spectral content, but the opposite phase. @fig-steerg2h2 analyzes
the quality of the approximation. The figure shows the quadrature pair,
$g_{xx}$ and $h_{xx}$, sampled in space and cropped into a window of
$13\times13$ pixels, and its Fourier transform. The DFT is computed by
zero padding to $256\times256$ pixels. In @fig-steerg2h2 (a) $g_{xx}$ is
the even phase, and in @fig-steerg2h2 (b) $h_{xx}$ is the odd phase.
@fig-steerg2h2 (c) illustrates that the sum of their squares reveals the
square of their Gaussian envelopes. @fig-steerg2h2 (d) illustrates a
section of the magnitude of their Fourier transforms along the $u$ axis
for $v=0$. The $h_{xx}$ is a sampled third-order polynomial
approximation to the Hilbert transform of $g_{xx}$, so their power
spectra may not be exactly the same. The blue line shows the magnitude
of the analytic filter $g_{xx}+jh_{xx}$. It has double amplitude, and
the content for negative frequencies is close to zero.
![Quadrature pair, $g_{xx} \left[n,m \right]$ and $h_{xx} \left[n,m \right]$. The filters are sampled in space and cropped into a window of $13\times13$ pixels.](figures/spatial_filter_sets/steer_quad_aprox_a.png){width=100% #fig-steerg2h2}
To steer $h_{xx}(x,y)$ to an angle $\theta$, this approximation requires
four basis functions, and not just three as for the second-order
derivative of a Gaussian. The other three functions needed are:
$$\begin{split}
h_2(x,y) &= -0.9780(-0.7515+x^2) y \exp \left(-(x^2+y^2) \right) \\
h_3(x,y) &= -0.9780(-0.7515+y^2) x \exp \left(-(x^2+y^2) \right) \\
h_4(x,y) &= -0.9780 (-2.254 y+y^3) \exp \left(-(x^2+y^2) \right)
\end{split}$$
These functions have been optimized in order to be $x-y$ separable. The basis function for steerability are not unique. For instance, figures @fig-steer_quad_basis (a and c) show two basis functions that span all rotations of $g_{xx}$. @fig-steer_quad_basis (a) has separable filters corresponding to $g_{xx}$, $g_{xy}$ and $g_{yy}$, and @fig-steer_quad_basis (c) shows three rotated versions of $g_{xx}$ at 0, 60, and 120 degrees. @fig-steer_quad_basis (c) is a non-separable basis spanning the same space as the filters of @fig-steer_quad_basis (a). Figures @fig-steer_quad_basis (b and d) show two basis functions that span all rotations on $h_{xx}$ spanning the same space as the filters of @fig-steer_quad_basis (b). Spatial scaling of the filters will result in changing $\sigma$.
{width=100% #fig-steer_quad_basis}
Putting it all together, can we compute oriented energy as a function of
angle, for all angles, just from the seven basis filter responses shown
in @fig-steer_quad_basis.
$$E(x,y,\theta) = g_{xx}(x,y,\theta)^2 + h_{xx}(x,y,\theta)^2$$
From the basis filter responses we can form polar plots of the oriented
energy as a function of angle (@fig-multioriflorets). Note some strange
goings on at intersections using the $g_{xx}$, $h_{xx}$ filters. You
might think this is a result of simply not enough angular resolution
from those filters. While it's true that the fourth-order Gaussian
derivatives and their quadrature pairs don't suffer from that problem,
the $g_{xx}$, $h_{xx}$ filters actually do have enough angular
resolution. In this case, the issue is a more subtle one.
{#fig-multioriflorets width=100%}
When there are two oriented structures within the passband of the
quadrature pair filters, the sum of the energies of the individual
structures is not the same as the energy of the sum of the structures.
Because we're squaring to find the energies, the combination of multiple
structures isn't linear. As @fig-multioriflorets (a) shows, when there are
two oriented structures within the passband, when the filter responses
are squared, the convolution in the Fourier domain picks up extra
cross-terms from the one oriented structure interacting with the other,
in addition to the desired term from simply squaring all the frequency
responses individually within the passband. These cross terms show up as
spurious spatial frequencies in the energy term, and we can get rid of
them by spatially low-pass filtering the squared oriented energy
responses. Using the blurred squared basis filter responses, we get much
cleaner oriented energy as a function of angle plots, even with the
$g_{xx}$, $h_{xx}$ filters in the junctions, @fig-multioriflorets (b).
@fig-multioriflorets_examples shows two examples on real images. At
each pixel, the polar plots reveal the local image structure.
{width=100% #fig-multioriflorets_examples}
We can also make steerable filters in three dimensions (3D), allowing us
to analyze medical volumetric data, or to process spatiotemporal volumes
to measure image motion.
## Motion Analysis
Spatiotemporal filter banks are a basic building block to build video
understanding vision systems. In this chapter we will describe some
traditional filters that are similar to filters learned when using
neural networks.
### Spatiotemporal Gabor Filters
Just as we did with Gaussian derivatives, extending Gabor filter for
motion analysis is a direct generalization of the $x-y$ 2D Gabor
function to a $x-y-t$ 3D Gabor function. @fig-spacetimefilts (a) shows a
$x-t$ (cosine and sine) Gabor function in one spatial dimension, and
@fig-spacetimefilts (b) shows a sketch of its Fourier transform. This
function is selective to signals translating to the right with a speed
$v=1$, i.e. $\ell(x-t)$. The red line in @fig-spacetimefilts (b) shows
Dirac line that contains the energy of the moving signal.
In two spatial dimensions, @fig-spacetimefilts (c) shows the sketch of the
Gabor transfer function. Note that the $x-y-t$ Gabor filter is not
selective to velocity. If we have a 2D moving signal
$\ell(x-v_xt, y-v_yt)$, the Fourier transform is contained inside a
Dirac plane. Therefore, there are an infinite number of planes that will
pass by the frequencies of the Gabor filter. All those planes intersect
the red line shown in @fig-spacetimefilts (c). A single Gabor filter
cannot disambiguate the input velocity.
:::{#fig-spacetimefilts}
{width=100% #fig-spacetimefilts}
Space-time Gabor filters. (a) Cosine and sine $x$-$t$ Gabor filter, and (b) the sketch of its transfer function. (c) Sketch of the transfer function of a spatiotemporal Gabor filter in two spatial dimensions ($x$-$y$-$t$). The two planes show examples of spatiotemporal planes that intersect the Gabor filter in the same way.
:::
### Velocity-Tuned Filters {#sec-velocityTunedFilters}
How can we measure input velocity? There are many different approaches in the computer vision community for measuring motion, and we will study them in @sec-motion_estimation. Here we show that it is possible to measure motion even with the simple processing machinery that we've developed so far.
We can use quadrature pairs of oriented filters in space-time to find
motion speed and direction in the video signal. We just need to find the
space-time orientation of strongest response. @fig-spacetimetiles2 (a)
shows a set of Gabor filters sampling the space-time frequency domain.
When the input contains a moving signal, we can use a set of filters to
identify the plane in the Fourier domain that contains the input energy.
@fig-spacetimetiles2 (b) shows the subset of filters that have the
strongest output for a particular input motion.
{width="100%" #fig-spacetimetiles2}
As an illustration, @fig-MT_velocity_tuned shows one possible
architecture to create velocity-selective units. The first layer is
composed by space-time Gabor filters (cosine and sine), which are
frequency-selective units. Here we represent the impulse response of
each filter by a small $x-y-t$ cube. For each quadrature pair we compute
the amplitude. Then amplitude outputs are combined according to form
different planes in the Fourier domain to create velocity-selective
outputs. A normalization layer can be added to normalize the outputs by
dividing every output by the sum of all the amplitudes (not shown). The
full architecture is nonlinear.
Given an input sequence, one can estimate velocity by looking at the
velocity-tuned unit with the strongest response.
{#fig-MT_velocity_tuned}
## Concluding Remarks
In this chapter we have seen the power of using filter banks with some
simple nonlinearities.
Hand-crafted filter banks started as a model of low-level vision
mechanisms in humans, and became the basis to perform many visual tasks
such as texture analysis @RG-Heeger-Bergen95, image segmentation
@Perona91, motion analysis @Heeger92, orientation analysis @Freeman90c,
image denoising @Simoncelli96, and many others.
These approaches had an advantage in that they were based on first
principles and required no training data. However, note that
performances when using hand-crafted architectures were limited. Many of
the architectures we described in this chapter can be seen as precursors
to many of the learning-based architectures we will discuss in
subsequent chapters.
Before we dive into learning-based architectures, we will study
multiscale image pyramids in the following chapter.