visionbook/part_foundation_image_processing.qmd at main · Invinsible-Coder/visionbook · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Foundations of Image Processing {#part-image_processing}

Many of the techniques in computer vision are rooted in signal processing. The goal of this collection of chapters is to introduce the most important signal processing concepts and tools that any computer vision researcher should have in their toolbox. We will present those concepts from the perspective of images while keeping in mind that the goal of computer vision is to build representations that are useful for downstream tasks. We do not assume any prior knowledge about signal processing.

## Outline

- **Chapter @sec-linear_image_filtering** introduces signal processing, linear image filtering, and convolutions.

- **Chapter @sec-fourier_analysis** describes the Fourier transform and some applications.

## Notation

- Images: We will use the $\ell$ symbol to denote images (i.e., $\ell$ight), but also arbitrary signals that correspond to physical quantities (such as sounds).

- Discrete signals: We will index specific values using $\ell \left[ n\right]$ or $\ell \left[ n, m \right]$ for 1D and 2D discrete signals/images, where $n$ and $m$ are discrete spatial indices. We will use bold fonts when signals are treated as vectors: ${\boldsymbol\ell}$. Vectors will be column vectors, and their transpose is ${\boldsymbol\ell}^T$.

- Continuous signals: We will use parentheses for continuous signals, $\ell(t)$, where $t$ is a continuous variable.

- Functions: We will mostly use $f$ to denote functions. We will write inputs and outputs as $\ell_{\text{in}}$ and $\ell_{\text{out}}$, analogously to the notation used in the neural network chapters, resulting in the notation $\ell_{\text{out}} = f \left( \ell_{\text{in}} \right)$.

- Convolution kernels: $h\left[ n\right]$ or $h \left[ n, m \right]$.

- Discrete Fourier transforms: We will use capital letters: ${\mathcal{L}} \left[ u\right]$, where $u$ is a discrete frequency index of the discrete Fourier transform of $\ell \left[ n \right]$.

- Convolution operator, $\circ$, and cross-correlation operator, $\star$, as in $\ell_1 \left[ n\right] \circ \ell_2 \left[ n\right]$.