forked from Foundations-of-Computer-Vision/visionbook
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathcamera_as_linsys.qmd
More file actions
189 lines (127 loc) · 16.8 KB
/
camera_as_linsys.qmd
File metadata and controls
189 lines (127 loc) · 16.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
# Cameras as Linear Systems {#sec-cameras-as-linear-systems}
## Introduction
Lens-based and pinhole cameras are special-case devices where the light
falling on the sensors or the photographic film forms an image. For
other imaging systems, such as medical or astronomical systems, the
intensities recorded by the camera may look nothing like an
interpretable image. Because the optical elements of imaging systems are
often linear in their response to light, it is convenient and powerful
to describe such general cameras using linear algebra, which then
provides a powerful machinery to recover an interpretable image from the
recorded data. This formulation also helps to build intuitions about how
cameras work and it will allow us think about new types of cameras.
## Flatland
For the simplicity of visualization, let's consider one dimensional (1D)
imaging systems. As shown in the sensor is 1D and the scene lives in flatland.
{#fig-flatland_camera_linear width="70%"}
Let the light intensities in the world be represented by a column vector, $\boldsymbol\ell_{\texttt{w}}$. The value of the $n$-th component of $\boldsymbol\ell_{\texttt{w}}$ gives the intensity of the light at position $n$, heading in the direction of the camera. There are a few more things to notice in @fig-flatland_camera_linear. The first one is that the axis in the camera sensor is reversed, consistent with @fig-pinholeGeometry. The value $n$ corresponds to image coordinates and the units are pixels. The second is that we are modeling the scene as being a line with varying albedos at a fixed depth from the camera. If the scene had a more complex spatial structure we would have to use the **light field** in order to use the same linear algebra that we will use in this chapter to model cameras. We will use light fields in @sec-nerfs.
::: {.column-margin}
In this chapter we will approximate the input light by a discrete and finite set of light rays instead of a continuous wave. This approximation allow us to use linear algebra to describe the imaging process.
:::
## Cameras as Linear Systems
As shown in @fig-amats1 (a), the sensor is 1D and the scene lives in flatland. Let the sensor measurements be $\boldsymbol\ell_{\texttt{s}}$ and the unknown scene be $\boldsymbol\ell_{\texttt{w}}$. If the camera sensors respond linearly to the light intensity at each sensor, their measurements, $\boldsymbol\ell_{\texttt{s}}$, will be some linear combination, given by the matrix, $\mathbf{A}$, of the light intensities:
$$
\boldsymbol\ell_{\texttt{s}} = \mathbf{A} \boldsymbol\ell_{\texttt{w}}
$$ {#eq-likelihood}
We represent the camera by matrix $\mathbf{A}$. For the case of a pinhole camera, and assuming 13 pixel sensor observations, the camera matrix is just a 13×13 identity matrix, depicted in @fig-amats1 (b), as each sensor depends only on the value of a single scene element. For the case of conventional lens and pinhole cameras, where the observed intensities, $\boldsymbol\ell_{\texttt{s}}$, are an image of the reflected intensities in the scene, $\boldsymbol\ell_{\texttt{w}}$, then $\mathbf{A}$ is approximately an identity matrix. For more general cameras, $\mathbf{A}$ may be very different from an identity matrix, and we will need to estimate $\boldsymbol\ell_{\texttt{w}}$ from $\boldsymbol\ell_{\texttt{s}}$.
In the presence of noise, there may not be a solution $\boldsymbol\ell_{\texttt{w}}$ that exactly satisfies ast squares sense. In most cases, $\mathbf{A}$ is either not invertible, or is poorly conditioned. It is often useful to introduce a regularizr, an additional term in the objective function to. be minimized @Poggio85. If the regularizaion term favors a smallworld $\boldsymbol\ell_{\texttt{world}}$, then the objective term to minimize, $E$, could be
$$E = \lVert \boldsymbol\ell_{\texttt{s}}- \mathbf{A} \boldsymbol\ell_{\texttt{w}}\rVert ^2 + \lambda \lVert \boldsymbol\ell_{\texttt{w}}\rVert ^2
$$ {#eq-posterior}
The regularization parameter, $\lambda$, determines the trade-off between explaining the observations and satisfying the regularization term.
{#fig-amats1}
Setting the derivative of @eq-posterior of the vector $\boldsymbol\ell_{\texttt{w}}$ equal to zero, we have
$$
\begin{align}
0 &= \bigtriangledown_{\boldsymbol\ell_{\texttt{w}}} \lVert \boldsymbol\ell_{\texttt{s}}- \mathbf{A} \boldsymbol\ell_{\texttt{w}}\rVert ^2 + \bigtriangledown_{\boldsymbol\ell_{\texttt{w}}} \lambda \lVert \boldsymbol\ell_{\texttt{w}}\rVert ^2 \\
&= \mathbf{A}^T \mathbf{A} \boldsymbol\ell_{\texttt{w}}- \mathbf{A}^T \boldsymbol\ell_{\texttt{s}}+ \lambda \boldsymbol\ell_{\texttt{w}}
\end{align}
$$ {#eq-deriv}
or
$$\boldsymbol\ell_{\texttt{w}}= (\mathbf{A}^T \mathbf{A} + \lambda \mathbf{I})^{-1}\mathbf{A}^T \boldsymbol\ell_{\texttt{s}}
$$ {#eq-deriv2}
Where the matrix $\mathbf{B}=(\mathbf{A}^T \mathbf{A} + \lambda \mathbf{I})^{-1} \mathbf{A}^T$ is the regularized inverse of the imaging matrix $\mathbf{A}$.
Next, consider the case of a wide aperture pinhole camera, shown in @fig-amats1 (c). If a single pixel in the sensor plane covers exactly two positions of the
scene intensities, then the geometry is as shown in @fig-amats1 (c). @fig-amats1 (d) also shows the imaging matrix, $\mathbf{A}$, its invere, $\mathbf{A}^{-1}$, and the regularized inverse of the imaging matrix, which will usually give image
reconstructions with fewer artifacts as it is less sensitive to noise.
Note that $\mathbf{A}^{-1}$ contains lots of fast changes that will make
the result very sensitive to noise in the observation. The regularized
matrix, $\mathbf{B}$, shown in the bottom row of @fig-amats1 (d), is better behaved.
The following plot @fig-2pixelwidepinhole shows an example of an input 1D signal, ${\boldsymbol\ell_{\texttt{w}}}$, and the output of the 2-pixel wide pinhole camera, ${\boldsymbol\ell_{\texttt{s}}}$.
{#fig-2pixelwidepinhole}
The output is obtained by multiplying the input vector by the matrix $\mathbf{A}$ shown in he results of adding up
two consecutive input values. The output is nearly identical to the input signal but with a larger magnitude and a bit smoother.
## More General Imagers
Many different optical systems can form cameras and the linear analysis described before can be used to characterize the imaging process. Even a simple edge will do. @fig-amats3 shows two non traditional imaging systems that we will analyze in this section.
{#fig-amats3}
### Edge Camera
Consider the example of s to an **edge camera**. This is not a traditional pinhole camera; instead light is blocked only on one side.
For this camera, the imaging matrix and its inverse are as follows:
$$\mathbf{A} =
\left[
\begin{array}{cccccc}
1 & 1 & 1 & 1 & \dots & 1 \\
0 & 1 & 1 & 1 & ~ & 1 \\
0 & 0 & 1 & 1 & ~ & 1 \\
0 & 0 & 0 & 1 & ~ & 1 \\
\vdots & ~ & ~ & ~ & \ddots & ~ \\
0 & 0 & 0 & 0 & ~ & 1
\end{array}
\right],
\mathbf{A}^{-1} =
\left[
\begin{array}{cccccc}
1 & -1 & 0 & 0 & \dots & 0 \\
0 & 1 & -1 & 0 & ~ & 0 \\
0 & 0 & 1 & -1 & ~ & 0 \\
0 & 0 & 0 & 1 & ~ & 0 \\
\vdots & ~ & ~ & ~ & \ddots & ~ \\
0 & 0 & 0 & 0 & ~ & 1
\end{array}
\right]
$$ {#eq-edge}
If we consider the same input as in the pinhole camera example in the previous section, the output signal for the corner camera will look like @fig-1dcornercamera:
{#fig-1dcornercamera}
The output now looks very different than that of a pinhole camera. In the pinhole camera, the output is very similar to the input. This is not the case here where the output looks like the integral of the input
(reversed along the horizontal axis). Note that the first value of $\boldsymbol\ell_{\texttt{s}}$ is equal to the sum of all the values of $\boldsymbol\ell_{\texttt{w}}$:
$$\ell_{\texttt{s}}\left[0 \right] = \sum_{n=0}^{12} \ell_{\texttt{w}}\left[n \right]$$
@fig-amats3 (b) illustrates the imaging matrix, $\mathbf{A}$, and reconstruction matrices for the imager of @eq-edge. You can think of this imager as computing an integral of the input signal. Therefore, its inverse looks like a
derivative. The regularized inverse looks like a blurred derivative (we will talk more about blurred derivatives in @sec-image_derivatives).
### Pinspeck Camera
@fig-amats3 (c) shows another nontraditional imager. Now, instead of a pinhole we have an occluder. The occluder blocks part of the light (complementary to the large-hole pinhole camera shown in @fig-amats3 (c)). What the camera sensor records is the shadow of the occluder. We can write the imaging matrix
$\mathbf{A}$, which corresponds to 1-$\mathbf{A}_{pinhole}$ as shown in @fig-amats1 (d). @fig-amats3 (d) also shows its inverse and the regularized inverse. This camera is
called a **pinspeck camera** @CohenPinspeck, @Torralba2014 and has also been used in practice.
In the example of @fig-amats3 (c), the occluder has the size of the wide pinhole from @fig-amats1 (c). The following plots show @fig-pinspeck_output_plot (left), the input scene (same as in the previous examples), and @fig-pinspeck_output_plot (right), the output recorded by the camera sensor of the pinspeck camera.
{#fig-pinspeck_output_plot}
The output is now a signal with a wide dynamic range (a max value of 14, which correspond to the sum of the values in
$\boldsymbol\ell_{\texttt{w}}$) and with fluctuations due to the shadow of the occluder on the camera sensor. If there was no occluder, then the output would be a constant signal of value 14. In red we show the effect of the shadow, which is the light missing because of the presence of the occluder. You can see how the missing signal is identical to the output of the two-pixel wide pinhole camera but reversed in sign.
### Corner Camera
To show a real-world example of a more general imager, let us consider the **corner camera** @Bouman17. This is similar to the edge camera of @eq-edge and @fig-amats3, but with slightly more complicated geometry. As shown in @fig-ccmodel (a), a vertical edge partially blocks a scene from view, creating intensity variations on the ground, observable by viewing those intensity variations from around the corner.
::: {#fig-ccmodel layout-ncol=2}
{width="100%" #fig-ccmodel-a}
{width="100%" #fig-ccmodel-b}
The corner camera @Bouman17. (a) Objects, such as the cylinders labeled A and B, hidden behind a corner from a camera nonetheless cause very small intensity differences in the ambient illumination falling on the ground. The corner camera @Bouman17. (b) The spatial mask is multiplied by the observed reflected ground image.
:::
In practice, we will subtract a mean image from our observations of the ground plane, so in the rendering equation below, we will only consider components of the scene that may change over time, under the assumption that what we want to image behind the corner (e.g., a person) is moving. We will call these intensities $S(\phi, \theta)$ (S for the subject), where $\phi$ measures vertical inclination and $\theta$ measures azimuthal angle, relative to the position where the vertical edge intersects the ground plane.
Integrating the light intensities falling on the ground plane, the observed intensities on the ground will be $\boldsymbol\ell_{\texttt{ground}}(r, \theta)$, where the polar coordinates $r$ and $\theta$ are measured with respect to the corner. Assuming Lambertian diffuse reflection from the ground plane, we have for the observed intensities $\boldsymbol\ell_{\texttt{ground}}(r, \theta)$:
$$
\boldsymbol\ell_{\texttt{ground}}(r, \theta) = \int_{\phi=0}^{\phi=\pi} \int_{\xi=0}^{\xi=\theta} \cos(\phi) S(\phi, \xi) \, \mathrm{d} \phi \, \mathrm{d} \xi,
$$ {#eq-corner}
where the $\cos(\phi)$ term in the equation follows from the equation for surface reflection discussed in @eq-lambert.
The dependence of the observation, $\boldsymbol\ell_{\texttt{ground}}$, on vertical variations in $S(\phi, \theta)$ is very weak, just through the $\cos(\phi)$ term. We can integrate over $\phi$ first, to form the 1D signal, $\ell_{\texttt{w}}(\xi)$:
$$\ell_{\texttt{w}}(\xi) = \int_{\phi=0}^{\phi=\pi}
\cos(\phi) S(\phi, \xi) \mbox{d} \phi
$${#eq-xcorner}
Then, to a good approximation, @eq-corner has the form,
$$\boldsymbol\ell_{\texttt{ground}}(r, \theta) \approx \int_{\xi=0}^{\xi=\theta}
\ell_{\texttt{w}}(\xi) \mbox{d} \xi,
$${#eq-corner1d} where $\ell_{\texttt{w}}(\xi)$ is a 1D image of
the scene around the corner from the vertical edge.
@fig-ccmodel shows the corner camera geometry for a three-dimensional (3D) scene. @fig-ccmodel (a) shows two objects, such as the cylinders labeled A and B, hidden behind a corner from a camera. Despite being behind the corner, they cause very small intensity differences in the ambient illumination falling on the ground, observable from the camera as a very small change in the light intensity reflecting from the ground. Can we reconstruct the scene hidden behind the corner using just the intensities observed on the ground? The image reconstruction operation is analogous to that for the edge camera of @fig-amats3: we subtract the reflected intensities observed at one orientation angle, say $\theta_A$ from those observed at another, say $\theta_B$.
If we sample @eq-corner1d in its continuous variables, we can write it in the form $\boldsymbol\ell_{\texttt{ground}} = \mathbf{A} \boldsymbol\ell_{\texttt{w}}$. Solving @eq-deriv2 for the multiplier to apply to $\boldsymbol\ell_{\texttt{ground}}$ to estimate $\boldsymbol\ell_{\texttt{w}}$ yields the form shown in @fig-ccmodel (b). @fig-ccmodel (b) shows the spatial mask to be multiplied by the observed reflected ground image, with the result summed over all spatial pixels in order to estimate the light coming from around the corner at one particular orientation angle relative to the corner, in this case approximately 45 degrees. We see that the way to read the 1D signal from the ground plane is to take a derivative with respect to angle. This makes intuitive sense, as the light intensities on the ground integrate all the light up to the angle of the vertical edge. To find the 1D signal at the angle of the edge, we ask, "What does one pie-shaped ray from the wall see that the pie-shaped ray next to it doesn't see?"
{#fig-cctraces width="90%"}
It can be shown @Bouman17 that the image intensities from around-the-corner scenes introduce a perturbation of about $\frac{1}{1,000}$ to the light reflected from the ground from all sources. By averaging image intensities over the appropriate pie-shaped regions on the ground at the corner (@fig-ccmodel (b)), one can extract a 1D image as a function of time from the scene around the corner. @fig-cctraces shows two 1D videos reconstructed from one (@fig-cctraces (a)) and two (@fig-cctraces (b)) people walking around the corner. By processing videos of very faint intensity changes on the ground, we can infer a 1D video of the scene around the corner. The image inversion formulas were derived using inversion methods very similar to @eq-deriv2. The corner camera is just one of the many possible instantiations of a computational imaging system.
## Concluding Remarks
Treating cameras as general linear systems allows for the machinery of
linear algebra to be applied to camera design and processing. We
reviewed several simple camera systems, including cameras utilizing
pinholes, pinspecks, and edges to form images.