The Fourier Transform is just one of many face recognition methods developed over the last 25 years. Compared to the machine learning approach, it is a very simple and fast algorithm: it extracts the frequency features of a face rather than analyzing the image pattern with a convolutional network. The main idea is to find the most variant frequencies across the image database and identify faces by matching these frequencies.
The face on the left is used as input in this algorithm. The predicted face is on the right:
The formula of the Fourier Transform says that an image of size N x M can be decomposed into frequencies along two directions, u and v. u corresponds to the horizontal direction, v to the vertical direction, and x and y are the pixel coordinates along them.
Euler's formula just says that each frequency component is made of a cos wave and a sin wave, written as a complex number: the cos is the real part and the sin is the imaginary part.
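Written out (this is the standard 2D discrete Fourier Transform, which the two paragraphs above describe), the transform of an N x M image f(x, y) is:

$$F(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x, y)\, e^{-i 2\pi \left(\frac{ux}{N} + \frac{vy}{M}\right)}$$

and Euler's formula splits each exponential into its cos (real) and sin (imaginary) parts:

$$e^{-i\theta} = \cos\theta - i\sin\theta$$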
The math may seem complicated, but the two formulas express a simple idea: an image is composed of various frequencies. Here are some examples of the Fourier Transform:
| 5 horizontal waves | 10 horizontal waves | 15 diagonal waves |
|---|---|---|
| ![]() | ![]() | ![]() |
| FFT of 5 horizontal waves | FFT of 10 horizontal waves | FFT of 15 diagonal waves |
| ![]() | ![]() | ![]() |
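The single-wave panels are easy to reproduce. Here is a minimal NumPy sketch (the post's own code is Matlab) of the "5 horizontal waves" case: a sinusoid with 5 cycles from top to bottom, whose spectrum contains exactly two bright pixels symmetric about the origin.

```python
import numpy as np

# A sinusoid with 5 full cycles in the vertical direction
# (so its crests are horizontal lines), on a 64 x 64 grid.
N = 64
y = np.arange(N)
img = np.sin(2 * np.pi * 5 * y / N)[:, None] * np.ones((1, N))

# FFT, shifted so the zero frequency (the origin) sits at the center
F = np.fft.fftshift(np.fft.fft2(img))
mag = np.abs(F)

# Find the bright pixels: there should be exactly two,
# at offsets (+5, 0) and (-5, 0) from the origin.
peaks = np.argwhere(mag > mag.max() / 2)
print(peaks - N // 2)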
The images above show that after the Fourier Transform, each frequency becomes two white pixels, symmetric about the origin (0, 0). Higher frequencies lie farther from the origin, and their direction aligns with the wave direction. Overlaying the three images above and taking the Fourier Transform of the result, we get this:
| Three waves combined | FFT of three waves combined |
|---|---|
| ![]() | ![]() |
Take a face image and pad it to 128 x 128 pixels. The Fourier Transform of the image is on the right:
| Input Sample Image | FFT of Sample |
|---|---|
| ![]() | ![]() |
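The padding step can be sketched in NumPy as follows. The face array here is a random stand-in for a real grayscale image, and the 112 x 92 size and top-left placement are assumptions for illustration:

```python
import numpy as np

# Stand-in for a grayscale face image smaller than 128 x 128
# (112 x 92 is a common face-database size, used here as an example).
rng = np.random.default_rng(0)
face = rng.random((112, 92))

# Zero-pad the image into a 128 x 128 frame (top-left placement assumed)
padded = np.zeros((128, 128))
padded[:face.shape[0], :face.shape[1]] = face

# Fourier Transform of the padded image, with the origin centered
F = np.fft.fftshift(np.fft.fft2(padded))
print(F.shape)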
However, this alone is not very helpful: we don't know which frequencies in this image differ most from the rest. We need to apply the Fourier Transform to the whole image database and find the most variant frequencies.
| The variance of frequency in the image database |
|---|
| ![]() |
The plot above shows the variance of each frequency throughout the whole image database. If we decompose this variance plot into its real and imaginary parts, we get the two plots below. (The real part is the cos wave and the imaginary part is the sin wave, as explained above.) We can see something very interesting: two bright diamonds around the origin! Looking closely, the brightest spots are the most variant frequencies, the ones that distinguish the facial features among all the images. They are the feature frequencies!
| real feature frequency | imaginary feature frequency |
|---|---|
| ![]() | ![]() |
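The variance maps can be computed per frequency across the whole stack of spectra. A NumPy sketch, assuming a stack of K padded 128 x 128 images (random stand-ins here), with the maps normalized so fixed thresholds are meaningful:

```python
import numpy as np

# Stand-in database: K padded 128 x 128 grayscale images
rng = np.random.default_rng(1)
K = 10
images = rng.random((K, 128, 128))

# FFT every image, origin centered
spectra = np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1))

# Variance of each frequency coefficient across the database,
# separately for the real (cos) and imaginary (sin) parts
real_var_map = np.var(spectra.real, axis=0)
imag_var_map = np.var(spectra.imag, axis=0)

# Normalize to [0, 1] so fixed thresholds (e.g. 0.7 / 0.05) make sense
real_var_map /= real_var_map.max()
imag_var_map /= imag_var_map.max()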
By setting a threshold for both the real and the imaginary part, we can filter out those feature frequencies:
```matlab
% locate the big variance in both parts of the image data
big_variance_real = real_var_map > 0.7;
big_variance_imag = imag_var_map > 0.05;
```

Plot one quarter of the "diamond" (the lower-right quadrant); a value of 1 marks a most-variant frequency throughout the image database. We can use the two matrices as masks to filter out the feature frequencies.
| Location of real feature frequency | Location of imaginary feature frequency |
|---|---|
After deleting these feature frequencies, the input sample loses many details:
| original input | feature frequencies deleted |
|---|---|
| ![]() | ![]() |
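The "deletion" experiment amounts to zeroing the masked coefficients and inverting the transform. A NumPy sketch, where the image and the mask are made-up stand-ins (the real mask comes from the variance thresholds above):

```python
import numpy as np

# Stand-in input image and a hypothetical mask of feature frequencies
rng = np.random.default_rng(2)
img = rng.random((128, 128))
big_variance_real = np.zeros((128, 128), dtype=bool)
big_variance_real[60:68, 60:68] = True   # made-up region near the origin

# Delete the feature frequencies and reconstruct the image
F = np.fft.fftshift(np.fft.fft2(img))
F[big_variance_real] = 0
back = np.fft.ifft2(np.fft.ifftshift(F)).real

# The mean absolute difference measures the detail lost by the deletion
print(np.abs(img - back).mean())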
The algorithm matches only the frequencies at these positions. The distances between the feature vector of the input face and those of all faces in the image database are calculated; the smallest distance gives the closest match.
```matlab
function x = feature_distance(A, B)
    % The L1 distance between the feature vectors
    M = abs(A - B);
    x = sum(sum(M));
end
```

The Matlab code takes a sample face as input, finds the closest distance in the feature space, and outputs that face from the image database. I found that the real-part frequencies matter more than the imaginary ones; matching on them gives above 95% accuracy. So I tried rotating and flipping the input image and checked the accuracy again.
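The matching loop can be sketched in NumPy as follows. `features_db` and `query` are random stand-ins for the masked FFT coefficients of the database faces and the input face; `feature_distance` mirrors the Matlab function above (summed absolute difference, i.e. the L1 distance):

```python
import numpy as np

def feature_distance(A, B):
    # Same as the Matlab version: sum of absolute differences
    return np.abs(A - B).sum()

# Stand-in feature database: 40 faces, 200 feature frequencies each
rng = np.random.default_rng(3)
features_db = rng.random((40, 200))

# Query: a slightly noisy copy of face #17
query = features_db[17] + 0.01 * rng.random(200)

# Nearest neighbor in feature space is the predicted identity
distances = [feature_distance(query, f) for f in features_db]
best = int(np.argmin(distances))
print(best)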
| original input # 17 | # 17 flipped | # 17 rotated 90° |
|---|---|---|
| ![]() | ![]() | ![]() |
| output: real 17, imag 17 | output: real 17, imag 9 | output: real 37, imag 29 |
| avg acc > 95% | avg acc > 90% | avg acc < 10% |
This algorithm does not seem very robust to rotations, but I doubt people would rotate their head 90 degrees when taking pictures. ¯\\\_(ツ)\_/¯