A Book Cover (image) Matcher. Extract Book Covers from Image


Project 4: Book Cover Finder

The Usage section is towards the end of this report.

1. Take Test Images

Test images are stored at /images/test_xx.jpg


2. Crawl Book Covers

A crawler is implemented in crawl.py.

The crawler uses the selenium package. It is designed specifically to crawl the product images on an amazon.ca search results page, which looks similar to this:

[Screenshot: amazon.ca search results page]

The crawler targets every product image and downloads the best-quality version of each book cover.

amazon.ca displays around 40-60 products per page, and the crawler processes one page at a time. By giving the crawler different page URLs as input, I used it to download ~200 high-quality book covers, each with a resolution of about 700x1000.

Challenges:
  • The default image source was compressed. By inspecting the page's HTML more closely, I was able to locate the original-quality images.
  • The crawler was occasionally blocked by amazon.ca's protection methods, including reCAPTCHA. Solved by giving the browser a 10-second window for manual input.
  • SSL certificate checks. Solved with the Python certifi package.
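
Putting these pieces together, here is a minimal sketch of the crawling approach (not the actual crawl.py; the CSS selector img.s-image and the use of the srcset attribute to pick the highest-resolution URL are assumptions about the amazon.ca page layout):

# Sketch only: the selector and image attribute are assumptions, not the real crawl.py.
import os
import ssl
import time
import urllib.request

import certifi
from selenium import webdriver
from selenium.webdriver.common.by import By

def crawl_page(url, out_dir="book_covers"):
    os.makedirs(out_dir, exist_ok=True)
    driver = webdriver.Chrome()
    driver.get(url)
    time.sleep(10)  # 10-second window to solve reCAPTCHA manually if it appears

    # certifi's CA bundle makes the downloads pass SSL certificate checks
    ctx = ssl.create_default_context(cafile=certifi.where())

    for i, img in enumerate(driver.find_elements(By.CSS_SELECTOR, "img.s-image")):
        srcset = img.get_attribute("srcset") or img.get_attribute("src")
        best_url = srcset.split(",")[-1].split()[0]  # last srcset entry = highest resolution
        with urllib.request.urlopen(best_url, context=ctx) as resp:
            with open(os.path.join(out_dir, f"cover_{i}.jpg"), "wb") as f:
                f.write(resp.read())

    driver.quit()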

3. Homography Estimation with RANSAC

homography_RANSAC(), which finds the best homography between a book cover and a test image, is implemented in helper.py. The majority of the work is adapted from my A4 submission.

The function can be described with the following pseudocode:

def find_all_matches(SIFT_img1, SIFT_img2):
    """
    helper function for homography_RANSAC(): find matches between two sets of SIFT descriptors
    """
    for each descriptor in SIFT_img1:
        compute distance to all descriptors in SIFT_img2
        if closest distance / second closest distance < 0.8:
            accept this match (closest match)
        else:
            discard this match
    return [all matches found between SIFT_img1 and SIFT_img2]


def homography_RANSAC():
    SIFT_img1, SIFT_img2 <- SIFT 128D descriptors of both images

    # a small optimization to improve runtime from ~1min to ~10s
    while SIFT_img1 or SIFT_img2 has more than 10000 SIFT descriptors detected:
        increase the SIFT detection contrast threshold
        recompute and reassign SIFT_img1 or SIFT_img2

    # start the process
    matches <- find_all_matches(SIFT_img1, SIFT_img2)  # see function above

    for 200 iterations:
        randomly pick 4 matches and compute the homography
        for the remaining matches:
            apply the homography to the img1 key point, compute the distance between the projected point and the real img2 key point
            if distance < some threshold:
                inlier count += 1
            else:
                not an inlier

    after 200 iterations, the homography from the 4 matches with the highest inlier count is the winner
    return the winner!
Challenge:

Since both the book covers and the test images are very high-resolution, one image can have more than 20,000 SIFT key points detected. The time complexity of find_all_matches() is $O(m\times n)$, where $m, n$ are the SIFT descriptor counts of the two images; this can make the function run for more than 1m30s per book cover.

The solution is the small optimization proposed in the pseudocode above: if too many SIFT key points are detected, many of them are noisy and meaningless, so increase the contrast threshold and recompute the SIFT key points. I tried key point count thresholds of 2,000, 5,000, 8,000, and 10,000, and noticed that if fewer than 10,000 key points are detected, the homography computation has worse quality and does not correctly find the book cover in the test image.

Below is an example using 2,000 as the threshold:

[Matching result with a key point threshold of 2,000]

A threshold of 10,000 key points is a nice compromise that ensures high-quality homography computation and an almost 10x runtime improvement (from ~90s to less than 10s). Below is the example using 10,000 as the threshold:

[Matching result with a key point threshold of 10,000]
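
For reference, here is a minimal runnable sketch of the same procedure using OpenCV. The real code lives in helper.py; the contrast-threshold schedule and the 5-pixel inlier threshold shown here are illustrative assumptions, while the 10,000 key point cap, the 0.8 ratio test, and the 200 iterations come from the description above.

# Sketch of the homography_RANSAC() idea with OpenCV; not the exact helper.py code.
import cv2
import numpy as np

def detect_sift(gray, max_keypoints=10000):
    # Raise the contrast threshold until at most max_keypoints survive (the runtime optimization above).
    contrast = 0.04  # OpenCV's default; the 1.5x growth schedule is an assumption
    while True:
        kp, des = cv2.SIFT_create(contrastThreshold=contrast).detectAndCompute(gray, None)
        if des is None or len(kp) <= max_keypoints:
            return kp, des
        contrast *= 1.5

def homography_RANSAC(img1, img2, iters=200, ratio=0.8, inlier_thresh=5.0):
    kp1, des1 = detect_sift(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY))
    kp2, des2 = detect_sift(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY))

    # Lowe's ratio test (find_all_matches in the pseudocode above)
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    matches = [m for m, n in knn if m.distance < ratio * n.distance]
    if len(matches) < 4:
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])

    best_H, best_inliers = None, -1
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(matches), 4, replace=False)
        H, _ = cv2.findHomography(src[idx], dst[idx], 0)  # exact fit from the 4 sampled matches
        if H is None:
            continue
        proj = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H).reshape(-1, 2)
        inliers = int(np.sum(np.linalg.norm(proj - dst, axis=1) < inlier_thresh))
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H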


4. Real Juice: Efficient Retrieval Approach by Nister and Stewenius

In their paper, Nister and Stewenius proposed a Vocabulary Tree approach, which is essentially a hierarchical k-means tree built over the database's 128D SIFT descriptors. It allows very efficient descriptor comparison, finding the closest SIFT descriptor in $O(\log(N))$ time instead of $O(N)$, which is a huge improvement.
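
To put rough numbers on that for this project: with a branching factor of 10 and a 4-layer tree, each query descriptor needs roughly $10 \times 4 = 40$ comparisons against cluster centers (plus a small scan inside the final leaf), instead of a brute-force comparison against all $N \approx 363{,}268$ database descriptors.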

The Python implementation can be found in find_book_cover.ipynb: build_vocabulary_tree(), find_closest_leaf(), and the database image scoring.

Pseudocode of build_vocabulary_tree() and find_closest_leaf():

def build_vocabulary_tree(depth):
    all data are normalized to reduce the effect of SIFT magnitude

    if the bottom of the tree is reached:  # base case
        leaves are SIFT descriptors and their corresponding images
    else:
        define a k-means model with 10 clusters and fit the data set
        children are sub-vocabulary-trees, where each center is a k-means center, and data is assigned to the corresponding cluster based on euclidean distance
        recurse with build_vocabulary_tree(depth - 1) on each k-means cluster


def find_closest_leaf():
    recursively compare euclidean distance to the cluster centers at each level
    if a leaf is reached (real SIFT with corresponding image), return the image  # base case

For each query image, here is how the likelihood score against every database image is computed:

scores <- tracker of scores for each database image
for each descriptor in the test image:
    normalize the descriptor
    winner <- find_closest_leaf(vocabulary_tree, des)
    scores[winner] += 1

the image ranking is based on the highest scores in scores
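
A minimal runnable sketch of this structure follows (the real implementation is in find_book_cover.ipynb; the VocabNode class and the function signatures here are illustrative assumptions, using scikit-learn's KMeans):

# Sketch only: class layout and signatures are assumptions, not the notebook code.
import numpy as np
from sklearn.cluster import KMeans

class VocabNode:
    def __init__(self):
        self.center = None      # k-means center that routes descriptors to this node
        self.children = []      # sub-trees (internal nodes only)
        self.leaf_des = None    # real SIFT descriptors (leaves only)
        self.leaf_imgs = None   # image id of each leaf descriptor

def build_vocabulary_tree(descriptors, image_ids, depth, branching=10):
    # descriptors: (N, 128) float array, L2-normalized; image_ids: (N,) int array
    node = VocabNode()
    if depth == 0 or len(descriptors) < branching:
        node.leaf_des, node.leaf_imgs = descriptors, image_ids
        return node
    km = KMeans(n_clusters=branching, n_init=10).fit(descriptors)
    for c in range(branching):
        mask = km.labels_ == c
        child = build_vocabulary_tree(descriptors[mask], image_ids[mask], depth - 1, branching)
        child.center = km.cluster_centers_[c]
        node.children.append(child)
    return node

def find_closest_leaf(node, des):
    if not node.children:  # leaf: return the image of the nearest stored descriptor
        return node.leaf_imgs[np.argmin(np.linalg.norm(node.leaf_des - des, axis=1))]
    dists = [np.linalg.norm(des - child.center) for child in node.children]
    return find_closest_leaf(node.children[int(np.argmin(dists))], des)

def rank_database(tree, query_descriptors, n_images):
    scores = np.zeros(n_images)
    for des in query_descriptors:
        des = des / (np.linalg.norm(des) + 1e-8)  # normalize, as in the pseudocode
        scores[find_closest_leaf(tree, des)] += 1
    return np.argsort(scores)[::-1]  # image ids, highest score first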

Run time of this project is important, which I will discuss in the Key Findings section below.

For each query image, the top 10 book cover candidates are found and displayed:

[Top 10 candidates for each query image]

3 out of 5 books are successfully matched (at top 1, top 2, and top 5). I will discuss the 2 that did not match later in the Problem section.

From the example below, we can see that Efficient Retrieval is able to find a book that is half covered, as long as the uncovered part has enough distinctive features for the algorithm to match (more on localizing books in sections 5 and 6).

[Half-covered book matched by Efficient Retrieval]

Key Findings:

Humans get bored if they don't click the mouse for five seconds. To avoid boring people, I have added a progress bar with a time estimate, using the tqdm library, to every code block that does not finish instantaneously.

[tqdm progress bar]
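
Wrapping a slow loop in tqdm is a one-line change; for example (illustrative, not the notebook code):

from tqdm import tqdm
import time

# tqdm prints a live progress bar with an ETA for any iterable it wraps
for _ in tqdm(range(200), desc="building vocabulary tree"):
    time.sleep(0.01)  # stand-in for the real work done in each iteration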

If you play with my code, you will find the following non-instantaneous code blocks:

Loading the whole image database. I have 203 images in my database; it takes ~15s to load.

Building the vocabulary tree. You can select how deep the tree goes. The deeper the tree, the more time it takes to build, but a deeper tree makes matching noticeably faster. The paper suggests 6 layers with a branching factor (BF) of 10, which can accommodate at least $10^6 = 1,000,000$ SIFT descriptors. However, since my database is smaller than the paper's (203 images with 363,268 key points), I am building a smaller tree.

Here's the relationship between the number of layers and run time. It reflects "the deeper the tree, the more time to build, but the faster the matching".

Vocabulary Tree Layers | BF | Build Tree (s) | Matching (s)
2                      | 10 | 5              | 33
3                      | 10 | 9.1            | 3
4                      | 10 | 16             | less than 1
5                      | 10 | can't build    | N/A

The number of layers does not affect the matching result.

If the number of layers is $\ge 5$, some nodes at the bottom of the tree have $< 10$ key points, but fitting a 10-cluster k-means requires $\ge 10$ key points, so building the Vocab Tree fails. (With BF 10, a depth-5 tree has on the order of $10^4$-$10^5$ nodes in its lowest levels, and with only 363,268 descriptors in the database some of those nodes inevitably receive fewer than 10 points.) If we instead had a bigger database, we could build a deeper tree, and matching would therefore be even faster.

Problem:

Some books are hard to find, like these two:

[The two hard-to-find covers: SSENSE and STEWART]

A possible reason is that these books lack distinctive patterns (large chunks of white area) and the text font is too thin, so a key point detection method like SIFT either cannot capture the patterns in the text or produces ambiguous key points, leading to a lower similarity score at the end of detection.

Here's SIFT detection of the SSENSE magazine:

[SIFT key points detected on the SSENSE cover]

The model in the center of the magazine cover and the car model at the bottom have some SIFT key points detected. However, most of the key points are detected in the text, which can easily cause ambiguity in matching.


5. Localize Book Covers from Test Image

After Efficient Retrieval has found the top 10 candidate book covers, we can manually select which of the top 10 matches to localize. In the last code cell, candidate = 0 selects the candidate (0-indexed).

This is done by applying homography_RANSAC() from section 3 to match the chosen candidate to the test image.

Potential future work

The algorithm currently compares all SIFT descriptors in the test image against the book cover. However, Efficient Retrieval has already filtered out the group of SIFT descriptors that voted for that book cover. We could use those, instead of all SIFT descriptors from the image, to improve efficiency and localization accuracy.


6. Plot the localized book cover

To sum up and visualize what I did: taking the homography matrix computed in section 5 and applying it to the corner coordinates of the book cover, I am able to localize the book in the test image.
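
A small sketch of this step (the exact notebook code may differ; the OpenCV calls below are one way to do it):

# Sketch: project the cover's corners into the test image and draw the outline.
import cv2
import numpy as np

def draw_localized_cover(test_img, cover_img, H):
    h, w = cover_img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(corners, H)  # cover corners in test-image coordinates
    return cv2.polylines(test_img.copy(), [np.int32(projected)], True, (0, 255, 0), 5)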

Here's a gallery of matching results:

[Gallery: six matching results]


Usage:

  1. Open find_book_cover.ipynb.
  2. Change DATABASE = "book_covers" to the folder path of your image database.
  3. Run the first code block, which contains the imports and all functions.
  4. Run the following two code blocks, which load the database and put together all SIFT descriptors.
  5. Run the next code block to build the Vocab Tree: vocabulary_tree = build_vocabulary_tree(all_SIFT, 4). Adjust 4 to the number of layers; run time is discussed in the section above.
  6. Using an ipynb means the Vocab Tree does not need to be rebuilt for every match.
  7. Change TARGET = "images/test_5.jpg" to the path of your test image, then run the block to match the top 10 from the database.
  8. After that, manually select a candidate from the top 10 matches with candidate = 0 (0-indexed) and run the block to localize it.

Summary

To summarize, this tool can robustly detect and localize book covers in test images, even when a book is half covered. Building a deeper Vocab Tree requires more time but makes matching much more efficient, so in practice it is worth building a deeper Vocab Tree in the offline phase, so that in the online phase the user can match book covers efficiently.
