Project Backrub: implement a document link graph and incorporate backlinks in score #147

@charlesreid1

Project Backrub is our implementation of a PageRank-like algorithm that scores a document by its number of backlinks (that is, the number of other documents that link to it), which serves as a measure of the document's popularity.

See this comment in dib-lab/copper#305:

The graph concept could also be extended and utilized by centillion, for example to enhance the ranking system (with a graph where nodes are documents and edges are interlinked documents, highly linked-to documents receive higher weighting in centillion)...

This helps centillion move toward a high-level view of documents in the Data Commons, and it should improve centillion's ability to retrieve the most relevant results by taking the number of backlinks to a document into account. (BackRub was the original name of the search engine project for which the PageRank algorithm was developed.)

However, linking this idea of the graph structure to centillion... would be centric to the documents indexed by centillion (i.e., it would be restricted to a particular folder hierarchy).

The idea is to assemble a graph of documents in the search index (node = document indexed by centillion, directed edge = link from document A to document B), compute the in-degree of each node in the graph (number of documents that link to a given document), and store this in the search index, for use in the scoring mechanism.
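The in-degree computation described above could be sketched roughly as follows. This is a minimal illustration, not centillion's actual code: the function names `compute_in_degrees` and `backlink_boost`, the shape of the `links` mapping, and the logarithmic boost formula are all assumptions made for the example. Links pointing outside the index are ignored, which matches the restriction to documents indexed by centillion.

```python
import math


def compute_in_degrees(links):
    """Return the in-degree of every indexed document.

    `links` is a hypothetical mapping: document ID -> iterable of the
    document IDs that the document links to. Only the keys of `links`
    are treated as indexed documents (the nodes of the graph).
    """
    indexed = set(links)
    in_degree = {doc: 0 for doc in indexed}
    for source, targets in links.items():
        # set() so that repeated links from one document count once
        for target in set(targets):
            # skip self-links and links to documents outside the index
            if target in indexed and target != source:
                in_degree[target] += 1
    return in_degree


def backlink_boost(base_score, in_degree, weight=0.1):
    """Scale a base relevance score by a damped function of in-degree.

    The log1p damping and the `weight` parameter are assumptions chosen
    for illustration; any monotonic function of in-degree could be used.
    """
    return base_score * (1.0 + weight * math.log1p(in_degree))


if __name__ == "__main__":
    example_links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a", "external-page"],
        "d": ["c", "c"],
    }
    degrees = compute_in_degrees(example_links)
    for doc in sorted(degrees):
        print(doc, degrees[doc], backlink_boost(1.0, degrees[doc]))
```

The resulting in-degree values would be stored alongside each document in the search index, so that the boost can be applied at query time without rebuilding the graph.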
