Playmaker

This project is a search engine that implements web crawling, indexing, query processing, phrase searching, and ranking, with a web interface for submitting queries.

Built With

Java, Spring Boot, React

Demo Video

Playmaker-SearchEngine.mp4

Search Engine Modules

Web Crawler

  • The web crawler is responsible for collecting documents from the web.
  • It starts from a list of seed URLs (the seed set) and downloads the documents they point to.
  • It extracts hyperlinks from each downloaded document and adds them to the list of URLs to download.
  • Key features:
    • Avoids revisiting the same page.
    • Crawls only documents of specific types (HTML).
    • Maintains state for resuming interrupted crawls.
    • Respects robots.txt exclusions.
    • Provides a multithreaded implementation.
    • Crawls a specified number of pages.
    • Uses appropriate data structures to control the order in which pages are visited.
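The crawler's core loop can be sketched as follows. This is a minimal Java illustration, not the project's actual code: `fetchLinks` and `robotsAllows` are hypothetical hooks standing in for the HTTP download, HTML link extraction, and robots.txt parsing, and the thread pool's FIFO work queue plays the role of the crawl frontier, which gives roughly breadth-first visit order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch of a multithreaded crawl loop. fetchLinks and robotsAllows are hypothetical hooks:
// fetchLinks downloads an HTML page and returns its outgoing links (empty for non-HTML pages),
// robotsAllows answers whether the site's robots.txt permits crawling the URL.
class CrawlerSketch {
    private final Function<String, List<String>> fetchLinks;
    private final Predicate<String> robotsAllows;
    private final Set<String> visited = ConcurrentHashMap.newKeySet();   // URLs already seen
    private final Queue<String> crawled = new ConcurrentLinkedQueue<>(); // URLs actually fetched
    private final AtomicInteger budget;                                  // pages left to crawl
    private final Phaser pending = new Phaser(1);                        // the one initial party is the caller
    private final ExecutorService pool;

    CrawlerSketch(Function<String, List<String>> fetchLinks, Predicate<String> robotsAllows,
                  int maxPages, int threads) {
        this.fetchLinks = fetchLinks;
        this.robotsAllows = robotsAllows;
        this.budget = new AtomicInteger(maxPages);
        // The pool's internal FIFO queue acts as the frontier, giving roughly breadth-first order.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Crawls from the seed set until the page budget is spent or no unvisited links remain.
    List<String> crawl(List<String> seeds) {
        seeds.forEach(this::schedule);
        pending.arriveAndAwaitAdvance(); // blocks until every scheduled fetch has finished
        pool.shutdown();
        return new ArrayList<>(crawled);
    }

    private void schedule(String url) {
        if (!robotsAllows.test(url) || !visited.add(url)) return; // excluded or already seen
        if (budget.getAndDecrement() <= 0) return;                 // page limit reached
        pending.register(); // register before submitting so crawl() cannot return early
        pool.execute(() -> {
            try {
                List<String> links = fetchLinks.apply(url);
                crawled.add(url);
                for (String link : links) schedule(link); // children register before this task deregisters
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }
}
```

For example, `new CrawlerSketch(fetch, robots, 100, 8).crawl(seeds)` would crawl at most 100 pages on 8 threads; each instance is single-use because `crawl` shuts its pool down. Resuming an interrupted crawl would additionally require persisting the visited set and the unfinished URLs, which this sketch omits.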

Indexer

  • Indexes the contents of downloaded HTML documents.
  • Features:
    • Persists the index in secondary storage.
    • Supports fast retrieval of the documents that contain a given query word.
    • Updates incrementally as new documents are crawled.
    • Stores the information needed to rank and search results.
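A positional inverted index is one common structure that meets these requirements; the sketch below assumes that design, since the project's actual storage format is not described here. It maps each word to the documents containing it and the word's positions in each, keeps a forward index so a re-crawled page can be re-indexed incrementally, and uses Java serialization as a simple stand-in for secondary storage. The tokenizer is a simplistic assumption.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Positional inverted index sketch: word -> (document URL -> positions of the word in that document).
class InvertedIndex implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    private final Map<String, Set<String>> docWords = new HashMap<>(); // forward index for re-indexing

    // Simplistic tokenizer (an assumption): lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Adds a newly crawled document, replacing any older version of the same URL.
    void addDocument(String url, String text) {
        removeDocument(url);
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), w -> new HashMap<>())
                    .computeIfAbsent(url, u -> new ArrayList<>())
                    .add(pos); // positions are appended in increasing order
        }
        docWords.put(url, new HashSet<>(tokens));
    }

    void removeDocument(String url) {
        Set<String> words = docWords.remove(url);
        if (words == null) return;
        for (String word : words) {
            Map<String, List<Integer>> docs = postings.get(word);
            docs.remove(url);
            if (docs.isEmpty()) postings.remove(word);
        }
    }

    // Looking up a word is a single hash-map access.
    Map<String, List<Integer>> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }

    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    static InvertedIndex load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (InvertedIndex) in.readObject();
        }
    }
}
```

Serialization rewrites the whole index on every save, so a database or a custom on-disk format would scale better; the in-memory layout is the part this sketch is meant to show.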

Query Processor

  • Processes search queries.
  • Preprocesses the query and searches the index for relevant documents.
  • Retrieves documents containing words that share a stem with the query words (for example, a search for "travel" also matches "traveling").
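Shared-stem matching works by applying the same preprocessing to documents and queries. The sketch below assumes a hypothetical stem-to-documents map and uses a deliberately crude suffix-stripping stemmer and a tiny stop-word list as stand-ins; a real implementation would more likely use an established algorithm such as the Porter stemmer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class QueryProcessorSketch {
    // Tiny illustrative stop-word list (an assumption; real lists are much longer).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "in", "is", "of", "on", "or", "the", "to");
    // Suffixes tried in order; longer ones ("ers", "es") come before their shorter tails ("er", "s").
    private static final List<String> SUFFIXES = List.of("ing", "ers", "er", "ed", "es", "s");

    // Crude suffix stripper standing in for a real stemmer; keeps at least three characters.
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Same preprocessing for queries and documents: lowercase, tokenize, drop stop words, stem.
    static List<String> preprocess(String text) {
        List<String> stems = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) stems.add(stem(token));
        }
        return stems;
    }

    // Adds a document to the hypothetical stem -> documents index.
    static void addToIndex(Map<String, Set<String>> stemIndex, String url, String text) {
        for (String s : preprocess(text)) stemIndex.computeIfAbsent(s, k -> new TreeSet<>()).add(url);
    }

    // Returns every document containing a word that shares a stem with some query word.
    static Set<String> search(String query, Map<String, Set<String>> stemIndex) {
        Set<String> results = new LinkedHashSet<>();
        for (String s : preprocess(query)) results.addAll(stemIndex.getOrDefault(s, Set.of()));
        return results;
    }
}
```

With this stemmer, "traveling", "travelers", and "travel" all reduce to "travel", so a query for any of them retrieves documents containing the others.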

Phrase Searching

  • Supports phrase searching when the query is enclosed in quotation marks.
  • Returned documents must contain the phrase's words in the same order as the phrase.
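Given a positional index (word to document to sorted word positions), a phrase matches a document when each successive phrase word appears exactly one position after the previous one. The sketch below assumes that index layout and that phrase words were normalized the same way as indexed words; it is one straightforward way to enforce word order, not necessarily the project's.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PhraseSearchSketch {
    // Returns the documents in which the phrase words occur consecutively and in order.
    static Set<String> phraseSearch(List<String> phrase, Map<String, Map<String, List<Integer>>> postings) {
        Set<String> results = new TreeSet<>();
        if (phrase.isEmpty()) return results;
        Map<String, List<Integer>> firstWordDocs = postings.getOrDefault(phrase.get(0), Map.of());
        for (Map.Entry<String, List<Integer>> doc : firstWordDocs.entrySet()) {
            for (int start : doc.getValue()) {
                if (matchesAt(phrase, postings, doc.getKey(), start)) {
                    results.add(doc.getKey());
                    break; // one occurrence of the phrase is enough
                }
            }
        }
        return results;
    }

    // True if phrase word i occurs at position start + i in the document, for every i >= 1.
    private static boolean matchesAt(List<String> phrase, Map<String, Map<String, List<Integer>>> postings,
                                     String url, int start) {
        for (int i = 1; i < phrase.size(); i++) {
            List<Integer> positions = postings.getOrDefault(phrase.get(i), Map.of()).get(url);
            // Positions are assumed sorted, which makes binary search valid.
            if (positions == null || Collections.binarySearch(positions, start + i) < 0) return false;
        }
        return true;
    }
}
```

For a document "search engine design", the phrase ["search", "engine"] matches because "engine" sits at position 1 right after "search" at 0, while ["engine", "search"] does not.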

Ranker

  • Ranks documents based on relevance and popularity.
  • Computes relevance from the occurrences of the query words in each document, aggregated across all query words.
  • Measures popularity with a link-analysis algorithm such as PageRank.
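The two ranking signals can be sketched as below. This assumes tf-idf for relevance, which the project may or may not use, and the standard damped PageRank computed by power iteration; the damping factor and iteration count are caller-chosen parameters (0.85 is the customary damping value). How the project combines relevance and popularity into one score is not specified here, so the sketch leaves that step out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class RankerSketch {
    // Relevance (assumed tf-idf): sum over query words of
    // (occurrences in the document / document length) * ln(total documents / documents containing the word).
    static double tfIdf(List<String> queryWords, Map<String, Integer> wordCounts, int docLength,
                        Map<String, Integer> docFrequency, int totalDocs) {
        double score = 0;
        for (String w : queryWords) {
            int tf = wordCounts.getOrDefault(w, 0);
            int df = docFrequency.getOrDefault(w, 0);
            if (tf == 0 || df == 0) continue; // word absent from this document or from the corpus
            score += ((double) tf / docLength) * Math.log((double) totalDocs / df);
        }
        return score;
    }

    // Popularity: PageRank by power iteration. links.get(i) lists the pages that page i links to.
    // Rank held by pages with no out-links (dangling pages) is spread evenly over all pages.
    static double[] pageRank(List<List<Integer>> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0;
            for (int i = 0; i < n; i++) {
                List<Integer> out = links.get(i);
                if (out.isEmpty()) {
                    danglingMass += rank[i];
                    continue;
                }
                for (int j : out) next[j] += rank[i] / out.size();
            }
            for (int j = 0; j < n; j++) {
                next[j] = (1 - damping) / n + damping * (next[j] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }
}
```

Redistributing the dangling mass keeps the ranks summing to 1 after every iteration, so scores stay comparable across crawls of different sizes.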

Web Interface

  • Implements a user-friendly web interface.
  • Receives user queries and displays the search results with snippets.
  • For each result, displays the page title, the URL, and a relevant paragraph with the query words in bold.
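Snippet generation with bold query words can be sketched as follows, assuming the page text has already been split into paragraphs and the query words are known. Picking the first paragraph that contains a query word is a simplifying assumption; the sketch HTML-escapes the page text before adding `<b>` tags so page content cannot inject markup into the results page.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SnippetSketch {
    // Returns the first paragraph containing a query word, HTML-escaped, with every
    // whole-word, case-insensitive match wrapped in <b> tags; returns "" if nothing matches.
    static String snippet(List<String> paragraphs, List<String> queryWords) {
        if (queryWords.isEmpty()) return "";
        StringJoiner alternatives = new StringJoiner("|");
        for (String w : queryWords) alternatives.add(Pattern.quote(escape(w)));
        Pattern p = Pattern.compile("\\b(" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        for (String para : paragraphs) {
            Matcher m = p.matcher(escape(para));
            if (m.find()) return m.replaceAll("<b>$1</b>"); // replaceAll resets the matcher first
        }
        return "";
    }

    // Minimal HTML escaping of the characters that could open tags or attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because matching uses word boundaries, a query for "web" bolds "Web" but leaves "webs" untouched; with stemming enabled, the stemmed variants would also need to be passed in as query words.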

How to Run

  1. Clone the repository.
  2. Install the required dependencies.
  3. Run the main application.
  4. Access the React web interface.