Text Based Search Engine

Boolean and Vector Space (Mixed) Model for Information Retrieval

Retrieval Model:

This repository contains an implementation of the Vector Space Model of information retrieval, combined with Boolean retrieval. Data is read from a .csv (comma-separated values) file. Words are stored in an inverted index built from nested maps. Candidate documents are selected using the Boolean Retrieval Model and are then ranked using the Vector Space Model, based on their tf-idf scores.
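The nested-map index and the tf-idf ranking step can be sketched as follows. This is a minimal illustration, not the repository's actual code: the names InvertedIndex, addDocument and rankByTfIdf are hypothetical, and the weighting (raw term frequency times log(N/df), summed over the query terms) is an assumed, simplified variant of tf-idf that skips the vector length normalization a full cosine-similarity ranking would apply.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: term -> (document id -> term frequency in that document).
using InvertedIndex = std::map<std::string, std::map<int, int>>;

// Record every token of one document in the index.
void addDocument(InvertedIndex& index, int docId,
                 const std::vector<std::string>& tokens) {
    for (const std::string& token : tokens) {
        ++index[token][docId];
    }
}

// Score each candidate document as the sum of tf * idf over the query terms
// and return (document id, score) pairs sorted from best to worst.
// Assumed weighting: idf = log(totalDocs / documentFrequency).
std::vector<std::pair<int, double>> rankByTfIdf(
        const InvertedIndex& index,
        const std::vector<std::string>& queryTerms,
        const std::set<int>& candidates,
        std::size_t totalDocs) {
    std::map<int, double> scores;
    for (int docId : candidates) {
        scores[docId] = 0.0;
    }
    for (const std::string& term : queryTerms) {
        InvertedIndex::const_iterator it = index.find(term);
        if (it == index.end()) {
            continue;  // term does not occur in the collection
        }
        const std::map<int, int>& postings = it->second;
        const double idf = std::log(static_cast<double>(totalDocs) /
                                    static_cast<double>(postings.size()));
        for (const std::pair<const int, int>& posting : postings) {
            std::map<int, double>::iterator s = scores.find(posting.first);
            if (s != scores.end()) {
                s->second += posting.second * idf;
            }
        }
    }
    std::vector<std::pair<int, double>> ranked(scores.begin(), scores.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<int, double>& a,
                        const std::pair<int, double>& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

In this sketch the candidate set would come from the Boolean retrieval step, so only documents that already satisfy the query are scored.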

Library Used:

Boost library for Tokenization

ADDITIONAL FEATURES:

When a query is entered, the system returns a set of closely matching results as document IDs, ranked according to the Vector Space Model using tf-idf scores. Select one of them to proceed to document retrieval. If the query is mistyped, the search engine suggests alternative queries, which are generated with an edit distance algorithm.
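The query suggestion step can be sketched with the classic Levenshtein edit distance. This is an assumption about the approach: the README only says "edit distance algorithm", and the function names editDistance and suggest, the vocabulary parameter and the maxDistance threshold are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;  // distance from the empty prefix of a
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Return vocabulary terms within maxDistance of the word, closest first.
// Terms at the same distance keep their vocabulary order.
std::vector<std::string> suggest(const std::string& word,
                                 const std::vector<std::string>& vocabulary,
                                 std::size_t maxDistance) {
    std::vector<std::pair<std::size_t, std::string>> hits;
    for (const std::string& term : vocabulary) {
        const std::size_t d = editDistance(word, term);
        if (d <= maxDistance) {
            hits.push_back(std::make_pair(d, term));
        }
    }
    std::stable_sort(hits.begin(), hits.end(),
                     [](const std::pair<std::size_t, std::string>& x,
                        const std::pair<std::size_t, std::string>& y) {
                         return x.first < y.first;
                     });
    std::vector<std::string> result;
    for (const std::pair<std::size_t, std::string>& hit : hits) {
        result.push_back(hit.second);
    }
    return result;
}
```

In such a design the vocabulary would typically be the set of terms already present in the inverted index, so every suggestion is guaranteed to match at least one document.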

INSTRUCTIONS ON QUERY FORMAT:

1) Enter words separated by & to return the set of documents that contain all of the entered words (conjunction).
2) Enter words separated by ' ' (space) to return the set of documents that contain at least one of the entered words (disjunction).
3) The characters '-', '!' are treated as separators, so the words obtained by splitting on them are searched conjunctively.
4) Boolean operators have no relative precedence: a query is evaluated strictly from left to right.
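The query rules above can be sketched as a single left-to-right pass over the query string. This is an illustrative sketch, not the repository's parser: the names Postings and evaluateQuery are hypothetical, and it assumes that any run of separators containing '&', '-' or '!' means AND (so "a & b" and "a&b" behave the same), while a run of spaces alone means OR.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Hypothetical postings table: term -> ids of the documents that contain it.
using Postings = std::map<std::string, std::set<int>>;

// Evaluate a query strictly left to right, with no operator precedence.
std::set<int> evaluateQuery(const std::string& query, const Postings& postings) {
    std::set<int> result;
    bool haveResult = false;  // becomes true after the first word
    bool pendingAnd = false;  // operator to apply to the next word
    std::string word;

    // Combine the documents of the current word into the running result.
    auto flush = [&]() {
        if (word.empty()) {
            return;
        }
        std::set<int> docs;
        Postings::const_iterator it = postings.find(word);
        if (it != postings.end()) {
            docs = it->second;
        }
        if (!haveResult) {
            result = docs;
            haveResult = true;
        } else {
            std::set<int> merged;
            if (pendingAnd) {
                std::set_intersection(result.begin(), result.end(),
                                      docs.begin(), docs.end(),
                                      std::inserter(merged, merged.begin()));
            } else {
                std::set_union(result.begin(), result.end(),
                               docs.begin(), docs.end(),
                               std::inserter(merged, merged.begin()));
            }
            result.swap(merged);
        }
        word.clear();
        pendingAnd = false;
    };

    for (char c : query) {
        if (c == ' ' || c == '&' || c == '-' || c == '!') {
            flush();
            if (c != ' ') {
                pendingAnd = true;  // '&', '-' and '!' all request conjunction
            }
        } else {
            word += c;
        }
    }
    flush();
    return result;
}
```

Under these assumptions, "cat&dog bird" is read as (cat AND dog) OR bird, while "cat bird&dog" is read as (cat OR bird) AND dog.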

OUTPUT FORMAT:

The program prints the matching documents in ranked order, from the highest-matching to the lowest-matching.
