Jahnavi-S-24/NLP-Keyword-Extraction

Keyword_extraction_using_NLP

This project demonstrates how to extract keywords from raw text using three popular NLP techniques:

  • 🔒 TF-IDF (Term Frequency-Inverse Document Frequency)
  • 🧠 RAKE (Rapid Automatic Keyword Extraction)
  • 🔗 TextRank (Graph-based ranking algorithm)

Developed entirely in Google Colab using Python and nltk, rake_nltk, sklearn, and networkx.

🚀 Project Objective

To compare different keyword extraction methods and understand how each performs on a sample corpus of text. This is useful for applications in:

  • Search engine optimization
  • Summarization tools
  • Content classification
  • Information retrieval

📊 Methods & Workflow

  1. Data Input: Raw text string (manually input).
  2. Preprocessing: Tokenization, stopword removal.
  3. TF-IDF: Ranks words by how often they appear in a document, weighted down by how common they are across documents.
  4. RAKE: Extracts keyword phrases based on word co-occurrence and frequency.
  5. TextRank: Builds a graph of words and uses PageRank to find the most relevant ones.

πŸ› οΈ Libraries Used

  • nltk
  • rake_nltk
  • sklearn
  • networkx
  • matplotlib

📓 Notebook


✨ Output

Each method extracts a ranked list of keywords from the same input text.
Because all three methods run on the same text, their rankings can be compared side by side to see how each technique interprets "importance."
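
One simple way to put a number on that comparison is the overlap between the keyword sets each method returns. The sketch below is an assumption for illustration, not part of the notebook: the function names `jaccard` and `compare_methods` are hypothetical, and Jaccard overlap is just one possible measure.

```python
def jaccard(a, b):
    """Overlap of two keyword lists: 0.0 means disjoint, 1.0 means identical sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def compare_methods(results):
    """Pairwise Jaccard overlap for a {method_name: keyword_list} mapping."""
    names = sorted(results)
    return {
        (x, y): jaccard(results[x], results[y])
        for i, x in enumerate(names)
        for y in names[i + 1 :]
    }
```

Since RAKE returns multi-word phrases while TF-IDF and TextRank typically return single words, RAKE's phrases would need to be split into words before an overlap score like this is meaningful.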

