This project demonstrates how to extract keywords from raw text using three popular NLP techniques:
- TF-IDF (Term Frequency-Inverse Document Frequency)
- RAKE (Rapid Automatic Keyword Extraction)
- TextRank (Graph-based ranking algorithm)
The project was developed entirely in Google Colab, using Python with nltk, rake_nltk, sklearn, and networkx.
The goal is to compare these keyword extraction methods and understand how each performs on a sample corpus of text. Keyword extraction is useful in applications such as:
- Search engine optimization
- Summarization tools
- Content classification
- Information retrieval
The workflow has five stages:
- Data input: Raw text string, entered manually.
- Preprocessing: Tokenization, stopword removal.
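- TF-IDF: Scores each word by how often it appears in a document, down-weighted by how many documents contain it, and keeps the top-scoring words.
- RAKE: Splits the text into candidate phrases at stopwords and punctuation, scores each word by its co-occurrence degree relative to its frequency, and ranks phrases by the sum of their word scores.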
- TextRank: Builds a graph of words and uses PageRank to find the most relevant ones.
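The preprocessing and TF-IDF stages above can be illustrated without any dependencies. This is a minimal sketch, not the project's sklearn code: the tiny `STOPWORDS` set and the helpers `preprocess` and `tfidf_keywords` are hypothetical, and because IDF is only meaningful across several documents, the sketch assumes each sentence of the single input string counts as one document. The IDF term uses the same smoothed form as scikit-learn's default, idf = ln((1 + n) / (1 + df)) + 1; dividing term counts by sentence length and keeping each word's best score across sentences are simplifications chosen here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (the project uses nltk for this step).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def preprocess(text):
    """Lowercase, tokenize into runs of ASCII letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_keywords(text, top_k=5):
    """Rank words by TF-IDF, treating each sentence as one document (an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    docs = [preprocess(s) for s in sentences]
    n_docs = len(docs)
    # Document frequency: number of sentences that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = {}
    for doc in docs:
        if not doc:
            continue  # sentence contained only stopwords
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF, the same form as scikit-learn's default.
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1
            # Keep each word's best score over all sentences.
            scores[word] = max(scores.get(word, 0.0), tf * idf)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


sample = "Keyword extraction finds keywords. Keyword extraction is useful. Graphs rank words."
for word, score in tfidf_keywords(sample):
    print(f"{word}: {score:.3f}")
```

In the sample, `keyword` and `extraction` occur in two of the three sentences, so their IDF is lower than that of words found in only one sentence.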
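The RAKE stage can be sketched in plain Python following the standard formulation: candidate phrases are split at stopwords and punctuation, a word's degree is the total length of every candidate phrase it appears in, each word scores degree / frequency, and a phrase scores the sum of its word scores. This is a hypothetical stand-in for `rake_nltk`, with its own small stopword set and hypothetical function names (`candidate_phrases`, `rake_keywords`), so its rankings will not necessarily match the library's.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def candidate_phrases(text):
    """Split text into runs of non-stopwords; punctuation always ends a phrase."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_k=5):
    """Score phrases by the sum of their words' degree/frequency ratios."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs across all phrases
    degree = defaultdict(int)  # total length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    phrase_scores = {
        " ".join(phrase): sum(word_score[word] for word in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(phrase_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


sample = "Keyword extraction is useful for search engine optimization and content classification."
for phrase, score in rake_keywords(sample):
    print(f"{phrase}: {score:.1f}")
```

Because degree grows with phrase length, words that only ever appear inside long phrases score highest, which is why RAKE tends to favour multi-word keyphrases such as "search engine optimization".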
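The TextRank stage can be sketched as follows: words become nodes, an undirected edge joins two distinct words that occur within `window` tokens of each other, and scores are iterated with the TextRank form of PageRank, S(w) = (1 - d) + d * Σ S(v) / deg(v) over the neighbours v of w, with damping d = 0.85. Assumptions in this sketch: the window is applied to the stopword-filtered token stream (so it can cross sentence boundaries), there is no part-of-speech filtering (the original TextRank paper keeps only nouns and adjectives), and `textrank_keywords` is a hypothetical name. The project presumably uses networkx for the graph; here it is built by hand to keep the example dependency-free.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def textrank_keywords(text, top_k=5, window=2, damping=0.85, max_iter=100, tol=1e-6):
    """Rank words with TextRank over an undirected co-occurrence graph."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    # Link every pair of distinct words that appear within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []  # no edges, nothing to rank
    scores = {word: 1.0 for word in neighbors}
    for _ in range(max_iter):
        # TextRank update: S(w) = (1 - d) + d * sum(S(v) / deg(v)) over neighbours v.
        new_scores = {
            word: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            for word in neighbors
        }
        delta = max(abs(new_scores[w] - scores[w]) for w in neighbors)
        scores = new_scores
        if delta < tol:
            break  # scores have converged
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


sample = "Graph ranking. Graph keywords. Graph nodes. Graph edges."
for word, score in textrank_keywords(sample):
    print(f"{word}: {score:.3f}")
```

The sample produces a star-shaped graph: `graph` is linked to every other word, collects rank from all of them, and comes out on top.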
Libraries used: nltk, rake_nltk, sklearn, networkx, and matplotlib.
Each method produces a ranked list of keywords from the same input text, which makes it easy to compare, visually and practically, how the different techniques interpret "importance."