📝 Persian Text Processing with Parsivar

This project demonstrates how to process Persian (Farsi) text using the Parsivar NLP library. It includes text normalization, tokenization, stemming, and spell checking, with additional tools to handle proper display of Persian characters.

🔍 Features

✅ Normalization – Cleans and standardizes Persian text.
✅ Tokenization – Splits text into sentences and words.
✅ Stemming – Converts words to their root forms.
✅ Spell Checking – Detects and corrects misspellings in Persian.
✅ Display Support – Uses arabic_reshaper and python-bidi to fix RTL display issues.

🧰 Libraries Used

parsivar – NLP tools for Persian.
arabic_reshaper – For reshaping characters to correct forms.
python-bidi – Ensures proper display of RTL scripts like Persian.
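The display fix that arabic_reshaper and python-bidi provide can be sketched as follows. This is a minimal sketch. The helper name to_display is hypothetical and is not part of either library. arabic_reshaper.reshape() joins letters into their contextual forms, and get_display() from python-bidi reorders each line for a left-to-right terminal. If either library is missing, the sketch returns the text unchanged.

```python
"""Prepare Persian text for terminals that do not shape or reorder RTL text."""


def to_display(text: str) -> str:
    """Reshape letters and reorder each line for display (hypothetical helper).

    Assumption for this sketch: if arabic_reshaper or python-bidi is not
    installed, the text is returned unchanged instead of raising an error.
    """
    try:
        import arabic_reshaper
        from bidi.algorithm import get_display
    except ImportError:
        return text
    # get_display() handles one line at a time, so reorder line by line;
    # empty lines are passed through untouched.
    return "\n".join(
        get_display(arabic_reshaper.reshape(line)) if line else line
        for line in text.split("\n")
    )
```

Call it only when printing, for example print(to_display(sentence)). Keep the original string for further processing, because the reordered output is no longer in logical (reading) order.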

📌 How It Works

Read Persian text from a .txt file.
Normalize the text using Parsivar.
Tokenize the normalized text into words and sentences.
Apply stemming to get root forms of words.
Use spell correction on custom input.
Display reshaped output for better readability in terminals.
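The steps above can be sketched as follows. This is a minimal sketch that assumes Parsivar's Normalizer, Tokenizer and FindStems classes, together with their normalize(), tokenize_sentences(), tokenize_words() and convert_to_stem() methods. The helper names read_persian_text(), process_text() and build_parsivar_tools() and the ProcessedText container are hypothetical. The Parsivar objects are passed into process_text() instead of being created inside it. This lets the order of the steps be checked with simple stand-ins, and Parsivar's models are loaded only once.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, List


@dataclass
class ProcessedText:
    """Results of each pipeline stage (hypothetical container)."""
    normalized: str
    sentences: List[str]
    words: List[str]
    stems: List[str]


def read_persian_text(path: str) -> str:
    # Read explicitly as UTF-8 so the platform default encoding is not used;
    # "utf-8-sig" also strips a byte-order mark that some Windows editors add.
    return Path(path).read_text(encoding="utf-8-sig")


def process_text(text: str, normalizer: Any, tokenizer: Any, stemmer: Any) -> ProcessedText:
    """Normalize first, then tokenize the normalized text, then stem each word."""
    normalized = normalizer.normalize(text)
    sentences = tokenizer.tokenize_sentences(normalized)
    words = tokenizer.tokenize_words(normalized)
    stems = [stemmer.convert_to_stem(word) for word in words]
    return ProcessedText(normalized, sentences, words, stems)


def build_parsivar_tools():
    """Create the Parsivar objects. Parsivar is imported lazily, so the
    helpers above can be used without it installed."""
    from parsivar import FindStems, Normalizer, Tokenizer
    return Normalizer(), Tokenizer(), FindStems()
```

For example, normalizer, tokenizer, stemmer = build_parsivar_tools() followed by process_text(read_persian_text("input.txt"), normalizer, tokenizer, stemmer). Here input.txt is a placeholder file name.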

🚀 Usage

Install dependencies:

pip install parsivar arabic_reshaper python-bidi

or, using the project's requirements file:

pip install -r requirements.txt
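A quick way to confirm that the installation worked is sketched below. This is a minimal sketch with hypothetical helper names. The python-bidi package is imported as bidi, so the check maps each pip name to its import name before looking it up with importlib.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> module name used in "import" (they differ for python-bidi)
REQUIRED_MODULES: Dict[str, str] = {
    "parsivar": "parsivar",
    "arabic_reshaper": "arabic_reshaper",
    "python-bidi": "bidi",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```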

In the code, the text is first normalized, then tokenized, and then stemmed. Spell checking comes last. It needs two extra resource files, which you must download and place in Parsivar's resource folder as follows:

First, create a spell folder in this path (the Windows layout of a virtual environment named venv):
venv\Lib\site-packages\parsivar\resource

Then place these two files in the spell folder:
- onegram.pckl
- mybigram_lm.pckl
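Before you call the spell checker, you can confirm that the two files are in the right folder. The sketch below is hypothetical, and its helper names are not part of Parsivar. It assumes that Parsivar reads the files from resource/spell inside the installed package, as described above. It locates that package with importlib instead of a hard-coded venv path, so it also works outside a Windows virtual environment. The correction itself calls Parsivar's SpellCheck().spell_corrector().

```python
from importlib.util import find_spec
from pathlib import Path
from typing import List, Optional

SPELL_FILES = ("onegram.pckl", "mybigram_lm.pckl")


def parsivar_spell_dir() -> Optional[Path]:
    """Return <site-packages>/parsivar/resource/spell, or None if Parsivar is absent."""
    spec = find_spec("parsivar")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at parsivar/__init__.py inside the installed package
    return Path(spec.origin).parent / "resource" / "spell"


def missing_spell_files(spell_dir: Path) -> List[str]:
    """List which of the two required resource files are not present."""
    return [name for name in SPELL_FILES if not (spell_dir / name).is_file()]


def correct_spelling(text: str) -> str:
    """Check the resource files, then run Parsivar's spell corrector."""
    spell_dir = parsivar_spell_dir()
    if spell_dir is None:
        raise RuntimeError("parsivar is not installed")
    missing = missing_spell_files(spell_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {spell_dir}: {', '.join(missing)}")
    from parsivar import SpellCheck
    return SpellCheck().spell_corrector(text)
```

Failing early with a clear message about the missing file is usually easier to debug than an error raised while Parsivar loads its models.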

🔽 Download the two files from here

🎥 Preview

📳 Technology

Python
NLTK
Parsivar
python-bidi
arabic_reshaper
