Welcome to my GitHub profile. I work on digital humanities, text analysis, and cultural infrastructure.
Most of my work focuses on computational text processing, topic modelling, and public archival systems, with a particular interest in low-resource languages like Bengali.
My current projects explore how machine learning techniques such as Latent Dirichlet Allocation (LDA), named entity recognition (NER), and keyword extraction can be adapted for literary corpora, cultural datasets, and historical archives. I work extensively with Bengali texts, using natural language processing (NLP) to model themes, classify documents, and build interpretive tools for researchers and students.
- anvay: A web-based topic modelling tool for Bengali text corpora. Supports custom preprocessing, visualisations, and interpretive reports. Built with Python, Gensim, Flask, and Plotly.
- gridOCR: A desktop OCR tool for digitising historical printed books and periodicals.
- Bengali NLP | Topic Modelling | Text Mining
- Digital Humanities in India | Cultural Analytics | Archive Infrastructure
- Corpus Linguistics | Open-source Research Tools | Visualisation of Text Data