A simple search and scraping tool for Google Scholar.
ScholarScraper is a lightweight Python-based tool designed to extract academic publication data directly from Google Scholar. It provides a simple interface to search for research papers, collect metadata such as titles, authors, publication years, citation counts, and links, and structure them into a usable format for analysis or integration into larger projects.
This project aims to simplify academic data collection and was built for the Intelligent Information Retrieval final exam project at the University of Surabaya.
- Writer-based search – Enter a writer's name to retrieve their papers from Google Scholar.
- Topic-based search – Enter any topic or keyword to retrieve relevant papers from Google Scholar.
- Extract structured data – Automatically fetch and organize paper details:
  - Title
  - Authors
  - Publication year
  - Citation count
  - Source link
- Export capability – Save extracted data into CSV or JSON formats for further processing.
- Lightweight and fast
- The user inputs a search query.
- ScholarScraper sends a formatted request to Google Scholar’s search results page.
- It uses Selenium to load the rendered page and extract structured data (titles, authors, citations, etc.) from its HTML elements.
- Results are stored in a Pandas DataFrame, allowing easy export and analysis.
Evaluating Google Scholar's web page, and how the page works, is crucial for automating information retrieval. The evaluation can be accessed here.
- Python 3.13
- Selenium – for dynamic content scraping
- Pandas – for data organization and export
Warning
This tool is intended for educational and research purposes only. Google Scholar does not provide an official public API, so excessive or automated requests may violate its terms of service. Please use responsibly.
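One way to keep request volume low, in line with the warning above, is to cap the number of result pages and pause between page loads. This is a minimal sketch: `paced`, its default 5 to 10 second delays, and the five-page cap are hypothetical values chosen for illustration, and pacing requests does not by itself make scraping consistent with Google Scholar's terms of service.

```python
import random
import time
from collections.abc import Callable, Iterable, Iterator


def paced(
    urls: Iterable[str],
    min_delay: float = 5.0,
    max_delay: float = 10.0,
    max_pages: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield at most max_pages URLs, sleeping a random interval before each
    URL after the first. `sleep` is injectable so the pacing can be tested."""
    for i, url in enumerate(urls):
        if i >= max_pages:
            break
        if i > 0:
            # Random spacing between page loads instead of back-to-back requests.
            sleep(random.uniform(min_delay, max_delay))
        yield url
```

Because the pause runs only when the caller asks for the next URL, a loop such as `for url in paced(page_urls): ...` waits after each page has been processed, not before the first one.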