
ScholarScraper

A Python tool to automatically track, filter, and categorize research publications from Google Scholar for a specific list of authors.

File Structure

  • main.py: The entry point. It loads config.json, runs the scraper, and saves the final text report.
  • scraper.py: The core logic engine. It handles API requests, filters by year, and categorizes papers based on your keywords.
  • config.json: Your settings file (your SerpApi key, author IDs, and keyword lists).
  • requirements.txt: Contains the necessary Python libraries for the project.
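The keyword-based categorization in scraper.py can be pictured with a minimal sketch. This is illustrative only: the function name categorize_paper and the category names are hypothetical, not the actual code in the repository.

```python
def categorize_paper(title: str, categories: dict[str, list[str]]) -> list[str]:
    """Return every category whose keywords appear in the paper title.

    Hypothetical sketch of the keyword matching described above;
    papers matching no keyword fall back to "Uncategorized".
    """
    title_lower = title.lower()
    matched = [
        name
        for name, keywords in categories.items()
        if any(kw.lower() in title_lower for kw in keywords)
    ]
    return matched or ["Uncategorized"]

# Example (category names and keywords are made up):
categories = {
    "Modeling": ["ocean model", "simulation"],
    "Observation": ["buoy", "tide gauge"],
}
print(categorize_paper("A coupled ocean model simulation", categories))
# ['Modeling']
```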

Setup Instructions

1. Get your SerpApi Key

  1. Sign up at SerpApi.com.
  2. Copy your API Key from your private dashboard.

2. Find Google Scholar Author IDs

  1. Go to Google Scholar.
  2. Search for an author and click their profile.
  3. Look at the URL. The ID is the string of characters after user=.
    • Example: https://scholar.google.com/citations?user=h1AbC2_AAAAJ → The ID is h1AbC2_AAAAJ.
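If you are collecting many IDs, the extraction above can be automated with the standard library. The helper name author_id_from_url is hypothetical, not part of this project:

```python
from urllib.parse import urlparse, parse_qs

def author_id_from_url(url: str) -> str:
    """Return the value of the user= query parameter from a Scholar profile URL."""
    return parse_qs(urlparse(url).query)["user"][0]

print(author_id_from_url("https://scholar.google.com/citations?user=h1AbC2_AAAAJ"))
# h1AbC2_AAAAJ
```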

3. Installation

Open your terminal in this folder and install the required libraries:

pip install -r requirements.txt

4. Configuration

Open config.json and fill in your specific details:

  • api_key: Your SerpApi key.
  • author_ids: A list of IDs (e.g., ["ID1", "ID2"]).
  • start_year / current_year: The range of publication years to fetch.
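A minimal settings file with the four keys above might look like this (all values are illustrative placeholders, not real credentials or IDs):

```json
{
  "api_key": "YOUR_SERPAPI_KEY",
  "author_ids": ["h1AbC2_AAAAJ", "AnotherID_AAAAJ"],
  "start_year": 2020,
  "current_year": 2025
}
```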

How to Run

Execute the program from your terminal:

python main.py

Note: Every page of articles and every DOI search uses 1 SerpApi credit. Monitor your usage at SerpApi.com.
