rajab-bett-analytics/Instagram-Profile-Scraper

InstaInsight - Instagram Profile Scraper

InstaInsight is an asynchronous Python scraper built with Playwright that collects Instagram profile data from posts found under given hashtags. It supports login, proxy usage, automatic scrolling, profile filtering, and CSV export with progress tracking.


Features

  • Login handling for Instagram accounts
  • Optional proxy support for routing browser traffic
  • Headless browser operation
  • Scrapes profiles from hashtags with configurable limits
  • Filters profiles based on follower count
  • Extracts profile details including name, username, posts, followers, following, bio, profile picture, verification status, and post links
  • CSV export and progress tracking
  • Graceful shutdown and resume capabilities
  • Randomized delays to reduce detection

Requirements

  • Python 3.10+
  • Playwright
  • python-dotenv

Installation

  1. Clone the repository:

    git clone https://github.com/ScrapiqCBett/InstaInsight.git
    cd InstaInsight
  2. Create a virtual environment and activate it:

    python -m venv .venv
    .venv\Scripts\Activate.ps1   # Windows PowerShell
    source .venv/bin/activate    # macOS/Linux
  3. Install dependencies:

    pip install -r requirements.txt
    playwright install
  4. Create a .env file in the project root with your credentials:

    INSTAGRAM_USERNAME=your_username
    INSTAGRAM_PASSWORD=your_password
    PROXY_SERVER=optional_proxy
    HEADLESS=true
    
  5. Configure config.json:

    {
        "user_data_dir": "user_data",
        "hashtags": ["examplehashtag1", "examplehashtag2"],
        "max_posts_per_hashtag": 100,
        "max_profiles": 500,
        "delay_between_actions": [2, 5],
        "viewport": {"width": 1200, "height": 800},
        "timezone_id": "UTC"
    }
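As a rough illustration of how the .env values and config.json from steps 4 and 5 might be combined, the sketch below builds keyword arguments shaped for Playwright's `launch_persistent_context()`. The helper names `parse_bool`, `load_config`, and `build_context_kwargs` are hypothetical and not taken from scraper.py, and the choice to default to headless mode when `HEADLESS` is unset is an assumption.

```python
import json
from typing import Any, Dict, Mapping, Optional


def parse_bool(value: Optional[str], default: bool = True) -> bool:
    """Interpret common truthy strings such as 'true' or '1' from a .env file."""
    if value is None or value.strip() == "":
        return default  # assumption: an unset HEADLESS falls back to the default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read the JSON configuration file from the project root."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_context_kwargs(config: Mapping[str, Any],
                         env: Mapping[str, str]) -> Dict[str, Any]:
    """Merge config.json and environment values into keyword arguments
    that match parameters of Playwright's launch_persistent_context()."""
    kwargs: Dict[str, Any] = {
        "user_data_dir": config.get("user_data_dir", "user_data"),
        "headless": parse_bool(env.get("HEADLESS")),
        "viewport": config.get("viewport", {"width": 1200, "height": 800}),
        "timezone_id": config.get("timezone_id", "UTC"),
    }
    proxy = env.get("PROXY_SERVER", "").strip()
    if proxy:  # only pass a proxy when one is actually configured
        kwargs["proxy"] = {"server": proxy}
    return kwargs
```

In a real run, `env` would typically be `os.environ` after calling `load_dotenv()` from python-dotenv, and the resulting dictionary could be unpacked into `playwright.chromium.launch_persistent_context(**kwargs)`.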

Usage

Run the scraper using:

    python scraper.py

  • The scraper logs output to instagram_scraper.log.
  • Scraped profiles are saved to a CSV file.
  • Progress is stored in scraper_progress.json to allow resuming after interruption.
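Resuming from scraper_progress.json could work along the lines of the sketch below. The state layout (`seen_usernames`, `completed_hashtags`) and the function names are assumptions made for illustration; the actual file format used by scraper.py is not documented here. The write goes through a temporary file so that an interruption mid-save does not leave a truncated progress file.

```python
import json
import os
import tempfile
from pathlib import Path


def load_progress(path="scraper_progress.json"):
    """Return saved progress, or an empty state if no file exists yet."""
    p = Path(path)
    if not p.exists():
        # Hypothetical state layout, not the scraper's documented format.
        return {"seen_usernames": [], "completed_hashtags": []}
    with p.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_progress(state, path="scraper_progress.json"):
    """Write progress to a temp file, then atomically replace the target."""
    p = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(p.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
        os.replace(tmp_name, p)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file if the write or replace failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Calling `save_progress` after each processed profile keeps the amount of repeated work after an interruption small, at the cost of more frequent disk writes.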

Notes

  • By default, only profiles with between 2,500 and 50,000 followers are kept.
  • Randomized delays and headless mode help reduce detection risk.
  • Always respect Instagram's terms of service.
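The follower filter and randomized delays described above can be sketched as follows. This is a minimal illustration, not the scraper's own code: `parse_count`, `in_follower_range`, and `human_delay` are hypothetical names, the handling of abbreviated counts such as "12.5K" is an assumption about how follower numbers appear on the page, and treating both bounds as inclusive is also an assumption.

```python
import asyncio
import random


def parse_count(text):
    """Convert display strings such as '2,500', '12.5K' or '1.2M' to an int."""
    cleaned = text.strip().replace(",", "").upper()
    multipliers = {"K": 1_000, "M": 1_000_000}
    if cleaned and cleaned[-1] in multipliers:
        return int(round(float(cleaned[:-1]) * multipliers[cleaned[-1]]))
    return int(cleaned)


def in_follower_range(followers, minimum=2_500, maximum=50_000):
    """Keep a profile only if its follower count lies within the bounds
    (inclusive here, which is an assumption)."""
    return minimum <= followers <= maximum


async def human_delay(bounds=(2, 5)):
    """Sleep for a random duration drawn uniformly from the given bounds,
    matching the shape of delay_between_actions in config.json."""
    delay = random.uniform(bounds[0], bounds[1])
    await asyncio.sleep(delay)
    return delay
```

For example, `in_follower_range(parse_count("12.5K"))` evaluates to `True`, while a profile showing "1.2M" followers would be skipped.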

Git Ignore

Keep credentials, scraped data, and runtime files out of version control by adding them to .gitignore:

.env
*.csv
scraper_progress.json
instagram_scraper.log
.venv/

License

This project is open-source and available under the MIT License.

About

InstaInsight is an advanced Instagram profile scraper built using Python and Playwright. It automates the process of collecting public Instagram profile data from posts under specific hashtags. The scraper is designed for efficiency, reliability, and safety, handling login, session persistence, delays, and blocks gracefully.
