InstaInsight is an asynchronous Python scraper, built with Playwright, that collects Instagram profile data by hashtag. It supports login, proxy usage, automatic scrolling, profile filtering, and CSV export with progress tracking.
- Login handling for Instagram accounts
- Proxy support for anonymous scraping
- Headless browser operation
- Scrapes profiles from hashtags with configurable limits
- Filters profiles based on follower count
- Extracts profile details including name, username, posts, followers, following, bio, profile picture, verification status, and post links
- CSV export and progress tracking
- Graceful shutdown and resume capabilities
- Randomized delays to reduce detection

Requirements:
- Python 3.10+
- Playwright
- python-dotenv
- Clone the repository:

      git clone https://github.com/ScrapiqCBett/InstaInsight.git
      cd InstaInsight

- Create a virtual environment and activate it:

      python -m venv .venv
      .venv\Scripts\Activate.ps1   # Windows PowerShell
      source .venv/bin/activate    # macOS/Linux

- Install dependencies:

      pip install -r requirements.txt
      playwright install

- Create a .env file in the project root with your credentials:

      INSTAGRAM_USERNAME=your_username
      INSTAGRAM_PASSWORD=your_password
      PROXY_SERVER=optional_proxy
      HEADLESS=true

- Configure config.json:

      {
        "user_data_dir": "user_data",
        "hashtags": ["examplehashtag1", "examplehashtag2"],
        "max_posts_per_hashtag": 100,
        "max_profiles": 500,
        "delay_between_actions": [2, 5],
        "viewport": {"width": 1200, "height": 800},
        "timezone_id": "UTC"
      }
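
To illustrate how the .env and config.json values above might fit together, here is a minimal sketch that merges them into keyword arguments for a Playwright persistent browser context. The helper names (`parse_bool`, `load_config`, `build_context_options`) are hypothetical and not taken from scraper.py, and how the project actually launches the browser is an assumption. The option names themselves (`user_data_dir`, `headless`, `proxy`, `viewport`, `timezone_id`) are real parameters of Playwright's `launch_persistent_context`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping, Optional, Union


def parse_bool(value: Optional[str], default: bool = True) -> bool:
    """Interpret strings such as 'true', '1', or 'yes'; fall back to default when unset."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(path: Union[str, Path] = "config.json") -> Dict[str, Any]:
    """Read the JSON configuration file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_context_options(config: Mapping[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Combine config.json and environment values into browser-context options."""
    options: Dict[str, Any] = {
        "user_data_dir": config["user_data_dir"],
        "headless": parse_bool(env.get("HEADLESS")),
        "viewport": config["viewport"],
        "timezone_id": config["timezone_id"],
    }
    # PROXY_SERVER is optional; only pass a proxy when a value is present.
    proxy = env.get("PROXY_SERVER", "").strip()
    if proxy:
        options["proxy"] = {"server": proxy}
    return options
```

In a real run, `env` would typically be `os.environ` after python-dotenv's `load_dotenv()` has read the .env file, and the result could be passed as `await playwright.chromium.launch_persistent_context(**options)`. Keeping the merge in a pure function makes it easy to check without starting a browser.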
Run the scraper using:

    python scraper.py

- The scraper logs output to instagram_scraper.log.
- Scraped profiles are saved to a CSV file.
- Progress is stored in scraper_progress.json to allow resuming after interruption.
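
As a rough picture of how resuming from scraper_progress.json could work, the sketch below stores the set of usernames already processed and reloads it on the next run. The `processed_usernames` key and both function names are assumptions made for illustration; the actual file layout used by scraper.py is not documented here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Set

PROGRESS_FILE = Path("scraper_progress.json")


def load_progress(path: Path = PROGRESS_FILE) -> Set[str]:
    """Return usernames already processed, or an empty set on the first run."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # "processed_usernames" is a hypothetical key, not the project's documented schema.
    return set(data.get("processed_usernames", []))


def save_progress(done: Set[str], path: Path = PROGRESS_FILE) -> None:
    """Write to a temporary file, then rename, so an interruption cannot truncate the file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"processed_usernames": sorted(done)}, fh)
    os.replace(tmp_name, path)
```

A scraping loop would call `load_progress()` once at startup, skip any username already in the set, and call `save_progress()` after each profile or batch.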
- By default, only profiles with between 2,500 and 50,000 followers are kept.
- Randomized delays and headless mode help reduce detection risk.
- Always respect Instagram's terms of service.
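
The follower filter and randomized delays mentioned in the notes above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the abbreviated count formats ("12.5K", "1.2M") are a guess at what the page displays, and treating the 2,500 and 50,000 bounds as inclusive is also an assumption.

```python
import random
from typing import Sequence

MIN_FOLLOWERS = 2_500
MAX_FOLLOWERS = 50_000


def parse_count(text: str) -> int:
    """Convert display strings such as '12.5K', '1.2M', or '3,400' to an integer."""
    cleaned = text.strip().replace(",", "").upper()
    multiplier = 1
    if cleaned.endswith("K"):
        multiplier, cleaned = 1_000, cleaned[:-1]
    elif cleaned.endswith("M"):
        multiplier, cleaned = 1_000_000, cleaned[:-1]
    return int(round(float(cleaned) * multiplier))


def in_target_range(followers: int, low: int = MIN_FOLLOWERS, high: int = MAX_FOLLOWERS) -> bool:
    """Keep a profile only when its follower count lies within the bounds (inclusive here)."""
    return low <= followers <= high


def pick_delay(bounds: Sequence[float] = (2, 5)) -> float:
    """Draw a pause length in seconds, uniformly between the two configured bounds."""
    low, high = bounds
    return random.uniform(low, high)
```

In an asynchronous scraper the pause would typically be applied with `await asyncio.sleep(pick_delay(config["delay_between_actions"]))`, using the `[2, 5]` pair from config.json.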
Make sure to ignore sensitive files:
.env
*.csv
scraper_progress.json
instagram_scraper.log
.venv/
This project is open-source and available under the MIT License.