This project scrapes all public Actors from the Apify Store (via the public Algolia index) and enriches them with developer profile information, then exports the result to CSV / Markdown / Excel.
The script is designed to be:
- Accurate – uses Algolia’s structured fields for stats/pricing instead of brittle HTML parsing.
- Robust – async I/O, retries with backoff, and range-based pagination to bypass Algolia’s 16k pagination limit.
- Ready for analysis – clean, normalized columns that work well in Excel, BI tools, or Python notebooks.
Each row represents a single Actor and contains at least:
- actor_url – full URL, e.g. https://apify.com/epctex/youtube-video-downloader
- actor_name – Actor title from the Store (human‑readable)
- pricing – human‑readable summary derived from currentPricingInfo, e.g.
  - FREE
  - FLAT_PRICE_PER_MONTH; 15 USD/month; trialMinutes=4320
  - PAY_PER_EVENT; minimalMaxTotalChargeUsd=0.5; primaryEvent=Scraped place; FREE=0.004 USD/event
  - PRICE_PER_DATASET_ITEM; unit=result; FREE=0.0005 USD/item
- bookmarked – bookmark count (integer)
- total_users – total users count (integer)
- monthly_active_users – 30‑day active users (integer, when available)
- developer_name – developer display name (falls back to userFullName / username)
- developer_profile_url – https://apify.com/<username>
- developer_joined – join date from the profile, e.g. Joined May 2023
- developer_contacts – all external/contact links found in the profile's main content, e.g.:
  - email addresses (user@example.com, mailto:user@example.com)
  - websites (https://example.com)
  - social profiles (LinkedIn, Twitter/X, GitHub, YouTube, etc.)
If a developer genuinely didn’t provide any external links, developer_contacts is empty for that row.
For transparency, developer profiles that consistently return HTTP errors (e.g. 404) are logged to:
developer_profile_failures.csv
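As a rough illustration of how developer_joined and developer_contacts could be derived from a profile page, here is a minimal sketch. It deliberately uses the standard library's html.parser instead of BeautifulSoup so it runs without extra dependencies. The names MainContentParser, parse_profile and is_external are hypothetical, and the rule for what counts as an "external" link is an assumption, not necessarily what the script does. The fallback for pages without a main element is omitted.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

JOINED_RE = re.compile(r"Joined\s+[A-Z][a-z]+\s+\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class MainContentParser(HTMLParser):
    """Collects text and link targets found inside the <main> element only."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while the parser is inside <main>
        self.text_parts = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1
        elif self.depth and tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text_parts.append(data)


def is_external(href):
    """Assumption: a link is external if its host is not apify.com or a subdomain."""
    host = urlparse(href).hostname or ""
    return host != "apify.com" and not host.endswith(".apify.com")


def parse_profile(html):
    parser = MainContentParser()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    joined = JOINED_RE.search(text)

    contacts = []
    for href in parser.hrefs:
        if href.startswith("mailto:"):
            contacts.append(href)
        elif href.startswith(("http://", "https://")) and is_external(href):
            contacts.append(href)
    contacts.extend(EMAIL_RE.findall(text))

    return {
        "developer_joined": joined.group(0) if joined else "",
        # dict.fromkeys removes duplicates while keeping first-seen order
        "developer_contacts": list(dict.fromkeys(contacts)),
    }
```

Links in the page header or footer are ignored because they sit outside the main element, which is the same noise-avoidance idea the script applies.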
- Discover Actors via Algolia
  - Uses the public Apify Store Algolia index prod_PUBLIC_STORE.
  - Applies numeric range splits on modifiedAt plus paginated queries to bypass Algolia’s paginationLimitedTo (16k) limit.
  - Only keeps hits that have both username and name.
- Enrich with developer profiles
  - Deduplicates by username and fetches each profile page once (with retries and concurrency limits).
  - Parses the HTML using BeautifulSoup, focusing on the <main> section to avoid global footer/header noise.
  - Extracts:
    - Joined <Month> <Year> (regex over the full text blob).
    - All external/contact links and emails from the main content (with a conservative fallback if the page has no <main>).
- Export
  - CSV: apify_actors_with_developers.csv (full dataset).
  - Markdown: apify_actors_with_developers.md (first 200 rows for quick diff/preview).
  - Excel: apify_actors_with_developers.xlsx with:
    - Frozen header row.
    - Auto‑sized columns (capped to a reasonable width).
    - Clickable hyperlinks for actor_url and developer_profile_url.
    - Wrapped text for long fields like pricing and developer_contacts.
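The range-based pagination in the discovery step can be pictured as a recursive split: keep halving the modifiedAt range until every sub-range holds no more hits than the pagination limit, then paginate inside each sub-range. The sketch below shows only the splitting logic. split_ranges and count_hits are hypothetical names; in a real run count_hits would presumably issue an Algolia query restricted to the range and read the reported hit count, which is an assumption about the script, not a description of it.

```python
def split_ranges(lo, hi, count_hits, limit=16000):
    """Return inclusive (lo, hi) ranges whose hit counts are each <= limit.

    count_hits(lo, hi) is a caller-supplied function returning how many
    records have lo <= modifiedAt <= hi.
    """
    n = count_hits(lo, hi)
    if n == 0:
        return []                      # nothing to fetch in this range
    if n <= limit or lo >= hi:
        # Small enough to paginate fully, or a single value that
        # cannot be split any further.
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (split_ranges(lo, mid, count_hits, limit)
            + split_ranges(mid + 1, hi, count_hits, limit))


if __name__ == "__main__":
    # Fake data set: 1000 records with modifiedAt values 0..999.
    stamps = list(range(1000))

    def count(lo, hi):
        return sum(lo <= s <= hi for s in stamps)

    for lo, hi in split_ranges(0, 999, count, limit=300):
        print(lo, hi, count(lo, hi))
```

Because the sub-ranges are disjoint and together cover the original range, each record is fetched exactly once as long as its modifiedAt value does not change during the crawl.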
- Python 3
- httpx – async HTTP client
- BeautifulSoup4 – HTML parsing
- pandas – data wrangling & tabular exports
- openpyxl – Excel (.xlsx) output
- tenacity – retry with exponential backoff + jitter
- tqdm – progress bars for long‑running jobs
All Python dependencies are listed in requirements.txt.
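For readers unfamiliar with the "exponential backoff + jitter" pattern that tenacity provides, the following dependency-free sketch shows the idea with plain asyncio. It is not the project's code: retry_async and its parameters are hypothetical, and the delay values are arbitrary.

```python
import asyncio
import random


async def retry_async(func, *, attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=asyncio.sleep):
    """Await func(), retrying on any exception.

    The wait doubles after every failure (capped at max_delay), and a random
    extra of up to 50% is added so parallel workers do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise                  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            await sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    state = {"calls": 0}

    async def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(asyncio.run(retry_async(flaky, base_delay=0.01)))
```

The sleep parameter exists only so the waiting can be replaced in tests; a real fetch would pass a coroutine function that performs the HTTP request.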
git clone https://github.com/Thordata/scrape-apify-actors-with-developers.git
cd scrape-apify-actors-with-developers
python -m venv .venv
source .venv/Scripts/activate # on Windows Git Bash
# or: source .venv/bin/activate # on Linux / macOS / WSL
# or: .venv\Scripts\activate # on classic Windows CMD/PowerShell
pip install -r requirements.txt

The script does not use your private Apify API token for listing Actors.
Instead, it relies on the public search‑only Algolia key that the Apify Store itself uses.
You need to provide:
- APIFY_STORE_ALGOLIA_APP_ID – Algolia application id (from the Apify Store network requests)
- APIFY_STORE_ALGOLIA_API_KEY – public search‑only key for prod_PUBLIC_STORE (from the Apify Store network requests)
Example (Git Bash / WSL):
export APIFY_STORE_ALGOLIA_APP_ID=<YOUR_APP_ID>
export APIFY_STORE_ALGOLIA_API_KEY=<YOUR_SEARCH_ONLY_KEY>

Important:
- Do not commit your private Apify account tokens to Git.
- The key used here should be the public search‑only key exposed by the Store UI, not your personal API token.
To find these values:
- Open the Apify Store page (e.g. https://apify.com/store/categories)
- Open DevTools → Network
- Find a request to: ...algolia.net/1/indexes/prod_PUBLIC_STORE/query
- Copy the request headers:
  - x-algolia-application-id
  - x-algolia-api-key
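Once the two header values are known, a search request against the index can be assembled roughly as follows. This is a sketch under assumptions: build_algolia_query is a hypothetical helper, the host must be copied from the same DevTools request (it is not hard-coded here), and sending the search parameters as a JSON object is one of the body formats Algolia's search endpoint accepts. Whether the script builds its requests this way is not confirmed by this README.

```python
import json


def build_algolia_query(host, app_id, api_key, *, page=0, hits_per_page=100,
                        numeric_filters=None):
    """Return (url, headers, body) for a POST search on prod_PUBLIC_STORE.

    host is the Algolia host name seen in the browser's network tab.
    numeric_filters is an optional list such as
    ["modifiedAt>=100", "modifiedAt<=200"].
    """
    url = f"https://{host}/1/indexes/prod_PUBLIC_STORE/query"
    headers = {
        "x-algolia-application-id": app_id,
        "x-algolia-api-key": api_key,
        "content-type": "application/json",
    }
    payload = {"query": "", "page": page, "hitsPerPage": hits_per_page}
    if numeric_filters:
        payload["numericFilters"] = list(numeric_filters)
    return url, headers, json.dumps(payload)
```

The returned triple can be passed to any HTTP client, for example as the URL, headers and content of an httpx POST call.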
You can tweak runtime behaviour via environment variables:
- APIFY_STORE_LIMIT_ACTORS – limit number of Actors for faster test runs (default: 0 = no limit).
  - e.g. export APIFY_STORE_LIMIT_ACTORS=200 for a quick smoke test.
- APIFY_STORE_HITS_PER_PAGE – Algolia hitsPerPage (default: 100).
- APIFY_STORE_PAGE_CONCURRENCY – concurrent Algolia page fetches (default: 15).
- APIFY_STORE_PROFILE_CONCURRENCY – concurrent developer profile fetches (default: 30).
- APIFY_STORE_PAGINATION_LIMIT – expected Algolia pagination limit (default: 16000).
- APIFY_STORE_OUTPUT_BASENAME – base name for output files (default: apify_actors_with_developers).
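To make the defaults above concrete, here is one way such settings could be read from the environment. The variable names and default values come from the list above; env_int, load_config and the dictionary keys are hypothetical and only illustrate the pattern.

```python
import os


def env_int(name, default, environ=os.environ):
    """Read an integer setting; unset or blank values fall back to the default."""
    raw = environ.get(name, "").strip()
    return int(raw) if raw else default


def load_config(environ=os.environ):
    """Collect all runtime settings with the documented defaults."""
    return {
        "limit_actors": env_int("APIFY_STORE_LIMIT_ACTORS", 0, environ),
        "hits_per_page": env_int("APIFY_STORE_HITS_PER_PAGE", 100, environ),
        "page_concurrency": env_int("APIFY_STORE_PAGE_CONCURRENCY", 15, environ),
        "profile_concurrency": env_int("APIFY_STORE_PROFILE_CONCURRENCY", 30, environ),
        "pagination_limit": env_int("APIFY_STORE_PAGINATION_LIMIT", 16000, environ),
        "output_basename": environ.get(
            "APIFY_STORE_OUTPUT_BASENAME", "apify_actors_with_developers"
        ),
    }


if __name__ == "__main__":
    print(load_config())
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.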
Verify the pipeline and field correctness on a small sample:
export APIFY_STORE_LIMIT_ACTORS=200
python scrape_apify_actors.py

Check:
- apify_actors_with_developers.csv and .xlsx – validate pricing, stats and contacts for a few Actors.
- developer_profile_failures.csv – developer profiles that consistently returned HTTP errors (e.g. 404).
Unset the limit (or set to 0) and run:
unset APIFY_STORE_LIMIT_ACTORS # or: export APIFY_STORE_LIMIT_ACTORS=0
python scrape_apify_actors.py

Depending on your connection, the full crawl (≈ 22k Actors plus ~15 batches of developer profiles) usually completes within ~10–15 minutes.
After a successful run you should see:
- apify_actors_with_developers.csv – full dataset.
- apify_actors_with_developers.xlsx – full dataset, formatted for Excel.
- apify_actors_with_developers.md – first 200 rows (Markdown table for quick inspection).
- developer_profile_failures.csv – optional; only present if some developer profiles consistently failed (non‑200 HTTP).
- This project is unofficial and not affiliated with Apify.
- Data comes from public sources (Apify Store + public developer profiles). Please respect Apify’s Terms of Service and any applicable rate limits.
- Some developers genuinely don’t provide external contact info – those rows will have an empty developer_contacts.
- A small number of developer profile URLs might be dead or private; those are recorded in developer_profile_failures.csv.
Suggestions and pull requests are welcome. Some ideas:
- Add richer pricing normalization (e.g. per‑month cost estimate for PAY_PER_EVENT models).
- Add more advanced segmentation (e.g. by category, MAU buckets) into separate sheets.
- Integrate with a dashboard (e.g. Superset, Metabase) for “Apify ecosystem analytics”.
If you find this useful, consider ⭐ starring or forking the repo – it helps others discover it. :)
