This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
PyTok is a TikTok web scraping library using a dual-approach architecture:
- Primary: Uses the TikTok-Api library for API requests
- Fallback: Automatically falls back to browser automation (zendriver) when an API request fails
All operations are async/await based.
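A minimal asyncio sketch of the API-first, browser-fallback flow. It does not reproduce PyTok's code. `fetch_via_api`, `fetch_via_browser`, and this local `ApiFailedException` are hypothetical stand-ins for the TikTok-Api request and the zendriver scraping path, and the API failure is simulated.

```python
import asyncio


class ApiFailedException(Exception):
    """Hypothetical stand-in for the error raised when an API request fails."""


async def fetch_via_api(video_id: str) -> dict:
    # Hypothetical API path: simulates a failed request for odd IDs.
    if int(video_id) % 2:
        raise ApiFailedException(f"API request for {video_id} failed")
    return {"id": video_id, "source": "api"}


async def fetch_via_browser(video_id: str) -> dict:
    # Hypothetical browser path (zendriver's role in PyTok).
    await asyncio.sleep(0)  # yield to the event loop, as a real page load would
    return {"id": video_id, "source": "browser"}


async def fetch_video(video_id: str) -> dict:
    """Try the API first; fall back to browser scraping if it fails."""
    try:
        return await fetch_via_api(video_id)
    except ApiFailedException:
        return await fetch_via_browser(video_id)


async def main() -> list:
    return await asyncio.gather(fetch_video("2"), fetch_video("3"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running it prints `[{'id': '2', 'source': 'api'}, {'id': '3', 'source': 'browser'}]`: the even ID is served by the simulated API and the odd one falls through to the browser path.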
# Install
pip install git+https://github.com/networkdynamics/pytok.git@master
# Run scripts (using a conda environment)
conda run -n <env> python <script>
# Run tests
conda run -n <env> pytest tests/
# Run single test
conda run -n <env> pytest tests/test_user.py::test_user_videos

PyTok (tiktok.py)
├── zendriver browser - CDP network response tracking
├── TikTok-Api client - API requests with msToken from browser cookies
└── Request cache - stores recent API responses
API Classes (api/*.py) - all inherit from Base
├── User - user info, videos
├── Video - metadata, bytes, comments, related videos
├── Hashtag - hashtag info and videos
└── Search, Sound, Trending (partial implementations)
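The request cache is described only as storing recent API responses. One way to bound such a cache is sketched below, assuming responses are keyed by URL and the oldest entry is evicted once a size limit is hit; `RecentResponseCache` and `max_size` are hypothetical names, and PyTok's actual key and eviction policy may differ.

```python
from collections import OrderedDict
from typing import List, Optional


class RecentResponseCache:
    """Keep the `max_size` most recently stored responses, keyed by URL.

    Hypothetical sketch: PyTok's real cache may use a different key,
    size, or eviction policy.
    """

    def __init__(self, max_size: int = 100) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, url: str, body: bytes) -> None:
        # Re-inserting an existing URL moves it to the "most recent" end.
        self._items.pop(url, None)
        self._items[url] = body
        # Evict the oldest entries once over capacity.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def get(self, url: str) -> Optional[bytes]:
        return self._items.get(url)

    def find(self, substring: str) -> List[str]:
        """Return cached URLs containing `substring`, newest first."""
        return [u for u in reversed(self._items) if substring in u]
```

An `OrderedDict` keeps insertion order, so evicting the oldest entry with `popitem(last=False)` is O(1).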
Every data-fetching method follows this pattern:
try:
    response = await self.parent.tiktok_api.make_request(...)
except ApiFailedException:
    # Fall back to browser scraping
    ...

PyTok tracks network responses via the Chrome DevTools Protocol (CDP):
- Captures responses whose URLs match the `/api/`, `video/tos`, `v16-webapp`, or `v19-webapp` patterns
- Stores response bodies before Chrome garbage collects them
- Used to extract video bytes and API data from page loads
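The tracking described above can be sketched independently of zendriver. The event shapes follow CDP's `Network.responseReceived` and `Network.loadingFinished` events, and `get_body` is assumed to wrap `Network.getResponseBody`, which returns `body` and `base64Encoded`. How these handlers are registered with zendriver is not shown, and `ResponseTracker` is a hypothetical name rather than PyTok's class.

```python
import base64
from typing import Callable, Dict

# URL substrings PyTok watches for, per the list above.
WATCHED_PATTERNS = ("/api/", "video/tos", "v16-webapp", "v19-webapp")


class ResponseTracker:
    """Capture bodies of matching responses as soon as they finish loading.

    Assumption: `get_body(request_id)` returns the result of CDP's
    Network.getResponseBody, i.e. {"body": str, "base64Encoded": bool}.
    """

    def __init__(self, get_body: Callable[[str], dict]) -> None:
        self.get_body = get_body
        self.pending: Dict[str, str] = {}   # requestId -> URL
        self.bodies: Dict[str, bytes] = {}  # URL -> decoded body

    def on_response_received(self, params: dict) -> None:
        # Remember only requests whose URL matches a watched pattern.
        url = params["response"]["url"]
        if any(p in url for p in WATCHED_PATTERNS):
            self.pending[params["requestId"]] = url

    def on_loading_finished(self, params: dict) -> None:
        # Fetch the body immediately, before the browser can discard it.
        url = self.pending.pop(params["requestId"], None)
        if url is None:
            return
        result = self.get_body(params["requestId"])
        body = result["body"]
        self.bodies[url] = (
            base64.b64decode(body) if result["base64Encoded"] else body.encode()
        )
```

Waiting for `loadingFinished` before requesting the body matters because the body is not complete when `responseReceived` fires.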

Captcha handling:
- Automatic solving via OpenCV image matching (`captcha_solver.py`)
- Supports slide and whirl puzzle types
- Manual solving available with `manual_captcha_solves=True`
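The core idea behind solving a slide puzzle by image matching is to find the horizontal offset where the puzzle piece best fits the background. The sketch below swaps OpenCV for a pure-Python sum-of-squared-differences search so it runs without dependencies; it is not the algorithm in `captcha_solver.py`, it assumes the piece's vertical position is already known, and it does not handle whirl (rotation) puzzles. `best_slide_offset` is a hypothetical name.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels as rows of ints


def best_slide_offset(background: Image, piece: Image) -> int:
    """Return the x offset where `piece` best matches `background`.

    Brute-force sum of squared differences (SSD) over horizontal
    positions, with the piece aligned to the background's top row.
    """
    ph, pw = len(piece), len(piece[0])
    bw = len(background[0])
    best_x, best_score = 0, float("inf")
    for x in range(bw - pw + 1):
        score = 0
        for r in range(ph):
            for c in range(pw):
                d = background[r][x + c] - piece[r][c]
                score += d * d
        # Lower SSD means a closer match.
        if score < best_score:
            best_x, best_score = x, score
    return best_x
```

With OpenCV, `cv2.matchTemplate(background, piece, cv2.TM_SQDIFF)` computes the same score at every position, and `cv2.minMaxLoc` on the result gives the best location.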

Key files:
- `tiktok.py` - Main entry point; manages the browser and API client
- `api/base.py` - Base class with DOM interaction, captcha detection, and scrolling
- `api/user.py` - User data and video fetching
- `api/video.py` - Video metadata, bytes download, and comments
- `helpers.py` - HTML parsing; extracts the `__UNIVERSAL_DATA_FOR_REHYDRATION__` JSON from pages
- `utils.py` - DataFrame conversion helpers (`get_video_df`, `get_comment_df`, `get_user_df`)
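Pulling the rehydration JSON out of a page can be sketched with a regular expression, assuming the data sits in a `<script>` tag whose `id` is `__UNIVERSAL_DATA_FOR_REHYDRATION__`. `helpers.py` may use a different parser, and `extract_rehydration_data` is a hypothetical name.

```python
import json
import re

# Assumption: the JSON is the text content of a <script> tag with this id;
# other attributes (e.g. type) may appear before or after it.
_REHYDRATION_RE = re.compile(
    r'<script[^>]*id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_rehydration_data(html: str) -> dict:
    """Return the parsed rehydration JSON, or raise ValueError if absent."""
    match = _REHYDRATION_RE.search(html)
    if match is None:
        raise ValueError("__UNIVERSAL_DATA_FOR_REHYDRATION__ script not found")
    return json.loads(match.group(1))
```

Raising instead of returning an empty dict makes a missing tag (for example, a captcha page served instead of the real one) easy to detect.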
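
Constructor options:
import logging
from pytok.tiktok import PyTok

PyTok(
    logging_level=logging.WARNING,
    request_delay=0,  # seconds to wait between requests
    headless=False,  # headless mode does not work reliably
    manual_captcha_solves=False,  # True to solve captchas by hand
    log_captcha_solves=False,  # save captcha data to JSON files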
)

Basic usage:
import asyncio
from pytok.tiktok import PyTok

async def main():
    async with PyTok() as api:
        user = api.user(username="therock")
        user_data = await user.info()
        async for video in user.videos(count=100):
            video_data = video.info()
            video_bytes = await video.bytes()

asyncio.run(main())