
feat(douyin): add search command for keyword video search#1759

Open: Daily-AC wants to merge 3 commits into jackwener:main from Daily-AC:feat/douyin-search

Daily-AC commented May 26, 2026

Motivation

The douyin adapter ships 13 subcommands today (activities, hashtag, publish, stats, ...) but no keyword video search. hashtag search --keyword returns topic/challenge rows, not videos, so anything that wants "find me douyin videos for a query" has nowhere to go. This came up while wiring omnireach — its multi-source search needed a tiktok-mirror command for the China-region equivalent (see issue Daily-AC/omnireach#12).

This PR adds opencli douyin search <query> against www.douyin.com.

Output shape (tiktok-aligned)

{"rank": 1, "desc": "...", "author": "...", "url": "...", "plays": 0, "likes": 0, "comments": 0, "shares": 0}

Same eight columns as tiktok search, same order. Downstream tools that already normalize tiktok search rows can ingest douyin rows without per-adapter mapping.

Caveat on counters: plays, comments, shares are exposed as 0 because the search result card markup only surfaces the like count. Clients that need the full counter set should call /aweme/v1/web/aweme/detail/?aweme_id=<id> for the relevant id parsed from url. Surfacing 0 is preferable to fabricating values or dropping the columns (which would break tiktok-shape consumers).
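For clients that want the full counters, a minimal sketch of the id-to-detail-URL step follows. The helper names (awemeIdFromUrl, detailUrlFor) are hypothetical and not part of this PR, and the www.douyin.com origin is an assumption; only the endpoint path comes from the caveat above. Whether the detail endpoint accepts unsigned requests was not tested here.

```javascript
// Hypothetical helpers, not part of this PR: derive the aweme id from a row's
// `url` and build the detail endpoint URL named in the caveat above.

// Extract the numeric id from ".../video/<id>". The id stays a string because
// Douyin ids exceed Number.MAX_SAFE_INTEGER and would lose precision as numbers.
function awemeIdFromUrl(url) {
  const match = /\/video\/(\d+)/.exec(String(url || ''));
  return match ? match[1] : null;
}

// Build the detail URL. The default origin is an assumption.
function detailUrlFor(url, origin = 'https://www.douyin.com') {
  const id = awemeIdFromUrl(url);
  return id ? `${origin}/aweme/v1/web/aweme/detail/?aweme_id=${id}` : null;
}

// Usage with a url taken from the live test rows in this PR.
console.log(detailUrlFor('https://www.douyin.com/video/7573894888656293163'));
```

Keeping the id as a string matters: parsing 7573894888656293163 as a JavaScript number would silently round it.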

Prerequisites

  • The Chrome profile bound to OpenCLI must be logged in to https://www.douyin.com. Anonymous visitors get a results page with no rendered cards and no visible login prompt (the page is just silently empty); the adapter detects this and raises AuthRequiredError. Log in once via Chrome and the cookies persist.

Implementation strategy: DOM extraction (post-rewrite)

The initial draft used Strategy.INTERCEPT to capture the SPA's signed /aweme/v1/web/general/search/single/ XHR. Live testing on a logged-in profile showed that this approach does not work:

  • wait xhr "general/search/single" times out at 20s even though [data-e2e="scroll-list"] li has 16 rendered result cards in the DOM.
  • Direct synthesis of the XHR from page context (with the full SPA param set: device_platform=webapp, aid=6383, pc_client_type=1, version_code, etc.) returns status_code: 0, data: [], search_nil_info.search_nil_type: "verify_check" — the endpoint validates an a_bogus signature that lives in the SPA bundle, and bare URLs miss it.
  • The SPA renders results into <ul data-e2e="scroll-list"> server-side during initial navigation. The data is in the HTML at navigation time; no client-side fetch is required.

Final approach (clis/douyin/search.js):

  1. Navigate to https://www.douyin.com/search/<urlencoded>?type=video.
  2. page.evaluate a MutationObserver-backed waiter that resolves to one of {state: 'rendered', cards} / {state: 'login_wall'} / {state: 'timeout'} within a 15s budget.
  3. For each <li> inside [data-e2e="scroll-list"], harvest the row's <a href*="/video/"> and the ordered list of leaf-element textContents. The classnames are obfuscated and churn between Douyin builds, so we pin only data-e2e selectors and identify fields by shape:
    • ^\d{1,2}:\d{2}(:\d{2})?$ → duration (skipped)
    • ^\d+(\.\d+)?[万亿]?$ → likes (with parseDouyinCount handling 万/亿)
    • text equal to @, followed by the author name → author
    • longest remaining text → desc
  4. Project to the tiktok-aligned schema; plays/comments/shares stay 0 per the caveat above.
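The waiter from step 2 can be sketched roughly as below. This is an illustration under assumptions, not the code in clis/douyin/search.js: the function name waitForCards and the cardCount field are hypothetical, the document and observer constructor are passed in only so the sketch can be exercised outside a browser, and the login_wall branch is left out because this description does not state how that state is detected.

```javascript
// Sketch of a MutationObserver-backed waiter (hypothetical name and shape).
// In page context, call it as waitForCards(document, MutationObserver).
function waitForCards(doc, ObserverCtor, timeoutMs = 15000) {
  const SELECTOR = '[data-e2e="scroll-list"] li';
  return new Promise((resolve) => {
    let observer = null;
    let timer = null;

    // Resolve once, and always release the observer and the timer.
    const finish = (result) => {
      if (observer) observer.disconnect();
      if (timer) clearTimeout(timer);
      resolve(result);
    };

    // Report 'rendered' as soon as at least one result card exists.
    const check = () => {
      const cards = doc.querySelectorAll(SELECTOR);
      if (cards.length > 0) {
        // cardCount is illustrative; the real waiter returns card data.
        finish({ state: 'rendered', cardCount: cards.length });
        return true;
      }
      return false;
    };

    if (check()) return; // cards already in the DOM at navigation time

    // Otherwise re-check on every DOM mutation until the budget runs out.
    observer = new ObserverCtor(() => { check(); });
    observer.observe(doc.documentElement, { childList: true, subtree: true });
    timer = setTimeout(() => finish({ state: 'timeout' }), timeoutMs);
  });
}
```

Checking once before attaching the observer covers the common case described above, where the cards are already present in the HTML when navigation completes.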

This survives classname obfuscation, doesn't need request signing, and matches the pattern xiaohongshu/rednote use for the same problem class.
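The shape-based field identification from step 3 could look roughly like the sketch below. It reuses the names parseDouyinCount and projectCard from this PR, but the bodies are an illustrative re-implementation under assumptions (for example, that the first count-shaped leaf is the like count and that a card is passed as {href, leafTexts}); the code in clis/douyin/search.js may differ.

```javascript
// Illustrative re-implementation; not copied from clis/douyin/search.js.
const DURATION_RE = /^\d{1,2}:\d{2}(:\d{2})?$/;
const COUNT_RE = /^\d+(\.\d+)?[万亿]?$/;

// Convert a Douyin counter string to an integer (万 = 1e4, 亿 = 1e8).
// Non-strings and unparseable input fall back to 0.
function parseDouyinCount(text) {
  if (typeof text !== 'string') return 0;
  const m = text.trim().match(/^(\d+(?:\.\d+)?)([万亿]?)$/);
  if (!m) return 0;
  const unit = m[2] === '万' ? 1e4 : m[2] === '亿' ? 1e8 : 1;
  return Math.round(parseFloat(m[1]) * unit);
}

// Classify leaf texts by shape and project to the tiktok-aligned row.
// Assumed card shape: { href: string, leafTexts: string[] }.
function projectCard(card, rank) {
  const leaves = (card.leafTexts || []).map((t) => String(t).trim()).filter(Boolean);
  let author = '';
  let likes = 0;
  let likesSeen = false;
  const rest = [];
  for (let i = 0; i < leaves.length; i++) {
    const text = leaves[i];
    if (DURATION_RE.test(text)) continue; // duration: skipped
    if (!likesSeen && COUNT_RE.test(text)) { // first count-shaped leaf: likes
      likes = parseDouyinCount(text);
      likesSeen = true;
      continue;
    }
    if (text === '@' && i + 1 < leaves.length) { // "@" then name: author
      author = leaves[++i];
      continue;
    }
    if (!author && text.length > 1 && text.startsWith('@')) { // fused "@name"
      author = text.slice(1);
      continue;
    }
    rest.push(text);
  }
  // Longest remaining text is taken as the description.
  const desc = rest.reduce((a, b) => (b.length > a.length ? b : a), '');
  return { rank, desc, author, url: card.href || '', plays: 0, likes, comments: 0, shares: 0 };
}
```

One known weakness of this sketch: a purely numeric leaf that appears before the like count would be misread as likes, which is why the test suite for the real projectCard matters more than the regexes themselves.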

Live test evidence: VERIFIED on logged-in profile

5 queries × --limit 5, all return 5 well-formed rows. Top-row samples (full JSON arrays validated locally; trimming to 1-2 rows per query for review):

Query 1: opencli douyin search "AI 编程" --limit 5 -f json

[
  {"rank": 1, "desc": "有了AI编程程序员还有出路吗? #科技科普 #计算机 #AI #编程 #程序员", "author": "英雄编程", "url": "https://www.douyin.com/video/7573894888656293163", "plays": 0, "likes": 17000, "comments": 0, "shares": 0},
  {"rank": 4, "desc": "致全地球人,AI编程速成指南,代码改变命运! #AI在抖音  #AI编程  #编程入门#编程学习   #计算机专业", "author": "数字游牧人Samuel", "url": "https://www.douyin.com/video/7462667529274592547", "plays": 0, "likes": 35000, "comments": 0, "shares": 0}
]

Query 2: opencli douyin search "claude code" --limit 5 -f json

[
  {"rank": 1, "desc": "全网最全!60分钟全面掌握Claude Code~ 【附完整文档】\n#AI #秋芝2046 #ClaudeCode #AI教程 #前沿科技趋势发布月", "author": "秋芝2046", "url": "https://www.douyin.com/video/7636497165430394162", "plays": 0, "likes": 40000, "comments": 0, "shares": 0},
  {"rank": 3, "desc": "Claude Code 零基础终极教程! ... #ai新星计划 #claudecode  #ai教程  #claude #智能体", "author": "木子不写代码", "url": "https://www.douyin.com/video/7636872072064470298", "plays": 0, "likes": 408000, "comments": 0, "shares": 0}
]

Query 3: opencli douyin search "周杰伦演唱会现场" --limit 5 -f json

[
  {"rank": 1, "desc": "#周杰伦甜甜的南宁 嘉年华2 南宁站 day2 I Do  4k超清#演唱会现场 #神级现场 #神级live现场 #live现场", "author": "SecretXSPEC", "url": "https://www.douyin.com/video/7640855326389263801", "plays": 0, "likes": 1671, "comments": 0, "shares": 0},
  {"rank": 4, "desc": "周杰伦2010年超时代世界巡回演唱会", "author": "清心影视", "url": "https://www.douyin.com/video/7032189958270045470", "plays": 0, "likes": 11000, "comments": 0, "shares": 0}
]

Query 4: opencli douyin search "#美食探店" --limit 5 -f json (hashtag form — works the same as plaintext)

[
  {"rank": 1, "desc": "特厨探店|不惊艳,但是吃着超级舒服的家常菜—老太婆家常菜 #美食探店 #隋坡 #美食 #黄山", "author": "特厨隋坡(重新出发版)", "url": "https://www.douyin.com/video/7643888049474063631", "plays": 0, "likes": 96000, "comments": 0, "shares": 0},
  {"rank": 2, "desc": "一锅三吃太夯了!被万象城这家贵州烙锅惊艳到了#美食探店#长沙美食#贵州烙锅湖南首店#长沙正宗贵州烙锅#黔寨寨贵州烙锅", "author": "Seven不七亏", "url": "https://www.douyin.com/video/7643712014329827950", "plays": 0, "likes": 6091, "comments": 0, "shares": 0}
]

Query 5: opencli douyin search "rust" --limit 5 -f json

[
  {"rank": 1, "desc": "【Rust腐蚀】有史以来最棒的开荒之旅! #steam游戏 #多人联机 #生存游戏", "author": "Bone骨头碎片", "url": "https://www.douyin.com/video/7472103372993105161", "plays": 0, "likes": 57000, "comments": 0, "shares": 0},
  {"rank": 3, "desc": "Rust大讲堂 #1 开局的思路详解,新手必看,老手一起交流 #Rust #腐蚀 #代号#前哨 #新手教程", "author": "Jullseye", "url": "https://www.douyin.com/video/7481895809458474240", "plays": 0, "likes": 4552, "comments": 0, "shares": 0}
]

Coverage observed: English ("rust", "claude code"), CJK ("AI 编程", "周杰伦演唱会现场"), hashtag form ("#美食探店"). No verify_check rejections, no holdouts, no anti-bot drift in this run. Like counts span 8 to 408k, confirming the 万/亿 parser handles both small-creator and viral rows.

Tests

  • 30 unit tests in clis/douyin/search.test.js covering: arg validation, parseSearchLimit bounds, parseDouyinCount over 万/亿/plain/empty/non-string, normalizeDouyinVideoUrl over scheme-relative/absolute/empty, projectCard over classname-agnostic leaf-text shapes (duration skip, like-count parsing, author-after-@ extraction, longest-text fallback for desc, fused @author prefix stripping, safe defaults for missing leafTexts), full func() flow including login-wall mapping, timeout-to-AuthRequired mapping, {session, data} envelope unwrap, malformed-payload CommandExecutionError, --limit cap, and Chinese URL encoding.
  • npm test (442 files / 4664 tests / 1 skipped): all pass.
  • npm run typecheck: clean.
  • opencli validate: PASS (14 douyin commands, 0 errors, 0 warnings).
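For reviewers who want a feel for the small helpers these tests exercise, here is a hedged sketch of two of them. normalizeDouyinVideoUrl is a name from this PR, but the body below is an assumption covering only the three cases listed (scheme-relative, absolute, empty); buildSearchUrl is a hypothetical name for the URL construction in step 1 of the final approach.

```javascript
// Assumed behavior for the three listed cases; the real helper may differ.
function normalizeDouyinVideoUrl(href) {
  if (typeof href !== 'string') return '';
  const trimmed = href.trim();
  if (trimmed === '') return '';
  // Scheme-relative links ("//www.douyin.com/video/...") get an https scheme.
  if (trimmed.startsWith('//')) return `https:${trimmed}`;
  return trimmed; // absolute URLs pass through unchanged
}

// Hypothetical helper: build the search page URL with the query percent-encoded.
function buildSearchUrl(query) {
  return `https://www.douyin.com/search/${encodeURIComponent(query)}?type=video`;
}

console.log(buildSearchUrl('AI 编程'));
// https://www.douyin.com/search/AI%20%E7%BC%96%E7%A8%8B?type=video
```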

Files

  • clis/douyin/search.js (new) — DOM-extraction adapter
  • clis/douyin/search.test.js (new) — 30 unit tests
  • cli-manifest.json — regenerated entry, column list [rank, desc, author, url, plays, likes, comments, shares]

DOM extraction from www.douyin.com/search/<q>?type=video. Requires logged-in
profile. plays/comments/shares exposed as 0 (card markup only surfaces likes);
see Follow-ups for full-counter path. Schema aligned with tiktok search.

Refs Daily-AC/omnireach#12