feat: add merged forward message (合并转发) parsing support #9

Open
Wuchen-4810 wants to merge 1 commit into huohuoer:main from Wuchen-4810:feat/merge-forward-parser

@Wuchen-4810

Summary

Add merged forward (合并转发) message parsing support to wechat-cli.

Problem

When WeChat users forward several messages together (合并转发), the resulting chat record has app_type=17 or app_type=19. Previously, wechat-cli rendered these records as the generic placeholder [链接/文件] ("link/file"), so the forwarded content was unreadable.

Changes

1. New function: _format_merged_forward_message()

Parses the nested XML structure inside merged forward messages:

  • Extracts the <dataitem> elements under <recorditem>, which are wrapped in CDATA sections
  • Handles various data types (text, image, file, nested merged forward)
  • Falls back to parsing <desc> text when structured data is unavailable
  • Displays each message as: ├ sender [time]: content
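
The parsing steps above can be illustrated with a minimal sketch. It is not the code from this PR: the function name format_merged_forward, the element names (recordinfo, datalist, sourcename, sourcetime, datadesc, datatitle), the datatype codes, and the English placeholders are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed datatype codes and placeholder labels; the PR does not list them.
_PLACEHOLDERS = {'2': '[image]', '8': '[file]', '17': '[merged forward]'}


def _item_content(item: ET.Element) -> str:
    """Return display text for one <dataitem>, based on its assumed datatype."""
    kind = item.get('datatype', '1')
    if kind in _PLACEHOLDERS:
        label = item.findtext('datatitle', default='').strip()
        return f'{_PLACEHOLDERS[kind]} {label}'.strip()
    return item.findtext('datadesc', default='').strip()


def format_merged_forward(msg_xml: str) -> str:
    """Render a merged forward <msg> as a title line plus one line per message."""
    appmsg = ET.fromstring(msg_xml).find('appmsg')
    if appmsg is None:
        return '[合并转发]'
    title = appmsg.findtext('title', default='').strip()
    lines = [f'[合并转发] {title}'.strip()]

    # ElementTree exposes the CDATA payload of <recorditem> as plain text,
    # so it has to be parsed a second time to reach the <dataitem> elements.
    record = appmsg.findtext('recorditem', default='').strip()
    items = []
    if record:
        try:
            items = list(ET.fromstring(record).iter('dataitem'))
        except ET.ParseError:
            items = []

    if items:
        for item in items:
            sender = item.findtext('sourcename', default='?').strip()
            when = item.findtext('sourcetime', default='').strip()
            clock = when.split(' ')[-1][:5]  # assumes "YYYY-MM-DD HH:MM[:SS]"
            stamp = f' [{clock}]' if clock else ''
            lines.append(f'  ├ {sender}{stamp}: {_item_content(item)}')
    else:
        # Fallback: no structured data, so show the <desc> text line by line.
        for row in appmsg.findtext('desc', default='').splitlines():
            if row.strip():
                lines.append(f'  ├ {row.strip()}')
    return '\n'.join(lines)


sample = (
    '<msg><appmsg><title>Chat History Title</title>'
    '<desc>Sender1: Hello</desc>'
    '<recorditem><![CDATA[<recordinfo><datalist>'
    '<dataitem datatype="1"><sourcename>Sender1</sourcename>'
    '<sourcetime>2026-05-11 20:00</sourcetime>'
    '<datadesc>Hello</datadesc></dataitem>'
    '<dataitem datatype="8"><sourcename>Sender2</sourcename>'
    '<sourcetime>2026-05-11 20:01</sourcetime>'
    '<datatitle>notes.pdf</datatitle></dataitem>'
    '</datalist></recordinfo>]]></recorditem></appmsg></msg>'
)
print(format_merged_forward(sample))
```

Parsing the CDATA payload in a separate step keeps a malformed record from breaking the outer message: if the inner parse fails, the sketch still falls back to the <desc> text.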

2. Increased XML parse size limit

  • _XML_PARSE_MAX_LEN: 20000 -> 200000
  • Merged forward messages can contain very long chat records (e.g., 64KB+)
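
To show why the old limit mattered, here is a small sketch of a length guard. How wechat-cli actually applies _XML_PARSE_MAX_LEN is not shown in this PR, so the helper parse_xml_if_small_enough and its skip-on-oversize behavior are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

_XML_PARSE_MAX_LEN = 200000  # value after this PR; previously 20000


def parse_xml_if_small_enough(content: str) -> Optional[ET.Element]:
    """Hypothetical guard: return None for oversized or malformed payloads."""
    if len(content) > _XML_PARSE_MAX_LEN:
        return None
    try:
        return ET.fromstring(content)
    except ET.ParseError:
        return None


# A 64KB record exceeds the old 20000-character limit but fits the new one.
big_record = '<msg>' + 'x' * 65536 + '</msg>'
parsed = parse_xml_if_small_enough(big_record)
```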

3. XML declaration cleanup

  • Added re.sub(r'<\?xml\b[^?]*\?>', '', content) before parsing
  • Fixes cases where <?xml version="1.0"?> is embedded inside the <msg> tag
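
The effect of the cleanup can be reproduced in isolation. The sketch below uses the same regular expression as the PR; the helper name strip_xml_declarations and the sample payload are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as quoted in the PR description.
_XML_DECL_RE = re.compile(r'<\?xml\b[^?]*\?>')


def strip_xml_declarations(content: str) -> str:
    """Remove every <?xml ...?> declaration, wherever it appears."""
    return _XML_DECL_RE.sub('', content)


# Hypothetical payload: the declaration sits inside <msg>, not at the start.
raw = '<msg><?xml version="1.0"?><appmsg><title>Chat History</title></appmsg></msg>'

try:
    ET.fromstring(raw)
    raw_parses = True
except ET.ParseError:
    # The XML parser only accepts a declaration at the very start of the document.
    raw_parses = False

cleaned = strip_xml_declarations(raw)
title = ET.fromstring(cleaned).findtext('appmsg/title')
```

Since a declaration is only legal at the very start of a document, dropping it everywhere does not change the parsed content.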

4. App type routing

  • Added app_type in (17, 19) branch in _format_app_message_text()
  • Routes to the new merged forward parser before falling back to [链接/文件]
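
The routing order can be sketched as follows. The real function is _format_app_message_text(), whose signature is not shown in this PR; the function below, its parameters, and the stand-in parser are hypothetical, with the parser passed in as an argument only to keep the example self-contained.

```python
from typing import Callable, Optional

MERGED_FORWARD_APP_TYPES = (17, 19)
GENERIC_PLACEHOLDER = '[链接/文件]'


def format_app_message_text(
    app_type: int,
    content: str,
    merged_forward_parser: Callable[[str], Optional[str]],
) -> str:
    """Try the merged forward parser first, then fall back to the placeholder."""
    if app_type in MERGED_FORWARD_APP_TYPES:
        parsed = merged_forward_parser(content)
        if parsed:
            return parsed
    return GENERIC_PLACEHOLDER


def fake_parser(content: str) -> Optional[str]:
    """Stand-in for the real parser: succeeds only on non-empty content."""
    return f'[合并转发] {content}' if content else None


routed = format_app_message_text(19, 'Chat History Title', fake_parser)
unparsed = format_app_message_text(17, '', fake_parser)
other = format_app_message_text(5, 'https://example.com', fake_parser)
```

Keeping the placeholder as the final return means a merged forward message that fails to parse still renders the same way it did before this change.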

Before/After

Before:
[2026-05-11 20:52] User: [链接/文件]

After:
[2026-05-11 20:52] User: [合并转发] Chat History Title
  ├ Sender1 [20:00]: Hello...
  ├ Sender2 [20:01]: Discussion...
  ├ Sender3 [20:02]: Conclusion...

Testing

Tested with real WeChat merged forward messages from group chats, including a 64KB chat record containing dozens of messages.

- Add _format_merged_forward_message to parse app_type 17/19 XML
- Extract individual messages from <recorditem> <dataitem> elements
- Show sender, timestamp, and content for each forwarded message
- Handle nested forwarded messages and file references
- Fix XML parsing failure when <?xml?> declaration is embedded inside <msg>
- Increase _XML_PARSE_MAX_LEN from 20000 to 200000 for long chat records

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>