Does this support Chinese Word doc/docx files? #6

@xwydq

My docx file contains Chinese text such as:
四、我们确认,我们完全同意招标文件制定的投标规则,并承诺按照这些规则履行我们的所有义务,包括一旦投标文件被贵方接受,将履行社会资本合作方的义务

On my Mac, I used doc_ripper and got the result shown below:

➜  ~ irb
irb(main):001:0> require 'doc_ripper'
=> true
irb(main):002:0> DocRipper::rip('/Users/datSource/test/docx1.docx')
=> "ç\u009B® å½\u0095 TOC \\o \"1-4\" \\h \\z \\u ä¸\u0080ã\u0080\u0081æ\u008A\u0095èµ\u0084ç\u0094³è¯·ä¹¦ PAGEREF _Toc448258241 \\h 2äº\u008Cã\u0080\u0081æ\u008E\u0088æ\u009D\u0083å§\u0094æ\u0089\u0098书 PAGEREF _Toc448258242 \\h 5ä¸\u0089ã\u0080\u0081å¼\u0080æ \u0087ä¸\u0080è§\u0088表 PAGEREF _Toc448258243 \\h 6å\u009B\u009Bã\u0080\u0081è¯\u0084å\u0088\u0086ç´

How can I get the correct plain text?

Thanks!
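For what it's worth, the garbled output above looks like UTF-8 bytes that were decoded as ISO-8859-1: for example, `ç\u009B®` corresponds to the bytes `E7 9B AE`, which is the UTF-8 encoding of 目. Below is a minimal sketch of a possible workaround under that assumption. `repair_mojibake` is a hypothetical helper written for this illustration, not part of doc_ripper, and it returns the input unchanged when the text cannot be round-tripped (for instance when some characters were already decoded correctly).

```ruby
# Hypothetical helper (not part of doc_ripper): undo "UTF-8 read as
# ISO-8859-1" mojibake by mapping each character back to a single byte
# and reinterpreting the resulting byte sequence as UTF-8.
def repair_mojibake(text)
  repaired = text.encode('ISO-8859-1').force_encoding('UTF-8')
  # Keep the original if the bytes do not form valid UTF-8.
  repaired.valid_encoding? ? repaired : text
rescue Encoding::UndefinedConversionError
  # The text contains characters outside ISO-8859-1 (e.g. real Chinese
  # characters), so it is not pure mojibake; leave it untouched.
  text
end

garbled = "ç\u009B® å½\u0095" # first few characters of the output above
puts repair_mojibake(garbled) # prints "目 录"
```

This only treats the symptom. If the assumption holds, the cleaner fix is to make sure the extracted text is tagged as UTF-8 at the point where it is read.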
