ocrx_line example by kba · Pull Request #39 · kba/hocr-spec

kba · 2016-10-01T15:04:53Z

No description provided.

amitdo · 2016-10-02T12:34:20Z

1.2/spec.md

+
+```html
+...
+<span class="ocrx_line">


ocr_lines nested in ocrx_line? That doesn't look right to me.

It's ocr_line nested in ocrx_line, in this case a single heading split over two lines.

But I'll gladly make a better example if you have an idea. What I've seen in the wild is just replacements for ocr_line, e.g. https://github.com/jwilk/ocrodjvu/blob/master/lib/hocr.py.

It's ocr_line nested in ocrx_line

Yeah, I fixed my original mistake...

Sadly, I don't know what the right way is in this case.

ocrx_line is engine-specific line markup. It exists for those cases where your OCR engine outputs text lines that don't correspond to "normal" text lines.

The most common case is if you apply an engine that's not capable of column segmentation to a multi-column document and you want to prevent subsequent processing stages from assuming that the text lines it gets contain text in reading order.

Basically, if you use ocrx_line instead of ocr_line, you're (intentionally) breaking most subsequent processing, since most OCR output processing will look for ocr_line tags (and assume they are in reading order).
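To illustrate the point about downstream processing, here is a minimal sketch in Python using only the standard library. The sample markup, the `LineClassCounter` class, and the `count_lines` helper are hypothetical and not taken from the spec or from any existing tool. It shows that a consumer that only looks for `ocr_line` finds nothing when an engine emits `ocrx_line` instead.

```python
from html.parser import HTMLParser


class LineClassCounter(HTMLParser):
    """Count elements carrying the hOCR classes ocr_line and ocrx_line."""

    def __init__(self):
        super().__init__()
        self.counts = {"ocr_line": 0, "ocrx_line": 0}

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names,
        # or be missing entirely.
        classes = (dict(attrs).get("class") or "").split()
        for name in self.counts:
            if name in classes:
                self.counts[name] += 1


# Hypothetical output of an engine without column segmentation:
# each "line" spans both columns, so it is marked ocrx_line, not ocr_line.
SAMPLE = """
<div class="ocr_page">
  <span class="ocrx_line" title="bbox 10 10 900 40">left column text   right column text</span>
  <span class="ocrx_line" title="bbox 10 50 900 80">more left text   more right text</span>
</div>
"""


def count_lines(hocr):
    """Return how many ocr_line and ocrx_line elements the markup contains."""
    parser = LineClassCounter()
    parser.feed(hocr)
    return parser.counts


print(count_lines(SAMPLE))  # {'ocr_line': 0, 'ocrx_line': 2}
```

A tool that iterates over `ocr_line` elements would see zero lines here, which is the intended effect: the engine signals that its lines should not be treated as text in reading order.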

Tom, thanks for clarifying this for us.

fix #19 fix #39

Example for ocrx_line, #19

b69b342

kba force-pushed the ocrx_line-example branch from cd35c43 to b69b342 on October 1, 2016 15:06

amitdo reviewed Oct 2, 2016

amitdo mentioned this pull request Oct 22, 2016

ocr_line vs. ocrx_line #19

Open

kba added a commit that referenced this pull request Nov 30, 2017

Add note on ocrx_line by @tmbdev

e2bf67d

fix #19 fix #39

kba mentioned this pull request Nov 30, 2017

Add note on ocrx_line by @tmbdev #105

Open

kba wants to merge 1 commit into master from ocrx_line-example
3 participants