diff --git a/1.2/index.bs b/1.2/index.bs
index 59ce6d4..a5f63da 100644
--- a/1.2/index.bs
+++ b/1.2/index.bs
@@ -656,10 +656,34 @@ Issue: [ocr_carea vs ocrx_block](https://github.com/kba/hocr-spec/issues/28)
 
 ### `ocrx_line`
 
-Issue: [ocr_line vs ocrx_line](https://github.com/kba/hocr-spec/issues/19)
-
-  * any kind of "line" returned by an OCR system that differs from the standard ocr_line above
+  * any kind of "line" returned by an OCR system that differs from [[#ocr_line]]
   * might be some kind of "logical" line
+  * examples include line continuations and rowspan in tables
+
+
+
+Consider the following snippet, containing a wide-spaced heading broken over
+two physical lines:
+
+
+ Wide spaced two line heading +
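+
+An OCR engine could produce the following output, indicating the two physical
+lines that form a single logical line:
+
+```html
+...
+<span class="ocrx_line">
+  <span class='ocr_line' title="bbox 16 16 860 47">Aus den Gewinn- und Verlust-</span>
+  <span class='ocr_line' title="bbox 302 62 603 98">rechnungen</span>
+</span>
+...
+```
+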
 ### `ocrx_word`
diff --git a/1.2/index.html b/1.2/index.html
index f83ecca..7e8ed50 100644
--- a/1.2/index.html
+++ b/1.2/index.html
@@ -2124,13 +2124,29 @@

9.1.2. ocrx_line

-

ocr_line vs ocrx_line

+
+ +

Consider the following snippet, containing a wide-spaced heading broken over
+two physical lines:

+
Wide spaced two line heading
+

An OCR engine could produce the following output, indicating the two physical
+lines that form a single logical line:

+
...
+<span class="ocrx_line">
+  <span class='ocr_line' title="bbox 16 16 860 47">Aus den Gewinn- und Verlust-</span>
+  <span class='ocr_line' title="bbox 302 62 603 98">rechnungen</span> 
+</span>
+...
+
+

9.1.3. ocrx_word
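
Review note: the nesting shown in the `ocrx_line` example above (one logical line wrapping two physical `ocr_line` spans) can be read back with a short script. The following is a minimal sketch using only Python's standard `html.parser`. The names `parse_bbox`, `LogicalLineParser` and `union_bbox` are hypothetical helpers introduced for illustration. Treating the union of the child boxes as the extent of the logical line is an assumption of this sketch, since the example gives no `bbox` on the `ocrx_line` span itself.

```python
from html.parser import HTMLParser

# The markup from the example in this diff, embedded so the sketch is self-contained.
HOCR = '''
<span class="ocrx_line">
  <span class='ocr_line' title="bbox 16 16 860 47">Aus den Gewinn- und Verlust-</span>
  <span class='ocr_line' title="bbox 302 62 603 98">rechnungen</span>
</span>
'''


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class LogicalLineParser(HTMLParser):
    """Collect the physical ocr_line spans nested inside each ocrx_line span."""

    def __init__(self):
        super().__init__()
        self.stack = []          # class attribute of every currently open <span>
        self.logical_lines = []  # one list of {"bbox", "text"} dicts per ocrx_line

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        if cls == "ocrx_line":
            self.logical_lines.append([])
        elif cls == "ocr_line" and "ocrx_line" in self.stack:
            self.logical_lines[-1].append(
                {"bbox": parse_bbox(attrs.get("title") or ""), "text": ""}
            )
        self.stack.append(cls)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text that sits directly inside a nested ocr_line span.
        if self.stack and self.stack[-1] == "ocr_line" and "ocrx_line" in self.stack:
            self.logical_lines[-1][-1]["text"] += data


def union_bbox(boxes):
    """Smallest box enclosing all given (x0, y0, x1, y1) boxes (sketch assumption)."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


parser = LogicalLineParser()
parser.feed(HOCR)
lines = parser.logical_lines[0]
print([line["text"] for line in lines])
print(union_bbox([line["bbox"] for line in lines]))  # (16, 16, 860, 98)
```

The sketch deliberately leaves the trailing hyphen in "Verlust-" untouched. Whether and how a consumer rejoins hyphenated physical lines is outside what this diff specifies.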