 ISO-2022-JP, Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE, and x-user-defined.

+For details on Lexbor's encoding support, see the [Encoding module documentation](../modules/encoding.md).
+
 ## Preprocessing

 Once we have decoded the bytes into Unicode characters, we need to perform a
@@ -448,7 +450,7 @@ be the corresponding ID from the enumeration.

 Example:

-```c
+```C
 typedef enum {
     LXB_TAG__UNDEF = 0x0000,
     LXB_TAG__END_OF_FILE = 0x0001,
@@ -471,15 +473,15 @@ the DOM (Document Object Model) will include a `Tag ID`. This approach avoids
 the need for two comparisons: one for the node type and one for the element.
 Instead, a single check can be performed:

-```c
+```C
 if (node->tag_id == LXB_TAG_DIV) {
     /* Optimal code */
 }
 ```

 Alternatively, you could use:

-```c
+```C
 if (node->type == LXB_DOM_NODE_TYPE_ELEMENT && node->tag_id == LXB_TAG_DIV) {
     /* Oh my code */
 }
@@ -536,7 +538,7 @@ tree. This is achieved using the Flags bitmap field.

 The field can contain the following values:

-```c
+```C
 enum {
     LXB_HTML_TOKEN_TYPE_OPEN = 0x0000,
     LXB_HTML_TOKEN_TYPE_CLOSE = 0x0001,
@@ -550,6 +552,9 @@ enum {
     LXB_HTML_TOKEN_TYPE_DONE = 0x0100
 };
 ```
+
+**Note:** This enum reflects an earlier version of the codebase. In the current implementation (see `source/lexbor/html/token.h`), the `TEXT`, `DATA`, `RCDATA`, `CDATA`, and `NULL` token types have been removed, and the remaining values have been renumbered.
+
 Besides the opening/closing token type, there are additional values for the data
 converter. Only the tokenizer knows how to correctly convert data, and it marks
 the token to indicate how the data should be processed.
@@ -232,7 +232,7 @@ It might look something like this:
 div {width: 10px!important}
 ```

-```html
+```
 "div {" — Selectors parse
 "width: 10px !important}" — Declarations parse
 ```
@@ -355,6 +355,8 @@ like `Qualified Rule`, `At-Rule`, etc., as well as different system phases.
 There is also a stack due to the recursive nature of CSS structures, which
 avoids recursion directly.

+**Note:** The `LXB_CSS_SYNTAX_TOKEN__TERMINATED` token and the `lxb_css_syntax_parser_token()` function described above reflect the internal parsing architecture. For the public API, see the [CSS module documentation](../modules/css.md).
+
 **Pros:**
 1. Complete control over the tokenizer.
 2. Speed, as everything happens on the fly.
@@ -374,15 +376,15 @@ is structured. Values in grammars can include combinators and multipliers.

 **Sequential Order**

-```html
+```
 <my> = a b c
 ```

 `<my>` can contain the following value:
 - `<my> = a b c`

 **One Value from the List**:
-```html
+```
 <my> = a | b | c
 ```

@@ -392,7 +394,7 @@ is structured. Values in grammars can include combinators and multipliers.
 - `<my> = c`

 **One or All Values from the List in Any Order**:
-```html
+```
 <my> = a || b || c
 ```

@@ -437,7 +439,7 @@ For those familiar with regular expressions, this concept will be immediately
 clear.

 **Zero or Infinite Number of Times**:
-```html
+```
 <my> = a*
 ```

@@ -447,7 +449,7 @@ clear.
 - `<my> = `

 **One or Infinite Number of Times**:
-```html
+```
 <my> = a+
 ```

@@ -456,7 +458,7 @@ clear.
 - `<my> = a a a a a a a a a a a a a`

 **May or May Not be Present**:
-```html
+```
 <my> = a?
 ```

@@ -465,7 +467,7 @@ clear.
 - `<my> = `

 **May be Present from `A` to `B` Times, Period**:
-```html
+```
 <my> = a{1,4}
 ```

@@ -476,7 +478,7 @@ clear.
 - `<my> = a a a a`

 **One or Infinite Number of Times Separated by Comma**:
-```html
+```
 <my> = a#
 ```

@@ -487,7 +489,7 @@ clear.
 - `<my> = a, a, a, a`

 **Exactly One Value Must be Present**:
-```html
+```
 <my> = [a? | b? | c?]!
 ```

@@ -497,7 +499,7 @@ error.

 **Multipliers can be Combined**:

-```html
+```
 <my> = a#{1,5}
 ```

@@ -547,7 +549,7 @@ The main problems I encountered: