diff --git a/source/articles/example-CSS-selectors-easy-way.md b/source/articles/example-CSS-selectors-easy-way.md index f9a3239..08a2e68 100644 --- a/source/articles/example-CSS-selectors-easy-way.md +++ b/source/articles/example-CSS-selectors-easy-way.md @@ -1,5 +1,7 @@ # Examples: CSS selectors, the easy way +For the full CSS and Selectors API reference, see the [CSS module](../modules/css.md) and [Selectors module](../modules/selectors.md) documentation. + Let's start with an easy example of using `lexbor` for parsing and serializing CSS selectors. This example breaks down the major steps and elements, explaining the overall purpose, requirements, and assumptions at each step. @@ -30,7 +32,7 @@ real-world example will be provided later. The code includes the necessary header files and defines a callback function (`callback`) that prints the parsed data. -```c +```C #include lxb_status_t callback(const lxb_char_t *data, size_t len, void *ctx) @@ -45,7 +47,7 @@ lxb_status_t callback(const lxb_char_t *data, size_t len, void *ctx) The `main` function initializes the CSS parser, parses a CSS selector string, and then serializes the resulting selector list. -```c +```C int main(int argc, const char *argv[]) { // ... (variable declarations) @@ -81,7 +83,7 @@ int main(int argc, const char *argv[]) The code defines a CSS selector string (`slctrs`) and initializes the CSS parser. -```c +```C static const lxb_char_t slctrs[] = ":has(div, :not(as, 1%, .class), #hash)"; parser = lxb_css_parser_create(); @@ -94,7 +96,7 @@ status = lxb_css_parser_init(parser, NULL); The code parses the CSS selector string, checks for parsing errors, and prints the result. -```c +```C list = lxb_css_selectors_parse(parser, slctrs, sizeof(slctrs) / sizeof(lxb_char_t) - 1); @@ -109,7 +111,7 @@ if (parser->status != LXB_STATUS_OK) { The example serializes the parsed selector list and prints any parser logs. 
-```c +```C printf("Result: "); (void) lxb_css_selector_serialize_list(list, callback, NULL); printf("\n"); @@ -118,7 +120,7 @@ printf("\n"); if (lxb_css_log_length(lxb_css_parser_log(parser)) != 0) { printf("Log:\n"); // Serialize parser logs with proper indentation. - (void) lxb_css_log_serialize(parser->log, callback, NULL, + (void) lxb_css_log_serialize(lxb_css_parser_log(parser), callback, NULL, indent, indent_length); printf("\n"); } @@ -130,7 +132,7 @@ if (lxb_css_log_length(lxb_css_parser_log(parser)) != 0) { Finally, the code destroys resources for the parser and frees memory allocated for the selector list. -```c +```C (void) lxb_css_parser_destroy(parser, true); lxb_css_selector_list_destroy_memory(list); ``` diff --git a/source/articles/part-1-html.md b/source/articles/part-1-html.md index 3dc9506..b1c3ae3 100644 --- a/source/articles/part-1-html.md +++ b/source/articles/part-1-html.md @@ -1,5 +1,7 @@ # Part one: HTML +**Note:** This article was written during the early development of the Lexbor HTML parser. Some code examples and internal values (such as token type flags) may differ from the current implementation. For up-to-date API reference, see the [HTML module documentation](../modules/html.md). + Hello, everyone! In this article, I will explain how to create a superfast HTML parser that @@ -61,6 +63,8 @@ windows-874, windows-1250, windows-1251, windows-1252, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, gb18030, Big5, ISO-2022-JP, Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE, and x-user-defined. +For details on Lexbor's encoding support, see the [Encoding module documentation](../modules/encoding.md). + ## Preprocessing Once we have decoded the bytes into Unicode characters, we need to perform a @@ -446,7 +450,7 @@ be the corresponding ID from the enumeration. Example: -```c +```C typedef enum { LXB_TAG__UNDEF = 0x0000, LXB_TAG__END_OF_FILE = 0x0001, @@ -469,7 +473,7 @@ the DOM (Document Object Model) will include a `Tag ID`. 
This approach avoids the need for two comparisons: one for the node type and one for the element. Instead, a single check can be performed: -```c +```C if (node->tag_id == LXB_TAG_DIV) { /* Optimal code */ } @@ -477,7 +481,7 @@ if (node->tag_id == LXB_TAG_DIV) { Alternatively, you could use: -```c +```C if (node->type == LXB_DOM_NODE_TYPE_ELEMENT && node->tag_id == LXB_TAG_DIV) { /* Oh my code */ } @@ -534,7 +538,7 @@ tree. This is achieved using the Flags bitmap field. The field can contain the following values: -```c +```C enum { LXB_HTML_TOKEN_TYPE_OPEN = 0x0000, LXB_HTML_TOKEN_TYPE_CLOSE = 0x0001, @@ -548,6 +552,9 @@ enum { LXB_HTML_TOKEN_TYPE_DONE = 0x0100 }; ``` + +**Note:** This enum reflects an earlier version of the codebase. In the current implementation (see `source/lexbor/html/token.h`), the `TEXT`, `DATA`, `RCDATA`, `CDATA`, and `NULL` token types have been removed, and the remaining values have been renumbered. + Besides the opening/closing token type, there are additional values for the data converter. Only the tokenizer knows how to correctly convert data, and it marks the token to indicate how the data should be processed. @@ -675,7 +682,7 @@ tree_build_in_body_character(token) { ``` In Lexbor HTML: -```c +```C tree_build_in_body_character(token) { lexbor_str_t str = {0}; lxb_html_parser_char_t pc = {0}; @@ -698,7 +705,7 @@ As illustrated by the example, we have removed all character-based conditions and created a common function for text processing. 
This function takes an argument with data transformation settings: -```c +```C pc.replace_null /* Replace each '\0' with REPLACEMENT CHARACTER (U+FFFD) */ pc.drop_null /* Delete all '\0's */ pc.is_attribute /* If data is an attribute value, we need smarter parsing */ diff --git a/source/articles/part-2-css.md b/source/articles/part-2-css.md index 4ab8a03..b27e86b 100644 --- a/source/articles/part-2-css.md +++ b/source/articles/part-2-css.md @@ -1,5 +1,7 @@ # Part Two: CSS +**Note:** This article was written during the early development of the Lexbor CSS parser. Some internal details may differ from the current implementation. For up-to-date API reference, see the [CSS module documentation](../modules/css.md). For current project status, see the [Roadmap](../roadmap.md). + Hello, everyone! We continue our series on developing a browser engine. Better late than never! @@ -90,7 +92,7 @@ div {width: 10px !important} ``` The tokenizer generates tokens: -```html +``` "div" — " " — "{" — @@ -180,7 +182,7 @@ to these callbacks. It would look something like this: div {width: 10px !important} ``` -```html +``` "div" — callback_qualified_rule_prelude() " " — callback_qualified_rule_prelude() — callback_qualified_rule_prelude() @@ -230,7 +232,7 @@ It might look something like this: div {width: 10px !important} ``` -```html +``` "div {" — Selectors parse "width: 10px !important}" — Declarations parse ``` @@ -353,6 +355,8 @@ like `Qualified Rule`, `At-Rule`, etc., as well as different system phases. There is also a stack due to the recursive nature of CSS structures, which avoids recursion directly. +**Note:** The `LXB_CSS_SYNTAX_TOKEN__TERMINATED` token and the `lxb_css_syntax_parser_token()` function described above reflect the internal parsing architecture. For the public API, see the [CSS module documentation](../modules/css.md). + **Pros:** 1. Complete control over the tokenizer. 2. Speed, as everything happens on the fly. @@ -372,7 +376,7 @@ is structured. 
Values in grammars can include combinators and multipliers. **Sequential Order** -```html +``` = a b c ``` @@ -380,7 +384,7 @@ is structured. Values in grammars can include combinators and multipliers. - ` = a b c` **One Value from the List**: -```html +``` = a | b | c ``` @@ -390,7 +394,7 @@ is structured. Values in grammars can include combinators and multipliers. - ` = c` **One or All Values from the List in Any Order**: -```html +``` = a || b || c ``` @@ -435,7 +439,7 @@ For those familiar with regular expressions, this concept will be immediately clear. **Zero or Infinite Number of Times**: -```html +``` = a* ``` @@ -445,7 +449,7 @@ clear. - ` = ` **One or Infinite Number of Times**: -```html +``` = a+ ``` @@ -454,7 +458,7 @@ clear. - ` = a a a a a a a a a a a a a` **May or May Not be Present**: -```html +``` = a? ``` @@ -463,7 +467,7 @@ clear. - ` = ` **May be Present from `A` to `B` Times, Period**: -```html +``` = a{1,4} ``` @@ -474,7 +478,7 @@ clear. - ` = a a a a` **One or Infinite Number of Times Separated by Comma**: -```html +``` = a# ``` @@ -485,7 +489,7 @@ clear. - ` = a, a, a, a` **Exactly One Value Must be Present**: -```html +``` = [a? | b? | c?]! ``` @@ -495,7 +499,7 @@ error. **Multipliers can be Combined**: -```html +``` = a#{1,5} ``` @@ -545,7 +549,7 @@ The main problems I encountered: For example, consider this grammar: -```html +``` = none | [ underline || overline || line-through || blink ] = solid | double | dotted | dashed | wavy = @@ -561,7 +565,7 @@ To manage this, I implemented a limiter for group options using `/1`. This notation indicates how many options should be selected from the group. As a result, `` was transformed into: -```html +``` = || || /1 ``` @@ -572,7 +576,7 @@ spaces between them. This approach is insufficient; we need to address this directly in the grammar. 
To handle this, the `^WS` modifier (Without Spaces) was introduced: -```html +``` = ^WS < @@ -595,7 +599,7 @@ For example: Tests would be generated as follows: -```html +``` = a b c = a c b = b a c diff --git a/source/documentation.md b/source/documentation.md index 6b59baf..5bcfe8a 100644 --- a/source/documentation.md +++ b/source/documentation.md @@ -7,7 +7,7 @@ These steps show how to use `lexbor` in your code. They assume you are using Lin 2. Let's parse some sample HTML markup. Save the following code as `myhtml.c`: - ```c + ```C #include #include @@ -82,7 +82,7 @@ Optional flags recognized by the `cmake` command: | LEXBOR_BUILD_SEPARATELY | OFF | Build all modules separately. Each module will have its own library (shared and static). | | LEXBOR_BUILD_EXAMPLES | OFF | Build example programs. | | LEXBOR_BUILD_TESTS | OFF | Build tests. | -| LEXBOR_BUILD_TESTS_CPP | ON | Build C++ tests to verify library operation in C++. Requires `LEXBOR_BUILD_TESTS`. | +| LEXBOR_BUILD_TESTS_CPP | OFF | Build C++ tests to verify library operation in C++. Requires `LEXBOR_BUILD_TESTS`. | | LEXBOR_TEST_AMALGAMATION | OFF | Build tests for the amalgamation file. Requires `LEXBOR_BUILD_TESTS`. | | LEXBOR_BUILD_UTILS | OFF | Build project utilities and helpers. | | LEXBOR_BUILD_WITH_ASAN | OFF | Enable Address Sanitizer if possible. | @@ -163,15 +163,15 @@ make We focus on minimal dependencies, custom algorithms, and platform-specific solutions: -- The project is written in pure `C` without external prerequisites. We believe - in a "go hard or go home" approach. +- The project is written in pure `C` without external prerequisites. We aim for + self-contained, minimal-dependency code. - While we're not reinventing every algorithm known to humankind, we handle object creation and memory management in our own way. Many classic algorithms used in `lexbor` are adapted to meet the specific needs of the project. 
-- We're open to using third-party code, but it’s often simpler to start from - scratch than to add extra dependencies (looking at you, Node.js). +- We're open to using third-party code, but it's often simpler to start from + scratch than to add extra dependencies. - Some functions are platform-dependent, such as threading, timers, I/O, and blocking primitives (spinlocks, mutexes). For these, we have a separate `port` @@ -183,7 +183,7 @@ We focus on minimal dependencies, custom algorithms, and platform-specific solut There are four main dynamic memory functions: -```c +```C void * lexbor_malloc(size_t size); @@ -199,10 +199,10 @@ lexbor_free(void *dst); These functions: -- Are defined in `/source/lexbor/core/lexbor.h` (in the [core](#core) module). +- Are defined in `source/lexbor/core/lexbor.h` (in the [core](#core) module). -- Are implemented in `/source/port/*/lexbor/core/memory.c` (in the `port` - module). +- Are implemented in `source/lexbor/ports/*/lexbor/core/memory.c` (in the + `ports` module). - Can be redefined if needed. @@ -212,7 +212,7 @@ As the names suggest, they serve as replacements for the standard `malloc`, function returns a `void *` that is always `NULL`. This simplifies the process of nullifying freed variables: -```c +```C if (object->table != NULL) { object->table = lexbor_free(object->table); } @@ -220,7 +220,7 @@ if (object->table != NULL) { Without this, you'd need to explicitly nullify `object->table`: -```c +```C if (object->table != NULL) { lexbor_free(object->table); object->table = NULL; @@ -243,8 +243,8 @@ If a function can fail, it should report the failure. We follow two main rules w Status codes are passed as `lxb_status_t`. This type is defined throughout the -codebase in `/source/lexbor/core/types.h`, and all available status codes are -listed in `/source/lexbor/core/base.h`. +codebase in `source/lexbor/core/types.h`, and all available status codes are +listed in `source/lexbor/core/base.h`. 
## Function Naming @@ -260,7 +260,7 @@ Most functions follow this naming pattern: -The exception is the [core](#core) module (`/source/lexbor/core/`), which uses a +The exception is the [core](#core) module (`source/lexbor/core/`), which uses a different pattern: [naming2]: img/naming2.png @@ -278,11 +278,11 @@ without exceptions. ## Header Locations -All paths are relative to the `/source/` directory. For example, to include a -header file from the [html](#html) module located in `/source/lexbor/html/`, +All paths are relative to the `source/` directory. For example, to include a +header file from the [html](#html) module located in `source/lexbor/html/`, use: -```c +```C #include "lexbor/html/tree.h" ``` @@ -291,7 +291,7 @@ use: Most structures and objects have an API for creating, initializing, cleaning, and deleting them. This follows the general pattern: -```c +```C * _create(void); @@ -315,7 +315,7 @@ void typically return `void`. - If `NULL` is passed as the first argument (the object) to the `*_init` - function, it returns `LXB_STATUS_ERROR_OBJECT_NULL`. + function, it returns `LXB_STATUS_ERROR_OBJECT_IS_NULL`. - When the `*_destroy` function is called with `self_destroy` set to `true`, the returned value is always `NULL`; otherwise, the object (`obj`) is returned. 
@@ -329,36 +329,38 @@ void Typical usage: -```c +```C lexbor_avl_t *avl = lexbor_avl_create(); -lxb_status_t status = lexbor_avl_init(avl, 1024); +lxb_status_t status = lexbor_avl_init(avl, 1024, + sizeof(lexbor_avl_node_t)); if (status != LXB_STATUS_OK) { - lexbor_avl_node_destroy(avl, true); + lexbor_avl_destroy(avl, true); exit(EXIT_FAILURE); } /* Do something super useful */ -lexbor_avl_node_destroy(avl, true); +lexbor_avl_destroy(avl, true); ``` Now, with an object on the stack: -```c +```C lexbor_avl_t avl = {0}; -lxb_status_t status = lexbor_avl_init(&avl, 1024); +lxb_status_t status = lexbor_avl_init(&avl, 1024, + sizeof(lexbor_avl_node_t)); if (status != LXB_STATUS_OK) { - lexbor_avl_node_destroy(&avl, false); + lexbor_avl_destroy(&avl, false); exit(EXIT_FAILURE); } /* Do something even more useful */ -lexbor_avl_node_destroy(&avl, false); +lexbor_avl_destroy(&avl, false); ``` Note that this approach is not an absolute requirement, even though it is @@ -371,23 +373,24 @@ The `lexbor` project is designed to be modular, allowing each module to be built separately if desired. Modules can depend on each other; for instance, all modules currently rely on the [core](#core) module. -Each module is located in a subdirectory within the `/source/` directory of the +Each module is located in a subdirectory within the `source/` directory of the project. ### Module Versioning Each module records its version in the `base.h` file located at the module root. -For example, see `/source/lexbor/html/base.h`: +For example, see `source/lexbor/html/base.h`: -```c +```C #define _VERSION_MAJOR 1 #define _VERSION_MINOR 0 #define _VERSION_PATCH 3 -#define _VERSION_STRING LXB_STR(_VERSION_MAJOR) LXB_STR(.) \ - LXB_STR(_VERSION_MINOR) LXB_STR(.) \ - LXB_STR(_VERSION_PATCH) +#define _VERSION_STRING \ + LEXBOR_STRINGIZE(_VERSION_MAJOR) "." \ + LEXBOR_STRINGIZE(_VERSION_MINOR) "." \ + LEXBOR_STRINGIZE(_VERSION_PATCH) ``` @@ -398,7 +401,7 @@ as AVL and BST trees, arrays, and strings. 
It also handles memory management. The module is continuously evolving with new algorithms being added and existing ones optimized. -Documentation for this module will be available later. +See the [Core module documentation](modules/core) for API reference. ### DOM @@ -406,7 +409,7 @@ Documentation for this module will be available later. This module implements the [DOM specification](https://dom.spec.whatwg.org/). Its functions manage the DOM tree, including its nodes, attributes, and events. -Documentation for this module will be available later. +See the [DOM module documentation](modules/dom) for API reference. ### HTML @@ -417,8 +420,7 @@ specification](https://html.spec.whatwg.org/multipage/). Current implementations include: Tokenizer, Tree Builder, Parser, Fragment Parser, and Interfaces for HTML Elements. -Documentation for this module will be available later. For guidance, refer to -the +See the [HTML module documentation](modules/html) for API reference. Also see the [HTML examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/html) in our repo or the corresponding [articles](articles/index). @@ -440,9 +442,8 @@ windows-1255, windows-1256, windows-1257, windows-1258, windows-874, x-mac-cyrillic, x-user-defined ``` -Documentation for this module will be available later. For guidance, refer to -the [Encoding -examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/encoding) +See the [Encoding module documentation](modules/encoding) for API reference. Also see the +[Encoding examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/encoding) in our repo or the corresponding [articles](articles/index). @@ -450,7 +451,6 @@ in our repo or the corresponding [articles](articles/index). This module implements the [CSS specification](https://drafts.csswg.org/). -Documentation for this module will be available later. 
For guidance, refer to -the [CSS -examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/css) in +See the [CSS module documentation](modules/css) for API reference. Also see the +[CSS examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/css) in our repo or the corresponding [articles](articles/index). diff --git a/source/download.md b/source/download.md index cad48aa..903bdab 100644 --- a/source/download.md +++ b/source/download.md @@ -4,19 +4,20 @@ The `lexbor` binaries are available for: -* [CentOS](#centos) 6, 7, 8 +* [CentOS](#centos) 7 -* [Debian](#debian) 8, 9, 10, 11 +* [Debian](#debian) 11, 12 -* [Fedora](#fedora) 28, 29, 30, 31, 32, 33, 34, 36, 37 +* [Fedora](#fedora) 39, 40, 41 -* [RHEL](#rhel) 7, 8 +* [RHEL](#rhel) 8, 9 -* [Ubuntu](#ubuntu) 14.04, 16.04, 18.04, 18.10, 19.04, 19.10, 20.04, 20.10, - 21.04, 22.04 +* [Ubuntu](#ubuntu) 20.04, 22.04, 24.04 * [macOS](#macos) +**Note:** Older distribution versions that have reached end-of-life are no longer listed. If you need packages for an older version, check the repository at `packages.lexbor.com` directly. + ### CentOS @@ -52,14 +53,14 @@ curl https://lexbor.com/keys/lexbor_signing.key | \ ``` 2. To configure the `lexbor` repository, create the following file named - `/etc/apt/sources.list.d/lexbor.list`. For Debian 11: + `/etc/apt/sources.list.d/lexbor.list`. For Debian 12: ```ini -deb-src [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ bullseye liblexbor -deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ bullseye liblexbor +deb-src [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ bookworm liblexbor +deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ bookworm liblexbor ``` - Supported distros also include `buster` (10), `stretch` (9), and `jessie` (8). + Supported distros also include `bullseye` (11). 3. 
Install the core `lexbor` package and any additional packages you need: @@ -86,8 +87,8 @@ deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ 2. Install the core `lexbor` package and any additional packages you need: ```sh - yum install liblexbor - yum install liblexbor-dev + dnf install liblexbor + dnf install liblexbor-devel ``` @@ -108,7 +109,7 @@ deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/debian/ ```sh yum install liblexbor - yum install liblexbor-dev + yum install liblexbor-devel ``` @@ -125,16 +126,14 @@ curl https://lexbor.com/keys/lexbor_signing.key | \ ``` 2. To configure the `lexbor` repository, create the following file named - `/etc/apt/sources.list.d/lexbor.list`. For Ubuntu 20.04: + `/etc/apt/sources.list.d/lexbor.list`. For Ubuntu 22.04: ```ini -deb-src [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/ubuntu/ focal liblexbor -deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/ubuntu/ focal liblexbor +deb-src [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/ubuntu/ jammy liblexbor +deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/ubuntu/ jammy liblexbor ``` - Supported distros also include `hirsute` (21.04), `groovy` (20.10), `focal` - (20.04), `eoan` (19.10), `disco` (19.04), `cosmic` (18.10), `bionic` (18.04), - `xenial` (16.04), and `trusty` (14.04). + Supported distros also include `noble` (24.04) and `focal` (20.04). 3. Install the core `lexbor` package and any additional packages you need: @@ -145,7 +144,7 @@ deb [signed-by=/etc/apt/keyrings/lexbor.gpg] https://packages.lexbor.com/ubuntu/ ``` -## macOS +### macOS ### Homebrew diff --git a/source/modules/core.md b/source/modules/core.md index ba109a7..0e06472 100644 --- a/source/modules/core.md +++ b/source/modules/core.md @@ -8,41 +8,573 @@ ## Overview -The Core module is the foundation of lexbor. 
It implements essential data structures, algorithms, and memory management used by all other modules. +The Core module is the foundation of lexbor. It implements essential data structures, memory management, and utilities used by all other modules. Written in pure C99 with zero external dependencies. -Core provides the building blocks that all other modules depend on. It's written in pure C99 with zero external dependencies, making it highly portable and easy to embed. +## Key Features + +- **Zero Dependencies** — pure C99, no external libraries required +- **Object Lifecycle** — all types follow the `create`/`init`/`clean`/`destroy` pattern +- **Dual Function Variants** — performance-critical accessors have both inline and non-inline (`_noi`) versions for ABI stability +- **Pool Allocation** — `lexbor_dobject_t` recycles fixed-size objects; `lexbor_mraw_t` provides general-purpose pooled allocation +- **Platform Abstraction** — portable across operating systems via `source/lexbor/ports/` ## What's Inside -- **Memory Management** — custom allocators optimized for parser performance - - `lexbor_malloc`, `lexbor_calloc`, `lexbor_realloc`, `lexbor_free` - - Memory pools for fast object allocation - -- **Data Structures** - - AVL trees — self-balancing binary search trees - - BST trees — binary search trees - - Arrays — dynamic arrays with automatic growth - - Strings — efficient string handling with SSO (Small String Optimization) - - Hash tables — fast key-value lookups - - Vectors — generic dynamic arrays - -- **Base Types** — common types used across all modules - - `lxb_status_t` — status codes for error handling - - `lxb_char_t` — character type (unsigned char) - - `lxb_codepoint_t` — Unicode code point - - and more... 
- -- **Utilities** - - String operations (case conversion, comparison, hashing) - - Number parsing and conversion - - Bit operations - - Debugging helpers +- **[Status Codes](#status-codes-lxb_status_t)** — error handling codes used by all modules +- **[Action Type](#action-type-lexbor_action_t)** — callback return values for iteration control +- **[Base Types](#base-types)** — common types used across all modules +- **[Memory Allocator](#memory-allocator-lexbor_mraw_t)** — pooled memory allocator with caching +- **[Dynamic Object Pool](#dynamic-object-pool-lexbor_dobject_t)** — fixed-size object pool allocator +- **[Array](#array-lexbor_array_t)** — dynamic array of void pointers +- **[Object Array](#object-array-lexbor_array_obj_t)** — dynamic array storing objects by value +- **[String](#string-lexbor_str_t)** — dynamically resizable string +- **[Hash Table](#hash-table-lexbor_hash_t)** — hash table with short string optimization +- **[AVL Tree](#avl-tree-lexbor_avl_t)** — self-balancing AVL tree -## Key Features +## Quick Start + +### Using Arrays and Strings + +```C +#include + +int main(void) +{ + lxb_status_t status; + + /* Create a memory allocator */ + lexbor_mraw_t *mraw = lexbor_mraw_create(); + status = lexbor_mraw_init(mraw, 256); + if (status != LXB_STATUS_OK) { + return EXIT_FAILURE; + } + + /* Create and populate an array */ + lexbor_array_t *array = lexbor_array_create(); + status = lexbor_array_init(array, 4); + if (status != LXB_STATUS_OK) { + lexbor_mraw_destroy(mraw, true); + return EXIT_FAILURE; + } + + /* Initialize a string */ + lexbor_str_t str = {0}; + lxb_char_t *data = lexbor_str_init(&str, mraw, 32); + if (data == NULL) { + lexbor_array_destroy(array, true); + lexbor_mraw_destroy(mraw, true); + return EXIT_FAILURE; + } + + /* Append text to the string */ + lexbor_str_append(&str, mraw, + (const lxb_char_t *) "Hello, lexbor!", 14); + + /* Store the string pointer in the array */ + lexbor_array_push(array, &str); + + printf("String: %.*s 
(length: %zu)\n", + (int) str.length, str.data, str.length); + printf("Array length: %zu\n", lexbor_array_length(array)); + + /* Cleanup */ + lexbor_str_destroy(&str, mraw, false); + lexbor_array_destroy(array, true); + lexbor_mraw_destroy(mraw, true); + + return EXIT_SUCCESS; +} +``` + +**Output:** +``` +String: Hello, lexbor! (length: 14) +Array length: 1 +``` + +## Status Codes (`lxb_status_t`) + +All lexbor functions return `lxb_status_t` for error handling. Defined in `source/lexbor/core/base.h`. + +### Location + +Status codes are defined in `source/lexbor/core/base.h`. + +```C +typedef enum { + LXB_STATUS_OK = 0x0000, + LXB_STATUS_ERROR = 0x0001, + LXB_STATUS_ERROR_MEMORY_ALLOCATION, + LXB_STATUS_ERROR_OBJECT_IS_NULL, + LXB_STATUS_ERROR_SMALL_BUFFER, + LXB_STATUS_ERROR_INCOMPLETE_OBJECT, + LXB_STATUS_ERROR_NO_FREE_SLOT, + LXB_STATUS_ERROR_TOO_SMALL_SIZE, + LXB_STATUS_ERROR_NOT_EXISTS, + LXB_STATUS_ERROR_WRONG_ARGS, + LXB_STATUS_ERROR_WRONG_STAGE, + LXB_STATUS_CONTINUE, + LXB_STATUS_STOP, + LXB_STATUS_ABORTED, + LXB_STATUS_STOPPED, + LXB_STATUS_NEXT, + LXB_STATUS_WARNING +} lxb_status_t; +``` + +`LXB_STATUS_OK` (`0x0000`) indicates success. All other values indicate errors or control flow signals. + + +## Action Type (`lexbor_action_t`) + +Used as callback return values to control iteration: + +### Location + +Defined in `source/lexbor/core/base.h`. + +```C +typedef enum { + LEXBOR_ACTION_OK = 0x00, /* continue */ + LEXBOR_ACTION_STOP = 0x01, /* stop iteration */ + LEXBOR_ACTION_NEXT = 0x02 /* skip to next */ +} lexbor_action_t; +``` + + +## Base Types + +Common types used across all modules (defined in `source/lexbor/core/base.h`): + +### Location + +Defined in `source/lexbor/core/base.h`. 
+ +- `lxb_char_t` — character type (`unsigned char`) +- `lxb_codepoint_t` — Unicode code point +- `lexbor_serialize_cb_f` — serialization callback: `lxb_status_t (*)(const lxb_char_t *data, size_t len, void *ctx)` +- `lexbor_callback_f` — general callback: `lxb_status_t (*)(const lxb_char_t *data, size_t len, void *ctx)` + + +## Memory Allocator (`lexbor_mraw_t`) + +A pooled memory allocator with caching for reallocation. Used throughout lexbor for efficient allocation. Defined in `source/lexbor/core/mraw.h`. + +### Location + +Declared in `source/lexbor/core/mraw.h`. + +```C +typedef struct { + lexbor_mem_t *mem; + lexbor_bst_t *cache; + size_t ref_count; +} lexbor_mraw_t; +``` + +### Lifecycle + +```C +lexbor_mraw_t * +lexbor_mraw_create(void); + +lxb_status_t +lexbor_mraw_init(lexbor_mraw_t *mraw, size_t chunk_size); + +void +lexbor_mraw_clean(lexbor_mraw_t *mraw); + +lexbor_mraw_t * +lexbor_mraw_destroy(lexbor_mraw_t *mraw, bool destroy_self); +``` + +### Allocation + +```C +void *lexbor_mraw_alloc(lexbor_mraw_t *mraw, size_t size); +void *lexbor_mraw_calloc(lexbor_mraw_t *mraw, size_t size); +void *lexbor_mraw_realloc(lexbor_mraw_t *mraw, void *data, size_t new_size); +void lexbor_mraw_free(lexbor_mraw_t *mraw, void *data); +``` + +### Utility + +```C +/* Duplicate a memory block */ +void *lexbor_mraw_dup(lexbor_mraw_t *mraw, const void *src, size_t size); + +/* Get the allocated size of a block */ +size_t lexbor_mraw_data_size(void *data); + +/* Get reference count */ +size_t lexbor_mraw_reference_count(lexbor_mraw_t *mraw); +``` + + +## Dynamic Object Pool (`lexbor_dobject_t`) + +A pool allocator for frequently created and destroyed fixed-size objects. Allocates objects from chunks and recycles freed objects via an internal cache. Defined in `source/lexbor/core/dobject.h`. + +### Location + +Declared in `source/lexbor/core/dobject.h`. 
+ +```C +typedef struct { + lexbor_mem_t *mem; + lexbor_array_t *cache; + size_t allocated; + size_t struct_size; +} lexbor_dobject_t; +``` + +### Lifecycle + +```C +lexbor_dobject_t * +lexbor_dobject_create(void); + +lxb_status_t +lexbor_dobject_init(lexbor_dobject_t *dobject, size_t chunk_size, size_t struct_size); + +void +lexbor_dobject_clean(lexbor_dobject_t *dobject); + +lexbor_dobject_t * +lexbor_dobject_destroy(lexbor_dobject_t *dobject, bool destroy_self); +``` + +### Operations + +```C +void *lexbor_dobject_alloc(lexbor_dobject_t *dobject); /* allocate (uninitialized) */ +void *lexbor_dobject_calloc(lexbor_dobject_t *dobject); /* allocate (zeroed) */ +void *lexbor_dobject_free(lexbor_dobject_t *dobject, void *data); /* return to pool */ + +void *lexbor_dobject_by_absolute_position(lexbor_dobject_t *dobject, size_t pos); + +size_t lexbor_dobject_allocated(lexbor_dobject_t *dobject); /* total allocated */ +size_t lexbor_dobject_cache_length(lexbor_dobject_t *dobject); /* cached (free) count */ +``` + + +## Array (`lexbor_array_t`) + +A dynamic array of `void *` pointers. Defined in `source/lexbor/core/array.h`. + +### Location + +Declared in `source/lexbor/core/array.h`. 
+ +```C +typedef struct { + void **list; + size_t size; /* capacity */ + size_t length; /* current count */ +} lexbor_array_t; +``` + +### Lifecycle + +```C +lexbor_array_t * +lexbor_array_create(void); + +lxb_status_t +lexbor_array_init(lexbor_array_t *array, size_t size); + +void +lexbor_array_clean(lexbor_array_t *array); + +lexbor_array_t * +lexbor_array_destroy(lexbor_array_t *array, bool self_destroy); +``` + +### Operations + +```C +void ** lexbor_array_expand(lexbor_array_t *array, size_t up_to); +lxb_status_t lexbor_array_push(lexbor_array_t *array, void *value); +void * lexbor_array_pop(lexbor_array_t *array); +lxb_status_t lexbor_array_insert(lexbor_array_t *array, size_t idx, void *value); +lxb_status_t lexbor_array_set(lexbor_array_t *array, size_t idx, void *value); +void lexbor_array_delete(lexbor_array_t *array, size_t begin, size_t length); + +void * lexbor_array_get(const lexbor_array_t *array, size_t idx); /* NULL if out of bounds */ +size_t lexbor_array_length(lexbor_array_t *array); +size_t lexbor_array_size(lexbor_array_t *array); +``` + + +## Object Array (`lexbor_array_obj_t`) + +A dynamic array that stores objects by value (not by pointer). Elements are stored in a contiguous byte buffer, accessed by index and struct size. Defined in `source/lexbor/core/array_obj.h`. + +### Location + +Declared in `source/lexbor/core/array_obj.h`. 
+ +```C +typedef struct { + uint8_t *list; + size_t size; /* capacity */ + size_t length; /* current count */ + size_t struct_size; /* size of each element */ +} lexbor_array_obj_t; +``` + +### Lifecycle + +```C +lexbor_array_obj_t * +lexbor_array_obj_create(void); + +lxb_status_t +lexbor_array_obj_init(lexbor_array_obj_t *array, size_t size, size_t struct_size); + +void +lexbor_array_obj_clean(lexbor_array_obj_t *array); + +lexbor_array_obj_t * +lexbor_array_obj_destroy(lexbor_array_obj_t *array, bool self_destroy); +``` + +### Operations + +```C +void *lexbor_array_obj_push(lexbor_array_obj_t *array); /* allocate and zero at end */ +void *lexbor_array_obj_push_wo_cls(lexbor_array_obj_t *array); /* allocate without zeroing */ +void *lexbor_array_obj_push_n(lexbor_array_obj_t *array, size_t count); /* allocate N */ +void *lexbor_array_obj_pop(lexbor_array_obj_t *array); /* remove last */ +void lexbor_array_obj_delete(lexbor_array_obj_t *array, size_t begin, size_t length); + +void * lexbor_array_obj_get(const lexbor_array_obj_t *array, size_t idx); +void * lexbor_array_obj_last(lexbor_array_obj_t *array); +size_t lexbor_array_obj_length(lexbor_array_obj_t *array); +size_t lexbor_array_obj_size(lexbor_array_obj_t *array); +size_t lexbor_array_obj_struct_size(lexbor_array_obj_t *array); +``` + +Note: `push()` returns a pointer to the newly allocated slot in the array. The caller writes the object data into this slot. + + +## String (`lexbor_str_t`) + +Dynamically resizable string. Uses `lexbor_mraw_t` for memory allocation. Defined in `source/lexbor/core/str.h`. + +### Location + +Declared in `source/lexbor/core/str.h`. 
+ +```C +typedef struct { + lxb_char_t *data; + size_t length; +} lexbor_str_t; +``` + +### Lifecycle + +```C +lexbor_str_t * +lexbor_str_create(void); + +lxb_char_t * +lexbor_str_init(lexbor_str_t *str, lexbor_mraw_t *mraw, size_t size); + +lxb_char_t * +lexbor_str_init_append(lexbor_str_t *str, lexbor_mraw_t *mraw, + const lxb_char_t *data, size_t length); + +void +lexbor_str_clean(lexbor_str_t *str); + +void +lexbor_str_clean_all(lexbor_str_t *str); + +lexbor_str_t * +lexbor_str_destroy(lexbor_str_t *str, lexbor_mraw_t *mraw, bool destroy_obj); +``` + +### Operations + +```C +/* Resize */ +lxb_char_t *lexbor_str_realloc(lexbor_str_t *str, lexbor_mraw_t *mraw, size_t new_size); +lxb_char_t *lexbor_str_check_size(lexbor_str_t *str, lexbor_mraw_t *mraw, size_t plus_len); + +/* Append */ +lxb_char_t *lexbor_str_append(lexbor_str_t *str, lexbor_mraw_t *mraw, + const lxb_char_t *data, size_t length); +lxb_char_t *lexbor_str_append_before(lexbor_str_t *str, lexbor_mraw_t *mraw, + const lxb_char_t *buff, size_t length); +lxb_char_t *lexbor_str_append_one(lexbor_str_t *str, lexbor_mraw_t *mraw, lxb_char_t data); +lxb_char_t *lexbor_str_append_lowercase(lexbor_str_t *str, lexbor_mraw_t *mraw, + const lxb_char_t *data, size_t length); + +/* Copy */ +lxb_char_t *lexbor_str_copy(lexbor_str_t *dest, const lexbor_str_t *target, + lexbor_mraw_t *mraw); + +/* Whitespace */ +void lexbor_str_stay_only_whitespace(lexbor_str_t *target); +void lexbor_str_strip_collapse_whitespace(lexbor_str_t *target); +void lexbor_str_crop_whitespace_from_begin(lexbor_str_t *target); +``` + +### Accessors + +```C +lxb_char_t *lexbor_str_data(lexbor_str_t *str); +size_t lexbor_str_length(lexbor_str_t *str); +size_t lexbor_str_size(lexbor_str_t *str); +``` + +### Data Comparison Functions + +```C +/* Exact match */ +bool lexbor_str_data_ncmp(const lxb_char_t *first, + const lxb_char_t *sec, size_t size); +bool lexbor_str_data_cmp(const lxb_char_t *first, const lxb_char_t *sec); + +/* Case-insensitive 
*/ +bool lexbor_str_data_ncasecmp(const lxb_char_t *first, + const lxb_char_t *sec, size_t size); +bool lexbor_str_data_casecmp(const lxb_char_t *first, const lxb_char_t *sec); + +/* Substring search */ +bool lexbor_str_data_ncmp_contain(const lxb_char_t *where, size_t where_size, + const lxb_char_t *what, size_t what_size); +bool lexbor_str_data_ncasecmp_contain(const lxb_char_t *where, size_t where_size, + const lxb_char_t *what, size_t what_size); + +/* Case conversion */ +void lexbor_str_data_to_lowercase(lxb_char_t *to, const lxb_char_t *from, size_t len); +void lexbor_str_data_to_uppercase(lxb_char_t *to, const lxb_char_t *from, size_t len); +``` + + +## Hash Table (`lexbor_hash_t`) + +Hash table with configurable key handling, collision chaining, and short string optimization for keys. Defined in `source/lexbor/core/hash.h`. + +### Location + +Declared in `source/lexbor/core/hash.h`. + +### Key Types + +```C +typedef struct { + lexbor_dobject_t *entries; + lexbor_mraw_t *mraw; + lexbor_hash_entry_t **table; + size_t table_size; + size_t struct_size; +} lexbor_hash_t; + +typedef struct { + union { + lxb_char_t *long_str; + lxb_char_t short_str[LEXBOR_HASH_SHORT_SIZE + 1]; /* 17 bytes inline */ + } u; + size_t length; + lexbor_hash_entry_t *next; +} lexbor_hash_entry_t; +``` + +Hash entries use short string optimization: keys up to 16 bytes are stored inline in `short_str`, avoiding a separate allocation. `LEXBOR_HASH_SHORT_SIZE` is `16`. + +### Lifecycle + +```C +lexbor_hash_t * +lexbor_hash_create(void); + +lxb_status_t +lexbor_hash_init(lexbor_hash_t *hash, size_t table_size, size_t struct_size); + +void +lexbor_hash_clean(lexbor_hash_t *hash); + +lexbor_hash_t * +lexbor_hash_destroy(lexbor_hash_t *hash, bool destroy_obj); +``` + +The `struct_size` parameter allows embedding custom data after the hash entry header. Pass `sizeof(lexbor_hash_entry_t)` for entries with no extra data. 
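The two ideas above, inline storage for short keys and custom data placed after the entry header, can be sketched as follows. This is a self-contained illustration under stated assumptions, not lexbor's code: `sketch_entry_t`, the `sketch_entry_*` functions, and `my_entry_t` are hypothetical names, and choosing the storage by comparing `length` against the threshold is an assumption of the sketch. Only the union layout and the value `16` are taken from the description.

```C
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SHORT_SIZE 16

/* Hypothetical entry header mirroring the layout described above. */
typedef struct sketch_entry {
    union {
        unsigned char *long_str;                         /* heap copy of a long key */
        unsigned char  short_str[SKETCH_SHORT_SIZE + 1]; /* inline copy of a short key */
    } u;
    size_t length;
    struct sketch_entry *next;
} sketch_entry_t;

/* Custom entry: the header is the first member, extra data follows it. */
typedef struct {
    sketch_entry_t entry;
    int            value;
} my_entry_t;

/* Allocate one zeroed entry of struct_size bytes, as a table would. */
static sketch_entry_t *
sketch_entry_create(size_t struct_size)
{
    assert(struct_size >= sizeof(sketch_entry_t));
    return calloc(1, struct_size);
}

/* Store the key inline when it fits, otherwise in a separate allocation. */
static int
sketch_entry_set_key(sketch_entry_t *entry, const unsigned char *key,
                     size_t length)
{
    unsigned char *dst;

    if (length <= SKETCH_SHORT_SIZE) {
        dst = entry->u.short_str;
    }
    else {
        dst = malloc(length + 1);
        if (dst == NULL) {
            return -1;
        }
        entry->u.long_str = dst;
    }

    memcpy(dst, key, length);
    dst[length] = '\0';
    entry->length = length;

    return 0;
}

/* Return the key bytes regardless of where they are stored. */
static const unsigned char *
sketch_entry_key(const sketch_entry_t *entry)
{
    return (entry->length <= SKETCH_SHORT_SIZE) ? entry->u.short_str
                                                : entry->u.long_str;
}

/* Release the heap copy of a long key, if there is one. */
static void
sketch_entry_clear(sketch_entry_t *entry)
{
    if (entry->length > SKETCH_SHORT_SIZE) {
        free(entry->u.long_str);
    }
    entry->length = 0;
}
```

A caller would pass `sizeof(my_entry_t)` as the entry size and cast the returned `sketch_entry_t *` to `my_entry_t *` to reach `value`. The cast is valid because the header is the first member of the custom structure.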
+ +### Operations + +```C +void * +lexbor_hash_insert(lexbor_hash_t *hash, const lexbor_hash_insert_t *insert, + const lxb_char_t *key, size_t length); + +lexbor_hash_entry_t * +lexbor_hash_search(lexbor_hash_t *hash, const lexbor_hash_search_t *search, + const lxb_char_t *key, size_t length); + +void +lexbor_hash_remove(lexbor_hash_t *hash, const lexbor_hash_search_t *search, + const lxb_char_t *key, size_t length); +``` + +Pre-defined insert/search strategies: + +- `lexbor_hash_insert_raw` / `lexbor_hash_search_raw` — exact key matching +- `lexbor_hash_insert_lower` / `lexbor_hash_search_lower` — case-insensitive (lowercase) +- `lexbor_hash_insert_upper` / `lexbor_hash_search_upper` — case-insensitive (uppercase) + + +## AVL Tree (`lexbor_avl_t`) + +Self-balancing AVL tree for ordered data. Defined in `source/lexbor/core/avl.h`. + +### Location + +Declared in `source/lexbor/core/avl.h`. + +```C +typedef struct { + lexbor_dobject_t *nodes; + lexbor_avl_node_t *last_right; +} lexbor_avl_t; + +typedef struct lexbor_avl_node { + size_t type; /* key */ + short height; + void *value; + struct lexbor_avl_node *left; + struct lexbor_avl_node *right; + struct lexbor_avl_node *parent; +} lexbor_avl_node_t; +``` + +### Lifecycle + +```C +lexbor_avl_t *lexbor_avl_create(void); +lxb_status_t lexbor_avl_init(lexbor_avl_t *avl, size_t chunk_len, + size_t struct_size); +void lexbor_avl_clean(lexbor_avl_t *avl); +lexbor_avl_t *lexbor_avl_destroy(lexbor_avl_t *avl, bool self_destroy); +``` + +### Operations + +```C +lexbor_avl_node_t *lexbor_avl_insert(lexbor_avl_t *avl, lexbor_avl_node_t **scope, + size_t type, void *value); +lexbor_avl_node_t *lexbor_avl_search(lexbor_avl_t *avl, lexbor_avl_node_t *scope, + size_t type); +void * lexbor_avl_remove(lexbor_avl_t *avl, lexbor_avl_node_t **scope, + size_t type); + +lxb_status_t lexbor_avl_foreach(lexbor_avl_t *avl, lexbor_avl_node_t **scope, + lexbor_avl_node_f cb, void *ctx); +``` + +The `scope` parameter is a pointer to the root 
node pointer, allowing the tree to update the root during balancing. -- **Zero Dependencies** — pure C99, no external libraries required -- **Performance-Optimized** — custom algorithms tuned for parser workloads -- **Memory Efficient** — pooled allocation reduces fragmentation -- **Platform Abstraction** — portable across different operating systems -*(Documentation is currently being developed, details will be available here soon.)* diff --git a/source/modules/css.md b/source/modules/css.md index c68642f..e2c3232 100644 --- a/source/modules/css.md +++ b/source/modules/css.md @@ -3,31 +3,514 @@ * **Version:** 1.4.0 * **Path:** `source/lexbor/css` * **Base Includes:** `lexbor/css/css.h` -* **Examples:** `source/examples/css` -* **Specification:** [CSS](https://www.w3.org/Style/CSS/) +* **Examples:** `examples/lexbor/css` +* **Specification:** [CSS Syntax Level 3](https://www.w3.org/TR/css-syntax-3/), [Selectors Level 4](https://www.w3.org/TR/selectors-4/), [CSSOM](https://www.w3.org/TR/cssom-1/) ## Overview -The CSS module provides a complete CSS parser implementing `CSS Syntax Module Level 3`. It can parse stylesheets, inline styles, and build CSSOM trees. +The CSS module provides a complete CSS parser implementing CSS Syntax Module Level 3. It can parse stylesheets, individual style rules, and declarations, building a rule tree that can be serialized back to CSS text. -Full-featured CSS parser supporting CSS Syntax, Selectors, CSSOM, and gradual implementation of various CSS modules. 
+The module includes: + +- **Syntax Tokenizer** — converts CSS text into tokens per CSS Syntax Level 3 +- **Parser** — builds a CSS rule tree from tokens +- **Stylesheet** — parses and holds a complete stylesheet's rule tree +- **Rule Tree** — CSSOM-style representation of style rules, at-rules, and declarations +- **Property Parsing** — parses CSS property values into typed structures +- **Selectors** — CSS Selectors Level 4 (documented separately in the [Selectors module](selectors.md)) +- **Log** — collects warnings and errors during parsing + +## Key Features + +- **CSS Syntax Level 3** — complete tokenizer and parser implementation +- **CSS Selectors Level 4** — full selector parsing (see [Selectors module](selectors.md)) +- **CSS Namespaces Level 3** — complete namespace support +- **CSSOM** — CSS Object Model (in progress) +- **Property Value Parsing** — typed parsing for display, position, color, opacity, dimensions, margin, padding, border, background, font, text, flexbox, and more ## What's Inside -- **Syntax Tokenizer** — converts CSS text into tokens -- **Parser** — builds CSS rule tree from tokens -- **CSSOM** — CSS Object Model -- **Property Parsing** — parses CSS properties and values -- **Value Types** — handles lengths, colors, functions +- **[Quick Start](#quick-start)** — minimal working example to parse and serialize CSS +- **[Parser](#parser-lxb_css_parser_t)** — core CSS parser entry point +- **[Stylesheet](#stylesheet-lxb_css_stylesheet_t)** — parsed stylesheet representation +- **[Rule Tree](#rule-tree)** — CSSOM-style rule node hierarchy +- **[Serialization](#serialization)** — callback-based CSS text output +- **[Log](#log-lxb_css_log_t)** — parser warnings and errors +- **[Memory Management](#memory-management-lxb_css_memory_t)** — shared memory pool +- **[Examples](#examples)** — complete working programs + +## Quick Start + +### Parsing and Serializing CSS + +```C +#include <lexbor/css/css.h> + +static lxb_status_t +serializer_callback(const lxb_char_t
*data, size_t len, void *ctx) +{ + printf("%.*s", (int) len, data); + return LXB_STATUS_OK; +} + +int main(void) +{ + lxb_status_t status; + + static const lxb_char_t css[] = "div { color: red; display: flex; }"; + + /* Create and initialize the parser */ + lxb_css_parser_t *parser = lxb_css_parser_create(); + status = lxb_css_parser_init(parser, NULL); + if (status != LXB_STATUS_OK) { + return EXIT_FAILURE; + } + + /* Create a stylesheet and parse CSS into it */ + lxb_css_stylesheet_t *sst = lxb_css_stylesheet_create(NULL); + status = lxb_css_stylesheet_parse(sst, parser, css, sizeof(css) - 1); + if (status != LXB_STATUS_OK) { + lxb_css_stylesheet_destroy(sst, true); + lxb_css_parser_destroy(parser, true); + return EXIT_FAILURE; + } + + /* Serialize back to CSS text */ + lxb_css_rule_serialize(sst->root, serializer_callback, NULL); + printf("\n"); + + lxb_css_stylesheet_destroy(sst, true); + lxb_css_parser_destroy(parser, true); + + return EXIT_SUCCESS; +} +``` + + +## Parser (`lxb_css_parser_t`) + +The CSS parser is the core entry point. Defined in `source/lexbor/css/parser.h`. + +### Location + +Declared in `source/lexbor/css/parser.h`. + +### Lifecycle + +```C +lxb_css_parser_t * +lxb_css_parser_create(void); + +lxb_status_t +lxb_css_parser_init(lxb_css_parser_t *parser, lxb_css_syntax_tokenizer_t *tkz); + +void +lxb_css_parser_clean(lxb_css_parser_t *parser); + +void +lxb_css_parser_erase(lxb_css_parser_t *parser); + +lxb_css_parser_t * +lxb_css_parser_destroy(lxb_css_parser_t *parser, bool self_destroy); +``` + +- `lxb_css_parser_init()`: If `tkz` is `NULL`, the parser creates and manages its own tokenizer. +- `lxb_css_parser_clean()`: Resets state but keeps allocated memory for reuse. +- `lxb_css_parser_erase()`: Resets state and releases internal allocations. +- `lxb_css_parser_destroy()`: If `self_destroy` is `true`, frees the parser object itself. 
+ +### Selectors Integration + +To parse CSS that contains selectors (which is most CSS), initialize the selectors module: + +```C +lxb_status_t +lxb_css_parser_selectors_init(lxb_css_parser_t *parser); + +void +lxb_css_parser_selectors_destroy(lxb_css_parser_t *parser); +``` + +If the selectors module is not initialized when parsing a stylesheet, one is created temporarily for each parse call. For better performance when parsing multiple stylesheets, initialize it once. + +### Status + +```C +lxb_status_t +lxb_css_parser_status(lxb_css_parser_t *parser); + +lxb_css_log_t * +lxb_css_parser_log(lxb_css_parser_t *parser); +``` + + +## Stylesheet (`lxb_css_stylesheet_t`) + +Represents a parsed CSS stylesheet. Defined in `source/lexbor/css/stylesheet.h`. + +### Location + +Declared in `source/lexbor/css/stylesheet.h`. + +### Lifecycle + +```C +lxb_css_stylesheet_t * +lxb_css_stylesheet_create(lxb_css_memory_t *memory); + +lxb_css_stylesheet_t * +lxb_css_stylesheet_destroy(lxb_css_stylesheet_t *sst, bool destroy_memory); +``` + +- `lxb_css_stylesheet_create()`: If `memory` is `NULL`, the stylesheet creates its own memory pool. +- `lxb_css_stylesheet_destroy()`: If `destroy_memory` is `true`, also destroys the associated memory pool. + +### Parsing + +```C +lxb_status_t +lxb_css_stylesheet_parse(lxb_css_stylesheet_t *sst, lxb_css_parser_t *parser, + const lxb_char_t *data, size_t length); +``` + +Parses CSS text into the stylesheet's rule tree. Only returns errors for severe failures (e.g., out of memory). Invalid CSS is handled gracefully — broken rules are recorded as `lxb_css_rule_bad_style_t`. + +After parsing, the rule tree is available at `sst->root`. + + +## Rule Tree + +The parsed CSS is represented as a tree of rule nodes. All rule types share a common base `lxb_css_rule_t`. Defined in `source/lexbor/css/rule.h`. + +### Location + +Defined in `source/lexbor/css/rule.h`. 
+ +### Rule Types + +```C +typedef enum { + LXB_CSS_RULE_UNDEF = 0, + LXB_CSS_RULE_STYLESHEET, + LXB_CSS_RULE_LIST, + LXB_CSS_RULE_AT_RULE, + LXB_CSS_RULE_STYLE, + LXB_CSS_RULE_BAD_STYLE, + LXB_CSS_RULE_DECLARATION_LIST, + LXB_CSS_RULE_DECLARATION +} lxb_css_rule_type_t; +``` + +### Key Rule Structures + +**`lxb_css_rule_style_t`** — A CSS style rule (selector + declarations): + +```C +struct lxb_css_rule_style { + lxb_css_rule_t rule; + lxb_css_selector_list_t *selector; + lxb_css_rule_declaration_list_t *declarations; + /* ... */ +}; +``` + +**`lxb_css_rule_declaration_t`** — A single CSS declaration (property: value): + +```C +struct lxb_css_rule_declaration { + lxb_css_rule_t rule; + uintptr_t type; /* property ID from LXB_CSS_PROPERTY_* */ + union { /* typed property value */ } u; + bool important; +}; +``` + +The `type` field holds the property ID (e.g., `LXB_CSS_PROPERTY_DISPLAY`), and the union `u` holds the parsed value in a type-safe structure. + +**`lxb_css_rule_at_t`** — An at-rule (@media, @font-face, @namespace): + +```C +struct lxb_css_rule_at { + lxb_css_rule_t rule; + uintptr_t type; /* at-rule ID from LXB_CSS_AT_RULE_* */ + union { /* typed at-rule data */ } u; +}; +``` + +**`lxb_css_rule_bad_style_t`** — A style rule whose selector failed to parse: + +```C +struct lxb_css_rule_bad_style { + lxb_css_rule_t rule; + lexbor_str_t selectors; /* raw selector text */ + lxb_css_rule_declaration_list_t *declarations; +}; +``` + +### Casting Macros + +```C +lxb_css_rule(obj) /* cast to lxb_css_rule_t * */ +lxb_css_rule_style(obj) /* cast to lxb_css_rule_style_t * */ +lxb_css_rule_at(obj) /* cast to lxb_css_rule_at_t * */ +lxb_css_rule_declaration(obj) /* cast to lxb_css_rule_declaration_t * */ +lxb_css_rule_declaration_list(obj) /* cast to lxb_css_rule_declaration_list_t * */ +``` + +### Traversal + +Rules form a linked list via `next`/`prev` pointers. List containers (`lxb_css_rule_list_t`, `lxb_css_rule_declaration_list_t`) have `first`/`last` pointers. 
+ +```C +/* Iterate over rules in a list */ +lxb_css_rule_t *rule = list->first; +while (rule != NULL) { + /* process rule */ + rule = rule->next; +} +``` + + +## Serialization + +All rule types support callback-based serialization back to CSS text. + +### Location + +Serialization functions are declared in `source/lexbor/css/rule.h`. + +```C +lxb_status_t +lxb_css_rule_serialize(const lxb_css_rule_t *rule, + lexbor_serialize_cb_f cb, void *ctx); + +lxb_status_t +lxb_css_rule_serialize_chain(const lxb_css_rule_t *rule, + lexbor_serialize_cb_f cb, void *ctx); +``` + +- `lxb_css_rule_serialize()`: Serializes a single rule. +- `lxb_css_rule_serialize_chain()`: Serializes a rule and all its `next` siblings. + +Type-specific serialization functions: + +```C +lxb_css_rule_style_serialize(style, cb, ctx); +lxb_css_rule_at_serialize(at, cb, ctx); +lxb_css_rule_declaration_serialize(decl, cb, ctx); +lxb_css_rule_declaration_list_serialize(list, cb, ctx); +``` + +The callback signature is `lexbor_serialize_cb_f`: + +```C +typedef lxb_status_t +(*lexbor_serialize_cb_f)(const lxb_char_t *data, size_t len, void *ctx); +``` + + +## Log (`lxb_css_log_t`) + +The CSS parser log collects messages generated during parsing. Defined in `source/lexbor/css/log.h`. + +### Location + +Declared in `source/lexbor/css/log.h`. 
+ +### Message Types + +```C +typedef enum { + LXB_CSS_LOG_INFO = 0, + LXB_CSS_LOG_WARNING, + LXB_CSS_LOG_ERROR, + LXB_CSS_LOG_SYNTAX_ERROR +} lxb_css_log_type_t; +``` + +### Lifecycle + +```C +lxb_css_log_t * +lxb_css_log_create(void); + +lxb_status_t +lxb_css_log_init(lxb_css_log_t *log, lexbor_mraw_t *mraw); + +void +lxb_css_log_clean(lxb_css_log_t *log); + +lxb_css_log_t * +lxb_css_log_destroy(lxb_css_log_t *log, bool self_destroy); +``` + +### Usage + +```C +/* Get the number of log messages */ +size_t +lxb_css_log_length(lxb_css_log_t *log); + +/* Serialize all log messages */ +lxb_status_t +lxb_css_log_serialize(lxb_css_log_t *log, lexbor_serialize_cb_f cb, void *ctx, + const lxb_char_t *indent, size_t indent_length); + +/* Serialize to a string (caller must free with lexbor_free) */ +lxb_char_t * +lxb_css_log_serialize_char(lxb_css_log_t *log, size_t *out_length, + const lxb_char_t *indent, size_t indent_length); +``` + + +## Memory Management (`lxb_css_memory_t`) + +The CSS module uses a shared memory pool for all allocations. Defined in `source/lexbor/css/base.h`. + +### Location + +Declared in `source/lexbor/css/base.h`. 
+ +```C +lxb_css_memory_t * +lxb_css_memory_create(void); + +lxb_status_t +lxb_css_memory_init(lxb_css_memory_t *memory, size_t prepare_count); + +void +lxb_css_memory_clean(lxb_css_memory_t *memory); + +lxb_css_memory_t * +lxb_css_memory_destroy(lxb_css_memory_t *memory, bool self_destroy); +``` + +The memory pool uses reference counting: + +```C +lxb_css_memory_t * +lxb_css_memory_ref_inc(lxb_css_memory_t *memory); + +void +lxb_css_memory_ref_dec(lxb_css_memory_t *memory); + +lxb_css_memory_t * +lxb_css_memory_ref_dec_destroy(lxb_css_memory_t *memory); +``` + + +## Examples + +### Parsing and Serializing a Stylesheet + +```C +#include <lexbor/css/css.h> + +static lxb_status_t +serializer_callback(const lxb_char_t *data, size_t len, void *ctx) +{ + printf("%.*s", (int) len, data); + return LXB_STATUS_OK; +} + +int +main(void) +{ + lxb_status_t status; + lxb_css_parser_t *parser; + lxb_css_stylesheet_t *sst; + + static const lxb_char_t css[] = + "div { color: red; display: flex; }" + "p.intro { font-size: 16px; margin: 10px; }"; + + /* Create and initialize the parser */ + parser = lxb_css_parser_create(); + status = lxb_css_parser_init(parser, NULL); + if (status != LXB_STATUS_OK) { + goto failed; + } + + /* Create a stylesheet and parse CSS into it */ + sst = lxb_css_stylesheet_create(NULL); + status = lxb_css_stylesheet_parse(sst, parser, css, sizeof(css) - 1); + + lxb_css_parser_destroy(parser, true); + + if (status != LXB_STATUS_OK) { + lxb_css_stylesheet_destroy(sst, true); + return EXIT_FAILURE; + } + + /* Serialize the parsed stylesheet back to CSS text */ + lxb_css_rule_serialize(sst->root, serializer_callback, NULL); + printf("\n"); + + lxb_css_stylesheet_destroy(sst, true); + return EXIT_SUCCESS; + +failed: + lxb_css_parser_destroy(parser, true); + return EXIT_FAILURE; +} +``` + +### Walking the Rule Tree + +```C +#include <lexbor/css/css.h> + +static lxb_status_t +print_cb(const lxb_char_t *data, size_t len, void *ctx) +{ + printf("%.*s", (int) len, data); + return LXB_STATUS_OK; +} + +int
+main(void) +{ + lxb_status_t status; + lxb_css_parser_t *parser; + lxb_css_stylesheet_t *sst; + lxb_css_rule_t *rule; + + static const lxb_char_t css[] = + ".header { color: blue; } .footer { margin: 0; }"; + + parser = lxb_css_parser_create(); + status = lxb_css_parser_init(parser, NULL); + if (status != LXB_STATUS_OK) { + return EXIT_FAILURE; + } + + sst = lxb_css_stylesheet_create(NULL); + status = lxb_css_stylesheet_parse(sst, parser, css, sizeof(css) - 1); + lxb_css_parser_destroy(parser, true); + + if (status != LXB_STATUS_OK) { + lxb_css_stylesheet_destroy(sst, true); + return EXIT_FAILURE; + } + + /* Walk the rule list */ + lxb_css_rule_list_t *list = lxb_css_rule_list(sst->root); + rule = list->first; -## Supported Features + while (rule != NULL) { + printf("Rule type: %d\n", rule->type); -- ✅ CSS Syntax Level 3 -- ✅ CSS Selectors Level 4 -- ✅ CSSOM -- 🚧 CSS Values (in progress) -- 🚧 CSS Box Model (in progress) -- 🚧 CSS Display, Fonts, Flexbox (in progress) + if (rule->type == LXB_CSS_RULE_STYLE) { + printf(" Style rule: "); + lxb_css_rule_style_serialize(lxb_css_rule_style(rule), + print_cb, NULL); + printf("\n"); + } + rule = rule->next; + } -*(Documentation is currently being developed, details will be available here soon.)* + lxb_css_stylesheet_destroy(sst, true); + return EXIT_SUCCESS; +} +``` diff --git a/source/modules/dom.md b/source/modules/dom.md index 391d431..0756559 100644 --- a/source/modules/dom.md +++ b/source/modules/dom.md @@ -3,22 +3,845 @@ * **Version:** 2.0.0 * **Path:** `source/lexbor/dom` * **Base Includes:** `lexbor/dom/dom.h` -* **Examples:** not present +* **Examples:** `examples/lexbor/html` (DOM is used through the HTML module) * **Specification:** [WHATWG DOM Living Standard](https://dom.spec.whatwg.org/) ## Overview -The DOM module implements the Document Object Model specification, providing a tree structure for representing and manipulating HTML documents. 
+The DOM module implements the Document Object Model specification, providing a tree structure for representing and manipulating HTML documents. It defines the node hierarchy, tree operations, element attributes, and namespace handling used throughout lexbor. -Complete implementation of WHATWG DOM standard with efficient tree operations and namespace support. +In practice, the DOM module is most commonly used via the [HTML module](html.md). After parsing HTML with `lxb_html_document_parse()`, you interact with the resulting tree using DOM types and functions. + +## Key Features + +- **Specification Compliant** — implements WHATWG DOM Living Standard +- **Full Node Hierarchy** — elements, text, comments, processing instructions, document fragments +- **Tree Operations** — insert, remove, replace, clone, walk with callbacks +- **Element Search** — find by ID, tag name, class name, attribute values +- **Attribute Operations** — get, set, remove, iterate attributes +- **Collections** — dynamic arrays for holding multiple node references +- **Namespace Support** — HTML, SVG, MathML, XLink, XML, XMLNS +- **Inheritance Pattern** — "poor man's inheritance" with safe casting macros ## What's Inside -- **Node Hierarchy** — all DOM node types (Element, Text, Comment, etc.) 
-- **Node Operations** — create, append, insert, remove, replace -- **Element Operations** — attribute manipulation -- **Tree Traversal** — parent, children, siblings navigation -- **Namespace Support** — HTML, SVG, MathML, XML +- **[Quick Start](#quick-start)** — minimal working example for DOM traversal +- **[Key Types](#key-types)** — node types, interface hierarchy, and casting macros +- **[Node](#node-lxb_dom_node_t)** — traversal, properties, modification, text content, walking +- **[Element](#element-lxb_dom_element_t)** — names, attributes, search, lifecycle +- **[Attribute](#attribute-lxb_dom_attr_t)** — attribute name and value access +- **[Document](#document-lxb_dom_document_t)** — document node, factory methods, lifecycle +- **[Collection](#collection-lxb_dom_collection_t)** — dynamic array for holding node references +- **[Namespace Support](#namespace-support)** — XML namespace handling +- **[Examples](#examples)** — complete working programs + +## Quick Start + +### Iterating Child Elements + +```C +#include <lexbor/html/html.h> +#include <lexbor/dom/dom.h> + +int main(void) +{ + lxb_status_t status; + lxb_html_document_t *document; + lxb_dom_node_t *child; + + static const lxb_char_t html[] = + "<div>First</div><p>Second</p><span>Third</span>"; + + document = lxb_html_document_create(); + status = lxb_html_document_parse(document, html, sizeof(html) - 1); + if (status != LXB_STATUS_OK) { + lxb_html_document_destroy(document); + return EXIT_FAILURE; + } + + lxb_dom_element_t *body = lxb_dom_interface_element(document->body); + + child = lxb_dom_node_first_child(lxb_dom_interface_node(body)); + while (child != NULL) { + if (lxb_dom_node_type(child) == LXB_DOM_NODE_TYPE_ELEMENT) { + const lxb_char_t *name; + name = lxb_dom_element_local_name( + lxb_dom_interface_element(child), NULL); + printf("Element: %s\n", (const char *) name); + } + child = lxb_dom_node_next(child); + } + + lxb_html_document_destroy(document); + return EXIT_SUCCESS; +} +``` + +**Output:** +``` +Element: div +Element: p +Element: span +``` + +## Key Types + +### Node Types + +Every node in the DOM tree has a type defined by `lxb_dom_node_type_t`: + +```C +typedef enum { + LXB_DOM_NODE_TYPE_UNDEF = 0x00, + LXB_DOM_NODE_TYPE_ELEMENT = 0x01, + LXB_DOM_NODE_TYPE_ATTRIBUTE = 0x02, + LXB_DOM_NODE_TYPE_TEXT = 0x03, + LXB_DOM_NODE_TYPE_CDATA_SECTION = 0x04, + LXB_DOM_NODE_TYPE_ENTITY_REFERENCE = 0x05, // historical + LXB_DOM_NODE_TYPE_ENTITY = 0x06, // historical + LXB_DOM_NODE_TYPE_PROCESSING_INSTRUCTION = 0x07, + LXB_DOM_NODE_TYPE_COMMENT = 0x08, + LXB_DOM_NODE_TYPE_DOCUMENT = 0x09, + LXB_DOM_NODE_TYPE_DOCUMENT_TYPE = 0x0A, + LXB_DOM_NODE_TYPE_DOCUMENT_FRAGMENT = 0x0B, + LXB_DOM_NODE_TYPE_NOTATION = 0x0C, // historical + LXB_DOM_NODE_TYPE_CHARACTER_DATA, + LXB_DOM_NODE_TYPE_SHADOW_ROOT, + LXB_DOM_NODE_TYPE_LAST_ENTRY +} lxb_dom_node_type_t; +``` + +### Interface Hierarchy + +The DOM module uses a "poor man's inheritance" pattern where each structure embeds its parent as the first field, allowing safe casting between types: + +``` +lxb_dom_event_target_t + └── lxb_dom_node_t + ├── lxb_dom_element_t + ├── lxb_dom_document_t + ├── lxb_dom_character_data_t + │ ├── lxb_dom_text_t + │ ├── lxb_dom_comment_t + │ ├── lxb_dom_cdata_section_t + │
└── lxb_dom_processing_instruction_t + ├── lxb_dom_document_type_t + ├── lxb_dom_document_fragment_t + ├── lxb_dom_shadow_root_t + └── lxb_dom_attr_t +``` + +### Interface Casting Macros + +Because of the inheritance pattern, casting macros are provided in `source/lexbor/dom/interface.h`: + +```C +lxb_dom_interface_node(obj) /* cast to lxb_dom_node_t * */ +lxb_dom_interface_element(obj) /* cast to lxb_dom_element_t * */ +lxb_dom_interface_document(obj) /* cast to lxb_dom_document_t * */ +lxb_dom_interface_text(obj) /* cast to lxb_dom_text_t * */ +lxb_dom_interface_comment(obj) /* cast to lxb_dom_comment_t * */ +lxb_dom_interface_attr(obj) /* cast to lxb_dom_attr_t * */ +``` + +For example, to get the node type of an element: + +```C +lxb_dom_element_t *element = /* ... */; +lxb_dom_node_type_t type = lxb_dom_node_type(lxb_dom_interface_node(element)); +``` + + +## Node (`lxb_dom_node_t`) + +The fundamental type for all DOM tree nodes. Defined in `source/lexbor/dom/interfaces/node.h`. + +### Location + +Declared in `source/lexbor/dom/interfaces/node.h`. + +### Tree Traversal + +Navigate the tree using these inline functions: + +```C +lxb_dom_node_t * +lxb_dom_node_first_child(lxb_dom_node_t *node); + +lxb_dom_node_t * +lxb_dom_node_last_child(lxb_dom_node_t *node); + +lxb_dom_node_t * +lxb_dom_node_next(lxb_dom_node_t *node); + +lxb_dom_node_t * +lxb_dom_node_prev(lxb_dom_node_t *node); + +lxb_dom_node_t * +lxb_dom_node_parent(lxb_dom_node_t *node); +``` + +All return `NULL` when no such node exists. 
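The five links above (first child, last child, next, previous, parent) are enough to visit a whole subtree in document order without recursion. The following is a minimal sketch of that idea using a stand-in structure. `sketch_node_t`, `sketch_node_append()`, and `sketch_node_walk()` are hypothetical names and not lexbor types or functions; with real nodes the same loop would use the accessor functions listed above.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in node with the same five links; not lxb_dom_node_t. */
typedef struct sketch_node {
    int id;
    struct sketch_node *parent;
    struct sketch_node *first_child;
    struct sketch_node *last_child;
    struct sketch_node *next;
    struct sketch_node *prev;
} sketch_node_t;

/* Append child as the last child of parent. */
static void
sketch_node_append(sketch_node_t *parent, sketch_node_t *child)
{
    assert(parent != NULL && child != NULL && child->parent == NULL);

    child->parent = parent;
    child->prev = parent->last_child;
    child->next = NULL;

    if (parent->last_child != NULL) {
        parent->last_child->next = child;
    }
    else {
        parent->first_child = child;
    }

    parent->last_child = child;
}

/*
 * Visit every descendant of root in pre-order (document order).
 * Stores up to max ids in out and returns the number of nodes visited.
 */
static size_t
sketch_node_walk(const sketch_node_t *root, int *out, size_t max)
{
    size_t count = 0;
    const sketch_node_t *node = root->first_child;

    while (node != NULL) {
        if (count < max) {
            out[count] = node->id;
        }
        count++;

        /* Go down first. */
        if (node->first_child != NULL) {
            node = node->first_child;
            continue;
        }

        /* No children: climb until a next sibling exists or root is reached. */
        while (node != root && node->next == NULL) {
            node = node->parent;
        }

        if (node == root) {
            break;
        }

        node = node->next;
    }

    return count;
}
```

The walk never follows the `next` link of `root` itself, so it stays inside the subtree even when `root` has siblings.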
+ +### Node Properties + +```C +/* Get the node type */ +lxb_dom_node_type_t +lxb_dom_node_type(lxb_dom_node_t *node); + +/* Get the tag ID (element local name as numeric ID) */ +lxb_tag_id_t +lxb_dom_node_tag_id(lxb_dom_node_t *node); + +/* Get the node name as a string */ +const lxb_char_t * +lxb_dom_node_name(lxb_dom_node_t *node, size_t *len); +``` + +### Tree Modification + +**Low-level operations** — These insert/remove nodes directly without DOM spec validation: + +```C +/* Insert node as the last child of 'to' */ +void +lxb_dom_node_insert_child(lxb_dom_node_t *to, lxb_dom_node_t *node); + +/* Insert node immediately before 'to' */ +void +lxb_dom_node_insert_before(lxb_dom_node_t *to, lxb_dom_node_t *node); + +/* Insert node immediately after 'to' */ +void +lxb_dom_node_insert_after(lxb_dom_node_t *to, lxb_dom_node_t *node); + +/* Remove node from its parent */ +void +lxb_dom_node_remove(lxb_dom_node_t *node); +``` + +**Spec-compliant operations** — These perform DOM spec validation before modifying the tree, returning an exception code: + +```C +/* Node.appendChild(node) — validates, then appends child */ +lxb_dom_exception_code_t +lxb_dom_node_append_child(lxb_dom_node_t *parent, lxb_dom_node_t *node); + +/* Node.insertBefore(node, child) — validates, then inserts */ +lxb_dom_exception_code_t +lxb_dom_node_insert_before_spec(lxb_dom_node_t *dst, lxb_dom_node_t *node, + lxb_dom_node_t *child); + +/* Node.removeChild(child) — validates, then removes */ +lxb_dom_exception_code_t +lxb_dom_node_remove_child(lxb_dom_node_t *parent, lxb_dom_node_t *child); + +/* Node.replaceChild(node, child) — validates, then replaces */ +lxb_dom_exception_code_t +lxb_dom_node_replace_child(lxb_dom_node_t *parent, lxb_dom_node_t *node, + lxb_dom_node_t *child); +``` + +Returns `LXB_DOM_EXCEPTION_OK` on success. + +### Text Content + +```C +/* Get text content of the node and its descendants. + * Memory is freed when the document is destroyed. 
+ * To free earlier, call lxb_dom_document_destroy_text(). */ +lxb_char_t * +lxb_dom_node_text_content(lxb_dom_node_t *node, size_t *len); + +/* Set text content, replacing all children */ +lxb_status_t +lxb_dom_node_text_content_set(lxb_dom_node_t *node, + const lxb_char_t *content, size_t len); +``` + +### Tree Walking + +Walk all descendants of a node using a callback: + +```C +typedef lexbor_action_t +(*lxb_dom_node_simple_walker_f)(lxb_dom_node_t *node, void *ctx); + +void +lxb_dom_node_simple_walk(lxb_dom_node_t *root, + lxb_dom_node_simple_walker_f walker_cb, void *ctx); +``` + +The callback should return `LEXBOR_ACTION_OK` to continue or `LEXBOR_ACTION_STOP` to stop. + +### Search Functions + +Find nodes within a subtree: + +```C +/* Find the first element with the given ID */ +lxb_dom_node_t * +lxb_dom_node_by_id(lxb_dom_node_t *root, + const lxb_char_t *qualified_name, size_t len); + +/* Collect all elements with the given tag name */ +lxb_status_t +lxb_dom_node_by_tag_name(lxb_dom_node_t *root, lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t len); + +/* Collect all elements with the given class name */ +lxb_status_t +lxb_dom_node_by_class_name(lxb_dom_node_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *class_name, size_t len); + +/* Collect elements by attribute name and value (exact match) */ +lxb_status_t +lxb_dom_node_by_attr(lxb_dom_node_t *root, lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t qname_len, + const lxb_char_t *value, size_t value_len, + bool case_insensitive); + +/* Collect elements by attribute value prefix */ +lxb_status_t +lxb_dom_node_by_attr_begin(lxb_dom_node_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t qname_len, + const lxb_char_t *value, size_t value_len, + bool case_insensitive); + +/* Collect elements by attribute value suffix */ +lxb_status_t +lxb_dom_node_by_attr_end(lxb_dom_node_t *root, 
lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t qname_len, + const lxb_char_t *value, size_t value_len, + bool case_insensitive); + +/* Collect elements by attribute value substring */ +lxb_status_t +lxb_dom_node_by_attr_contain(lxb_dom_node_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t qname_len, + const lxb_char_t *value, size_t value_len, + bool case_insensitive); +``` + +### Destroy and Clone + +```C +/* Destroy a single node (does not remove children) */ +lxb_dom_node_t * +lxb_dom_node_destroy(lxb_dom_node_t *node); + +/* Destroy a node and all its descendants */ +lxb_dom_node_t * +lxb_dom_node_destroy_deep(lxb_dom_node_t *root); + +/* Clone a node, optionally with all descendants */ +lxb_dom_node_t * +lxb_dom_node_clone(lxb_dom_node_t *node, bool deep); +``` + + +## Element (`lxb_dom_element_t`) + +Extends `lxb_dom_node_t` for elements. Defined in `source/lexbor/dom/interfaces/element.h`. + +### Location + +Declared in `source/lexbor/dom/interfaces/element.h`. + +### Element Names + +```C +/* Original qualified name (e.g.
"LalAla:DiV") */ +const lxb_char_t * +lxb_dom_element_qualified_name(lxb_dom_element_t *element, size_t *len); + +/* Uppercase qualified name */ +const lxb_char_t * +lxb_dom_element_qualified_name_upper(lxb_dom_element_t *element, size_t *len); + +/* Local name only (without prefix) */ +const lxb_char_t * +lxb_dom_element_local_name(lxb_dom_element_t *element, size_t *len); + +/* Tag name (uppercase qualified name) */ +const lxb_char_t * +lxb_dom_element_tag_name(lxb_dom_element_t *element, size_t *len); + +/* Namespace prefix */ +const lxb_char_t * +lxb_dom_element_prefix(lxb_dom_element_t *element, size_t *len); + +/* Tag ID and namespace ID as numeric values */ +lxb_tag_id_t +lxb_dom_element_tag_id(lxb_dom_element_t *element); + +lxb_ns_id_t +lxb_dom_element_ns_id(lxb_dom_element_t *element); +``` + +### Attribute Operations + +```C +/* Set or create an attribute */ +lxb_dom_attr_t * +lxb_dom_element_set_attribute(lxb_dom_element_t *element, + const lxb_char_t *qualified_name, size_t qn_len, + const lxb_char_t *value, size_t value_len); + +/* Get an attribute value */ +const lxb_char_t * +lxb_dom_element_get_attribute(lxb_dom_element_t *element, + const lxb_char_t *qualified_name, size_t qn_len, + size_t *value_len); + +/* Remove an attribute */ +lxb_status_t +lxb_dom_element_remove_attribute(lxb_dom_element_t *element, + const lxb_char_t *qualified_name, size_t qn_len); + +/* Check if attribute exists */ +bool +lxb_dom_element_has_attribute(lxb_dom_element_t *element, + const lxb_char_t *qualified_name, size_t qn_len); + +/* Check if element has any attributes */ +bool +lxb_dom_element_has_attributes(lxb_dom_element_t *element); +``` + +### Attribute Iteration + +```C +lxb_dom_attr_t * +lxb_dom_element_first_attribute(lxb_dom_element_t *element); + +lxb_dom_attr_t * +lxb_dom_element_last_attribute(lxb_dom_element_t *element); + +lxb_dom_attr_t * +lxb_dom_element_next_attribute(lxb_dom_attr_t *attr); + +lxb_dom_attr_t * 
+lxb_dom_element_prev_attribute(lxb_dom_attr_t *attr); +``` + +### ID and Class Access + +```C +/* Get the element's "id" attribute value */ +const lxb_char_t * +lxb_dom_element_id(lxb_dom_element_t *element, size_t *len); + +/* Get the element's "class" attribute value */ +const lxb_char_t * +lxb_dom_element_class(lxb_dom_element_t *element, size_t *len); + +/* Direct access to the id/class attribute objects */ +lxb_dom_attr_t * +lxb_dom_element_id_attribute(lxb_dom_element_t *element); + +lxb_dom_attr_t * +lxb_dom_element_class_attribute(lxb_dom_element_t *element); +``` + +### Element Search + +These functions search the subtree rooted at the given element. `lxb_dom_element_by_id()` returns the first match directly; the others append their matches to a collection: + +```C +/* Find the first element with the given ID */ +lxb_dom_element_t * +lxb_dom_element_by_id(lxb_dom_element_t *root, + const lxb_char_t *qualified_name, size_t len); + +/* Collect elements by tag name */ +lxb_status_t +lxb_dom_elements_by_tag_name(lxb_dom_element_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t len); + +/* Collect elements by class name */ +lxb_status_t +lxb_dom_elements_by_class_name(lxb_dom_element_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *class_name, size_t len); + +/* Collect elements by attribute name and value (exact match) */ +lxb_status_t +lxb_dom_elements_by_attr(lxb_dom_element_t *root, + lxb_dom_collection_t *collection, + const lxb_char_t *qualified_name, size_t qname_len, + const lxb_char_t *value, size_t value_len, + bool case_insensitive); +``` + +The variants `lxb_dom_elements_by_attr_begin()`, `lxb_dom_elements_by_attr_end()`, and `lxb_dom_elements_by_attr_contain()` match by attribute value prefix, suffix, and substring, respectively.
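
To make the four attribute matching modes (exact, prefix, suffix, substring) concrete, here is a minimal plain-C sketch of the comparison each mode is described as performing. It does not call lexbor and is not lexbor's implementation: `match_mode_t`, `bytes_equal()`, and `match_value()` are hypothetical names introduced only for illustration, and case-insensitivity is assumed to mean ASCII case folding.

```C
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of this is lexbor API. */
typedef enum {
    MATCH_EXACT,   /* whole value equals the needle   */
    MATCH_BEGIN,   /* value starts with the needle    */
    MATCH_END,     /* value ends with the needle      */
    MATCH_CONTAIN  /* needle occurs anywhere in value */
} match_mode_t;

/* Compare n bytes, optionally ignoring ASCII case (an assumption). */
static bool
bytes_equal(const char *a, const char *b, size_t n, bool case_insensitive)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char) a[i];
        unsigned char cb = (unsigned char) b[i];

        if (case_insensitive) {
            ca = (unsigned char) tolower(ca);
            cb = (unsigned char) tolower(cb);
        }
        if (ca != cb) {
            return false;
        }
    }
    return true;
}

/* Test one attribute value against a needle in the given mode.
 * How lexbor treats an empty needle is not covered by this sketch. */
static bool
match_value(match_mode_t mode, const char *value, size_t value_len,
            const char *needle, size_t needle_len, bool case_insensitive)
{
    if (needle_len > value_len) {
        return false;
    }

    switch (mode) {
    case MATCH_EXACT:
        return value_len == needle_len
               && bytes_equal(value, needle, needle_len, case_insensitive);
    case MATCH_BEGIN:
        return bytes_equal(value, needle, needle_len, case_insensitive);
    case MATCH_END:
        return bytes_equal(value + (value_len - needle_len), needle,
                           needle_len, case_insensitive);
    case MATCH_CONTAIN:
        for (size_t i = 0; i + needle_len <= value_len; i++) {
            if (bytes_equal(value + i, needle, needle_len,
                            case_insensitive)) {
                return true;
            }
        }
        return false;
    }
    return false;
}
```

For instance, with the value `btn-primary`, the needle `btn-` matches in prefix mode, `primary` in suffix mode, and `n-p` only in substring mode.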
+ +### Lifecycle + +```C +lxb_dom_element_t * +lxb_dom_element_create(lxb_dom_document_t *document, + const lxb_char_t *local_name, size_t lname_len, + const lxb_char_t *ns_name, size_t ns_len, + const lxb_char_t *prefix, size_t prefix_len, + const lxb_char_t *is, size_t is_len, + bool sync_custom); + +lxb_dom_element_t * +lxb_dom_element_destroy(lxb_dom_element_t *element); +``` + +In most cases, prefer `lxb_dom_document_create_element()` (see below) instead of calling `lxb_dom_element_create()` directly. + + +## Attribute (`lxb_dom_attr_t`) + +Represents a single attribute on an element. Defined in `source/lexbor/dom/interfaces/attr.h`. + +### Location + +Declared in `source/lexbor/dom/interfaces/attr.h`. + +```C +/* Get the local name of the attribute */ +const lxb_char_t * +lxb_dom_attr_local_name(lxb_dom_attr_t *attr, size_t *len); + +/* Get the qualified name (including prefix) */ +const lxb_char_t * +lxb_dom_attr_qualified_name(lxb_dom_attr_t *attr, size_t *len); + +/* Get the attribute value */ +const lxb_char_t * +lxb_dom_attr_value(lxb_dom_attr_t *attr, size_t *len); + +/* Set the attribute value */ +lxb_status_t +lxb_dom_attr_set_value(lxb_dom_attr_t *attr, + const lxb_char_t *value, size_t value_len); +``` + + +## Document (`lxb_dom_document_t`) + +The document node — the root of the DOM tree. Defined in `source/lexbor/dom/interfaces/document.h`. + +### Location + +Declared in `source/lexbor/dom/interfaces/document.h`. + +When working with HTML, you typically use `lxb_html_document_t` (from the [HTML module](html.md)) rather than `lxb_dom_document_t` directly. 
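
An HTML document can be handed to generic DOM functions because a pointer to it can be converted to a pointer to its DOM base. One common C idiom that makes such a conversion safe is to embed the base struct as the first member of the derived struct. The sketch below illustrates that idiom with toy types only; every `toy_*` name is hypothetical, and the layout shown is an assumption about the general pattern, not a description of lexbor's actual structs.

```C
/* Toy stand-ins; every name here is hypothetical, not a lexbor type. */
typedef struct {
    int node_count;
} toy_dom_document_t;

typedef struct {
    toy_dom_document_t dom_document; /* base object, kept as first member */
    const char *title;               /* data specific to the derived type */
} toy_html_document_t;

/* Generic function that only knows about the base type. */
static int
toy_dom_document_node_count(const toy_dom_document_t *doc)
{
    return doc->node_count;
}

/* Conversion helper: returns the embedded base object. Because the base
 * is the first member, its address equals the address of the whole
 * derived object. */
static toy_dom_document_t *
toy_dom_interface_document(toy_html_document_t *html)
{
    return &html->dom_document;
}
```

With this layout, code written against the base type keeps working when given the derived type, which is why an HTML-level document is usually the only object a caller needs to create.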
+ +### Compatibility Mode + +```C +typedef enum { + LXB_DOM_DOCUMENT_CMODE_NO_QUIRKS = 0x00, + LXB_DOM_DOCUMENT_CMODE_QUIRKS = 0x01, + LXB_DOM_DOCUMENT_CMODE_LIMITED_QUIRKS = 0x02 +} lxb_dom_document_cmode_t; +``` + +### Factory Methods + +Create new DOM nodes owned by the document: + +```C +lxb_dom_element_t * +lxb_dom_document_create_element(lxb_dom_document_t *document, + const lxb_char_t *local_name, size_t lname_len, + void *reserved_for_opt); + +lxb_dom_text_t * +lxb_dom_document_create_text_node(lxb_dom_document_t *document, + const lxb_char_t *data, size_t len); + +lxb_dom_comment_t * +lxb_dom_document_create_comment(lxb_dom_document_t *document, + const lxb_char_t *data, size_t len); + +lxb_dom_cdata_section_t * +lxb_dom_document_create_cdata_section(lxb_dom_document_t *document, + const lxb_char_t *data, size_t len); + +lxb_dom_processing_instruction_t * +lxb_dom_document_create_processing_instruction(lxb_dom_document_t *document, + const lxb_char_t *target, size_t target_len, + const lxb_char_t *data, size_t data_len); + +lxb_dom_document_fragment_t * +lxb_dom_document_create_document_fragment(lxb_dom_document_t *document); +``` + +### Document Access + +```C +/* Get the root node of the document tree */ +lxb_dom_node_t * +lxb_dom_document_root(lxb_dom_document_t *document); + +/* Get the document element (e.g. 
) */ +lxb_dom_element_t * +lxb_dom_document_element(lxb_dom_document_t *document); + +/* Import a node from another document */ +lxb_dom_node_t * +lxb_dom_document_import_node(lxb_dom_document_t *doc, lxb_dom_node_t *node, + bool deep); +``` + +### Lifecycle + +```C +lxb_dom_document_t * +lxb_dom_document_create(lxb_dom_document_t *owner); + +lxb_status_t +lxb_dom_document_init(lxb_dom_document_t *document, lxb_dom_document_t *owner, + lxb_dom_interface_create_f create_interface, + lxb_dom_interface_clone_f clone_interface, + lxb_dom_interface_destroy_f destroy_interface, + lxb_dom_document_dtype_t type, unsigned int ns); + +lxb_status_t +lxb_dom_document_clean(lxb_dom_document_t *document); + +lxb_dom_document_t * +lxb_dom_document_destroy(lxb_dom_document_t *document); +``` + + +## Collection (`lxb_dom_collection_t`) + +A dynamic array for holding references to multiple DOM nodes. Used with search functions that return multiple results. Defined in `source/lexbor/dom/collection.h`. + +### Location + +Declared in `source/lexbor/dom/collection.h`. 
+ +### Lifecycle + +```C +lxb_dom_collection_t * +lxb_dom_collection_create(lxb_dom_document_t *document); + +lxb_status_t +lxb_dom_collection_init(lxb_dom_collection_t *col, size_t start_list_size); + +lxb_dom_collection_t * +lxb_dom_collection_destroy(lxb_dom_collection_t *col, bool self_destroy); +``` + +Or use the convenience function that creates and initializes in one call: + +```C +lxb_dom_collection_t * +lxb_dom_collection_make(lxb_dom_document_t *document, size_t start_list_size); +``` + +### Usage + +```C +void +lxb_dom_collection_clean(lxb_dom_collection_t *col); + +lxb_status_t +lxb_dom_collection_append(lxb_dom_collection_t *col, void *value); + +lxb_dom_element_t * +lxb_dom_collection_element(lxb_dom_collection_t *col, size_t idx); + +lxb_dom_node_t * +lxb_dom_collection_node(lxb_dom_collection_t *col, size_t idx); + +size_t +lxb_dom_collection_length(lxb_dom_collection_t *col); +``` + + +## Namespace Support + +The DOM module supports six XML namespaces, managed by the NS module: + +- **HTML** (`LXB_NS_HTML`) +- **SVG** (`LXB_NS_SVG`) +- **MathML** (`LXB_NS_MATH`) +- **XLink** (`LXB_NS_XLINK`) +- **XML** (`LXB_NS_XML`) +- **XMLNS** (`LXB_NS_XMLNS`) + +Namespace IDs are accessed via `lxb_dom_element_ns_id()` or the `ns` field of `lxb_dom_node_t`. + + +## Examples + +### Iterating Child Elements + +```C +#include +#include + +int +main(void) +{ + lxb_status_t status; + lxb_html_document_t *document; + lxb_dom_element_t *body; + lxb_dom_node_t *child; + + static const lxb_char_t html[] = + "
First

Second

Third"; + + document = lxb_html_document_create(); + status = lxb_html_document_parse(document, html, sizeof(html) - 1); + if (status != LXB_STATUS_OK) { + goto failed; + } + + body = lxb_dom_interface_element(document->body); + + child = lxb_dom_node_first_child(lxb_dom_interface_node(body)); + while (child != NULL) { + if (lxb_dom_node_type(child) == LXB_DOM_NODE_TYPE_ELEMENT) { + const lxb_char_t *name; + name = lxb_dom_element_local_name(lxb_dom_interface_element(child), + NULL); + printf("Element: %s\n", (const char *) name); + } + + child = lxb_dom_node_next(child); + } + + lxb_html_document_destroy(document); + return EXIT_SUCCESS; + +failed: + lxb_html_document_destroy(document); + return EXIT_FAILURE; +} +``` + +**Output:** +``` +Element: div +Element: p +Element: span +``` + +### Searching by Attribute + +```C +#include +#include + +int +main(void) +{ + lxb_status_t status; + lxb_html_document_t *document; + lxb_dom_collection_t *collection; + + static const lxb_char_t html[] = + "
One
" + "

Two

" + "Three"; + + document = lxb_html_document_create(); + status = lxb_html_document_parse(document, html, sizeof(html) - 1); + if (status != LXB_STATUS_OK) { + goto failed; + } + + collection = lxb_dom_collection_make( + lxb_dom_interface_document(document), 16); + if (collection == NULL) { + goto failed; + } + + status = lxb_dom_elements_by_class_name( + lxb_dom_interface_element(document->body), + collection, + (const lxb_char_t *) "active", 6); + if (status != LXB_STATUS_OK) { + goto cleanup; + } + + for (size_t i = 0; i < lxb_dom_collection_length(collection); i++) { + lxb_dom_element_t *el = lxb_dom_collection_element(collection, i); + const lxb_char_t *name = lxb_dom_element_local_name(el, NULL); + printf("Found: %s\n", (const char *) name); + } + + lxb_dom_collection_destroy(collection, true); + lxb_html_document_destroy(document); + return EXIT_SUCCESS; + +cleanup: + lxb_dom_collection_destroy(collection, true); +failed: + lxb_html_document_destroy(document); + return EXIT_FAILURE; +} +``` + +**Output:** +``` +Found: div +Found: p +``` + +### Walking the DOM Tree + +```C +#include +#include + +static lexbor_action_t +walker(lxb_dom_node_t *node, void *ctx) +{ + size_t *count = (size_t *) ctx; + if (lxb_dom_node_type(node) == LXB_DOM_NODE_TYPE_ELEMENT) { + (*count)++; + } + return LEXBOR_ACTION_OK; +} + +int +main(void) +{ + lxb_status_t status; + lxb_html_document_t *document; + size_t count = 0; + + static const lxb_char_t html[] = + "

text

  • item
"; + + document = lxb_html_document_create(); + status = lxb_html_document_parse(document, html, sizeof(html) - 1); + if (status != LXB_STATUS_OK) { + lxb_html_document_destroy(document); + return EXIT_FAILURE; + } + + lxb_dom_node_simple_walk( + lxb_dom_interface_node(document->body), walker, &count); + + /* count includes itself */ + printf("Elements in body: %zu\n", count); + + lxb_html_document_destroy(document); + return EXIT_SUCCESS; +} +``` +**Output:** +``` +Elements in body: 6 +``` -*(Documentation is currently being developed, details will be available here soon.)* +The count includes ``, `
`, `

`, ``, `