The new XmlReader which is able to read from several sources #948

Draft
Mingun wants to merge 11 commits into tafia:master from Mingun:new-reader

@Mingun (Collaborator) commented Feb 22, 2026

This is the implementation I have already mentioned several times. I decided not to rename `Reader` to `RawReader`, because that could break too many dependents. Instead, I add a new `XmlReader` that takes over most of the low-level work that users usually do not want to handle themselves when processing XML:

  • skipping comments
  • text merging (not yet done in this implementation)
  • some checks of the XML structure, in particular that there is only one XML declaration and one DOCTYPE, and only at the beginning of the document
  • automatic encoding detection and switching to the corresponding decoder (not implemented yet, but the proposed structure should make it easy to add)
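The comment skipping and prolog checks listed above can be sketched as an iterator adapter over a lower-level event source. This is a minimal, std-only sketch and not the code from this PR: `Event`, `StructureError` and `FilteringReader` are hypothetical stand-ins (quick-xml's real `Event` type is richer), and for simplicity any text before the DOCTYPE, including whitespace, is treated as content.

```rust
/// Hypothetical, simplified event type (quick-xml's real `Event` is richer).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Decl(String),
    DocType(String),
    Comment(String),
    Start(String),
    Text(String),
    End(String),
}

/// Hypothetical error for a misplaced or repeated prolog item.
#[derive(Debug, PartialEq)]
enum StructureError {
    Misplaced(&'static str),
}

/// Hypothetical high-level reader wrapping any low-level event source.
struct FilteringReader<I> {
    inner: I,
    seen_any: bool,     // any event (including a comment) was already read
    seen_doctype: bool, // a DOCTYPE was already returned
    seen_content: bool, // an element or text was already returned
}

impl<I: Iterator<Item = Event>> FilteringReader<I> {
    fn new(inner: I) -> Self {
        FilteringReader { inner, seen_any: false, seen_doctype: false, seen_content: false }
    }
}

impl<I: Iterator<Item = Event>> Iterator for FilteringReader<I> {
    type Item = Result<Event, StructureError>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let event = self.inner.next()?;
            let first = !self.seen_any;
            self.seen_any = true;
            match event {
                // The XML declaration is only allowed as the very first event.
                Event::Decl(_) if !first => {
                    return Some(Err(StructureError::Misplaced("XML declaration")));
                }
                // Only one DOCTYPE, and only before any element or text.
                Event::DocType(_) if self.seen_doctype || self.seen_content => {
                    return Some(Err(StructureError::Misplaced("DOCTYPE")));
                }
                Event::DocType(_) => self.seen_doctype = true,
                // Comments are consumed here and never reach the caller.
                Event::Comment(_) => continue,
                Event::Decl(_) => {}
                _ => self.seen_content = true,
            }
            return Some(Ok(event));
        }
    }
}
```

Collecting the adapter into `Result<Vec<_>, _>` stops at the first structural error, so a caller sees either a comment-free event stream or a single error.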

It more or less works, although I think several questions still need to be worked out:

  • I'm not sure that all `unwrap()` calls are safe
  • I'm not sure whether the introduced `EntityResolver` trait is good enough. I would like to optimize some special cases, such as borrowing from the parsed document
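One possible way to let an entity resolver borrow from the parsed document is to tie the trait to the lifetime of the input and return `Cow`. The sketch below is hypothetical: the trait shape, the `capture`/`resolve` signatures and `BorrowingResolver` are assumptions for illustration, not the `EntityResolver` trait introduced in this PR, and its DTD scan only understands `<!ENTITY name "value">` with double quotes.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical resolver trait: `'doc` is the lifetime of the input buffer,
/// so a resolver that keeps slices of the DTD can return them without copying.
trait EntityResolver<'doc> {
    /// Receives the raw DOCTYPE content (hypothetical signature).
    fn capture(&mut self, doctype: &'doc str);
    /// Returns the replacement text for `&name;`, if known.
    fn resolve(&self, name: &str) -> Option<Cow<'doc, str>>;
}

/// Hypothetical resolver storing borrowed slices of entity declarations.
#[derive(Default)]
struct BorrowingResolver<'doc> {
    entities: HashMap<&'doc str, &'doc str>,
}

impl<'doc> EntityResolver<'doc> for BorrowingResolver<'doc> {
    fn capture(&mut self, doctype: &'doc str) {
        // Naive scan: only `<!ENTITY name "value">` with double quotes is handled.
        let mut rest = doctype;
        while let Some(pos) = rest.find("<!ENTITY") {
            rest = &rest[pos + "<!ENTITY".len()..];
            let mut parts = rest.trim_start().splitn(2, char::is_whitespace);
            let (Some(name), Some(tail)) = (parts.next(), parts.next()) else { break };
            let Some(tail) = tail.trim_start().strip_prefix('"') else { continue };
            let Some(end) = tail.find('"') else { break };
            self.entities.insert(name, &tail[..end]);
            rest = &tail[end + 1..];
        }
    }

    fn resolve(&self, name: &str) -> Option<Cow<'doc, str>> {
        // The value borrows from the document; nothing is allocated here.
        self.entities.get(name).map(|v| Cow::Borrowed(*v))
    }
}
```

Because `resolve` returns `Cow<'doc, str>`, a resolver that has to build replacement text dynamically can still return `Cow::Owned`, so borrowing stays an optimization rather than a requirement.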

Still, the implementation is already mature enough to show to the world, and it may give someone ideas. At the moment I'm not sure when I will be able to continue working on it.

Mingun and others added 11 commits February 22, 2026 21:54
… of several XML sources

TODO: remaining questions about namespace resolution
…e new XmlReader

Now deserializing from `BufRead` requires a `'static` type (is that still true?)

failures:
  serde-de:
    resolve::resolve_custom_entity
  --doc:
    src\de\mod.rs - de::Deserializer<'de,R,E>::get_ref (line 2580)
    src\de\resolver.rs - de::resolver::EntityResolver (line 13)
The new XmlReader captures DTD and resolves references, so that events never produced
…are parsed

Remove previous test because it is covered by the new one
After removing the `R` generic, some impl blocks ended up with identical bounds