clone_with_prefixes produces non-deterministic namespace node ordering #46

Open
frankarensmeier wants to merge 1 commit into faassen:main from frankarensmeier:fix/deterministic-clone-prefixes

@frankarensmeier

clone_with_prefixes iterates over the Prefixes map returned by inherited_prefixes() when inserting inherited namespace nodes onto the cloned element. Since Prefixes is AHashMap<PrefixId, NamespaceId>, the iteration order varies across process runs due to random hash seeding.

This causes namespace nodes to be inserted in different tree order on each run, which has observable consequences:

  1. prefix_for_namespace() returns different results — it walks namespace nodes in tree order and returns the first match, so different namespace node ordering means different prefix selection for the same namespace.
  2. XPath name() becomes non-deterministic — since the chosen prefix varies, the qualified name of an element can change between runs.
  3. Serialized output varies: xmlns declarations appear in a different order, and elements may use different prefixes.
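
To illustrate the first consequence, here is a minimal sketch of first-match lookup over an ordered list of namespace nodes. It does not use xot's data structures: `first_prefix_for` and the `(prefix, namespace)` tuples are hypothetical stand-ins. Note that the chosen prefix only changes when more than one prefix is bound to the same namespace.

```rust
/// Simplified stand-in for namespace nodes in tree order: (prefix, namespace URI).
/// This is not xot's data structure, only a model of first-match lookup.
fn first_prefix_for<'a>(nodes: &[(&'a str, &'a str)], uri: &str) -> Option<&'a str> {
    nodes.iter().find(|(_, ns)| *ns == uri).map(|(prefix, _)| *prefix)
}

fn main() {
    // Two prefixes bound to the same namespace, inserted in two different orders.
    let order_one = [("a", "urn:x"), ("b", "urn:x")];
    let order_two = [("b", "urn:x"), ("a", "urn:x")];

    // The first match wins, so the chosen prefix follows insertion order.
    println!("{:?}", first_prefix_for(&order_one, "urn:x")); // Some("a")
    println!("{:?}", first_prefix_for(&order_two, "urn:x")); // Some("b")
}
```

The same set of bindings yields a different prefix depending only on the order in which the nodes were inserted.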

Reproducer:

Parse an XML document with multiple namespace declarations on a parent element, then clone_with_prefixes a child element. Run the program multiple times — the namespace declarations on the cloned element appear in a different order each time.

use xot::Xot;

fn main() {
    let mut xot = Xot::new();
    let root = xot.parse(
        r#"<root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c"><a:child/></root>"#
    ).unwrap();
    let doc_el = xot.document_element(root).unwrap();
    let child = xot.first_child(doc_el).unwrap();
    let cloned = xot.clone_with_prefixes(child);
    println!("{}", xot.to_string(cloned).unwrap());
}

Run this several times — the xmlns: attributes will appear in varying order.

Fix:

Sort the inherited prefixes by PrefixId before inserting, so the namespace node order is deterministic regardless of hash seed:

let mut sorted_prefixes: Vec<_> = prefixes.into_iter().collect();
sorted_prefixes.sort_by_key(|(prefix, _)| *prefix);

let mut namespaces = self.namespaces_mut(clone);
for (prefix, ns) in sorted_prefixes {
    // ...
}
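
The sorting step can be tried in isolation with a standalone sketch. It assumes `std::collections::HashMap` in place of AHashMap and plain `u32` values in place of PrefixId and NamespaceId. `sorted_entries` is a hypothetical helper, not part of xot.

```rust
use std::collections::HashMap;

/// Collects map entries and sorts them by key, so the result does not depend
/// on the map's iteration order. `u32` stands in for PrefixId and NamespaceId.
fn sorted_entries(prefixes: HashMap<u32, u32>) -> Vec<(u32, u32)> {
    let mut sorted: Vec<_> = prefixes.into_iter().collect();
    sorted.sort_by_key(|(prefix, _)| *prefix);
    sorted
}

fn main() {
    let prefixes: HashMap<u32, u32> = [(3, 30), (1, 10), (2, 20)].into_iter().collect();
    // Iterating the sorted Vec gives the same order on every run.
    for (prefix, ns) in sorted_entries(prefixes) {
        println!("prefix {prefix} -> namespace {ns}");
    }
}
```

Sorting by id makes the order reproducible. The order follows the ids themselves, which need not match the alphabetical order of the prefix strings.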

inherited_prefixes() returns an AHashMap whose iteration order varies
across process runs due to random seeding. This caused
clone_with_prefixes() to insert namespace nodes in non-deterministic
order, which in turn made prefix_for_namespace() — which walks
namespace nodes in tree order and returns the first match — return
different prefixes for the same namespace across runs.

Observable effects:
- Serialized output had varying xmlns attribute order
- XPath name() returned different QNames for the same element
- XSLT conformance tests failed intermittently

Fix: sort the inherited prefixes by PrefixId before inserting.