fix: parser selector correctness and XPath injection (#5 #14) #21
- Use NodeId identity in get_nth_of_type instead of comparing outer_html, so identical sibling HTML no longer collapses every position to 1
- Escape attribute values when embedding into XPath: single quotes by default, double quotes when the value contains single quotes, and concat() when it contains both. Adversarial id values can no longer inject XPath operators
- Expose Selector::node_id() for the identity check

Closes #5
Closes #14

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
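To illustrate the first bullet, here is a minimal sketch of why comparing serialized HTML collapses identical siblings to position 1 while comparing node identity does not. The `NodeId` and `Element` types and both functions are simplified stand-ins invented for this example; they are not the crate's actual types or the real `get_nth_of_type` implementation.

```rust
/// Stand-in for the parser's `NodeId`; the real type is not shown in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Simplified element: just enough to contrast the two comparison strategies.
struct Element {
    id: NodeId,
    tag: &'static str,
    outer_html: &'static str,
}

/// Old behaviour (assumed shape): locate the target among same-tag siblings
/// by comparing serialized HTML. The first sibling with identical markup wins.
fn nth_of_type_by_html(siblings: &[Element], target: &Element) -> Option<usize> {
    siblings
        .iter()
        .filter(|s| s.tag == target.tag)
        .position(|s| s.outer_html == target.outer_html)
        .map(|i| i + 1) // nth-of-type is 1-based
}

/// New behaviour (assumed shape): locate the target by node identity.
fn nth_of_type_by_id(siblings: &[Element], target: &Element) -> Option<usize> {
    siblings
        .iter()
        .filter(|s| s.tag == target.tag)
        .position(|s| s.id == target.id)
        .map(|i| i + 1)
}

/// Three identical `<li>` siblings with an unrelated `<p>` in between.
fn sample_siblings() -> Vec<Element> {
    vec![
        Element { id: NodeId(0), tag: "li", outer_html: "<li>Apple</li>" },
        Element { id: NodeId(1), tag: "p", outer_html: "<p>note</p>" },
        Element { id: NodeId(2), tag: "li", outer_html: "<li>Apple</li>" },
        Element { id: NodeId(3), tag: "li", outer_html: "<li>Apple</li>" },
    ]
}
```

With `sample_siblings()`, asking for the position of the last `<li>` gives 1 under the HTML comparison and 3 under the identity comparison, which matches the behaviour change described in the commit message.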
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/parser/selector.rs`:
- Around line 69-78: The doc comment for tag() got attached to node_id() because
node_id() was inserted between the comment and the tag() method; to fix,
relocate the pub fn node_id(&self) -> NodeId { ... } (and its own doc comment)
so the tag() doc comment immediately precedes pub fn tag(&self) -> &str { ... }
— either move node_id() above the existing tag() doc block or place node_id()
after the tag() method, ensuring the tag() doc comment is directly above the
tag() function and node_id() retains its own doc comment.
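The layout the review asks for might look like the sketch below, where each method carries its own doc comment directly above it. The struct fields and the doc comment wording are assumptions made for illustration; only the two method signatures come from the review comment.

```rust
/// Stand-in for the parser's `NodeId`; the real definition is not shown here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// Simplified `Selector` with hypothetical fields.
pub struct Selector {
    node_id: NodeId,
    tag: String,
}

impl Selector {
    /// Returns the identity of the underlying node (hypothetical wording).
    pub fn node_id(&self) -> NodeId {
        self.node_id
    }

    /// Returns the element's tag name (hypothetical wording).
    pub fn tag(&self) -> &str {
        &self.tag
    }
}
```

Because rustdoc attaches a `///` block to the item that immediately follows it, inserting a new method between a comment and its function silently moves the documentation to the wrong item.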
Files selected for processing (3): src/parser/selector.rs, src/parser/selector_generation.rs, tests/parser_selector_generation.rs
Address CodeRabbit review on #21: the tag() doc comment was hovering above node_id() because the new method was inserted between them. Move the tag() doc back to immediately above tag(). https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
Actionable comments posted: 0
Summary
Two parser fixes:
- XPath injection via `id`: `id` attribute values were embedded directly into XPath single-quoted strings, so a value like `foo' or '1'='1` produced a syntactically valid but semantically attacker-controlled expression. Added an `xpath_string_literal` helper that picks the right delimiter: single quotes by default, double quotes when the value contains single quotes, and `concat()` when it contains both.
- `nth_of_type()` wrong for identical siblings: The function compared `outer_html()` strings, so three `<li>Apple</li>` siblings all resolved to `nth-of-type(1)`. Exposed `Selector::node_id()` (previously `pub(crate)`) and used `NodeId` equality for identity instead.

Test plan
- `xpath_string_literal` (plain, single-quote, both-quotes)
- `<li>` siblings now yield distinct CSS/XPath positions
- `id="foo' or '1'='1"` produces a properly delimited XPath literal
- `cargo test`: full suite green
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean

Closes #5
Closes #14
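The delimiter rule described in the summary can be sketched as follows. This is an illustrative re-creation under the assumption that the helper takes a `&str` and returns a `String`; the actual `xpath_string_literal` in this PR may differ in signature and details.

```rust
/// Hypothetical re-creation of the helper described in the summary.
/// XPath 1.0 string literals have no escape sequences, so the only way to
/// embed a quote is to pick a delimiter the value does not contain.
fn xpath_string_literal(value: &str) -> String {
    let has_single = value.contains('\'');
    let has_double = value.contains('"');
    match (has_single, has_double) {
        // No single quotes: single-quote delimiters are safe.
        (false, _) => format!("'{value}'"),
        // Single quotes only: switch to double-quote delimiters.
        (true, false) => format!("\"{value}\""),
        // Both kinds: split on single quotes and rejoin the pieces with a
        // double-quoted "'" literal inside concat().
        (true, true) => {
            let parts: Vec<String> = value
                .split('\'')
                .map(|part| format!("'{part}'"))
                .collect();
            format!("concat({})", parts.join(", \"'\", "))
        }
    }
}

/// Builds an id lookup the way a generator might (assumed expression shape).
fn xpath_for_id(id: &str) -> String {
    format!("//*[@id={}]", xpath_string_literal(id))
}
```

Under this sketch, the adversarial value `foo' or '1'='1` becomes the double-quoted literal `"foo' or '1'='1"`, so the embedded `or` stays inside the string instead of being parsed as an operator.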
Generated by Claude Code