Skip to content

fix: parser selector correctness and XPath injection (#5 #14)#21

Merged
Liohtml merged 2 commits into
masterfrom
claude/parser-selector-fixes
May 28, 2026
Merged

fix: parser selector correctness and XPath injection (#5 #14)#21
Liohtml merged 2 commits into
masterfrom
claude/parser-selector-fixes

Conversation

@Liohtml
Copy link
Copy Markdown
Owner

@Liohtml Liohtml commented May 28, 2026

Summary

Two parser fixes:

Test plan

  • 3 unit tests for xpath_string_literal (plain, single-quote, both-quotes)
  • Integration test: 3 identical <li> siblings now yield distinct CSS/XPath positions
  • Integration test: adversarial id="foo' or '1'='1" produces a properly delimited XPath literal
  • cargo test — full suite green
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean

Closes #5
Closes #14


Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Selector now exposes a stable DOM node identity for more reliable element comparisons.
    • Improved selector generation to correctly disambiguate identical siblings and safely handle IDs containing quotes.
  • Tests

    • Added tests covering sibling disambiguation and ID quoting/escaping in generated selectors.

Review Change Stack

- Use NodeId identity in get_nth_of_type instead of comparing outer_html, so
  identical sibling HTML no longer collapses every position to 1
- Escape attribute values when embedding into XPath: single quotes by
  default, double quotes when the value contains single quotes, and
  concat() when it contains both. Adversarial id values can no longer
  inject XPath operators
- Expose Selector::node_id() for the identity check

Closes #5
Closes #14

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f02ae930-0413-4398-b63a-26565d786a40

📥 Commits

Reviewing files that changed from the base of the PR and between 37e70f6 and 8298c1f.

📒 Files selected for processing (1)
  • src/parser/selector.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/parser/selector.rs

📝 Walkthrough

Walkthrough

This PR adds a public Selector::node_id() accessor, updates get_nth_of_type to disambiguate siblings by NodeId identity, and introduces xpath_string_literal to safely quote ID values in XPath predicates. Tests validate identical-sibling indexing and XPath quoting for adversarial IDs.

Changes

Selector node identity and XPath generation

Layer / File(s) Summary
Public NodeId accessor
src/parser/selector.rs
Selector::node_id() method returns the underlying NodeId for DOM comparisons.
Selector generation with NodeId sibling matching and XPath escaping
src/parser/selector_generation.rs
get_nth_of_type now uses NodeId identity comparison instead of outer_html to disambiguate same-tag siblings. generate_xpath_selector uses a new xpath_string_literal helper that safely formats ID predicates by choosing single or double quotes when possible and falling back to concat(...) for edge cases containing both quote types.
Test coverage for NodeId disambiguation and XPath escaping
tests/parser_selector_generation.rs
Two new tests: one validates that identical <li> siblings receive distinct nth-of-type(1..3) positions in CSS and corresponding [1..3] indices in XPath; another validates XPath ID escaping when element IDs contain single quotes, ensuring the output uses a properly quoted literal containing the original value.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • Issue 14: This PR implements the requested change by exposing Selector::node_id() and updating get_nth_of_type to compare NodeId identity rather than outer_html.

Poem

🐰 I hop through nodes with nimble stride,
NodeId in paw, no ghosts to hide,
Siblings sorted by true identity,
XPath quoted with calm serenity,
A little rabbit cheers selectors wide.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: it addresses two specific parser bugs (XPath injection via unsanitized id and nth_of_type correctness) and includes issue references (#5 #14).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/parser-selector-fixes

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/parser/selector.rs`:
- Around line 69-78: The doc comment for tag() got attached to node_id() because
node_id() was inserted between the comment and the tag() method; to fix,
relocate the pub fn node_id(&self) -> NodeId { ... } (and its own doc comment)
so the tag() doc comment immediately precedes pub fn tag(&self) -> &str { ... }
— either move node_id() above the existing tag() doc block or place node_id()
after the tag() method, ensuring the tag() doc comment is directly above the
tag() function and node_id() retains its own doc comment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5f8f13e6-e11c-4534-a026-01886be94dde

📥 Commits

Reviewing files that changed from the base of the PR and between 805f55b and 37e70f6.

📒 Files selected for processing (3)
  • src/parser/selector.rs
  • src/parser/selector_generation.rs
  • tests/parser_selector_generation.rs

Comment thread src/parser/selector.rs Outdated
Address CodeRabbit review on #21: the tag() doc comment was hovering above
node_id() because the new method was inserted between them. Move the
tag() doc back to immediately above tag().

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@Liohtml Liohtml merged commit 33ce947 into master May 28, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants