Vision
Make screenshots the escape hatch for visual debugging, not the default way to "see" a page.
Problem
Agents use screenshots because they want to "see" the page. But most intents don't need pixels:
| Intent | What They Actually Need | Current Solution | Better Solution |
|---|---|---|---|
| "What's on this page?" | Structure, content | Screenshot (expensive) | DOM / A11y tree |
| "Where is the button?" | Element location | Screenshot (expensive) | Bounding boxes |
| "What's below the fold?" | More content | Full-page screenshot | Scroll + query |
| "Why does this look broken?" | Visual rendering | Screenshot | Screenshot ✓ |
Only visual debugging truly needs pixels.
Proposed Commands
1. bdg dom layout [selector]
Returns element positions, sizes, and visibility without pixels.
bdg dom layout "button.submit"
{
  "selector": "button.submit",
  "count": 1,
  "elements": [{
    "index": 0,
    "tag": "button",
    "text": "Submit",
    "bounds": { "x": 450, "y": 1200, "width": 120, "height": 40 },
    "viewport": { "visible": false, "belowFold": true, "percentVisible": 0 },
    "computed": { "display": "block", "visibility": "visible" }
  }]
}
Use case: An agent needs to know where something is without burning tokens on a screenshot.
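The `viewport` fields in the example output could be derived from `bounds` alone. The following is a minimal sketch, assuming page-relative bounds, a 1280×800 viewport, and no horizontal scrolling. The `Bounds` class, the `viewport_info` helper, and the exact meaning of `visible`, `belowFold`, and `percentVisible` are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    x: float
    y: float
    width: float
    height: float


def viewport_info(b: Bounds, vw: float, vh: float, scroll_y: float = 0.0) -> dict:
    """Derive the "viewport" fields from page-relative bounds (assumed semantics)."""
    top, bottom = scroll_y, scroll_y + vh
    # Overlap between the element rectangle and the viewport rectangle.
    overlap_w = max(0.0, min(b.x + b.width, vw) - max(b.x, 0.0))
    overlap_h = max(0.0, min(b.y + b.height, bottom) - max(b.y, top))
    area = b.width * b.height
    percent = round(100 * overlap_w * overlap_h / area) if area > 0 else 0
    return {
        "visible": percent > 0,
        # Assumption: "below the fold" means the top edge starts past the viewport bottom.
        "belowFold": b.y >= bottom,
        "percentVisible": percent,
    }


submit = Bounds(x=450, y=1200, width=120, height=40)
print(viewport_info(submit, vw=1280, vh=800))
```

With these assumed numbers, the submit button from the example comes out as not visible, below the fold, and 0% visible, which matches the sample output.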
2. bdg dom scroll <selector>
Scrolls an element into the viewport, or scrolls the page to a given position or by a given offset.
bdg dom scroll "footer" # Scroll to footer
bdg dom scroll --to "bottom" # Scroll to page bottom
bdg dom scroll --by 500 # Scroll down 500px
bdg dom scroll --to "top" # Back to top
Use case: Navigate long pages without full-page screenshots.
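One way the scroll variants above might map onto JavaScript for `Runtime.evaluate` is sketched below. The helpers `scroll_expression` and `evaluate_message` are hypothetical names, and the choice of `window.scrollTo` and `window.scrollBy` for `--to` and `--by` is an assumption. The proposal itself only names `scrollIntoView()`.

```python
import json
from typing import Optional


def scroll_expression(selector: Optional[str] = None,
                      to: Optional[str] = None,
                      by: Optional[int] = None) -> str:
    """Build the JavaScript that a scroll command could hand to Runtime.evaluate."""
    if selector is not None:
        # json.dumps yields a safely quoted JS string literal for the selector.
        # Note: this throws in the page if the selector matches nothing.
        return f"document.querySelector({json.dumps(selector)}).scrollIntoView()"
    if to == "top":
        return "window.scrollTo(0, 0)"
    if to == "bottom":
        return "window.scrollTo(0, document.documentElement.scrollHeight)"
    if by is not None:
        return f"window.scrollBy(0, {int(by)})"
    raise ValueError("expected a selector, --to top|bottom, or --by <pixels>")


def evaluate_message(msg_id: int, expression: str) -> dict:
    """Wrap an expression in a CDP Runtime.evaluate request."""
    return {"id": msg_id, "method": "Runtime.evaluate", "params": {"expression": expression}}


print(json.dumps(evaluate_message(1, scroll_expression(selector="footer"))))
```

A real implementation would probably want to report a clear error when the selector matches no element, rather than letting the page-side exception surface.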
3. Enhanced bdg dom a11y tree output
Add visual hints to the accessibility tree:
[Button] "Submit" (below fold, y=1200)
[Link] "Learn more" (visible, y=450)
[Image] "Hero banner" (above fold, 1200×400)
Use case: An agent can understand page layout from the a11y tree without screenshots.
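The hint format could be produced by a small formatter like the sketch below. The `a11y_hint` helper is a hypothetical name, and the sketch assumes an unscrolled page with an 800px-tall viewport. It only distinguishes "visible" from "below fold", because the proposal does not define when "above fold" or a size such as 1200×400 should be shown.

```python
def a11y_hint(role: str, name: str, y: float, viewport_height: float) -> str:
    """Format one tree line with a position hint, for an unscrolled page."""
    # Assumption: an element counts as visible when its top edge is inside the viewport.
    position = "visible" if y < viewport_height else "below fold"
    return f'[{role}] "{name}" ({position}, y={y:g})'


print(a11y_hint("Button", "Submit", 1200, 800))
print(a11y_hint("Link", "Learn more", 450, 800))
```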
Workflow Example
# Old way (expensive)
bdg dom screenshot page.png # 12,000 tokens burned
# New way (cheap)
bdg dom a11y tree # ~500 tokens, shows structure
bdg dom layout "form" # ~100 tokens, shows position
bdg dom scroll "form" # 0 tokens, brings into view
bdg dom screenshot form.png --selector "form" # ~500 tokens, element only
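To make the comparison concrete, the rough estimates from the comments above can be summed. The figures are the issue's own approximations, not measurements.

```python
# Token estimates copied from the workflow comments; they are rough figures.
old_way = {"full-page screenshot": 12_000}
new_way = {"a11y tree": 500, "layout": 100, "scroll": 0, "element screenshot": 500}

old_total = sum(old_way.values())
new_total = sum(new_way.values())
saving = 1 - new_total / old_total

print(f"old: {old_total}, new: {new_total}, saving: {saving:.0%}")
# prints "old: 12000, new: 1100, saving: 91%"
```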
Implementation Notes
- layout uses DOM.getBoxModel and DOM.getDocument
- scroll uses Runtime.evaluate with scrollIntoView()
- A11y enhancement uses existing Accessibility.getFullAXTree + position data
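As a rough illustration of the layout step, a `DOM.getBoxModel` result contains quads of four corner points, which can be reduced to the `bounds` shape shown earlier. This sketch assumes the `border` quad is the box to report and that quad coordinates are viewport-relative, so scroll offsets are added to get page coordinates. Both points should be checked against the CDP documentation. `quad_to_bounds` and the sample response are hypothetical.

```python
from typing import Sequence


def quad_to_bounds(quad: Sequence[float], scroll_x: float = 0.0, scroll_y: float = 0.0) -> dict:
    """Convert a CDP Quad (x1, y1, ..., x4, y4) to an axis-aligned bounding box."""
    if len(quad) != 8:
        raise ValueError("a Quad holds four (x, y) corner pairs")
    xs, ys = quad[0::2], quad[1::2]
    return {
        # Assumption: quad coordinates are viewport-relative, so add the scroll offset.
        "x": min(xs) + scroll_x,
        "y": min(ys) + scroll_y,
        "width": max(xs) - min(xs),
        "height": max(ys) - min(ys),
    }


# Hypothetical DOM.getBoxModel result for the submit button on an unscrolled page.
response = {"model": {"border": [450, 1200, 570, 1200, 570, 1240, 450, 1240],
                      "width": 120, "height": 40}}
print(quad_to_bounds(response["model"]["border"]))
```

Taking the min and max of the corners keeps the result a plain rectangle even for transformed elements, at the cost of reporting the enclosing box and not the exact shape.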
Acceptance Criteria
- bdg dom layout [selector] returns bounding boxes and visibility
- bdg dom scroll <selector> scrolls element into view
- bdg dom scroll --to top|bottom for page navigation
Priority
This is a strategic feature for token efficiency. Complements #116 (smart resize) as the long-term solution.
Labels
enhancement
agent-friendly
strategic