CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Android Accessibility Inspector Service - exposes Android accessibility tree data through a WebSocket server for external inspection and automation tools. The service captures accessibility node information, screenshots, and enables remote UI automation through gestures and actions.

Security Warning: This service exposes all screen content through WebSocket. Disable when not in use.

Build Commands

# Build the app
./gradlew build

# Remove build outputs
./gradlew clean

# Build debug APK
./gradlew assembleDebug

# Build release APK
./gradlew assembleRelease

# Install debug build to connected device
./gradlew installDebug

# Run tests
./gradlew test

# Run instrumented tests (requires connected device/emulator)
./gradlew connectedAndroidTest

Architecture

Core Components

  • AccessibilityInspector (AccessibilityInspector.java): Main accessibility service that captures UI tree data, handles screenshot capture, and processes automation commands
  • SocketService (SocketService.java): WebSocket server running on port 38301 that handles client connections and message routing
  • ServiceActivity (ServiceActivity.java): Simple launcher activity

Data Flow

  1. Capture Process: Client sends {"message":"capture"} → SocketService broadcasts intent → AccessibilityInspector captures tree using TreeDebug → JSON tree data sent to all connected WebSocket clients

  2. Action Process: Client sends action command → SocketService directly calls AccessibilityInspector methods → Result sent back through WebSocket

Key Features

  • Tree Capture: Captures accessibility node tree with/without non-important views
  • Screenshot Integration: Base64-encoded screenshots bundled with tree data
  • Accessibility Event Forwarding: Real-time forwarding of user interactions (events are sent independently of trees; see Stable Tree Messages)
  • UI Automation:
    • Element actions (click, focus, text input) via resourceId or hashCode
    • Gesture automation (tap, swipe, scroll) via coordinates
    • Activity launching with multiple launch types
  • Real-time Communication: WebSocket server with JSON message format

Dependencies

  • AndroidAsync: WebSocket server implementation
  • Google Accessibility Utils: Tree traversal and node manipulation utilities
  • AutoValue: Immutable value class generation
  • Guava: Utility collections and functions

WebSocket API

Connection: ws://localhost:38301/ (use adb forward tcp:38301 tcp:38301)

Performance Characteristics

The service uses two different capture methods optimized for different use cases:

Manual Tree Capture

Commands: capture, captureNotImportant

  • Method: Full metadata extraction (TreeDebug.logNodeTrees())
  • Speed: 5-15 seconds per capture
  • Data: Complete debugging information
    • Screen coordinates (getBoundsInScreen())
    • Clickable text analysis (URLs, emails, phone numbers)
    • Locale/language detection
    • Action lists and complex metadata
    • DP scaling calculations
  • Use Case: Manual debugging, detailed analysis, development tools

Automatic Stable Trees

Message type: stableTree

  • Method: Optimized structural capture (TreeDebug.logNodeTreesFast())
  • Speed: 200-300ms per capture
  • Data: Essential structure only
    • Node hierarchy and relationships
    • Text content and descriptions
    • Visibility state and basic properties
    • Node identification (hashCode, className, resourceId)
  • Use Case: Real-time monitoring, automation, change detection

Trade-off: Manual captures provide complete data but are slow. Stable trees provide fast updates but with minimal metadata.

Message Types

The service supports two distinct message flows:

1. Client-Initiated Messages

Flow: Client sends command → Service responds

Commands

{"message":"capture"}           // Request tree (important views only) - SLOW/FULL method
{"message":"capture", "visibleOnly":true}  // Request tree with invisible leaf filtering (reduces size)
{"message":"captureNotImportant"}  // Request tree (all views) - SLOW/FULL method  
{"message":"captureNotImportant", "visibleOnly":true}  // Request tree (all views) with filtering
{"message":"ping"}              // Connection test
{"message":"performAction", "resourceId":"...", "action":"CLICK"}
{"message":"performGesture", "gestureType":"TAP", "x":100, "y":200}
{"message":"launchActivity", "launchType":"PACKAGE", "packageName":"com.example.app"}
{"message":"findByViewId", "viewId":"com.example.app:id/button"}
{"message":"findByText", "text":"Submit"}              // DEPRECATED - use customFindByText
{"message":"customFindByText", "text":"Submit"}        // Exact text match (case-sensitive)
{"message":"findByRegex", "pattern":".*Submit.*"}      // Regex pattern matching
{"message":"customFindByViewId", "viewId":"..."}       // Alternative viewId search
{"message":"findByProps", "properties":{"text":"Submit","isClickable":true}} // Property-based search

// All find methods support optional verbose flag for additional node properties:
{"message":"findByViewId", "viewId":"com.example:id/button", "verbose":true}
{"message":"customFindByText", "text":"Submit", "verbose":true}
{"message":"findByRegex", "pattern":"[0-9]+", "verbose":true}
{"message":"findByProps", "properties":{"isClickable":true}, "verbose":true}
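Every command above is a flat JSON object with a "message" field plus command-specific parameters, so a client can build them with one small helper. The sketch below is illustrative only: build_command is a hypothetical name and is not taken from the repository's test scripts.

```python
import json


def build_command(message, verbose=None, **params):
    """Serialize a command such as {"message": "findByViewId", "viewId": ...}.

    Hypothetical helper: field names follow the command examples above.
    "verbose" is only included when explicitly passed, so the service
    otherwise applies its default (false).
    """
    payload = {"message": message, **params}
    if verbose is not None:
        payload["verbose"] = verbose
    return json.dumps(payload)


capture = build_command("capture", visibleOnly=True)
find = build_command("findByViewId", viewId="com.example:id/button", verbose=True)
props = build_command("findByProps", properties={"isClickable": True}, verbose=True)
print(find)
```

The resulting strings are sent as WebSocket text frames to ws://localhost:38301/.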

Response Messages

Tree Data Response:

{
  "type": "tree",
  "children": [...]  // TreeDebug format
}

Note: This response format (tree data wrapped in an object with "type": "tree") has been tested for backward compatibility with the accompanying Inspector App.

Action/Gesture/Launch Results:

{
  "type": "actionResult|gestureResult|launchResult",
  "success": true,
  "message": "Description"
}

Ping Response:

{"message": "pong"}

Find Response (All Methods):

{
  "type": "findResult",
  "success": true,
  "viewId": "com.example.app:id/button",  // Only present for findByViewId/customFindByViewId
  "text": "Submit",                          // Only present for findByText/customFindByText
  "method": "customFindByText",           // Only present for custom methods
  "stats": "Window: Slack - Total nodes: 190, ...",  // Only present for customFindByText
  "count": 2,
  "nodes": [
    {
      "hashCode": 123456,
      "className": "android.widget.Button",
      "text": "Submit",
      "contentDescription": "",
      "viewIdResourceName": "com.example.app:id/button",
      "isClickable": true,
      "isEnabled": true,
      "isFocusable": true,
      "isFocused": false,
      "isScrollable": false,
      "isCheckable": false,
      "isChecked": false,
      "isSelected": false,
      "boundsInScreen": {"left": 100, "top": 200, "right": 300, "bottom": 250}
    }
  ]
}
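performAction targets elements by resourceId or hashCode, but when a coordinate gesture is preferable, the boundsInScreen of a find result gives a natural tap target. A minimal sketch, assuming the findResult shape shown above with bounds in screen pixels; tap_command_for is a hypothetical helper, not part of the service.

```python
import json


def tap_command_for(find_result_json, index=0):
    """Turn a findResult response into a performGesture TAP at a node's center.

    Assumes the response shape above: a "nodes" array whose entries carry
    boundsInScreen with left/top/right/bottom in screen pixels.
    """
    result = json.loads(find_result_json)
    if not result.get("success") or not result.get("nodes"):
        raise ValueError("findResult contained no nodes")
    bounds = result["nodes"][index]["boundsInScreen"]
    # Integer center of the node's on-screen rectangle
    x = (bounds["left"] + bounds["right"]) // 2
    y = (bounds["top"] + bounds["bottom"]) // 2
    return json.dumps({"message": "performGesture", "gestureType": "TAP", "x": x, "y": y})


sample = json.dumps({
    "type": "findResult", "success": True, "count": 1,
    "nodes": [{"hashCode": 123456,
               "boundsInScreen": {"left": 100, "top": 200, "right": 300, "bottom": 250}}],
})
print(tap_command_for(sample))
```

Targeting by hashCode through performAction avoids coordinate drift if the layout shifts between the find and the tap; the gesture route is mainly useful for elements that expose no usable action.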

Response Field Guide:

  • method: Present only for custom methods (identifies which implementation was used)
  • stats: Present only for customFindByText (provides tree analysis for debugging)
  • viewId: Present for findByViewId and customFindByViewId commands
  • text: Present for findByText and customFindByText commands
  • pattern: Present for findByRegex commands
  • properties: Present for findByProps commands

Text Search Methods Explained

customFindByText: Exact, case-sensitive matching in BOTH text and contentDescription fields

{"message": "customFindByText", "text": "Activity"}
// Searches both node.getText() and node.getContentDescription()
// Finds: "Activity" in either field ✅
// Misses: "activity", "ACTIVITY", "My Activity" ❌
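The matching rule can be modeled offline over plain dicts shaped like find results. This is a sketch of the documented semantics, not the service's Java code; custom_find_by_text is a hypothetical name, and missing fields are treated as empty strings here.

```python
def custom_find_by_text(nodes, text):
    """Exact, case-sensitive equality against text OR contentDescription."""
    return [
        n for n in nodes
        if n.get("text") == text or n.get("contentDescription") == text
    ]


nodes = [
    {"hashCode": 1, "text": "Activity", "contentDescription": ""},
    {"hashCode": 2, "text": "", "contentDescription": "Activity"},
    {"hashCode": 3, "text": "activity", "contentDescription": ""},
    {"hashCode": 4, "text": "My Activity", "contentDescription": ""},
]
print([n["hashCode"] for n in custom_find_by_text(nodes, "Activity")])
```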

findByRegex: Flexible pattern matching in BOTH text and contentDescription fields

{"message": "findByRegex", "pattern": "(?i)activity"}     // Case-insensitive in either field
{"message": "findByRegex", "pattern": ".*[0-9]+.*"}       // Contains numbers in either field
{"message": "findByRegex", "pattern": "^[A-Z][a-z]+$"}    // Capitalized words in either field
{"message": "findByRegex", "pattern": "(btn|button)"}     // Multiple options in either field
// Searches both node.getText() and node.getContentDescription()
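The sketch below models regex search over both fields using Python's re module, whose syntax agrees with Java's for the example patterns above. Whether the service matches anywhere in the field (like Java's Matcher.find()) or requires the whole field to match (like Matcher.matches()) is not stated in this document, so the hypothetical find_by_regex helper exposes both through a whole flag.

```python
import re


def find_by_regex(nodes, pattern, whole=False):
    """Return nodes whose text or contentDescription matches pattern.

    whole=False matches anywhere in the field (like Java's Matcher.find());
    whole=True requires the entire field to match (like Matcher.matches()).
    Which of the two the service uses is an assumption left open here.
    """
    compiled = re.compile(pattern)  # raises re.error for an invalid pattern
    check = compiled.fullmatch if whole else compiled.search
    return [
        n for n in nodes
        if any(check(n.get(field) or "") for field in ("text", "contentDescription"))
    ]


nodes = [
    {"hashCode": 1, "text": "My Activity"},
    {"hashCode": 2, "contentDescription": "Inbox 12"},
    {"hashCode": 3, "text": "Later"},
]
print([n["hashCode"] for n in find_by_regex(nodes, "(?i)activity")])
```

Patterns wrapped in .*...*, as in the examples, behave the same under either rule, which makes them the safer choice for clients.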

findByProps: Multi-property matching

{"message": "findByProps", "properties": {"text": "Submit", "isClickable": true}}
{"message": "findByProps", "properties": {"viewIdResourceName": "com.Slack:id/button", "isEnabled": true}}
{"message": "findByProps", "properties": {"className": "Button", "text": "OK", "isClickable": true}}

Supported Properties:

  • String properties: text, contentDescription, className, viewIdResourceName/resourceId/viewId
  • Boolean properties: isClickable, isEnabled, isFocusable, isFocused, isScrollable, isCheckable, isChecked, isSelected
  • Integer properties: childCount
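A minimal model of multi-property matching, assuming every criterion must hold and that resourceId and viewId are aliases for viewIdResourceName as listed above. matches_props is a hypothetical name. It uses plain equality throughout; how the service treats a short className such as "Button" is not specified here, so this sketch would need the full "android.widget.Button".

```python
ALIASES = {"resourceId": "viewIdResourceName", "viewId": "viewIdResourceName"}


def matches_props(node, criteria):
    """Return True when every criterion equals the node's value (AND semantics)."""
    for key, expected in criteria.items():
        field = ALIASES.get(key, key)  # resolve resourceId/viewId aliases
        if field not in node or node[field] != expected:
            return False
    return True


node = {
    "className": "android.widget.Button",
    "text": "Submit",
    "viewIdResourceName": "com.Slack:id/button",
    "isClickable": True,
    "isEnabled": True,
    "childCount": 0,
}
print(matches_props(node, {"text": "Submit", "isClickable": True}))
```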

Note: Found nodes use a different format than tree nodes. Find results include:

  • Direct boolean properties (isClickable, isEnabled, etc.)
  • hashCode for element identification
  • Simplified structure for easier processing

Tree nodes use the TreeDebug format with nested metadata objects.

Verbose Mode for Find Methods

All find methods support an optional verbose flag (defaults to false). When verbose: true, additional properties are included:

Basic properties (always included):

  • hashCode, parentHashCode, className, text, contentDescription, viewIdResourceName
  • State: isClickable, isEnabled, isFocusable, isFocused, isScrollable, isCheckable, isChecked, isSelected
  • boundsInScreen (object with left, top, right, bottom)

Verbose properties (only with verbose: true):

  • Text properties: hintText, errorText, tooltipText, paneTitle
  • Reference properties: labeledByHashCode (hashCode of the node that labels this one)
  • Additional states: isLongClickable, isVisibleToUser, isImportantForAccessibility, isContentInvalid, isScreenReaderFocusable
  • Collection properties:
    • collectionInfo: {rowCount, columnCount} for grids/lists
    • collectionItemInfo: {rowIndex, columnIndex, rowSpan, columnSpan} for items in collections
  • Action list (actionList): Array of available actions, each formatted as {id: number, label: string or null}
  • Other properties: windowId, childCount

Note: Properties requiring API > 28 (stateDescription, roleDescription) are not currently available due to build configuration constraints.

Example verbose response:

{
  "hashCode": 123456,
  "parentHashCode": 789012,
  "className": "android.widget.Button",
  "text": "Submit",
  // ... basic properties ...
  "hintText": "Tap to submit form",
  "isVisibleToUser": true,
  "isLongClickable": false,
  "actionList": [
    {"id": 16, "label": null},  // ACTION_CLICK
    {"id": 1, "label": null}    // ACTION_FOCUS
  ],
  "childCount": 0
}
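Standard actions arrive with a numeric id and a null label, so a client usually maps ids back to names. The table below lists a subset of the AccessibilityNodeInfo action constants (ACTION_CLICK is 16 and ACTION_FOCUS is 1, matching the example); describe_actions is a hypothetical helper, and custom actions fall back to their label.

```python
# Subset of android.view.accessibility.AccessibilityNodeInfo action constants
ACTION_NAMES = {
    0x1: "ACTION_FOCUS",
    0x2: "ACTION_CLEAR_FOCUS",
    0x4: "ACTION_SELECT",
    0x8: "ACTION_CLEAR_SELECTION",
    0x10: "ACTION_CLICK",
    0x20: "ACTION_LONG_CLICK",
    0x40: "ACTION_ACCESSIBILITY_FOCUS",
    0x80: "ACTION_CLEAR_ACCESSIBILITY_FOCUS",
    0x1000: "ACTION_SCROLL_FORWARD",
    0x2000: "ACTION_SCROLL_BACKWARD",
    0x200000: "ACTION_SET_TEXT",
}


def describe_actions(action_list):
    """Map verbose actionList entries to readable names."""
    names = []
    for action in action_list:
        name = ACTION_NAMES.get(action["id"])
        # Unknown ids (e.g. custom actions) fall back to their label, if any
        names.append(name or action.get("label") or f"UNKNOWN({action['id']})")
    return names


print(describe_actions([{"id": 16, "label": None}, {"id": 1, "label": None}]))
```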

Tree Capture Size Optimization

visibleOnly Parameter

Both capture and captureNotImportant commands support an optional visibleOnly parameter to reduce tree size:

{"message":"capture", "visibleOnly":true}
{"message":"captureNotImportant", "visibleOnly":true}

How it works:

  • Default behavior (visibleOnly: false or omitted): Full tree with all nodes
  • Filtered behavior (visibleOnly: true): Removes invisible leaf nodes while preserving tree structure
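The filtering rule can be sketched as a post-order walk over the tree. This is an illustration under assumptions: nodes are dicts with an optional "children" list, and "visible" is a hypothetical stand-in for whatever visibility field TreeDebug actually emits. The sketch also makes one design choice the text leaves open: an invisible container whose children are all pruned becomes a leaf and is pruned too, whereas the service might instead keep any node that originally had children.

```python
def prune_invisible_leaves(node, is_visible=lambda n: n.get("visible", True)):
    """Return a pruned copy of node, or None if the node itself is dropped.

    Assumption: "visible" is a hypothetical key. Children are pruned first,
    so an invisible node survives only if it still has children afterwards.
    """
    kept = []
    for child in node.get("children", []):
        pruned = prune_invisible_leaves(child, is_visible)
        if pruned is not None:
            kept.append(pruned)
    if not kept and not is_visible(node):
        return None  # invisible leaf: remove
    copy = dict(node)
    if "children" in node:
        copy["children"] = kept
    return copy


tree = {"id": "root", "visible": True, "children": [
    {"id": "a", "visible": False},
    {"id": "b", "visible": True},
    {"id": "c", "visible": False, "children": [{"id": "c1", "visible": True}]},
    {"id": "d", "visible": False, "children": [{"id": "d1", "visible": False}]},
]}
print([c["id"] for c in prune_invisible_leaves(tree)["children"]])
```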

Benefits:

  • ✅ Reduces tree size by removing invisible UI elements that serve no functional purpose
  • ✅ Helps avoid failures caused by the Android Intent size limit (~973KB) on complex apps
  • ✅ Faster WebSocket transmission
  • ✅ Preserves structural containers (invisible nodes with children)

Use cases:

  • Large apps (news apps, social media): Use visibleOnly:true to avoid size limit failures
  • Debugging invisible elements: Use visibleOnly:false (default) to see all nodes
  • Production automation: Use visibleOnly:true for faster, more reliable captures

Note: The same filtering is automatically applied to stable trees to ensure reliable delivery.

Find Method Comparison

| Method | Type | Status | Use Case |
|---|---|---|---|
| findByViewId | Native | Recommended | Finding elements by resource ID |
| findByText | Native | Deprecated | Use customFindByText instead |
| customFindByText | Custom | Recommended | Exact text match (case-sensitive) |
| findByRegex | Custom | Recommended | Pattern matching with regex |
| findByProps | Custom | Recommended | Multi-property search with JSON criteria |
| customFindByViewId | Custom | Alternative | Debugging viewId issues |

Why findByText is Deprecated

Through extensive testing with comparison scripts, we discovered that Android's native findAccessibilityNodeInfosByText() method has significant limitations:

  1. Semantic Filtering: Filters out navigation/UI labels while keeping "content" text
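  2. Matching Semantics: Android documents the native method as case-insensitive containment matching, whereas customFindByText requires an exact, case-sensitive match, so the two can return different result sets for the same query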
  3. Missing Visible Elements: Skips prominent UI elements like tab names ("Activity", "Later") while finding app names ("Slack")

Test Results Example (from test_native_vs_custom.py):

Searching for: 'Activity' (visible tab in Slack UI)
  Native method: 0 nodes found    ❌ Misses visible UI
  Custom method: 1 nodes found    ✅ Finds visible UI

Testing Methodology

The deprecation decision was based on systematic testing using several diagnostic scripts:

  1. test_native_vs_custom.py: Direct comparison showing native method missing visible UI elements
  2. test_case_sensitivity.py: Ruled out case sensitivity as the sole issue
  3. test_find_differences.py: Identified specific nodes missed by native method
  4. test_viewid_consistency.py: Confirmed native findByViewId works correctly

Key Finding: Native findByText consistently missed 5-17 visible UI elements per search, while findByViewId showed perfect consistency between native and custom implementations.

2. AccessibilityEvent-Initiated Messages

Flow: User interacts with device → Android generates AccessibilityEvent → Service automatically broadcasts to all clients

Event Messages

{
  "type": "accessibilityEvent",
  "eventType": "VIEW_CLICKED|VIEW_SELECTED|VIEW_FOCUSED|SCROLL_SEQUENCE_END|TEXT_SEQUENCE_END|WINDOW_STATE_CHANGED",
  "timestamp": 1751544001234,
  "packageName": "com.example.app", 
  "className": "android.widget.Button",
  "source": {...}
}

Stable Tree Messages

The service automatically sends stable UI trees when the interface becomes stable:

{
  "type": "stableTree",
  "timestamp": 1751544000734,
  "children": [...]  // Current stable UI state
}

How Stable Trees Work: The service continuously monitors for UI changes via WINDOW_CONTENT_CHANGED events. After 1 second of UI stability (no content changes), it captures the current tree. To prevent duplicate messages, trees are compared semantically (ignoring volatile node IDs) and only sent when actual content changes occur. This provides clients with up-to-date UI snapshots without spam.

Error Messages

{
  "type": "error",
  "message": "Error description"
}

Development Guidelines

Accessibility Service Configuration

  • Service flags configured in onServiceConnected() - modify allFlags constant to adjust capture behavior
  • Two capture modes: important views only vs all views (controlled by hideNotImportant()/showNotImportant())

WebSocket Message Handling

  • All message parsing in SocketRequestCallback.onConnected()
  • Direct method calls to AccessibilityInspector instance (stored statically)
  • Error responses follow {"type":"[messageType]Result", "success":false, "message":"..."}
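Clients therefore need to handle two failure shapes: {"type":"error", ...} and a "...Result" object with "success": false. Because stableTree and accessibilityEvent broadcasts can arrive on the same socket between a command and its reply, the hypothetical check_result helper below returns None for unrelated messages instead of failing, so the caller can keep reading.

```python
import json


class CommandFailed(Exception):
    """Raised when the service reports an error or an unsuccessful result."""


def check_result(raw, expected_type):
    """Validate a reply such as {"type": "actionResult", "success": true, ...}.

    Returns the parsed reply, None for unrelated broadcasts, or raises
    CommandFailed for either documented failure shape.
    """
    response = json.loads(raw)
    kind = response.get("type")
    if kind == "error":
        raise CommandFailed(response.get("message", "unknown error"))
    if kind != expected_type:
        return None  # e.g. stableTree or accessibilityEvent; keep reading
    if not response.get("success", False):
        raise CommandFailed(response.get("message", "command failed"))
    return response


ok = check_result('{"type": "actionResult", "success": true, "message": "Clicked"}', "actionResult")
print(ok["message"])
```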

Tree Capture Process

  • Uses TreeDebug.logNodeTrees() from Google accessibility utils for manual captures (stable trees use the faster TreeDebug.logNodeTreesFast())
  • JSON format for tree data transmission
  • Screenshot capture integrated with tree data

Stable Tree System

Implementation: The service uses a simplified approach for automatic tree broadcasting:

  1. UI Monitoring: WINDOW_CONTENT_CHANGED events reset a 1-second stability timer
  2. Tree Capture: After 1 second of stability, the current UI tree is captured
  3. Deduplication: Trees are compared semantically (excluding volatile node IDs) to prevent duplicate sends
  4. Automatic Broadcast: Only when content actually changes, a stableTree message is sent to all clients
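Steps 1 and 2 amount to a debounce. On Android this is commonly built with Handler.removeCallbacks() plus postDelayed(), but the service's exact mechanism is not described here. The sketch below models only the timing with explicit timestamps so the behavior is easy to test; StabilityTracker is a hypothetical name.

```python
STABILITY_WINDOW_S = 1.0  # the 1-second stability window described above


class StabilityTracker:
    """Clock-driven model of the stable-tree trigger.

    on_content_changed() is called for each WINDOW_CONTENT_CHANGED event;
    should_capture(now) returns True once, after a full window with no
    further changes. Times are seconds, e.g. from time.monotonic().
    """

    def __init__(self, window=STABILITY_WINDOW_S):
        self.window = window
        self.last_change = None  # None means no capture is pending

    def on_content_changed(self, now):
        self.last_change = now  # every change restarts the window

    def should_capture(self, now):
        if self.last_change is not None and now - self.last_change >= self.window:
            self.last_change = None  # fire once per quiet period
            return True
        return False


tracker = StabilityTracker()
tracker.on_content_changed(0.0)
tracker.on_content_changed(0.5)
print(tracker.should_capture(1.0), tracker.should_capture(2.0))
```

A capture triggered this way would then pass through the deduplication in step 3 before anything is broadcast.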

Key Files:

  • AccessibilityInspector.java: Contains handleUIContentChange(), hasTreeChanged(), and removeNodeIds() methods
  • TreeDebug.java: Modified to include node IDs in tree data but exclude them from comparison

Node ID Handling:

  • Node IDs are included in JSON tree data sent to clients (for reference purposes)
  • Node IDs are stripped during tree comparison to avoid false positives from Android's volatile object references
  • Comparison uses removeNodeIds() to recursively clean trees before string comparison
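The comparison can be sketched as: strip IDs recursively, serialize to a canonical string, and compare with the last string sent. NODE_ID_KEY = "nodeId" is an assumption (the actual key emitted by the modified TreeDebug.java is not named here), and sort_keys is a choice that makes key order irrelevant; the Java code's string comparison may not normalize order.

```python
import json

NODE_ID_KEY = "nodeId"  # hypothetical: real key name not documented here


def remove_node_ids(value):
    """Return a copy with NODE_ID_KEY removed from every nested dict."""
    if isinstance(value, dict):
        return {k: remove_node_ids(v) for k, v in value.items() if k != NODE_ID_KEY}
    if isinstance(value, list):
        return [remove_node_ids(v) for v in value]
    return value


class TreeDeduplicator:
    """Remembers the last sent tree and reports semantic changes."""

    def __init__(self):
        self._last = None

    def has_tree_changed(self, tree):
        canonical = json.dumps(remove_node_ids(tree), sort_keys=True)
        if canonical == self._last:
            return False
        self._last = canonical
        return True


dedup = TreeDeduplicator()
print(dedup.has_tree_changed({"children": [{"nodeId": "a1", "text": "Inbox"}]}))
```

The tree actually sent to clients keeps its IDs; only the comparison copy is stripped.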

Adding New Commands

  1. Add JSON parsing in SocketService.SocketRequestCallback
  2. Add method implementation in AccessibilityInspector
  3. Follow result pattern: send[ActionType]Result(boolean, String)

API Levels & Compatibility

  • Min SDK: 28 (Android 9)
  • Target SDK: 31 (Android 12)
  • Gesture Support: Requires API 24+ (checked at runtime)
  • Java Version: 11

Recent Changes

Stable Tree Performance Optimization (2025-07-12)

Problem: Stable tree capture taking 10-15 seconds, making the system unresponsive.

Solution: Created optimized fast capture path specifically for stable trees:

  1. Fast Tree Capture Method: Added TreeDebug.logNodeTreesFast() and nodeDebugDescriptionJsonFast()
  2. Eliminated Expensive Operations: Removed system calls and complex processing:
    • getBoundsInScreen() calls (2-5ms per node - was #1 bottleneck)
    • getNodeClickableStrings() (regex analysis: 1-3ms per node)
    • getNodeLocaleStrings() (linguistic analysis: 1-2ms per node)
    • ❌ Complex metadata extraction (action lists, DP scaling, object creation)
  3. Preserved Essential Data: Tree structure, node identification, basic text, visibility, core properties
  4. Dual Architecture: Fast path for stable trees, unchanged regular path for manual captures

Performance Results:

  • Before: 10-15 seconds for stable tree capture
  • After: 200-300ms for stable tree capture
  • Compatibility: Regular tree capture unchanged for debugging tools

Debug Enhancements:

  • Enhanced debug_client with timing measurements and detailed event properties
  • Added optional WINDOW_CONTENT_CHANGED event forwarding (controlled by SEND_WINDOW_CONTENT_CHANGED_EVENTS)
  • Improved event type mapping for better debugging visibility

Previous: Stable Tree System Implementation (2025-07-12)

Problem: The original complex event-tree association system sent repeated identical trees and complicated the codebase.

Solution: Simplified to automatic stable tree broadcasting with smart deduplication:

  1. Removed Complex Event Association: Eliminated treeBeforeEvent messages and event-specific tree handlers
  2. Simplified Message Flow: Events and trees are now sent independently
  3. Added Smart Deduplication: Trees are compared semantically, ignoring volatile Android node IDs
  4. Improved Performance: Clients only receive trees when UI content actually changes

Code Changes:

  • Removed sendClickEventWithBeforeTree(), sendFocusEventWithBeforeTree(), sendSelectionEventWithBeforeTree() methods
  • Added hasTreeChanged() with semantic comparison (strips node IDs)
  • Added removeNodeIds() and removeNodeIdsFromArray() for recursive ID removal
  • Modified handleUIContentChange() to use content-based comparison
  • Kept node IDs in client data for reference while excluding them from comparison

Benefits:

  • Dramatically reduced duplicate tree messages
  • Simplified client implementations (no need to correlate events with trees)
  • Maintained backward compatibility with existing tree structure
  • Added comprehensive diff logging for debugging

Known Issues

  • WebSocket server can become unresponsive; may require service restart or device reboot
  • Null pointer exceptions possible during active screen updates
  • Service process may not terminate properly when accessibility service is disabled
  • Samsung Phone app doesn't generate scroll events (app-specific limitation)
  • Modern apps typically don't provide scroll delta data (totalScrollX/Y are often 0)
  • VIEW_SELECTED events are rare in modern apps (most use clicks instead)

Tree Capture Limitations

System UI Filtering: The TreeDebug.logNodeTrees() method intentionally filters out system UI elements from captured trees, including:

  • Status bar (contains clock, battery, signal indicators, notifications)
  • Navigation bar
  • Windows for which isActive() returns false
  • Windows with pane titles "Status bar" or "Notification shade."

Root Window Selection: TreeDebug uses getRootInActiveWindow() instead of each window's actual root, which may miss content in non-active windows.

Workaround: Use findByViewId and findByText commands to access system UI elements that don't appear in tree captures:

// These work for system UI elements:
{"message": "findByViewId", "viewId": "com.android.systemui:id/clock"}
{"message": "findByText", "text": "7:45"}

// These miss system UI elements:
{"message": "capture"}
{"message": "captureNotImportant"}

Future Solution: To capture system UI elements in trees, TreeDebug.logNodeTrees() would need modification to:

  1. Remove status bar/navigation bar filtering (lines 104-105, 113 in TreeDebug.java)
  2. Use each window's actual root instead of getRootInActiveWindow() (lines 76-77)
  3. Include non-active windows if desired (lines 58-60)

Scroll Event Behavior Patterns

User-initiated vs App-initiated Scrolling:

  • User scrolls: Multiple timestamps in scrollTimestamps array (continuous gesture)
  • App scrolls: Single timestamp (programmatic animation, like ViewPager transitions)

Scroll Direction Detection:

  • Horizontal scrolls: May have totalScrollX ≠ 0 (ViewPager page changes)
  • Vertical scrolls: Usually totalScrollX = 0, totalScrollY = 0 regardless of source
  • Note: Scroll delta reliability varies by app implementation

Examples:

  • Clicking ViewPager tab → Single timestamp + horizontal scroll values
  • User swiping list → Multiple timestamps + zero scroll values
  • App page transitions → Single timestamp + variable scroll values

Development Tools

Debug Clients

tests/debug_client.py: Primary debugging tool that shows detailed message analysis

cd tests && python3 debug_client.py
  • Displays message type in header (🎯 ACCESSIBILITY EVENT, 🌳 TREE MESSAGE, etc.)
  • Shows JSON preview (truncated at 1000 characters)
  • Handles both string and byte messages

tests/quick_test.py: Simple connectivity test for basic functionality

cd tests && python3 quick_test.py
  • Minimal output showing message types and sizes
  • Good for quick connection verification
  • Lighter output for basic testing

Test Scripts

tests/test_capture_commands.py: Tests tree capture functionality

cd tests && python3 test_capture_commands.py
  • Tests capture (important nodes only) and captureNotImportant (all nodes)
  • Shows JSON size, node counts, and response times
  • Verifies WebSocket connection health with ping/pong

tests/test_find_interactive.py: Interactive element search tool

cd tests && python3 test_find_interactive.py
  • Menu-driven interface for searching elements
  • Supports findByViewId and findByText searches
  • Shows all node properties including bounds, states, and IDs
  • Filters out accessibility events automatically

tests/test_find_and_click.py: Demonstrates find + action workflow

cd tests && python3 test_find_and_click.py
  • Searches for elements by text
  • Allows selection from multiple results
  • Performs click actions on selected elements
  • Shows detailed properties for all found elements

tests/test_actions.py: Tests performAction commands

cd tests && python3 test_actions.py
  • Tests various action types (CLICK, FOCUS, LONG_CLICK, SET_TEXT)
  • Uses found elements from findByText/findByViewId
  • Tests both hashCode and resourceId targeting
  • Tests error cases (missing parameters)

tests/test_gestures.py: Tests performGesture commands

cd tests && python3 test_gestures.py
  • Tests all gesture types (TAP, SWIPE, SCROLL, LONG_PRESS, DOUBLE_TAP)
  • Tests predefined scroll directions (UP, DOWN, LEFT, RIGHT)
  • Tests optional parameters (duration, end coordinates)
  • Tests error cases (missing gestureType, invalid coordinates)

tests/test_launch.py: Tests launchActivity commands

cd tests && python3 test_launch.py
  • Tests various launch types (PACKAGE, ACTIVITY, INTENT)
  • Tests common apps (Settings, Calculator, Browser)
  • Tests different intent actions and data formats
  • Tests error cases (missing parameters, invalid packages)

tests/test_findByProps.py: Tests property-based searching

cd tests && python3 test_findByProps.py
  • Tests JSON criteria matching across multiple properties
  • Supports string, boolean, and integer property matching
  • Tests error cases and complex property combinations

tests/test_findByRegex.py: Tests regex pattern matching

cd tests && python3 test_findByRegex.py
  • Tests regex patterns in both text and contentDescription fields
  • Supports case-insensitive matching and complex patterns
  • Tests error cases (invalid regex patterns)

tests/test_verbose_find.py: Tests verbose mode for find methods

cd tests && python3 test_verbose_find.py
  • Tests additional properties returned with verbose flag
  • Compares basic vs verbose response formats
  • Tests across all find method types

tests/test_visible_only.py: Tests tree size optimization

cd tests && python3 test_visible_only.py
  • Tests visibleOnly parameter for capture commands
  • Compares tree sizes with/without filtering
  • Demonstrates size reduction benefits

Diagnostic Scripts

tests/test_native_vs_custom.py: Compare native vs custom find methods

cd tests && python3 test_native_vs_custom.py
  • Direct performance and accuracy comparison
  • Reveals native method limitations
  • Provided the evidence for deprecating findByText

tests/test_viewid_consistency.py: Verify viewId method reliability

cd tests && python3 test_viewid_consistency.py
  • Confirms native findByViewId works correctly
  • Shows why only findByText needs deprecation
  • Tests various viewId formats and edge cases

tests/test_bounds_comparison.py: Tests coordinate bounds accuracy

cd tests && python3 test_bounds_comparison.py
  • Compares bounds data across different methods
  • Verifies coordinate consistency
  • Useful for gesture targeting validation

Connection Setup:

# Forward port from Android device
adb forward tcp:38301 tcp:38301

# Then run any test script
cd tests && python3 test_script_name.py