coreason-navigator

coreason-navigator is the "Eyes and Hands" of the CoReason platform for the World Wide Web. It bridges the gap between modern AI agents and legacy web interfaces using State-of-the-Art (SOTA) "Computer Use" techniques.

Features

  • Visual Navigation: Agents use a headless browser (Playwright) to "see" and interact with webpages just like a human, bypassing the need for fragile API integrations.
  • Vision-Language Model (VLM) Integration: Uses screenshots and Accessibility Trees (AX Tree) to ground LLM intent into precise screen coordinates.
  • Robust Orchestration: Handles dynamic content, session persistence, and stealth techniques (e.g., User-Agent rotation) to avoid detection.
  • Set-of-Marks (SoM): Overlays numeric tags on interactive elements to improve VLM accuracy.
  • Content Extraction: Converts noisy webpages into clean Markdown for LLM consumption.
  • Safety First: Includes rate limiting, domain allowlisting, and PII input protection.
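The Set-of-Marks idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the `Element` class and the `assign_marks` and `mark_to_click_point` helpers are hypothetical names, and the bounding boxes are made-up values. It shows how numbering interactive elements lets a model answer with a tag such as "3", which is then resolved to a click coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical interactive element with a pixel bounding box."""
    role: str
    name: str
    x: float
    y: float
    width: float
    height: float


def assign_marks(elements):
    """Number elements from 1 in reading order (top to bottom, left to right)."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return {tag: el for tag, el in enumerate(ordered, start=1)}


def mark_to_click_point(marks, tag):
    """Resolve a numeric tag chosen by the model to the element's center point."""
    el = marks[tag]
    return (el.x + el.width / 2, el.y + el.height / 2)


# Illustrative data only; a real page would supply these boxes.
elements = [
    Element("button", "Submit", x=200, y=300, width=100, height=40),
    Element("link", "Home", x=10, y=10, width=60, height=20),
    Element("textbox", "Search", x=100, y=10, width=200, height=20),
]

marks = assign_marks(elements)
for tag, el in marks.items():
    print(tag, el.role, el.name)

# Suppose the model answers "3" for the intent "submit the form".
print(mark_to_click_point(marks, 3))
```

Clicking the center of the bounding box is one simple choice; an implementation could equally pick another point inside the box.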

Installation

pip install coreason-navigator

Or install from source:

git clone https://github.com/CoReason-AI/coreason-navigator.git
cd coreason-navigator
pip install .

You will also need to install Playwright browsers:

playwright install chromium

Usage

Here is a simple example of how to use the PlaywrightNavigator:

import asyncio
from coreason_navigator.driver import PlaywrightNavigator
from coreason_navigator.types import GotoAction, ClickAction

async def main():
    # Initialize the navigator (headless by default)
    navigator = PlaywrightNavigator(headless=True)

    try:
        # Launch the browser
        await navigator.launch()

        # Navigate to a URL
        print("Navigating to example.com...")
        state = await navigator.navigate("https://example.com")
        print(f"Title: {state.title}")

        # Take a screenshot (base64 encoded)
        # print(state.screenshot_base64[:50] + "...")

        # Extract content
        content = await navigator.extract_content(format="markdown")
        print("Page Content Summary:")
        print(content[:200])

    finally:
        # Always close resources
        await navigator.close()

if __name__ == "__main__":
    asyncio.run(main())

Architecture

  1. Observe: Captures screenshot and Accessibility Tree.
  2. Orient: Maps user intent to screen coordinates using VLM.
  3. Decide: Formulates browser actions (Click, Type, Scroll).
  4. Act: Executes actions via Playwright.
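The four steps above can be sketched as a single loop iteration. This is an illustrative outline under stated assumptions, not the package's code: `FakePage`, `Observation`, `Action`, and the `observe`, `orient`, `decide`, `act`, and `step` functions are hypothetical, the page records actions instead of driving a browser, and a simple name match stands in for the VLM.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    screenshot_base64: str  # left empty in this sketch
    ax_nodes: list          # simplified accessibility nodes


@dataclass
class Action:
    kind: str
    x: float
    y: float


class FakePage:
    """Stand-in for a browser page; records actions instead of executing them."""

    def __init__(self):
        self.executed = []
        self.nodes = [
            {"role": "link", "name": "Home", "center": (40.0, 20.0)},
            {"role": "button", "name": "Submit", "center": (250.0, 320.0)},
        ]


def observe(page):
    """Observe: capture the current page state."""
    return Observation(screenshot_base64="", ax_nodes=page.nodes)


def orient(obs, intent):
    """Orient: map intent to coordinates (name match stands in for a VLM)."""
    for node in obs.ax_nodes:
        if node["name"].lower() in intent.lower():
            return node["center"]
    return None


def decide(target):
    """Decide: turn a target point into a browser action, if one was found."""
    if target is None:
        return None
    return Action("click", *target)


def act(page, action):
    """Act: execute the action (here, only record it)."""
    page.executed.append(action)


def step(page, intent):
    """Run one Observe, Orient, Decide, Act iteration."""
    action = decide(orient(observe(page), intent))
    if action is not None:
        act(page, action)
    return action


page = FakePage()
print(step(page, "Click the Submit button"))
print(step(page, "Open the pricing page"))  # no matching element, so no action
```

Keeping each stage a separate function makes it straightforward to swap the name match for a real model call without touching the rest of the loop.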

License

This software is proprietary and dual-licensed. It is available under the Prosperity Public License 3.0; commercial use beyond a 30-day trial requires a separate license.
