coreason-navigator is the "Eyes and Hands" of the CoReason platform for the World Wide Web. It bridges the gap between modern AI agents and legacy web interfaces using State-of-the-Art (SOTA) "Computer Use" techniques.
- Visual Navigation: Agents use a headless browser (Playwright) to "see" and interact with webpages just like a human, bypassing the need for fragile API integrations.
- Vision-Language Model (VLM) Integration: Uses screenshots and Accessibility Trees (AX Tree) to ground LLM intent into precise screen coordinates.
- Robust Orchestration: Handles dynamic content, session persistence, and stealth techniques (e.g., User-Agent rotation) to avoid detection.
- Set-of-Marks (SoM): Overlays numeric tags on interactive elements to improve VLM accuracy.
- Content Extraction: Converts noisy webpages into clean Markdown for LLM consumption.
- Safety First: Includes rate limiting, domain allowlisting, and PII input protection.
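
To make the Set-of-Marks and grounding ideas in the list above more concrete, here is a minimal, library-independent sketch. It is not coreason-navigator's implementation: the `Element` class and the `assign_marks`, `describe_marks`, and `mark_to_point` helpers are hypothetical names introduced only for illustration. The idea is that each visible interactive element gets a numeric tag, the model answers with a tag number, and that number is resolved back to a click point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical interactive element with a pixel bounding box."""
    role: str
    name: str
    x: float
    y: float
    width: float
    height: float


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Number visible elements top-to-bottom, then left-to-right, starting at 1."""
    visible = [e for e in elements if e.width > 0 and e.height > 0]
    ordered = sorted(visible, key=lambda e: (e.y, e.x))
    return {mark: element for mark, element in enumerate(ordered, start=1)}


def describe_marks(marks: dict[int, Element]) -> str:
    """Text legend that could accompany the tagged screenshot in a prompt."""
    return "\n".join(f"[{mark}] {e.role}: {e.name}" for mark, e in marks.items())


def mark_to_point(marks: dict[int, Element], mark: int) -> tuple[float, float]:
    """Resolve a mark chosen by the model to the center of its bounding box."""
    e = marks[mark]
    return (e.x + e.width / 2, e.y + e.height / 2)


# Illustrative data only; a real page would supply these boxes.
elements = [
    Element("button", "Search", x=300, y=40, width=80, height=30),
    Element("textbox", "Query", x=20, y=40, width=260, height=30),
    Element("link", "Hidden", x=0, y=0, width=0, height=0),  # zero-size, skipped
    Element("link", "About", x=20, y=400, width=60, height=20),
]
marks = assign_marks(elements)
print(describe_marks(marks))
print(mark_to_point(marks, 2))  # center of the "Search" button
```

Asking the model for a small integer rather than raw pixel coordinates is the main design point of this sketch: the coordinate lookup stays deterministic and happens outside the model.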
pip install coreason-navigator

Or install from source:

git clone https://github.com/CoReason-AI/coreason_navigator.git
cd coreason_navigator
pip install .

You will also need to install the Playwright browsers:

playwright install chromium

Here is a simple example of how to use the PlaywrightNavigator:
import asyncio

from coreason_navigator.driver import PlaywrightNavigator
from coreason_navigator.types import GotoAction, ClickAction


async def main():
    # Initialize the navigator (headless by default)
    navigator = PlaywrightNavigator(headless=True)
    try:
        # Launch the browser
        await navigator.launch()

        # Navigate to a URL
        print("Navigating to example.com...")
        state = await navigator.navigate("https://example.com")
        print(f"Title: {state.title}")

        # Take a screenshot (base64 encoded)
        # print(state.screenshot_base64[:50] + "...")

        # Extract content
        content = await navigator.extract_content(format="markdown")
        print("Page Content Summary:")
        print(content[:200])
    finally:
        # Always close resources
        await navigator.close()


if __name__ == "__main__":
    asyncio.run(main())

- Observe: Captures screenshot and Accessibility Tree.
- Orient: Maps user intent to screen coordinates using VLM.
- Decide: Formulates browser actions (Click, Type, Scroll).
- Act: Executes actions via Playwright.
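
The Observe, Orient, Decide, Act steps above can be sketched as a single loop iteration. This is an illustrative sketch under stated assumptions, not the library's code: `FakeBrowser`, `Observation`, `Action`, `orient`, `decide`, and `step` are hypothetical names, the browser is an in-memory stand-in rather than Playwright, and the "VLM" is replaced by a simple name match against accessibility nodes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """What the agent sees: a screenshot stand-in plus accessibility nodes."""
    screenshot: bytes
    ax_nodes: list[dict]


@dataclass
class Action:
    kind: str  # "click", "type", or "scroll"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """In-memory stand-in for a browser driver; it only records actions."""
    ax_nodes: list[dict]
    log: list[Action] = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(screenshot=b"", ax_nodes=self.ax_nodes)

    def act(self, action: Action) -> None:
        self.log.append(action)


def orient(obs: Observation, intent: str) -> Optional[tuple[int, int]]:
    """Stub for the VLM: find a node whose name appears in the intent."""
    for node in obs.ax_nodes:
        if node["name"].lower() in intent.lower():
            return node["x"], node["y"]
    return None


def decide(target: Optional[tuple[int, int]]) -> Action:
    """Click the target if one was found; otherwise scroll to reveal more."""
    if target is None:
        return Action(kind="scroll", y=600)
    return Action(kind="click", x=target[0], y=target[1])


def step(browser: FakeBrowser, intent: str) -> Action:
    obs = browser.observe()        # Observe
    target = orient(obs, intent)   # Orient
    action = decide(target)        # Decide
    browser.act(action)            # Act
    return action


browser = FakeBrowser(ax_nodes=[{"name": "Sign in", "x": 120, "y": 48}])
first = step(browser, "Click the sign in button")
second = step(browser, "Open the pricing page")
print(first)
print(second)
```

In a real agent, `step` would presumably run repeatedly until the goal is reached or a step budget is exhausted, with a fresh observation taken after every action.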
This software is proprietary and dual-licensed. It is available under the Prosperity Public License 3.0; commercial use beyond a 30-day trial requires a separate license.