nandanadileep/MCP-Server-for-Browser-Automation

Browser Automation MCP Server

This project implements a browser automation server using the Model Context Protocol (MCP) with Playwright for web automation.

Project Structure

MCP-Server/
├── browser_automation_server.py    # Browser automation MCP server
├── browser_automation_config.json  # Configuration for the server
├── test_browser_automation.py      # Basic test client
├── browser_automation_agent.py     # Advanced client with LLM agent
├── mcp-env/                        # Virtual environment with dependencies
└── README.md                       # This file

Features

The browser automation server provides the following tools:

  • start_browser() - Start a new browser instance
  • navigate_to_url(url) - Navigate to a specific URL
  • click_element(selector) - Click on an element using CSS selector
  • type_text(selector, text) - Type text into an input field
  • get_page_content() - Get the current page content
  • take_screenshot(filename) - Take a screenshot of the current page
  • close_browser() - Close the browser
  • browser_automation_task(task_description, website_url) - Perform a complete automation task

Setup

  1. Virtual Environment: The project uses a virtual environment located in mcp-env/

  2. Dependencies: The following packages are installed:

    • fastmcp - For creating MCP servers
    • mcp_use - For creating MCP clients and agents
    • playwright - For browser automation
    • langchain-ollama - For Ollama integration (optional)

  3. Install Playwright Browsers:

    source mcp-env/bin/activate
    playwright install
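
To confirm the environment is complete before running anything, you can check that each dependency is importable. This is a minimal sketch and not part of the repository: the helper name missing_packages is hypothetical, and the import names (in particular langchain_ollama for the langchain-ollama package) are assumed to match the installed distributions.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED = ["fastmcp", "mcp_use", "playwright"]
OPTIONAL = ["langchain_ollama"]


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_packages(names)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Run it with ./mcp-env/bin/python so that the check inspects the virtual environment rather than the system interpreter.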

Usage

1. Basic Test

Run the basic test to verify the browser automation works:

./mcp-env/bin/python test_browser_automation.py

This will:

  • Start a browser
  • Navigate to Google
  • Take a screenshot
  • Close the browser

2. Advanced Agent Mode

For LLM-powered browser automation, use the agent client:

./mcp-env/bin/python browser_automation_agent.py

This requires Ollama to be installed and running, with the llama3.1:8b model available locally (for example, after running ollama pull llama3.1:8b).
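
If the agent fails at startup, it can help to check that Ollama is reachable and has the model. The sketch below assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint, which lists locally available models. The function name ollama_has_model is hypothetical and not part of this repository.

```python
import json
import urllib.request


def ollama_has_model(model, base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server at `base_url` lists `model`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Server unreachable, timed out, or the response was not valid JSON.
        return False
    models = payload.get("models", []) if isinstance(payload, dict) else []
    return any(entry.get("name") == model for entry in models)


if __name__ == "__main__":
    if ollama_has_model("llama3.1:8b"):
        print("Ollama is running and llama3.1:8b is available")
    else:
        print("Ollama is not reachable or llama3.1:8b has not been pulled")
```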

3. Manual Tool Calls

You can also create your own client to make specific tool calls:

import asyncio
import json
from mcp_use import MCPClient

async def main():
    with open("browser_automation_config.json", "r") as f:
        config = json.load(f)

    client = MCPClient.from_dict(config)
    await client.create_all_sessions()
    
    session = client.get_session("browser_automation")
    
    # Start browser
    await session.call_tool("start_browser", {})
    
    # Navigate to a website
    await session.call_tool("navigate_to_url", {"url": "https://example.com"})
    
    # Take a screenshot
    await session.call_tool("take_screenshot", {"filename": "my_screenshot.png"})
    
    # Close browser
    await session.call_tool("close_browser", {})
    
    await client.close_all_sessions()

asyncio.run(main())

Configuration

The browser_automation_config.json file configures the MCP server:

{
  "mcpServers": {
    "browser_automation": {
      "command": "python",
      "args": ["browser_automation_server.py"]
    }
  }
}
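
Because "command" is set to python, the server is started with whichever interpreter is first on PATH, which may not be the one in mcp-env/. One way around this, sketched below, is to generate the configuration with the absolute path of the interpreter that runs the script. The helper build_config is hypothetical and not part of the repository. It reproduces the structure shown above and changes only the command value.

```python
import json
import sys


def build_config(server_script="browser_automation_server.py", python_executable=None):
    """Build the config dict, defaulting to the interpreter running this script."""
    return {
        "mcpServers": {
            "browser_automation": {
                "command": python_executable or sys.executable,
                "args": [server_script],
            }
        }
    }


if __name__ == "__main__":
    # Print the config; redirect the output to a file if you want to save it.
    print(json.dumps(build_config(), indent=2))
```

Running this with ./mcp-env/bin/python prints a configuration whose command points at the virtual environment's interpreter.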

Examples

Example 1: Simple Web Scraping

# Navigate to a website and get content
# (assumes `session` was obtained as in the Manual Tool Calls example)
await session.call_tool("start_browser", {})
await session.call_tool("navigate_to_url", {"url": "https://example.com"})
content = await session.call_tool("get_page_content", {})
await session.call_tool("close_browser", {})

Example 2: Form Filling

# Fill out a form
# (assumes `session` was obtained as in the Manual Tool Calls example)
await session.call_tool("start_browser", {})
await session.call_tool("navigate_to_url", {"url": "https://example.com/form"})
await session.call_tool("type_text", {"selector": "#name", "text": "John Doe"})
await session.call_tool("type_text", {"selector": "#email", "text": "john@example.com"})
await session.call_tool("click_element", {"selector": "#submit"})
await session.call_tool("close_browser", {})
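
In the examples above, a failed tool call (for instance, a selector that matches nothing) would skip close_browser and leave the browser running. The sketch below shows one way to guarantee cleanup with try/finally. The helper run_steps is hypothetical. It assumes only that session exposes the async call_tool(name, arguments) method used throughout this README.

```python
import asyncio


async def run_steps(session, steps):
    """Start the browser, run (tool_name, arguments) pairs in order, then close it.

    The browser is closed even if one of the steps raises.
    """
    results = []
    await session.call_tool("start_browser", {})
    try:
        for tool_name, arguments in steps:
            results.append(await session.call_tool(tool_name, arguments))
    finally:
        await session.call_tool("close_browser", {})
    return results


# The form-filling example expressed as data for run_steps:
FORM_STEPS = [
    ("navigate_to_url", {"url": "https://example.com/form"}),
    ("type_text", {"selector": "#name", "text": "John Doe"}),
    ("type_text", {"selector": "#email", "text": "john@example.com"}),
    ("click_element", {"selector": "#submit"}),
]
```

Inside an async function with a live session, this would be called as await run_steps(session, FORM_STEPS).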

Example 3: Screenshot Automation

# Take screenshots of multiple pages (full URLs, including the scheme)
urls = ["https://google.com", "https://github.com", "https://stackoverflow.com"]
for i, url in enumerate(urls):
    await session.call_tool("start_browser", {})
    await session.call_tool("navigate_to_url", {"url": url})
    await session.call_tool("take_screenshot", {"filename": f"screenshot_{i}.png"})
    await session.call_tool("close_browser", {})
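
Browser navigation generally expects an absolute URL with a scheme, and this README does not say whether the server adds one to a bare host name. As a precaution, a client can normalize URLs before calling navigate_to_url and derive readable screenshot names from the host. Both helpers below are hypothetical illustrations, not part of the repository.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="https"):
    """Prefix `url` with a scheme if it does not already contain one."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


def screenshot_name(url):
    """Derive a file name such as 'github.com.png' from the URL's host."""
    host = urlsplit(ensure_scheme(url)).hostname or "page"
    return f"{host}.png"


if __name__ == "__main__":
    for raw in ["google.com", "https://github.com", "stackoverflow.com/questions"]:
        print(ensure_scheme(raw), "->", screenshot_name(raw))
```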

Troubleshooting

Common Issues

  1. Browser not starting: Make sure the Playwright browsers are installed in the virtual environment:

    playwright install

  2. Import errors: Run scripts with the virtual environment's interpreter so that the installed packages are found:

    ./mcp-env/bin/python your_script.py

  3. Permission errors: Make sure the virtual environment is activated before running commands:

    source mcp-env/bin/activate

Security Notes

  • The browser can be directed to any URL, including local files through file:// URLs
  • Be careful when exposing this server to untrusted users
  • Consider running in a sandboxed environment for production use
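
One possible mitigation is to validate URLs against an allowlist before passing them to navigate_to_url. The sketch below is an assumption about how such a check could look, not something the server is known to implement. The name is_url_allowed and the example host list are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url, allowed_hosts):
    """Return True only for http(s) URLs whose host is in `allowed_hosts`
    or is a subdomain of one of them."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URL (for example, an unterminated IPv6 literal).
        return False
    if parts.scheme not in ALLOWED_SCHEMES or host is None:
        return False
    return any(host == allowed or host.endswith("." + allowed) for allowed in allowed_hosts)


if __name__ == "__main__":
    hosts = {"example.com"}
    for candidate in ["https://example.com/form", "file:///etc/passwd", "https://example.com.evil.org"]:
        print(candidate, "->", is_url_allowed(candidate, hosts))
```

A check like this rejects file:// URLs and unknown hosts, but it does not replace sandboxing, since an allowed page can still redirect elsewhere.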

Development

To modify the browser automation server:

  1. Edit browser_automation_server.py
  2. Add new tools using the @mcp.tool() decorator
  3. Test with test_browser_automation.py
  4. Update the README with new features

License

This project is open source and available under the MIT License.
