Proposal: BoxLite as VM infrastructure for hardware-isolated computer automation
Hi Google Gemini team! 👋
First off, excellent work on computer-use-preview. It is a fantastic demonstration of Gemini's computer use capabilities. The abstraction over Playwright and Browserbase is particularly well-designed, making it easy to swap execution environments.
I've been working on BoxLite (github.com/boxlite-labs/boxlite), an embeddable VM runtime, and I see a natural opportunity for a third backend option: hardware-isolated, local, VM-based computer automation. It would let Gemini control entire desktop environments (not just browsers), which is especially useful for multi-tenant deployments, reproducible environments, and scenarios requiring stronger isolation than browser sandboxes.
Demo: BoxLite with Claude Code (AI Agent Desktop Automation)
This shows BoxLite providing full desktop automation for Claude Code (an AI coding agent). The same approach could work for Gemini controlling computer-use-preview environments.
Current State & Opportunity
computer-use-preview currently provides two excellent execution backend options:
Playwright (Local):
✅ Fast local execution
✅ Simple setup
✅ No cloud dependency
⚠️ Runs on host system (less isolation)
⚠️ Limited to browser context
⚠️ OS dropdown rendering issues
Browserbase (Cloud):
✅ Better OS element support
✅ Managed infrastructure
✅ Network isolation
⚠️ Cloud dependency
⚠️ Latency
⚠️ Limited to browser context
There's an opportunity for a third option that enables full computer automation:
Hardware-isolated like Browserbase (but local)
No cloud dependency like Playwright (but more isolated)
Full desktop environment (control any GUI application, not just browsers)
Native OS support (VMs render OS elements correctly)
What is BoxLite?
BoxLite (github.com/boxlite-labs/boxlite) takes an "embeddable library" approach to sandboxing—think SQLite for VMs. Instead of requiring Docker Desktop or a daemon, it's a library that provides hardware-level isolated environments.
Core characteristics:
Hardware virtualization (KVM/Hypervisor.framework) — Real VMs, not just browser sandboxes
No daemon dependency — Just a Python library (pip install boxlite)
OCI-compatible — Uses standard Docker images from any registry
Cross-platform — macOS (Apple Silicon) and Linux (x86_64, ARM64)
Full desktop environment — Run any GUI application, not just browsers
Integration with Existing Architecture
BoxLite would fit naturally into the existing computers/ abstraction as a new backend:
New Implementation
# computers/boxlite/boxlite_computer.py
import boxlite
from computers.computer import Computer


class BoxliteComputer(Computer):
    """Computer implementation using BoxLite VMs for hardware isolation"""

    def __init__(self, desktop):
        self.desktop = desktop

    @classmethod
    async def create(cls, image="ubuntu-desktop:latest"):
        # __init__ cannot be async, so an async factory starts the VM
        # with a desktop environment
        desktop = await boxlite.ComputerBox(
            cpu=2,
            memory=2048,
            image=image
        ).__aenter__()
        return cls(desktop)

    async def screenshot(self) -> bytes:
        """Capture screenshot from VM desktop"""
        result = await self.desktop.screenshot()
        return result['data']  # base64 encoded PNG

    async def click_at(self, x: int, y: int):
        """Click at normalized coordinates"""
        await self.desktop.mouse_move(x, y)
        await self.desktop.left_click()

    async def type_text_at(self, x: int, y: int, text: str):
        """Type text at coordinates"""
        await self.desktop.mouse_move(x, y)
        await self.desktop.left_click()
        await self.desktop.type(text)

    async def scroll_at(self, x: int, y: int, direction: str):
        """Scroll at coordinates"""
        await self.desktop.scroll(x, y, direction, amount=3)

    async def key_combination(self, keys: str):
        """Execute keyboard shortcut"""
        await self.desktop.key(keys)

    async def close(self):
        """Cleanup VM"""
        await self.desktop.__aexit__(None, None, None)
Usage
# Use BoxLite backend for browser automation
python main.py --computer=boxlite "Book a flight to Tokyo"

# Or any desktop automation task
python main.py --computer=boxlite "Open VS Code and create a new Python file"
python main.py --computer=boxlite "Take a screenshot of the desktop"

# Same agent, different infrastructure
# All actions happen in hardware-isolated VM with full desktop
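For illustration, here is one way the `--computer` flag from the usage above could select a backend. This is a sketch under assumptions: the flag name follows this proposal and may differ from what main.py actually accepts, and the three classes are hypothetical stand-ins for the real Computer implementations.

```python
import argparse


# Hypothetical stand-ins; real backends would subclass the project's Computer.
class PlaywrightStub:
    name = "playwright"


class BrowserbaseStub:
    name = "browserbase"


class BoxliteStub:
    name = "boxlite"


# Registry mapping the CLI value to a backend class.
BACKENDS = {
    "playwright": PlaywrightStub,
    "browserbase": BrowserbaseStub,
    "boxlite": BoxliteStub,
}


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Backend selection sketch")
    parser.add_argument("--computer", choices=sorted(BACKENDS), default="playwright")
    parser.add_argument("query", help="Task for the agent to perform")
    return parser.parse_args(argv)


def select_backend(argv):
    """Return (backend instance, query) for the given argument list."""
    args = parse_args(argv)
    return BACKENDS[args.computer](), args.query
```

A registry like this keeps the agent loop unchanged when a backend is added: only the mapping grows.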
Production readiness: Early stage, used in production by some teams
Potential Next Steps
If this seems interesting, I'd be happy to:
Implement computers/boxlite/ - Create BoxliteComputer backend following existing patterns
Provide example OCI images - Pre-built desktop images with Chrome/Firefox, VS Code, Terminal for testing
Share benchmarks - Show startup time, memory usage, performance comparisons
Document integration - Add BoxLite setup instructions to README
Demo full computer automation - Show Gemini controlling entire desktop (not just browsers)
No pressure—mainly wanted to share this as a potential third backend option for scenarios requiring hardware isolation, local VM infrastructure, or full desktop automation capabilities.
Feedback Welcome
I'd love to hear your thoughts on:
Whether hardware-isolated local VMs would be valuable for computer-use-preview users
If the computers/ abstraction makes this integration straightforward
What scenarios you see benefiting most from VM-based computer automation (full desktop, not just browsers)
Any concerns about the approach
And if you're interested in BoxLite for other projects, feel free to check it out—we're building in public and feedback helps! A ⭐ on GitHub would be appreciated if you find it useful.
Disclosure: I'm one of the BoxLite maintainers, but I genuinely think there's natural synergy here—BoxLite's ComputerBox was designed for exactly this use case (AI agents controlling desktop environments), and your abstraction layer makes integration straightforward. Looking forward to your thoughts!
Use Cases
1. Multi-Tenant SaaS 🏢
Challenge: Running multiple users' computer automation tasks safely
With BoxLite:
2. Reproducible Environments ✅
Challenge: "Works on my machine" issues
With BoxLite:
3. Native OS Elements 🎯
Challenge: Playwright can't capture OS-rendered dropdowns
With BoxLite:
4. Local Development 💻
Challenge: Need Browserbase-like isolation locally
With BoxLite:
5. Full Computer Automation 🚀
True computer use: BoxLite ComputerBox provides full desktop environment
Working Integration Example
BoxLite already has a working integration with Claude Code via MCP:
The same pattern applies here—computer-use-preview using BoxLite's ComputerBox for hardware-isolated, full-desktop computer automation.
Potential Benefits
1. Enhanced Security
2. Reproducibility
mycompany/browser-env:v1.0
3. Developer Experience
4. Flexibility
Trade-offs & Considerations
When Playwright is Better
When Browserbase is Better
When BoxLite Helps
Recommendation: Offer all three options—users choose based on their needs.
BoxLite Status