Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 embedded input/backend knobs.

 ### Fixed
+- Bounded screenshot payloads by default before returning them to MCP hosts,
+while exposing opt-in screenshot sizing controls and coordinate metadata for
+downscaled captures.
+- Added opt-in JPEG screenshot output with a caller-selected quality so agents
+can choose compression before the byte cap forces additional resizing.
 - Ported downstream Linux readiness fixes: `doctor` now treats direct
 `/dev/uinput` and the XDG RemoteDesktop portal as valid development-input
 backends instead of requiring `ydotoold` in every ready setup.
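
Reviewer note: the changelog entries above describe a byte cap that forces extra resizing when the encoded image is too large. The following is a minimal sketch of that idea, not the code in this PR. The names `Fitted` and `fit_under_byte_cap`, the 25% shrink step, and the `encode` closure (a stand-in for a real PNG or JPEG encoder) are all assumptions made for illustration.

```rust
/// Hypothetical result of fitting an encoded image under a byte cap.
#[derive(Debug, PartialEq)]
pub struct Fitted {
    pub width: u32,
    pub height: u32,
    pub bytes: usize,
}

/// Shrinks the target size until `encode(width, height)` reports a payload of
/// at most `max_bytes`, or until the image reaches 1x1.
///
/// `encode` is a stand-in for a real encoder: it returns the encoded length in
/// bytes for the given dimensions.
pub fn fit_under_byte_cap<F>(width: u32, height: u32, max_bytes: usize, mut encode: F) -> Fitted
where
    F: FnMut(u32, u32) -> usize,
{
    let (mut w, mut h) = (width.max(1), height.max(1));
    loop {
        let bytes = encode(w, h);
        if bytes <= max_bytes || (w == 1 && h == 1) {
            return Fitted { width: w, height: h, bytes };
        }
        // Shrink both sides by 25% per pass (an arbitrary step for this sketch).
        // The u64 arithmetic avoids overflow for very large dimensions.
        w = ((w as u64 * 3 / 4) as u32).max(1);
        h = ((h as u64 * 3 / 4) as u32).max(1);
    }
}
```

With a 2 MiB cap, an encoder that costs 4 bytes per pixel pushes a 1920x1080 capture through several shrink passes, while a more compact encoder (as JPEG often is for screen content) may pass on the first try and keep the full resolution. That is the trade-off the second changelog entry points at.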
17 changes: 17 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -27,7 +27,7 @@ base64 = "0.22.1"
 evdev = "0.13.2"
 cosmic-protocols = { version = "0.2.0", default-features = false, features = ["client"] }
 futures-util = "0.3.32"
-image = { version = "0.25", default-features = false, features = ["png"] }
+image = { version = "0.25", default-features = false, features = ["jpeg", "png"] }
 rmcp = { version = "1.5.0", features = ["transport-io"] }
 schemars = "1.0"
 serde = { version = "1.0.228", features = ["derive"] }
8 changes: 5 additions & 3 deletions README.md
@@ -45,11 +45,13 @@ MCP tools exposed by the server:
 - `list_windows` — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
 - `focused_window` — the window currently holding keyboard focus
 - `get_app_state` — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept
-- `screenshot` — capture the screen as a PNG; can target a window, which is raised to the front and cropped to just that window
+- `screenshot` — capture the screen as a bounded PNG or JPEG image; can target a window, which is raised to the front and cropped to just that window

+Screenshot payloads are size-bounded by default before they are returned to the MCP host: max 1920 px width/height and 2 MiB image bytes, with hard caps even when callers request more. Agents that need more detail can pass `max_width`, `max_height`, `max_bytes`, `scale`, `format: "jpeg"`, or `quality`, preferably with a window target or crop. PNG remains the default; JPEG lets callers trade lossless pixels for a smaller payload before the byte cap forces further resizing. Returned screenshot metadata includes `coordinate_width`, `coordinate_height`, `scale`, `format`, and `quality` so callers can convert from a downscaled preview to desktop coordinate pixels.
+
 **Input**
-- `click` — by element index, semantic selector, or pixel coordinates
-- `drag` — pixel-coordinate drag (start / end)
+- `click` — by element index, semantic selector, or desktop coordinate pixels
+- `drag` — desktop coordinate drag (start / end)
 - `scroll` — page-based scroll on an element or at a pixel location
 - `press_key` — keys / chords; can focus a window or terminal first
 - `type_text` — literal text input, optionally targeted at a window or terminal
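
Reviewer note: the README paragraph added above quotes a default limit of 1920 px per side. The sketch below shows one plausible way to derive a bounded output size that preserves aspect ratio and never upscales. `DEFAULT_MAX_SIDE` and `bounded_size` are hypothetical names. The scale is taken here as output size divided by source size, which is an assumption and may not match the convention of the `scale` field in this PR.

```rust
/// Default per-side pixel limit quoted in the README text above.
pub const DEFAULT_MAX_SIDE: u32 = 1920;

/// Returns `(out_width, out_height, scale)` so that the output fits inside
/// `max_width` x `max_height` while keeping the aspect ratio.
///
/// `scale` is output size / source size (an assumption of this sketch) and is
/// clamped to 1.0 so small captures are never upscaled.
pub fn bounded_size(width: u32, height: u32, max_width: u32, max_height: u32) -> (u32, u32, f64) {
    if width == 0 || height == 0 {
        // Nothing to scale; return the input unchanged.
        return (width, height, 1.0);
    }
    let scale = f64::min(
        max_width as f64 / width as f64,
        max_height as f64 / height as f64,
    )
    .min(1.0);
    // Round to the nearest pixel and keep each side at least 1 px.
    let out_w = ((width as f64 * scale).round() as u32).max(1);
    let out_h = ((height as f64 * scale).round() as u32).max(1);
    (out_w, out_h, scale)
}
```

For example, `bounded_size(3840, 2160, DEFAULT_MAX_SIDE, DEFAULT_MAX_SIDE)` yields a 1920x1080 preview at scale 0.5, while a 1280x720 capture passes through unchanged.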
13 changes: 12 additions & 1 deletion src/main.rs
@@ -70,7 +70,7 @@ async fn main() -> Result<()> {
 .nth(3)
 .and_then(|s| s.parse().ok())
 .unwrap_or(0);
-let cap = screenshot::capture_screenshot().await?;
+let cap = screenshot::capture_screenshot_raw().await?;
 eprintln!("desktop logical size: {}x{}", cap.width, cap.height);
 let mut p = abs_pointer::AbsPointer::create(cap.width as i32, cap.height as i32)?;
 p.click(x, y, abs_pointer::PointerButton::Left, 1)?;
@@ -87,6 +87,17 @@ async fn main() -> Result<()> {
 serde_json::to_string_pretty(&serde_json::json!({
 "mime_type": capture.mime_type,
 "source": capture.source,
+"width": capture.width,
+"height": capture.height,
+"coordinate_width": capture.coordinate_width,
+"coordinate_height": capture.coordinate_height,
+"scale": capture.scale,
+"resized": capture.resized,
+"bytes": capture.bytes,
+"original_bytes": capture.original_bytes,
+"max_bytes": capture.max_bytes,
+"format": capture.format,
+"quality": capture.quality,
 "data_url_length": capture.data_url.len()
 }))
 .context("failed to serialize screenshot report")?
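
Reviewer note: the report now carries both the returned image size and the coordinate-space size, which is what a caller needs to map a point in a downscaled preview back to desktop pixels. The sketch below assumes that `width`/`height` describe the returned image and `coordinate_width`/`coordinate_height` describe the desktop coordinate space. `CaptureMeta` and `to_desktop` are hypothetical names, and any origin offset for window-cropped captures is not handled.

```rust
/// Hypothetical subset of the screenshot report fields shown above.
#[derive(Debug, Clone, Copy)]
pub struct CaptureMeta {
    /// Pixel size of the returned (possibly downscaled) image.
    pub width: u32,
    pub height: u32,
    /// Size of the coordinate space the input tools expect (assumed meaning).
    pub coordinate_width: u32,
    pub coordinate_height: u32,
}

/// Maps a pixel in the returned image to a point in the coordinate space.
/// Returns `None` if the point lies outside the image or the image is empty.
pub fn to_desktop(meta: CaptureMeta, x: u32, y: u32) -> Option<(u32, u32)> {
    if meta.width == 0 || meta.height == 0 || x >= meta.width || y >= meta.height {
        return None;
    }
    // Integer proportional mapping; u64 intermediates avoid overflow.
    let dx = (x as u64 * meta.coordinate_width as u64 / meta.width as u64) as u32;
    let dy = (y as u64 * meta.coordinate_height as u64 / meta.height as u64) as u32;
    Some((dx, dy))
}
```

For a 1920x1080 preview of a 3840x2160 desktop, the preview pixel (960, 540) maps to (1920, 1080), which is the value a caller would then pass to `click` or `drag`.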