
Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

arXiv: https://arxiv.org/abs/2512.05941

Zhiyuan Jiang* · Shenghao Xie* · Wenyi Li · Wenqiang Zu · Peihang Li · Jiahao Qiu · Siqi Pei · Lei Ma · Tiejun Huang · Mengdi Wang · Shilong Liu

* Equal contribution  ·  † Corresponding authors

This repo provides the official implementation of ZoomClick and GUIZoom-Bench.


Highlights

  • ZoomClick: A training-free, principled, and effective method that uses zooming to unlock the grounding priors of both generalist VLMs and specialized GUI grounding models.
  • GUIZoom-Bench: A benchmark that evaluates models' zoom capability against explainable standards, supporting future research on zoom-based training and test-time scaling.
  • Strong Performance: With ZoomClick, UI-Venus-72B achieves a 73.1% success rate on ScreenSpot-Pro, setting a new state of the art.

Repository Structure

  • grounding/: Evaluation scripts for ZoomClick

    • eval_sspro_zoomclick.py: Main script to evaluate ZoomClick on ScreenSpot-Pro.
    • models/: Backbone wrappers and ZoomClick variants (Qwen3-VL, UI-Venus).
  • GUIZoom-Bench/: Scripts for building and evaluating GUIZoom-Bench

    • build_guizoom.py: Re-organizes the ScreenSpot-Pro dataset into GUIZoom-Bench.
    • collect_guizoom_accuracy.py: Computes accuracy and related metrics on GUIZoom-Bench from grounding results on ScreenSpot-Pro.
  • results/sspro: Example JSON results used to reproduce tables and figures

    • zoomclick_*_clip.json: results produced by the default settings in run_zoomclick_*.slurm.
    • venus_7b_depth_(1-4).json: results used to build GUIZoom-Bench.
  • scripts/: Utility and cluster (Slurm) scripts

    • run_zoomclick_*.slurm: Example Slurm jobs for running ZoomClick evaluations on ScreenSpot-Pro.
    • run_collect_guizoom.slurm: Slurm script for re-organizing results into GUIZoom-Bench results.
    • run_build_guizoom.slurm: Slurm script for building the GUIZoom-Bench dataset.

Installation

  1. Environment Setup

    # Clone the repository
    git clone https://github.com/Princeton-AI2-Lab/ZoomClick.git
    cd ZoomClick
    
    # (Recommended) Create a conda environment
    conda create -n zoomclick python=3.10 -y
    conda activate zoomclick
    
    # Install dependencies (a general, easy-to-use requirements file is in progress)
    pip install -r requirements.txt
    
  2. Data Preparation

    ScreenSpot-Pro

    • Download ScreenSpot-Pro from its official repository or dataset release: https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding.
    • Recommended directory layout:
      /path/to/dataset/Screenspot-Pro/
        images/
        annotations/
      
      Point the evaluation scripts at this layout via their command-line arguments (e.g., --screenspot_imgs and --screenspot_test in grounding/eval_sspro_*.py).

    GUIZoom-Bench

    • GUIZoom-Bench is built by re-organizing the ScreenSpot-Pro dataset.

      python GUIZoom-Bench/build_guizoom.py \
         --src_dataset_root /path/to/dataset/Screenspot-Pro \
         --depth1 /path/to/depth1.json \
         --depth2 /path/to/depth2.json \
         --depth3 /path/to/depth3.json \
         --depth4 /path/to/depth4.json \
         --out_dir /path/to/dataset/GUIZoom-Bench
      
      • --depth1 … --depth4: per-depth result files, e.g., results/sspro/venus_7b_depth_1.json through venus_7b_depth_4.json.

      This will create the GUIZoom-Bench splits, annotations, images, and statistics under /path/to/dataset/GUIZoom-Bench. We recommend storing ScreenSpot-Pro and GUIZoom-Bench in the same dataset directory for convenience.
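
      Since the evaluation step below reads images and annotations from the dataset root, the generated benchmark presumably mirrors the ScreenSpot-Pro layout. A sketch of the expected output (assumed, not verified against the build script):

        /path/to/dataset/GUIZoom-Bench/
          images/         # re-organized screenshots
          annotations/    # per-split annotation files
          ...             # generated statistics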

Evaluation

We recommend using at least one A100 GPU for models up to 8B, and at least four A100 GPUs for models 32B and above.

  1. Eval on ScreenSpot-Pro

    • On a cluster: modify the basic paths in scripts/run_zoomclick_uivenus.slurm and scripts/run_zoomclick_qwen3.slurm to match your directory layout, then submit the script (e.g., with sbatch).
    • Otherwise:
      • Activate the conda environment: conda activate zoomclick
      • Run the evaluation with settings matching your setup:
        python grounding/eval_sspro_zoomclick.py \
           --backend uivenus \
           --model_type ui_venus_ground_7b \
           --model_name_or_path "${MODEL_DIR}" \
           --screenspot_imgs "${DATA_DIR}/images" \
           --screenspot_test "${DATA_DIR}/annotations" \
           --task "all" \
           --inst_style "instruction" \
           --language "en" \
           --gt_type "positive" \
           --log_path "${LOG_DIR}/zoomclick_venus_7b_clip.json" \
           --in_depth 3 \
           --in_ratio 0.5 \
           --in_min_crop 768 \
           --patch_size 2 \
           --center_mode "clip" \
           --prezoom_px_thresh 50
        
        • --in_depth: The number of iterative zoom-in steps applied during evaluation.
        • --in_ratio: The shrink ratio for each zoom-in step.
        • --in_min_crop: The minimum crop size to retain sufficient visual context during zooming.
        • --patch_size: The grid resolution used when estimating the zoom center in Pre-Zoom. By default, patch_size=2 denotes a 2×2 grid.
        • --center_mode: Boundary-handling mode for zoom crops. Choose from 'shift', 'clip', and 'shrink'.
        • --prezoom_px_thresh: The pixel-distance threshold used in Pre-Zoom; refer to the paper for details. A sketch of how these flags interact follows this list.
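
        To make the interplay concrete, below is a minimal sketch of one way these flags could combine. It reflects our reading of the parameters, not the repo's actual implementation (which lives under grounding/models/); all names in it are illustrative.

          # Illustrative sketch only; not the repo's actual code.
          def zoom_crops(img_w, img_h, cx, cy,
                         in_depth=3, in_ratio=0.5, in_min_crop=768,
                         center_mode="clip"):
              """Yield successive (left, top, right, bottom) crops around
              the current click estimate (cx, cy)."""
              w, h = img_w, img_h
              for _ in range(in_depth):
                  # Each step shrinks the view by in_ratio, but never below
                  # in_min_crop, so the model keeps enough visual context.
                  w = max(int(w * in_ratio), min(in_min_crop, img_w))
                  h = max(int(h * in_ratio), min(in_min_crop, img_h))
                  left, top = cx - w // 2, cy - h // 2
                  if center_mode == "clip":
                      # Clamp the crop inside the image bounds; 'shift' and
                      # 'shrink' would handle the boundary differently.
                      left = min(max(left, 0), img_w - w)
                      top = min(max(top, 0), img_h - h)
                  yield left, top, left + w, top + h
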
  2. Eval on GUIZoom-Bench: There are two ways to evaluate your model on GUIZoom-Bench:

    • Re-organize the ScreenSpot-Pro data:
      • Build GUIZoom-Bench following commands in Data Preparation.
      • Directly follow the same commands as in Eval on ScreenSpot-Pro, but set DATA_DIR=${SCRATCH}/datasets/GUIZoom-Bench instead of DATA_DIR=${SCRATCH}/datasets/ScreenSpot-Pro.
    • Re-organize ScreenSpot-Pro results (recommended):
      • Because the benchmark is built from ScreenSpot-Pro, evaluation results on ScreenSpot-Pro can be re-organized into GUIZoom-Bench results without additional computation. This removes the need to store a duplicate dataset for benchmarking and saves storage.
      • Simply adjust the arguments in scripts/run_collect_guizoom.slurm (an example direct invocation follows this list):
        • --results: path to the JSON result to be re-organized
        • --dataset: path to GUIZoom-Bench
        • --output: path for the re-organized JSON result
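
      For reference, a direct (non-Slurm) invocation might look like the following. This assumes run_collect_guizoom.slurm forwards these flags to GUIZoom-Bench/collect_guizoom_accuracy.py unchanged; the file paths are placeholders.

        python GUIZoom-Bench/collect_guizoom_accuracy.py \
           --results results/sspro/zoomclick_venus_7b_clip.json \
           --dataset /path/to/dataset/GUIZoom-Bench \
           --output /path/to/results/guizoom_venus_7b_clip.json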

Citation

If you find our work helpful, please leave us a star and cite our paper:

@misc{jiang2025zoominclickout,
      title={Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding}, 
      author={Zhiyuan Jiang and Shenghao Xie and Wenyi Li and Wenqiang Zu and Peihang Li and Jiahao Qiu and Siqi Pei and Lei Ma and Tiejun Huang and Mengdi Wang and Shilong Liu},
      year={2025},
      eprint={2512.05941},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.05941}, 
}
