This is a fork of ocap (by the open-world-agents team) that is set up to run on a VM, adds microphone capture, and supports automated graceful shutdown on Windows. The fork writes its process ID to a file; a separate Python script can then read that file and send the process an interrupt signal, which is what makes it possible to shut ocap down automatically.
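The PID-file handshake described above can be sketched roughly as follows. This is a minimal illustration, not the fork's actual code: the file name `ocap_example.pid` and the helper names `write_pid_file`, `read_pid_file`, `interrupt_signal`, and `request_shutdown` are all hypothetical. It also assumes that a console Ctrl+C event is an acceptable interrupt on Windows; Python's `os.kill` can only deliver that event to console processes that share a console with the sender, so a real setup may need a different delivery mechanism.

```python
import os
import signal
import sys
import tempfile
from pathlib import Path

# Hypothetical location; the fork's real PID file name and path may differ.
PID_FILE = Path(tempfile.gettempdir()) / "ocap_example.pid"


def write_pid_file(path: Path = PID_FILE) -> int:
    """Recorder side: store this process's ID so a controller can find it."""
    pid = os.getpid()
    path.write_text(str(pid), encoding="utf-8")
    return pid


def read_pid_file(path: Path = PID_FILE) -> int:
    """Controller side: read the recorder's process ID back from the file."""
    return int(path.read_text(encoding="utf-8").strip())


def interrupt_signal() -> int:
    """Pick the interrupt to send: Ctrl+C event on Windows, SIGINT elsewhere."""
    if sys.platform == "win32":
        # Only defined on Windows, so it is looked up inside this branch.
        return signal.CTRL_C_EVENT
    return signal.SIGINT


def request_shutdown(path: Path = PID_FILE) -> None:
    """Controller side: ask the recorder to stop by sending it an interrupt."""
    os.kill(read_pid_file(path), interrupt_signal())
```

In this sketch the recorder would call `write_pid_file()` once it is running, and the automation script would call `request_shutdown()` when the recording should end. Using an interrupt rather than a hard kill gives the recorder a chance to finalize its output files.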
High-performance desktop recorder for Windows. Captures screen, audio, keyboard, mouse, and window events.
This project was first introduced and developed for the D2E project. For more details, see "D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI". If you find this work useful, please cite our paper.
ocap (Omnimodal CAPture) captures all essential desktop signals in a synchronized format. It records screen video, audio, keyboard/mouse input, and window events. It was built for the open-world-agents project but works for any desktop recording needs.
TL;DR: Complete, high-performance desktop recording tool for Windows. Captures everything in one command.
Citing the original work:
@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suwhan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}