Skip to content

belarusian/computer-lab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

computer-lab

Local HTTP microservices for OCR and open-vocabulary UI detection.

computer-lab exposes EasyOCR and Grounding DINO through small Python servers, so automation systems can call vision models over HTTP instead of embedding those dependencies directly.

Services

Service Default port Endpoint Returns
OCR 9003 POST /ocr Text regions with text, box, center, confidence
Detection 9004 POST /detect Grounded boxes with label, box, center, confidence

Useful for:

  • screenshot-to-text pipelines
  • UI and desktop automation
  • agent systems that need OCR or visual grounding behind a stable local API

Quick Start

Python 3.10+ is required. A GPU is recommended for usable detection latency; OCR can run on CPU.

git clone https://github.com/belarusian/computer-lab.git
cd computer-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip wheel
pip install -e ".[gpu]"

For a CPU-oriented install:

pip install -e ".[cpu]"

Run the services in separate terminals:

python servers/ocr_server.py 9003
python servers/detect_server.py 9004

Health checks:

curl -s http://127.0.0.1:9003/health
curl -s http://127.0.0.1:9004/health

Smoke Tests

The repo includes manual integration scripts rather than a full automated test suite.

export OCR_URL=http://127.0.0.1:9003/ocr
python test_ocr_locate.py
export DETECT_URL=http://127.0.0.1:9004/detect
python test_detect_locate.py "close button"

Both scripts capture the current screen with pyautogui, so they require a local desktop session.

API

  • POST /ocr Request body: raw PNG bytes. Response: JSON list of {text, box, center, confidence}.
  • POST /detect Request body: JSON { "image": "<base64 PNG>", "query": "..." }. If query is omitted, the server runs a generic open detection pass.
  • POST /detect_raw Request body: raw PNG bytes. Optional header: X-Query.
  • GET /health Returns a simple JSON health response from each server.

Runtime Notes

  • EASYOCR_GPU=0 forces EasyOCR to stay on CPU.
  • The detection server uses CUDA automatically when available and falls back to CPU otherwise.

License

MIT. See LICENSE.

About

Small local OCR and visual-grounding servers for screen automation and agent workflows.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages