AI-Assisted YOLO Annotation Tool
Local GUI annotation platform with integrated llama.cpp vision model for intelligent pre-labeling, quality audit, and iterative correction learning.
- Quick Start
- Installation
- User Guide
- Keyboard Shortcuts
- Project Structure
- Data Format
- Backup & Recovery
- Troubleshooting
- FAQ
git clone https://github.com/a740022938/OpenAxiom.git
cd OpenAxiom
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python launch.py| Component | Minimum | Recommended |
|---|---|---|
| Windows | 10 (64-bit) | 11 (64-bit) |
| Python | 3.8 | 3.11 |
| RAM | 8 GB | 16+ GB |
| GPU (for AI) | None | NVIDIA 8GB+ VRAM |
1. Install Python 3.11
Download from python.org. Check "Add Python to PATH" during installation.
Verify:
python --version
# Python 3.11.92. Clone the repository
git clone https://github.com/a740022938/OpenAxiom.git
cd OpenAxiom3. Create virtual environment
python -m venv .venv4. Activate and install
.\.venv\Scripts\activate
pip install -r requirements.txtThis installs:
PySide6(6.11.0) — Qt GUI frameworkPyYAML(6.0.3) — YAML dataset config parserPillow(12.2.0) — Image processing
5. Launch
python launch.pyAI features require a local llama.cpp server with a vision-capable model.
Recommended setup (Qwen 35B):
llama-server.exe `
-m "models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf" `
--mmproj "models/mmproj-BF16.gguf" `
-ngl 28 `
--flash-attn on `
--jinja `
-c 16384 `
--host 127.0.0.1 `
--port 8080| Flag | Value | Purpose |
|---|---|---|
-m |
path to .gguf | Main model file |
--mmproj |
path to .gguf | Vision encoder (multimodal projection) |
-ngl |
28 | GPU layers (tune for your VRAM) |
-c |
16384 | Context window size |
--flash-attn |
on | Reduce KV cache memory usage |
--host |
127.0.0.1 | Listen on localhost only |
Then in OpenAxiom: AI tab → Configure → URL: http://127.0.0.1:8080
- Click 打开工程 on the top toolbar
- Select a YOLO dataset directory (must contain
images/andlabels/subdirectories) - The tool auto-detects
data.yaml, class names, and model files
On next launch, the last project auto-loads from ~/.openaxiom/config.json.
| Action | How |
|---|---|
| Pan | Middle-mouse drag |
| Zoom | Mouse wheel |
| Fit to window | Right-click canvas → Adapt to Window |
| Add box | Click +框 button, then drag on canvas |
| Resize box | Drag any of the 8 handles on a selected box |
| Select box | Click a box on canvas or in the left table |
| Delete box | Select box, press Delete or click 删除 |
| Change class | Select box, click 改类, choose or type |
| Bulk change class | Ctrl/Shift-select rows in left table → right-click → Bulk Change |
On a box:
✅ Confirm box → Mark as reviewed
✏️ Change class → Edit class name
🗑 Delete box → Remove box
🤖 AI identify → Ask AI what's in the box
📋 Copy YOLO coords → "class_id cx cy w h"
📋 Copy COCO coords → "[x, y, w, h]" in pixels
📍 Locate in properties → Jump to property panel
On empty canvas:
➕ New box (drag) → Enter draw mode
🤖 AI pre-annotate → AI detects all objects
🤖 AI smart hint → AI adds missing boxes only
🔍 Fit / reset view
💾 Save annotations
Configuration:
- Start llama.cpp server
- OpenAxiom → AI tab → 配置 AI button
- Set URL:
http://127.0.0.1:8080, leave API key empty - Select provider:
llama.cpp
Chat: Type in the input box at the bottom of the AI panel. Images are automatically included with your messages.
Box Identification: Right-click any box → "🤖 AI 识别此框". AI tells you what the object is. Click ✓ to confirm or ✏ to correct.
Correction Learning: When AI gets it wrong:
- Type "这是XX" in chat → auto-corrects the last identified box
- Or click ✏ and type the correct name
- Corrections persist across sessions and improve future responses
Skills: Click 技能 → Manage custom AI tools. Each skill has:
- Name: shorthand identifier
- Description: when AI should use it
- Prompt: detailed instructions
7 pre-built skills included:
| Skill | Purpose |
|---|---|
| YOLO Annotation Expert | Coordinate format guidance |
| Quality Inspector | Annotation error detection |
| Format Converter | YOLO↔COCO↔VOC conversion |
| Category Advisor | Class naming and hierarchy |
| Python Scripter | Batch processing scripts |
| Image Analyst | Scene analysis and annotation strategy |
| Annotation Standards | Edge case handling rules |
Memory: The 记 toggle enables cross-session memory. AI remembers past conversations when you restart.
Batch Pre-annotation: Click 批量预标 → choose start index and count → AI processes each image sequentially.
- Original labels are backed up automatically
- Progress shown in AI chat area
- Results saved directly to label files
Batch Save: Click ▾ 更多 → 分批 & 多批 → generate batch plan → safe save per batch.
- Max 20 batches × 20 images
- Per-batch audit trail
- Automatic backup before each batch
Click 导出 on the top toolbar.
| Format | Output |
|---|---|
| COCO JSON | Single annotations_coco.json with images/annotations/categories |
| VOC XML | One .xml per image in Annotations/ directory |
| Key | Action |
|---|---|
A |
Previous image |
D |
Next image |
Delete |
Delete selected box |
Enter |
Confirm current box |
Escape |
Cancel selection / exit draw mode |
Ctrl+Z |
Undo |
Ctrl+Y |
Redo |
| Middle-click drag | Pan canvas |
| Scroll wheel | Zoom in/out |
openaxiom/
├── launch.py # Entry point
├── main.py # Alternate entry
├── requirements.txt # pip install -r
├── VERSION # v1.2.0
├── .gitignore
├── README.md # This file
├── ui/
│ ├── __init__.py # Version constants
│ ├── main_window.py # Full GUI (4400+ lines)
│ └── settings_window.py # Settings dialog
├── core/
│ ├── context.py # Box, DatasetInfo, WorkbenchContext
│ ├── dataset_manager.py # YOLO dataset detection
│ ├── image_manager.py # Image file loading
│ ├── label_manager.py # Label file I/O
│ ├── export.py # COCO + VOC export
│ ├── logger.py # File logging system
│ └── ... # yaml_parser, project_inspector, etc.
├── adapter/
│ ├── aip_client.py # llama.cpp / OpenAI API client
│ ├── aip_readonly.py # Project scan utilities
│ └── aip_readonly_cli.py # CLI interface
├── scripts/
│ └── backup_openaxiom_source_only.ps1
└── docs/
├── TROUBLESHOOTING.md
└── RESTORE_AFTER_REINSTALL.md
YOUR_DATASET/
├── data.yaml # YOLO config (optional but recommended)
├── images/
│ └── train/
│ ├── img001.jpg
│ └── img002.jpg
└── labels/
└── train/
├── img001.txt # One .txt per image
└── img002.txt
class_id cx cy w h
One line per box. All values are normalized (0-1).
class_id: integer, 0-indexedcx, cy: center coordinatesw, h: width, height
Example:
0 0.503125 0.431250 0.090000 0.040000
1 0.750000 0.550000 0.060000 0.080000
| Item | Path | Size |
|---|---|---|
| Source code | GitHub repo | ~200 KB |
| AI memory & skills | %USERPROFILE%\.openaxiom\ |
~1 MB |
| llama.cpp model | E:\llama-*\models\ |
~30 GB |
# 1. Install Python 3.11
# 2. Clone
git clone https://github.com/a740022938/OpenAxiom.git
cd OpenAxiom
# 3. Setup
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
# 4. Restore AI data
# Copy your backed-up .openaxiom\ folder to %USERPROFILE%\
# 5. Restore llama.cpp + model to E:\
# 6. Run
python launch.py"OpenAxiom failed to start"
- Check
%USERPROFILE%\.openaxiom\app.log - Verify Python ≥ 3.8
- Reinstall:
pip install -r requirements.txt --force
"AI returns empty"
- Ensure llama.cpp server is running
- Check
http://127.0.0.1:8080/healthin browser - Reduce
-nglif VRAM insufficient
"Canvas freezes during AI chat"
- This is fixed in v1.2.0 (async chat)
- If still happening, the model may be overloaded; reduce
--parallel
"Box resize handles not appearing"
- A box must be selected (click it first)
- You must be in annotation mode (not browse mode)
Q: Can I use it without a GPU? Yes. The core annotation tool works on CPU only. AI features require llama.cpp server.
Q: Does it work on Linux/Mac?
Not tested officially. The code uses pathlib and PySide6, which are cross-platform. The llama.cpp integration uses Windows paths by default.
Q: How do I add more classes? When creating or editing a box, type a new class name directly in the dialog instead of selecting from the list.
Q: Where is my data stored?
All configuration, AI memory, skills, and backups are in %USERPROFILE%\.openaxiom\. Annotation data stays in your dataset directory.
MIT License. See LICENSE file (if present) or the repository for details.