prd: Annotation geometry filter and bulk correction in Data Manager

**Is your feature request related to a problem? Please describe.**

When working on large segmentation annotation projects (2,000+ tasks) with model-generated pre-annotations, I repeatedly hit the same geometry errors, none of which can be found or fixed through the existing Data Manager:

- **Nested same-class boxes:** a box of class X fully contains another box of class X (>85% overlap) — 8,982 instances found across 2,379 tasks in one project
- **Extreme aspect ratio slivers:** boxes with width/height ratio > 20 or < 0.05 — 7,375 instances
- **Tiny noise detections:** box area < 0.3% of total image area — 11,661 instances
- **Cross-class containment:** a region of class A fully contains a region of class B, violating annotation rules — 1,893 instances

In total, **99% of the 2,379 pre-annotated tasks had at least one geometric quality issue.** There is currently no way to filter tasks by these patterns in the Data Manager. My only option is to inject custom marker fields into each task via script so that metadata filters can act as a proxy, then open each task manually to apply the same correction repeatedly.

---

**Describe the solution you'd like**

Two complementary capabilities in the Data Manager:

**1. Annotation geometry filters** — filter tasks based on predicates applied to the geometric properties of their annotation regions, composable with existing metadata filters using AND/OR logic:

| Predicate | Example use |
|---|---|
| `any region: aspect_ratio > N` | Find wide sliver artifacts (use `aspect_ratio < 1/N` for tall ones) |
| `any region: area_pct < N` | Find tiny noise boxes |
| `any region A contains region B, label(A) == label(B)` | Find same-class nesting |
| `any region A contains region B, label(A) != label(B)` | Find cross-class containment |
| `region_count > N` | Find over-segmented tasks |
| `any two regions: iou > N, same label` | Find near-duplicate boxes |

All predicates should combine with existing task data filters (e.g. filter tasks where `split == "train"` AND `any region: aspect_ratio > 20`).
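
To make the predicate semantics concrete, here is a minimal Python sketch, not a proposed implementation. It assumes axis-aligned, unrotated boxes in pixel coordinates; the `Box` type and `task_matches` helper are hypothetical, and the thresholds (ratio 20, area 0.3%, overlap 85%) are the ones quoted in the problem description above.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels (hypothetical type, not a Label Studio class)."""
    x: float
    y: float
    w: float
    h: float
    label: str


def aspect_ratio(b: Box) -> float:
    """Width divided by height; a zero-height box counts as infinitely wide."""
    return b.w / b.h if b.h > 0 else float("inf")


def area_pct(b: Box, img_w: float, img_h: float) -> float:
    """Box area as a percentage of the image area."""
    return 100.0 * (b.w * b.h) / (img_w * img_h)


def _intersection(a: Box, b: Box) -> float:
    iw = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    ih = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(iw, 0.0) * max(ih, 0.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = _intersection(a, b)
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def contains(outer: Box, inner: Box, threshold: float = 0.85) -> bool:
    """True if more than `threshold` of `inner`'s area lies inside `outer`."""
    inner_area = inner.w * inner.h
    return inner_area > 0 and _intersection(outer, inner) / inner_area > threshold


def task_matches(boxes, img_w, img_h, max_ratio=20.0, min_area_pct=0.3):
    """Return the names of the predicates that hold for at least one region."""
    hits = set()
    for b in boxes:
        ratio = aspect_ratio(b)
        if ratio > max_ratio or ratio < 1.0 / max_ratio:
            hits.add("sliver")
        if area_pct(b, img_w, img_h) < min_area_pct:
            hits.add("tiny")
    # Ordered pairs, because containment is not symmetric.
    for a, b in permutations(boxes, 2):
        if contains(a, b):
            hits.add("same_class_nesting" if a.label == b.label
                     else "cross_class_containment")
    return hits
```

A Data Manager filter would then select a task whenever the requested predicate name appears in the set returned for that task.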

**2. Bulk geometry correction** — apply a correction function to all tasks selected after filtering:

- Delete all regions matching a predicate
- On same-class containment: keep outer or inner box (configurable)
- Expose the geometry primitives as predefined JavaScript utilities (`aspect_ratio(region)`, `area_pct(region)`, `iou(r1, r2)`, `contains(r1, r2)`) so users can compose custom rules
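
As an illustration of the "keep outer or inner" correction, the following Python sketch operates on a single annotation's `result` list. It assumes rectangle regions shaped like Label Studio's `rectanglelabels` results (`value.x`, `value.y`, `value.width`, `value.height` in percent of the image, one label per box, rotation ignored); `resolve_same_class_nesting` and its parameters are hypothetical names, and the 85% threshold is taken from the problem description.

```python
def _is_box(region):
    return region.get("type") == "rectanglelabels"


def _label(region):
    return region["value"]["rectanglelabels"][0]


def _area(region):
    v = region["value"]
    return v["width"] * v["height"]


def _overlap_fraction(outer, inner):
    """Fraction of `inner`'s area that lies inside `outer`."""
    o, i = outer["value"], inner["value"]
    iw = min(o["x"] + o["width"], i["x"] + i["width"]) - max(o["x"], i["x"])
    ih = min(o["y"] + o["height"], i["y"] + i["height"]) - max(o["y"], i["y"])
    inner_area = _area(inner)
    if inner_area <= 0:
        return 0.0
    return max(iw, 0.0) * max(ih, 0.0) / inner_area


def resolve_same_class_nesting(result, keep="outer", threshold=0.85):
    """Return a new list with one box of each nested same-class pair removed.

    Non-rectangle regions are passed through untouched, and the input list
    is never mutated, so a reviewer can diff before committing.
    """
    if keep not in ("outer", "inner"):
        raise ValueError("keep must be 'outer' or 'inner'")
    drop = set()
    for a_idx, a in enumerate(result):
        for b_idx, b in enumerate(result):
            if a_idx == b_idx or not (_is_box(a) and _is_box(b)):
                continue
            if _label(a) != _label(b):
                continue
            # Treat the larger box as the outer one; on equal areas the
            # earlier region wins, so identical duplicates lose only one copy.
            if (_area(a), -a_idx) <= (_area(b), -b_idx):
                continue
            if _overlap_fraction(a, b) > threshold:
                drop.add(b_idx if keep == "outer" else a_idx)
    return [r for idx, r in enumerate(result) if idx not in drop]
```

Returning a filtered copy rather than editing in place is a deliberate choice in this sketch: it keeps every untouched region byte-for-byte identical, which matters when the corrected list is later written back as a whole.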

---

**Describe alternatives you've considered**

The current workaround is a scripted pipeline outside Label Studio:
1. Export annotations via `GET /api/projects/{id}/export`
2. Compute geometric predicates in Python
3. Inject marker fields into task data via `PATCH /api/tasks/{id}/` to enable metadata-based filtering
4. Correct annotations via `PATCH /api/annotations/{id}/` with a replacement `result` array
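
Steps 2 to 4 can be sketched roughly as follows, using the tiny-box rule as the only predicate. This is a simplified outline, not the actual pipeline: it assumes the export is a list of tasks with `id`, `data`, and `annotations` (each with `id` and `result`), that rectangle sizes are percentages of the image, and that a task PATCH replaces `data` wholesale (hence the merge). The `plan_updates` helper and the `geom_has_tiny_box` marker are hypothetical names, and the function only builds request descriptions so they can be reviewed before anything is sent.

```python
def area_pct(region):
    """Box area as a percentage of the image (width/height are percentages)."""
    v = region["value"]
    return v["width"] * v["height"] / 100.0


def plan_updates(exported_tasks, min_area_pct=0.3):
    """Return (method, path, json_body) tuples; nothing is sent here."""
    planned = []
    for task in exported_tasks:
        flagged = False
        for ann in task.get("annotations", []):
            # Step 2: evaluate the predicate; keep every non-matching region.
            kept = [
                r for r in ann["result"]
                if r.get("type") != "rectanglelabels"
                or area_pct(r) >= min_area_pct
            ]
            if len(kept) != len(ann["result"]):
                flagged = True
                # Step 4: the body carries the complete replacement `result`.
                planned.append(
                    ("PATCH", f"/api/annotations/{ann['id']}/", {"result": kept})
                )
        # Step 3: merge the marker into the existing task data (assumption:
        # sending only the marker would overwrite the other data fields).
        data = {**task["data"], "geom_has_tiny_box": flagged}
        planned.append(("PATCH", f"/api/tasks/{task['id']}/", {"data": data}))
    return planned
```

Each tuple can then be sent with any HTTP client against the Label Studio base URL, ideally after inspecting the planned `result` arrays.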

This works but has significant drawbacks: it requires re-implementing geometry primitives that the annotation editor already computes internally, the full-replace PATCH semantics are error-prone (easy to accidentally drop valid regions), and it provides no interactive review step — corrections are applied blindly.

I also considered writing a standalone QA script as a pre-processing step, but this does not let reviewers inspect and approve corrections interactively before they are committed.

---

**Additional context**

This is most valuable for:
- **Pre-annotation QA:** model-generated annotations have systematic geometric artifacts (nesting, slivers) that must be batch-cleaned before human review
- **Semi-supervised / pseudo-label workflows:** cleaning model predictions before using them as training labels — in my case, applying 7 geometry rules to 2,238 pseudo-labeled tasks improved downstream model mAP50 from 0.874 to 0.955 (+8.1 pp)
- **Post-taxonomy-change re-annotation:** finding all regions that violate updated annotation rules after a class schema change

I have a working Python proof-of-concept that implements the geometric predicates and reports measured correction counts per pattern: https://github.com/Elvis-codeur/label-studio/tree/poc/annotation-geometry-filter/label_studio/poc/annotation_geometry_filter

I am happy to share it as a reference implementation for the predicate functions, and to contribute the backend filter logic and frontend filter panel as a PR if the feature is accepted.

