Is your feature request related to a problem? Please describe.
When working on large segmentation annotation projects with model-generated pre-annotations (2,000+ tasks), I consistently encounter the same recurring geometry errors that cannot be addressed through the existing Data Manager:
- Nested same-class boxes: a box of class X fully contains another box of class X (>85% overlap) — 8,982 instances found across 2,379 tasks in one project
- Extreme aspect ratio slivers: boxes with width/height ratio > 20 or < 0.05 — 7,375 instances
- Tiny noise detections: box area < 0.3% of total image area — 11,661 instances
- Cross-class containment: a region of class A fully contains a region of class B, violating annotation rules — 1,893 instances
In total, 99% of the 2,379 pre-annotated tasks had at least one geometric quality issue. There is currently no way to filter tasks by these patterns in the Data Manager. My only option is to inject custom marker fields into each task via script so that metadata filters can act as a proxy, then open each task manually to apply the same correction repeatedly.
Describe the solution you'd like
Two complementary capabilities in the Data Manager:
1. Annotation geometry filters — filter tasks based on predicates applied to the geometric properties of their annotation regions, composable with existing metadata filters using AND/OR logic:
| Predicate | Example use |
|---|---|
| `any region: aspect_ratio > N` | Find sliver artifacts |
| `any region: area_pct < N` | Find sub-pixel noise boxes |
| `any region A contains region B, label(A) == label(B)` | Find same-class nesting |
| `any region A contains region B, label(A) != label(B)` | Find cross-class containment |
| `region_count > N` | Find over-segmented tasks |
| `any two regions: iou > N, same label` | Find near-duplicate boxes |
All predicates should combine with existing task data filters (e.g. filter tasks where `split == "train"` AND `any region: aspect_ratio > 20`).
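To make the predicate semantics concrete, here is a minimal Python sketch of the four primitives and two task-level "any region" filters. It is illustrative only, not the proof-of-concept code: the `Box` type, the pixel-unit coordinates, the helper names `any_region_sliver` and `any_same_class_nesting`, and the choice to measure containment as the fraction of the inner box's area covered by the outer box (with the 85% threshold from the problem description) are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel units (hypothetical type for this sketch)."""
    label: str
    x: float
    y: float
    w: float
    h: float


def aspect_ratio(b: Box) -> float:
    # Width over height; a zero-height box is treated as infinitely wide.
    return b.w / b.h if b.h > 0 else float("inf")


def area_pct(b: Box, image_w: float, image_h: float) -> float:
    # Box area as a percentage of the total image area.
    return 100.0 * (b.w * b.h) / (image_w * image_h)


def _intersection(a: Box, b: Box) -> float:
    # Area of the overlap rectangle, or 0 when the boxes are disjoint.
    iw = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    ih = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(iw, 0.0) * max(ih, 0.0)


def contains(outer: Box, inner: Box, min_overlap: float = 0.85) -> bool:
    # Assumption: "contains" means more than min_overlap of inner lies in outer.
    inner_area = inner.w * inner.h
    return inner_area > 0 and _intersection(outer, inner) / inner_area > min_overlap


def iou(a: Box, b: Box) -> float:
    inter = _intersection(a, b)
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def any_region_sliver(boxes: Sequence[Box], n: float = 20.0) -> bool:
    # "any region: aspect_ratio > N", checked in both orientations.
    return any(aspect_ratio(b) > n or aspect_ratio(b) < 1.0 / n for b in boxes)


def any_same_class_nesting(boxes: Sequence[Box]) -> bool:
    # "any region A contains region B, label(A) == label(B)"
    return any(
        i != j and a.label == b.label and contains(a, b)
        for i, a in enumerate(boxes)
        for j, b in enumerate(boxes)
    )


boxes = [
    Box("car", 10, 10, 200, 100),
    Box("car", 20, 20, 50, 40),    # nested inside the first car box
    Box("person", 300, 5, 2, 90),  # tall sliver
]
print(any_region_sliver(boxes), any_same_class_nesting(boxes))
```

A Data Manager filter would evaluate such a predicate per task and combine the boolean with the existing metadata conditions.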
2. Bulk geometry correction — apply a correction function to all tasks selected after filtering:
- Delete all regions matching a predicate
- On same-class containment: keep outer or inner box (configurable)
- Functions implemented as predefined JavaScript utilities (`aspect_ratio(region)`, `area_pct(region)`, `iou(r1, r2)`, `contains(r1, r2)`) so users can compose custom rules
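The two correction modes above could behave roughly as in the following Python sketch (the request proposes JavaScript utilities; Python is used here only to match the proof-of-concept). The `Box` type and the function names `delete_matching` and `resolve_same_class_nesting` are hypothetical, and containment is again assumed to mean that more than 85% of the inner box's area lies inside the outer box.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel units (hypothetical type for this sketch)."""
    label: str
    x: float
    y: float
    w: float
    h: float


def contains(outer: Box, inner: Box, min_overlap: float = 0.85) -> bool:
    # Fraction of inner's area covered by outer must exceed min_overlap.
    iw = min(outer.x + outer.w, inner.x + inner.w) - max(outer.x, inner.x)
    ih = min(outer.y + outer.h, inner.y + inner.h) - max(outer.y, inner.y)
    inter = max(iw, 0.0) * max(ih, 0.0)
    inner_area = inner.w * inner.h
    return inner_area > 0 and inter / inner_area > min_overlap


def delete_matching(boxes: List[Box], predicate: Callable[[Box], bool]) -> List[Box]:
    # Correction 1: drop every region for which the predicate holds.
    return [b for b in boxes if not predicate(b)]


def resolve_same_class_nesting(boxes: List[Box], keep: str = "outer") -> List[Box]:
    # Correction 2: for each same-class containment pair, keep one side.
    if keep not in ("outer", "inner"):
        raise ValueError("keep must be 'outer' or 'inner'")
    drop = set()
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            # Skip already-dropped boxes so near-identical pairs keep one copy.
            if i == j or i in drop or j in drop or a.label != b.label:
                continue
            if contains(a, b):
                drop.add(j if keep == "outer" else i)
    return [b for k, b in enumerate(boxes) if k not in drop]


boxes = [
    Box("car", 10, 10, 200, 100),
    Box("car", 20, 20, 50, 40),    # nested inside the first car box
    Box("person", 300, 5, 2, 90),  # sliver: width/height is about 0.02
]
no_slivers = delete_matching(boxes, lambda b: b.w / b.h < 0.05)
outer_only = resolve_same_class_nesting(boxes, keep="outer")
inner_only = resolve_same_class_nesting(boxes, keep="inner")
print(len(no_slivers), len(outer_only), len(inner_only))
```

Both functions return a new list and leave the input untouched, which is what would allow a before/after preview in an interactive review step.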
Describe alternatives you've considered
The current workaround is a scripted pipeline outside Label Studio:
- Export annotations via `GET /api/projects/{id}/export`
- Compute geometric predicates in Python
- Inject marker fields into task data via `PATCH /api/tasks/{id}/` to enable metadata-based filtering
- Correct annotations via `PATCH /api/annotations/{id}/` with a replacement `result` array
This works but has significant drawbacks: it requires re-implementing geometry primitives that the annotation editor already computes internally, the full-replace PATCH semantics are error-prone (easy to accidentally drop valid regions), and it provides no interactive review step — corrections are applied blindly.
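For illustration, the payload-building part of this workaround might look like the sketch below. It makes no network calls. It assumes the usual Label Studio rectangle format (`type: "rectanglelabels"`, with `x`, `y`, `width`, `height` given as percentages of the image, plus `original_width` and `original_height`); the marker field names `qa_has_sliver` and `qa_has_tiny_box`, the helper names, and the exact PATCH body shapes are assumptions.

```python
from typing import Any, Callable, Dict, List

Region = Dict[str, Any]


def is_box(region: Region) -> bool:
    return region.get("type") == "rectanglelabels"


def is_sliver(region: Region, max_ratio: float = 20.0) -> bool:
    # Convert percent units to pixels before comparing width and height.
    v = region["value"]
    w = v["width"] / 100.0 * region["original_width"]
    h = v["height"] / 100.0 * region["original_height"]
    if w <= 0 or h <= 0:
        return True
    return w / h > max_ratio or w / h < 1.0 / max_ratio


def is_tiny(region: Region, min_area_pct: float = 0.3) -> bool:
    # width% * height% / 100 is the box area as a percentage of the image.
    v = region["value"]
    return v["width"] * v["height"] / 100.0 < min_area_pct


def marker_fields(result: List[Region]) -> Dict[str, bool]:
    # Hypothetical marker names, to be merged into the task's data.
    boxes = [r for r in result if is_box(r)]
    return {
        "qa_has_sliver": any(is_sliver(r) for r in boxes),
        "qa_has_tiny_box": any(is_tiny(r) for r in boxes),
    }


def replacement_result(result: List[Region],
                       should_drop: Callable[[Region], bool]) -> List[Region]:
    # Full-replace semantics: every region to keep must be copied over,
    # including non-box regions, or it is lost when the PATCH is applied.
    return [r for r in result if not (is_box(r) and should_drop(r))]


def box(x: float, y: float, w: float, h: float) -> Region:
    return {"type": "rectanglelabels", "original_width": 640, "original_height": 480,
            "value": {"x": x, "y": y, "width": w, "height": h,
                      "rectanglelabels": ["car"]}}


result = [
    box(10, 10, 50, 40),    # valid box
    box(0, 0, 60, 1),       # sliver: 384 px wide, 4.8 px high
    box(5, 5, 0.5, 0.5),    # tiny: 0.0025% of the image area
    {"type": "choices", "value": {"choices": ["blurry"]}},
]
task_data = {"image": "example.jpg", "split": "train"}

# Assumed body shapes for PATCH /api/tasks/{id}/ and PATCH /api/annotations/{id}/.
task_patch = {"data": {**task_data, **marker_fields(result)}}
annotation_patch = {
    "result": replacement_result(result, lambda r: is_sliver(r) or is_tiny(r))
}
print(task_patch["data"], len(annotation_patch["result"]))
```

The `replacement_result` helper shows where the risk lies: any region not explicitly carried into the new array disappears from the annotation.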
I also considered writing a standalone QA script as a pre-processing step, but this does not let reviewers inspect and approve corrections interactively before they are committed.
Additional context
This is most valuable for:
- Pre-annotation QA: model-generated annotations have systematic geometric artifacts (nesting, slivers) that must be batch-cleaned before human review
- Semi-supervised / pseudo-label workflows: cleaning model predictions before using them as training labels — in my case, applying 7 geometry rules to 2,238 pseudo-labeled tasks improved downstream model mAP50 from 0.874 to 0.955 (+8.1 pp)
- Post-taxonomy-change re-annotation: finding all regions that violate updated annotation rules after a class schema change
I have a working Python proof-of-concept implementing the geometric predicates, with measured correction counts per pattern. I am happy to share it as a reference implementation for the predicate functions, and to contribute the backend filter logic and frontend filter panel as a PR if the feature is accepted. The code is available at https://github.com/Elvis-codeur/label-studio/tree/poc/annotation-geometry-filter/label_studio/poc/annotation_geometry_filter