prd: Annotation geometry filter and bulk correction in Data Manager #9648

@Elvis-codeur

Is your feature request related to a problem? Please describe.

When working on large segmentation annotation projects with model-generated pre-annotations (2,000+ tasks), I consistently encounter the same geometry errors, none of which can be addressed through the existing Data Manager:

  • Nested same-class boxes: a box of class X fully contains another box of class X (>85% overlap) — 8,982 instances found across 2,379 tasks in one project
  • Extreme aspect ratio slivers: boxes with width/height ratio > 20 or < 0.05 — 7,375 instances
  • Tiny noise detections: box area < 0.3% of total image area — 11,661 instances
  • Cross-class containment: a region of class A fully contains a region of class B, violating annotation rules — 1,893 instances

In total, 99% of the 2,379 pre-annotated tasks had at least one geometric quality issue. There is currently no way to filter tasks by these patterns in the Data Manager. My only option is to inject custom marker fields into each task via script so that metadata filters can act as a proxy, then open each task manually to apply the same correction repeatedly.
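
The single-region patterns above (slivers and tiny boxes) can be written as small predicates. The following is a minimal sketch, not the proof-of-concept code. It assumes each region is a Label Studio rectangle result whose `value` holds `x`, `y`, `width`, `height` as percentages of the image, with `original_width` and `original_height` in pixels. The function names and the `make_region` helper are hypothetical.

```python
from typing import Any, Dict, Tuple

Region = Dict[str, Any]  # one entry of an annotation's `result` list (assumed shape)


def pixel_size(region: Region) -> Tuple[float, float]:
    """Width and height in pixels, converted from percent-based values."""
    value = region["value"]
    width_px = value["width"] / 100.0 * region["original_width"]
    height_px = value["height"] / 100.0 * region["original_height"]
    return width_px, height_px


def aspect_ratio(region: Region) -> float:
    """Pixel width divided by pixel height; infinite for zero-height boxes."""
    width_px, height_px = pixel_size(region)
    return width_px / height_px if height_px > 0 else float("inf")


def area_pct(region: Region) -> float:
    """Box area as a percentage of the total image area."""
    value = region["value"]
    # (w% / 100) * (h% / 100) * 100 simplifies to w% * h% / 100.
    return value["width"] * value["height"] / 100.0


def is_sliver(region: Region, high: float = 20.0, low: float = 0.05) -> bool:
    """True when the width/height ratio is above `high` or below `low`."""
    ratio = aspect_ratio(region)
    return ratio > high or ratio < low


def is_tiny(region: Region, min_area_pct: float = 0.3) -> bool:
    """True when the box covers less than `min_area_pct` percent of the image."""
    return area_pct(region) < min_area_pct


def make_region(x, y, width, height, image_w=1000, image_h=1000) -> Region:
    """Hypothetical helper that builds a region dict for experimentation."""
    return {
        "original_width": image_w,
        "original_height": image_h,
        "value": {"x": x, "y": y, "width": width, "height": height},
    }
```

The aspect ratio is computed in pixels rather than in percent units because a box that is 10% wide and 10% tall is square only on a square image.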


Describe the solution you'd like

Two complementary capabilities in the Data Manager:

1. Annotation geometry filters — filter tasks based on predicates applied to the geometric properties of their annotation regions, composable with existing metadata filters using AND/OR logic:

| Predicate | Example use |
| --- | --- |
| any region: aspect_ratio > N | Find sliver artifacts |
| any region: area_pct < N | Find tiny noise boxes |
| any region A contains region B, label(A) == label(B) | Find same-class nesting |
| any region A contains region B, label(A) != label(B) | Find cross-class containment |
| region_count > N | Find over-segmented tasks |
| any two regions: iou > N, same label | Find near-duplicate boxes |

All predicates should combine with existing task data filters (e.g. filter tasks where split == "train" AND any region: aspect_ratio > 20).
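
One way to read the predicates is as "any region" and "any pair" quantifiers over a task's regions, combined with task-data conditions through AND/OR. The sketch below illustrates that composition only. It assumes a simplified, hypothetical task shape (`{"data": {...}, "regions": [{"label": ..., "box": (x0, y0, x1, y1)}]}`), not Label Studio's actual schema or filter API, and the 0.9 IoU threshold is an arbitrary example value.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(inter_w, 0.0) * max(inter_h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def any_region(pred):
    """Lift a region predicate to a task predicate."""
    return lambda task: any(pred(r) for r in task["regions"])


def any_pair(pred):
    """Lift a symmetric pair predicate to a task predicate.

    Unordered pairs are enough for symmetric tests such as IoU; an
    asymmetric test such as containment would need both orderings.
    """
    return lambda task: any(pred(a, b) for a, b in combinations(task["regions"], 2))


def all_of(*preds):
    """AND of task predicates."""
    return lambda task: all(p(task) for p in preds)


def any_of(*preds):
    """OR of task predicates."""
    return lambda task: any(p(task) for p in preds)


# Example: split == "train" AND any two same-label regions with iou > 0.9.
near_duplicate = any_pair(
    lambda a, b: a["label"] == b["label"] and iou(a["box"], b["box"]) > 0.9
)
in_train = lambda task: task["data"].get("split") == "train"
train_with_duplicates = all_of(in_train, near_duplicate)
```

A list of tasks can then be filtered with `[t for t in tasks if train_with_duplicates(t)]`.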

2. Bulk geometry correction — apply a correction function to all tasks selected after filtering:

  • Delete all regions matching a predicate
  • On same-class containment: keep outer or inner box (configurable)
  • Expose the geometry primitives as predefined JavaScript utilities (aspect_ratio(region), area_pct(region), iou(r1, r2), contains(r1, r2)) so users can compose custom rules
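
The "keep outer or inner" rule for same-class containment could look roughly like the sketch below. It is written in Python for illustration, although the request proposes JavaScript utilities. It assumes simplified regions with a `label` and a pixel `box` of the form (x0, y0, x1, y1), and it interprets containment as "more than 85% of the smaller box's area lies inside the larger box". That interpretation and all names here are assumptions, not the behavior of any existing Label Studio function.

```python
def box_area(box):
    """Area of an (x0, y0, x1, y1) box; zero for degenerate boxes."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def covered_fraction(inner, outer):
    """Fraction of `inner`'s area that lies inside `outer`."""
    inter_w = min(inner[2], outer[2]) - max(inner[0], outer[0])
    inter_h = min(inner[3], outer[3]) - max(inner[1], outer[1])
    inter = max(inter_w, 0.0) * max(inter_h, 0.0)
    area = box_area(inner)
    return inter / area if area > 0 else 0.0


def resolve_same_class_nesting(regions, keep="outer", threshold=0.85):
    """Return regions with same-class nested boxes resolved.

    keep="outer" drops the contained box; keep="inner" drops the container.
    """
    if keep not in ("outer", "inner"):
        raise ValueError("keep must be 'outer' or 'inner'")
    drop = set()
    for i, outer in enumerate(regions):
        for j, inner in enumerate(regions):
            if i == j or outer["label"] != inner["label"]:
                continue
            # Only a strictly larger box counts as the container, so
            # equal-sized near-duplicates are left for a separate rule.
            if box_area(outer["box"]) <= box_area(inner["box"]):
                continue
            if covered_fraction(inner["box"], outer["box"]) > threshold:
                drop.add(j if keep == "outer" else i)
    return [r for k, r in enumerate(regions) if k not in drop]
```

Returning a new list instead of mutating the input keeps the original regions available, so a reviewer could compare before and after prior to committing.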

Describe alternatives you've considered

The current workaround is a scripted pipeline outside Label Studio:

  1. Export annotations via GET /api/projects/{id}/export
  2. Compute geometric predicates in Python
  3. Inject marker fields into task data via PATCH /api/tasks/{id}/ to enable metadata-based filtering
  4. Correct annotations via PATCH /api/annotations/{id}/ with a replacement result array
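
The three HTTP calls in this pipeline can be sketched with the Python standard library as follows. The endpoint paths are the ones listed above. Everything else is an assumption for illustration: the base URL and token are placeholders, the `Authorization: Token ...` header reflects Label Studio's legacy token scheme and may differ on your deployment, and the PATCH payload shapes (`{"data": ...}` and `{"result": ...}`) should be checked against the API reference before use.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder instance URL
API_TOKEN = "<your-token>"          # placeholder, never commit a real token


def build_request(method, path, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Token {API_TOKEN}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def export_request(project_id):
    """Step 1: export the project's annotations."""
    return build_request("GET", f"/api/projects/{project_id}/export")


def mark_task_request(task_id, data):
    """Step 3: write task data that includes the marker fields.

    Assumption: the whole `data` object is replaced, so pass the full
    dict with the marker fields merged in.
    """
    return build_request("PATCH", f"/api/tasks/{task_id}/", {"data": data})


def replace_result_request(annotation_id, result):
    """Step 4: replace the annotation's full `result` array."""
    return build_request("PATCH", f"/api/annotations/{annotation_id}/", {"result": result})


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from `send` makes it possible to log or dry-run every PATCH before anything is written to the server.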

This works but has significant drawbacks: it requires re-implementing geometry primitives that the annotation editor already computes internally, the full-replace PATCH semantics are error-prone (easy to accidentally drop valid regions), and it provides no interactive review step — corrections are applied blindly.
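
Because the PATCH replaces the whole result array, one mitigation is to derive the replacement from the current array by region ID and to refuse to proceed when the deletion list does not match what is actually there. The sketch below assumes each result entry carries an `id` key; the function name is hypothetical.

```python
def split_result(result, ids_to_delete):
    """Split a result array into (kept, dropped) entries by region id.

    Raises ValueError if any requested id is absent, which usually
    means the deletion list was computed from a stale export.
    """
    ids_to_delete = set(ids_to_delete)
    present = {entry["id"] for entry in result}
    missing = ids_to_delete - present
    if missing:
        raise ValueError(f"ids not found in result: {sorted(missing)}")
    kept = [entry for entry in result if entry["id"] not in ids_to_delete]
    dropped = [entry for entry in result if entry["id"] in ids_to_delete]
    return kept, dropped
```

The `dropped` list can be logged or shown to a reviewer before `kept` is sent as the replacement array, which partially compensates for the missing interactive review step.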

I also considered writing a standalone QA script as a pre-processing step, but this does not let reviewers inspect and approve corrections interactively before they are committed.


Additional context

This is most valuable for:

  • Pre-annotation QA: model-generated annotations have systematic geometric artifacts (nesting, slivers) that must be batch-cleaned before human review
  • Semi-supervised / pseudo-label workflows: cleaning model predictions before using them as training labels — in my case, applying 7 geometry rules to 2,238 pseudo-labeled tasks improved downstream model mAP50 from 0.874 to 0.955 (+8.1 pp)
  • Post-taxonomy-change re-annotation: finding all regions that violate updated annotation rules after a class schema change

I have a working Python proof-of-concept that implements the geometric predicates and reports measured correction counts per pattern. I am happy to share it as a reference implementation for the predicate functions, and to contribute the backend filter logic and frontend filter panel as a PR if the feature is accepted.

Proof-of-concept: https://github.com/Elvis-codeur/label-studio/tree/poc/annotation-geometry-filter/label_studio/poc/annotation_geometry_filter
