Skip to content

Support cloud-storage paths for Pascal VOC semantic segmentation imports#872

Merged
LeonardoRosaa merged 12 commits intomainfrom
leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation
Apr 4, 2026
Merged

Support cloud-storage paths for Pascal VOC semantic segmentation imports#872
LeonardoRosaa merged 12 commits intomainfrom
leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation

Conversation

@LeonardoRosaa
Copy link
Copy Markdown
Contributor

@LeonardoRosaa LeonardoRosaa commented Apr 1, 2026

What has changed and why?

This PR enables Pascal VOC semantic segmentation imports from cloud/object storage URIs.

How has it been tested?

Unit tests

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Summary by CodeRabbit

  • New Features

    • Cloud storage paths preserved for Pascal VOC semantic segmentation imports
    • Added example demonstrating S3-hosted semantic segmentation dataset usage
  • Tests

    • Added tests validating remote (in-memory/S3-like) dataset import, path preservation, and segmentation handling
  • Chores

    • Updated labelformat dependency to a pinned Git reference

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds cloud-storage (remote URI) support for Pascal VOC semantic segmentation imports by preserving remote paths, introduces a path-normalization helper, pins labelformat to a Git commit, adds an S3 example, and extends tests to cover remote-path preservation and annotations.

Changes

Cohort / File(s) Summary
Core Functionality
lightly_studio/src/lightly_studio/core/image/image_dataset.py
Added private _normalize_input_path() using fsspec + LocalFileSystem to preserve remote URIs and convert local inputs to absolute Path; updated add_samples_from_pascal_voc_segmentations() to use it.
Examples
lightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.py
New example demonstrating creating a semantic-segmentation dataset from S3 Pascal VOC–style data; reads class mapping via fsspec.open and calls add_samples_from_pascal_voc_segmentations().
Tests
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py
Refactored tests to parameterize local vs memory:// paths, write masks via Image.fromarray(...), and assert sample.file_path_abs preserves remote URIs along with existing embedding and annotation checks.
Changelog & Dependencies
CHANGELOG.md, lightly_studio/pyproject.toml
Changelog: added entry for cloud-storage Pascal VOC import support. pyproject.toml: changed labelformat dependency to a pinned Git commit reference.

Sequence Diagram(s)

sequenceDiagram
  participant ExampleScript as Example Script
  participant FSSpec as fsspec (S3 / memory)
  participant ImageDataset as ImageDataset.add_samples_from_pascal_voc_segmentations
  participant DB as DB / Storage

  ExampleScript->>FSSpec: resolve images_path, masks_path, class mapping
  ExampleScript->>ImageDataset: call add_samples_from_pascal_voc_segmentations(...)
  ImageDataset->>FSSpec: list/read image & mask files (preserve remote URIs)
  ImageDataset->>DB: create samples with file_path_abs and semantic annotations
  ImageDataset->>DB: generate and store embeddings
  ExampleScript->>DB: start GUI (reads samples/annotations)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Bump labelformat #866: Edits the same labelformat dependency entry in pyproject.toml (alternate approaches to pinning/version source).

Suggested reviewers

  • JonasWurst
  • MalteEbner
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately describes the main objective of the PR: adding support for cloud-storage paths in Pascal VOC semantic segmentation imports.
Description check ✅ Passed The PR description follows the template structure with all required sections completed: summary of changes, testing approach, and CHANGELOG update confirmation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@LeonardoRosaa LeonardoRosaa marked this pull request as ready for review April 1, 2026 13:25
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7693cb8d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated
Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lightly_studio/pyproject.toml`:
- Line 36: The labelformat dependency in pyproject.toml is pinned to a short
commit hash ("labelformat @
git+https://github.com/lightly-ai/labelformat.git@f9cf7d5"); replace the short
SHA with the full 40-character commit SHA
(f9cf7d51dd2b058d27d91d733a87475a404c1f18) so the dependency line uses the
complete hash to ensure deterministic installs and avoid collisions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6881afc8-6eb5-46e1-891f-363624d5dae5

📥 Commits

Reviewing files that changed from the base of the PR and between 2369350 and fdcf313.

⛔ Files ignored due to path filters (1)
  • lightly_studio/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • CHANGELOG.md
  • lightly_studio/pyproject.toml
  • lightly_studio/src/lightly_studio/core/image/image_dataset.py
  • lightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.py
  • lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py

Comment thread lightly_studio/pyproject.toml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)

200-237: Consider extracting shared annotation assertions into a helper.

The annotation checks are duplicated across tests; a small helper would reduce maintenance overhead and make future schema changes easier to update.

♻️ Optional refactor sketch
+def _assert_sample_annotations(sample, expected_file_path: str | None = None) -> None:
+    if expected_file_path is not None:
+        assert sample.file_path_abs == expected_file_path
+    annotations = sorted(sample.annotations, key=lambda ann: ann.label)
+    # shared assertions for bg/cat/dog ...
+
 # in both tests:
-annotations = sorted(samples[0].annotations, key=lambda ann: ann.label)
-... many repeated asserts ...
+_assert_sample_annotations(samples[0], expected_file_path=f"{images_path}/image1.jpg")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around
lines 200 - 237, Extract the repeated assertion blocks into a small helper
function (e.g., assert_segmentation_annotation) that accepts the annotation
object (or its fields: label, x, y, width, height, segmentation_mask) and
performs the isinstance check and all equality assertions; then replace the
duplicated blocks in this test with calls like
assert_segmentation_annotation(annotations[0], label="bg", x=0, y=0, width=4,
height=3, segmentation_mask=[...]) and similarly for other annotations and
samples[1].annotations[0]; ensure the helper references
SemanticSegmentationAnnotation and uses the same assertion semantics so behavior
is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py`:
- Around line 200-237: Extract the repeated assertion blocks into a small helper
function (e.g., assert_segmentation_annotation) that accepts the annotation
object (or its fields: label, x, y, width, height, segmentation_mask) and
performs the isinstance check and all equality assertions; then replace the
duplicated blocks in this test with calls like
assert_segmentation_annotation(annotations[0], label="bg", x=0, y=0, width=4,
height=3, segmentation_mask=[...]) and similarly for other annotations and
samples[1].annotations[0]; ensure the helper references
SemanticSegmentationAnnotation and uses the same assertion semantics so behavior
is unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0fc75f37-296b-412b-9c9c-b098d0129243

📥 Commits

Reviewing files that changed from the base of the PR and between fdcf313 and 6ad3d3d.

⛔ Files ignored due to path filters (1)
  • lightly_studio/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • lightly_studio/pyproject.toml
  • lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • lightly_studio/pyproject.toml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)

200-237: Consider extracting shared annotation assertions into a helper.

The annotation assertions (lines 200-237) duplicate those in the first test (lines 67-108). While test duplication is acceptable for readability, extracting a helper could reduce maintenance burden if assertion logic changes.

def _assert_expected_annotations(samples: list) -> None:
    """Verify semantic segmentation annotations for standard test masks."""
    annotations = sorted(samples[0].annotations, key=lambda ann: ann.label)
    # ... shared assertion logic

This is optional since test independence is also valuable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around
lines 200 - 237, Extract the repeated annotation assertion logic into a single
helper function (e.g., _assert_expected_annotations) that takes the samples list
and performs the shared checks currently duplicated in the block that inspects
samples[0].annotations and samples[1].annotations; move the
sorted(samples[0].annotations, key=lambda ann: ann.label) logic and all
assertions about SemanticSegmentationAnnotation, .label, .x, .y, .width,
.height, and .segmentation_mask into that helper and call it from both test
locations to eliminate duplication while keeping test behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py`:
- Around line 200-237: Extract the repeated annotation assertion logic into a
single helper function (e.g., _assert_expected_annotations) that takes the
samples list and performs the shared checks currently duplicated in the block
that inspects samples[0].annotations and samples[1].annotations; move the
sorted(samples[0].annotations, key=lambda ann: ann.label) logic and all
assertions about SemanticSegmentationAnnotation, .label, .x, .y, .width,
.height, and .segmentation_mask into that helper and call it from both test
locations to eliminate duplication while keeping test behavior identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 569af699-563a-48fe-993b-dcdedc6b52fd

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad3d3d and 24f1cec.

📒 Files selected for processing (1)
  • lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py

@LeonardoRosaa
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@MalteEbner MalteEbner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code mostly looks good to me, but some details should be improved.
Will you update the docs in a follow-up PR?

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated
Comment thread CHANGELOG.md Outdated
Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated
LeonardoRosaa and others added 3 commits April 2, 2026 11:40
Co-authored-by: MalteEbner <malte.ebner@gmail.com>
…gmentation' of https://github.com/lightly-ai/lightly-studio into leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation
Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated
@LeonardoRosaa LeonardoRosaa requested a review from MalteEbner April 2, 2026 12:48
@LeonardoRosaa LeonardoRosaa merged commit ec24a74 into main Apr 4, 2026
14 checks passed
@LeonardoRosaa LeonardoRosaa deleted the leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation branch April 4, 2026 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants