Support cloud-storage paths for Pascal VOC semantic segmentation imports by LeonardoRosaa · Pull Request #872 · lightly-ai/lightly-studio

LeonardoRosaa · 2026-04-01T12:45:37Z

What has changed and why?

This PR enables Pascal VOC semantic segmentation imports from cloud/object storage URIs.

How has it been tested?

Unit tests

Did you update CHANGELOG.md?

Yes
Not needed (internal change)

Summary by CodeRabbit

New Features
- Cloud storage paths preserved for Pascal VOC semantic segmentation imports
- Added example demonstrating S3-hosted semantic segmentation dataset usage
Tests
- Added tests validating remote (in-memory/S3-like) dataset import, path preservation, and segmentation handling
Chores
- Updated labelformat dependency to a pinned Git reference

coderabbitai · 2026-04-01T12:45:45Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds cloud-storage (remote URI) support for Pascal VOC semantic segmentation imports by preserving remote paths, introduces a path-normalization helper, pins labelformat to a Git commit, adds an S3 example, and extends tests to cover remote-path preservation and annotations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core Functionality** `lightly_studio/src/lightly_studio/core/image/image_dataset.py` | Added private `_normalize_input_path()` using `fsspec` + `LocalFileSystem` to preserve remote URIs and convert local inputs to absolute `Path`; updated `add_samples_from_pascal_voc_segmentations()` to use it. |
| **Examples** `lightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.py` | New example demonstrating creating a semantic-segmentation dataset from S3 Pascal VOC–style data; reads class mapping via `fsspec.open` and calls `add_samples_from_pascal_voc_segmentations()`. |
| **Tests** `lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` | Refactored tests to parameterize local vs `memory://` paths, write masks via `Image.fromarray(...)`, and assert `sample.file_path_abs` preserves remote URIs along with existing embedding and annotation checks. |
| **Changelog & Dependencies** `CHANGELOG.md`, `lightly_studio/pyproject.toml` | Changelog: added entry for cloud-storage Pascal VOC import support. `pyproject.toml`: changed `labelformat` dependency to a pinned Git commit reference. |
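
The path handling summarized for `_normalize_input_path()` can be sketched as follows. This is a minimal, standard-library-only illustration so that it runs without extra dependencies; the PR is described as using `fsspec` and `LocalFileSystem` instead. The function name `normalize_input_path` and the "contains `://`" heuristic are assumptions made for this sketch, not code taken from the PR.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def normalize_input_path(path: str | Path) -> str | Path:
    """Keep remote URIs untouched and make local inputs absolute.

    Hypothetical stand-in for the PR's private `_normalize_input_path()`.
    """
    text = str(path)
    scheme = urlparse(text).scheme
    # Assumption: a string with "://" and a multi-letter scheme is a remote URI.
    # A one-letter scheme would be a Windows drive letter, not a protocol.
    if "://" in text and len(scheme) > 1:
        return text  # e.g. "s3://bucket/images" or "memory://images"
    # Local input: return an absolute Path without resolving symlinks.
    return Path(text).absolute()


remote = normalize_input_path("s3://bucket/images")
local = normalize_input_path("data/images")
```

Returning the remote URI as a plain string avoids passing it through `Path`, which would collapse the double slash after the scheme on POSIX systems.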

Sequence Diagram(s)

sequenceDiagram
  participant ExampleScript as Example Script
  participant FSSpec as fsspec (S3 / memory)
  participant ImageDataset as ImageDataset.add_samples_from_pascal_voc_segmentations
  participant DB as DB / Storage

  ExampleScript->>FSSpec: resolve images_path, masks_path, class mapping
  ExampleScript->>ImageDataset: call add_samples_from_pascal_voc_segmentations(...)
  ImageDataset->>FSSpec: list/read image & mask files (preserve remote URIs)
  ImageDataset->>DB: create samples with file_path_abs and semantic annotations
  ImageDataset->>DB: generate and store embeddings
  ExampleScript->>DB: start GUI (reads samples/annotations)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Bump labelformat #866: Edits the same labelformat dependency entry in pyproject.toml (alternate approaches to pinning/version source).

Suggested reviewers

JonasWurst
MalteEbner

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and accurately describes the main objective of the PR: adding support for cloud-storage paths in Pascal VOC semantic segmentation imports. |
| Description check | ✅ Passed | The PR description follows the template structure with all required sections completed: summary of changes, testing approach, and CHANGELOG update confirmation. |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7693cb8d9

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lightly_studio/pyproject.toml`:
- Line 36: The labelformat dependency in pyproject.toml is pinned to a short
commit hash ("labelformat @
git+https://github.com/lightly-ai/labelformat.git@f9cf7d5"); replace the short
SHA with the full 40-character commit SHA
(f9cf7d51dd2b058d27d91d733a87475a404c1f18) so the dependency line uses the
complete hash to ensure deterministic installs and avoid collisions.
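
The check behind this comment can be illustrated with a small script that tells a short commit hash from a full 40-character SHA in a direct-reference requirement string. This is only a sketch: the helper names `pinned_ref` and `is_full_sha` are hypothetical, and the parsing assumes the `name @ git+https://...@ref` shape quoted above.

```python
from __future__ import annotations

import re

# A full Git SHA-1 commit ID is exactly 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_ref(requirement: str) -> str | None:
    """Return the Git ref after the final '@' of the URL, or None if absent."""
    _, _, url = requirement.partition(" @ ")
    # Only look at the last path segment so a "user@host" part is ignored.
    last_segment = url.rsplit("/", 1)[-1]
    _, sep, ref = last_segment.partition("@")
    return ref if sep else None


def is_full_sha(ref: str | None) -> bool:
    """True only when the ref is a complete 40-character commit hash."""
    return ref is not None and _FULL_SHA.fullmatch(ref) is not None


base = "labelformat @ git+https://github.com/lightly-ai/labelformat.git"
short_pin = f"{base}@f9cf7d5"
full_pin = f"{base}@f9cf7d51dd2b058d27d91d733a87475a404c1f18"

print(is_full_sha(pinned_ref(short_pin)))
print(is_full_sha(pinned_ref(full_pin)))
```

A short hash is resolved by prefix and can become ambiguous as a repository grows, which is why the review asks for the full form.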

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6881afc8-6eb5-46e1-891f-363624d5dae5

📥 Commits

Reviewing files that changed from the base of the PR and between 2369350 and fdcf313.

⛔ Files ignored due to path filters (1)

lightly_studio/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

CHANGELOG.md
lightly_studio/pyproject.toml
lightly_studio/src/lightly_studio/core/image/image_dataset.py
lightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.py
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py

coderabbitai

🧹 Nitpick comments (1)

lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)

200-237: Consider extracting shared annotation assertions into a helper.

The annotation checks are duplicated across tests; a small helper would reduce maintenance overhead and make future schema changes easier to update.

♻️ Optional refactor sketch

+def _assert_sample_annotations(sample, expected_file_path: str | None = None) -> None:
+    if expected_file_path is not None:
+        assert sample.file_path_abs == expected_file_path
+    annotations = sorted(sample.annotations, key=lambda ann: ann.label)
+    # shared assertions for bg/cat/dog ...
+
 # in both tests:
-annotations = sorted(samples[0].annotations, key=lambda ann: ann.label)
-... many repeated asserts ...
+_assert_sample_annotations(samples[0], expected_file_path=f"{images_path}/image1.jpg")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around
lines 200 - 237, Extract the repeated assertion blocks into a small helper
function (e.g., assert_segmentation_annotation) that accepts the annotation
object (or its fields: label, x, y, width, height, segmentation_mask) and
performs the isinstance check and all equality assertions; then replace the
duplicated blocks in this test with calls like
assert_segmentation_annotation(annotations[0], label="bg", x=0, y=0, width=4,
height=3, segmentation_mask=[...]) and similarly for other annotations and
samples[1].annotations[0]; ensure the helper references
SemanticSegmentationAnnotation and uses the same assertion semantics so behavior
is unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0fc75f37-296b-412b-9c9c-b098d0129243

📥 Commits

Reviewing files that changed from the base of the PR and between fdcf313 and 6ad3d3d.

⛔ Files ignored due to path filters (1)

lightly_studio/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (2)

lightly_studio/pyproject.toml
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py

🚧 Files skipped from review as they are similar to previous changes (1)

lightly_studio/pyproject.toml

coderabbitai

🧹 Nitpick comments (1)

lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)

200-237: Consider extracting shared annotation assertions into a helper.

The annotation assertions (lines 200-237) duplicate those in the first test (lines 67-108). While test duplication is acceptable for readability, extracting a helper could reduce maintenance burden if assertion logic changes.

def _assert_expected_annotations(samples: list) -> None:
    """Verify semantic segmentation annotations for standard test masks."""
    annotations = sorted(samples[0].annotations, key=lambda ann: ann.label)
    # ... shared assertion logic

This is optional since test independence is also valuable.
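
One way such a shared helper might look is sketched below. It is self-contained for illustration: `SegAnnotation` is a hypothetical stand-in for `SemanticSegmentationAnnotation`, the field names follow the ones listed in the review prompt (`label`, `x`, `y`, `width`, `height`, `segmentation_mask`), and the mask values are placeholders rather than data from the PR's tests.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SegAnnotation:
    """Hypothetical stand-in for SemanticSegmentationAnnotation."""

    label: str
    x: int
    y: int
    width: int
    height: int
    segmentation_mask: list[int] = field(default_factory=list)


def assert_segmentation_annotation(
    annotation: object,
    *,
    label: str,
    x: int,
    y: int,
    width: int,
    height: int,
    segmentation_mask: list[int],
) -> None:
    """Run the type check and every field comparison in one place."""
    assert isinstance(annotation, SegAnnotation)
    assert annotation.label == label
    assert (annotation.x, annotation.y) == (x, y)
    assert (annotation.width, annotation.height) == (width, height)
    assert annotation.segmentation_mask == segmentation_mask


# Placeholder data; the real tests build annotations from imported masks.
annotations = sorted(
    [SegAnnotation("cat", 1, 0, 2, 2, [1, 2]), SegAnnotation("bg", 0, 0, 4, 3, [0, 12])],
    key=lambda ann: ann.label,
)
assert_segmentation_annotation(
    annotations[0], label="bg", x=0, y=0, width=4, height=3, segmentation_mask=[0, 12]
)
```

Keyword-only arguments keep each call site readable, so a failing comparison points at a named field rather than a positional value.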

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around
lines 200 - 237, Extract the repeated annotation assertion logic into a single
helper function (e.g., _assert_expected_annotations) that takes the samples list
and performs the shared checks currently duplicated in the block that inspects
samples[0].annotations and samples[1].annotations; move the
sorted(samples[0].annotations, key=lambda ann: ann.label) logic and all
assertions about SemanticSegmentationAnnotation, .label, .x, .y, .width,
.height, and .segmentation_mask into that helper and call it from both test
locations to eliminate duplication while keeping test behavior identical.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 569af699-563a-48fe-993b-dcdedc6b52fd

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad3d3d and 24f1cec.

📒 Files selected for processing (1)

lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py

LeonardoRosaa · 2026-04-01T14:24:06Z

/review

MalteEbner

Code mostly looks good to me, but some details should be improved.
Will you update the docs in a follow-up PR?

LeonardoRosaa added 2 commits April 1, 2026 09:40

add cloud-storage to pascalvoc

2ba9653

Merge branch 'main' into leonardo-lig-8784-support-cloud-storage-for-…

f91f6b9

…semantic-segmentation

fix uv.lock

e7693cb

LeonardoRosaa marked this pull request as ready for review April 1, 2026 13:25

LeonardoRosaa and others added 2 commits April 1, 2026 10:26

Merge branch 'main' into leonardo-lig-8784-support-cloud-storage-for-…

d37c9b3

…semantic-segmentation

update changelog

fdcf313

chatgpt-codex-connector bot reviewed Apr 1, 2026

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated

Comment thread lightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.py

coderabbitai bot reviewed Apr 1, 2026

Comment thread lightly_studio/pyproject.toml Outdated

LeonardoRosaa added 2 commits April 1, 2026 11:10

fix uuid and SHA

6ad3d3d

format file

24f1cec

coderabbitai bot reviewed Apr 1, 2026

MalteEbner reviewed Apr 2, 2026

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated

Comment thread CHANGELOG.md Outdated

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated

LeonardoRosaa and others added 3 commits April 2, 2026 11:40

Update CHANGELOG.md

88e3690

Co-authored-by: MalteEbner <malte.ebner@gmail.com>

use pytest.mark.parametrize

baa01dd

Merge branch 'leonardo-lig-8784-support-cloud-storage-for-semantic-se…

0c0e17b

…gmentation' of https://github.com/lightly-ai/lightly-studio into leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation

MalteEbner reviewed Apr 2, 2026

Comment thread lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py Outdated

avoid duplication

96266d9

LeonardoRosaa requested a review from MalteEbner April 2, 2026 12:48

MalteEbner approved these changes Apr 4, 2026

Merge branch 'main' into leonardo-lig-8784-support-cloud-storage-for-…

979c49f

…semantic-segmentation

LeonardoRosaa merged commit ec24a74 into main Apr 4, 2026
14 checks passed

LeonardoRosaa deleted the leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation branch April 4, 2026 08:28
