Support cloud-storage paths for Pascal VOC semantic segmentation imports#872
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds cloud-storage (remote URI) support for Pascal VOC semantic segmentation imports by preserving remote paths, introduces a path-normalization helper, pins Changes
Sequence Diagram(s)sequenceDiagram
participant ExampleScript as Example Script
participant FSSpec as fsspec (S3 / memory)
participant ImageDataset as ImageDataset.add_samples_from_pascal_voc_segmentations
participant DB as DB / Storage
ExampleScript->>FSSpec: resolve images_path, masks_path, class mapping
ExampleScript->>ImageDataset: call add_samples_from_pascal_voc_segmentations(...)
ImageDataset->>FSSpec: list/read image & mask files (preserve remote URIs)
ImageDataset->>DB: create samples with file_path_abs and semantic annotations
ImageDataset->>DB: generate and store embeddings
ExampleScript->>DB: start GUI (reads samples/annotations)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…semantic-segmentation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7693cb8d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lightly_studio/pyproject.toml`:
- Line 36: The labelformat dependency in pyproject.toml is pinned to a short
commit hash ("labelformat @
git+https://github.com/lightly-ai/labelformat.git@f9cf7d5"); replace the short
SHA with the full 40-character commit SHA
(f9cf7d51dd2b058d27d91d733a87475a404c1f18) so the dependency line uses the
complete hash to ensure deterministic installs and avoid collisions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6881afc8-6eb5-46e1-891f-363624d5dae5
⛔ Files ignored due to path filters (1)
lightly_studio/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
CHANGELOG.mdlightly_studio/pyproject.tomllightly_studio/src/lightly_studio/core/image/image_dataset.pylightly_studio/src/lightly_studio/examples/example_semantic_segmentation_s3.pylightly_studio/tests/core/image/test_image_dataset__pascal_voc.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)
200-237: Consider extracting shared annotation assertions into a helper.The annotation checks are duplicated across tests; a small helper would reduce maintenance overhead and make future schema changes easier to update.
♻️ Optional refactor sketch
+def _assert_sample_annotations(sample, expected_file_path: str | None = None) -> None: + if expected_file_path is not None: + assert sample.file_path_abs == expected_file_path + annotations = sorted(sample.annotations, key=lambda ann: ann.label) + # shared assertions for bg/cat/dog ... + # in both tests: -annotations = sorted(samples[0].annotations, key=lambda ann: ann.label) -... many repeated asserts ... +_assert_sample_annotations(samples[0], expected_file_path=f"{images_path}/image1.jpg")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around lines 200 - 237, Extract the repeated assertion blocks into a small helper function (e.g., assert_segmentation_annotation) that accepts the annotation object (or its fields: label, x, y, width, height, segmentation_mask) and performs the isinstance check and all equality assertions; then replace the duplicated blocks in this test with calls like assert_segmentation_annotation(annotations[0], label="bg", x=0, y=0, width=4, height=3, segmentation_mask=[...]) and similarly for other annotations and samples[1].annotations[0]; ensure the helper references SemanticSegmentationAnnotation and uses the same assertion semantics so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py`:
- Around line 200-237: Extract the repeated assertion blocks into a small helper
function (e.g., assert_segmentation_annotation) that accepts the annotation
object (or its fields: label, x, y, width, height, segmentation_mask) and
performs the isinstance check and all equality assertions; then replace the
duplicated blocks in this test with calls like
assert_segmentation_annotation(annotations[0], label="bg", x=0, y=0, width=4,
height=3, segmentation_mask=[...]) and similarly for other annotations and
samples[1].annotations[0]; ensure the helper references
SemanticSegmentationAnnotation and uses the same assertion semantics so behavior
is unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0fc75f37-296b-412b-9c9c-b098d0129243
⛔ Files ignored due to path filters (1)
lightly_studio/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
lightly_studio/pyproject.tomllightly_studio/tests/core/image/test_image_dataset__pascal_voc.py
🚧 Files skipped from review as they are similar to previous changes (1)
- lightly_studio/pyproject.toml
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py (1)
200-237: Consider extracting shared annotation assertions into a helper.The annotation assertions (lines 200-237) duplicate those in the first test (lines 67-108). While test duplication is acceptable for readability, extracting a helper could reduce maintenance burden if assertion logic changes.
def _assert_expected_annotations(samples: list) -> None: """Verify semantic segmentation annotations for standard test masks.""" annotations = sorted(samples[0].annotations, key=lambda ann: ann.label) # ... shared assertion logicThis is optional since test independence is also valuable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py` around lines 200 - 237, Extract the repeated annotation assertion logic into a single helper function (e.g., _assert_expected_annotations) that takes the samples list and performs the shared checks currently duplicated in the block that inspects samples[0].annotations and samples[1].annotations; move the sorted(samples[0].annotations, key=lambda ann: ann.label) logic and all assertions about SemanticSegmentationAnnotation, .label, .x, .y, .width, .height, and .segmentation_mask into that helper and call it from both test locations to eliminate duplication while keeping test behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py`:
- Around line 200-237: Extract the repeated annotation assertion logic into a
single helper function (e.g., _assert_expected_annotations) that takes the
samples list and performs the shared checks currently duplicated in the block
that inspects samples[0].annotations and samples[1].annotations; move the
sorted(samples[0].annotations, key=lambda ann: ann.label) logic and all
assertions about SemanticSegmentationAnnotation, .label, .x, .y, .width,
.height, and .segmentation_mask into that helper and call it from both test
locations to eliminate duplication while keeping test behavior identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 569af699-563a-48fe-993b-dcdedc6b52fd
📒 Files selected for processing (1)
lightly_studio/tests/core/image/test_image_dataset__pascal_voc.py
|
/review |
MalteEbner
left a comment
There was a problem hiding this comment.
Code mostly looks good to me, but some details should be improved.
Will you update the docs in a follow-up PR?
Co-authored-by: MalteEbner <malte.ebner@gmail.com>
…gmentation' of https://github.com/lightly-ai/lightly-studio into leonardo-lig-8784-support-cloud-storage-for-semantic-segmentation
…semantic-segmentation
What has changed and why?
This PR enables Pascal VOC semantic segmentation imports from cloud/object storage URIs.
How has it been tested?
Unit tests
Did you update CHANGELOG.md?
Summary by CodeRabbit
New Features
Tests
Chores