Success only filter for lerobot dataset conversion #22
Conversation
Pull request overview
Adds options and logic to support filtering to successful episodes during LeRobot v2.1 conversion, while maintaining LeRobot’s requirement for contiguous episode_index/task_index and enabling safe re-runs into an existing output directory.
Changes:
- Add `success_only` filtering during conversion and remap episode/task indices to contiguous ranges in the output.
- Add `overwrite` support to clear prior LeRobot outputs safely when re-running conversion.
- Adjust train split computation to avoid empty train splits on very small (filtered) datasets.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/test_lerobot_v21.py` | Updates existing conversion fixture to use overwrite, and adds a new test covering success_only conversion output. |
| `src/openarm_dataset/lerobot_v21.py` | Implements success_only filtering, contiguous episode/task remapping, overwrite clearing logic, and updates split computation. |
| `src/openarm_dataset/convert.py` | Exposes `--success-only` and `--overwrite` CLI flags and passes them through to `Dataset.write(...)` for lerobot_v2.1. |
OK, will separate the PR.

Remove
```diff
 num_episodes = len(records)
 total_chunks = max((num_episodes - 1) // CHUNK_SIZE + 1, 0) if num_episodes else 0
-train_end = int(num_episodes * train_split)
+train_end = round(num_episodes * train_split)
```
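The switch from `int` to `round` matters for small filtered datasets: truncation can leave the train split empty. A minimal sketch (helper names are illustrative, not from the PR):

```python
def train_end_truncated(num_episodes: int, train_split: float) -> int:
    # Before the patch: truncation toward zero.
    return int(num_episodes * train_split)

def train_end_rounded(num_episodes: int, train_split: float) -> int:
    # After the patch: rounding keeps at least one episode in train
    # once num_episodes * train_split reaches 0.5.
    return round(num_episodes * train_split)

# With a single surviving episode and a 0.9 split:
print(train_end_truncated(1, 0.9))  # 0 -> empty train split
print(train_end_rounded(1, 0.9))    # 1 -> the episode lands in train
```

With `success_only` filtering, a dataset can easily shrink to one or two episodes, which is exactly where the truncated version produced an empty train split.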
```python
def _build_remaps(dataset: Dataset, records):
    """Build remapping dicts from original episode/task indices to contiguous indices.

    When records is a filtered subset of episodes (e.g., success_only=True),
    original indices may be sparse. LeRobot v2.1 expects episode/task indices
    to be contiguous starting from 0. When records contains all episodes the
    returned maps are the identity.
    """
    remap_episode_index = {original: new for new, (original, *_) in enumerate(records)}
    seen = set()
    used_task_indices = []
    for original_episode_index, *_ in records:
        original_task_index = int(
            dataset.meta.episodes[original_episode_index]["task_index"]
        )
        if original_task_index not in seen:
            seen.add(original_task_index)
            used_task_indices.append(original_task_index)
    used_task_indices.sort()
    remap_task_index = {original: new for new, original in enumerate(used_task_indices)}
    return remap_episode_index, remap_task_index
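To illustrate the remapping on concrete data, here is a self-contained sketch with hypothetical in-memory metadata standing in for `dataset.meta.episodes` (the logic mirrors the function above, but the episode dicts and record tuples are invented for the example):

```python
# Hypothetical metadata: four episodes, tasks 0 and 2.
episodes = {
    0: {"task_index": 0},
    1: {"task_index": 2},
    2: {"task_index": 0},
    3: {"task_index": 2},
}
# Suppose success_only filtering kept only episodes 1 and 3 (sparse indices).
records = [(1,), (3,)]

# Episode indices become contiguous in record order.
remap_episode_index = {orig: new for new, (orig, *_) in enumerate(records)}

# Task indices: collect the tasks still referenced, sort, renumber from 0.
seen, used_task_indices = set(), []
for orig, *_ in records:
    task = episodes[orig]["task_index"]
    if task not in seen:
        seen.add(task)
        used_task_indices.append(task)
used_task_indices.sort()
remap_task_index = {orig: new for new, orig in enumerate(used_task_indices)}

print(remap_episode_index)  # {1: 0, 3: 1}
print(remap_task_index)     # {2: 0}
```

When `records` contains every episode, both dicts reduce to identity maps, so the unfiltered path is unchanged.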
```python
parser.add_argument(
    "--success-only",
    help="Include only successful episodes in the output dataset (default: False) if the output format is lerobot_v2.1",
    action="store_true",
    default=False,
)
```
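For completeness, a runnable sketch of how this `store_true` flag behaves in isolation (a standalone parser, not the PR's actual `convert.py` parser, which also wires the value through to `Dataset.write(...)`):

```python
import argparse

# Minimal stand-in parser demonstrating the flag semantics.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--success-only",
    help="Include only successful episodes in the output dataset "
    "(default: False) if the output format is lerobot_v2.1",
    action="store_true",
    default=False,
)

args = parser.parse_args(["--success-only"])
print(args.success_only)  # True; omitted -> False
```

Note that argparse converts the dashed flag name to the attribute `success_only`.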
kou left a comment
It may be better to create a class instead of using module-private functions, to reduce the number of arguments passed to those functions.
Can we work on it as a separate task?
```python
):
    records = []
    for episode_index in range(dataset.meta.num_episodes):
        success = bool(dataset.meta.episodes[episode_index]["success"])
```
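A self-contained sketch of the filtering idea in this loop, using a hypothetical in-memory episode list in place of `dataset.meta.episodes` (the real loop lives in `_collect_downsampled_data` and also gathers the downsampled frames per episode):

```python
# Hypothetical episode metadata: middle episode failed.
meta_episodes = [
    {"success": True},
    {"success": False},
    {"success": True},
]

def collect_records(success_only: bool):
    records = []
    for episode_index in range(len(meta_episodes)):
        success = bool(meta_episodes[episode_index]["success"])
        if success_only and not success:
            # Skip failed episodes when filtering is enabled.
            continue
        records.append((episode_index,))
    return records

print(collect_records(success_only=False))  # [(0,), (1,), (2,)]
print(collect_records(success_only=True))   # [(0,), (2,)]
```

The surviving original indices (0 and 2 here) are sparse, which is why the conversion then remaps them to a contiguous range.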
```diff
 # collect downsampled data for each episode
-records = _collect_downsampled_data(dataset, fps, joint_keys)
+records = _collect_downsampled_data(dataset, fps, joint_keys, success_only)
```
Should we return an error when there are 0 records?
Is empty LeRobotDataset v2.1 invalid?
Summary
Because LeRobot v2.1 expects contiguous `episode_index`/`task_index` values starting from 0, the conversion now remaps original sparse indices to a contiguous range whenever episodes are filtered out (and falls back to the identity map when nothing is filtered).