Skip to content

Success only filter for lerobot dataset conversion.#22

Open
k1000dai wants to merge 11 commits into
enactic:mainfrom
k1000dai:success_only
Open

Success only filter for lerobot dataset conversion.#22
k1000dai wants to merge 11 commits into
enactic:mainfrom
k1000dai:success_only

Conversation

@k1000dai
Copy link
Copy Markdown
Contributor

@k1000dai k1000dai commented May 11, 2026

Summary

  • Adds --success-only to filter out failed episodes when converting to lerobot_v2.1. Because LeRobot v2.1 requires contiguous episode_index and task_index
    values starting from 0, the conversion now remaps original sparse indices to a contiguous range whenever episodes are filtered out (and falls back to the identity map when nothing is filtered).
  • Switches train_end = int(num_episodes * train_split) to round(...) .

Copilot AI review requested due to automatic review settings May 11, 2026 10:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds options and logic to support filtering to successful episodes during LeRobot v2.1 conversion, while maintaining LeRobot’s requirement for contiguous episode_index/task_index and enabling safe re-runs into an existing output directory.

Changes:

  • Add success_only filtering during conversion and remap episode/task indices to contiguous ranges in the output.
  • Add overwrite support to clear prior LeRobot outputs safely when re-running conversion.
  • Adjust train split computation to avoid empty train splits on very small (filtered) datasets.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
tests/test_lerobot_v21.py Updates existing conversion fixture to use overwrite, and adds a new test covering success_only conversion output.
src/openarm_dataset/lerobot_v21.py Implements success_only filtering, contiguous episode/task remapping, overwrite clearing logic, and updates split computation.
src/openarm_dataset/convert.py Exposes --success-only and --overwrite CLI flags and passes them through to Dataset.write(...) for lerobot_v2.1.

Comment thread src/openarm_dataset/lerobot_v21.py Outdated
Comment thread src/openarm_dataset/lerobot_v21.py Outdated
Comment thread src/openarm_dataset/lerobot_v21.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/openarm_dataset/lerobot_v21.py
Comment thread src/openarm_dataset/lerobot_v21.py Outdated
@abetomo
Copy link
Copy Markdown
Contributor

abetomo commented May 11, 2026

--overwrite feels out of scope for this PR, which is focused on the success-only filter.
Would you mind splitting it into a separate PR?

@k1000dai
Copy link
Copy Markdown
Contributor Author

OK, will separate the PR.

@k1000dai k1000dai changed the title Success only filter for lerobot dataset conversion.l Success only filter for lerobot dataset conversion. May 11, 2026
Copilot AI review requested due to automatic review settings May 11, 2026 10:52
@k1000dai
Copy link
Copy Markdown
Contributor Author

Remove --overwrite related code.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread src/openarm_dataset/convert.py
Comment thread src/openarm_dataset/convert.py Outdated
Copilot AI review requested due to automatic review settings May 12, 2026 00:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

num_episodes = len(records)
total_chunks = max((num_episodes - 1) // CHUNK_SIZE + 1, 0) if num_episodes else 0
train_end = int(num_episodes * train_split)
train_end = round(num_episodes * train_split)
Comment on lines +115 to +135
def _build_remaps(dataset: Dataset, records):
"""Build remapping dicts from original episode/task indices to contiguous indices.

When records is a filtered subset of episodes (e.g., success_only=True),
original indices may be sparse. LeRobot v2.1 expects episode/task indices
to be contiguous starting from 0. When records contains all episodes the
returned maps are the identity.
"""
remap_episode_index = {original: new for new, (original, *_) in enumerate(records)}
seen = set()
used_task_indices = []
for original_episode_index, *_ in records:
original_task_index = int(
dataset.meta.episodes[original_episode_index]["task_index"]
)
if original_task_index not in seen:
seen.add(original_task_index)
used_task_indices.append(original_task_index)
used_task_indices.sort()
remap_task_index = {original: new for new, original in enumerate(used_task_indices)}
return remap_episode_index, remap_task_index
Comment on lines +59 to +64
parser.add_argument(
"--success-only",
help="Include only successful episodes in the output dataset (default: False) if the output format is lerobot_v2.1",
action="store_true",
default=False,
)
Copy link
Copy Markdown
Contributor

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better that we create a class instead of using module private functions to reduce arguments for module private functions.
Can we work on it as a separated task?

):
records = []
for episode_index in range(dataset.meta.num_episodes):
success = bool(dataset.meta.episodes[episode_index]["success"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need bool() here?

seen = set()
used_task_indices = []
for original_episode_index, *_ in records:
original_task_index = int(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need int() here?

Copy link
Copy Markdown
Contributor

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


# collect downsampled data for each episode
records = _collect_downsampled_data(dataset, fps, joint_keys)
records = _collect_downsampled_data(dataset, fps, joint_keys, success_only)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return an error when there are 0 records?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is empty LeRobotDataset v2.1 invalid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants