chore(regenerate-readme): aggregate rows by product name by mosheabr · Pull Request #200 · NVIDIA/skills

mosheabr · 2026-05-31T18:35:52Z

Onboarding type

New product onboarding (new `components.d/<product>.yml` file)
Other (catalog change, README fix, infrastructure, etc.)

Other context

Today `regenerate-readme.sh` emits one README row per yml entry: one row from each `components.d/<product>.yml` plus one row from each entry in `manual-components.yml`. When two entries share the same display `name` (synced + manual under the same product), they render as two separate rows with the same product name.
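A minimal sketch of how such a collision could be spotted before rendering. This is an illustration in Python, not the script's actual code; the entry list and the `colliding_names` helper are hypothetical, and the comparison is case-insensitive to match the lowercase grouping described under "Aggregation logic".

```python
from collections import Counter

# Hypothetical (display name, origin) pairs. The real entries come from the
# components.d files and manual-components.yml.
entries = [
    ("Physical AI", "manual"),
    ("Physical AI", "synced"),
    ("cuOpt", "synced"),
]


def colliding_names(entries):
    """Return the lowercase names that appear in more than one entry."""
    counts = Counter(name.lower() for name, _origin in entries)
    return sorted(name for name, n in counts.items() if n > 1)


print(colliding_names(entries))
```

Any name returned here would have produced duplicate README rows under the one-row-per-entry behavior.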

This blocks the upcoming Physical AI Data Factory onboarding (#198), which adds a `components.d` entry named "Physical AI", the same name as the existing manual entry that covers the 5 `omniverse-*` / `physical-ai-*` skills from the internal Skill Hub. Per the product owner's direction, all those skills belong under one consolidated "Physical AI" row.

Aggregation logic

- Both loops now emit structured TSV (one column per cell, plus an `is_manual` flag) rather than pre-formatted markdown rows.
- An awk pass groups rows by lowercase name; for groups of 2+ entries:
  - catalog cells are concatenated with the existing ` · ` separator
  - skill counts are summed
  - synced row's metadata (description, source, version, link cells) wins over the manual row's em-dash defaults
- Single-entry rows pass through unchanged, so existing products with no name collision render byte-identically to today.
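The grouping rules above can be sketched roughly as follows. This is a Python illustration of the logic, not the awk pass itself. The column layout (`COLUMNS`), the sample paths, and the sample metadata values are all assumptions made for the example; the real TSV also carries version and link cells that are omitted here.

```python
EM_DASH = "\u2014"  # placeholder that manual rows use for missing metadata
SEP = " · "         # catalog separator named in the description

# Hypothetical column layout; the real script's TSV columns may differ.
COLUMNS = ["name", "catalog", "skills", "description", "source", "is_manual"]


def parse(line):
    """Turn one TSV line into a dict with typed skills / is_manual fields."""
    row = dict(zip(COLUMNS, line.split("\t")))
    row["skills"] = int(row["skills"])
    row["is_manual"] = row["is_manual"] == "1"
    return row


def aggregate(lines):
    """Group rows by lowercase name and merge groups of two or more."""
    groups = {}  # lowercase name -> rows, in first-seen order
    for line in lines:
        row = parse(line)
        groups.setdefault(row["name"].lower(), []).append(row)

    merged = []
    for rows in groups.values():
        if len(rows) == 1:
            merged.append(rows[0])  # single entry passes through unchanged
            continue
        synced = [r for r in rows if not r["is_manual"]]
        base = dict(synced[0] if synced else rows[0])  # synced metadata wins
        base["catalog"] = SEP.join(r["catalog"] for r in rows)
        base["skills"] = sum(r["skills"] for r in rows)
        merged.append(base)
    return merged


sample = [
    "\t".join(["Physical AI", "skills/manual-dir", "5", EM_DASH, EM_DASH, "1"]),
    "\t".join(["Physical AI", "skills/synced-dir", "2", "Synced description", "synced-repo", "0"]),
    "\t".join(["cuOpt", "skills/cuopt-dir", "12", "cuOpt description", "cuopt-repo", "0"]),
]

for row in aggregate(sample):
    print(row["name"], row["skills"], row["source"], sep=" | ")
```

With this sample, the two "Physical AI" entries collapse into one row with 7 skills and the synced source, while the lone cuOpt row is returned as parsed.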

Verified locally

- Current main state (only Physical AI in manual): renders one row, output identical to pre-aggregation behavior. (Local re-runs without `VERSIONS_FILE` present produce em-dash version cells, the same behavior as PR #156, "Manual catalog exception for Physical AI skills (temp until Computex)", but the sync workflow always populates `VERSIONS_FILE` first in production.)
- Simulated PR #198 ("Add physical-ai-data-factory component config") landing with `name: Physical AI` (5 manual + 2 synced): renders one consolidated row with 7 skills, synced source cell, real link cells from the synced repo (issues, discussions, contributing, security).

Sequencing

This should merge before PR #198. After this lands, PR #198 merges cleanly and the next sync emits a single "Physical AI" row with 7 skills.

All PRs

All commits signed off with DCO (`git commit -s`).

Today the script emits one README row per yml entry: one row from each components.d/<product>.yml plus one row from each entry in manual-components.yml. When two entries share the same display `name` (synced + manual under the same product), they render as two separate rows with the same product name.

This blocks the upcoming Physical AI Data Factory onboarding (PR #198), which adds a components.d entry named "Physical AI", the same name as the existing manual entry that covers the 5 omniverse-* / physical-ai-* skills from the internal Skill Hub.

Aggregation logic:
- Both loops now emit structured TSV (one column per cell, plus an is_manual flag) rather than pre-formatted markdown rows.
- An awk pass groups rows by lowercase name; for groups of 2+ entries:
  - catalog cells are concatenated with the existing " · " separator
  - skill counts are summed
  - synced row's metadata (description, source, version, link cells) wins over the manual row's em-dash defaults
- Single-entry rows pass through unchanged, so existing products with no name collision render byte-identically to today.

Verified locally:
- Current main state (only Physical AI in manual): renders one row, identical output to pre-aggregation behavior.
- Simulated PR #198 landing with `name: Physical AI` (5 manual + 2 synced): renders one consolidated row with 7 skills, synced source cell, real link cells from the synced repo (issues, discussions, contributing, security).

Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>

The Catalog column was a one-link pointer into skills/<dir>/ that duplicated information already conveyed by the Skills count column and the Source column. For single-skill products it just pointed at the lone skill dir; for multi-skill synced products (cuOpt 12, NeMo MBridge 20, NemoClaw 10, etc.) it pointed at an arbitrary primary catalog_dir while the rest of the skills were invisible at the table level.

That asymmetry surfaced as a visible bug after PR #200's aggregation work landed: aggregated rows showed "Physical AI | 7 skills" but the catalog cell only listed 6 of the 7 (manual side listed all 5, synced side listed only the primary).

Removing the column resolves the asymmetry and matches how customers actually navigate the catalog: Skills tells them how many, Source goes to the upstream repo, install commands are in the Quickstart section, and the skills/ directory is one click away from the header navigation.

Changes:
- Drop the Catalog column from both the table header and per-row output in regenerate-readme.sh
- Drop catalog_cell from the structured TSV emitted by both the synced and manual loops, and from the awk aggregation pass
- Regenerate README to reflect the column removal across all rows

Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>

mosheabr requested a review from sayalinvidia as a code owner May 31, 2026 18:35

mosheabr mentioned this pull request May 31, 2026

Add physical-ai-data-factory component config #198

Merged

12 tasks

mosheabr merged commit dadde58 into main May 31, 2026
3 checks passed

mosheabr deleted the chore/regenerate-readme-aggregate-by-name branch May 31, 2026 18:36

mosheabr mentioned this pull request May 31, 2026

chore: drop redundant Catalog column from README skills table #201

Merged

3 tasks

mosheabr merged 1 commit into main from chore/regenerate-readme-aggregate-by-name
