chore(regenerate-readme): aggregate rows by product name#200

Merged
mosheabr merged 1 commit into main from chore/regenerate-readme-aggregate-by-name
May 31, 2026

Conversation

@mosheabr
Collaborator

Onboarding type

  • New product onboarding (new `components.d/<product>.yml` file)
  • Other (catalog change, README fix, infrastructure, etc.)

Other context

Today `regenerate-readme.sh` emits one README row per yml entry: one row from each `components.d/<product>.yml` plus one row from each entry in `manual-components.yml`. When two entries share the same display `name` (synced + manual under the same product), they render as two separate rows with the same product name.

This blocks the upcoming Physical AI Data Factory onboarding (#198), which adds a `components.d` entry named "Physical AI", the same name as the existing manual entry that covers the 5 `omniverse-*` / `physical-ai-*` skills from the internal Skill Hub. Per the product owner's direction, all those skills belong under one consolidated "Physical AI" row.

Aggregation logic

  • Both loops now emit structured TSV (one column per cell, plus an `is_manual` flag) rather than pre-formatted markdown rows.
  • An awk pass groups rows by lowercase name; for groups of 2+ entries:
    • catalog cells are concatenated with the existing ` · ` separator
    • skill counts are summed
    • synced row's metadata (description, source, version, link cells) wins over the manual row's em-dash defaults
  • Single-entry rows pass through unchanged, so existing products with no name collision render byte-identically to today.
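
The grouping rules above can be sketched in a few lines. This is a Python illustration only, not the awk pass in `regenerate-readme.sh`; the column order in `FIELDS`, the `aggregate` helper, the `"0"`/`"1"` encoding of `is_manual`, and every value in `sample` other than the product names and skill counts are assumptions made for the example.

```python
import csv
import io

SEP = " · "      # separator the PR describes for concatenated catalog cells
DASH = "\u2014"  # em-dash default used by manual rows

# Hypothetical column layout: one column per cell plus the is_manual flag.
FIELDS = ["name", "catalog", "skills", "description",
          "source", "version", "links", "is_manual"]


def aggregate(tsv_text):
    """Group TSV rows by lowercase name and merge groups of two or more."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS,
                            delimiter="\t")
    groups = {}  # lowercase name -> list of rows, in first-seen order
    for row in reader:
        groups.setdefault(row["name"].lower(), []).append(row)

    merged = []
    for group in groups.values():
        if len(group) == 1:
            merged.append(dict(group[0]))  # single entry passes through as-is
            continue
        # The synced row's metadata wins; fall back to the first row if
        # the group has no synced entry.
        base = next((r for r in group if r["is_manual"] == "0"), group[0])
        out = dict(base)
        out["catalog"] = SEP.join(r["catalog"] for r in group)
        out["skills"] = str(sum(int(r["skills"]) for r in group))
        merged.append(out)
    return merged


# Made-up rows: directory names, descriptions, and versions are placeholders.
sample = "\n".join([
    "\t".join(["Physical AI", "skills/omniverse-example", "5",
               DASH, DASH, DASH, DASH, "1"]),
    "\t".join(["Physical AI", "skills/physical-ai-data-factory", "2",
               "Synced description", "upstream repo", "1.0.0", "issues", "0"]),
    "\t".join(["cuOpt", "skills/cuopt", "12",
               "Optimization skills", "upstream repo", "2.0.0", "issues", "0"]),
])

result = aggregate(sample)
for row in result:
    print(row["name"], row["skills"], row["source"], sep="\t")
```

With this input the two "Physical AI" rows collapse into one row carrying the summed skill count and the synced row's metadata, while the single cuOpt row is returned untouched.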

Verified locally

  • Current main state (only Physical AI in manual): renders one row, output identical to pre-aggregation behavior. (Local re-runs without `VERSIONS_FILE` present produce em-dash version cells, the same behavior as in PR #156, "Manual catalog exception for Physical AI skills (temp until Computex)", but the sync workflow always populates `VERSIONS_FILE` first in production.)
  • Simulated PR #198 ("Add physical-ai-data-factory component config") landing with `name: Physical AI` (5 manual + 2 synced): renders one consolidated row with 7 skills, synced source cell, real link cells from the synced repo (issues, discussions, contributing, security).

Sequencing

This should merge before PR #198. After this lands, PR #198 merges cleanly and the next sync emits a single "Physical AI" row with 7 skills.

All PRs

  • All commits signed off with DCO (`git commit -s`).

Today the script emits one README row per yml entry — one row from each
components.d/<product>.yml plus one row from each entry in
manual-components.yml. When two entries share the same display `name`
(synced + manual under the same product), they render as two separate
rows with the same product name.

This blocks the upcoming Physical AI Data Factory onboarding (PR #198),
which adds a components.d entry named "Physical AI" — same name as the
existing manual entry that covers the 5 omniverse-* / physical-ai-*
skills from the internal Skill Hub.

Aggregation logic:
- Both loops now emit structured TSV (one column per cell, plus an
  is_manual flag) rather than pre-formatted markdown rows.
- An awk pass groups rows by lowercase name; for groups of 2+ entries:
  - catalog cells are concatenated with the existing " · " separator
  - skill counts are summed
  - synced row's metadata (description, source, version, link cells)
    wins over the manual row's em-dash defaults
- Single-entry rows pass through unchanged, so existing products with
  no name collision render byte-identically to today.

Verified locally:
- Current main state (only Physical AI in manual): renders one row,
  identical output to pre-aggregation behavior.
- Simulated PR #198 landing with `name: Physical AI` (5 manual +
  2 synced): renders one consolidated row with 7 skills, synced source
  cell, real link cells from the synced repo (issues, discussions,
  contributing, security).

Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
@mosheabr mosheabr requested a review from sayalinvidia as a code owner May 31, 2026 18:35
@mosheabr mosheabr merged commit dadde58 into main May 31, 2026
3 checks passed
@mosheabr mosheabr deleted the chore/regenerate-readme-aggregate-by-name branch May 31, 2026 18:36
mosheabr added a commit that referenced this pull request May 31, 2026
The Catalog column was a one-link pointer into skills/<dir>/ that
duplicated information already conveyed by the Skills count column
and the Source column. For single-skill products it just pointed at
the lone skill dir; for multi-skill synced products (cuOpt 12, NeMo
MBridge 20, NemoClaw 10, etc.) it pointed at an arbitrary primary
catalog_dir while the rest of the skills were invisible at the table
level. That asymmetry surfaced as a visible bug after PR #200's
aggregation work landed: aggregated rows showed "Physical AI | 7
skills" but the catalog cell only listed 6 of the 7 (manual side
listed all 5, synced side listed only the primary).
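
The 6-of-7 mismatch described above follows directly from the two sides listing their directories differently. A minimal Python sketch of the arithmetic, assuming placeholder directory names (the real skill directories are not given here):

```python
SEP = " · "

# Placeholder directory names, for illustration only.
manual_dirs = [f"skills/manual-skill-{i}" for i in range(1, 6)]      # 5 manual
synced_dirs = ["skills/synced-primary", "skills/synced-secondary"]   # 2 synced

manual_cell = SEP.join(manual_dirs)  # manual side lists every skill dir
synced_cell = synced_dirs[0]         # synced side lists only the primary dir

catalog_cell = SEP.join([manual_cell, synced_cell])
skill_count = len(manual_dirs) + len(synced_dirs)
listed = len(catalog_cell.split(SEP))

print(f"skills column: {skill_count}, catalog links: {listed}")
```

The summed count covers all skills while the concatenated catalog cell omits every non-primary synced directory, which is the asymmetry that removing the column avoids.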

Removing the column resolves the asymmetry and matches how customers
actually navigate the catalog — Skills tells them how many, Source
goes to the upstream repo, install commands are in the Quickstart
section, and the skills/ directory is one click away from the
header navigation.

Changes:
- Drop the Catalog column from both the table header and per-row
  output in regenerate-readme.sh
- Drop catalog_cell from the structured TSV emitted by both the
  synced and manual loops, and from the awk aggregation pass
- Regenerate README to reflect the column removal across all rows

Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>