Deduplicate TheRock CI #2771

Merged
jayhawk-commits merged 37 commits into main from users/jayhawk-commits/deduplicate-ci on Jan 15, 2026

@jayhawk-commits
Contributor

@jayhawk-commits jayhawk-commits commented Jan 5, 2026

CI Deduplication: Centralize configuration and simplify workflows

Summary

This PR implements a comprehensive CI deduplication initiative across TheRock, rocm-libraries, and rocm-systems repositories.

Key achievements:

  1. Single source of truth - GPU family matrix and CI logic centralized in TheRock, eliminating duplication
  2. Significant code reduction - Enables deletion of 11 duplicate CI files across rocm-libraries and rocm-systems (6 + 5 files)
  3. Unified workflows - External repos use TheRock's workflows on both Linux and Windows with automatic configuration

This PR enables:

Result: One place to maintain CI configuration instead of three.


Architecture: Before vs After

Before: Duplicated Configuration ❌

graph TB
    subgraph TheRock_Repo [TheRock Repository]
        TR_Matrix[amdgpu_family_matrix.py]
        TR_ConfigCI[configure_ci.py]
        TR_Workflows[CI Workflows]
    end
    
    subgraph RocmLibs [rocm-libraries Repository - DUPLICATES]
        RL_Matrix["therock_matrix.py<br/>⚠️ DUPLICATE"]
        RL_ConfigCI["therock_configure_ci.py<br/>⚠️ DUPLICATE"]
        RL_Workflows["therock-ci-*.yml<br/>⚠️ DUPLICATE"]
    end
    
    subgraph RocmSystems [rocm-systems Repository - DUPLICATES]
        RS_Matrix["therock_matrix.py<br/>⚠️ DUPLICATE"]
        RS_ConfigCI["therock_configure_ci.py<br/>⚠️ DUPLICATE"]
        RS_Workflows["therock-ci-*.yml<br/>⚠️ DUPLICATE"]
    end
    
    TR_Matrix -.manual copies.-> RL_Matrix
    TR_Matrix -.manual copies.-> RS_Matrix
    TR_ConfigCI -.manual copies.-> RL_ConfigCI
    TR_ConfigCI -.manual copies.-> RS_ConfigCI
    TR_Workflows -.manual copies.-> RL_Workflows
    TR_Workflows -.manual copies.-> RS_Workflows

After: Centralized Configuration ✅

graph TB
    subgraph TheRock_Repo ["TheRock Repository ✅ SINGLE SOURCE OF TRUTH"]
        TR_Matrix[amdgpu_family_matrix.py]
        TR_ConfigCI["configure_ci.py<br/>with external repo detection"]
        TR_Build_Linux["build_portable_linux_artifacts.yml"]
        TR_Build_Windows["build_windows_artifacts.yml"]
        TR_Test["test_component.yml<br/>test_artifacts.yml"]
    end
    
    subgraph RocmLibs [rocm-libraries Repository]
        RL_CI["therock-ci.yml<br/>simple caller"]
        RL_Nightly["therock-ci-nightly.yml<br/>simple caller"]
    end
    
    subgraph RocmSystems [rocm-systems Repository]
        RS_CI["therock-ci.yml<br/>simple caller"]
    end
    
    RL_CI -->|external_source_checkout=true| TR_ConfigCI
    RL_Nightly -->|external_source_checkout=true| TR_ConfigCI
    RS_CI -->|external_source_checkout=true| TR_ConfigCI
    
    TR_ConfigCI -->|detects rocm-libraries| TR_Build_Linux
    TR_ConfigCI -->|detects rocm-libraries| TR_Build_Windows
    TR_ConfigCI -->|detects rocm-systems| TR_Build_Linux
    TR_ConfigCI -->|detects rocm-systems| TR_Build_Windows
    
    TR_Build_Linux --> TR_Test
    TR_Build_Windows --> TR_Test

File Deletion Mapping

rocm-libraries - 6 files deleted

| Deleted File | Replaced By (TheRock) | How It Works |
|---|---|---|
| .github/scripts/therock_matrix.py | build_tools/github_actions/amdgpu_family_matrix.py | GPU families defined once, used by all |
| .github/scripts/therock_configure_ci.py | build_tools/github_actions/configure_ci.py | Auto-detects repo, generates matrix |
| .github/workflows/therock-ci-linux.yml | .github/workflows/build_portable_linux_artifacts.yml | Detects rocm-libraries, checks out TheRock+CK |
| .github/workflows/therock-ci-windows.yml | .github/workflows/build_windows_artifacts.yml | Detects rocm-libraries, checks out TheRock+CK |
| .github/workflows/therock-test-component.yml | .github/workflows/test_component.yml | Works for all repos via workflow_call |
| .github/workflows/therock-test-packages.yml | .github/workflows/test_artifacts.yml | Works for all repos via workflow_call |

rocm-systems - 5 files deleted

| Deleted File | Replaced By (TheRock) | How It Works |
|---|---|---|
| .github/scripts/therock_matrix.py | build_tools/github_actions/amdgpu_family_matrix.py | GPU families defined once, used by all |
| .github/scripts/therock_configure_ci.py | build_tools/github_actions/configure_ci.py | Auto-detects repo, generates matrix |
| .github/workflows/therock-ci-linux.yml | .github/workflows/build_portable_linux_artifacts.yml | Detects rocm-systems, checks out TheRock |
| .github/workflows/therock-ci-windows.yml | .github/workflows/build_windows_artifacts.yml | Detects rocm-systems, checks out TheRock |
| .github/workflows/therock-test-packages.yml | .github/workflows/test_artifacts.yml | Works for all repos via workflow_call |

Key Changes

New External Repo Infrastructure

.github/workflows/test_external_repo_integration.yml (NEW)

  • Tests external repository CI integration by overriding the github.repository value
  • Automatically runs on PRs that modify CI infrastructure files
  • Validates both rocm-libraries and rocm-systems scenarios from TheRock

build_tools/github_actions/detect_external_repo_config.py (NEW)

  • Detects external repository configuration for TheRock CI workflows
  • Determines build configuration settings (patches, DVC, composable_kernel, CMake vars)
  • Outputs GitHub Actions variables that control checkout steps and build options
  • Clean API with pre-flattened platform-specific configuration
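
To make the "pre-flattened platform-specific configuration" idea concrete, here is a minimal sketch. It is not the contents of detect_external_repo_config.py: `REPO_CONFIGS`, its keys, and every value in it are hypothetical placeholders. The one thing taken as given is the GitHub Actions convention that step outputs are written by appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable.

```python
import os

# Hypothetical per-repo configuration. A value is either plain or a dict
# keyed by platform; none of these names or values come from the real script.
REPO_CONFIGS = {
    "rocm-libraries": {
        "patches_dir": "rocm-libraries",
        "cmake_source_var": "EXAMPLE_LIBRARIES_SOURCE_DIR",
        "dvc_pull": {"linux": True, "windows": False},
    },
}


def flatten_for_platform(config, platform):
    """Resolve platform-keyed values so callers never see the platform."""
    flat = {}
    for key, value in config.items():
        flat[key] = value[platform] if isinstance(value, dict) else value
    return flat


def format_outputs(flat_config):
    """Render a flat config as name=value lines (booleans lowercased)."""
    lines = []
    for key, value in sorted(flat_config.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return lines


def write_github_outputs(lines, output_path=None):
    """Append lines to $GITHUB_OUTPUT, or print them when it is unset."""
    output_path = output_path or os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        print("\n".join(lines))
        return
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Flattening first means the output function only ever handles scalars, which is the separation of concerns the bullet describes.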

build_tools/github_actions/external_repo_project_maps.py (NEW)

  • Project mapping configurations for external repositories
  • Defines how file changes map to build configurations (which projects, CMake options, tests)
  • Based on configurations originally in rocm-libraries and rocm-systems
  • collect_projects_to_run() function with clean 3-parameter API (subtrees, platform, repo_name)
  • Unit tests verify that referenced paths actually exist in external repos
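
As an illustration of the 3-parameter shape described above, the sketch below shows one way `collect_projects_to_run(subtrees, platform, repo_name)` could look the repo configuration up internally through `get_repo_config()`. The map contents (`PROJECT_MAPS`, the `prim`/`blas` project names, the CMake option strings, the platform exclusions) are invented for the example and do not reflect the real mappings.

```python
# Hypothetical project maps; the real ones live in external_repo_project_maps.py.
PROJECT_MAPS = {
    "rocm-libraries": {
        "subtree_to_project": {
            "projects/rocprim": "prim",
            "projects/hipcub": "prim",
            "projects/rocblas": "blas",
        },
        "project_options": {
            "prim": {"cmake_options": "-DEXAMPLE_ENABLE_PRIM=ON"},
            "blas": {"cmake_options": "-DEXAMPLE_ENABLE_BLAS=ON"},
        },
        # Projects skipped on a given platform (illustrative only).
        "platform_excludes": {"windows": {"blas"}},
    },
}


def get_repo_config(repo_name):
    """Return the project map for a known external repo."""
    try:
        return PROJECT_MAPS[repo_name]
    except KeyError:
        raise ValueError(f"Unknown external repository: {repo_name}") from None


def collect_projects_to_run(subtrees, platform, repo_name):
    """Map changed subtrees to a de-duplicated list of project configs."""
    config = get_repo_config(repo_name)
    excluded = config["platform_excludes"].get(platform, set())
    selected = []
    for subtree in subtrees:
        project = config["subtree_to_project"].get(subtree)
        if project is None or project in excluded or project in selected:
            continue
        selected.append(project)
    return [
        {"project": name, **config["project_options"][name]}
        for name in selected
    ]
```

Because the caller passes only a repo name, it no longer has to unpack the configuration into separate dictionaries before calling.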

build_tools/github_actions/configure_ci.py (ENHANCED)

  • Auto-detects external repos (rocm-libraries, rocm-systems) from working directory or override
  • Generates build matrices with external project cross-products
  • Refactored main() from 114 → 45 lines with helper functions:
    • _extract_event_flags() - Extract and log event type flags
    • _generate_base_matrices() - Generate GPU family matrices
    • _apply_external_project_cross_product() - Cross-product projects × GPU families
  • _add_gpu_families_from_input() helper eliminates duplicate GPU family parsing
  • Early exit when no projects detected for external repos
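
The cross-product and early-exit behavior can be sketched roughly as follows. This is an assumption-laden illustration, not the body of `_apply_external_project_cross_product()`: the row keys (`family`, `project`, `extra_cmake_options`) and the function name without the leading underscore are hypothetical.

```python
from itertools import product


def apply_external_project_cross_product(gpu_family_rows, project_configs):
    """Return one matrix row per (project, GPU family) pair.

    An empty project list yields an empty matrix, which models the
    early exit when no projects are detected for an external repo.
    """
    if not project_configs:
        return []
    rows = []
    for project, family_row in product(project_configs, gpu_family_rows):
        row = dict(family_row)  # copy so the base matrix is not mutated
        row["project"] = project["project"]
        row["extra_cmake_options"] = project.get("cmake_options", "")
        rows.append(row)
    return rows
```

With two projects and two GPU families this produces four rows, so the matrix grows multiplicatively; that is the growth the matrix size logging mentioned later in this PR is meant to surface.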

Workflow Files (10 files updated)

  • All workflows enhanced with external_source_checkout, therock_ref, repository_override, projects parameters
  • Dual checkout pattern: external repo to source-repo/, TheRock to tr/
  • Dynamic THEROCK_DIR environment variable (tr for external, . for TheRock)
  • Platform-specific DVC pull via detect_external_repo_config.py
  • Patch application from patches/amd-mainline/{repo}/
  • Job names include platform and GPU family for external repos
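
The dual checkout layout and the dynamic THEROCK_DIR can be summarized in a small sketch. The directory names (`source-repo`, `tr`, `.`) and the `patches/amd-mainline/{repo}` path come from the bullets above; the function names and dictionary keys are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_checkout_layout(external_source_checkout):
    """Return where the source repo and TheRock are checked out."""
    if external_source_checkout:
        # External repo goes to source-repo/, TheRock to tr/.
        return {"source_dir": "source-repo", "therock_dir": "tr"}
    # TheRock building itself: everything lives in the workspace root.
    return {"source_dir": ".", "therock_dir": "."}


def patches_dir(layout, repo_name):
    """Locate the patch directory for a repo relative to the workspace."""
    base = PurePosixPath(layout["therock_dir"])
    return str(base / "patches" / "amd-mainline" / repo_name)
```

Resolving the layout once and reusing it keeps every later step free of its own external-versus-internal branching.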

Unit Tests (3 new test files)

  • test_configure_ci_external.py - External repo detection and project exit logic
  • test_detect_external_repo_config.py - Repository detection and configuration (9 tests)
  • test_external_repo_project_maps.py - Project mappings with path validation (13 tests)

Testing

Verified CI Runs

All scenarios have been tested and verified:

  1. TheRock CI (internal): Run #10715

    • Validates TheRock's own CI still works correctly
  2. External repo integration test (TheRock): Run #50

    • Tests external repo scenarios from TheRock's perspective
    • Uses TheRock's internal S3 bucket (therock-ci role)
    • Validates the workflow_call mechanism
  3. TheRock CI from rocm-libraries: Run #18050

    • Actual workflow_dispatch from rocm-libraries repo
    • Uses external S3 permissions (therock-ci-external role)
    • Demonstrates real-world usage
  4. TheRock CI from rocm-systems: Run #14327

    • Actual workflow_dispatch from rocm-systems repo
    • Uses external S3 permissions (therock-ci-external role)
    • Demonstrates real-world usage

Key difference: The external integration test (#50) and the external repo triggers (#17546, #13905) differ only in S3 permissions. The integration test uses TheRock's internal role (therock-ci) because it runs from TheRock, while real external repo triggers use the external role (therock-ci-external). All other behavior is identical, so the integration test accurately validates external repo functionality.


Related PRs

⚠️ Merge order: This PR → ROCm/rocm-libraries#3629, ROCm/rocm-systems#2495

jayhawk-commits and others added 12 commits January 5, 2026 15:45
- Introduced logic to set the TheRock directory based on external source checkout input.
- Added auto-detection of external repository configurations for rocm-libraries and rocm-systems.
- Implemented conditional checkouts for TheRock and composable_kernel repositories.
- Enhanced the fetch_sources.py script to handle external source exclusions.
- Added a step to apply patches from TheRock to the external repository if applicable.
- Updated paths in various steps to utilize the dynamically set TheRock directory.
- Added a step to set the TheRock directory based on the external source checkout input.
- Updated various steps to utilize the dynamically set TheRock directory for Python dependencies, script execution, and patch application.
- Simplified conditional logic for script paths to enhance maintainability.
- Improved the logic for setting the TheRock directory across both Linux and Windows build workflows.
- Streamlined the integration of external source configurations and patching processes.
- Updated scripts to ensure consistent usage of the dynamically set TheRock directory for dependency management and execution.
- Moved the test execution step for build_tools to a consistent position in both Linux and Windows build workflows.
- Ensured that the test step is conditionally executed based on the external source checkout input.
- Improved maintainability by standardizing the test execution logic across different platforms.
Contributor

@geomin12 geomin12 left a comment

looks great overall, thanks for this! some cleanup comments but no major concerns for how it is set up!

I remember the original concern for not combining was that updating in TheRock CI to update in rocm-libs/rocm-systems was really annoying (as it was being brought up), but now it's fairly stable and can be combined!

Comment thread .github/workflows/build_portable_linux_artifacts.yml Outdated
Comment thread .github/workflows/build_portable_linux_artifacts.yml Outdated
Comment thread .github/workflows/build_portable_linux_artifacts.yml Outdated
Comment thread .github/workflows/build_portable_linux_artifacts.yml Outdated
Comment thread .github/workflows/build_portable_linux_artifacts.yml Outdated
Comment thread build_tools/github_actions/configure_ci.py Outdated
Comment thread build_tools/github_actions/detect_external_projects.py Outdated
Comment thread build_tools/github_actions/external_repo_project_maps.py Outdated
Comment thread build_tools/github_actions/external_repo_project_maps.py Outdated
Comment thread build_tools/github_actions/external_repo_project_maps.py
jayhawk-commits and others added 3 commits January 8, 2026 17:32
This commit addresses all review feedback from PR #2771, improving code
quality, maintainability, and fixing a critical concurrency bug.

Review Comments Addressed:
1. Use env conditional for THEROCK_DIR instead of bash script
2. Move repo detection logic to Python script with full test coverage
3. Python script generates complete CMake options (cleaner YAML)
4. Add Windows composable_kernel support to miopen and hipdnn
5. Remove redundant ENABLE_ALL=OFF flags (auto-added by code)
6. Move inline comments above code blocks

Bug Fixes:
- Fix concurrency deadlock when multiple external repos call TheRock CI
  with same PR numbers by including repository name in concurrency group
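
To illustrate why the repository name matters here, the sketch below builds a concurrency group key with and without it. The key format is an assumption; the actual group expression in ci.yml is not shown in this PR description.

```python
def concurrency_group(workflow, pr_number, repository=None):
    """Build a concurrency group key (hypothetical format)."""
    parts = [workflow]
    if repository is not None:
        parts.append(repository)
    parts.append(str(pr_number))
    return "-".join(parts)


# Two different repos that happen to share a PR number.
old_libs = concurrency_group("ci", 123)
old_sys = concurrency_group("ci", 123)
new_libs = concurrency_group("ci", 123, "ROCm/rocm-libraries")
new_sys = concurrency_group("ci", 123, "ROCm/rocm-systems")
```

Without the repository, both runs land in the same group and contend with each other; with it, the keys differ and the runs are independent.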

Changes:
- build_portable_linux_artifacts.yml: Simplify THEROCK_DIR, move TheRock
  checkout before auto-detect, call Python script for repo detection
- build_windows_artifacts.yml: Same improvements as Linux workflow
- ci.yml: Add github.repository to concurrency group to prevent deadlocks
- configure_ci.py: Move comment above if statement per code style
- detect_external_repo_config.py: NEW - Centralized repo detection with
  --workspace flag to generate formatted CMake options
- test_detect_external_repo_config.py: NEW - 9 unit tests for repo detection
- external_repo_project_maps.py: Add Windows composable_kernel support,
  remove redundant ENABLE_ALL=OFF flags
- test_external_repo_project_maps.py: Update tests for Windows CK support

Testing:
- All 22 unit tests pass
- All pre-commit checks pass
- No linter errors

Reviewer: @geomin12
This commit addresses more review feedback from PR #2771.

Review Comments Addressed:
1. Use ternary operator in setup.yml to consolidate checkout steps
2. Combine duplicate functions - detect_external_projects.py now imports
   from configure_ci.py instead of duplicating code
3. Pass therock_ref through to Linux/Windows build workflows so external
   repos can specify which TheRock branch to use
4. Simplify configure_ci.py imports - remove unnecessary try/except since
   external repos always have TheRock checked out

Changes:
- setup.yml: Consolidated 3 checkout steps into 2 using ternary for path
- detect_external_projects.py: Import shared functions from configure_ci.py
- configure_ci.py: Removed try/except import pattern, simplified
- ci.yml: Pass therock_ref to Linux and Windows jobs
- ci_linux.yml: Accept and pass therock_ref to build workflow
- ci_windows.yml: Accept and pass therock_ref to build workflow
- build_portable_linux_artifacts.yml: Accept therock_ref, use in checkout
- build_windows_artifacts.yml: Accept therock_ref, use in checkout

Testing:
- All linter checks pass
- No import errors

Reviewer: @geomin12
…ting

This commit adds comprehensive support for testing external repository CI
integration directly from TheRock, enabling automated regression testing.

Key Features:
1. repository_override parameter - Allows testing external repo scenarios
   from TheRock by overriding github.repository detection
2. projects parameter - Allows overriding project detection for targeted
   testing (comma-separated format)
3. Automated PR testing - New workflow runs automatically on CI file changes
4. Input validation - Prevents misuse of repository_override

Changes:

Workflows:
- ci.yml: Added repository_override and projects inputs
- setup.yml: Added validation, fixed checkout to use override,
  passes parameters as env vars (GITHUB_REPOSITORY_OVERRIDE, PROJECTS)
- ci_linux.yml/ci_windows.yml: Pass through parameters
- build_portable_linux_artifacts.yml: Use override in detection,
  added repository_override input
- build_windows_artifacts.yml: Use override in detection,
  added repository_override input
- test_external_repo_integration.yml: NEW - Automated testing workflow
  with PR trigger and separate project inputs per repo

Scripts:
- configure_ci.py: Check GITHUB_REPOSITORY_OVERRIDE env var before
  path-based detection
- detect_external_projects.py: Changed to comma-separated format,
  projects input now acts as override (blank = auto-detect from files)

Other:
- .gitignore: Added CURSOR_SESSION.md exclusion

Testing:
- Automatic: Runs on PRs modifying CI infrastructure files
- Manual: workflow_dispatch with per-repo project overrides
- Projects input behavior:
  * Blank (default) = auto-detect from modified files
  * "all" = test all projects (override)
  * "proj1,proj2" = test specific projects (override)
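
The three input behaviors listed above can be expressed as a small parser. This is a sketch under assumptions: the function name and its `detected_projects` / `all_projects` parameters are hypothetical and stand in for whatever the real scripts compute.

```python
def parse_projects_input(raw, detected_projects, all_projects):
    """Resolve the comma-separated projects input.

    Blank       -> auto-detected projects (from modified files)
    "all"       -> every known project (override)
    "a,b"       -> exactly the listed projects (override)
    """
    value = (raw or "").strip()
    if not value:
        return list(detected_projects)
    if value.lower() == "all":
        return list(all_projects)
    # Tolerate stray spaces and empty items such as "a, ,b".
    return [item.strip() for item in value.split(",") if item.strip()]
```

For example, `parse_projects_input("projects/rocprim, projects/hipcub", [], [])` returns the two listed projects regardless of what was detected.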

This enables:
- Automated regression testing for external repo integration
- Targeted testing without CODEOWNERS issues
- Future PR automation for CI changes

Related: ROCm/rocm-libraries#3629, ROCm/rocm-systems#2495
jayhawk-commits added a commit to ROCm/rocm-systems that referenced this pull request Jan 9, 2026
Changed project input format from space-separated to comma-separated
and added pass-through to TheRock's CI workflows.

Changes:
- Updated projects input description to specify comma-separated format
- Clarified that projects input is an override (blank = auto-detect)
- Added projects parameter to workflow_call (passes to TheRock)

Example usage:
- Blank: Auto-detect projects from PR file changes (normal behavior)
- "all": Test all projects (override)
- "projects/clr,projects/rocminfo": Test specific projects (override)

This aligns with TheRock's updated external repo testing infrastructure
and enables targeted testing when needed.

Related: ROCm/TheRock#2771
jayhawk-commits added a commit to ROCm/rocm-libraries that referenced this pull request Jan 9, 2026
Changed project input format from space-separated to comma-separated
and added pass-through to TheRock's CI workflows.

Changes:
- Updated projects input description to specify comma-separated format
- Clarified that projects input is an override (blank = auto-detect)
- Added projects parameter to workflow_call (passes to TheRock)

Example usage:
- Blank: Auto-detect projects from PR file changes (normal behavior)
- "all": Test all projects (override)
- "projects/rocprim,projects/hipcub": Test specific projects (override)

This aligns with TheRock's updated external repo testing infrastructure
and enables targeted testing when needed.

Related: ROCm/TheRock#2771
@jayhawk-commits
Contributor Author

Thanks for the initial feedback. I've addressed the review comments. I also added override inputs for the repository and the affected projects, so that the external repo workflows can be tested from those repos through workflow_dispatch, and also from within the TheRock repo to verify PRs that update CI workflows.

This commit consolidates multiple fixes to the external repository integration
testing infrastructure, addressing path resolution issues and workflow configuration
problems that prevented proper testing of rocm-libraries and rocm-systems.

Key Changes:

Workflow Configuration Fixes:
- Fixed git config operations for external repo checkouts (safe.directory, fetch.parallel)
- Corrected working-directory settings across Linux and Windows workflows
- Fixed duplicate TheRock checkout issue in external repo workflows
- Added proper concurrency group handling to prevent conflicts during parallel tests
- Improved PR_LABELS handling for empty label cases

Path Resolution Improvements:
- Corrected CMake source directory paths for external repos (container vs host paths)
- Fixed detect_external_repo_config.py path resolution using THEROCK_DIR
- Standardized relative path handling between external and TheRock repos
- Fixed build directory paths for proper artifact generation

External Repo Testing Enhancements:
- Added repository_override parameter for testing external repos from TheRock
- Added projects parameter for fine-grained project selection testing
- Improved diagnostic logging for external repo workflows
- Fixed Python setup timing in Windows workflows

Configuration Script Updates:
- Updated configure_ci.py to honor explicitly provided GPU families in PRs
- Applied code formatting (black) to Python scripts
- Enhanced external repo detection and configuration logic

The changes ensure that external repository CI integration tests can properly:
1. Checkout both the external repo and TheRock in correct directory structure
2. Run detection scripts with proper working directories and path references
3. Generate correct CMake configuration for external source builds
4. Handle git operations in the correct repository contexts
@jayhawk-commits jayhawk-commits force-pushed the users/jayhawk-commits/deduplicate-ci branch 2 times, most recently from 7b27e55 to 5e55941 Compare January 10, 2026 06:03
Enables rocm-libraries and rocm-systems to use TheRock's CI infrastructure
via workflow_call, eliminating duplicate workflow definitions.

Key fixes:
- Remove REPOSITORY_OVERRIDE from artifact downloads (artifacts accessed via
  S3 paths, not GitHub API)
- Remove project prefix from artifact_group for external repos (artifact
  names don't include project prefix)
- Enable builds for workflow_dispatch from external repos
- Generate platform-specific configs in single detection pass
- Use composablekernel from rocm-libraries instead of TheRock submodule
- Update test coverage for external repo integration

External repo workflows call TheRock's ci.yml with external_source_checkout=true,
allowing them to leverage TheRock's build/test infrastructure while building
from their own source repositories.
@marbre
Member

marbre commented Jan 13, 2026

We had an issue with a history rewrite that confused the GH UI and the commits it shows on your PR. Please make sure to update your branch, e.g. in the GH UI via

[image]

Afterwards you need to update your local branch via git pull.

Alternative git rebase solution

Alternatively you can manually update your branch locally by following the below instructions:

git checkout <yourbranch>
git fetch origin main
git rebase origin/main
git push --force-with-lease origin <yourbranch>

Contributor

@geomin12 geomin12 left a comment

thanks for this work, overall looks great in TheRock! The sanity checks are flaky, fixing here #2910

obviously more to add here for llvm-project, etc, but this is HUGE!

Contributor

@HereThereBeDragons HereThereBeDragons left a comment

had some issue with the github ui and needed to change to classic.
if there is anything odd e.g. same comment twice please ignore.

Comment thread .github/workflows/build_portable_linux_artifacts.yml
Comment thread .github/workflows/build_portable_linux_artifacts.yml
Comment thread .github/workflows/test_artifacts.yml Outdated
Comment thread .github/workflows/test_component.yml
Comment thread build_tools/github_actions/configure_ci.py Outdated
Comment thread build_tools/github_actions/external_repo_project_maps.py
Comment thread build_tools/github_actions/external_repo_project_maps.py
Comment thread .github/workflows/build_windows_artifacts.yml
Comment thread build_tools/github_actions/tests/test_configure_ci_external.py Outdated
- Stop shelling out for external project detection; centralize logic in external_repo_project_maps

- Accept canonical GPU family suffixes in overrides; update integration workflow overrides

- Standardize env var PROJECT_TO_TEST and tighten detect_external_repo_config CLI/validation

- Replace placeholder tests with real configure_ci/detect_external_repo_config unit tests

- Improve external repo detection to use path components instead of substring matching

- Add matrix size logging to help detect potential matrix explosion early
- Extract _add_gpu_families_from_input() helper to deduplicate GPU family parsing logic between workflow_dispatch and pull_request paths

- Refactor main() in configure_ci.py into smaller functions:
  * _extract_event_flags(): Extract and log event type flags
  * _generate_base_matrices(): Generate GPU family matrices for both platforms
  * _apply_external_project_cross_product(): Cross-product external projects with GPU families
  Main function reduced from 114 lines to ~45 lines

- Improve detect_external_repo_config.py:
  * Enhance output_github_actions_vars() docstring with Args and Returns sections
  * Pre-flatten platform-specific config values in main() for cleaner separation of concerns
  * Remove platform parameter from output_github_actions_vars() (now takes pre-resolved config)
  * Improve ArgumentParser description with output format examples and better epilog
  * Remove duplicate repository argument (positional + --repository)

- Refactor external_repo_project_maps.py:
  * Change collect_projects_to_run() to take repo_name instead of 4 separate dict fields
  * Cleaner API - function internally calls get_repo_config() rather than requiring caller to unpack config
  * Update all tests to use new simpler signature

All tests pass (22/22). Pre-commit hooks pass.
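
The switch from substring matching to path components, mentioned in the commit message above, can be illustrated with a short comparison. Both functions and the example paths are hypothetical; only the two repository names come from this PR.

```python
from pathlib import PurePosixPath

EXTERNAL_REPOS = ("rocm-libraries", "rocm-systems")


def detect_by_substring(path):
    """Naive detection: matches anywhere in the path string."""
    for repo in EXTERNAL_REPOS:
        if repo in path:
            return repo
    return None


def detect_by_components(path):
    """Stricter detection: a whole path component must equal the repo name."""
    parts = PurePosixPath(path).parts
    for repo in EXTERNAL_REPOS:
        if repo in parts:
            return repo
    return None
```

A directory such as `/work/not-rocm-libraries-fork/TheRock` is reported as rocm-libraries by the substring version but is left undetected by the component version, while `/work/rocm-systems/source-repo` is detected by both.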
@jayhawk-commits
Contributor Author

Applied feedback and CI runs look OK.

TheRock CI (internal): Run #10715 passes

External repo integration test (TheRock): Run #50 passes

TheRock CI from rocm-libraries: Run #18050 passes

TheRock CI from rocm-systems: Run #14327 compiled, and as of this message hip-tests have started on gfx94X Linux, matching the inputs to the workflow. gfx120X Windows was skipped because there are no runners set up for this gfx-OS combination.

@jayhawk-commits jayhawk-commits marked this pull request as ready for review January 15, 2026 03:24
@jayhawk-commits jayhawk-commits merged commit a1898df into main Jan 15, 2026
116 of 119 checks passed
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jan 15, 2026
@jayhawk-commits jayhawk-commits deleted the users/jayhawk-commits/deduplicate-ci branch January 15, 2026 03:24
@mgehre-amd
Contributor

Since today, I cannot manually run CI Nightly jobs anymore. They behave as if I didn't specify any amdgpu family in the workflow dispatch in the GitHub UI, even though I did (Example). Could this be caused by this PR?

@HereThereBeDragons
Contributor

@mgehre-amd can you please provide which inputs you set?

@mgehre-amd
Contributor

@HereThereBeDragons,

branch: users/mgehre-amd/gfx1153_tests
linux_variants: gfx1153
linux_test_labels: test:rocblas,test:hipblas,test:hipblaslt,test:rocsolver,test:rocprim,test:hipcub,test:rocthrust,test:hipsparse,test:rocsparse,test:rocrand,test:hiprand,test:rocfft,test:hipfft,test:miopen,test:hipdn,test:rocwmma

the rest stays at its defaults.

@HereThereBeDragons
Contributor

@jayhawk-commits looking at the different runs from Matthias, I see that setup.yml does not get amdgpu_family set.
ci_nightly.yml does not pass the parameter to setup.yml.

So I think the change from
INPUT_LINUX_AMDGPU_FAMILIES: ${{ github.event.inputs.linux_amdgpu_families }}
to INPUT_LINUX_AMDGPU_FAMILIES: ${{ inputs.linux_amdgpu_families }} might be the reason. Trying to understand it, it seems more like a bug that it worked in the first place.

My recommendation is to add these params to the with: statement wherever setup.yml is called, in ci_nightly.yml and anywhere else.
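
The difference between the two expressions can be modeled with a toy lookup. This assumes GitHub's documented behavior for reusable workflows: the `github` context stays associated with the caller (so `github.event.inputs` still shows the caller's workflow_dispatch inputs), while the `inputs` context in the called workflow holds only values forwarded via `with:`. The function and parameter names are hypothetical.

```python
def resolve_in_called_workflow(dispatch_inputs, with_inputs, name, use_event_inputs):
    """Model which value a called workflow sees for an input.

    dispatch_inputs: the caller's workflow_dispatch inputs
                     (what github.event.inputs would expose).
    with_inputs:     values the caller forwards via `with:`
                     (what the inputs context would expose).
    """
    source = dispatch_inputs if use_event_inputs else with_inputs
    return source.get(name, "")


dispatch = {"linux_amdgpu_families": "gfx1153"}

# Before: setup.yml read github.event.inputs, so the value leaked through.
before = resolve_in_called_workflow(dispatch, {}, "linux_amdgpu_families", True)

# After: setup.yml reads inputs, but the caller forwards nothing.
after = resolve_in_called_workflow(dispatch, {}, "linux_amdgpu_families", False)

# Fix: the caller forwards the value in its `with:` block.
fixed = resolve_in_called_workflow(
    dispatch, {"linux_amdgpu_families": "gfx1153"}, "linux_amdgpu_families", False
)
```

Under this model the old expression worked only because the callee reached around its declared interface, which fits the observation that it looks like a bug that it worked before.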

@mgehre-amd
Contributor

Adding linux_amdgpu_families: ${{ inputs.linux_amdgpu_families }} to the setup.yml invocation looks a bit better.
@jayhawk-commits, hope you can make a PR to forward all relevant variables.

@jayhawk-commits
Contributor Author

Taking a look.

jayhawk-commits added a commit that referenced this pull request Jan 15, 2026
…#2951)

After #2771, these workflows were calling setup.yml but not forwarding
their workflow_dispatch inputs (linux_amdgpu_families, test_labels,
etc). This caused the setup job to use default empty values instead of
the user-provided inputs or the workflow's own defaults.

Forward all relevant inputs from workflow_dispatch to setup.yml's
workflow_call to fix this.

**Affected workflows:**
- ci_nightly.yml
- ci_asan.yml
- multi_arch_ci.yml
Member

@ScottTodd ScottTodd left a comment

Too many design issues here. Please revert.

Comment on lines +154 to +165
- name: Patch external repo
  if: ${{ inputs.external_source_checkout }}
  working-directory: source-repo
  run: |
    # Apply patches from TheRock to the external repo
    PATCHES_DIR="../$THEROCK_DIR/patches/amd-mainline/${{ steps.detect.outputs.patches_dir }}"
    if [ -d "$PATCHES_DIR" ] && [ "$(ls -A $PATCHES_DIR/*.patch 2>/dev/null)" ]; then
      git -c user.name="therockbot" -c "user.email=therockbot@amd.com" \
        am --whitespace=nowarn $PATCHES_DIR/*.patch
    else
      echo "No patches found in $PATCHES_DIR or directory doesn't exist"
    fi
Member

How will this work? If the external repository changes patches, will it have any way to edit this code?

This will need documentation updates in https://github.com/ROCm/TheRock/tree/main/patches#using-patches-from-rocm-libraries-rocm-systems-and-other-repositories.

Comment on lines +104 to +108
path: ${{ inputs.external_source_checkout && 'source-repo' || '.' }}

# safe.directory must be set before Runner Health Status
- name: Adjust git config
working-directory: ${{ inputs.external_source_checkout && 'source-repo' || '.' }}
Member

This logic is duplicated a few times. We may want to pull that up into env and then use ${{ env. }} throughout to minimize how much logic there is after job startup time.

- name: Detect external source configuration
  if: ${{ inputs.external_source_checkout }}
  id: detect
  run: python3 $THEROCK_DIR/build_tools/github_actions/detect_external_repo_config.py --repository "${{ inputs.repository_override || github.repository }}" --platform linux
Member

Please brace-delimit all bash variables. Either use ${THEROCK_DIR} (common in scripts) or ${{ env.THEROCK_DIR }} (common in github actions workflows)

Just typing $THEROCK_DIR does not make it clear when the variable name ends

Comment on lines +177 to +182
extra_cmake_options: >-
${{ inputs.external_source_checkout && format('-D{0}=../source-repo', steps.detect.outputs.cmake_source_var) || '' }}
${{ inputs.extra_cmake_options }}
BUILD_DIR: build
run: |
- python3 build_tools/github_actions/build_configure.py --manylinux
+ python3 $THEROCK_DIR/build_tools/github_actions/build_configure.py ${{ inputs.external_source_checkout && '' || '--manylinux' }}
Member

Too much magic here with format('-D{0} and ${{ inputs.external_source_checkout && '' || '--manylinux' }}

At a minimum this needs comments explaining the intent.

I think changes to the build_configure.py script may be more appropriate though.
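
One way to spell out the intent of those two expressions is to model them in Python. This sketch assumes that GitHub expression operators `&&` and `||` return their operands the way Python's `and` / `or` do, with the empty string treated as falsy. If that assumption holds, the `cond && '' || '--manylinux'` form would yield `--manylinux` in both cases, which may be worth verifying. The helper names and the `EXAMPLE_SOURCE_DIR` value are hypothetical.

```python
def gha_ternary(condition, if_true, if_false):
    """Mimic the expression idiom `condition && if_true || if_false`."""
    return condition and if_true or if_false


def build_configure_flags(external_source_checkout):
    """Explicit version: --manylinux only for TheRock's own builds."""
    return [] if external_source_checkout else ["--manylinux"]


def extra_cmake_options(external_source_checkout, cmake_source_var, user_options):
    """Explicit version of the format('-D{0}=../source-repo', ...) line."""
    options = []
    if external_source_checkout:
        # Point the detected CMake source variable at the external checkout.
        options.append(f"-D{cmake_source_var}=../source-repo")
    if user_options:
        options.append(user_options)
    return " ".join(options)
```

Moving this branching into a script, as suggested for build_configure.py, would replace the inline idiom with named conditions that can be unit tested.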

Comment on lines 188 to +191
  - name: Test Packaging
-   if: ${{ github.event.repository.name == 'TheRock' }}
+   if: ${{ github.event.repository.name == 'TheRock' && !inputs.external_source_checkout }}
    run: |
-     ctest --test-dir build --output-on-failure
+     ctest --test-dir $THEROCK_DIR/build --output-on-failure
Member

Why skip packaging tests if there is an external source checkout? Is that ever set when the event repository name is not TheRock?

Contributor Author

Getting the GitHub unicorn right now when trying to look up the existing code on the super-repos, but the behaviour for this PR is to first match what is being done on the CI workflows on the super-repos before adjusting/improving them. This would apply for many of your commented sections. I'll find the lines of code when GitHub is loading for me.

Contributor Author

The ctest step after the build is not run on the super-repos: https://github.com/ROCm/rocm-libraries/blob/develop/.github/workflows/therock-ci-linux.yml#L111

# If the dependent job failed/cancelled, this job will not be run
# The use_prebuilt_artifacts "or" statement ensures that tests will run if
# previous build step is run or skipped.concurrency.
# Skip for external repos (they don't need Python packages)
Member

Huh? They absolutely do need Python packages.

Contributor Author

The word 'need' in the comment is inaccurate. The intent is to match the current state, since external repo workflows do not build Python packages as-is. See https://github.com/ROCm/rocm-libraries/actions/runs/21027099988 as an example.

Comment on lines +140 to +146
run: |
if [[ "${{ inputs.external_source_checkout }}" == "true" ]]; then
cd source-repo
python ../TheRock/build_tools/github_actions/configure_ci.py
else
./build_tools/github_actions/configure_ci.py
fi
Member

There is a different configure script in rocm-libraries: https://github.com/ROCm/rocm-libraries/blob/develop/.github/scripts/therock_configure_ci.py. Will this stop using that? The logic has diverged quite a bit.

Contributor Author

The configure_ci.py logic from the super-repos is now captured in TheRock through its configure_ci.py and the new Python scripts added in this PR.

Comment on lines +150 to +156
run: |
if [[ "${{ inputs.external_source_checkout }}" == "true" ]]; then
# Use ADHOCBUILD for external repos to avoid building Python packages
echo "rocm_package_version=ADHOCBUILD" >> $GITHUB_OUTPUT
else
python ./build_tools/compute_rocm_package_version.py --release-type=dev
fi
Member

No! This is all wrong - we need to build all packages in all projects. Don't use ADHOCBUILD as the version.

Contributor Author

Member

It's wrong there. I fixed it here. Please don't generate more work to clean up later.

@ScottTodd
Member

Please revert - https://llvm.org/docs/CodeReview.html#can-code-be-reviewed-after-it-is-committed explains better than I can:

> If a community member expresses a concern about a recent commit, and this concern would have been significant enough to warrant a conversation during pre-commit review (including around the need for more design discussions), they may ask for a revert to the original author who is responsible to revert the patch promptly. Developers often disagree, and erring on the side of the developer asking for more review prevents any lingering disagreement over code in the tree. This does not indicate any fault from the patch author, this is inherent to our post-commit review practices. Reverting a patch ensures that design discussions can happen without blocking other development; it’s entirely possible the patch will end up being reapplied essentially as-is once concerns have been resolved.

@jayhawk-commits
Contributor Author

> Please revert - https://llvm.org/docs/CodeReview.html#can-code-be-reviewed-after-it-is-committed explains better than I can:
>
> > If a community member expresses a concern about a recent commit, and this concern would have been significant enough to warrant a conversation during pre-commit review (including around the need for more design discussions), they may ask for a revert to the original author who is responsible to revert the patch promptly. Developers often disagree, and erring on the side of the developer asking for more review prevents any lingering disagreement over code in the tree. This does not indicate any fault from the patch author, this is inherent to our post-commit review practices. Reverting a patch ensures that design discussions can happen without blocking other development; it’s entirely possible the patch will end up being reapplied essentially as-is once concerns have been resolved.

Will do. Was waiting for unicorn issues to be resolved.

jayhawk-commits added a commit that referenced this pull request Jan 15, 2026
This PR reverts the CI deduplication work from PR #2771 and its
follow-up fix #2951.

## Reverted commits
- a1898df - "Deduplicate TheRock CI (#2771)"
- f6b0e0e - "[GitHub Actions] Update workflows calling setup.yml with
their inputs (#2951)"

## Reason for Revert

Deduplication design requires further changes.

---------

Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
