Skip to content

[build] Adding fixes to allow full ASAN pass#2785

Merged
geomin12 merged 39 commits intomainfrom
users/geomin12/try-fft-type
Jan 9, 2026
Merged

[build] Adding fixes to allow full ASAN pass#2785
geomin12 merged 39 commits intomainfrom
users/geomin12/try-fft-type

Conversation

@geomin12
Copy link
Copy Markdown
Contributor

@geomin12 geomin12 commented Jan 6, 2026

Motivation

For some time, ASAN builds have been failing due to variety of build and Test Packaging errors. With these variety of fixes and conditionals, the CI ASAN pipeline will now pass properly

Closes #1990
Related to #2632

Technical Details

  • After a RCCL bump, the timeout extends up to 15 hours based on tests, so extending timeout
  • Most of the third-party libraries did not support ASAN, causing test failures
  • rocprofiler-systems doesn't support clang, thus adding an conditional
  • rocprofiler-sdk requires amd-hip as compiler
  • fftw3 requires a new cmake arg

Test Plan

Testing via CI ASAN

Test Result

Build passes here: https://github.com/ROCm/TheRock/actions/runs/20722419413/job/59489435646

Build without RCCL running here: https://github.com/ROCm/TheRock/actions/runs/20827525680/job/59832753950

Submission Checklist

@geomin12 geomin12 marked this pull request as ready for review January 6, 2026 20:52
Comment thread .github/workflows/build_portable_linux_artifacts.yml
@geomin12 geomin12 requested a review from marbre January 9, 2026 00:02
Copy link
Copy Markdown
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you check if other third-party deps also need -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}?

"LIB_NAMES"
)

# Skip shared-library dlopen validation for sanitizer builds.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we skip it? because we do those testings on the normal release build or because something would be breaking? take too much time?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests (as far as i understood) were not compatible with ASAN and caused some errors during my attempts to build

i tried an argument to add the correct path to find these via clang but it was not a fan

Comment thread comm-libs/CMakeLists.txt
Comment on lines +1 to +2
# TODO(#2632): Re-enable RCCL for sanitizer builds once RCCL build time is acceptable
if(THEROCK_ENABLE_RCCL AND THEROCK_SANITIZER STREQUAL "")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my previous comment I rather had in mind to disable building RCCL with ASAN not to disable building RCCL at all. This might be acceptable for now but only if we do not need to build anything that depends on RCCL.

Copy link
Copy Markdown
Contributor Author

@geomin12 geomin12 Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this if statement indicates that RCCL will build for normal builds, but for any sanitizer build (asan right now), it will not build RCCL!

normal builds:

2026-01-08T19:16:45.1607958Z [566/633] Building sub-project rccl (in background)
2026-01-08T19:16:45.7777997Z [567/633] Stage installing sub-project rccl
2026-01-08T19:16:48.3538093Z [568/633] Configure sub-project rccl-tests (in background)
2026-01-08T19:16:51.1596643Z [569/633] Building sub-project rccl-tests (in background)
2026-01-08T19:16:51.7753864Z [570/633] Stage installing sub-project rccl-tests
2026-01-08T19:16:51.8910293Z [571/633] Populate artifact rccl
2026-01-08T19:16:51.9454460Z [572/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_dbg_gfx94X-dcgpu.tar.xz
2026-01-08T19:16:51.9472601Z [573/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_doc_gfx94X-dcgpu.tar.xz
2026-01-08T19:16:51.9511184Z [574/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_dev_gfx94X-dcgpu.tar.xz
2026-01-08T19:16:51.9640894Z [575/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_run_gfx94X-dcgpu.tar.xz
2026-01-08T19:16:52.3899141Z [576/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_test_gfx94X-dcgpu.tar.xz
2026-01-08T19:16:54.0375044Z [577/633] Creating archive /__w/TheRock/TheRock/build/artifacts/rccl_lib_gfx94X-dcgpu.tar.xz

while the ASAN build does not build RCCL

@geomin12
Copy link
Copy Markdown
Contributor Author

geomin12 commented Jan 9, 2026

did you check if other third-party deps also need -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}?

the other third-party dependencies don't fail at all! so I assume it is not needed

@geomin12 geomin12 merged commit f7add77 into main Jan 9, 2026
69 checks passed
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jan 9, 2026
@geomin12 geomin12 deleted the users/geomin12/try-fft-type branch January 9, 2026 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Enable ASAN builds for TheRock

3 participants