Skip to content

[ROCm] Changes to enable build for ROCm platform#401

Draft
pruthvistony wants to merge 9 commits into
pytorch:mainfrom
ROCm:tp_rocm_build_1
Draft

[ROCm] Changes to enable build for ROCm platform#401
pruthvistony wants to merge 9 commits into
pytorch:mainfrom
ROCm:tp_rocm_build_1

Conversation

@pruthvistony
Copy link
Copy Markdown
Contributor

@pruthvistony pruthvistony commented Jul 1, 2021

  • Build is currently enabled for only hip_basic(cuda_basic)
  • Sample build command for reference
    cmake ../ -DCMAKE_C_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DCMAKE_CXX_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DTP_ENABLE_SHM=OFF -DTP_ENABLE_CMA=OFF -DTP_USE_ROCM=ON -DTP_ENABLE_HIP_XTH=OFF -DTP_ENABLE_HIP_IPC=OFF -DTP_ENABLE_HIP_GDR=OFF -DTP_ENABLE_IBV=OFF -DTP_BUILD_TESTING=ON

@pruthvistony
Copy link
Copy Markdown
Contributor Author

PR 398 needs to be merged before these changes.
Changes will be updated after rebase.

@pruthvistony
Copy link
Copy Markdown
Contributor Author

@lw @jeffdaily @jithunnair-amd
Please review these changes


args = parser.parse_args()

dict_file_name = args.dump_dict_directory + "/hipify_output_dict_dump.txt"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better if this script doesn't assume the name of the hipified dict file, instead the filename itself should be passed in as an argument, not the directory name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file name for the output hipified dict file is parameterized and is now internal handled inside hipify-torch module.
repo

Comment thread tools/amd_build/build_amd.py Outdated
required=True)

parser.add_argument(
'--dump-dict-directory',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we foresee a use-case where this directory would contain multiple dict files at the same time? It doesn't look like it here, in which case, the dict file name should be passed in as an argument instead?

Copy link
Copy Markdown
Contributor Author

@pruthvistony pruthvistony Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont foresee a use case like above. Currently recommended way to trigger hipify is

  • Call hipify for the whole project files.
  • Call get_hipified_list() API to get updated hipified file list, if required.

All the functionality of hipify is handled within hipify-torch repo

Comment thread cmake/Hip.cmake
find_package(hip REQUIRED)

set(TP_HIP_INCLUDE ${ROCM_PATH}/include ${TP_HIP_INCLUDE})
set(TP_HIP_INCLUDE ${hip_INCLUDE_DIRS} $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}> $<INSTALL_INTERFACE:include> ${TP_HIP_INCLUDE})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the only reason for this entire file to exist is to be able to provide ${hip_INCLUDE_DIRS}? If so, why don't we just use ${HIP_PATH}/include? It'd reduce a lot of the seemingly-unrelated code here. @jeffdaily for comment

Comment thread cmake/Hipify.cmake Outdated
@@ -0,0 +1,62 @@
# Copyright (c) Facebook, Inc. and its affiliates.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions seem generic enough to be valuable as a part of hipify-torch itself, since any CMake-based hipify flow would likely need these functions. Can we move them to a CMake file in hipify-torch and include that CMake file here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is moved to hipify-torch repo

Comment thread tensorpipe/CMakeLists.txt Outdated
channel/cuda_xth/factory.cc)
list(APPEND TP_CUDA_PUBLIC_HDRS
channel/cuda_xth/factory.h)
tp_conditional_backend(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we use uppercase for this macro everywhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to uppercase.
@lw
Please let me know if it is breaking any convention followed in tensorpipe. I checked the pyTorch code for any hint, but there is no convention followed there.

Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but @beauby is much more familiar with the TensorPipe code base and might offer better reviews.

One general question, how do we test this works for ROCm devices?

Comment thread tensorpipe/CMakeLists.txt
list(APPEND TP_CUDA_INCLUDE_DIRS ${CUDA_INCLUDE_DIRS})
elseif (TP_USE_ROCM)
set(TP_GPU_LIB_NAME "tensorpipe_hip")
# Finding of HIP package is already before hipifying the files
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, any reason not looking for packages here as existing code did for CUDA?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread tensorpipe/CMakeLists.txt
Comment on lines +246 to +247
list(APPEND TP_CUDA_LINK_LIBRARIES ${TP_HIP_HCC_LIBRARIES})
list(APPEND TP_CUDA_INCLUDE_DIRS ${TP_HIP_INCLUDE})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the naming, any reason they don't follow the CUDA ones, i.e., HIP_LIBRARIES and HIP_INCLUDE_DIRS?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check with the HIP team, if there is any particular reason for keeping this name, i.e., hip_INCLUDE_DIRS or different with CUDA, and get back if any reason.

list(APPEND TP_TEST_SRCS
channel/cuda_gdr/cuda_gdr_test.cc
)
if((TP_ENABLE_CUDA_GDR AND TP_USE_CUDA) OR (TP_ENABLE_HIP_GDR AND TP_USE_ROCM))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beauby do you know why we don't need this if-clause in the existing code?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we manually exclude Gdr tests from our CI, but it would not hurt to avoid building the tests if support for CudaGdr is not built.

@pruthvistony
Copy link
Copy Markdown
Contributor Author

This looks good to me, but @beauby is much more familiar with the TensorPipe code base and might offer better reviews.

One general question, how do we test this works for ROCm devices?

As soon as we are able to get the tensorpipe building and running for ROCm on AMD GPU, we will set up a CI job to do this. PyTorch CI for ROCm runs on a good pool of CI machines. Currently checking if it can be used for Tensorpipe.

Comment thread CMakeLists.txt
set(CMAKE_CXX_STANDARD 14)

list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake" "${PROJECT_SOURCE_DIR}/third_party/hipify/cmake")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beauby Added a comment. Should we see if we can guard it with TP_USE_ROCM as well?

Comment thread cmake/Options.cmake Outdated

# if both TP_USE_CUDA and TP_USE_ROCM is set then break
if(TP_USE_CUDA AND TP_USE_ROCM)
message(FATAL_ERROR "Tensorpipe can be built either for CUDA or ROCm, TP_USE_CUDA and TP_USE_ROCM both are set, erroring out!!!!")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested modification:

"TensorPipe does not support building for CUDA and ROCM at the same time. Please unset either TP_USE_CUDA or TP_USE_ROCM."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread tensorpipe/CMakeLists.txt
### cma

tp_conditional_backend(
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far we've been using this macro lower case, let's keep it consistent (i.e. tp_conditional_backend()).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beauby Sorry, that was I who suggested changing it to uppercase, to keep it consistent with the case in Options.cmake: https://github.com/pytorch/tensorpipe/blob/master/cmake/Options.cmake#L13
Would you still like us to change it back to lowercase?

Comment thread tensorpipe/CMakeLists.txt
### shm

tp_conditional_backend(
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Comment thread tensorpipe/CMakeLists.txt
### ibv

tp_conditional_backend(
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Comment thread tensorpipe/CMakeLists.txt
channel/cuda_xth/factory.cc)
list(APPEND TP_CUDA_PUBLIC_HDRS
channel/cuda_xth/factory.h)
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark regarding case.

Comment thread tensorpipe/CMakeLists.txt
### cuda_ipc

tp_conditional_backend(
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case.

Comment thread tensorpipe/CMakeLists.txt
TP_CONDITIONAL_BACKEND(
TP_ENABLE_CUDA_IPC "Enable CUDA inter-process communication channel" "TP_USE_CUDA")
if(TP_ENABLE_CUDA_IPC)
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case.

Comment thread tensorpipe/CMakeLists.txt
### cuda_gdr

tp_conditional_backend(
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case.

Comment thread tensorpipe/CMakeLists.txt
TP_CONDITIONAL_BACKEND(
TP_ENABLE_CUDA_GDR "Enable CUDA GpuDirect (InfiniBand) channel" "LINUX")
if(TP_ENABLE_CUDA_GDR)
TP_CONDITIONAL_BACKEND(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case.

@beauby
Copy link
Copy Markdown
Contributor

beauby commented Jul 26, 2021

Looks good apart from minor nits – are there some early results of it working on AMD GPUs?

@Madouura
Copy link
Copy Markdown

Is this and #398 still being worked on?

@pruthvistony
Copy link
Copy Markdown
Contributor Author

Is this and #398 still being worked on?

We have tensorpipe with cuda_basic(hip_basic) working on ROCm (AMD gpus). We are in process to upstream all the changes currently and this is the initial PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants