Introduce AMD GPU support with ROCm HIP #1989
Conversation
The distribution part is tricky for ROCm. My recommendations, from minimum to best:
I will start with 1 and 2.
Added Docker and Windows wheels (artifacts). I'm giving up on fixing the Linux wheel; it's dependency hell between cibuildwheel and ROCm. Edited: see new instructions below.
I lied: I think I figured out the Linux wheels, will add them soon.
OK, actually done now. Currently building for ROCm 7.2. Linux: Windows:
Hello @sssshhhhhh! For the next release I plan to publish the ROCm wheel (https://github.com/jordimas/CTranslate2/releases) and the Docker images (https://github.com/jordimas/CTranslate2/pkgs/container/ctranslate2) as part of the release. These will be published in https://github.com/OpenNMT/CTranslate2; the test artifacts above are in my fork just to validate the process. Let me know if the wheels and Docker images work.
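For anyone testing the artifacts, a minimal smoke test could look like the sketch below. It assumes the ROCm build exposes the HIP backend through CTranslate2's existing `cuda` device name (as in the hipified port) and that the wheel or Docker image is already installed; adjust if the final packaging differs.

```python
import ctranslate2

# Number of visible AMD GPUs; on the ROCm build this is assumed to go through
# the hipified CUDA code path, hence the unchanged "cuda" device name.
print("GPU count:", ctranslate2.get_cuda_device_count())

# Compute types supported by this build/device combination (e.g. float32, float16, int8).
print("Supported compute types:", ctranslate2.get_supported_compute_types("cuda"))
```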
Everything works, but Linux needs some environment variables set; the new commits should remove that requirement.
Thanks, @sssshhhhhh, this is outstanding work!
Closes #1072
Thanks to the work of everyone at arlo-phoenix/CTranslate2-rocm and the linked issue.
Windows can be built with this script: https://github.com/sssshhhhhh/CTranslate2/blob/745f0b46aea94acef514185ed5facbb3fecd6dcd/python/tools/prepare_build_environment_windows.ps1
Linux builds can follow the instructions at: https://github.com/arlo-phoenix/CTranslate2-rocm/blob/rocm/README_ROCM.md
Currently targeting ROCm 7.1.1. It passes all tests and produces correct output for Whisper and Gemma 3. For now, the changes are just enough to build for AMD; specific optimisations such as flash attention are left for the future.
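As an illustration of the kind of check behind "produces correct output for Whisper and Gemma 3", here is a hedged sketch of a generation call on the AMD GPU. The model path and start tokens are placeholders; it assumes a model already converted to CTranslate2 format (e.g. with ct2-transformers-converter) and the same reuse of the `cuda` device name mentioned above, neither of which the PR spells out.

```python
import ctranslate2

# Placeholder path to a converted model directory; the PR does not describe
# the exact test setup, so treat this as illustrative only.
generator = ctranslate2.Generator("gemma3-ct2", device="cuda")

# Placeholder start tokens; a real run would obtain these from the model's
# tokenizer (e.g. SentencePiece), which is outside CTranslate2.
start_tokens = ["<bos>", "▁Hello"]

results = generator.generate_batch([start_tokens], max_length=32)
print(results[0].sequences[0])  # generated tokens for the first (and only) example
```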
Some questions:
Should having prebuilt wheels be a goal, or would letting people build them themselves be fine?
How should packaging be handled? My Windows wheels currently require a separate install of rocm_sdk_libraries_custom and bundle amdhip64_7.dll/amd_comgr0701.dll. The wheels are 58 MB each; removing the two DLLs drops them to 12 MB.
Which architectures should be targeted? Currently I'm building for the RDNA generations that ROCm 7 supports. CDNA should work, but the wave size isn't optimal there (NVIDIA and RDNA use 32, while CDNA uses 64). I'm also unsure about RDNA2: this PR should work there, but its ROCm support seems poor and I don't have any hardware to test.