diff --git a/CHANGELOG.md b/CHANGELOG.md
index 741ee37f1..44940b6d3 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,35 @@
 
 ### Fixes and improvements
 
-## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-XX)
+## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.7.0) (2026-02-03)
+
+### New features
+
+* Introduce AMD GPU support with ROCm HIP (#1989) [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Compatibility with Transformers v5 (#1999) by [@jordimas](https://github.com/jordimas)
+
+## Fixes and improvements
+
+* Assume less about whisper vocab (#2000) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/3manifold)
+* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
+* Remove unecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/3manifold)
+* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@Purfview](https://github.com/Purfview)
+* Add activation and dilation to conv1d (#1979) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Minor refactor to CMakeLists.txt (#1980) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
+* Add optional residual add to gemm op (#1975) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Implement cuda layernorm axis (#1971) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Fix Eole conversion (#1998) by [@vince62s](https://github.com/vince62s)
+* Gemma 3 conversion improvements (#1991) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Fixes cross attention tests and refactors code (#1974) by [@jordimas](https://github.com/jordimas)
+* Fix CUDA bf16 median filter (#1972) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Fix various compiler warnings (#1970) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+
+
+## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-06)
 
 ### New features
 
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 82ddf0a55..cf80e37b5 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -14,7 +14,7 @@ option(WITH_OPENBLAS "Compile with OpenBLAS backend" OFF)
 option(WITH_RUY "Compile with Ruy backend" OFF)
 option(WITH_CUDA "Compile with CUDA backend" OFF)
 option(WITH_CUDNN "Compile with cuDNN backend" OFF)
-option(WITH_HIP "Compile with HIP backend" OFF)
+option(WITH_HIP "Compile with AMD HIP GPU backend" OFF)
 option(CUDA_DYNAMIC_LOADING "Dynamically load CUDA libraries at runtime" OFF)
 option(ENABLE_CPU_DISPATCH "Compile CPU kernels for multiple ISA and dispatch at runtime" ON)
 option(ENABLE_PROFILING "Compile with profiling support" OFF)
diff --git a/README.md b/README.md
index 215249c86..2deb30255 100644
--- a/README.md
+++ b/README.md
@@ -58,6 +58,8 @@ generator.generate_batch(start_tokens)
 
 See the [documentation](https://opennmt.net/CTranslate2) for more information and examples.
 
+If you have an AMD ROCm GPU, we provide specific Python wheels on the [releases page](https://github.com/OpenNMT/CTranslate2/releases/).
+
 ## Benchmarks
 
 We translate the En->De test set *newstest2014* with multiple models:
diff --git a/docs/faq.md b/docs/faq.md
index 35f7ad3da..bcf7a8ef7 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -16,7 +16,7 @@ CTranslate2 addresses these issues in several ways:
 
 Here are some scenarios where this project could be used:
 
-* You want to accelarate Transformer models for production usage, especially on CPUs.
+* You want to accelerate Transformer models for production usage, especially on CPUs.
 * You need to embed models in an existing C++ application without adding large dependencies.
 * Your application requires custom threading and memory usage control.
 * You want to reduce the model size on disk and/or memory.
diff --git a/docs/installation.md b/docs/installation.md
index 2efa0fc39..0fc4263b7 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -11,7 +11,7 @@ pip install ctranslate2
 The Python wheels have the following requirements:
 
 * OS: Linux (x86-64, AArch64), macOS (x86-64, ARM64), Windows (x86-64)
-* Python version: >= 3.7
+* Python version: >= 3.9
 * pip version: >= 19.3 to support `manylinux2014` wheels
 
 ```{admonition} GPU support
@@ -29,7 +29,7 @@ On Windows [the Visual C++ runtime](https://www.microsoft.com/en-US/download/det
 Docker images can be downloaded from the [GitHub Container registry](https://github.com/OpenNMT/CTranslate2/pkgs/container/ctranslate2):
 
 ```bash
-docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda11.2
+docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda12.8
 ```
 
 The images include:
@@ -114,6 +114,7 @@ The following options can be set with `-DOPTION=VALUE` during the CMake configur
 | WITH_ACCELERATE | **OFF**, ON | Compiles with the Apple Accelerate backend |
 | WITH_OPENBLAS | **OFF**, ON | Compiles with the OpenBLAS backend |
 | WITH_RUY | **OFF**, ON | Compiles with the Ruy backend |
+| WITH_HIP | **OFF**, ON | Compiles with the AMD HIP GPU backend |
 
 Some build options require additional dependencies. See their respective documentation for installation instructions.
 
@@ -123,6 +124,7 @@ Some build options require additional dependencies. See their respective documen
 * `-DWITH_DNNL=ON` requires [oneDNN](https://github.com/oneapi-src/oneDNN) >= 3.0
 * `-DWITH_ACCELERATE=ON` requires [Accelerate](https://developer.apple.com/documentation/accelerate)
 * `-DWITH_OPENBLAS=ON` requires [OpenBLAS](https://github.com/xianyi/OpenBLAS)
+* `-DWITH_HIP=ON` requires [ROCm libraries](https://rocm.docs.amd.com/en/latest/reference/api-libraries.html)
 
 Multiple backends can be enabled for a single build, for example:
 
diff --git a/docs/translation.md b/docs/translation.md
index 9ff8e8d1d..617805202 100644
--- a/docs/translation.md
+++ b/docs/translation.md
@@ -81,7 +81,7 @@ It is a text file where each line has the following format:
 src_1 src_2 ... src_Ntgt_1 tgt_2 ... tgt_K
 ```
 
-If the source N-gram is empty (N = 0), the assiocated target tokens will always be included in the reduced vocabulary.
+If the source N-gram is empty (N = 0), the associated target tokens will always be included in the reduced vocabulary.
 
 ```{hint}
 See [here](https://github.com/OpenNMT/papers/tree/master/WNMT2018/vmap) for an example on how to generate this file.
diff --git a/python/ctranslate2/version.py b/python/ctranslate2/version.py
index 07f3bbd42..6eb24ab72 100644
--- a/python/ctranslate2/version.py
+++ b/python/ctranslate2/version.py
@@ -1,3 +1,3 @@
 """Version information."""
 
-__version__ = "4.6.3"
+__version__ = "4.7.0"