From 4c5b256e536c3494a94109b53de81982a8d5df70 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Sun, 1 Feb 2026 22:13:22 +0100
Subject: [PATCH 01/12] Initial version

---
 CHANGELOG.md                  | 29 ++++++++++++++++++++++++++++-
 python/ctranslate2/version.py |  2 +-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 741ee37f1..70cb99793 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,34 @@
 
 ### Fixes and improvements
 
-## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-XX)
+## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-02-02)
+
+### New features
+
+* Compatibility with Transformers v5 (#1999) by [@JordiMas](https://github.com/JordiMas)
+* Add causal flag to fa2 (#1976) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+
+## Fixes and improvements
+
+* Assume less about whisper vocab (#2000) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Use LLVM ThreadSanitizer instead of Google (#1993) by [@22544721+3manifold](https://github.com/22544721+3manifold)
+* Optimize all builds with parallel execution (#1992) by [@22544721+3manifold](https://github.com/22544721+3manifold)
+* Remove unecessary zero init from conv1d (#1990) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Integrate Clang AddressSanitizer in tests (#1903) by [@22544721+3manifold](https://github.com/22544721+3manifold)
+* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@69023953+Purfview](https://github.com/69023953+Purfview)
+* Add activation and dilation to conv1d (#1979) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Minor refactor to CMakeLists.txt (#1980) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
+* Add optional residual add to gemm op (#1975) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Implement cuda layernorm axis (#1971) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* fix conversion (#1998) by [@VincentNguyen](https://github.com/VincentNguyen)
+* Gemma 3 conversion improvements (#1991) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Fixes cross attention tests and refactors code (#1974) by [@JordiMas](https://github.com/JordiMas)
+* Fix CUDA bf16 median filter (#1972) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Fix various compiler warnings (#1970) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+
+
+## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-06)
 
 ### New features
 
diff --git a/python/ctranslate2/version.py b/python/ctranslate2/version.py
index 07f3bbd42..6eb24ab72 100644
--- a/python/ctranslate2/version.py
+++ b/python/ctranslate2/version.py
@@ -1,3 +1,3 @@
 """Version information."""
 
-__version__ = "4.6.3"
+__version__ = "4.7.0"

From 6e393decf785d70c9fe50f69bb2e7d7abc068281 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Sun, 1 Feb 2026 22:16:19 +0100
Subject: [PATCH 02/12] Fixes names

---
 CHANGELOG.md | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 70cb99793..1142ab9db 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,27 +8,27 @@
 
 ### New features
 
-* Compatibility with Transformers v5 (#1999) by [@JordiMas](https://github.com/JordiMas)
-* Add causal flag to fa2 (#1976) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Compatibility with Transformers v5 (#1999) by [@jordimas](https://github.com/jordimas)
+* Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 
 ## Fixes and improvements
 
-* Assume less about whisper vocab (#2000) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Use LLVM ThreadSanitizer instead of Google (#1993) by [@22544721+3manifold](https://github.com/22544721+3manifold)
-* Optimize all builds with parallel execution (#1992) by [@22544721+3manifold](https://github.com/22544721+3manifold)
-* Remove unecessary zero init from conv1d (#1990) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Integrate Clang AddressSanitizer in tests (#1903) by [@22544721+3manifold](https://github.com/22544721+3manifold)
-* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@69023953+Purfview](https://github.com/69023953+Purfview)
-* Add activation and dilation to conv1d (#1979) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Minor refactor to CMakeLists.txt (#1980) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Assume less about whisper vocab (#2000) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/22544721+3manifold)
+* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/22544721+3manifold)
+* Remove unecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/22544721+3manifold)
+* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@6Purfview](https://github.com/Purfview)
+* Add activation and dilation to conv1d (#1979) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Minor refactor to CMakeLists.txt (#1980) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
-* Add optional residual add to gemm op (#1975) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Implement cuda layernorm axis (#1971) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Add optional residual add to gemm op (#1975) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Implement cuda layernorm axis (#1971) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * fix conversion (#1998) by [@VincentNguyen](https://github.com/VincentNguyen)
-* Gemma 3 conversion improvements (#1991) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Fixes cross attention tests and refactors code (#1974) by [@JordiMas](https://github.com/JordiMas)
-* Fix CUDA bf16 median filter (#1972) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
-* Fix various compiler warnings (#1970) by [@193317444+sssshhhhhh](https://github.com/193317444+sssshhhhhh)
+* Gemma 3 conversion improvements (#1991) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Fixes cross attention tests and refactors code (#1974) by [@jordimas](https://github.com/jordimas)
+* Fix CUDA bf16 median filter (#1972) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Fix various compiler warnings (#1970) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 
 
 ## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-06)

From cc475a563c0b15617a66cd09ad2fcc2728976aa1 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Sun, 1 Feb 2026 22:18:16 +0100
Subject: [PATCH 03/12] Fix more names

---
 CHANGELOG.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1142ab9db..cd840cb68 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,23 +9,23 @@
 ### New features
 
 * Compatibility with Transformers v5 (#1999) by [@jordimas](https://github.com/jordimas)
-* Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 
 ## Fixes and improvements
 
 * Assume less about whisper vocab (#2000) by [@sssshhhhhh](https://github.com/sssshhhhhh)
-* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/22544721+3manifold)
-* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/22544721+3manifold)
+* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/3manifold)
+* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
 * Remove unecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
-* Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/22544721+3manifold)
+* Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/3manifold)
 * Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@6Purfview](https://github.com/Purfview)
 * Add activation and dilation to conv1d (#1979) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Minor refactor to CMakeLists.txt (#1980) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
 * Add optional residual add to gemm op (#1975) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Implement cuda layernorm axis (#1971) by [@sssshhhhhh](https://github.com/sssshhhhhh)
-* fix conversion (#1998) by [@VincentNguyen](https://github.com/VincentNguyen)
+* fix conversion (#1998) by [@vince62s](https://github.com/vince62s)
 * Gemma 3 conversion improvements (#1991) by [@sssshhhhhh](https://github.com/sssshhhhhh)
+* Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Fixes cross attention tests and refactors code (#1974) by [@jordimas](https://github.com/jordimas)
 * Fix CUDA bf16 median filter (#1972) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Fix various compiler warnings (#1970) by [@sssshhhhhh](https://github.com/sssshhhhhh)

From 26d4f285d36757df1bb139c0e50e40eb41053062 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Mon, 2 Feb 2026 17:40:17 +0100
Subject: [PATCH 04/12] Notes

---
 CHANGELOG.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index cd840cb68..ab4abbb4d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,13 +8,14 @@
 
 ### New features
 
+* Introduce AMD GPU support with ROCm HIP (#1989) [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Compatibility with Transformers v5 (#1999) by [@jordimas](https://github.com/jordimas)
 
 ## Fixes and improvements
 
 * Assume less about whisper vocab (#2000) by [@sssshhhhhh](https://github.com/sssshhhhhh)
-* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/3manifold)
-* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
+* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/3manifold)
+* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
 * Remove unecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/3manifold)
 * Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@6Purfview](https://github.com/Purfview)

From 2174c6b70e192d36f57f19c3fcd659ba97031d71 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Mon, 2 Feb 2026 17:43:19 +0100
Subject: [PATCH 05/12] Fix

---
 CHANGELOG.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ab4abbb4d..b0b0031ad 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,13 +18,13 @@
 * Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
 * Remove unecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/3manifold)
-* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@6Purfview](https://github.com/Purfview)
+* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@Purfview](https://github.com/Purfview)
 * Add activation and dilation to conv1d (#1979) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Minor refactor to CMakeLists.txt (#1980) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
 * Add optional residual add to gemm op (#1975) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Implement cuda layernorm axis (#1971) by [@sssshhhhhh](https://github.com/sssshhhhhh)
-* fix conversion (#1998) by [@vince62s](https://github.com/vince62s)
+* Fix Eole conversion (#1998) by [@vince62s](https://github.com/vince62s)
 * Gemma 3 conversion improvements (#1991) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
 * Fixes cross attention tests and refactors code (#1974) by [@jordimas](https://github.com/jordimas)

From b3adf191a42cc436a603773e9ed7f3afcec31645 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Mon, 2 Feb 2026 20:09:44 +0100
Subject: [PATCH 06/12] version

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b0b0031ad..5dbed8e49 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@
 
 ### Fixes and improvements
 
-## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-02-02)
+## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.7.0) (2026-02-02)
 
 ### New features
 

From 241c72fad2699287b80db03ad3e4c1d0d11480dd Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Mon, 2 Feb 2026 20:15:39 +0100
Subject: [PATCH 07/12] ROC releases

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 215249c86..a00c2c3ae 100644
--- a/README.md
+++ b/README.md
@@ -58,6 +58,8 @@ generator.generate_batch(start_tokens)
 
 See the [documentation](https://opennmt.net/CTranslate2) for more information and examples.
 
+If you have an AMD ROCm GPU, we provide specific Python wheels and builds on the [releases page](https://github.com/OpenNMT/CTranslate2/releases/).
+
 ## Benchmarks
 
 We translate the En->De test set *newstest2014* with multiple models:

From c0b630af267ab14d6dc04cd6329128f0917ec5c5 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Tue, 3 Feb 2026 01:39:25 +0100
Subject: [PATCH 08/12] Update date

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5dbed8e49..44940b6d3 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@
 
 ### Fixes and improvements
 
-## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.7.0) (2026-02-02)
+## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.7.0) (2026-02-03)
 
 ### New features
 

From b5a43ce66d753f19972292a9bbf7c2e3070719e3 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Tue, 3 Feb 2026 01:44:16 +0100
Subject: [PATCH 09/12] typos

---
 docs/faq.md         | 2 +-
 docs/translation.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index 35f7ad3da..bcf7a8ef7 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -16,7 +16,7 @@ CTranslate2 addresses these issues in several ways:
 
 Here are some scenarios where this project could be used:
 
-* You want to accelarate Transformer models for production usage, especially on CPUs.
+* You want to accelerate Transformer models for production usage, especially on CPUs.
 * You need to embed models in an existing C++ application without adding large dependencies.
 * Your application requires custom threading and memory usage control.
 * You want to reduce the model size on disk and/or memory.
diff --git a/docs/translation.md b/docs/translation.md
index 9ff8e8d1d..617805202 100644
--- a/docs/translation.md
+++ b/docs/translation.md
@@ -81,7 +81,7 @@ It is a text file where each line has the following format:
 src_1 src_2 ... src_Ntgt_1 tgt_2 ... tgt_K
 ```
 
-If the source N-gram is empty (N = 0), the assiocated target tokens will always be included in the reduced vocabulary.
+If the source N-gram is empty (N = 0), the associated target tokens will always be included in the reduced vocabulary.
 
 ```{hint}
 See [here](https://github.com/OpenNMT/papers/tree/master/WNMT2018/vmap) for an example on how to generate this file.

From d1ec84a4a9a783c3d353b12def683fa816f61302 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Tue, 3 Feb 2026 01:53:27 +0100
Subject: [PATCH 10/12] docs

---
 CMakeLists.txt       | 2 +-
 docs/installation.md | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 82ddf0a55..cf80e37b5 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -14,7 +14,7 @@ option(WITH_OPENBLAS "Compile with OpenBLAS backend" OFF)
 option(WITH_RUY "Compile with Ruy backend" OFF)
 option(WITH_CUDA "Compile with CUDA backend" OFF)
 option(WITH_CUDNN "Compile with cuDNN backend" OFF)
-option(WITH_HIP "Compile with HIP backend" OFF)
+option(WITH_HIP "Compile with AMD HIP GPU backend" OFF)
 option(CUDA_DYNAMIC_LOADING "Dynamically load CUDA libraries at runtime" OFF)
 option(ENABLE_CPU_DISPATCH "Compile CPU kernels for multiple ISA and dispatch at runtime" ON)
 option(ENABLE_PROFILING "Compile with profiling support" OFF)
diff --git a/docs/installation.md b/docs/installation.md
index 2efa0fc39..6d59fdb93 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -11,7 +11,7 @@ pip install ctranslate2
 The Python wheels have the following requirements:
 
 * OS: Linux (x86-64, AArch64), macOS (x86-64, ARM64), Windows (x86-64)
-* Python version: >= 3.7
+* Python version: >= 3.9
 * pip version: >= 19.3 to support `manylinux2014` wheels
 
 ```{admonition} GPU support
@@ -114,6 +114,7 @@ The following options can be set with `-DOPTION=VALUE` during the CMake configur
 | WITH_ACCELERATE | **OFF**, ON | Compiles with the Apple Accelerate backend |
 | WITH_OPENBLAS | **OFF**, ON | Compiles with the OpenBLAS backend |
 | WITH_RUY | **OFF**, ON | Compiles with the Ruy backend |
+| WITH_HIP | **OFF**, ON | Compiles with the AMD HIP GPU backend |
 
 Some build options require additional dependencies. See their respective documentation for installation instructions.
 
@@ -123,6 +124,7 @@ Some build options require additional dependencies. See their respective documen
 * `-DWITH_DNNL=ON` requires [oneDNN](https://github.com/oneapi-src/oneDNN) >= 3.0
 * `-DWITH_ACCELERATE=ON` requires [Accelerate](https://developer.apple.com/documentation/accelerate)
 * `-DWITH_OPENBLAS=ON` requires [OpenBLAS](https://github.com/xianyi/OpenBLAS)
+* `-DWITH_HIP=ON` requires [ROCm libraries](https://rocm.docs.amd.com/en/latest/reference/api-libraries.html)
 
 Multiple backends can be enabled for a single build, for example:
 

From 3ca8287f65dc89304ca0aad04d285db1c4ec4de2 Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Tue, 3 Feb 2026 01:58:44 +0100
Subject: [PATCH 11/12] Update link

---
 docs/installation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/installation.md b/docs/installation.md
index 6d59fdb93..0fc4263b7 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -29,7 +29,7 @@ On Windows [the Visual C++ runtime](https://www.microsoft.com/en-US/download/det
 Docker images can be downloaded from the [GitHub Container registry](https://github.com/OpenNMT/CTranslate2/pkgs/container/ctranslate2):
 
 ```bash
-docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda11.2
+docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda12.8
 ```
 
 The images include:

From ba446ce4281ae1d299f2dc7da940c34afebf3c7f Mon Sep 17 00:00:00 2001
From: Jordi Mas
Date: Tue, 3 Feb 2026 02:01:34 +0100
Subject: [PATCH 12/12] Docs

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a00c2c3ae..2deb30255 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ generator.generate_batch(start_tokens)
 
 See the [documentation](https://opennmt.net/CTranslate2) for more information and examples.
 
-If you have an AMD ROCm GPU, we provide specific Python wheels and builds on the [releases page](https://github.com/OpenNMT/CTranslate2/releases/).
+If you have an AMD ROCm GPU, we provide specific Python wheels on the [releases page](https://github.com/OpenNMT/CTranslate2/releases/).
 
 ## Benchmarks
 