30 changes: 29 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,35 @@

### Fixes and improvements

## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-XX)
## [v4.7.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.7.0) (2026-02-03)

### New features

* Introduce AMD GPU support with ROCm HIP (#1989) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Compatibility with Transformers v5 (#1999) by [@jordimas](https://github.com/jordimas)

### Fixes and improvements

* Assume less about whisper vocab (#2000) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Use LLVM ThreadSanitizer instead of Google (#1993) by [@3manifold](https://github.com/3manifold)
* Optimize all builds with parallel execution (#1992) by [@3manifold](https://github.com/3manifold)
* Remove unnecessary zero init from conv1d (#1990) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Integrate Clang AddressSanitizer in tests (#1903) by [@3manifold](https://github.com/3manifold)
* Enable multiple of 16 padding for INT8 Tensor Cores (#1982) by [@Purfview](https://github.com/Purfview)
* Add activation and dilation to conv1d (#1979) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Minor refactor to CMakeLists.txt (#1980) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Remove unnecessary check from wav2vec2 (#1977) by [@plan9better](https://github.com/plan9better)
* Add optional residual add to gemm op (#1975) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Implement cuda layernorm axis (#1971) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Fix Eole conversion (#1998) by [@vince62s](https://github.com/vince62s)
* Gemma 3 conversion improvements (#1991) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Add causal flag to fa2 (#1976) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Fixes cross attention tests and refactors code (#1974) by [@jordimas](https://github.com/jordimas)
* Fix CUDA bf16 median filter (#1972) by [@sssshhhhhh](https://github.com/sssshhhhhh)
* Fix various compiler warnings (#1970) by [@sssshhhhhh](https://github.com/sssshhhhhh)


## [v4.6.3](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.3) (2026-01-06)

### New features

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -14,7 +14,7 @@ option(WITH_OPENBLAS "Compile with OpenBLAS backend" OFF)
option(WITH_RUY "Compile with Ruy backend" OFF)
option(WITH_CUDA "Compile with CUDA backend" OFF)
option(WITH_CUDNN "Compile with cuDNN backend" OFF)
option(WITH_HIP "Compile with HIP backend" OFF)
option(WITH_HIP "Compile with AMD HIP GPU backend" OFF)
option(CUDA_DYNAMIC_LOADING "Dynamically load CUDA libraries at runtime" OFF)
option(ENABLE_CPU_DISPATCH "Compile CPU kernels for multiple ISA and dispatch at runtime" ON)
option(ENABLE_PROFILING "Compile with profiling support" OFF)
2 changes: 2 additions & 0 deletions README.md
@@ -58,6 +58,8 @@ generator.generate_batch(start_tokens)

See the [documentation](https://opennmt.net/CTranslate2) for more information and examples.

If you have an AMD ROCm GPU, we provide specific Python wheels on the [releases page](https://github.com/OpenNMT/CTranslate2/releases/).

## Benchmarks

We translate the En->De test set *newstest2014* with multiple models:
2 changes: 1 addition & 1 deletion docs/faq.md
@@ -16,7 +16,7 @@ CTranslate2 addresses these issues in several ways:

Here are some scenarios where this project could be used:

* You want to accelarate Transformer models for production usage, especially on CPUs.
* You want to accelerate Transformer models for production usage, especially on CPUs.
* You need to embed models in an existing C++ application without adding large dependencies.
* Your application requires custom threading and memory usage control.
* You want to reduce the model size on disk and/or memory.
6 changes: 4 additions & 2 deletions docs/installation.md
@@ -11,7 +11,7 @@ pip install ctranslate2
The Python wheels have the following requirements:

* OS: Linux (x86-64, AArch64), macOS (x86-64, ARM64), Windows (x86-64)
* Python version: >= 3.7
* Python version: >= 3.9
* pip version: >= 19.3 to support `manylinux2014` wheels
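
The Python requirement above can be checked before installation. The following is a minimal sketch that assumes the updated minimum of Python 3.9; the names `MIN_PYTHON` and `python_is_supported` are hypothetical helpers and are not part of CTranslate2:

```python
import sys

# Assumed minimum, taken from the updated requirement "Python version: >= 3.9".
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) pair meets the minimum version."""
    # Only the major and minor components matter for wheel compatibility here.
    return tuple(version_info[:2]) >= minimum
```

Calling `python_is_supported()` with no arguments checks the running interpreter; passing a tuple such as `(3, 8)` checks an arbitrary version.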

```{admonition} GPU support
@@ -29,7 +29,7 @@ On Windows [the Visual C++ runtime](https://www.microsoft.com/en-US/download/det
Docker images can be downloaded from the [GitHub Container registry](https://github.com/OpenNMT/CTranslate2/pkgs/container/ctranslate2):

```bash
docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda11.2
docker pull ghcr.io/opennmt/ctranslate2:latest-ubuntu22.04-cuda12.8
```

The images include:
@@ -114,6 +114,7 @@ The following options can be set with `-DOPTION=VALUE` during the CMake configur
| WITH_ACCELERATE | **OFF**, ON | Compiles with the Apple Accelerate backend |
| WITH_OPENBLAS | **OFF**, ON | Compiles with the OpenBLAS backend |
| WITH_RUY | **OFF**, ON | Compiles with the Ruy backend |
| WITH_HIP | **OFF**, ON | Compiles with the AMD HIP GPU backend |

Some build options require additional dependencies. See their respective documentation for installation instructions.

Expand All @@ -123,6 +124,7 @@ Some build options require additional dependencies. See their respective documen
* `-DWITH_DNNL=ON` requires [oneDNN](https://github.com/oneapi-src/oneDNN) >= 3.0
* `-DWITH_ACCELERATE=ON` requires [Accelerate](https://developer.apple.com/documentation/accelerate)
* `-DWITH_OPENBLAS=ON` requires [OpenBLAS](https://github.com/xianyi/OpenBLAS)
* `-DWITH_HIP=ON` requires [ROCm libraries](https://rocm.docs.amd.com/en/latest/reference/api-libraries.html)

Multiple backends can be enabled for a single build, for example:

2 changes: 1 addition & 1 deletion docs/translation.md
@@ -81,7 +81,7 @@ It is a text file where each line has the following format:
src_1 src_2 ... src_N<TAB>tgt_1 tgt_2 ... tgt_K
```

If the source N-gram is empty (N = 0), the assiocated target tokens will always be included in the reduced vocabulary.
If the source N-gram is empty (N = 0), the associated target tokens will always be included in the reduced vocabulary.
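
To make the file format concrete, here is a minimal sketch of how such a file could be read and applied. It is not CTranslate2's implementation: the function names `parse_vmap` and `reduced_vocabulary`, and the `max_order` parameter, are hypothetical and only illustrate the format described above.

```python
from collections import defaultdict


def parse_vmap(lines):
    """Parse lines of the form 'src_1 ... src_N<TAB>tgt_1 ... tgt_K'.

    Returns (always_included, mapping), where always_included holds the
    target tokens of entries with an empty source N-gram (N = 0) and
    mapping goes from a source N-gram tuple to a set of target tokens.
    """
    always_included = set()
    mapping = defaultdict(set)
    for raw in lines:
        # Strip only the newline: a leading TAB marks an empty source N-gram.
        line = raw.rstrip("\n")
        if not line:
            continue
        source, sep, target = line.partition("\t")
        if not sep:
            continue  # no TAB separator, so the line does not match the format
        target_tokens = target.split()
        source_ngram = tuple(source.split())
        if not source_ngram:
            always_included.update(target_tokens)
        else:
            mapping[source_ngram].update(target_tokens)
    return always_included, dict(mapping)


def reduced_vocabulary(source_tokens, always_included, mapping, max_order=2):
    """Collect target tokens for every source N-gram up to max_order."""
    vocabulary = set(always_included)
    for order in range(1, max_order + 1):
        for start in range(len(source_tokens) - order + 1):
            ngram = tuple(source_tokens[start:start + order])
            vocabulary |= mapping.get(ngram, set())
    return vocabulary
```

For example, a line that starts with a TAB contributes its target tokens to every reduced vocabulary, while a line such as `hello world<TAB>hallo welt` contributes only when the source contains that bigram.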

```{hint}
See [here](https://github.com/OpenNMT/papers/tree/master/WNMT2018/vmap) for an example on how to generate this file.
2 changes: 1 addition & 1 deletion python/ctranslate2/version.py
@@ -1,3 +1,3 @@
"""Version information."""

__version__ = "4.6.3"
__version__ = "4.7.0"