Description
How to use GitHub
- Please use the 👍 reaction to show that you are interested in the same feature.
- Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
- Subscribe to receive notifications on status change and new comments.
Feature request
Which Nextcloud Version are you currently using: v32.0.0
Is your feature request related to a problem? Please describe.
Large input texts get cut off at some point in the corresponding translated output. Where the cutoff happens seems to vary with the chosen target language, and the max_decoding_length parameter does not help much here, even with high values.
Describe the solution you'd like
Chunk the input text, perhaps into pieces of around 100 words, to keep the translation inputs small and digestible for the model.
Note: splitting and joining the texts will need some special care depending on the language of the input text, to handle different separators, RTL languages and languages written without spaces.
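A minimal sketch of the naive word-based chunking described above (the function name and chunk size are illustrative, not part of the app; as noted, this whitespace split only works for languages that use spaces):

```python
def chunk_text(text: str, max_words: int = 100) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    Naive sketch: assumes the language uses spaces as word separators.
    RTL and no-space languages (e.g. Japanese) would need a different
    segmentation strategy.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Joining the translated chunks back together would then be the reverse step, with the same language-dependent caveats.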
Describe alternatives you've considered
Split the input text by hand.
Additional context
- related: Setting the maximum number of words #68
- this function can be used for this purpose. It uses the same translate_batch function under the hood that we use here now: https://opennmt.net/CTranslate2/python/ctranslate2.Translator.html#ctranslate2.Translator.translate_iterable
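A rough sketch of how chunked input could be streamed through translate_iterable. The model directory and the tokenize() helper are placeholders (the real app would use the model's own tokenizer), so only the chunk generator is meant literally here:

```python
def tokenize(chunk: str) -> list[str]:
    # Placeholder: the real app would tokenize with the model's
    # SentencePiece/BPE tokenizer, not a whitespace split.
    return chunk.split()

def iter_token_chunks(text: str, max_words: int = 100):
    """Yield tokenized ~max_words-word chunks, ready for translate_iterable."""
    words = text.split()
    for i in range(0, len(words), max_words):
        yield tokenize(" ".join(words[i:i + max_words]))

def translate_long_text(text: str, model_dir: str = "model/") -> str:
    # Requires a converted CTranslate2 model at model_dir (placeholder path).
    import ctranslate2

    translator = ctranslate2.Translator(model_dir)
    # translate_iterable consumes an iterable of token lists and streams
    # TranslationResult objects, batching via translate_batch internally.
    results = translator.translate_iterable(iter_token_chunks(text))
    return " ".join(" ".join(r.hypotheses[0]) for r in results)
```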