Support large input texts with more than 250 words #71

@kyteinsky

How to use GitHub

  • Please use the 👍 reaction to show that you are interested in the same feature.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Feature request

Which Nextcloud Version are you currently using: v32.0.0

Is your feature request related to a problem? Please describe.
Large input texts get cut off at some point in the corresponding translated output. Where the cut-off happens seems to vary with the target language, and raising the max_decoding_length parameter does not help much, even with high values.

Describe the solution you'd like
Chunk the input text, perhaps into pieces of around 100 words, to keep each translation input small and digestible for the model.
Note: splitting and joining the text will need special care depending on the language of the input: different sentence separators, RTL languages, and languages written without spaces.
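
A rough sketch of what such chunking could look like, for discussion only. The names `split_sentences`, `chunk_text` and `translate_long_text` are hypothetical and not part of the app, and the `translate` callable stands in for whatever function performs the actual model call. The punctuation set is an assumption and not exhaustive, and `max_chars` is an assumed fallback budget for scripts without spaces, where counting words does not work.

```python
import re
from typing import Callable, List

# Assumed, non-exhaustive set of sentence terminators:
# Latin, CJK, Arabic, Urdu and Devanagari punctuation.
_SENTENCE = re.compile(r".+?(?:[.!?。！？؟۔।]+\s*|$)", re.S)


def split_sentences(text: str) -> List[str]:
    """Split text into sentence-like pieces, keeping trailing whitespace.

    Joining the pieces with "" gives back the original text.
    Naive: it also splits inside numbers such as "3.14".
    """
    return _SENTENCE.findall(text)


def chunk_text(text: str, max_words: int = 100, max_chars: int = 600) -> List[str]:
    """Group sentences into chunks that stay within both budgets.

    max_words covers space-separated languages; max_chars is a
    fallback for languages written without spaces. A single sentence
    longer than the budget is kept whole rather than cut mid-sentence.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = current + sentence
        too_big = len(candidate.split()) > max_words or len(candidate) > max_chars
        if current and too_big:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(
    text: str,
    translate: Callable[[str], str],  # hypothetical per-chunk translator
    joiner: str = " ",                # use "" for no-space target languages
    max_words: int = 100,
    max_chars: int = 600,
) -> str:
    """Translate chunk by chunk and join the results in logical order."""
    chunks = chunk_text(text, max_words=max_words, max_chars=max_chars)
    parts = [translate(chunk.strip()) for chunk in chunks if chunk.strip()]
    return joiner.join(parts)
```

The chunks are joined in logical order, so RTL output relies on the rendering side for display direction. The `joiner` would have to be chosen per target language, and paragraph breaks between chunks are not preserved in this sketch.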

Describe alternatives you've considered
Split the input text by hand.

Additional context
