Skip to content

Conversation

@benITo47
Copy link
Contributor

@benITo47 benITo47 commented Feb 3, 2026

Description

This PR migrates us from tokenisers-cpp to PyTorch tokenisers bundled with executorch

Introduces a breaking change?

  • Yes
  • No - User faces no changes

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

This changes need to be tested manually. Try running all our apps that consume tokenizers and see whether the output is ok.
By "OK" I mean

  • no unwanted special tokens
  • response that is related to the question (if model start answering about aquariums after being asked about ketchup, then it's not ok)
  • Output has proper punctuation, no missing or added spaces.

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

@benITo47 benITo47 requested review from chmjkb and msluszniak and removed request for chmjkb February 3, 2026 22:04
@benITo47 benITo47 force-pushed the @bo/change_tokenizers branch from e896b8e to cc00c58 Compare February 3, 2026 22:07
@benITo47 benITo47 changed the title Migrate to PyTorch tokenisers Migrate to PyTorch tokenizers Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants