
Commit 8dd9c6d

feat(tokenizer): fix decoding and normalization for BERT, BGE, GTE, and Albert
1 parent 76136b8 commit 8dd9c6d

4 files changed

Lines changed: 580 additions & 80 deletions

File tree

include/tokenizer.hpp

Lines changed: 3 additions & 0 deletions
@@ -58,6 +58,9 @@ class PreTrainedTokenizer {
     // --- Loading ---
     bool load_from_json_str(const std::string& json_content);
 
+    // --- Configuration ---
+    void set_clean_up_tokenization_spaces(bool clean);
+
 private:
     struct Impl; // Forward declaration
     std::unique_ptr<Impl> impl_;
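
The hunk only declares the setter, so the following is a minimal sketch of how such a flag might be stored behind the pimpl and applied after decoding. Only the name and signature of `set_clean_up_tokenization_spaces` come from the diff. `SketchTokenizer`, `finish_decode`, `replace_all`, the `clean_up_spaces` member and its default value are hypothetical, and the replacement rules are modeled on the clean-up step in Hugging Face Transformers; whether this commit applies the same rules is an assumption.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PreTrainedTokenizer. Only the setter's name and
// signature are taken from the diff; everything else is illustrative.
class SketchTokenizer {
public:
    SketchTokenizer();
    ~SketchTokenizer();

    // --- Configuration ---
    void set_clean_up_tokenization_spaces(bool clean);

    // Hypothetical helper: post-processes an already decoded string.
    std::string finish_decode(std::string text) const;

private:
    struct Impl; // Forward declaration
    std::unique_ptr<Impl> impl_;
};

struct SketchTokenizer::Impl {
    bool clean_up_spaces = true; // assumed default, not confirmed by the diff
};

SketchTokenizer::SketchTokenizer() : impl_(new Impl()) {}
SketchTokenizer::~SketchTokenizer() = default;

void SketchTokenizer::set_clean_up_tokenization_spaces(bool clean) {
    impl_->clean_up_spaces = clean; // the flag lives in the hidden Impl
}

// Replace every occurrence of `from` in `s` with `to`.
static void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

std::string SketchTokenizer::finish_decode(std::string text) const {
    if (!impl_->clean_up_spaces) {
        return text; // clean-up disabled: return the decoded text untouched
    }
    // Assumed rule set: remove the space before punctuation and contractions.
    static const std::vector<std::pair<std::string, std::string>> rules = {
        {" .", "."},   {" ?", "?"},     {" !", "!"},   {" ,", ","},
        {" ' ", "'"},  {" n't", "n't"}, {" 'm", "'m"}, {" 's", "'s"},
        {" 've", "'ve"}, {" 're", "'re"},
    };
    for (const auto& rule : rules) {
        replace_all(text, rule.first, rule.second);
    }
    return text;
}
```

Under these assumptions, `finish_decode("hello , world !")` yields `hello, world!` while the flag is on and returns the input unchanged after `set_clean_up_tokenization_spaces(false)`. Keeping the flag in `Impl` means the public header gains only the one declaration shown in the diff.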
