Dear Author,
I encountered the error below when I tried to run the script file token_grammar_recognizer.py:
Traceback (most recent call last):
File "d:\transformers-CFG\transformers_cfg\token_grammar_recognizer.py", line 288, in <module>
input_text = file.read()
File "C:\Users\dhana\miniconda3\envs\decoding\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 163: character maps to <undefined>
This is the main file:
if __name__ == "__main__":
    from transformers import AutoTokenizer

    with open("D:/transformers-CFG/examples/grammars/japanese.ebnf", "r") as file:
        input_text = file.read()
    parsed_grammar = parse_ebnf(input_text)
    parsed_grammar.print()

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    tokenRecognizer = IncrementalTokenRecognizer(
        grammar_str=input_text,
        start_rule_name="root",
        tokenizer=tokenizer,
        unicode=True,
    )

    japanese = "トリーム"  # "こんにちは"
    token_ids = tokenizer.encode(japanese)
    # 13298, 12675, 12045, 254
    init_state = None
    state = tokenRecognizer._consume_token_ids(token_ids, init_state, as_string=False)

    if state.stacks:
        print("The Japanese input is accepted")
    else:
        print("The Japanese input is not accepted")
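For context on what I think is happening: the traceback shows Python decoding the grammar file with cp1252, the Windows locale default when `open()` is called without an explicit `encoding` argument, and cp1252 leaves byte 0x81 undefined, which matches the error message. The sketch below (using a hypothetical temporary file standing in for japanese.ebnf) reproduces the failure and shows that passing `encoding="utf-8"` reads the file correctly:

```python
# Minimal sketch of the suspected cause, using an illustrative temp file
# rather than the real japanese.ebnf: without an explicit encoding, open()
# on Windows falls back to the locale codec (cp1252), which cannot decode
# the multi-byte UTF-8 sequences of Japanese text.
import os
import tempfile

grammar = 'root ::= "こんにちは"\n'  # hypothetical UTF-8 grammar content

path = os.path.join(tempfile.mkdtemp(), "japanese.ebnf")
with open(path, "w", encoding="utf-8") as f:
    f.write(grammar)

# Reading with cp1252 reproduces the error from the traceback
# (0x81 is one of the bytes cp1252 leaves undefined):
cp1252_failed = False
try:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError:
    cp1252_failed = True

# Passing encoding="utf-8" explicitly reads the file correctly:
with open(path, "r", encoding="utf-8") as f:
    input_text = f.read()

print(cp1252_failed)          # True
print(input_text == grammar)  # True
```

If this diagnosis is right, changing the `open(...)` call in the script to `open("D:/transformers-CFG/examples/grammars/japanese.ebnf", "r", encoding="utf-8")` should avoid the error.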
Could you please help me with this issue?