UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 #52

@dhananjaybhandiwad

Description

Dear Author,
I encountered the error below when I tried to run the script file token_grammar_recognizer.py:

Traceback (most recent call last):
  File "d:\transformers-CFG\transformers_cfg\token_grammar_recognizer.py", line 288, in <module>
    input_text = file.read()
  File "C:\Users\dhana\miniconda3\envs\decoding\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 163: character maps to <undefined>
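For context, the failure can be reproduced in isolation: byte 0x81 is part of a multi-byte UTF-8 sequence but has no mapping in Windows-1252 (cp1252), which is the default locale codec the file was opened with. A minimal sketch, assuming the grammar file contains UTF-8-encoded Japanese text:

```python
# "こ" encodes to the UTF-8 bytes e3 81 93; 0x81 is undefined in cp1252,
# so decoding the UTF-8 bytes with cp1252 raises the same error as the traceback.
data = "こんにちは".encode("utf-8")
try:
    data.decode("cp1252")
except UnicodeDecodeError as err:
    print(err.reason, "at byte", hex(data[err.start]))
```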

This is the `__main__` block of the file:

if __name__ == "__main__":
    from transformers import AutoTokenizer

    with open("D:/transformers-CFG/examples/grammars/japanese.ebnf", "r") as file:
        input_text = file.read()
    parsed_grammar = parse_ebnf(input_text)
    parsed_grammar.print()

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    tokenRecognizer = IncrementalTokenRecognizer(
        grammar_str=input_text,
        start_rule_name="root",
        tokenizer=tokenizer,
        unicode=True,
    )



    japanese = "トリーム"  # "こんにちは"
    token_ids = tokenizer.encode(japanese)
    # 13298, 12675, 12045, 254
    init_state = None
    state = tokenRecognizer._consume_token_ids(token_ids, init_state, as_string=False)

    if state.stacks:
        print("The Japanese input is accepted")
    else:
        print("The Japanese input is not accepted")

Could you please help me with this issue?
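A likely workaround (a sketch, not confirmed by the maintainers): on Windows, `open()` defaults to the locale codec, so passing `encoding="utf-8"` explicitly should let the Japanese grammar be read portably. The temporary-file path below stands in for the real grammar path:

```python
import os
import tempfile

# A grammar line containing multi-byte UTF-8 characters, like japanese.ebnf.
grammar = 'root ::= "こんにちは"\n'

path = os.path.join(tempfile.mkdtemp(), "japanese.ebnf")
with open(path, "w", encoding="utf-8") as f:
    f.write(grammar)

# Without encoding="utf-8", Windows falls back to cp1252 and raises
# UnicodeDecodeError on these bytes; specifying the encoding avoids that.
with open(path, "r", encoding="utf-8") as f:
    input_text = f.read()
```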
