Skip to content

Panic (stack overflow) when encoding a certain string #245

@Crazytieguy

Description

@Crazytieguy

Hi, I'm getting a panic when trying to encode the attached file with the gpt-4 tokenizer. This is from the AMPS dataset that was published along with the MATH dataset. Backtrace:

called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
   3: _tiktoken::CoreBPE::_encode_native
   4: _tiktoken::_::<impl _tiktoken::CoreBPE>::__pymethod_encode__
   5: pyo3::impl_::trampoline::fastcall_with_keywords
   6: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5258:29
   7: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
   8: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
   9: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  10: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  11: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:89:18
  12: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  13: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  14: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  15: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  16: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  17: do_call_core
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:7357:12
  18: _PyEval_EvalFrameDefault
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:5381:22
  19: _PyEval_EvalFrame
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_ceval.h:73:16
  20: _PyEval_Vector
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/ceval.c:6439:24
  21: _PyFunction_Vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/call.c:393:16
  22: _PyObject_VectorcallTstate
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Include/internal/pycore_call.h:92:11
  23: method_vectorcall
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Objects/classobject.c:67:20
  24: thread_run
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/./Modules/_threadmodule.c:1092
  25: pythread_wrapper
             at /tmp/python-build.20230808162458.7883/Python-3.11.4/Python/thread_pthread.h:241:5
  26: <unknown>
  27: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.```

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions