Spacy's nlp_maxlength #6

@D0cRandom

With the CrazyTokenizer (excellent results, btw, thanks!) I am running into spaCy's maximum text length: "[E088] Text of length 3029371 exceeds maximum of 1000000."
You can change nlp.max_length, but for that you have to load spaCy yourself.
Is there a way to set nlp.max_length when loading the CrazyTokenizer?
(I know I could simply cut the file into three parts, but I'd rather avoid that, as I'd have to manually stitch the resulting token sets back together, and I'll have to do this for several files.)
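
For reference, the split-and-stitch workaround could be automated along these lines. This is only a rough sketch, not part of CrazyTokenizer: `split_on_whitespace` and `tokenize_long_text` are hypothetical helper names, `tokenize` stands for any callable that maps a string to a list of tokens, and the 1,000,000 limit is taken from the E088 message above.

```python
from typing import Callable, Iterator, List

# Limit reported in the E088 message (assumed to be the active nlp.max_length).
MAX_CHARS = 1_000_000


def split_on_whitespace(text: str, max_chars: int = MAX_CHARS) -> Iterator[str]:
    """Yield consecutive pieces of at most max_chars characters.

    Pieces are cut at a space or newline where one exists, so words are
    not split in half; a piece without any whitespace is cut hard.
    """
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Search backwards for the last whitespace inside the window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut
        yield text[start:end]
        start = end


def tokenize_long_text(
    text: str,
    tokenize: Callable[[str], List[str]],  # hypothetical: any str -> tokens function
    max_chars: int = MAX_CHARS,
) -> List[str]:
    """Tokenize each piece separately and concatenate the token lists."""
    tokens: List[str] = []
    for piece in split_on_whitespace(text, max_chars):
        tokens.extend(tokenize(piece))
    return tokens
```

Assuming the tokenizer exposes a method that takes a string and returns a list of tokens, it would be passed in as the `tokenize` argument. One caveat: anything the tokenizer treats as a multi-word unit could come out differently if it happens to straddle a piece boundary.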
