Lemmatizing documents and keyphrases #9

@hboisgibault

Lemmatization can produce better-quality keyphrases, since variants of the same keyphrase (for example singular and plural forms) will be grouped together.
Adding lemmatization as an option could be a great feature.

If the option is activated, the 'lemmatizer' component will be added to the spaCy pipeline, and keyphrases will be built from the lemmas of words instead of the raw text.
There should also be a function to retrieve the lemmatized documents, which will be built and stored while the pipeline runs. They are needed to calculate tf-idf, since lemmatized keyphrases have to be matched against lemmatized text.
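
To make the idea concrete, here is a minimal sketch of the intended behaviour. It is not the code from the branch: the toy lemma table stands in for spaCy's `token.lemma_`, and `build_counts`, `use_lemmatizer`, `DOCS` and `KEYPHRASES` are hypothetical names chosen only for illustration.

```python
# Toy lemma table standing in for spaCy's `token.lemma_` (illustrative only).
TOY_LEMMAS = {"networks": "network", "models": "model"}

# Hypothetical input: raw documents and the keyphrases extracted from them.
DOCS = [
    "Neural networks are language models",
    "A neural network is a language model",
]
KEYPHRASES = [
    "neural networks",
    "neural network",
    "language models",
    "language model",
]


def lemmatize_document(text):
    """Lowercase the text and replace each token by its lemma if one is known."""
    tokens = text.lower().split()
    return " ".join(TOY_LEMMAS.get(token, token) for token in tokens)


def count_phrase(tokens, phrase_tokens):
    """Count how often the token sequence `phrase_tokens` occurs in `tokens`."""
    n = len(phrase_tokens)
    return sum(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def build_counts(documents, keyphrases, use_lemmatizer=False):
    """Return the (optionally lemmatized) documents and per-document keyphrase counts."""
    normalize = lemmatize_document if use_lemmatizer else str.lower
    docs = [normalize(doc) for doc in documents]
    # Keyphrases that share a lemmatized form collapse into one vocabulary entry.
    vocabulary = sorted({normalize(phrase) for phrase in keyphrases})
    counts = {
        phrase: [count_phrase(doc.split(), phrase.split()) for doc in docs]
        for phrase in vocabulary
    }
    return docs, counts


lemmatized_docs, lemma_counts = build_counts(DOCS, KEYPHRASES, use_lemmatizer=True)
_, raw_counts = build_counts(DOCS, KEYPHRASES)
print(lemmatized_docs)
print(lemma_counts)
print(raw_counts)
```

Without the option, the singular and plural forms stay separate keyphrases and each one matches only one document. With it, they collapse into a single entry, and the counts are taken on the stored lemmatized documents, which is what a tf-idf step would need.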

I started a branch to build this feature: https://github.com/Logora/KeyphraseVectorizers/tree/use_lemmatizer

Labels: enhancement (New feature or request)
