Use list of POS patterns to reduce runtime #30

@saied71

Hi, thanks for this great package.
Right now I use KeyphraseCountVectorizer to extract keywords based on different POS patterns.
Here is my code:

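from keyphrase_vectorizers import KeyphraseCountVectorizer

# custom_pos and stop_words are defined elsewhere (not shown here)
def kph_extr(docs: list, patt: str) -> list:
    # A new vectorizer is fitted for the single POS pattern `patt`
    vectorizer = KeyphraseCountVectorizer(custom_pos_tagger=custom_pos, stop_words=stop_words, pos_pattern=patt)
    vectorizer.fit(docs)
    return list(vectorizer.get_feature_names_out())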

and here are my POS patterns:

pos_patterns = ["<NOUN><NOUN><NOUN>", "<NOUN><NOUN>", "<NOUN><ADJ>", "<NOUN><ADJ><NOUN>", "<NOUN><NOUN><NOUN><NOUN>", "<NOUN><NOUN><NOUN><NOUN><NOUN>",
                "<ADJ><NOUN><ADJ><NOUN>", "<NOUN><NOUN><ADJ>", "<NOUN><NOUN><NOUN><ADJ>"]

Is there a way to pass a list of POS patterns? I want to run this on a large dataset, and extracting keyphrases one pattern at a time takes a long time.
I think the POS tagging accounts for most of the runtime, so if it could be done once per document, the runtime would go down.
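
To make the "tag once, match many patterns" idea concrete, here is a minimal sketch in plain Python. It is not part of KeyphraseCountVectorizer's API: `parse_pattern`, `match_patterns`, and the sample `tagged_doc` are hypothetical names used only for illustration. It assumes each document has already been POS-tagged once into (token, tag) pairs, treats every pattern as an exact tag sequence (no regex operators), and ignores stop words.

```python
from typing import Dict, List, Sequence, Tuple

TaggedToken = Tuple[str, str]  # (token text, POS tag)


def parse_pattern(pattern: str) -> List[str]:
    """Turn '<NOUN><ADJ>' into ['NOUN', 'ADJ'] (exact tags only)."""
    return [tag for tag in pattern.strip("<>").split("><") if tag]


def match_patterns(
    tagged: Sequence[TaggedToken], patterns: Sequence[str]
) -> Dict[str, List[str]]:
    """For every pattern, collect the phrases whose tag sequence equals it.

    The tagged tokens are reused for all patterns, so the expensive
    tagging step only has to happen once per document.
    """
    tags = [tag for _, tag in tagged]
    results: Dict[str, List[str]] = {p: [] for p in patterns}
    for pattern in patterns:
        wanted = parse_pattern(pattern)
        n = len(wanted)
        if n == 0:
            continue  # skip empty patterns
        # Slide a window of length n over the tag sequence.
        for start in range(len(tags) - n + 1):
            if tags[start:start + n] == wanted:
                phrase = " ".join(tok for tok, _ in tagged[start:start + n])
                if phrase not in results[pattern]:
                    results[pattern].append(phrase)
    return results


# Hypothetical pre-tagged document; in practice the tags would come
# from a single pass of whatever POS tagger is in use.
tagged_doc = [
    ("machine", "NOUN"), ("learning", "NOUN"), ("model", "NOUN"),
    ("is", "AUX"), ("fast", "ADJ"),
]
found = match_patterns(
    tagged_doc, ["<NOUN><NOUN>", "<NOUN><NOUN><NOUN>", "<NOUN><ADJ>"]
)
```

Unlike a chunk-grammar matcher, this sliding window also returns overlapping matches (for example both "machine learning" and "learning model"), so its output may differ from what the vectorizer produces for the same pattern.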

Thanks

Labels: enhancement (New feature or request)
