
Correction for text with punctuation and dash #33

Open
hyunjoolee wants to merge 2 commits into NVIDIA:master from hyunjoolee:master

@hyunjoolee

I found that when a word in the text carries punctuation or a dash, get_arpabet() in text/__init__.py does not clean it into plain text before the dictionary lookup.

For example, words such as "recommendations.", "fbi," and "policy-making" are not found in cmu_dict, so they are left unconverted.
I think this reduces model performance.

So I suggest the code changes attached to this pull request.
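The kind of cleanup described above could look roughly like the sketch below. It is not the attached patch: TOY_CMU_DICT, lookup, get_arpabet_naive and get_arpabet_clean are hypothetical stand-ins, the dictionary entries are illustrative rather than verified CMUdict pronunciations, and it assumes ARPAbet is emitted wrapped in curly braces, as in NVIDIA's Tacotron 2 style text processing.

```python
import re

# Hypothetical stand-in for cmu_dict: uppercase keys mapped to ARPAbet strings.
# The pronunciations are illustrative, not verified CMUdict entries.
TOY_CMU_DICT = {
    "RECOMMENDATIONS": "R EH2 K AH0 M AH0 N D EY1 SH AH0 N Z",
    "FBI": "EH1 F B IY1 AY1",
    "POLICY": "P AA1 L AH0 S IY0",
    "MAKING": "M EY1 K IH0 NG",
}

# Splits a token into leading non-word chars, the word core, and trailing non-word chars.
_TOKEN_RE = re.compile(r"(\W*)(.*?)(\W*)", re.DOTALL)


def lookup(word):
    """Return the ARPAbet string for word, or None if it is not in the dictionary."""
    return TOY_CMU_DICT.get(word.upper())


def get_arpabet_naive(word):
    """Look the raw token up unchanged, so 'fbi,' or 'policy-making' misses."""
    arpabet = lookup(word)
    return "{" + arpabet + "}" if arpabet is not None else word


def get_arpabet_clean(token):
    """Strip surrounding punctuation and split on hyphens before the lookup.

    Punctuation stays outside the braces; hyphenated parts are joined by a space.
    """
    prefix, core, suffix = _TOKEN_RE.fullmatch(token).groups()
    converted = []
    for piece in (p for p in core.split("-") if p):  # skip empty pieces from "--"
        arpabet = lookup(piece)
        converted.append("{" + arpabet + "}" if arpabet is not None else piece)
    return prefix + " ".join(converted) + suffix


if __name__ == "__main__":
    for token in ["recommendations.", "fbi,", "policy-making"]:
        print(f"{token!r}: {get_arpabet_naive(token)!r} -> {get_arpabet_clean(token)!r}")
```

Keeping punctuation outside the braces lets it still reach the symbol sequence as ordinary characters. Replacing the dash with a space so each half gets its own dictionary hit is one possible choice, not the only one.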
