Description
I work in a field where "R&D" (research and development) and "SMEs" (small and medium enterprises) are important concepts. If I tokenize the text myself and pass the counts to generate_from_frequencies, Word Cloud displays "r&d" correctly but does not display "smes", even though I have verified that it is prominent in my frequency count. If I let Word Cloud tokenize the text, it displays "smes" correctly but renders "R&D" as "r d" (i.e., with a space where there should be an ampersand). Is there anything I can do?
Steps/Code to Reproduce
Example:
import collections
import wordcloud
# text_path and stop_words are defined elsewhere

with open(text_path, 'r', encoding='utf-8') as file:
    text = file.read()

# Count space-separated tokens myself
frequencies = collections.Counter()
for word in text.split(" "):
    frequencies[word] += 1
frequencies = dict(frequencies)

# This shows "smes" and "r d" (but no ampersand)
cloud = wordcloud.WordCloud(width=1920, height=1080,
                            background_color='white',
                            stopwords=stop_words.STOP_WORDS,
                            font_path="./assets/fonts/roboto/Roboto-Regular.ttf").generate(text)

# This shows "r&d" but not "smes"
cloud = wordcloud.WordCloud(width=1920, height=1080,
                            background_color='white',
                            stopwords=set(),
                            font_path="./assets/fonts/roboto/Roboto-Regular.ttf").generate_from_frequencies(frequencies)
Expected Results
Either way, I should be able to get both "smes" and "r&d" on the same Word Cloud.
Actual Results
As described above, in one case I get "smes" and "r d", and in the other case, I get "r&d" but no "smes".
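For reference, here is a minimal sketch of the kind of tokenization I am hoping for, using only the standard library. The pattern, the helper name count_tokens, and the sample sentence are my own assumptions for illustration and are not part of Word Cloud. The pattern treats "&" and "'" as joiners inside a word, so "R&D" stays one token, and it also lowercases and strips punctuation, which text.split(" ") does not do. My guess (unverified) is that this may be why the "smes" counts in my own tokenization end up split across variants such as "SMEs," and "SMEs.".

```python
import collections
import re

# Assumed pattern: runs of word characters, optionally joined by "&" or "'",
# so "R&D" and "don't" each stay a single token.
TOKEN_PATTERN = r"\w+(?:[&']\w+)*"


def count_tokens(text, stopwords=frozenset()):
    """Lowercase the text, tokenize it, and count tokens not in stopwords."""
    tokens = re.findall(TOKEN_PATTERN, text.lower())
    return collections.Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample text, only to exercise the helper.
sample = "R&D spending by SMEs rose. SMEs, unlike large firms, fund R&D\ninternally."
frequencies = count_tokens(sample, stopwords={"by", "the"})
print(frequencies.most_common(3))
```

The resulting counts could presumably be passed as dict(frequencies) to generate_from_frequencies; since the documentation says stopwords are ignored on that path, the sketch filters them itself. Alternatively, the same pattern might work as the regexp argument of WordCloud when calling generate(text), possibly together with collocations=False, but I have not confirmed that this produces the cloud I expect.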
Versions
Windows-11-10.0.22631-SP0
Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)]
NumPy 1.26.4
matplotlib 3.9.0
wordcloud 1.9.3