# Exercise 2b: Wordcloudy with a chance of meatballs
A few weeks ago you created some excellent wordclouds.

I imagine all your friends now want to create wordclouds too, but they don’t have any Python knowledge.

You are going to make a web app that allows people to:

- Enter some text
- Generate a wordcloud and display it to the user
- Download the resulting wordcloud

You can then add in some extra features:

- A way to allow them to select the colour scheme
- A default wordcloud (hint: think about setting a default value in your text input)
- An option to upload a text file instead of pasting the text into an input
- A way to upload an image to use as a mask (I’ve provided a sample image that is in the right format to work)
- A way to remove words they want excluded (this one’s a bit trickier!)

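
For the word-exclusion feature, one possible starting point is to merge the user's words into the stopword set before the wordcloud is generated. The sketch below is only an illustration: `build_stopwords` is a hypothetical helper, it assumes the user types single words separated by commas, and `example_base` stands in for the `STOPWORDS` set from the `wordcloud` package so the snippet runs on its own. Each word is normalised the same way the wordcloud snippets normalise their tokens (punctuation removed, then lowercased), so that the exclusions can match.

```python
import string


def build_stopwords(base_stopwords, excluded_text):
    """Return base_stopwords plus the user's comma-separated exclusions.

    Hypothetical helper: each word is normalised like the tokens in the
    wordcloud snippets (punctuation removed, then lowercased).
    """
    punctuation_mapping_table = str.maketrans('', '', string.punctuation)
    extra_words = set()
    for raw_word in excluded_text.split(","):
        cleaned = raw_word.strip().translate(punctuation_mapping_table).lower()
        if cleaned:  # skip empty entries, e.g. from a trailing comma
            extra_words.add(cleaned)
    return set(base_stopwords) | extra_words


# Stand-in for wordcloud's STOPWORDS so the sketch runs on its own
example_base = {"the", "and"}
example_stopwords = build_stopwords(example_base, "Penguin, Don't, ,krill ")
print(sorted(example_stopwords))
```

To use something like this, the wordcloud function would need to receive the combined set in place of `set(STOPWORDS)`, for example through an extra argument that you add yourself.
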
Some code snippets to help you build this are in [exercises/2b](https://github.com/hsma-programme/h6_7b_web_apps_1/tree/main/exercises/exercise_2b) - but you can also take a look at them below.
```{python}
#| eval: false
from wordcloud import WordCloud, STOPWORDS
import string
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

def make_wordcloud(text_input, filename="wordcloud.png"):
    # Start from the wordcloud package's built-in stopwords
    stopwords = set(STOPWORDS)

    # Split the text into tokens, strip punctuation and lowercase everything
    tokens = text_input.split()
    punctuation_mapping_table = str.maketrans('', '', string.punctuation)
    tokens_stripped_of_punctuation = [token.translate(punctuation_mapping_table)
                                      for token in tokens]
    lower_tokens = [token.lower() for token in tokens_stripped_of_punctuation]

    joined_string = (" ").join(lower_tokens)

    wordcloud = WordCloud(width=1800,
                          height=1800,
                          stopwords=stopwords,
                          min_font_size=20).generate(joined_string)

    plt.figure(figsize=(30,40))
    # Turn off axes
    plt.axis("off")
    # Display (essential to actually get the wordcloud in the image)
    plt.imshow(wordcloud)
    # Save the wordcloud to a file
    plt.savefig(filename)

###################################################
# EXAMPLE USE: With string, which will be the output
# of st.text_input() or st.text_area()
###################################################
penguin_text = """
Penguins are a group of aquatic flightless birds from the family Spheniscidae
of the order Sphenisciformes.
They live almost exclusively in the Southern Hemisphere: only one species,
the Galapagos penguin, is found north of the Equator. Highly adapted for life in the ocean water,
penguins have countershaded dark and white plumage and flippers for swimming. Most penguins feed
on krill, fish, squid and other forms of sea life which they catch with their bills and swallow
whole while swimming. A penguin has a spiny tongue and powerful jaws to grip slippery prey.
They spend about half of their lives on land and the other half in the sea.
The largest living species is the emperor penguin (Aptenodytes forsteri):
on average, adults are about 1.1 m (3 ft 7 in) tall and weigh 35 kg (77 lb).
The smallest penguin species is the little blue penguin (Eudyptula minor),
also known as the fairy penguin, which stands around 30–33 cm (12–13 in) tall and
weighs 1.2–1.3 kg (2.6–2.9 lb).
Today, larger penguins generally inhabit colder regions, and smaller penguins inhabit regions
with temperate or tropical climates. Some prehistoric penguin species were enormous:
as tall or heavy as an adult human. There was a great diversity of species in subantarctic regions,
and at least one giant species in a region around 2,000 km south of the equator 35 mya, during
the Late Eocene, a climate decidedly warmer than today.
"""
make_wordcloud(penguin_text, "penguin_sample_wordcloud.png")
###################################################
# EXAMPLE USE: With .txt file
###################################################
# Read text in for which we want to generate word cloud
# The read() method of the file object simply reads in the contents of the file
# as one, single continuous string of text.
with open("bttf_reviews.txt", "r") as f:
    bttf_text = f.read()

make_wordcloud(bttf_text, "bttf_sample_wordcloud.png")
###################################################
###################################################
# Advanced wordcloud function
# This accepts an optional additional image to act
# as a 'mask'
# It also allows users to pass in additional
# parameters that are accepted by the wordcloud
# function itself
###################################################
###################################################
def make_wordcloud_with_image_mask(
    text_input,
    filename="wordcloud.png",
    mask_image=None,
    **kwargs
    ):

    stopwords = set(STOPWORDS)

    # Split the text into tokens, strip punctuation and lowercase everything
    tokens = text_input.split()
    punctuation_mapping_table = str.maketrans('', '', string.punctuation)
    tokens_stripped_of_punctuation = [token.translate(punctuation_mapping_table)
                                      for token in tokens]
    lower_tokens = [token.lower() for token in tokens_stripped_of_punctuation]

    joined_string = (" ").join(lower_tokens)

    plt.figure(figsize=(30,40))
    plt.axis("off")

    if mask_image is not None:
        # Load the mask image and convert it to an array for WordCloud
        mask_image_opened = Image.open(mask_image)
        mask_array = np.array(mask_image_opened)

        wordcloud = WordCloud(width=mask_array.shape[1],
                              height=mask_array.shape[0],
                              stopwords=stopwords,
                              mask=mask_array,
                              **kwargs).generate(joined_string)

        plt.imshow(wordcloud, interpolation='bilinear')
    else:
        wordcloud = WordCloud(width=1800,
                              height=1800,
                              stopwords=stopwords,
                              **kwargs).generate(joined_string)

        plt.imshow(wordcloud)

    # Save the wordcloud to a file
    plt.savefig(filename)

make_wordcloud_with_image_mask(penguin_text,
                               "penguin_sample_wordcloud_mask.png",
                               mask_image="penguin.jpg"
                               )

make_wordcloud_with_image_mask(penguin_text,
                               "penguin_sample_wordcloud_mask_smaller_text.png",
                               mask_image="penguin.jpg",
                               min_font_size=6
                               )

make_wordcloud_with_image_mask(bttf_text,
                               "bttf_sample_wordcloud_blue.png",
                               colormap='Blues'
                               )

make_wordcloud_with_image_mask(bttf_text,
                               "bttf_sample_wordcloud_pink_background_blue_text.png",
                               colormap='Blues',
                               background_color='pink'
                               )
```
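
Both functions above expect the text as a single string. If you add the text-file upload option, bear in mind that Streamlit's `st.file_uploader()` hands back a file-like object containing bytes rather than a string, so the contents need decoding first. Here is a minimal sketch of that step, assuming the upload is UTF-8 text with a Latin-1 fallback. `read_uploaded_text` is a hypothetical helper name, and `io.BytesIO` stands in for the uploaded file so the snippet runs without Streamlit.

```python
import io


def read_uploaded_text(uploaded_file):
    """Return the contents of a binary file-like object as a string."""
    raw_bytes = uploaded_file.read()
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to Latin-1, which can decode any byte sequence
        return raw_bytes.decode("latin-1")


# io.BytesIO stands in for the uploaded file so the sketch runs on its own
fake_upload = io.BytesIO("Great Scott! This film is heavy.".encode("utf-8"))
bttf_text_from_upload = read_uploaded_text(fake_upload)
print(bttf_text_from_upload)
```

The resulting string could then be passed to `make_wordcloud()` in the same way as `bttf_text`.
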