|
1 | 1 | --- |
2 | | -title: Week 3 Graded Assignment Solution |
| 2 | +title: Week 3 Graded Assignment |
3 | 3 | weight: 3 |
4 | | -tags: |
5 | | -- statistics |
6 | 4 | categories: |
7 | 5 | - Statistics Graded Assignment |
8 | | -series: |
9 | | -- Statistics Graded Assignment |
10 | | -excludeSearch: false |
11 | | -width: wide |
12 | 6 | --- |
13 | 7 |
|
14 | | -Here are all the questions and their solutions from the PDF **Statistics for Data Science-1, Week-2 Graded Assignment Solution**[^1]: |
15 | | - |
16 | 8 | --- |
17 | 9 |
|
18 | | -## 1. Which of the following statements is/are incorrect? |
| 10 | +**1. The numbers a, b, c, d have frequencies (x + 6), (x + 2), (x − 3) and x respectively. If their mean is m, find the value of x. (Enter the value as next highest integer)** |
19 | 11 |
|
20 | | -**Options:** |
21 | | -(a) To represent the share of a particular category, bar chart is the most appropriate graphical representation. |
22 | | -(b) The multiplication of the total number of observations and relative frequency of a particular observation should be equal to the frequency of that observation. |
23 | | -(c) Mean can be defined for a categorical variable. |
24 | | -(d) Mode of a categorical variable is the widest slice in a pie chart. |
25 | | - |
26 | | -**Answer:** a, c |
27 | 12 | **Solution:** |
28 | | -To show the share of a particular category, a pie chart is the most appropriate graphical representation. Thus, option (a) is incorrect. |
29 | | -Relative frequency for the ith observation is \$ Rf_i = f_i / N \$, so \$ f_i = Rf_i \times N \$. Thus, option (b) is correct. |
30 | | -Mean cannot be defined for categorical data as meaningful mathematical operations are not possible. Thus, option (c) is incorrect. |
31 | | -In a pie chart, the widest slice corresponds to the mode (highest frequency). Thus, option (d) is correct. |
32 | | -Therefore, options (a) and (c) are correct (as the question asks for incorrect statements). |
33 | 13 |
|
34 | | ---- |
| 14 | +$$ |
| 15 | +\frac{a(x + 6) + b(x + 2) + c(x - 3) + dx}{(x + 6) + (x + 2) + (x - 3) + x} = m |
| 16 | +$$ |
35 | 17 |
|
36 | | -## 2. If the exam is for a total of 500 marks, then what is the aggregate distribution of marks in Physics, Maths and Biology? |
| 18 | +$$ |
| 19 | +\frac{ax + 6a + bx + 2b + cx - 3c + dx}{4x + 5} = m |
| 20 | +$$ |
37 | 21 |
|
38 | | -(Refer to Figure 2.1.G, which shows: Physics 35%, Maths 18%, Biology 10%) |
| 22 | +$$ |
| 23 | +ax + bx + cx + dx + 6a + 2b - 3c = m(4x + 5) = (4m)x + 5m |
| 24 | +$$ |
39 | 25 |
|
40 | | -**Answer:** 315 |
41 | | -**Solution:** |
42 | | -Physics: \$ 500 \times 0.35 = 175 \$ |
43 | | -Maths: \$ 500 \times 0.18 = 90 \$ |
44 | | -Biology: \$ 500 \times 0.10 = 50 \$ |
45 | | -Aggregate: \$ 175 + 90 + 50 = 315 \$ |
| 26 | +$$ |
| 27 | +(a + b + c + d - 4m)x = 5m - 6a - 2b + 3c |
| 28 | +$$ |
46 | 29 |
|
47 | | ---- |
| 30 | +$$ |
| 31 | +x = \frac{5m - 6a - 2b + 3c}{a + b + c + d - 4m} |
| 32 | +$$ |
48 | 33 |
|
49 | | -## 3. Choose the correct statement(s): |
| 34 | +Suppose a, b, c, d and m are 2, 7, 9, 17 and 6.88 respectively. Substituting these values, |
50 | 35 |
|
51 | | -**Options:** |
52 | | -(a) The pie chart is misleading because it does not obey the area principle. |
53 | | -(b) The pie chart has round off errors. |
54 | | -(c) The pie chart is not a misleading graph. |
55 | | -(d) The slices of pie chart adds up to 100%. |
| 36 | +$$ |
| 37 | +x = \frac{(5 \times 6.88) - (6 \times 2) - (2 \times 7) + (3 \times 9)}{2 + 7 + 9 + 17 - (4 \times 6.88)} \approx 4.73 |
| 38 | +$$ |
56 | 39 |
|
57 | | -**Answer:** c, d |
58 | | -**Solution:** |
59 | | -The pie chart obeys the area principle and the slices add up to 100%. Thus, options (c) and (d) are correct. |
| 40 | +Rounding up to the next highest integer, x = 5[^1]. |
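
The closed-form expression for x can be checked numerically. The following is a minimal sketch, assuming the frequencies (x + 6), (x + 2), (x − 3) and x from the question; the function name `solve_x` is hypothetical and not part of the assignment.

```python
import math

def solve_x(values, m):
    """Solve for x when a, b, c, d have frequencies x+6, x+2, x-3, x and mean m."""
    a, b, c, d = values
    numerator = 5 * m - 6 * a - 2 * b + 3 * c
    denominator = a + b + c + d - 4 * m
    return numerator / denominator

x = solve_x((2, 7, 9, 17), 6.88)
print(round(x, 2))   # value before rounding up
print(math.ceil(x))  # next highest integer
```

`math.ceil` implements the "next highest integer" instruction in the question.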
60 | 41 |
|
61 | 42 | --- |
62 | 43 |
|
63 | | -## 4. What is the combined relative frequency of the academy A, B and D? |
| 44 | +**2. What is the mean of the original dataset? (Correct up to 2 decimal place accuracy)** |
64 | 45 |
|
65 | | -(Refer to Table 2.1.G: Academy C has 50 players, E has 75 players; total 200 players.) |
66 | | - |
67 | | -**Answer:** 0.375 (Range: 0.370, 0.380) |
68 | 46 | **Solution:** |
69 | | -Relative frequency for C: \$ 50/200 = 0.25 \$ |
70 | | -Relative frequency for E: \$ 75/200 = 0.375 \$ |
71 | | -Combined relative frequency for A, B, D: \$ 1 - (0.25 + 0.375) = 0.375 \$ |
| 47 | +Let the sum of all observations be $T$ for the noted dataset and $T'$ for the original dataset, where the noted dataset records $p$ in place of the original value $x$. |
72 | 48 |
|
73 | | ---- |
| 49 | +$$ |
| 50 | +\text{Mean} = \frac{T}{N} = m \implies T = m \times N |
| 51 | +$$ |
74 | 52 |
|
75 | | -## 5. Median of the given data is: |
| 53 | +$$ |
| 54 | +T' = T - p + x |
| 55 | +$$ |
76 | 56 |
|
77 | | -**Options:** |
78 | | -(a) Academy C |
79 | | -(b) Academy E |
80 | | -(c) Academy D |
81 | | -(d) Median is not defined for the given data |
82 | | -(e) Insufficient data |
| 57 | +$$ |
| 58 | +\text{Mean for original dataset} = \frac{T'}{N} |
| 59 | +$$ |
83 | 60 |
|
84 | | -**Answer:** d |
85 | | -**Solution:** |
86 | | -The data is nominal and cannot be ordered, so median is not defined. |
| 61 | +Suppose N = 8, m = 13, s = 8, x = 18, p = 13: |
87 | 62 |
|
88 | | ---- |
| 63 | +$$ |
| 64 | +T = 13 \times 8 = 104 |
| 65 | +$$ |
89 | 66 |
|
90 | | -## 6. Mode of the given data is: |
| 67 | +$$ |
| 68 | +T' = 104 - 13 + 18 = 109 |
| 69 | +$$ |
91 | 70 |
|
92 | | -**Options:** |
93 | | -(a) Academy C |
94 | | -(b) Academy E |
95 | | -(c) Academy D |
96 | | -(d) Mode is not defined for the given data |
97 | | -(e) Insufficient data |
| 71 | +$$ |
| 72 | +\text{Mean for original dataset} = \frac{109}{8} = 13.625 |
| 73 | +$$ |
98 | 74 |
|
99 | | -**Answer:** b |
100 | | -**Solution:** |
101 | | -Academy E has the highest frequency (75), so it is the mode. |
| 75 | +[^1] |
102 | 76 |
|
103 | 77 | --- |
104 | 78 |
|
105 | | -## 7. Which of the following graphical representations is appropriate for the number of players in each academy for the given data in Table 2.1.G? |
106 | | - |
107 | | -**Options:** |
108 | | -(a) Bar chart |
109 | | -(b) Pie chart |
110 | | -(c) Pareto chart |
111 | | -(d) Both bar chart and pareto chart |
| 79 | +**3. What is the sample variance of the original dataset? (Correct up to 2 decimal place accuracy)** |
112 | 80 |
|
113 | | -**Answer:** d |
114 | 81 | **Solution:** |
115 | | -Bar chart and Pareto chart are both appropriate for showing counts. Pie chart is for proportions. |
| 82 | +Sample variance, |
116 | 83 |
|
117 | | ---- |
| 84 | +$$ |
| 85 | +s^2 = \frac{\sum(x_i - \bar{x})^2}{N-1} |
| 86 | +$$ |
118 | 87 |
|
119 | | -## 8. The data of number of students sharing the same rank is collected. Which of the following is/are suitable to represent the collected data? |
| 88 | +Let $\sum x_i^2$ be $A$ for the noted dataset and $B$ for the original dataset. |
120 | 89 |
|
121 | | -**Options:** |
122 | | -(a) (plot with missing baseline) |
123 | | -(b) (plot with correct baseline and order) |
124 | | -(c) (plot with incorrect order of categories) |
| 90 | +$$ |
| 91 | +B = A - p^2 + x^2 |
| 92 | +$$ |
125 | 93 |
|
126 | | -**Answer:** b |
127 | | -**Solution:** |
128 | | -Option (b) correctly preserves the order and is not misleading. |
| 94 | +where, since $s^2 = \frac{A - N m^2}{N-1}$ for the noted dataset (with $s$ its sample standard deviation), |
| 95 | + |
| 96 | +$$ |
| 97 | +A = (N-1)s^2 + N m^2 |
| 98 | +$$ |
| 99 | + |
| 100 | +$$ |
| 101 | +\text{Sample variance for the original dataset} = \frac{B}{N-1} - \frac{(T')^2}{N(N-1)} |
| 102 | +$$ |
| 103 | + |
| 104 | +Suppose N = 8, m = 13, s = 8, x = 18, p = 13: |
| 105 | + |
| 106 | +$$ |
| 107 | +A = 7 \times 8^2 + 8 \times 13^2 = 1800 |
| 108 | +$$ |
| 109 | + |
| 110 | +$$ |
| 111 | +B = 1800 - 13^2 + 18^2 = 1955 |
| 112 | +$$ |
| 113 | + |
| 114 | +$$ |
| 115 | +\text{Sample variance} = \frac{1955}{7} - \frac{109^2}{8 \times 7} = 67.125 |
| 116 | +$$ |
| 117 | + |
| 118 | +[^1] |
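
The mean and variance corrections for questions 2 and 3 can be reproduced with a short script. This is a sketch under the assumption that s is the sample standard deviation of the noted dataset, so that the sum of squares is (N − 1)s² + N m²; the function and parameter names are hypothetical.

```python
def corrected_stats(n, noted_mean, noted_sd, noted_value, true_value):
    """Recover the mean and sample variance after fixing one misrecorded value."""
    total = noted_mean * n                                 # T = m * N
    sum_sq = (n - 1) * noted_sd**2 + n * noted_mean**2     # A
    total_orig = total - noted_value + true_value          # T' = T - p + x
    sum_sq_orig = sum_sq - noted_value**2 + true_value**2  # B = A - p^2 + x^2
    mean_orig = total_orig / n
    var_orig = sum_sq_orig / (n - 1) - total_orig**2 / (n * (n - 1))
    return mean_orig, var_orig

mean_orig, var_orig = corrected_stats(8, 13, 8, noted_value=13, true_value=18)
print(round(mean_orig, 3), round(var_orig, 3))
```

Working from the totals T and A avoids needing the individual observations, which the question does not provide.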
129 | 119 |
|
130 | 120 | --- |
131 | 121 |
|
132 | | -## 9. Choose the correct statement about categorical data: |
| 122 | +**4. Let the data $x_1, x_2, ..., x_n$ represent the retail prices in rupees of a certain commodity in n randomly selected shops in a particular city. What will be the sample variance in the retail prices, if c rupees is added to all the retail prices? (Correct up to 2 decimal place accuracy)** |
| 123 | + |
| 124 | +**Solution:** |
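| 125 | +If $c$ rupees is added to all retail prices, the new prices are $y_i = x_i + c$. The mean shifts to $\bar{y} = \bar{x} + c$, so each deviation $y_i - \bar{y} = x_i - \bar{x}$ is unchanged. |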
| 126 | + |
| 127 | +$$ |
| 128 | +\text{New variance} = \text{Old variance} |
| 129 | +$$ |
133 | 130 |
|
134 | | -**Options:** |
135 | | -(a) Categorical data have measurement units. |
136 | | -(b) Categorical data can take numerical values, but no meaningful mathematical operations can be performed on it. |
137 | | -(c) Categorical data is quantitative in nature. |
138 | | -(d) All of the above |
| 131 | +Example: n = 6, observations = 46, 34, 82, 37, 83, 66 |
| 132 | + |
| 133 | +$$ |
| 134 | +\text{Mean} = \frac{46 + 34 + 82 + 37 + 83 + 66}{6} = 58 |
| 135 | +$$ |
| 136 | + |
| 137 | +$$ |
| 138 | +\text{Sample variance} = \frac{(46-58)^2 + (34-58)^2 + (82-58)^2 + (37-58)^2 + (83-58)^2 + (66-58)^2}{5} = 485.2 |
| 139 | +$$ |
| 140 | + |
| 141 | +[^1] |
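
The shift invariance of the sample variance can be confirmed on the example data with the standard library. In this sketch the shift `c = 10` is an arbitrary illustrative value, not one taken from the assignment.

```python
from statistics import variance

prices = [46, 34, 82, 37, 83, 66]
c = 10  # hypothetical shift in rupees, chosen only for illustration
shifted = [price + c for price in prices]

old_var = variance(prices)   # sample variance (divides by n - 1)
new_var = variance(shifted)
print(old_var, new_var)
```

`statistics.variance` uses the n − 1 denominator, matching the sample variance formula used in the solution.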
| 142 | + |
| 143 | +--- |
| 144 | + |
| 145 | +**5. Suppose we have n observations $x_1, x_2, ..., x_n$. Calculate the 10th, 50th and 100th percentiles.** |
139 | 146 |
|
140 | | -**Answer:** b |
141 | 147 | **Solution:** |
142 | | -Categorical data can be coded numerically, but no meaningful mathematical operations can be performed. |
| 148 | +To find the sample 100p percentile of a dataset of size n: |
| 149 | + |
| 150 | +1. Arrange the data in ascending order. |
| 151 | +2. If np is not an integer, take the smallest integer greater than np. The data value in that position is the sample 100p percentile. |
| 152 | +3. If np is an integer, take the average of the values in positions np and np+1. |
| 153 | + |
| 154 | +Example: n = 7, observations = 31, 36, 25, 34, 115, 108, 88 |
| 155 | +Ascending order: 25, 31, 34, 36, 88, 108, 115 |
| 156 | + |
| 157 | +- 10th percentile: np = 0.7 → 1st observation = 25 |
| 158 | +- 50th percentile: np = 3.5 → 4th observation = 36 |
| 159 | +- 100th percentile: np = 7 (an integer, but there is no 8th observation) → last observation = 115[^1] |
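
The three-step percentile rule can be sketched as a small function. The name `sample_percentile` is hypothetical, the percentile is passed as an integer k (so p = k/100) to avoid floating-point issues when testing whether np is an integer, and the handling of the two boundary cases (np = 0 and np = n) is an assumption consistent with the example above.

```python
def sample_percentile(data, k):
    """Return the sample k-th percentile, where k is an integer from 0 to 100."""
    ordered = sorted(data)                 # step 1: ascending order
    n = len(ordered)
    pos, rem = divmod(n * k, 100)          # np = pos + rem/100
    if rem != 0:
        return ordered[pos]                # step 2: 1-based position pos + 1
    if pos == 0:
        return ordered[0]                  # assumption: 0th percentile is the minimum
    if pos == n:
        return ordered[-1]                 # assumption: no position n + 1, use the maximum
    return (ordered[pos - 1] + ordered[pos]) / 2  # step 3: average positions np, np + 1

data = [31, 36, 25, 34, 115, 108, 88]
results = [sample_percentile(data, k) for k in (10, 50, 100)]
print(results)
```

Note that this rule differs from the interpolation used by default in libraries such as NumPy, so results may not match those tools exactly.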
143 | 160 |
|
144 | 161 | --- |
145 | 162 |
|
146 | | -## 10. How many students have secured B grade? |
| 163 | +**6. Calculate the Inter Quartile Range (IQR) of the data.** |
| 164 | + |
| 165 | +**Solution:** |
| 166 | +IQR = Q3 − Q1 |
| 167 | + |
| 168 | +- Q1: p = 0.25, np = 1.75 → Q1 = 31 |
| 169 | +- Q3: p = 0.75, np = 5.25 → Q3 = 108 |
147 | 170 |
|
148 | | -(Refer to Figure 2.2.G: B grade 32.5% of 80 students.) |
| 171 | +IQR = 108 − 31 = 77[^1] |
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +**7. How many outliers are there?** |
149 | 176 |
|
150 | | -**Answer:** 26 |
151 | 177 | **Solution:** |
152 | | -\$ 80 \times 0.325 = 26 \$ |
| 178 | +Outliers < Q1 − 1.5 × IQR or > Q3 + 1.5 × IQR |
| 179 | + |
| 180 | +- Q1 = 31, Q3 = 108, IQR = 77 |
| 181 | +- Lower bound: 31 − (1.5 × 77) = −84.5 |
| 182 | +- Upper bound: 108 + (1.5 × 77) = 223.5 |
| 183 | + |
| 184 | +No observations outside these bounds. Hence, no outliers[^1]. |
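
The quartiles, IQR and outlier bounds for this dataset can be checked with a few lines. This is a sketch that reuses the same position rule as the percentile steps; the helper name `quartile` is hypothetical and it is only intended for k = 25 or k = 75.

```python
def quartile(ordered, k):
    """Quartile (k = 25 or 75) of already-sorted data, using the np position rule."""
    pos, rem = divmod(len(ordered) * k, 100)
    if rem:
        return ordered[pos]                        # np not an integer: next position up
    return (ordered[pos - 1] + ordered[pos]) / 2   # np an integer: average two positions

data = sorted([31, 36, 25, 34, 115, 108, 88])
q1, q3 = quartile(data, 25), quartile(data, 75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in data if v < lower or v > upper]
print(q1, q3, iqr, lower, upper, outliers)
```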
153 | 185 |
|
154 | 186 | --- |
155 | 187 |
|
156 | | -## 11. What is the ratio of the students secured C grade to the students secured A grade? |
| 188 | +**8. In a deck, there are cards numbered 1 to n such that the number of cards of a given number is the same as the number on the card. Which of the following statement(s) is/are true about the mean and mode of the numbers on this deck of cards?** |
| 189 | + |
| 190 | +a. Mode is n. |
| 191 | +b. Mean is $\frac{2n + 1}{3}$. |
| 192 | +c. Mode is n − 1. |
| 193 | +d. Mean is n. |
| 194 | +e. Mean is $\frac{n + 1}{2}$. |
| 195 | +f. Mode is not defined for this data. |
157 | 196 |
|
158 | | -(Figure 2.2.G: C grade 22.5%, A grade 25% of 80 students.) |
| 197 | +**Answer:** a, b |
159 | 198 |
|
160 | | -**Answer:** 0.9 |
161 | 199 | **Solution:** |
162 | | -C grade: \$ 80 \times 0.225 = 18 \$ |
163 | | -A grade: \$ 80 \times 0.25 = 20 \$ |
164 | | -Ratio: \$ 18/20 = 0.9 \$ |
| 200 | +Number (xi), Frequency (fi): 1:1, 2:2, ..., n:n |
165 | 201 |
|
166 | | ---- |
| 202 | +- Mode = n |
| 203 | +- Total observations = $1 + 2 + ... + n = \frac{n(n+1)}{2}$ |
| 204 | +- Sum = $1^2 + 2^2 + ... + n^2 = \frac{n(n+1)(2n+1)}{6}$ |
| 205 | +- Mean = $\frac{n(n+1)(2n+1)/6}{n(n+1)/2} = \frac{2n+1}{3}$ |
167 | 206 |
|
168 | | -This is the complete set of questions and solutions from the PDF[^1]. |
| 207 | +Example for n = 42: Mode = 42, Mean = 28.33[^1]. |
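
The closed forms for the mean and mode can be verified by building the deck explicitly. This is a brute-force sketch for checking small n, with a hypothetical function name; it is not meant as an efficient method.

```python
from collections import Counter
from fractions import Fraction

def deck_mean_and_mode(n):
    """Build the deck (k copies of card k) and return its exact mean and its mode."""
    deck = [k for k in range(1, n + 1) for _ in range(k)]
    mean = Fraction(sum(deck), len(deck))      # exact rational mean
    mode = Counter(deck).most_common(1)[0][0]  # most frequent card number
    return mean, mode

mean, mode = deck_mean_and_mode(42)
print(mode, round(float(mean), 2))
```

Using `Fraction` keeps the mean exact, so it can be compared directly with (2n + 1)/3.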
169 | 208 |
|
170 | | -<div style="text-align: center">⁂</div> |
| 209 | +--- |
171 | 210 |
|
172 | | -[^1]: Week_2_Graded_Solution.pdf |
| 211 | +**9. Figure 3.1.G shows a stem and leaf plot of the ratings (out of 100) of an actor’s performance in different movies. What is the Inter Quartile Range (IQR) (Correct up to 1 decimal point accuracy)?** |
173 | 212 |
|
174 | | -[^2]: https://www.scribd.com/document/687483981/Week-2-Graded-Assignment-Solution |
| 213 | +**Solution:** |
| 214 | +n = 10 |
175 | 215 |
|
176 | | -[^3]: https://www.scribd.com/document/768404514/IIT-Madras-Week-2-Graded-Assignments |
| 216 | +- Q1 = 3rd observation = 72 |
| 217 | +- Q3 = 8th observation = 87 |
| 218 | +- IQR = 87 − 72 = 15[^1] |
177 | 219 |
|
178 | | -[^4]: https://www.studocu.com/in/document/indian-institute-of-technology-madras/programming-and-data-science/week-2-graded-solution-bs-ds/82822211 |
| 220 | +--- |
179 | 221 |
|
180 | | -[^5]: https://gradedassignments.github.io/iit-madras-graded-assignments/ |
| 222 | +**10. What is the median rating, if x points are added to all of his ratings and then converted to y points? (Correct up to 2 decimal point accuracy)** |
181 | 223 |
|
182 | | -[^6]: https://www.youtube.com/watch?v=aI1a91rzTrs |
| 224 | +**Solution:** |
| 225 | +Median of original data (10 observations) = mean of 5th and 6th = (75 + 78)/2 = 76.5 |
183 | 226 |
|
184 | | -[^7]: https://groups.google.com/a/nptel.iitm.ac.in/g/ma1001-discuss/c/_lVR3xXnj5M |
| 227 | +- If x points added: median = 76.5 + x |
| 228 | +- If then converted to y points: median = $(76.5 + x) \times \frac{y}{100}$ |
185 | 229 |
|
186 | | -[^8]: https://iitmdatascience.com/term2 |
| 230 | +Example: x = 3, y = 40 |
| 231 | +Median = (76.5 + 3) × 0.4 = 31.8[^1] |
187 | 232 |
|
188 | | -[^9]: https://www.studocu.com/in/document/indian-institute-of-technology-madras/iitm-online-degree-data-science-and-programming/week-2-graded-assignment/105815343 |
| 233 | +<div style="text-align: center">⁂</div> |
189 | 234 |
|
190 | | -[^10]: https://www.youtube.com/watch?v=6EPGq4-zDV8 |
| 235 | +[^1]: Week_3_Graded_Solution.pdf |
191 | 236 |
|