|
| 1 | +# Play along |
| 2 | + |
| 3 | +For testing online, visit [this site](https://datawithdev.com/sql-playground/) and import the data from [the SQLite DB](/athlete_data_after2000.db). |
| 4 | + |
| 5 | +Or use [DB Browser for SQLite](https://sqlitebrowser.org/) locally. |
| 6 | + |
| 7 | +Data source: [Kaggle](https://www.kaggle.com/datasets/krishd123/olympics-legacy-1896-2020) |
| 8 | + |
| 9 | +# DQL |
| 10 | + |
| 11 | +## Basic commands |
| 12 | + |
| 13 | +### Get all the data |
| 14 | +```sql |
| 15 | +SELECT * FROM athlete_results; |
| 16 | +``` |
| 17 | + |
| 18 | +### Get only specific columns |
| 19 | +Name and team for all of the results: |
| 20 | +```sql |
| 21 | +SELECT Name,Team FROM athlete_results; |
| 22 | +``` |
| 23 | + |
| 24 | +> [!NOTE] |
| 25 | +> Keywords can be in any case: `SELECT`, `select` and `sEleCt` all do the same. |
| 26 | +> But stick to one (of the first two). |
| 27 | +
|
| 28 | +### Filter by condition |
| 29 | +Get the name and sport for each Austrian athlete: |
| 30 | +```sql |
| 31 | +SELECT Name,Sport FROM athlete_results WHERE NOC = 'AUT'; |
| 32 | +``` |
| 33 | +> [!IMPORTANT] |
| 34 | +> Double quotes are **NOT** for string literals (those take single quotes). Double quotes are for table/column names, and are required if the name contains spaces. |
| 35 | +
|
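| | +A small sketch of the two kinds of quotes side by side. The alias `Athlete name` is made up for illustration; it needs double quotes because it contains a space, while the string literal `'AUT'` takes single quotes: |
| | +```sql |
| | +-- "Athlete name" is an identifier (a column alias), 'AUT' is a string literal |
| | +SELECT Name AS "Athlete name", Sport |
| | +FROM athlete_results |
| | +WHERE NOC = 'AUT'; |
| | +``` |
| | + |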
| 36 | +> [!NOTE] |
| 37 | +> `=` is the "is equal" operator, not `==`. |
| 38 | +> However, `==` is also accepted by some DBMSs (SQLite, for example). |
| 39 | +
|
| 40 | + |
| 41 | +### Filter by multiple conditions |
| 42 | +Get the name, exact event and medal of Austrian athletes who ended up on the podium: |
| 43 | + |
| 44 | +```sql |
| 45 | +SELECT Name, Event, Medal FROM athlete_results WHERE NOC = 'AUT' AND Medal != NULL; |
| 46 | +``` |
| 47 | + |
| 48 | +> [!Caution] |
| 49 | +> 0 results, because `NULL` is special: comparing anything with it yields `NULL` (unknown), never true, and `WHERE` keeps only rows where the condition is true. |
| 50 | +> |
| 51 | +> `anything = NULL` -> `NULL` |
| 52 | +> |
| 53 | +> `anything != NULL` -> `NULL` |
| 54 | +> |
| 55 | +> Even: |
| 56 | +> |
| 57 | +> `NULL = NULL` -> `NULL` |
| 58 | +> |
| 59 | +> `NULL != NULL` -> `NULL` |
| 60 | +
|
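| | +The behaviour can be checked without any table. A minimal sketch (the column aliases are only for illustration); in SQLite the first two columns come back as `NULL` and only the last one is true (`1`): |
| | +```sql |
| | +-- Comparisons with NULL are neither true nor false; IS NULL is the reliable test |
| | +SELECT |
| | +    NULL = NULL  AS equals_null, |
| | +    NULL != NULL AS not_equals_null, |
| | +    NULL IS NULL AS is_null; |
| | +``` |
| | + |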
| 61 | +Always use `IS NULL` or `IS NOT NULL`: |
| 62 | + |
| 63 | +```sql |
| 64 | +SELECT Name, Event, Medal FROM athlete_results WHERE NOC = 'AUT' AND Medal IS NOT NULL; |
| 65 | +``` |
| 66 | + |
| 67 | +> [!NOTE] |
| 68 | +> SQLite is lenient in many things. For example, it also accepts `Medal NOT NULL` without the `IS`, but that is not standard SQL. |
| 69 | +
|
| 70 | + |
| 71 | + |
| 72 | +### Filter comparing columns |
| 73 | +List where Country and Team differ: |
| 74 | +```sql |
| 75 | +SELECT Country,Team FROM athlete_results WHERE Country != Team; |
| 76 | +``` |
| 77 | + |
| 78 | +### Remove duplicates |
| 79 | +Same, but list each pair only once: |
| 80 | +```sql |
| 81 | +SELECT DISTINCT Country,Team FROM athlete_results WHERE Country != Team; |
| 82 | +``` |
| 83 | + |
| 84 | +> [!Important] |
| 85 | +> `DISTINCT` applies to the whole `Country,Team` tuple, so the same `Country` could appear twice if it had different `Team`s. |
| 86 | +
|
| 87 | +### Get the number of results |
| 88 | +How many Gold medals did Austrian athletes win? |
| 89 | +```sql |
| 90 | +SELECT COUNT(*) FROM athlete_results WHERE NOC = 'AUT' AND Medal = 'Gold'; |
| 91 | +``` |
| 92 | +> [!Note] |
| 93 | +> Some DBMSs (SQLite, for example) also accept `COUNT()` without an argument. Again, this is not standard SQL. |
| 94 | +
|
| 95 | + |
| 96 | +How many Austrian athletes won Gold medals? |
| 97 | +```sql |
| 98 | +SELECT COUNT(DISTINCT Name) FROM athlete_results WHERE NOC = 'AUT' AND Medal = 'Gold'; |
| 99 | +``` |
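| | + |
| | +`COUNT(*)` counts rows, while `COUNT(column)` skips rows where that column is `NULL`. A sketch of the difference, assuming (as in the podium query above) that `Medal` is `NULL` for results without a medal; the aliases are made up: |
| | +```sql |
| | +SELECT |
| | +    COUNT(*)              AS Results,        -- every Austrian result row |
| | +    COUNT(Medal)          AS Medals,         -- only rows where Medal is not NULL |
| | +    COUNT(DISTINCT Medal) AS "Medal kinds"   -- distinct non-NULL values of Medal |
| | +FROM athlete_results |
| | +WHERE NOC = 'AUT'; |
| | +``` |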
| 100 | + |
| 101 | +### Order results |
| 102 | +Order the countries by name: |
| 103 | +```sql |
| 104 | +SELECT DISTINCT Country FROM athlete_results ORDER BY Country; |
| 105 | +``` |
| 106 | + |
| 107 | +## Grouping and nesting |
| 108 | + |
| 109 | + |
| 110 | + |
| 111 | +### Group by attribute |
| 112 | + |
| 113 | +Group Austrian athletes by Sport: |
| 114 | +```sql |
| 115 | +SELECT Name, Sport, Year FROM athlete_results WHERE NOC = 'AUT' GROUP BY Sport; |
| 116 | +``` |
| 117 | +Which values are selected for `Name` and `Year` for each `Sport` group?! |
| 118 | + |
| 119 | +> [!TIP] |
| 120 | +> When using `GROUP BY`, it is a code smell if any non-grouping column is selected as is, without an aggregate function. |
| 121 | +
|
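| | +One way to make such a query unambiguous is to wrap every non-grouping column in an aggregate. A sketch (the aliases and the choice of aggregates are only an example): |
| | +```sql |
| | +SELECT |
| | +    Sport, |
| | +    COUNT(DISTINCT Name) AS Athletes,      -- how many different athletes per sport |
| | +    MIN(Year)            AS "First year",  -- earliest appearance in the data |
| | +    MAX(Year)            AS "Last year"    -- latest appearance in the data |
| | +FROM athlete_results |
| | +WHERE NOC = 'AUT' |
| | +GROUP BY Sport; |
| | +``` |
| | + |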
| 122 | +Count how many Gold medals each Country had: |
| 123 | +```sql |
| 124 | +SELECT Country, COUNT(Medal) FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country; |
| 125 | +``` |
| 126 | + |
| 127 | +A little bit nicer: |
| 128 | +```sql |
| 129 | +SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country; |
| 130 | +``` |
| 131 | + |
| 132 | +Then sort it: |
| 133 | +```sql |
| 134 | +SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals"; |
| 135 | +``` |
| 136 | + |
| 137 | +Descending... |
| 138 | +```sql |
| 139 | +SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals" DESC; |
| 140 | +``` |
| 141 | + |
| 142 | +Only top 20: |
| 143 | +```sql |
| 144 | +SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals" DESC LIMIT 20; |
| 145 | +``` |
| 146 | + |
| 147 | +It is probably time to break it up into multiple lines for readability: |
| 148 | +```sql |
| 149 | +SELECT |
| 150 | + Country, |
| 151 | + COUNT(*) AS "Gold Medals" |
| 152 | +FROM athlete_results |
| 153 | +WHERE Medal = 'Gold' |
| 154 | +GROUP BY Country |
| 155 | +ORDER BY "Gold Medals" DESC |
| 156 | +LIMIT 20; |
| 157 | +``` |
| 158 | + |
| 159 | +### Nesting |
| 160 | +Sometimes a question cannot (easily) be answered in one go, and an intermediate table is needed or useful. |
| 161 | + |
| 162 | +What is the average number of Gold medals a Country wins per Games (Year) it participates in? |
| 163 | + |
| 164 | +Listing the distinct participations of countries is easy: |
| 165 | +```sql |
| 166 | +SELECT DISTINCT Country, Year FROM athlete_results; |
| 167 | +``` |
| 168 | + |
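| 169 | +Count the Gold medals (`CASE` yields `NULL` for every other row, and `COUNT` skips `NULL`s): |
| 170 | +```sql |
| 171 | +SELECT |
| 172 | + Country, |
| 173 | + Year, |
| 174 | + COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold |
| 175 | +FROM athlete_results |
| 176 | +GROUP BY Country, Year; |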
| 177 | +``` |
| 178 | + |
| 179 | +This result as an "intermediate table" can be used for a subsequent query: |
| 180 | +```sql |
| 181 | +SELECT Country, AVG(Gold) AS "Average gold" |
| 182 | +FROM ( |
| 183 | + SELECT |
| 184 | + Country, |
| 185 | + Year, |
| 186 | + COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold -- counts only Gold medals |
| 187 | + FROM athlete_results |
| 188 | + GROUP BY Country, Year |
| 189 | +) t |
| 190 | +GROUP BY Country |
| 191 | +ORDER BY "Average gold"; |
| 192 | +``` |
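| | + |
| | +The intermediate table can also be given a name up front with a common table expression (`WITH`), which SQLite and most other DBMSs support. A sketch of the same idea; the name `golds_per_year` is made up, and the `CASE` makes sure only Gold medals are counted: |
| | +```sql |
| | +WITH golds_per_year AS ( |
| | +    SELECT |
| | +        Country, |
| | +        Year, |
| | +        COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold |
| | +    FROM athlete_results |
| | +    GROUP BY Country, Year |
| | +) |
| | +SELECT Country, AVG(Gold) AS "Average gold" |
| | +FROM golds_per_year |
| | +GROUP BY Country |
| | +ORDER BY "Average gold" DESC; |
| | +``` |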
| 193 | + |
| 194 | + |
| 195 | +## Practice questions: |
| 196 | + - How many countries won a gold medal in the 2020 Olympics? |
| 197 | + - Who participated in the most Olympic Games? |
| 198 | + - Who won the most Gold medals in a single event? |
| 199 | + - Was there an athlete who changed citizenship? |
| 200 | + - Who are the two athletes that shared the podium the most times? |
| 201 | + |
| 202 | +# DDL |
| 203 | + |
| 204 | +## Views |
| 205 | + |
| 206 | +### Create (virtual) view |
| 207 | +```sql |
| 208 | +CREATE VIEW most_participations AS |
| 209 | +SELECT Name, COUNT(*) AS Participation FROM ( |
| 210 | + SELECT DISTINCT Name, Year, Season |
| 211 | + FROM athlete_results |
| 212 | +) t |
| 213 | +GROUP BY Name |
| 214 | +ORDER BY Participation DESC; |
| 215 | +``` |
| 216 | + |
| 217 | +> [If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.](https://en.wikipedia.org/wiki/Duck_test) |
| 218 | +
|
| 219 | +Views... |
| 220 | + - ✅ look like tables |
| 221 | + - ✅ can be queried like a table |
| 222 | + - ❌ cannot be modified (DML) like tables (usually) |
| 223 | + - ❌ are not stored like tables |
| 224 | + |
| 225 | +Views are not tables. |
| 226 | +The result of the defining query is not stored, only the query itself (the "recipe"). |
| 227 | +When a view is queried, the recipe query is executed first to produce the view's rows. |
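| | + |
| | +In SQLite the stored recipe can be looked up in the schema table. A sketch, assuming the view above has been created: |
| | +```sql |
| | +-- The sql column holds the CREATE VIEW statement, not any result rows |
| | +SELECT name, sql |
| | +FROM sqlite_master |
| | +WHERE type = 'view'; |
| | +``` |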
| 228 | + |
| 229 | +### Querying a view |
| 230 | +Just like a table: |
| 231 | +```sql |
| 232 | +SELECT * FROM most_participations LIMIT 10; |
| 233 | +``` |
| 234 | + |
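| 235 | +### Create materialized view |
| 236 | +Much like a table: the query result is stored (cached), but it needs to be refreshed (mostly manually) when the underlying data changes. |
| 237 | + |
| 238 | +Not standard SQL, and not available in every DBMS: PostgreSQL and Oracle support it, SQLite does not. |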
| 239 | + |
| 240 | +```sql |
| 241 | +CREATE MATERIALIZED VIEW most_participations_materialized AS |
| 242 | +SELECT Name, COUNT(*) AS Participation FROM ( |
| 243 | + SELECT DISTINCT Name, Year, Season |
| 244 | + FROM athlete_results |
| 245 | +) t |
| 246 | +GROUP BY Name |
| 247 | +ORDER BY Participation DESC; |
| 248 | +``` |
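| | + |
| | +Where `CREATE MATERIALIZED VIEW` is not available (SQLite, for example), a similar effect can be approximated by storing the query result in an ordinary table and rebuilding it by hand. A sketch; the table name `most_participations_snapshot` is made up: |
| | +```sql |
| | +-- Rebuild the snapshot: drop the stale copy, then store the fresh result |
| | +DROP TABLE IF EXISTS most_participations_snapshot; |
| | +CREATE TABLE most_participations_snapshot AS |
| | +SELECT Name, COUNT(*) AS Participation FROM ( |
| | +    SELECT DISTINCT Name, Year, Season |
| | +    FROM athlete_results |
| | +) t |
| | +GROUP BY Name; |
| | +``` |
| | +In DBMSs with real materialized views, the refresh is a single statement instead, e.g. `REFRESH MATERIALIZED VIEW most_participations_materialized;` in PostgreSQL. |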
| 249 | + |
| 250 | +### Virtual vs. materialized views |
| 251 | + |
| 252 | +| | Virtual | Materialized | |
| 253 | +| --- | --- | --- | |
| 254 | +| Pro | small storage footprint, always up to date | faster to query, saves CPU | |
| 255 | +| Con | extra computation for every query | not necessarily up to date, needs additional storage | |
| 256 | + |
| 257 | +## Dropping |
| 258 | + |
| 259 | +### Drop entire table or view |
| 260 | +```sql |
| 261 | +DROP VIEW most_participations; |
| 262 | +``` |
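| | + |
| | +Tables are dropped the same way with `DROP TABLE`. Adding `IF EXISTS` (supported by SQLite, PostgreSQL and MySQL, among others) avoids an error when the object is missing. A sketch; `scratch_results` is a hypothetical table name: |
| | +```sql |
| | +DROP VIEW IF EXISTS most_participations;  -- no error if the view is already gone |
| | +DROP TABLE IF EXISTS scratch_results;     -- same for a (hypothetical) table |
| | +``` |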
| 263 | + |
| 264 | + |
| 265 | + |
| 266 | +### Delete column/attribute |
| 267 | +```sql |
| 268 | +ALTER TABLE athlete_results DROP COLUMN A; |
| 269 | +``` |
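| | + |
| | +`A` above is presumably a placeholder for a column name. A sketch that tries the statement on a throwaway table rather than on the real data; the table `scratch_results` and its columns are made up, and in SQLite `DROP COLUMN` needs version 3.35.0 or newer: |
| | +```sql |
| | +CREATE TABLE scratch_results (Name TEXT, Team TEXT, Notes TEXT); |
| | +ALTER TABLE scratch_results DROP COLUMN Notes;  -- Name and Team remain |
| | +DROP TABLE scratch_results;                     -- clean up |
| | +``` |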