Commit 08c6def

Mate Hegyhati authored and committed
FHWN DB Test,DQL,Views
1 parent 2029be3 commit 08c6def

6 files changed

Lines changed: 530 additions & 0 deletions

Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
1+
# Play along
2+
3+
For testing online, visit [this site](https://datawithdev.com/sql-playground/) and import the data from [the sqlite DB](/athlete_data_after2000.db).
4+
5+
Or use [DB Browser for SQLite](https://sqlitebrowser.org/) locally.
6+
7+
Data source: [Kaggle](https://www.kaggle.com/datasets/krishd123/olympics-legacy-1896-2020)
8+
9+
# DQL
10+
11+
## Basic commands
12+
13+
### Get all the data
14+
```sql
SELECT * FROM athlete_results;
```
17+
18+
### Get only specific columns
19+
Name and team for all of the results:
20+
```sql
SELECT Name,Team FROM athlete_results;
```
23+
24+
> [!NOTE]
25+
> Keywords are case-insensitive: `SELECT`, `select` and `sEleCt` all do the same.
26+
> But stick to one (of the first 2).
27+
28+
### Filter by condition
29+
Get the name and sport for each Austrian athlete:
30+
```sql
SELECT Name,Sport FROM athlete_results WHERE NOC = 'AUT';
```
33+
> [!IMPORTANT]
34+
> Double quotes are **NOT** used for string literals; use single quotes for those. Double quotes are for table/column names, and they are required when a name contains spaces.
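
The two kinds of quotes can be tried without any table. A minimal sketch, where the alias `NOC code` is made up purely for illustration:

```sql
-- Single quotes: a string literal.
-- Double quotes: an identifier, here a column alias that contains a space.
SELECT 'AUT' AS "NOC code";
```

Be aware that SQLite, for compatibility reasons, may treat a double-quoted name that matches no column as a string literal, so a query like `WHERE NOC = "AUT"` can appear to work there while failing in stricter DBMSs.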
35+
36+
> [!NOTE]
37+
> `=` is the "is equal" operator, not `==`.
38+
> However, `==` is also accepted by some DBMSs, SQLite among them.
39+
40+
41+
### Filter by multiple conditions
42+
Get the name, exact event and medal of Austrian athletes who ended up on the podium:
43+
44+
```sql
SELECT Name, Event, Medal FROM athlete_results WHERE NOC = 'AUT' AND Medal != NULL;
```
47+
48+
> [!Caution]
> 0 results, because `NULL` is special: comparing anything with `NULL` yields `NULL` (unknown), never true, and `WHERE` keeps only the rows whose condition is true.
>
> `anything = NULL` -> `NULL`
>
> `anything != NULL` -> `NULL`
>
> Even:
>
> `NULL = NULL` -> `NULL`
>
> `NULL != NULL` -> `NULL`
60+
61+
Always use `IS NULL` or `IS NOT NULL`:
62+
63+
```sql
SELECT Name, Event, Medal FROM athlete_results WHERE NOC = 'AUT' AND Medal IS NOT NULL;
```
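
The behaviour of `NULL` can be checked directly, without touching the table. A small sketch; the column aliases are made up for illustration:

```sql
SELECT
    NULL = NULL        AS eq,        -- NULL (unknown), not true
    NULL != NULL       AS neq,       -- NULL as well
    'Gold' = NULL      AS gold_eq,   -- NULL: comparing any value with NULL is unknown
    NULL IS NULL       AS is_null,   -- true (SQLite displays 1)
    'Gold' IS NOT NULL AS not_null   -- true (SQLite displays 1)
;
```

Only the last two columns give a usable answer, which is why `IS NULL` and `IS NOT NULL` are the tests to use in a `WHERE` clause.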
66+
67+
> [!NOTE]
68+
> SQLite is lenient in many ways: for example, it accepts `Medal NOT NULL` with the `IS` left out, but that is not standard SQL.
69+
70+
71+
72+
### Filter comparing columns
73+
List where Country and Team differ:
74+
```sql
SELECT Country,Team FROM athlete_results WHERE Country != Team;
```
77+
78+
### Remove duplicates
79+
Same, but list each only once:
80+
```sql
SELECT DISTINCT Country,Team FROM athlete_results WHERE Country != Team;
```
83+
84+
> [!Important]
85+
> `DISTINCT` applies to the whole `Country,Team` tuple, so the same `Country` could appear twice if it had different `Team`s.
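
The tuple behaviour can be seen on a few hand-written rows. This is only a sketch: the rows and the name `pairs` are hypothetical and do not come from the dataset.

```sql
-- Hypothetical sample rows, only to illustrate how DISTINCT treats the pair.
WITH pairs(Country, Team) AS (
    VALUES
        ('Austria', 'Austria-1'),
        ('Austria', 'Austria-1'),  -- exact duplicate of the row above
        ('Austria', 'Austria-2')   -- same Country, different Team
)
SELECT DISTINCT Country, Team FROM pairs;
-- Two rows remain: (Austria, Austria-1) and (Austria, Austria-2).
```

The exact duplicate is removed, but `Austria` still appears twice because the two remaining pairs differ in `Team`.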
86+
87+
### Get the number of results
88+
How many Gold medals did Austrian athletes win?
89+
```sql
SELECT COUNT(*) FROM athlete_results WHERE NOC = 'AUT' AND Medal = 'Gold';
```
92+
> [!Note]
93+
> `COUNT()` without an argument is also accepted by some DBMSs; again, this is not standard SQL.
94+
95+
96+
How many Austrian athletes won Gold medals?
97+
```sql
SELECT COUNT(DISTINCT Name) FROM athlete_results WHERE NOC = 'AUT' AND Medal = 'Gold';
```
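
The different forms of `COUNT` can be compared side by side. A sketch against the same `athlete_results` table (the aliases are made up for illustration):

```sql
SELECT
    COUNT(*)             AS result_rows,    -- every row, with or without a medal
    COUNT(Medal)         AS medal_rows,     -- only rows where Medal IS NOT NULL
    COUNT(DISTINCT Name) AS distinct_names  -- each Name counted once
FROM athlete_results
WHERE NOC = 'AUT';
```

Note that `COUNT(DISTINCT Name)` counts names, not people: two different athletes who share a name would be counted once.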
100+
101+
### Order results
102+
Order the countries by name:
103+
```sql
SELECT DISTINCT Country FROM athlete_results ORDER BY Country;
```
106+
107+
## Grouping and nesting
108+
109+
110+
111+
### Group by attribute
112+
113+
Group Austrian athletes by Sport:
114+
```sql
SELECT Name, Sport, Year FROM athlete_results WHERE NOC = 'AUT' GROUP BY Sport;
```
117+
Which values are selected for `Name` and `Year` for each `Sport` group?!
118+
119+
> [!TIP]
120+
> When using `GROUP BY`, it is a code smell if a non-grouping column is selected as is, without an aggregate function.
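
One way to remove the smell is to wrap every non-grouping column in an aggregate, so that each group yields a well-defined value. A sketch of that idea; the aliases are made up and the choice of aggregates is only one of several sensible options:

```sql
SELECT
    Sport,
    COUNT(DISTINCT Name) AS Athletes,   -- how many different names per sport
    MIN(Year)            AS FirstYear,  -- earliest year in the group
    MAX(Year)            AS LastYear    -- latest year in the group
FROM athlete_results
WHERE NOC = 'AUT'
GROUP BY Sport;
```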
121+
122+
Count how many Gold medals each Country had:
123+
```sql
SELECT Country, COUNT(Medal) FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country;
```
126+
127+
A little bit nicer:
128+
```sql
SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country;
```
131+
132+
Then sort it:
133+
```sql
SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals";
```
136+
137+
Descending...
138+
```sql
SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals" DESC;
```
141+
142+
Only top 20:
143+
```sql
SELECT Country, COUNT(*) AS "Gold Medals" FROM athlete_results WHERE Medal = 'Gold' GROUP BY Country ORDER BY "Gold Medals" DESC LIMIT 20;
```
146+
147+
It is probably time to break it up into multiple lines for readability:
148+
```sql
SELECT
    Country,
    COUNT(*) AS "Gold Medals"
FROM athlete_results
WHERE Medal = 'Gold'
GROUP BY Country
ORDER BY "Gold Medals" DESC
LIMIT 20;
```
158+
159+
### Nesting
160+
Sometimes a question cannot (easily) be answered in one go, and an intermediate table is needed or at least useful.
161+
162+
What is the average number of Gold medals a Country won per Olympic Games (Year) it participated in?
163+
164+
The distinct participations of countries are easy to get:
165+
```sql
SELECT DISTINCT Country, Year FROM athlete_results;
```
168+
169+
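Count the Gold medals for each participation:
```sql
SELECT
    Country,
    Year,
    -- COUNT skips NULLs, so only 'Gold' rows are counted;
    -- participations without any Gold medal still appear, with 0
    COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold
FROM athlete_results
GROUP BY Country, Year;
```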
178+
179+
This result as an "intermediate table" can be used for a subsequent query:
180+
```sql
SELECT Country, AVG(Gold) AS "Average gold"
FROM (
    -- intermediate table: Gold medals per Country and Year
    SELECT
        Country,
        Year,
        COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold
    FROM athlete_results
    GROUP BY Country, Year
) t
GROUP BY Country
ORDER BY "Average gold";
```
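
The same nesting can also be written with a common table expression (`WITH`), which gives the intermediate table a name and lets the query read top to bottom. A sketch under the same assumptions as above; the name `gold_per_games` is made up, and the `CASE` expression restricts the count to Gold medals:

```sql
WITH gold_per_games AS (
    -- one row per Country and Year, with the number of Gold medal rows
    SELECT
        Country,
        Year,
        COUNT(CASE WHEN Medal = 'Gold' THEN 1 END) AS Gold
    FROM athlete_results
    GROUP BY Country, Year
)
SELECT Country, AVG(Gold) AS "Average gold"
FROM gold_per_games
GROUP BY Country
ORDER BY "Average gold" DESC;
```

Whether a subquery or a `WITH` clause reads better is largely a matter of taste; the result is the same.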
193+
194+
195+
## Practice questions:
196+
- How many countries won a gold medal in the 2020 Olympics?
197+
- Who participated in the most Olympic Games?
198+
- Who won the most Gold medals in a single event?
199+
- Was there an athlete who changed citizenship?
200+
- Who are the two athletes that shared the podium the most times?
201+
202+
# DDL
203+
204+
## Views
205+
206+
### Create (virtual) view
207+
```sql
CREATE VIEW most_participations AS
SELECT Name, COUNT(*) AS Participation FROM (
    SELECT DISTINCT Name, Year, Season
    FROM athlete_results
) t
GROUP BY Name
ORDER BY Participation DESC;
```
216+
217+
> [If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.](https://en.wikipedia.org/wiki/Duck_test)
218+
219+
Views...
220+
- ✅ look like tables
221+
- ✅ can be queried like a table
222+
- ❌ (usually) cannot be modified with DML the way tables can
223+
- ❌ are not stored like tables
224+
225+
Views are not tables.
226+
The result of the "construction query" is not stored, only the "recipe query" itself.
227+
When querying a view, the recipe query is executed first to get the view.
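
That the recipe is re-executed on every query can be observed on a tiny throwaway table. A self-contained sketch; `demo_results` and `demo_participations` are hypothetical names used only here:

```sql
CREATE TABLE demo_results (Name TEXT, Year INT);
INSERT INTO demo_results VALUES ('A', 2000), ('A', 2004);

CREATE VIEW demo_participations AS
    SELECT Name, COUNT(*) AS Participation
    FROM demo_results
    GROUP BY Name;

SELECT * FROM demo_participations;  -- A, 2

-- Change the underlying table; the view itself is not touched.
INSERT INTO demo_results VALUES ('A', 2008);

SELECT * FROM demo_participations;  -- A, 3: the recipe ran again on the new data

-- Clean up.
DROP VIEW demo_participations;
DROP TABLE demo_results;
```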
228+
229+
### Querying a view
230+
Just like a table:
231+
```sql
SELECT * FROM most_participations LIMIT 10;
```
234+
235+
### Create materialized view
236+
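Kind of like a table: the result data is also cached, but the cache needs to be refreshed (mostly manually) when the underlying data changes.

Not standard SQL! PostgreSQL and Oracle, for example, support it, but SQLite does not.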
239+
240+
```sql
CREATE MATERIALIZED VIEW most_participations_materialized AS
SELECT Name, COUNT(*) AS Participation FROM (
    SELECT DISTINCT Name, Year, Season
    FROM athlete_results
) t
GROUP BY Name
ORDER BY Participation DESC;
```
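
In PostgreSQL the cached data is updated with `REFRESH MATERIALIZED VIEW most_participations_materialized;`. SQLite has no materialized views, but the idea can be imitated with an ordinary table that is filled from the query and refilled by hand. A rough sketch of that workaround, not a real materialized view; the table name `most_participations_snapshot` is made up:

```sql
-- Build the cache once: a regular table filled from the query.
CREATE TABLE most_participations_snapshot AS
SELECT Name, COUNT(*) AS Participation
FROM (
    SELECT DISTINCT Name, Year, Season
    FROM athlete_results
) t
GROUP BY Name;

-- Manual "refresh": empty the cache and fill it again.
DELETE FROM most_participations_snapshot;
INSERT INTO most_participations_snapshot
SELECT Name, COUNT(*) AS Participation
FROM (
    SELECT DISTINCT Name, Year, Season
    FROM athlete_results
) t
GROUP BY Name;
```

Until the refresh is run, the snapshot keeps showing the old numbers, which is exactly the "not necessarily up to date" trade-off.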
249+
250+
### Virtual vs. materialized views

| | Virtual | Materialized |
| --- | --- | --- |
| Pro | small memory footprint, always up to date | faster to query, saves CPU |
| Con | extra computation for every query | not necessarily up to date, needs additional memory |
256+
257+
## Dropping
258+
259+
### Drop entire table or view
260+
```sql
DROP VIEW most_participations;
```
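
Dropping something that does not exist is an error. Many DBMSs, including SQLite, PostgreSQL and MySQL, accept an `IF EXISTS` guard, which is handy in scripts that may be run more than once. A short sketch; the table name in the second statement is hypothetical:

```sql
DROP VIEW IF EXISTS most_participations;
DROP TABLE IF EXISTS table_that_may_not_exist;  -- hypothetical name, no error if it is missing
```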
263+
![SQL Injection](Injection.PNG)
264+
265+
266+
### Delete column/attribute
267+
```sql
ALTER TABLE athlete_results DROP COLUMN A;
```
80.7 KB
11.8 MB
Binary file not shown.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Test
2+
3+
## Description
4+
5+
Draw the ER diagram for the following use case and write the SQL statements to create the corresponding schema.
6+
7+
> There are many airports around the world, and each has its own 3-letter identifier, for example VIE for Vienna - Schwechat International Airport. For each airport we know the number of runways and the number of gates, and whether it is international or not. Each airport is located in a country, which has a unique 2-letter (e.g., AT) and a unique 3-letter (e.g., AUT) code. We also want to store the population of each country and whether it is in Schengen or not. Finally, we would like to store which airports are connected by at least one carrier.
8+
9+
`NOT NULL` constraints and `ON DELETE` clauses can be omitted.
10+
11+
## Solution
12+
13+
**One** good solution; there are others.
14+
15+
### ERD
16+
17+
![ERD of test](./test_erd.svg)
18+
19+
### SQL DDL
20+
21+
```sql
-- Countries comes first, because Airports references it.
CREATE TABLE Countries (
    alpha2 VARCHAR(2) UNIQUE,
    alpha3 VARCHAR(3) PRIMARY KEY,
    population INT,
    in_schengen BOOLEAN
);

CREATE TABLE Airports (
    iata VARCHAR(3) PRIMARY KEY,
    runway_count INT,
    gate_count INT,
    is_international BOOLEAN,
    country VARCHAR(3) REFERENCES Countries(alpha3)
);

-- One row per connected pair of airports.
CREATE TABLE Connections (
    iata1 VARCHAR(3) REFERENCES Airports(iata),
    iata2 VARCHAR(3) REFERENCES Airports(iata),
    PRIMARY KEY(iata1,iata2)
);
```
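
To see the schema in use, a few rows can be inserted and queried. This is only a sketch and assumes the three tables above have been created: apart from the codes `AT`, `AUT` and `VIE` taken from the task description, every value is a placeholder (the airport code `XXX` is hypothetical, and the numbers are not real statistics).

```sql
-- Placeholder data for illustration only.
INSERT INTO Countries (alpha2, alpha3, population, in_schengen)
VALUES ('AT', 'AUT', 9000000, TRUE);

INSERT INTO Airports (iata, runway_count, gate_count, is_international, country)
VALUES
    ('VIE', 2, 100, TRUE,  'AUT'),
    ('XXX', 1, 5,   FALSE, 'AUT');  -- hypothetical second airport

-- A connection is stored once, as a pair.
INSERT INTO Connections (iata1, iata2) VALUES ('VIE', 'XXX');

-- Airports connected to VIE, whichever side of the pair VIE is stored on.
SELECT iata2 AS connected_to FROM Connections WHERE iata1 = 'VIE'
UNION
SELECT iata1 FROM Connections WHERE iata2 = 'VIE';
```

Because a connection has no direction in the task description, the query has to look at both columns; an alternative design would store every pair in both directions.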
27.6 KB
Binary file not shown.
