12 changes: 10 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -54,7 +54,8 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
Type 1 - overwrites the existing address, so no history is kept
Type 2 - adds a new row for each address change, so the history of customer addresses is retained
```
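
A minimal sketch of the two designs, run against an in-memory SQLite database from Python. The table names `customer_address_t1` and `customer_address_t2`, their columns, and the sample addresses are hypothetical and chosen only for illustration; the assignment does not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Type 1 (hypothetical schema): one row per customer, overwritten on change.
CREATE TABLE customer_address_t1 (
    customer_id INTEGER PRIMARY KEY,
    address     TEXT NOT NULL
);
-- Type 2 (hypothetical schema): one row per address version.
CREATE TABLE customer_address_t2 (
    customer_id INTEGER NOT NULL,
    address     TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_to    TEXT            -- NULL marks the current address
);
INSERT INTO customer_address_t1 VALUES (1, '10 Old Street');
INSERT INTO customer_address_t2 VALUES (1, '10 Old Street', '2023-01-01', NULL);
""")

# Customer 1 moves on 2024-06-01.
# Type 1: overwrite the address in place.
conn.execute(
    "UPDATE customer_address_t1 SET address = '99 New Avenue' WHERE customer_id = 1"
)
# Type 2: close the current row, then insert a new current row.
conn.execute(
    "UPDATE customer_address_t2 SET valid_to = '2024-06-01' "
    "WHERE customer_id = 1 AND valid_to IS NULL"
)
conn.execute(
    "INSERT INTO customer_address_t2 VALUES (1, '99 New Avenue', '2024-06-01', NULL)"
)

type1_rows = conn.execute("SELECT address FROM customer_address_t1").fetchall()
type2_rows = conn.execute(
    "SELECT address, valid_to FROM customer_address_t2 ORDER BY valid_from"
).fetchall()
print(type1_rows)  # only the new address remains
print(type2_rows)  # both the old and the new address are present
```

After the move, the Type 1 table holds a single row with the new address, while the Type 2 table holds both versions, with `valid_to` marking when the old one stopped being current.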

***
@@ -183,5 +184,12 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Your thoughts...
In the article “Neural nets are just people all the way down,” Boykis argues that although machine-learning and AI systems are often presented as automated or self-sufficient technologies, they rely heavily on human labour, judgment, and interpretation at their creation. She illustrates this through the history of ImageNet, a large-scale image-recognition dataset that became foundational to computer vision. Boykis explains that ImageNet was constructed by gathering millions of images and using thousands of unfairly compensated human annotators to label them manually. Subsequent AI models were built on the decisions these workers made about the objects in each image, how they should be categorized, and which labels they deemed meaningful.

She further notes that WordNet, which is used to determine linguistic taxonomies and categories, was also rooted in human labour and human decisions. Decisions about how concepts are grouped, where boundaries lie between categories, and which distinctions matter are not necessarily objective and are influenced by cultural, linguistic, and political factors.

The article highlights several ethical issues in AI modeling, including the hidden human labour required to train AI models, and the illusion of neutrality. AI-based decisions are often assumed to be unbiased, but because the data they are trained on reflects human categories and human judgments, they inherit human biases as well. This has real consequences: biased datasets can perpetuate stereotypes, misclassify certain groups, reinforce inequities, and amplify existing social and political power structures.

While AI tools are undeniably powerful and widely integrated into today’s society, Boykis reminds readers that these systems are not separate from the world that shapes them.

```
131 changes: 123 additions & 8 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -5,12 +5,15 @@
/* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
We tell them, no problem! We can produce a list with all of the appropriate details.

Using the following syntax you create our super cool and not at all needy manager a list: */


SELECT
product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
FROM product;


/*
But wait! The product table has some bad data (a few NULL values).
Find the NULLs and then using COALESCE, replace the NULL with a
blank for the first problem, and 'unit' for the second problem.
@@ -21,7 +24,6 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
All the other rows will remain the same.) */



--Windowed Functions
/* 1. Write a query that selects from the customer_purchases table and numbers each customer’s
visits to the farmer’s market (labeling each market date with a different number).
@@ -32,18 +34,46 @@ each new market date for each customer, or select only the unique market dates p
(without purchase details) and number those visits.
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

SELECT DISTINCT *

FROM (
SELECT
customer_id,
market_date,
DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date ASC) as visit_order
FROM customer_purchases
) x;
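
The hint's two approaches differ in how they treat several purchases on the same market date. The following is a minimal sketch using Python's built-in `sqlite3` module; the three sample rows are made up for illustration and the table is cut down to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical version of customer_purchases.
CREATE TABLE customer_purchases (
    customer_id INTEGER,
    market_date TEXT,
    product_id  INTEGER
);
INSERT INTO customer_purchases VALUES
    (1, '2022-04-02', 4),
    (1, '2022-04-02', 5),   -- second purchase during the same visit
    (1, '2022-04-09', 4);
""")

rows = conn.execute("""
SELECT market_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS row_num,
       DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_order
FROM customer_purchases
ORDER BY market_date, row_num
""").fetchall()

for row in rows:
    print(row)
```

`ROW_NUMBER()` gives every purchase row its own number, so it numbers visits correctly only after the rows are reduced to unique market dates. `DENSE_RANK()` gives both purchases on 2022-04-02 the same number, so it can be applied to the raw rows.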



/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
then write another query that uses this one as a subquery (or temp table) and filters the results to
only the customer’s most recent visit. */

SELECT DISTINCT *

FROM (
SELECT
customer_id,
market_date,
DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date DESC) as visit_order
FROM customer_purchases
) x
WHERE visit_order = 1;



/* 3. Using a COUNT() window function, include a value along with each row of the
customer_purchases table that indicates how many different times that customer has purchased that product_id. */


SELECT DISTINCT
customer_id,
product_id,
COUNT(*) OVER (PARTITION BY customer_id, product_id) AS times_purchased -- one count per customer/product pair
FROM customer_purchases;
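
The choice of window definition changes what the count means. The sketch below, which uses made-up rows in a simplified `customer_purchases` table, compares partitioning by both `customer_id` and `product_id` with a window that partitions by customer only and orders by `product_id`. The column aliases `per_product` and `running` are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical version of customer_purchases.
CREATE TABLE customer_purchases (
    customer_id INTEGER,
    product_id  INTEGER,
    market_date TEXT
);
INSERT INTO customer_purchases VALUES
    (1, 4, '2022-04-02'),
    (1, 4, '2022-04-09'),
    (1, 5, '2022-04-02'),
    (2, 4, '2022-04-02');
""")

rows = conn.execute("""
SELECT customer_id,
       product_id,
       market_date,
       -- purchases of this product by this customer
       COUNT(*) OVER (PARTITION BY customer_id, product_id) AS per_product,
       -- running count across all of the customer's products
       COUNT(*) OVER (PARTITION BY customer_id ORDER BY product_id) AS running
FROM customer_purchases
ORDER BY customer_id, product_id, market_date
""").fetchall()

for row in rows:
    print(row)
```

For customer 1, `per_product` is 2 for product 4 and 1 for product 5, whereas `running` climbs to 3 on the product 5 row because it accumulates across products.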



-- String manipulations
/* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
@@ -57,10 +87,24 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */

SELECT
product_name,
CASE WHEN product_name LIKE '%-%'
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) -- text after the hyphen, whitespace removed
ELSE NULL -- a real NULL, not the string 'NULL'
END AS product_details


FROM product;
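
A small sketch of the `INSTR` / `SUBSTR` / `TRIM` combination, run through Python's `sqlite3` module. The three product names are sample values chosen for illustration (the third, with no space after the hyphen, is hypothetical), and the table is reduced to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical version of the product table.
CREATE TABLE product (product_id INTEGER, product_name TEXT);
INSERT INTO product VALUES
    (1, 'Habanero Peppers - Organic'),
    (2, 'Cherry Pie'),
    (3, 'Pepper Jelly -Jar');
""")

rows = conn.execute("""
SELECT product_name,
       CASE WHEN INSTR(product_name, '-') > 0
            -- take everything after the hyphen, then strip whitespace
            THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
            ELSE NULL
       END AS product_details
FROM product
ORDER BY product_id
""").fetchall()

for row in rows:
    print(row)
```

Starting the substring one character after the hyphen and trimming the result avoids assuming that exactly one space follows the hyphen. Names without a hyphen come back as a real NULL (`None` in Python).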


/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

SELECT
product_id,
product_size
FROM product
WHERE product_size REGEXP '[0-9]';


-- UNION
@@ -74,6 +118,37 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
with a UNION binding them. */


SELECT
market_date,
total_sales,
'best' AS sales_day
FROM (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales,
ROW_NUMBER() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) DESC) AS best_day
FROM customer_purchases
GROUP BY market_date
) ranked
WHERE best_day = 1


UNION

SELECT
market_date,
total_sales,
'worst' AS sales_day
FROM (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales,
ROW_NUMBER() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) ASC) AS worst_day
FROM customer_purchases
GROUP BY market_date
) ranked
WHERE worst_day = 1;



/* SECTION 3 */
@@ -90,34 +165,63 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */


DROP TABLE IF EXISTS temp.customer_vendors_products;

CREATE TEMP TABLE IF NOT EXISTS temp.customer_vendors_products (
vendor_id TEXT,
product_id TEXT
);

INSERT INTO temp.customer_vendors_products (vendor_id, product_id)
SELECT DISTINCT
vendor_id,
product_id
FROM vendor_inventory;


SELECT
customer_id,
vendor_id,
product_id
FROM customer
CROSS JOIN temp.customer_vendors_products;

-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */

DROP TABLE IF EXISTS temp.product_units; -- this resets everything
CREATE TEMP TABLE product_units AS
SELECT * ,
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';

-- SELECT * FROM product_units


/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */


INSERT INTO product_units
VALUES(24, 'New apple pie','10 in', 2, 'unit', CURRENT_TIMESTAMP);

-- DELETE
/* 1. Delete the older record for the whatever product you added.

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/


DELETE FROM product_units
WHERE product_id = 24;


-- UPDATE
/* 1.We want to add the current_quantity to the product_units table.
First, add a new column, current_quantity to the table using the following syntax.

ALTER TABLE product_units
ADD current_quantity INT;

Then, using UPDATE, change the current_quantity equal to the last quantity value from the vendor_inventory details.

HINT: This one is pretty hard.
@@ -128,6 +232,17 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */

ALTER TABLE product_units
ADD current_quantity INT; -- adds an integer-typed column for the quantity

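UPDATE product_units
SET current_quantity = COALESCE((
SELECT quantity -- quantity from the most recent market_date, not a sum over all dates
FROM vendor_inventory
WHERE vendor_inventory.product_id = product_units.product_id
ORDER BY market_date DESC
LIMIT 1
), 0);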


-- SELECT * FROM product_units
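
The task asks for the last quantity value, which can be read as the quantity recorded on the most recent `market_date` for each product. Below is a minimal sketch of that reading using Python's `sqlite3` module. Both tables are cut down to a few columns and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the two tables.
CREATE TABLE product_units (product_id INTEGER, current_quantity INT);
CREATE TABLE vendor_inventory (
    market_date TEXT,
    quantity    INTEGER,
    vendor_id   INTEGER,
    product_id  INTEGER
);
INSERT INTO product_units VALUES (1, NULL), (2, NULL);
INSERT INTO vendor_inventory VALUES
    ('2022-04-02', 40, 7, 1),
    ('2022-04-09', 25, 7, 1);   -- most recent record for product 1
""")

conn.execute("""
UPDATE product_units
SET current_quantity = COALESCE((
    SELECT quantity
    FROM vendor_inventory
    WHERE vendor_inventory.product_id = product_units.product_id
    ORDER BY market_date DESC   -- newest record first
    LIMIT 1
), 0)                           -- products with no inventory rows get 0
""")

rows = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(rows)
```

Product 1 receives 25, the quantity from its latest market date, rather than the total of 65 across both dates. Product 2 has no inventory rows and falls back to 0. If several vendors stock the same product on the same date, this sketch picks one of their rows arbitrarily, so a tie-breaking rule would be needed.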

