diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md
index 9b804e9ee..b8a307eab 100644
--- a/02_activities/assignments/DC_Cohort/Assignment2.md
+++ b/02_activities/assignments/DC_Cohort/Assignment2.md
@@ -54,7 +54,8 @@ The store wants to keep customer addresses. Propose two architectures for the CU
 **HINT:** search type 1 vs type 2 slowly changing dimensions.
 
 ```
-Your answer...
+Type 1 - overwrites the address in place, so no history is kept
+Type 2 - adds a new row for each address change, so the history is retained
 ```
 
 ***
@@ -183,5 +184,12 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c
 
 
 ```
-Your thoughts...
+In the article “Neural nets are just people all the way down,” Boykis argues that although machine-learning and AI systems are often presented as automated or self-sufficient technologies, they rely heavily on human labour, judgment, and interpretation at their creation. She illustrates this through the history of ImageNet, a large-scale image-recognition dataset that became foundational to computer vision. Boykis explains that ImageNet was constructed by gathering millions of images and using thousands of unfairly compensated human annotators to manually label them. Subsequent AI models were built on the decisions these workers made about the objects in each image, how they should be categorized, and which labels they deemed meaningful.
+
+She further notes that WordNet, which is used to determine linguistic taxonomies and categories, was also rooted in human labour and human decisions. Decisions about how concepts are grouped, where boundaries lie between categories, and which distinctions matter are not necessarily objective and are influenced by cultural, linguistic and political factors.
+
+The article highlights several ethical issues in AI modeling, including the hidden human labour required to train AI models and the illusion of neutrality.
AI-based decisions are often assumed to be unbiased, but because the models behind them are trained on data that reflects human categories and human judgments, they inherit human biases as well. This has real consequences: biased datasets can perpetuate stereotypes, misclassify certain groups, reinforce inequities, and amplify existing social and political power structures.
+
+While AI tools are undeniably powerful and widely integrated into today’s society, Boykis reminds readers that these systems are not separate from the world that shapes them.
+
 ```
diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/assignment2.sql
index 5ad40748a..5e4afd9bd 100644
--- a/02_activities/assignments/DC_Cohort/assignment2.sql
+++ b/02_activities/assignments/DC_Cohort/assignment2.sql
@@ -5,12 +5,15 @@
 /* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
 We tell them, no problem! We can produce a list with all of the appropriate details.
 
-Using the following syntax you create our super cool and not at all needy manager a list:
+Using the following syntax you create our super cool and not at all needy manager a list: */
+
 
 SELECT
-product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
-FROM product
+product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
+FROM product;
 
+
+/*
 But wait! The product table has some bad data (a few NULL values).
 Find the NULLs and then using COALESCE, replace the NULL with a
 blank for the first problem, and 'unit' for the second problem.
@@ -21,7 +24,6 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
 All the other rows will remain the same.) */
 
 
-
 --Windowed Functions
 /* 1. Write a query that selects from the customer_purchases table and numbers each customer’s
 visits to the farmer’s market (labeling each market date with a different number).
@@ -32,18 +34,46 @@ each new market date for each customer, or select only the unique market dates p
 (without purchase details) and number those visits.
 HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */
 
+SELECT DISTINCT *
+
+FROM (
+    SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date ASC) as visit_order
+    FROM customer_purchases
+) x;
+
 
 
 /* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
 then write another query that uses this one as a subquery (or temp table) and filters the results to
 only the customer’s most recent visit. */
 
+SELECT DISTINCT *
+
+FROM (
+    SELECT
+    customer_id,
+    market_date,
+    DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date DESC) as visit_order
+    FROM customer_purchases
+) x
+WHERE visit_order = 1;
+
 
 
 /* 3. Using a COUNT() window function, include a value along with each row of the
 customer_purchases table that indicates how many different times that customer has purchased that product_id. */
 
 
+SELECT DISTINCT
+customer_id,
+product_id,
+COUNT(*) OVER(PARTITION BY customer_id, product_id) as times_purchased
+FROM customer_purchases;
+
+
 
 -- String manipulations
 /* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
@@ -57,10 +87,24 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
 
 Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
 
+SELECT
+product_name,
+CASE WHEN product_name LIKE '%-%'
+    THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+    ELSE NULL
+    END as product_details
+
+
+FROM product;
 
 
 /* 2. Filter the query to show any product_size value that contain a number with REGEXP.
*/
 
+SELECT
+product_id,
+product_size
+FROM product
+WHERE product_size REGEXP '[0-9]';
 
 
 -- UNION
@@ -74,6 +118,37 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
 with a UNION binding them. */
 
 
+SELECT
+market_date,
+total_sales,
+'best' AS sales_day
+FROM (
+    SELECT
+    market_date,
+    SUM(quantity * cost_to_customer_per_qty) AS total_sales,
+    ROW_NUMBER() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) DESC) AS best_day
+    FROM customer_purchases
+    GROUP BY market_date
+) ranked
+WHERE best_day = 1
+
+
+UNION
+
+SELECT
+market_date,
+total_sales,
+'worst' AS sales_day
+FROM (
+    SELECT
+    market_date,
+    SUM(quantity * cost_to_customer_per_qty) AS total_sales,
+    ROW_NUMBER() OVER (ORDER BY SUM(quantity * cost_to_customer_per_qty) ASC) AS worst_day
+    FROM customer_purchases
+    GROUP BY market_date
+) ranked
+WHERE worst_day = 1;
+
 
 
 /* SECTION 3 */
@@ -90,6 +165,26 @@ How many customers are there (y).
 Before your final group by you should have the product of those two queries (x*y). */
 
 
+DROP TABLE IF EXISTS temp.customer_vendors_products;
+
+CREATE TEMP TABLE IF NOT EXISTS temp.customer_vendors_products (
+    vendor_id TEXT,
+    product_id TEXT
+);
+
+INSERT INTO temp.customer_vendors_products (vendor_id, product_id)
+SELECT DISTINCT
+vendor_id,
+product_id
+FROM vendor_inventory;
+
+
+SELECT
+customer_id,
+vendor_id,
+product_id
+FROM customer
+CROSS JOIN temp.customer_vendors_products;
 
 -- INSERT
 /*1. Create a new table "product_units".
@@ -97,12 +192,21 @@ This table will contain only products where the `product_qty_type = 'unit'`.
 It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
 Name the timestamp column `snapshot_timestamp`. */
 
+DROP TABLE IF EXISTS temp.product_units; -- drop any earlier copy so the script can be re-run
+CREATE TEMP TABLE product_units AS
+    SELECT * ,
+    CURRENT_TIMESTAMP AS snapshot_timestamp
+    FROM product
+    WHERE product_qty_type = 'unit';
+
+-- SELECT * FROM product_units
 
 
 /*2.
Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
 This can be any product you desire (e.g. add another record for Apple Pie). */
 
-
+INSERT INTO product_units
+    VALUES(24, 'New apple pie','10 in', 2, 'unit', CURRENT_TIMESTAMP);
 
 -- DELETE
 /* 1. Delete the older record for the whatever product you added.
@@ -110,14 +214,14 @@ This can be any product you desire (e.g. add another record for Apple Pie). */
 HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
 
 
+DELETE FROM product_units
+WHERE product_id=24;
+
 
 -- UPDATE
 /* 1.We want to add the current_quantity to the product_units table.
 First, add a new column, current_quantity to the table using the following syntax.
 
-ALTER TABLE product_units
-ADD current_quantity INT;
-
 Then, using UPDATE, change the current_quantity equal to the last quantity value from the vendor_inventory details.
 
 HINT: This one is pretty hard.
@@ -128,6 +232,17 @@ Finally, make sure you have a WHERE statement to update the right row,
 you'll need to use product_units.product_id to refer to the correct row within the product_units table.
 When you have all of these components, you can run the update statement.
*/
 
+ALTER TABLE product_units
+ADD current_quantity INT; -- INT gives the new column INTEGER affinity (SQLite does not strictly enforce it)
+
+UPDATE product_units
+SET current_quantity = (
+    SELECT COALESCE(SUM(quantity), 0)
+    FROM vendor_inventory
+    WHERE vendor_inventory.product_id = product_units.product_id
+);
+
+-- SELECT * FROM product_units
 
 
 
diff --git a/02_activities/assignments/DC_Cohort/bookstore_logicalmodel.jpg b/02_activities/assignments/DC_Cohort/bookstore_logicalmodel.jpg
new file mode 100644
index 000000000..0080ca0f5
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/bookstore_logicalmodel.jpg differ
diff --git a/02_activities/assignments/DC_Cohort/bookstore_logicalmodel_noshifts.jpg b/02_activities/assignments/DC_Cohort/bookstore_logicalmodel_noshifts.jpg
new file mode 100644
index 000000000..40b2d8d39
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/bookstore_logicalmodel_noshifts.jpg differ
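
Review note on the UPDATE step in assignment2.sql: the prompt asks for the last quantity value per product, whereas `SUM(quantity)` totals every inventory row for that product. The sketch below shows one possible way to pick the most recent row instead, run through Python's `sqlite3` so it is self-contained. The toy rows are hypothetical, and the `market_date` column on `vendor_inventory` is an assumption, since the diff never shows that table's columns. It is an illustration under those assumptions, not the expected answer.

```python
import sqlite3

# Hypothetical toy data. Table names follow the diff; the market_date
# column on vendor_inventory is an ASSUMPTION made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_units (
    product_id INT,
    product_name TEXT,
    current_quantity INT
);
CREATE TABLE vendor_inventory (
    market_date TEXT,
    product_id INT,
    quantity INT
);
INSERT INTO product_units (product_id, product_name)
VALUES (1, 'Pie'), (2, 'Jam'), (3, 'Tea');
INSERT INTO vendor_inventory VALUES
    ('2024-01-01', 1, 10),
    ('2024-01-08', 1, 4),
    ('2024-01-01', 2, 7);
""")

# Correlated subquery: for each product_units row, take the quantity from
# the newest matching inventory row. COALESCE turns "no inventory rows"
# (subquery yields NULL) into 0.
conn.execute("""
UPDATE product_units
SET current_quantity = COALESCE((
    SELECT quantity
    FROM vendor_inventory
    WHERE vendor_inventory.product_id = product_units.product_id
    ORDER BY market_date DESC
    LIMIT 1
), 0)
""")

rows = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(rows)  # [(1, 4), (2, 7), (3, 0)]
```

With these rows, product 1 ends up at 4 (its latest entry) rather than 14 (the sum). If several vendors stocked the same product on the same date, the subquery would need a tie-breaker, which this sketch leaves out.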